Regex: Find all html tags that contain some other particular html tags
Andrew Henderson
I have some HTML tags that start with <p> and end with </p>. The first tag contains other tags such as </li> </ul> </div>, along with spaces and \n, as you can see:
<p></g></svg> </a> </li> </ul> </div> </p>
<p>Foarte frumos lucru</p>
<p>I love cars</p>
I want to find and delete every tag like the first one, that is, every <p> that contains </li> </ul> </div>.
The output should be:
<p>Foarte frumos lucru</p>
<p>I love cars</p>
My solution does not work:
FIND: (?=<p>)[\s\S]*?</li></div>|</ul>[\s\S]*?</p>
REPLACE BY: LEAVE EMPTY
3 Answers
This will remove all <p> tags that contain only other tags or whitespace:
- Ctrl+H
- Find what: <p>(?:<.+?>|\s)+?</p>\R*
- Replace with: LEAVE EMPTY
- CHECK Wrap around
- CHECK Regular expression
- Replace all
Explanation:
<p>         # start tag
(?:         # non-capturing group
  <.+?>     # any tag
  |         # OR
  \s        # any kind of space
)+?         # end group, must appear 1 or more times, not greedy
</p>        # end tag
\R*         # 0 or more of any kind of line break
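For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch of the same idea. It is not the answerer's code: Python's `re` module has no `\R`, so `(?:\r?\n)*` is swapped in, and the line break inside the first paragraph of `sample` is an assumed layout, since the question only says the tag contains `\n`.

```python
import re

# Assumption: Python's re has no \R, so (?:\r?\n)* stands in for \R*.
EMPTY_P = re.compile(r"<p>(?:<.+?>|\s)+?</p>(?:\r?\n)*")

# Hypothetical layout of the question's sample (the inner line break is assumed).
sample = (
    "<p></g></svg> </a> </li>\n"
    "</ul> </div> </p>\n"
    "<p>Foarte frumos lucru</p>\n"
    "<p>I love cars</p>"
)

# Replace every match with nothing, like "Replace all" with an empty field.
cleaned = EMPTY_P.sub("", sample)
print(cleaned)

# Caveat: "." also matches ">", so <.+?> can stretch across text on one line.
print(bool(EMPTY_P.fullmatch("<p><b>text</b></p>")))
```

The first print shows only the two text paragraphs. The second prints True, which means a paragraph whose text is wrapped in another tag on a single line would be removed as well. If that matters for your file, a stricter tag pattern such as `<[^<>]+>` in place of `<.+?>` is one possible alternative.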
Use the following:
- Ctrl+H
- Find what: ^(?!.*(</p>)).*|\s+</p>
- Replace with: LEAVE EMPTY
- CHECK Match case
- CHECK Wrap around
- CHECK Regular expression
- UNCHECK . matches newline
- Replace all
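A rough Python sketch of how this pattern behaves, under stated assumptions: `re.MULTILINE` is used so that `^` matches at every line start, `.` is left at its default (no newline), which mirrors the unchecked ". matches newline" box, and the multi-line layout of `sample` is hypothetical.

```python
import re

# re.MULTILINE makes ^ match at each line start; "." does not match newlines
# by default, mirroring ". matches newline" left unchecked.
PATTERN = re.compile(r"^(?!.*(</p>)).*|\s+</p>", re.MULTILINE)

# Hypothetical multi-line layout of the question's sample (assumed).
sample = (
    "<p></g></svg>\n"
    "</a> </li>\n"
    "</ul> </div>\n"
    "</p>\n"
    "<p>Foarte frumos lucru</p>\n"
    "<p>I love cars</p>"
)

cleaned = PATTERN.sub("", sample)

# Emptied lines stay behind as blank lines, so drop them before printing.
kept = [line for line in cleaned.splitlines() if line.strip()]
print("\n".join(kept))
```

Two things follow from how the pattern is built. It empties every line that has no </p> on it, so it relies on the unwanted block being spread over several lines and on each wanted paragraph sitting on a single line. It also leaves blank lines where the removed lines were, which is why the sketch filters them out.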
Another, more complete solution that can deal with other tags inside the <p> ... </p>:
<p>[^<>]*</p>\R?(*SKIP)(*F)|.
You can try it here, with an explanation.
Be aware that everything in your file will be deleted except the <p> ... </p> elements!
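Python's built-in `re` module does not support `(*SKIP)(*F)`, so the sketch below imitates the effect with a capture group and a replacement callback. This is a swapped-in technique, not the answer's regex. The function name `keep_simple_paragraphs` is made up for illustration, `(?:\r?\n)?` stands in for `\R?`, and `re.DOTALL` is an assumption that corresponds to checking ". matches newline".

```python
import re

# Assumptions: (?:\r?\n)? stands in for \R?, and re.DOTALL corresponds to
# checking ". matches newline" so stray line breaks are deleted as well.
KEEP_OR_DROP = re.compile(r"(<p>[^<>]*</p>(?:\r?\n)?)|.", re.DOTALL)


def keep_simple_paragraphs(text: str) -> str:
    """Keep <p>...</p> blocks with no inner tags; delete everything else."""
    # Group 1 plays the role of the (*SKIP)(*F) branch: when it matched, the
    # text is put back unchanged; otherwise the single character is dropped.
    return KEEP_OR_DROP.sub(lambda m: m.group(1) or "", text)


# Hypothetical layout of the question's sample (the inner line break is assumed).
sample = (
    "<p></g></svg> </a> </li>\n"
    "</ul> </div> </p>\n"
    "<p>Foarte frumos lucru</p>\n"
    "<p>I love cars</p>"
)

print(keep_simple_paragraphs(sample))
```

As the warning above says, anything outside a plain <p> ... </p> is removed: `keep_simple_paragraphs("junk<p>a</p>more")` returns just `<p>a</p>`.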