Regex to check for broken anchor tags

By : FiveMinute  |  Updated On : 23 Mar, 2021

Below are some scenarios in which the content contains broken anchor tags (an "<a" opening tag that is never closed with ">") that we want to remove.

Content : 

<p>Some dummy text</p> <a href=" <p> this is dummy text <a href ="www.fiveminute.in" </p> 

<div>Dummy text present <a href="www.fiveminute.in">fiveminute</a> </div>

<div> this is dummy text <a href ="www.fiveminute.in" </div> 

The regex below removes the broken anchor tags from the content above.

Regex : /<a[^><]+((?=<)|(?!.*>))/gm

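// Matches an "<a" tag that is never closed with ">" (explained below)
const regex = /<a[^><]+((?=<)|(?!.*>))/gm;

// Use a template literal (backticks): the sample spans several lines,
// and a single-quoted string cannot contain raw line breaks
const str = `<p>Some dummy text</p> <a href=" <p> this is dummy text <a href ="www.fiveminute.in" </p> 

<div>Dummy text present <a href="www.fiveminute.in">fiveminute</a> </div>

<div> this is dummy text <a href ="www.fiveminute.in" </div> `;

// Replace every broken anchor tag with an empty string
const subst = '';

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);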

Output : 

<p>Some dummy text</p> <p> this is dummy text </p> 

<div>Dummy text present <a href="www.fiveminute.in">fiveminute</a> </div>

<div> this is dummy text </div> 
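Since the goal is also to check for broken tags, not only to delete them, here is a small sketch for inspecting them. It reuses the article's regex and sample content. String.prototype.matchAll (which throws a TypeError unless the regex has the g flag) lists every fragment that replace() would remove, and a copy of the pattern without flags gives a simple yes/no check. The variable names `removed` and `detector` are illustrative only.

```javascript
// The article's regex and sample content
const regex = /<a[^><]+((?=<)|(?!.*>))/gm;

const str = `<p>Some dummy text</p> <a href=" <p> this is dummy text <a href ="www.fiveminute.in" </p> 

<div>Dummy text present <a href="www.fiveminute.in">fiveminute</a> </div>

<div> this is dummy text <a href ="www.fiveminute.in" </div> `;

// matchAll requires the g flag; each match[0] is one broken anchor fragment
const removed = [...str.matchAll(regex)].map((match) => match[0]);

// JSON.stringify makes the trailing space of each fragment visible
removed.forEach((fragment) => console.log(JSON.stringify(fragment)));
// "<a href=\" "
// "<a href =\"www.fiveminute.in\" "
// "<a href =\"www.fiveminute.in\" "

// For a yes/no check, use a copy without the g flag: test() on a global
// regex advances lastIndex, so a later call on another string can miss a match
const detector = new RegExp(regex.source);
console.log(detector.test(str)); // true
console.log(detector.test('<div><a href="www.fiveminute.in">fiveminute</a></div>')); // false
```

The well-formed <a href="www.fiveminute.in">fiveminute</a> does not appear among the matched fragments, which agrees with the Output above.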

Explanation : 

Regex :  /<a[^><]+((?=<)|(?!.*>))/gm

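<a[^><]+ : Matches the opening "<a" of an anchor tag followed by one or more characters that are neither ">" nor "<". The match therefore stops just before the ">" that would close the tag, or just before the "<" of the next tag. Note that this character class also matches line breaks.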

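((?=<)|(?!.*>)) : An alternation of one positive and one negative lookahead; the match is kept if either one succeeds. The positive lookahead (?=<) succeeds when the next character starts another tag, which means the anchor tag was never closed. The negative lookahead (?!.*>) succeeds when no ">" appears later on the same line (the "." does not match line breaks), which covers, for example, a broken tag at the very end of the content. When the tag is closed with ">" on the same line, as in <a href="www.fiveminute.in">, neither lookahead succeeds and the tag is left untouched.

The g flag makes replace() remove every match instead of only the first. The m flag is harmless but has no effect here, because the pattern contains no ^ or $ anchors.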
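The two halves of the pattern can be tried separately. The sketch below is a minimal illustration, not the author's code: it first runs only <a[^><]+ and then the full regex on a few short inputs, one per lookahead branch, plus a multi-line tag that shows a limitation of the same-line check. `stripBrokenAnchors` is a hypothetical helper name, and single-quoted attribute values are used only to keep the JSON.stringify output free of escapes.

```javascript
// The article's pattern, unchanged
const regex = /<a[^><]+((?=<)|(?!.*>))/gm;

// First half only: "<a" plus a run of characters that are neither "<" nor ">"
const head = /<a[^><]+/;

// Hypothetical helper name, used only for this illustration
function stripBrokenAnchors(html) {
  // replace() resets lastIndex on a global regex, so reusing `regex` is safe
  return html.replace(regex, '');
}

// The run stops just before ">" on a closed tag...
console.log(JSON.stringify("<a href='x'>link</a>".match(head)[0])); // "<a href='x'"
// ...just before "<" on a broken one...
console.log(JSON.stringify("<a href='x' </div>".match(head)[0])); // "<a href='x' "
// ...and it does not stop at a line break
console.log(JSON.stringify("<a href='x'\n<p>".match(head)[0])); // "<a href='x'\n"

// (?=<) branch: the unclosed tag is followed by another tag
console.log(stripBrokenAnchors("<div>text <a href='x' </div>")); // <div>text </div>

// (?!.*>) branch: the unclosed tag runs to the end of the content
console.log(JSON.stringify(stripBrokenAnchors("text <a href='x'"))); // "text "

// Neither branch succeeds: ">" follows on the same line, so the tag is kept
console.log(stripBrokenAnchors("<a href='x'>link</a>")); // <a href='x'>link</a>

// Caveat: "." does not match line breaks, so a valid tag whose
// attributes continue on the next line is partly removed
console.log(JSON.stringify(stripBrokenAnchors("<a href='x'\n title='y'>link</a>")));
// "\n title='y'>link</a>"
```

The last case suggests the regex is safest on content where each anchor tag's attributes stay on a single line.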