How to remove font tags by using regular expressions

I found an article that explains how to remove font tags using regular expressions. This is especially useful for web designers who are fighting old HTML code and need to convert large content pages to XHTML.
It can also be a nice little feature for WYSIWYG editors: drop the expression into your C# or PHP script so that it runs when a user saves a document and strips the extra tags.

The text below is respectfully borrowed from evolt.

Remove FONT tags from your web pages:

<(FONT|font)([ ]([a-zA-Z]+)=("|')[^"']+("|'))*[^>]+>([^<]+)(</FONT>|</font>) and replace with backreference 6 (typically written \6 or $6, depending on the tool). This expression looks quite complicated, but I wanted to show an example with some more involved logic. A simpler example that finds the same string will follow this explanation.

<(FONT|font) accounts for an upper or lower case tag. ([ ]([a-zA-Z]+)= matches a space followed by any attribute name and an =. The next subexpression, ("|')[^"']+("|'), finds the leading double or single quote on the attribute, then any attribute value that contains no double or single quote, e.g. Arial, +5, #c3d4ff, etc., then the closing double or single quote. Notice that the subexpression for the entire attribute is enclosed in parentheses and followed by an asterisk: ([ ]([a-zA-Z]+)=("|')[^"']+("|'))*. This allows the attribute group to match zero times or any number of times.

[^>]+> then matches anything up to the first > (similar to the “greediness” example in the original article). Because of the +, at least one character must appear between the tag name and the >, so a bare <font> with no attributes at all is not matched by this expression. The backreference is defined next as ([^<]+), which captures any text between the opening and closing font tags, and is referred to as 6 because it's the sixth parenthetical group in the entire expression. Then (</FONT>|</font>) accounts for the closing font tag in either case.
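To see the longer expression in action, here is a minimal sketch in Python rather than C# or PHP. The function name strip_font_tags and the sample markup are my own illustrations and do not come from the evolt article. The closing-tag alternation is written out in full as (</FONT>|</font>).

```python
import re

# The long expression from the text. Group 6 is the text between the tags.
FONT_TAG = re.compile(
    r"""<(FONT|font)([ ]([a-zA-Z]+)=("|')[^"']+("|'))*[^>]+>([^<]+)(</FONT>|</font>)"""
)


def strip_font_tags(html: str) -> str:
    """Replace every matched font element with its inner text (backreference 6)."""
    # Hypothetical helper name, used only for this illustration.
    return FONT_TAG.sub(r"\6", html)


sample = '<p><font face="Arial" size="+5">Hello</font> world</p>'
print(strip_font_tags(sample))  # prints <p>Hello world</p>
```

Python writes the backreference as \6 inside a raw string. Other tools may use a different notation, so check the documentation for whichever one you use.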

<(FONT|font)[^>]*>[^<]*(</FONT>|</font>) is a simpler example that matches the same strings as the expression explained above. The difference is that it is much less picky about what appears inside and between the font tags, so if you have inconsistent tag syntax, it will probably capture the various instances you may have. On the other hand, if you have any extra junk characters in your search data, you may catch things you didn’t intend, which is why you should test your expressions ahead of time.
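Here is a rough Python sketch of the simpler expression, again with made-up names (find_font_spans, strip_font_simple) and sample data. As written, the simple pattern has no group around the inner text. The second pattern is therefore my own assumption: it adds parentheses around [^<]* so the text can be kept when the tags are removed.

```python
import re

# The simpler expression from the text, unchanged.
SIMPLE = re.compile(r"<(FONT|font)[^>]*>[^<]*(</FONT>|</font>)")

# Assumption, not from the article: the same pattern with a capturing group
# around the inner text, and the other groups made non-capturing.
SIMPLE_KEEP_TEXT = re.compile(r"<(?:FONT|font)[^>]*>([^<]*)(?:</FONT>|</font>)")


def find_font_spans(html: str) -> list[str]:
    """Return every substring the simple expression matches."""
    return [m.group(0) for m in SIMPLE.finditer(html)]


def strip_font_simple(html: str) -> str:
    """Drop the font tags but keep the text between them."""
    return SIMPLE_KEEP_TEXT.sub(r"\1", html)


sample = "<font>plain</font> and <FONT size=3>loose</FONT>"
print(find_font_spans(sample))
print(strip_font_simple(sample))  # prints plain and loose
```

Running both functions over a copy of your real pages first is a cheap way to spot unintended matches before replacing anything.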