Hi Adam,
Thanks for the reply.
I am using the regular expression you gave me for cleaning inline styles and CSS classes, and it works great.
private string CustomCleanHTML(string html)
{
    return Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>", "<$1$2>", RegexOptions.IgnoreCase);
}
But it runs into trouble with tags like the following (weird, but I don't know why my users are entering this kind of dirty data):
<p class="MsoNormal" style="margin: 0in 0in 0pt; align: " center?="">
This shows up in tables pasted from MS Word.
The regular expression above removes class and style, but leaves the following behind:
<p center?=""><strong><span>Gr. Level</span> </strong></p>
This center?="" is causing a problem for us, because we have to generate a PDF from this HTML, and the PDF generation cannot handle it.
Can you give me another regular expression that strips all attributes from <p> tags, and only from <p> tags?
The final output I want is:
<p><strong><span>Gr. Level</span> </strong></p>
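A minimal sketch of what such a p-only cleanup might look like is below. The class name HtmlCleaner and the method name StripParagraphAttributes are made up for illustration, and the pattern assumes that no attribute value contains a '>' character. It is a rough idea under those assumptions, not a tested solution.

```csharp
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Hypothetical helper: rewrites every opening <p ...> tag as a bare <p>.
    // The "\s" after "p" keeps tags such as <pre> or <param> from matching,
    // and closing </p> tags are left alone because they start with "</".
    // Assumption: attribute values never contain a '>' character.
    public static string StripParagraphAttributes(string html)
    {
        return Regex.Replace(html, @"<p(?:\s[^>]*)?>", "<p>", RegexOptions.IgnoreCase);
    }
}
```

Calling HtmlCleaner.StripParagraphAttributes on the problem markup above should turn the opening tag into a plain <p> and leave the <strong> and <span> tags untouched.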
Thanks in advance,
I appreciate all the support you have given us.
Rama