Not cleaning up all Word specific code

Last post 06-20-2006, 7:41 AM by PaulNewRiver. 0 replies.
  •  06-20-2006, 7:41 AM 20328

    Not cleaning up all Word specific code

    When I copy the following paragraph from Word into the Editor, it carries a lot of Word-specific code with it. We have a strict limit on the number of characters that can go into the database field, so I use the "Remove all Word specific markup" button under Clean Up HTML, and a lot of it goes away, except what is highlighted in red in the HTML below. About half of the characters in that paragraph are still Word code, so it could be a lot fewer characters if ALL Word code were removed.
     
    Recurring code like that from Word should also be removed by the "Remove all Word specific markup" button. Yes, it is enclosed in angle brackets, so it might look like HTML tags, but the names inside the brackets are exact (st1:personname, st1:place, etc.), so you could easily recognize these as coming from Word and remove these strings too.
     
    Text pasted from Word:
     
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vivamus pulvinar. Vestibulum varius elit et dui. Ut sit amet nulla. Quisque diam est, feugiat sit amet, egestas nec, fringilla porta, dui. Nam turpis.
     
    Cleaned with the "Remove all Word specific markup" button; the Word codes in red are still in the HTML:
     
    <p>&nbsp;Lorem ipsum dolor sit a<st1:personname w:st="on">me</st1:personname>t, consectetuer adipiscing elit. Vivamus pulvinar. Vestibulum varius elit et dui. Ut sit a<st1:personname w:st="on">me</st1:personname>t nulla. Quisque diam est, feugiat sit a<st1:personname w:st="on">me</st1:personname>t, egestas nec, fringilla porta, dui. <st1:country-region w:st="on"><st1:place w:st="on">Nam</st1:place></st1:country-region> turpis.</p>
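
To illustrate the kind of matching suggested above, here is a minimal sketch in Python. It is not the Editor's actual cleanup routine: the function name `strip_smart_tags` and the pattern are hypothetical, and it assumes the leftover Word tags always use the `st1:` prefix seen in the sample. It removes the opening and closing tags and keeps the text between them.

```python
import re

# Assumption: leftover Word smart tags always start with the "st1:" prefix,
# as in the sample above (st1:personname, st1:place, st1:country-region).
SMART_TAG = re.compile(r"</?st1:[\w-]+[^>]*>", re.IGNORECASE)


def strip_smart_tags(html: str) -> str:
    """Remove <st1:...> opening and closing tags, keeping the inner text."""
    return SMART_TAG.sub("", html)


# Shortened excerpt of the cleaned HTML from the post.
sample = (
    '<p>&nbsp;Lorem ipsum dolor sit a<st1:personname w:st="on">me'
    '</st1:personname>t. <st1:country-region w:st="on"><st1:place w:st="on">'
    'Nam</st1:place></st1:country-region> turpis.</p>'
)

cleaned = strip_smart_tags(sample)
print(cleaned)
print(len(sample), "->", len(cleaned), "characters")
```

A regular expression like this is only a rough approach; it would miss smart tags under a different prefix and could misfire if an attribute value ever contained a `>` character, so a real cleanup would probably want an HTML parser.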