Wednesday, May 14, 2014

PublishingPageContent returns unclosed HTML tags and Unicode characters in SharePoint 2013

I was writing a console application for SharePoint 2013 to fetch the content of the PublishingPageContent field of a wiki page.

I had converted the page from a Word document to a web page using the SharePoint 2013 OOTB "Convert To" option in the wiki site.

However, when I fetch the content of PublishingPageContent in SharePoint 2013, it unexpectedly returns Unicode characters and unclosed HTML tags.

The same approach works perfectly in SharePoint 2010, so this behavior in a SharePoint 2013 wiki site is strange. I am not sure why it changed.


Sample HTML with unclosed HTML tags and Unicode characters

Resolution:

After a month of research, I found an answer using HTMLSANITIZER. Refer to the link below.

This helped in fixing the above issue. I hope this information helps someone else. :)
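
For readers who want to see the kind of cleanup involved, here is a minimal sketch in Python using only the standard library. It is not HTMLSANITIZER and not the code from my console application. It only illustrates the general idea of closing tags that were left open and decoding character references. The names TagBalancer and balance_html are hypothetical, and I am assuming the "Unicode characters" show up as character references such as &#160; in the field value.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical helper: re-emit HTML with every opened element closed.

    Comments and doctype declarations are dropped because their
    handlers are not overridden.
    """

    def __init__(self):
        # convert_charrefs=True decodes references like &#160; into characters.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def _format_attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(attrs)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag with no matching opener: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        # Text arrives already decoded; re-escape only &, < and >.
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def balance_html(fragment):
    parser = TagBalancer()
    parser.feed(fragment)
    return parser.result()
```

Calling balance_html on the raw field value returns markup in which every element that was opened is also closed. A real sanitizer does much more than this (for example, removing unsafe tags and attributes), so treat the sketch as an illustration only.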