I had to add a character limit to an XhtmlString property in Episerver on a publish event the other day, and knew I had to get rid of the markup to get an accurate count. I was pleased to find that the good old TextIndexer was still in there to clean the HTML for me.
Here is a short reminder on how to use it:
// Markup with encoding as a string instead of fragments.
string htmlText = myPage.HtmlTextProperty.ToString();
// Encoded text with markup removed.
string plainText = EPiServer.Core.Html.TextIndexer
    .StripHtml(htmlText, maxTextLengthToReturn: htmlText.Length);
// Decoded and readable text. Without this step an entity such as &amp; would count as five characters instead of one.
string decodedText = System.Web.HttpUtility
    .HtmlDecode(plainText);
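For completeness, here is a rough sketch of how those three lines might fit into the character limit check on the publish event. It is not the exact code I used: the `MyPage` type, the `HtmlTextLimit` class, and the limit of 500 are made up for illustration, and attaching the handler to `IContentEvents.PublishingContent` (typically from an initialization module) is left out.

```csharp
using System.Web;
using EPiServer;
using EPiServer.Core;
using EPiServer.Core.Html;

// Hypothetical page type, standing in for whatever content type holds the property.
public class MyPage : PageData
{
    public virtual XhtmlString HtmlTextProperty { get; set; }
}

public static class HtmlTextLimit
{
    // Hypothetical limit, chosen only for illustration.
    public const int MaxCharacters = 500;

    // Counts the characters an editor would actually see:
    // markup stripped first, then HTML entities decoded.
    public static int CountVisibleCharacters(XhtmlString html)
    {
        if (html == null)
        {
            return 0;
        }

        string htmlText = html.ToString();
        if (string.IsNullOrEmpty(htmlText))
        {
            return 0;
        }

        string plainText = TextIndexer.StripHtml(
            htmlText, maxTextLengthToReturn: htmlText.Length);

        return HttpUtility.HtmlDecode(plainText).Length;
    }

    // Meant to be attached to IContentEvents.PublishingContent.
    public static void OnPublishingContent(object sender, ContentEventArgs e)
    {
        var page = e.Content as MyPage;
        if (page == null)
        {
            return; // Not the content type we care about.
        }

        int count = CountVisibleCharacters(page.HtmlTextProperty);
        if (count > MaxCharacters)
        {
            // Cancel the publish and tell the editor why.
            e.CancelAction = true;
            e.CancelReason =
                $"The text is {count} characters long; the limit is {MaxCharacters}.";
        }
    }
}
```

Keeping the count in its own method means the same logic can be reused elsewhere, for example in a validation attribute, without touching the event handler.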