International UTF-8 Characters in Windows Phone 7 WebBrowser Control

I haven’t blogged for a while, not because I haven’t had anything to say, but because I felt I needed time to triage all the cool stuff I’ve been learning about Windows Phone 7 Silverlight development. However, one thing I’ve learned cannot wait: support for international characters in the WebBrowser control.

Basically, the problem is as follows: We want to show HTML that uses international characters. The most straightforward way to show HTML in a Windows Phone 7 app is to use the WebBrowser control and the "NavigateToString(string myString)" method to input the HTML.

However, when we pass international text (like Japanese, Arabic, Korean, Russian or Chinese characters) to this method, we get a mess. The following code:

string testString = "<html><body>日本列島の占領の最初の兆候が縄文時代で約14,000のBC、竪穴住居の中石器時代新石器時代に半定住狩猟採集文化と農業の初歩的なフォームから続いて、30,000年頃旧石器文化と登場しました。</body></html>";
BrowserControl.NavigateToString(testString);

produces the following result:

image

In case you’re not familiar with Japanese, this is not Japanese. Instead, it is garbled text (mojibake) standing in for the Japanese characters we want to see. Why does it do this? I’m not sure. But my effort to show the actual international text in the WebBrowser control was met with tears time and time again.

Until I found this post unhelpfully titled “Windows Phone 7 Character Testing…”. Here the author gives us this extremely helpful method for delivering the string we need to show international characters:

private static string ConvertExtendedASCII(string HTML)
{
    string retVal = "";
    char[] s = HTML.ToCharArray();

    foreach (char c in s)
    {
        if (Convert.ToInt32(c) > 127)
            retVal += "&#" + Convert.ToInt32(c) + ";";
        else
            retVal += c;
    }

    return retVal;
}


With this in place, we can very simply run our string through the method to give us properly encoded HTML so that

BrowserControl.NavigateToString(ConvertExtendedASCII(testString));

gives us:

image

And we’re happy. Very happy.
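
One caveat: the method above encodes each UTF-16 `char` on its own, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) would come out as two separate references instead of one. Below is a minimal sketch of a variant that combines surrogate pairs into a single code point and uses a StringBuilder to avoid repeated string allocation. It is not from the original post: the names `HtmlEntityEncoder` and `EncodeNonAscii` are hypothetical, and I'm assuming `char.ConvertToUtf32` and the surrogate helpers are available on your target platform.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original post): replaces every
// character above 127 with a decimal numeric character reference.
public static class HtmlEntityEncoder
{
    public static string EncodeNonAscii(string html)
    {
        if (html == null) throw new ArgumentNullException("html");

        // StringBuilder avoids allocating a new string per character.
        var sb = new StringBuilder(html.Length);

        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];

            if (c <= 127)
            {
                // Plain ASCII passes through unchanged.
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c)
                     && i + 1 < html.Length
                     && char.IsLowSurrogate(html[i + 1]))
            {
                // Combine the surrogate pair into one code point
                // so it is emitted as a single reference.
                int codePoint = char.ConvertToUtf32(c, html[i + 1]);
                sb.Append("&#").Append(codePoint).Append(';');
                i++; // skip the low surrogate we just consumed
            }
            else
            {
                // BMP character (or a lone surrogate, which is passed
                // through as its raw value, as in the original method).
                sb.Append("&#").Append((int)c).Append(';');
            }
        }

        return sb.ToString();
    }
}
```

With this sketch, `EncodeNonAscii("<b>日本</b>")` yields `<b>&#26085;&#26412;</b>`, and the result can be passed to `BrowserControl.NavigateToString` just like the output of `ConvertExtendedASCII`.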

18 thoughts on “International UTF-8 Characters in Windows Phone 7 WebBrowser Control”

  2. retVal += c; instantiates a new string on every iteration, which significantly slows down the conversion.

    The following function works a thousand times faster:

    private static string FastConvertExtendedASCII(string HTML)
    {
        char[] s = HTML.ToCharArray();

        // Count the characters to be converted
        // and calculate the extra space they need
        int n = 0;
        int value;
        foreach (char c in s)
        {
            if ((value = Convert.ToInt32(c)) > 127)
            {
                if (value > 9999)
                    n += 7;
                else if (value > 999)
                    n += 6;
                else
                    n += 5;
            }
        }

        // To avoid instantiating new strings,
        // allocate a buffer for the final string
        char[] res = new char[HTML.Length + n];

        // Conversion
        int i = 0;
        int div;
        const int zero = (int)'0';
        foreach (char c in s)
        {
            if ((value = Convert.ToInt32(c)) > 127)
            {
                res[i++] = '&';
                res[i++] = '#';

                if (value > 9999)
                    div = 10000;
                else if (value > 999)
                    div = 1000;
                else
                    div = 100;

                while (div > 0)
                {
                    res[i++] = (char)(zero + value / div);
                    value %= div;
                    div /= 10;
                }

                res[i++] = ';';
            }
            else
            {
                res[i] = c;
                i++;
            }
        }

        return new string(res);
    }

  4. I found another way to resolve the problem:

    StreamReader reader = new StreamReader(TitleContainer.OpenStream("731999031.htm"), Encoding.GetEncoding("unicode"));

    It works very well!

  5. It works!
    But ConvertExtendedASCII takes a very long time to run. Any idea how to resolve that?

  6. OOPS – the HTML got eaten up – what I said after “and tags?” was this:

    <head><meta content="text/html; charset=utf-16"/></head>

  7. Very helpful post! I’ve tuned the code sample:

    private static string ConvertExtendedAscii(string html)
    {
        StringBuilder sb = new StringBuilder();

        foreach (var c in html)
        {
            int charInt = Convert.ToInt32(c);
            if (charInt > 127)
                sb.AppendFormat("&#{0};", charInt);
            else
                sb.Append(c);
        }

        return sb.ToString();
    }
