Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Saturday, October 29, 2016

HttpUtility.HtmlEncode doesn't encode everything



I am interacting with a web server using a desktop client program in C# and .NET 3.5. I am using Fiddler to see what traffic the web browser sends, and I emulate that. Sadly, this server is old and somewhat confused about character sets and UTF-8; mostly it uses Latin-1.

When I enter data containing "special" characters into the web browser, Fiddler shows me that the browser transmits them to the server as HTML numeric character references ("&#", the decimal character code, then ";").

But in my client, HttpUtility.HtmlEncode does not convert these characters; it leaves them as they are. What do I need to call to convert such a character into its numeric reference (for example, U+0100 into "&#256;")?

Answer by bdukes for HttpUtility.HtmlEncode doesn't encode everything


Rick Strahl just posted a blog post, "Html and Uri String Encoding without System.Web", with custom code that also encodes the upper range of characters.

using System.Globalization;
using System.Text;

public static class HtmlEncoding
{
    /// <summary>
    /// HTML-encodes a string and returns the encoded string.
    /// </summary>
    /// <param name="text">The text string to encode.</param>
    /// <returns>The HTML-encoded text.</returns>
    public static string HtmlEncode(string text)
    {
        if (text == null)
            return null;

        StringBuilder sb = new StringBuilder(text.Length);

        int len = text.Length;
        for (int i = 0; i < len; i++)
        {
            switch (text[i])
            {
                case '<':
                    sb.Append("&lt;");
                    break;
                case '>':
                    sb.Append("&gt;");
                    break;
                case '"':
                    sb.Append("&quot;");
                    break;
                case '&':
                    sb.Append("&amp;");
                    break;
                default:
                    if (text[i] > 159)
                    {
                        // decimal numeric entity
                        sb.Append("&#");
                        sb.Append(((int)text[i]).ToString(CultureInfo.InvariantCulture));
                        sb.Append(";");
                    }
                    else
                        sb.Append(text[i]);
                    break;
            }
        }
        return sb.ToString();
    }
}
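One caveat with the loop above: it works on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for instance) is stored as a surrogate pair and comes out as two separate references, which an HTML parser turns into U+FFFD replacement characters instead of the original character. Below is a minimal sketch of a variant that emits one reference per code point, using char.IsSurrogatePair and char.ConvertToUtf32. The class name HtmlNumericEncoder is hypothetical, and it keeps the same threshold of 159 as the code above.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class HtmlNumericEncoder
{
    // Sketch: same idea as the HtmlEncode above, but whole code points are encoded,
    // so a surrogate pair becomes a single &#NNNNNN; reference instead of two.
    public static string HtmlEncode(string text)
    {
        if (text == null)
            return null;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            switch (c)
            {
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '"': sb.Append("&quot;"); break;
                case '&': sb.Append("&amp;"); break;
                default:
                    if (char.IsSurrogatePair(text, i))
                    {
                        // Combine the high and low surrogate, then skip the low half.
                        AppendNumeric(sb, char.ConvertToUtf32(text, i));
                        i++;
                    }
                    else if (c > 159)
                    {
                        AppendNumeric(sb, c);
                    }
                    else
                    {
                        sb.Append(c);
                    }
                    break;
            }
        }
        return sb.ToString();
    }

    // Appends "&#" + decimal code point + ";".
    private static void AppendNumeric(StringBuilder sb, int codePoint)
    {
        sb.Append("&#");
        sb.Append(codePoint.ToString(CultureInfo.InvariantCulture));
        sb.Append(';');
    }
}
```

For example, HtmlNumericEncoder.HtmlEncode("A\u0100<") yields "A&#256;&lt;", and "\U0001F600" yields "&#128512;" rather than two surrogate references.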

Answer by AnthonyWJones for HttpUtility.HtmlEncode doesn't encode everything


HtmlEncode returns a string, which is Unicode, so it has no need to encode these characters.

If the encoding of your output stream is not compatible with these characters, then use HtmlEncode like this:

 HttpUtility.HtmlEncode(outgoingString, Response.Output);  

HtmlEncode will then escape the characters appropriately.
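The point here is that whether a character needs escaping depends on the encoding of the output. A different technique that builds this into the output encoding itself (swapped in here, not taken from the answer) is a custom EncoderFallback attached to an ISO-8859-1 Encoding: characters Latin-1 can represent are written as single bytes, and anything else is written as a decimal numeric character reference. This is a minimal sketch; HtmlEntityEncoderFallback and Latin1Html are hypothetical names, and it assumes < > & and quotes have already been HTML-encoded.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: replaces any character the target encoding cannot
// represent with an HTML decimal numeric character reference.
public sealed class HtmlEntityEncoderFallback : EncoderFallback
{
    // Longest possible replacement: "&#1114111;" (10 chars) for U+10FFFF.
    public override int MaxCharCount => 10;

    public override EncoderFallbackBuffer CreateFallbackBuffer() => new EntityBuffer();

    private sealed class EntityBuffer : EncoderFallbackBuffer
    {
        private string _replacement = string.Empty;
        private int _position;

        // Called for a single unencodable UTF-16 unit.
        public override bool Fallback(char charUnknown, int index) => Set(charUnknown);

        // Called for a surrogate pair: one reference for the whole code point.
        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index) =>
            Set(char.ConvertToUtf32(charUnknownHigh, charUnknownLow));

        private bool Set(int codePoint)
        {
            _replacement = "&#" + codePoint.ToString(CultureInfo.InvariantCulture) + ";";
            _position = 0;
            return true;
        }

        // Hands the replacement out one char at a time; '\0' signals the end.
        public override char GetNextChar() =>
            _position < _replacement.Length ? _replacement[_position++] : '\0';

        public override bool MovePrevious()
        {
            if (_position == 0)
                return false;
            _position--;
            return true;
        }

        public override int Remaining => _replacement.Length - _position;

        public override void Reset()
        {
            _replacement = string.Empty;
            _position = 0;
        }
    }
}

public static class Latin1Html
{
    // ISO-8859-1 with the entity fallback attached (decoding uses the default replacement).
    public static readonly Encoding WithEntityFallback = Encoding.GetEncoding(
        "iso-8859-1", new HtmlEntityEncoderFallback(), DecoderFallback.ReplacementFallback);
}
```

An Encoding built this way could be handed to a StreamWriter wrapping the request stream, so the conversion would happen as the text is written.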

Answer by Rick for HttpUtility.HtmlEncode doesn't encode everything


It seems horribly inefficient, but the only way I can think to do that is to look through each character:

using System.Text;
using System.Web;

public static class HtmlEncodingHelper
{
    public static string MyHtmlEncode(string value)
    {
        if (value == null)
            return null;

        // call the normal HtmlEncode first
        char[] chars = HttpUtility.HtmlEncode(value).ToCharArray();
        StringBuilder encodedValue = new StringBuilder();
        foreach (char c in chars)
        {
            if ((int)c > 127) // above normal ASCII
                encodedValue.Append("&#" + (int)c + ";");
            else
                encodedValue.Append(c);
        }
        return encodedValue.ToString();
    }
}

Answer by Matt for HttpUtility.HtmlEncode doesn't encode everything


It seems that HtmlEncode is only meant for encoding strings that are placed into HTML documents, where only a few characters such as < > & cause problems. For URLs, just replace HtmlEncode with UrlEncode.
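If the goal, as in the question, is to reproduce what the browser posts, numeric references and UrlEncode can be combined: a browser submitting a form in a Latin-1 page replaces characters it cannot represent with numeric character references and then URL-encodes the result. Below is a minimal sketch of that combination, assuming the server expects an application/x-www-form-urlencoded body in ISO-8859-1. The class FormLatin1 is hypothetical; HttpUtility.UrlEncode(string, Encoding) is in System.Web (and in the System.Web.HttpUtility assembly on .NET Core).

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Web;

public static class FormLatin1
{
    // Sketch: encode one form value roughly the way a browser does for an ISO-8859-1 form.
    public static string EncodeValue(string value)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value));

        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            if (char.IsSurrogatePair(value, i))
            {
                // Outside the BMP: one reference for the whole code point.
                sb.Append("&#")
                  .Append(char.ConvertToUtf32(value, i).ToString(CultureInfo.InvariantCulture))
                  .Append(';');
                i++;
            }
            else if (value[i] > 0xFF)
            {
                // Not representable in Latin-1: numeric character reference.
                sb.Append("&#")
                  .Append(((int)value[i]).ToString(CultureInfo.InvariantCulture))
                  .Append(';');
            }
            else
            {
                sb.Append(value[i]);
            }
        }

        // Percent-encode using Latin-1 bytes, so U+00E9 becomes %e9 rather than %c3%a9.
        return HttpUtility.UrlEncode(sb.ToString(), Encoding.GetEncoding("iso-8859-1"));
    }
}
```

With this sketch, "caf\u00e9 \u0100" becomes "caf%e9+%26%23256%3b"; the hex digits come out lowercase from HttpUtility.UrlEncode, which servers generally accept.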

Answer by Joel Fillmore for HttpUtility.HtmlEncode doesn't encode everything


The AntiXSS library from Microsoft correctly encodes these characters.

AntiXSS on Codeplex

NuGet package (the best way to add it as a reference)

Answer by Oliver Bock for HttpUtility.HtmlEncode doesn't encode everything


@bdukes' response above will do the job, but we can make it much faster if we assume that most characters will not be in this range. Note the quoted character in IndexOfHighChar: it is U+0100 ('Ā'), the first character above Latin-1.

using System.Globalization;
using System.Text;
using System.Web;

public static class FastHtmlEncoding
{
    /// <summary>
    /// .Net 2.0's HttpUtility.HtmlEncode will not properly encode
    /// Unicode characters above 0xFF.  This may be fixed in newer
    /// versions.
    /// </summary>
    public static string HtmlEncode(string s)
    {
        if (s == null)
            return null;

        // Let .Net 2.0 get right what it gets right.
        s = HttpUtility.HtmlEncode(s);

        // Search for the first character above 0xFF.  Hopefully there is none
        // and we can just return s.
        int num = IndexOfHighChar(s, 0);
        if (num == -1)
            return s;
        int old_num = 0;
        StringBuilder sb = new StringBuilder();
        do {
            sb.Append(s, old_num, num - old_num);
            sb.Append("&#");
            sb.Append(((int)s[num]).ToString(NumberFormatInfo.InvariantInfo));
            sb.Append(';');
            old_num = num + 1;
            num = IndexOfHighChar(s, old_num);
        } while (num != -1);
        sb.Append(s, old_num, s.Length - old_num);
        return sb.ToString();
    }

    // Requires compiling with /unsafe (AllowUnsafeBlocks).
    static unsafe int IndexOfHighChar(string s, int start)
    {
        int num = s.Length - start;
        fixed (char* str = s) {
            char* chPtr = str + start;
            while (num > 0) {
                char ch = chPtr[0];
                if (ch >= '\u0100')   // U+0100 ('Ā'), first character above Latin-1
                    return s.Length - num;
                chPtr++;
                num--;
            }
        }
        return -1;
    }
}
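If compiling with /unsafe (AllowUnsafeBlocks) is not an option, IndexOfHighChar can be written with plain indexing. This is a minimal sketch in safe code; HighCharSearch is a hypothetical class name, and whether it matches the pointer version's speed is something to measure rather than assume.

```csharp
using System;

public static class HighCharSearch
{
    // Safe-code equivalent of IndexOfHighChar: returns the index of the first
    // character at or above U+0100 at or after 'start', or -1 if there is none.
    public static int IndexOfHighChar(string s, int start)
    {
        for (int i = start; i < s.Length; i++)
        {
            if (s[i] >= '\u0100')
                return i;
        }
        return -1;
    }
}
```

It has the same contract as the unsafe version, so it can replace it inside HtmlEncode without other changes.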

Answer by Devdude for HttpUtility.HtmlEncode doesn't encode everything


You can always replace unwanted encoded ASCII characters afterwards, as follows. Without the if statement, the encoded result is "This means I am crying :&#39;(", because HtmlEncode replaces the apostrophe with the numeric character reference &#39;.

using System.Web;

string text = "This means I am crying :'(";
string encoded = HttpUtility.HtmlEncode(text);
if (encoded.Contains("&#39;"))
{
    encoded = encoded.Replace("&#39;", "'");
}

