Posts tagged with UTF-8

On CSV

July 25th, 2011

CSV, or comma-separated values, is a non-standard that people mistake for a good data interchange format because of its apparent simplicity. It’s not. In this blog post I will attempt to tell you more than you ever wanted to know about CSV.

tl;dr: CSV is a HORRIBLE ‘format’ and should NEVER be used. You will regret it. If you need a simple data exchange format that is easy for lay people to read and write and widely supported by programming languages, use JSON (my first choice) or something like YAML.
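To make the complaint concrete, here is a minimal sketch of one commonly cited pitfall (not necessarily the one the full post dwells on): a field that itself contains a comma. The record and its field names are made up for illustration, and the "naive" CSV handling is deliberately simplistic.

```javascript
// Hypothetical record, invented purely for illustration.
const record = { name: "Smith, Jane", note: 'said "hi"' };

// Naive CSV: write by joining on commas, read by splitting on commas.
const csvLine = [record.name, record.note].join(",");
const naiveFields = csvLine.split(",");

// The two original fields come back as three, because the comma
// inside the name is indistinguishable from a field separator.
console.log(naiveFields.length);

// JSON escapes delimiters and quotes unambiguously, so the same
// record survives a serialize/parse round trip unchanged.
const jsonRoundTrip = JSON.parse(JSON.stringify(record));
console.log(jsonRoundTrip.name);
```

Quoting rules can repair this particular case, but since CSV has no single agreed-upon standard, different readers and writers may not apply the same rules.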

Displaying Unicode Characters in the Scala REPL on OS X

October 25th, 2010

So simple, but unfortunately I missed it until now:

scala -Dfile.encoding="UTF-8"

Now you can go to town with characters like ø and ∫! Oh, and you may want to alias this command in your ~/.profile:

alias scala='scala -Dfile.encoding="UTF-8"'

HTML and XML Character Encoding Gotchas in JavaScript

December 8th, 2009

Recently I was trying to execute the following JavaScript with jQuery: $("#someid").append("<div>...&deg;C</div>");

I was going crazy because it worked on one page (a degree symbol, °, was shown) but not on another, where nothing was displayed or returned by the append method. After much frustration I stumbled on a solution, and I’m sharing it here to hopefully save others some time.

I was stumped, but luckily I ended up reading the Wikipedia page on character encoding in HTML and learned that XML has a much smaller set of predefined character entity references than HTML. In fact, there are only five: &amp; → &, &lt; → <, &gt; → >, &quot; → ", &apos; → '. Makes sense, since you should be using UTF-8 and can write every other character directly.

As it turns out, my working example was an HTML page while the non-working one was XHTML. Because of the XHTML content-type declaration, the parser (I’m not sure whether in jQuery or my browser) was choking on the invalid character entity reference and failing completely. So, problem solved, though I wish only the single offending entity were dropped, not the whole string!
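One possible workaround, sketched here as an illustration rather than as the exact fix used above, is to rewrite HTML-only named entities as numeric character references, which are valid in both HTML and XML. The function name toXmlSafeEntities and the small lookup table are hypothetical; a real table would need many more entries.

```javascript
// The five entity names that XML predefines; these are left untouched.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Hypothetical, deliberately tiny table of HTML named entities
// mapped to their Unicode code points.
const NAMED_TO_CODE_POINT = new Map([
  ["deg", 0x00b0],  // degree sign
  ["nbsp", 0x00a0], // no-break space
  ["copy", 0x00a9], // copyright sign
]);

// Replace known HTML-only named entities with numeric references.
// Unknown names are left as-is so nothing is silently dropped.
function toXmlSafeEntities(markup) {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    if (XML_PREDEFINED.has(name)) return match;
    const codePoint = NAMED_TO_CODE_POINT.get(name);
    return codePoint === undefined ? match : `&#${codePoint};`;
  });
}

const converted = toXmlSafeEntities("<div>21&deg;C &amp; rising</div>");
console.log(converted);
```

The converted string could then be passed to append in place of the original. If the script file itself is saved and served as UTF-8, writing the character directly (or as the JavaScript escape "\u00B0") avoids entities altogether.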