Some browsers, when they include JavaScript via <script> tags, assume the file is encoded in the character set of the requesting document rather than believing the HTTP headers of the JavaScript file.
| test | value |
|---|---|
| e1 (set in a local script tag served as ISO-8859-1): | |
| e2 (set from a JS script, served as utf-8): | |
| e3 (set from a JS script, served as latin-1): | |
| e4 (set from an XMLHttpRequest served as utf-8): | -loading- |
| e5 (set from an XMLHttpRequest served as latin-1): | -loading- |
serve this page as utf-8 / serve this page as latin-1
All of the letters above should be an e-acute. The first is defined in the local HTML page using embedded JavaScript; the second and third are set in externally referenced JavaScript files served in two different character sets. I'd expect the locally set e-acute to be interpreted from the bytes in the file using the character set that the page is served as, and the two externally referenced values to be interpreted using the character sets that their respective JavaScript files are served with. The XMLHttpRequest-served files should also be interpreted according to their respective character sets.
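To make that expectation concrete, here is a minimal sketch of what happens when the bytes for an e-acute are decoded with the right and the wrong character set. It assumes Node.js and its `Buffer` API, which is not what this test page uses; the names `decode` and `results` are made up for the illustration.

```javascript
// Sketch only: assumes Node.js (Buffer is not available in browsers).
// e-acute (U+00E9) as it appears on the wire in each encoding.
const latin1Bytes = Buffer.from([0xe9]);      // ISO-8859-1: one byte
const utf8Bytes = Buffer.from([0xc3, 0xa9]);  // UTF-8: two bytes

// Decode a byte sequence under a given charset label.
function decode(bytes, charset) {
  return bytes.toString(charset);
}

const results = {
  latin1AsLatin1: decode(latin1Bytes, 'latin1'), // right charset
  utf8AsUtf8: decode(utf8Bytes, 'utf8'),         // right charset
  utf8AsLatin1: decode(utf8Bytes, 'latin1'),     // wrong: two junk characters
  latin1AsUtf8: decode(latin1Bytes, 'utf8'),     // wrong: invalid UTF-8
};

console.log(results);
```

The two mismatched cases are the ones a browser produces when it decodes an external script with the page's character set: `utf8AsLatin1` is what happens to e2 on a latin-1 page, and `latin1AsUtf8` is what happens to e3 on a utf-8 page.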
| browser | results |
|---|---|
| firefox 1.0.6 / mac | works perfectly. |
| camino / mac | works perfectly. |
| firefox / win | (thanks to gareth) works perfectly. |
| safari 1 and 2 / mac | (thanks to blech for safari 1) Loses - interprets externally-referenced JavaScript files using the character set of the referencing file. If this page is served as utf-8, you'll see a bad character for e3, and if it is served as latin-1, you'll see a bad character for e2. Weirdly, I find that both XMLHttpRequest entries work fine. |
| IE / mac | (thanks to blech) Handles e1-3 fine in both latin-1 and utf-8, so it's doing the Right Thing as well. Doesn't do e4/5, but that doesn't surprise me much. |
| IE 6 / win | (thanks to gareth) Also loses - interprets external files in the character set of the requesting document. |
| Opera 9 / win | (thanks to Már Örlygsson) Works perfectly. |
| IE7 / win | (thanks to Már Örlygsson) Fails test e2 (and presumably e3, I lack sufficient data). |
It's been pointed out to me that if the <script> tag contains a charset attribute specifying the charset of the remote JS file, everything works fine. This isn't really the point for me, though. I envisage hitting this bug when using the Yahoo or Delicious JSON APIs, which you call by inserting a script tag into your DOM that calls a local callback function once loaded. I shouldn't have to know what charset this stuff is being served as - if I did, it would have to be part of the published API, which is just ugly. The transport layer is supposed to deal with this stuff.
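For reference, the charset workaround applied to a dynamically inserted script tag might look roughly like this. It is an illustration only: `loadJsonp`, its parameters, and the `callback` query parameter name are hypothetical and not taken from any real Yahoo or Delicious API. The document object is passed in as `doc` so the function does not depend on a global.

```javascript
// Sketch only: loadJsonp and its parameters are hypothetical names.
// doc is a Document (pass `document` when running in a browser).
function loadJsonp(doc, url, callbackName, charset) {
  const script = doc.createElement('script');
  const separator = url.indexOf('?') === -1 ? '?' : '&';
  // Assumption: the remote API takes the callback name in a "callback" parameter.
  script.src = url + separator + 'callback=' + encodeURIComponent(callbackName);
  if (charset) {
    // Tells the browser which character set to decode the remote file with.
    script.charset = charset;
  }
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}
```

In a browser this would be called as something like `loadJsonp(document, 'http://example.com/feed', 'handleFeed', 'utf-8')`, where the URL and callback name are placeholders. The caller still has to know the charset up front, which is exactly the objection above.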
Any comments? More test results? Please email them to me. Want the source to this page? (It's nasty.) Get it here.