jerakeen.org

by Tom Insam

8.9. unicodedata — Unicode Database — Python v2.6.2 documentation

created 03 June 2009 in links tagged python and unicode.

Two Python unicode strings in different normalization forms aren’t considered equal by the Python interpreter (pre-3000, anyway; I don’t know about post). This is annoying. At least there’s a standard library module for converting strings between normalized forms.
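
A minimal sketch of the problem and the workaround, written in Python 3 syntax rather than the 2.6 the linked docs cover. The variable names are only for illustration; `unicodedata.normalize` is the standard library call.

```python
import unicodedata

# The same visible character, written two ways.
composed = "\u00e9"      # 'é' as one precomposed code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# Comparison is done code point by code point, so these differ.
same_before = (composed == decomposed)

# Normalizing both sides to the same form (NFC here) makes them comparable.
same_after = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(same_before, same_after)
```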

http://docs.python.org/library/unicodedata.html

Twitter

created 11 May 2009 in stream tagged unicode.

New twitter: @simonw Something with unicode in it. Breaks more things..

http://twitter.com/jerakeen/statuses/1764082950

Understanding Bidirectional (BIDI) Text in Unicode

created 03 March 2009 in links tagged cal, scary and unicode.

I have long since given up on thinking that I understand everything important about unicode, so when something like this arrives I’m no longer forced to forget everything and start again. But even so, this is complicated stuff. Thanks, cal.

http://www.iamcal.com/understanding-bidirectional-text/

MIKROSKOP

created 09 January 2009 in photos tagged london, unicode and unitedkingdom and is geotagged

Character sets, eh? Turn out to be hard.

http://flickr.com/photos/jerakeen/3181965609

Unicode HOWTO — Python v3.0 documentation

created 05 December 2008 in links tagged docs, python and unicode.

I’m very happy with the way that Python 3 deals with the whole string/unicode/bytes mess.
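
As a rough illustration of what that looks like in practice (a sketch only, with a made-up sample string): in Python 3, text and bytes are separate types, and moving between them always goes through an explicit encode or decode.

```python
text = "snowman \u2603"          # str: a sequence of code points
data = text.encode("utf-8")      # bytes: the encoded form

# The snowman is one code point but three bytes in UTF-8.
text_length = len(text)
data_length = len(data)

# Mixing the two types is an error rather than a silent conversion.
try:
    text + data
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Decoding with the same encoding gets the original text back.
round_tripped = data.decode("utf-8")

print(text_length, data_length, mixing_allowed)
```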

http://docs.python.org/3.0/howto/unicode.html

Issue 157 - googleappengine - Google Code

created 04 May 2008 in links tagged appengine, google and unicode.

The Google App Engine uploader isn’t unicode aware. Grrr. Easily fixable, but still an irritating oversight.

http://code.google.com/p/googleappengine/issues/detail?id...

Main Page - DejaVuWiki

created 01 May 2008 in links tagged fonts, free and unicode.

Free fonts with pretty good Unicode coverage. Useful.

http://dejavu.sourceforge.net/wiki/index.php/Main_Page

UnicodeChecker 1.12 (Quarter Life Crisis)

created 22 February 2007 in links tagged macos and unicode.

This utility is great and I could not live without it.

http://earthlingsoft.net/ssp/blog/2007/02/unicodechecker_112

Deskbar plugins - CatmurWiki

created 06 September 2006 in links tagged deskbar, linux and unicode.

Search for unicode characters with deskbar

http://wiki.catmur.co.uk/Deskbar_plugins#Unicode_plugin

encoding::warnings - Warn on implicit encoding conversions - search.cpan.org

created 22 April 2006 in links tagged perl and unicode.

Looks like it solves my favourite all-time problem in perl - no clear distinction between characters and byte sequences. Hooray.

http://search.cpan.org/dist/encoding-warnings/lib/encodin...

Homepage of Crimson Editor - Free Text Editor, Html Editor, Programmers Editor for Windows

created 28 January 2006 in links tagged editor, text, unicode and windows.

Does everything I want - file browser, UNIX line endings, it even gives me the level of control I need over file encodings. And it’s free. Perfect.

http://www.crimsoneditor.com/

Unicode Code Charts

created 27 October 2005 in links tagged reference and unicode.

http://www.unicode.org/charts/

Perl Loves UTF-8

created 16 October 2005 in talks tagged perl and unicode.

Given for the first time at the london.pm tech-meet at the Fotango offices on 2005/02/24, this was a 5-minute rant about perl, character sets, and why no one can ever get them right. The slides were written in OmniGraffle for some bizarre reason, but I think it worked quite well, and I may use the technique again some time.

JSON Examples

created 16 October 2005 in links tagged javascript and unicode.

..and not a single non-ascii character on the page.

http://www.crockford.com/JSON/example.html

Sorting It All Out : Stripping diacritics….

created 08 September 2005 in links tagged unicode.

http://blogs.msdn.com/michkap/archive/2005/02/19/376617.aspx

Unicode HOWTO

created 07 August 2005 in links tagged programming, python and unicode.

http://www.amk.ca/python/howto/unicode

Copia

created 06 August 2005 in links tagged programming, python and unicode.

http://copia.ogbuji.net/blog/2005/08/04#alt_unicod

using utf-8 in irssi under screen

created 23 June 2005 in blog tagged linux, screen, unicode and utf8.

Firstly, tell your local terminal application that you want a utf-8 window. This is left to you, but under macos (which I use), right click the window, select ‘Window settings’, pick the ‘Display’ option from the drop-down, and pick utf-8 under ‘Character set encoding’.

Next, when you start the screen session, pass the ‘-U’ flag. This has to be passed to a new screen session - you can’t connect to an existing one this way.

screen -U

Alternatively, you can turn on the utf-8 flag for a single existing screen window by typing your hotkey (ctrl-a by default), then ‘:utf8 on’. This is good if you don’t want all of your windows to be utf-8 right now.

On the remote machine, make sure that the ‘LANG’ environment variable is set to something UTF-8-like. For instance, I use

export LANG=en_GB.UTF-8

in my .bashrc.

Finally, you need to tell irssi to use UTF-8. Start it up in your new utf-8 window, and type

/set term_type utf-8

Hopefully everything should work now.
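
If it doesn’t, the locale step is a common culprit. Here is a small Python sketch for checking it on the remote machine. The helper name `looks_like_utf8_locale` is made up for this example, and it only inspects the text of the variable; it does not check that the locale is actually installed.

```python
import os


def looks_like_utf8_locale(value):
    """Return True if a locale string such as 'en_GB.UTF-8' names a UTF-8 codeset."""
    # Locale names have the shape language_TERRITORY.codeset@modifier.
    if "." not in value:
        return False
    codeset = value.split(".", 1)[1].split("@", 1)[0]
    # Accept both 'UTF-8' and 'utf8' spellings.
    return codeset.lower().replace("-", "") == "utf8"


# For character handling, LC_ALL overrides LC_CTYPE, which overrides LANG.
for name in ("LC_ALL", "LC_CTYPE", "LANG"):
    value = os.environ.get(name)
    if value:
        print(name, value, looks_like_utf8_locale(value))
        break
else:
    print("no locale variables set")
```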

python and unicode

created 20 January 2005 in blog tagged python and unicode.

I like python’s unicode handling. Instead of perl’s situation, where file handles are assumed, by default, to be latin-1, python file handles (including STDIN/OUT) are assumed, by default, to be ASCII. Forget nasty things like ‘☃’; in python, you can’t even print ‘é’ without explicitly telling it how. Lovely.
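
A small sketch of what "explicitly telling it how" means, in Python 3 syntax (this post was written about Python 2, where the failure shows up at print time instead). ASCII cannot represent ‘é’, so the encode step fails until you name an encoding that can.

```python
text = "\u00e9"  # 'é'

# ASCII has no representation for this character, so encoding fails loudly.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming a capable encoding is the explicit step.
utf8_bytes = text.encode("utf-8")

print(ascii_ok, utf8_bytes)
```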

More UTF8 pain

created 15 December 2004 in blog tagged browser, unicode and utf8.

Does no-one in the world care about non-ASCII characters? It’s pathetic. I’m trying to make HTML form uploads work for files with non-ASCII characters in their names, and I’m hitting the stupidest problems.

The main bugbear is mozilla - you can’t upload files with wide characters in their names. At all. Piece of shit. Safari seems to be encoding the upload filenames with some made-up encoding that I can’t figure out, so that’s out of luck. At least safari sends the actual contents of the files.

The one browser I’ve tried that works flawlessly is Internet Explorer. Microsoft, at least, seem to care about the non-US market.

UTF8 Openguides

created 13 December 2004 in blog tagged perl, unicode, utf8 and wiki.

I foolishly offered to make OpenGuides UTF-8 safe. Because I don’t do that enough at work, or something. Anyway, it’s going quite well - because I did all the grunt work in CGI::Wiki a while ago, it’s just a matter of finding all the inputs and outputs and making sure they’re encoded properly. So far, the page contents and names are utf-8 safe, along with the cookie preferences, so your username is good. The search stuff looks scary, and there are various broken plugins, etc, etc, so there’s still stuff to do. I should also do the hooks properly - CGI::Wiki should offer nice functions for this stuff.

Anyway, there’s a demo site here in case you feel like trying to break it. The patch against OG is here, out of my svn repository, of course.

safari and password fields

created 09 December 2004 in blog tagged browser, macos and unicode.

Today I discovered that safari ‘magically’ downgrades latin-1 characters typed into form password fields to their nearest ascii equivalents - typing ‘pásswörd’ into a password box actually submits ‘password’. But you can cut and paste non-ascii in and it works fine. I’m very confused.
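
I have no idea how Safari does it internally, but the visible effect can be reproduced with a sketch like this one: decompose each character, then drop the combining marks. The function name `strip_accents` is made up for illustration.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters and drop combining marks, e.g. 'á' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("p\u00e1ssw\u00f6rd"))
```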

ASCII is not enough

created 24 November 2004 in blog tagged tests and unicode.

I need a rule. All testing MUST BE DONE WITH NON-ASCII CHARACTERS.
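
As a sketch of what that rule might look like in a test suite (the sample strings and the `roundtrip` helper are invented for this example, standing in for whatever code is really under test):

```python
# Test inputs that go beyond ASCII: accents, a symbol, CJK text.
SAMPLES = [
    "p\u00e1ssw\u00f6rd",
    "\u2603",
    "\u65e5\u672c\u8a9e",
]


def roundtrip(text, encoding="utf-8"):
    """Stand-in for the code under test: encode, then decode again."""
    return text.encode(encoding).decode(encoding)


# Every sample should survive the trip unchanged.
results = [roundtrip(sample) == sample for sample in SAMPLES]
print(all(results))
```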

I’m just fed up with things breaking the moment someone foreign touches them.

JavaScript Unicode Charts

created 09 October 2004 in links tagged unicode.

http://www.macchiato.com/unicode/charts.html