jerakeen.org

by Tom Insam

Warcraft guild achievements as RSS

created 10 February 2009 in notes tagged achievements, guild, programming, python, rss and warcraft.

I play World of Warcraft. Oh, the shame. But I play it because I’m in a fun guild - we do science! Well, actually they do science. I’m still at the ‘cleaning the glassware afterwards’ stage, but a tauren can dream.

Anyway, I code. It’s what I do. So once WotLK came out and half the guild went completely insane and started chasing the really silly achievements, it was clear we were going to need an RSS feed of the things. So I built one. It’s based on the Armoury, like most WoW tools, and is a complete kludge, like most of my tools. But here are my notes anyway.

The trick to scraping the Armoury is pretending to be Firefox. If you visit as a normal web browser, they serve you a traditional HTML page with some Ajax, and it’s all quite normal and boring. If you visit the Armoury in Firefox, they return an XML document with an XSL stylesheet referenced in the header that transforms the XML into a web page. Why are they doing this? It must be a huge amount of work compared to just serving HTML, and I don’t get it. Let’s ignore that. Fake a Firefox user agent, and you can fetch lovely XML documents that describe things! There’s no ‘guild achievement’ page, alas, so let’s start by fetching the page that lists the people in the guild. Using Python.

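import urllib, urllib2
# The realm and guild to look up (these match the sample XML below)
realm = "Nordrassil"
guild = "unassigned variable"
opener = urllib2.build_opener()
# Pretend to be firefox
opener.addheaders = [ ('user-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4') ]
# The empty second argument to quote() means no characters are left unescaped
url = "http://eu.wowarmory.com/guild-info.xml?r=%s&n=%s&p=1"%( urllib.quote(realm,''), urllib.quote(guild,'') )
req = urllib2.Request(url)
# data is a file-like object holding the XML response
data = opener.open(req)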

(This is the EU Armoury, because that’s where I am.) The Armoury is a really unreliable site, so in practice I put lots more error handling round this. But error handling makes for very hard-to-read example code. The XML looks like this:

<page globalSearch="1" lang="en_us" requestUrl="/guild-info.xml">
  <guildKey factionId="1" name="unassigned variable" nameUrl="unassigned+variable" realm="Nordrassil" realmUrl="Nordrassil" url="r=Nordrassil&amp;n=unassigned+variable"/>
  <guildInfo>
    <guild>
      <members filterField="" filterValue="" maxPage="1" memberCount="66" page="1" sortDir="a">
        <character achPoints="2685" class="Hunter" classId="3" gender="Male" genderId="0" level="80" name="Munchausen" race="Tauren" raceId="6" rank="0" url="r=Nordrassil&amp;n=Munchausen"/>
        <character achPoints="1175" class="Paladin" classId="2" gender="Male" genderId="0" level="80" name="Jonadin" race="Blood Elf" raceId="10" rank="1" url="r=Nordrassil&amp;n=Jonadin"/>
        ...

I parse XML using xmltramp, because I’m very lazy and it works. I use xmltramp for all my XML parsing needs. It’s old, and there might be something better, but I don’t really care. This is a toy.

import xmltramp
xml = xmltramp.seed( data )
toons = xml['guildInfo']['guild']['members']['character':]

That gets us a list of people in the guild. The rendered web page has pagination, but the underlying XML seems to have all characters in a single document, so no messing around fetching multiple pages here. (I’ve tried this on a guild of 350ish people. Maybe it paginates beyond that. Don’t use this script on a guild that big, it won’t make you happy.)
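For anyone without xmltramp to hand, roughly the same lookup can be sketched with the standard library's ElementTree instead. This is a swapped-in technique, not the code the feed actually runs: it is written for Python 3, the `guild_members` name is made up for illustration, and `SAMPLE` is a trimmed copy of the XML above with the closing tags filled in so that it parses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the guild XML shown above, closed off so that it parses.
SAMPLE = """<page globalSearch="1" lang="en_us" requestUrl="/guild-info.xml">
  <guildInfo>
    <guild>
      <members maxPage="1" memberCount="66" page="1">
        <character achPoints="2685" class="Hunter" level="80" name="Munchausen" race="Tauren" rank="0"/>
        <character achPoints="1175" class="Paladin" level="80" name="Jonadin" race="Blood Elf" rank="1"/>
      </members>
    </guild>
  </guildInfo>
</page>"""


def guild_members(xml_text):
    """Return one attribute dict per <character> element in the guild page."""
    root = ET.fromstring(xml_text)  # root is the <page> element
    return [dict(node.attrib)
            for node in root.findall("./guildInfo/guild/members/character")]


members = guild_members(SAMPLE)
print([m["name"] for m in members])  # ['Munchausen', 'Jonadin']
```

Every attribute comes back as a string, so something like `achPoints` needs an `int()` around it before you can sort on it.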

Alas, the next thing we have to do is loop over every character and fetch their achievements page (that’s why you shouldn’t run this script over a large guild). This is extremely unpleasant and slow.

for character in toons:
    char_url = "http://eu.wowarmory.com/character-achievements.xml?r=%s&n=%s"%( urllib.quote(realm,''), urllib.quote(character('name'),'') )
    char_req = urllib2.Request(char_url)
    char_data = opener.open(char_req)
    char_xml = xmltramp.seed( char_data )
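The error handling left out of these snippets could look something like the retry wrapper below. It is a minimal sketch under a few assumptions: it targets Python 3 (`urllib.request`) rather than the Python 2 used above, the function names are invented for illustration, and three attempts with a five second pause are arbitrary numbers, not tuned ones.

```python
import time
import urllib.parse
import urllib.request

# Same user agent string as in the guild-info snippet.
FIREFOX_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.8.1.4) "
              "Gecko/20070515 Firefox/2.0.0.4")


def character_url(realm, name):
    """Build the per-character achievements URL used in the loop above."""
    return ("http://eu.wowarmory.com/character-achievements.xml?r=%s&n=%s"
            % (urllib.parse.quote(realm, safe=""),
               urllib.parse.quote(name, safe="")))


def fetch_once(url):
    """Fetch one URL while pretending to be Firefox; returns the raw bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})
    with urllib.request.urlopen(req, timeout=30) as response:
        return response.read()


def fetch_with_retries(url, fetch=fetch_once, attempts=3, delay=5.0):
    """Call fetch(url) up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:  # covers urllib's URLError/HTTPError and socket timeouts
            if attempt == attempts:
                raise  # out of tries, let the caller see the failure
            time.sleep(delay)
```

Passing the fetch function in as an argument keeps the retry logic testable without touching the network, and the pause between tries is also a small kindness to a site that is already struggling.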

The achievement XML looks like this:

...
<achievement categoryId="168" dateCompleted="2009-02-08+01:00" desc="Defeat Shade of Eranikus." icon="inv_misc_coin_07" id="641" points="10" title="Sunken Temple"/>
<achievement categoryId="168" dateCompleted="2009-01-31+01:00" desc="Defeat the bosses in Gundrak." icon="achievement_dungeon_gundrak_normal" id="484" points="10" title="Gundrak"/>
<achievement categoryId="155" dateCompleted="2009-01-31+01:00" desc="Receive a Coin of Ancestry." icon="inv_misc_elvencoins" id="605" points="10" title="A Coin of Ancestry"/>
...

My biggest annoyance here is that there’s no timestamp on these things better than ‘day’, so you don’t get very good ordering when you combine them later. I could solve this by storing some state myself, remembering the first time I see each new entry, etc, etc, but I’m trying to avoid keeping any state here, so I don’t do that. The XML also lists only 5 achievements per character, and getting more involves fetching a lot more pages, so the final feed includes only the 5 most recent achievements per character. Again, something I could solve with local storage.
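To make the ordering problem concrete, here is a small Python 3 sketch of merging achievements from several characters and sorting them by day. The names (`completed_on`, `recent_achievements`) and the `<achievements>` wrapper element in the sample are assumptions for illustration, since the real surrounding structure is elided above; `iter("achievement")` is used so the sketch does not care how deeply the elements are nested.

```python
from datetime import date
import xml.etree.ElementTree as ET

# The three sample achievements from above, in a hypothetical wrapper element.
SAMPLE = """<achievements>
  <achievement categoryId="168" dateCompleted="2009-02-08+01:00" desc="Defeat Shade of Eranikus." icon="inv_misc_coin_07" id="641" points="10" title="Sunken Temple"/>
  <achievement categoryId="168" dateCompleted="2009-01-31+01:00" desc="Defeat the bosses in Gundrak." icon="achievement_dungeon_gundrak_normal" id="484" points="10" title="Gundrak"/>
  <achievement categoryId="155" dateCompleted="2009-01-31+01:00" desc="Receive a Coin of Ancestry." icon="inv_misc_elvencoins" id="605" points="10" title="A Coin of Ancestry"/>
</achievements>"""


def completed_on(achievement):
    """Parse the day out of a value like '2009-02-08+01:00', ignoring the offset."""
    return date.fromisoformat(achievement["dateCompleted"][:10])


def recent_achievements(pages):
    """Merge {character name: achievement XML} into one newest-first list of dicts."""
    merged = []
    for name, xml_text in pages.items():
        for node in ET.fromstring(xml_text).iter("achievement"):
            merged.append(dict(node.attrib, character=name))
    # Day is the finest resolution available. sorted() is stable, so
    # achievements earned on the same day simply keep their input order.
    return sorted(merged, key=completed_on, reverse=True)


for item in recent_achievements({"Munchausen": SAMPLE}):
    print(item["dateCompleted"][:10], item["character"], item["title"])
```

The two achievements from 31 January come out in whatever order they went in, which is exactly the 'not very good ordering' complained about above.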

Anyway, now I have a list of everyone in the guild, and their last 5 achievements. It’s pretty trivial building a list of these and outputting Atom or something. I do it using ‘print’ statements, myself, because I’m inherently evil. You can’t deep-link to the achievement itself on the Armoury, so I link to the wowhead page for individual achievements.
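For the less evil, the Atom output can be sketched with ElementTree rather than ‘print’. Treat this as an illustration, not the script that generates the real feed: it is Python 3, the helper names are invented, the wowhead URL pattern is an assumption that should be checked before use, and the result has not been run through a feed validator.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumed wowhead URL pattern for an achievement page; verify before relying on it.
WOWHEAD_URL = "http://www.wowhead.com/?achievement=%s"


def day_to_timestamp(stamp):
    """Turn '2009-02-08+01:00' into '2009-02-08T00:00:00+01:00' (midnight that day)."""
    return "%sT00:00:00%s" % (stamp[:10], stamp[10:] or "Z")


def _child(parent, name, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    node = ET.SubElement(parent, "{%s}%s" % (ATOM_NS, name), attrs)
    node.text = text
    return node


def build_feed(title, feed_id, achievements):
    """Render achievement dicts (each with a 'character' key) as an Atom string."""
    ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace
    feed = ET.Element("{%s}feed" % ATOM_NS)
    _child(feed, "title", title)
    _child(feed, "id", feed_id)
    _child(feed, "updated", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    for item in achievements:
        link = WOWHEAD_URL % item["id"]
        entry = _child(feed, "entry")
        _child(entry, "title", "%s: %s" % (item["character"], item["title"]))
        _child(entry, "id", "%s#%s" % (link, quote(item["character"])))
        # Only the day is known, so midnight stands in for the real time.
        _child(entry, "updated", day_to_timestamp(item["dateCompleted"]))
        _child(_child(entry, "author"), "name", item["character"])
        _child(entry, "link", href=link)
        _child(entry, "summary", item["desc"])
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed("Guild achievements", "http://example.org/guild.atom", [
    {"character": "Munchausen", "id": "641", "title": "Sunken Temple",
     "desc": "Defeat Shade of Eranikus.", "dateCompleted": "2009-02-08+01:00"},
])
print(feed_xml)
```

The main thing this buys over ‘print’ is escaping: an ampersand in an achievement description cannot break the feed.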

Because the Armoury is unreliable, and my script is slow, I don’t use this thing to generate the feed on demand. I have a crontab call the script once an hour, and if it doesn’t explode, it copies the result into a directory served by my web server. If it does explode, then meh, I’ll try again in an hour. The feed isn’t exactly timely, but we’re not controlling nuclear power stations here, we’re tracking a computer game. It’ll do.
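The ‘only publish if it didn’t explode’ step can be sketched as a write to a temporary file followed by a rename. This is one way to do it in Python 3, not necessarily how the real cron job works; `publish` and its arguments are hypothetical names, and the 0644 permissions are an assumption about what the web server needs in order to read the file.

```python
import os
import tempfile


def publish(generate, destination):
    """Write generate()'s text to destination, replacing the old copy only on success."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Create the temporary file next to the destination so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(generate())
        os.chmod(tmp_path, 0o644)  # mkstemp files are private by default
        os.replace(tmp_path, destination)  # swap the new feed in as a single step
    except BaseException:
        os.unlink(tmp_path)  # generation exploded: leave the old feed alone
        raise
```

If `generate` raises, the previous feed stays in place and the exception still propagates, so cron can report the failure and the next hourly run gets another go.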

The code I actually run to generate the feed can be found in my repository here, and the resulting feed (assuming you care, which you shouldn’t, you’re not in the guild..) is here. Feel free to steal the code and do your own guild feeds.

Google Reader API

created 10 December 2008 in links tagged api, google, reader and rss.

Unofficial API access to Google Reader data. Not that I have a use for it, but it’s nice to see it.

http://www.niallkennedy.com/blog/2005/12/google-reader-ap...

iPhone use while in the middle of nowhere

created 18 August 2008 in notes tagged apple, byline, iphone, offline, rss, software, twitter and twitterrific.

I should write up ‘things learned from taking only an iPhone to the middle of nowhere where there’s no internet access’. One of those things was that I really want a ‘that worked’ notification when updating my Twitter status using Twitterrific. And for anything else that does a write over the network.

Avoid notifying users of success.

If a read operation fails, meh. But if I just wrote a Twitter update, and it doesn’t go through, I want to know. Twitter might fail, the app might fail, the connection might fail. I want success notification, rather than 1 minute of waiting for a failure message that might not arrive. THIS IS NOT A NORMAL SITUATION. But nevertheless. Maybe the rule should be ‘avoid notifying users of success where success is expected’.

Another useful app - Byline is great when there’s wobbly bandwidth - usable even when the only connection is a spotty non-EDGE GSM link. Admittedly, you have to just put the phone down somewhere with a connection for 10 minutes while it slurps. But things stay slurped. It’ll pull the associated images of RSS items too, so I can look at my Flickr feeds easily.

It’s got disadvantages - you have to switch to Google Reader to read your feeds for a start. In the absence of a local Mac GUI client to rival NetNewsWire, this is painful (Fluid helps). And Byline doesn’t do ‘folders’ (tags? what does Google Reader call them? I’m new to this), so you just get a big flat list of unread items, which could be annoying if you subscribe to lots of feeds. I’ve recently gone through a grand purge of all my feeds and mailing lists, so my traffic levels are pretty controllable.

Except that my Economist subscription feeds did their weekly ‘the magazine shipped’ thing, and dumped 90 unread items in the list. And these are unread items that are interesting and might need reading. Unlike with the iPhone NNW client, I can’t selectively drop subscriptions from being visible on the phone - it’s all or nothing here, and Byline loads only 25 (I think) entries at a time for off-line reading. The Economist provides only a partial feed, so I had to sit where there was bandwidth and go through them in batches, ‘starring’ the ones that looked interesting then hitting ‘fetch more’ and waiting. Once I’d done this, and it didn’t take too long, the experience was great - I had the full content of the Economist articles synched locally for convenient reading (and the Economist has a nice one-narrow-column layout that lends itself well to iPhone reading).

Irritating RSS feed links

created 06 February 2008 in blog tagged discovery, feed, googlesocialapi, rss and shelf.

A side-effect of all this Google Social lunacy is that I’m seeing a lot of URLs for people that I wouldn’t normally have put in their Address Book entries. For instance, Simon Wistow’s Vox page links to his gestalt page which in turn links to his use.perl page, so I see all of these URLs in Shelf. It fetches the pages, and discovers that there’s a single RSS feed advertised on the use.perl page - http://use.perl.org/index.rss. But this RSS feed is nothing to do with Simon’s page - it’s the main use.perl article feed. Shelf doesn’t know this, of course, so Simon’s display in my Shelf window contains all recent use.perl articles.

The HTML spec seems to me to imply that rel="alternate" links are for linking to the same content, but represented in a different way, not some completely unrelated content that happens to be hosted on the same domain. This is very annoying.

I’m picking on use.perl unreasonably here, of course. Lots of people do it. use.perl is just the first one I noticed. Followed by search.cpan.org (author modules pages have an RSS feed of the master module upload list). But there are others.
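To show why a client gets fooled, here is a minimal Python 3 sketch of feed autodiscovery. It is not Shelf’s actual code; the class and function names are made up, the page and its URL are hypothetical, and only the two common feed MIME types are checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect the href of every <link rel="alternate"> that advertises a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the page they were found on.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the absolute URLs of all feeds advertised by an HTML page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


# Hypothetical personal page that advertises the site-wide feed.
PAGE = """<html><head>
<title>Someone's journal</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Site news" href="/index.rss">
</head><body></body></html>"""

print(discover_feeds(PAGE, "http://example.org/~someone/journal/"))
# ['http://example.org/index.rss']
```

Nothing in the markup tells the parser that the advertised feed belongs to the whole site and not to the person, so the only defence on the client side is a heuristic or a blacklist.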

Universal Feed Parser in Ruby

created 03 January 2008 in links tagged atom, feed, parser, rss and ruby.

A port of the Python Universal Feed Parser to Ruby. Lots of deps, alas, which is annoying, but it does work.

http://rfeedparser.rubyforge.org/

Triplr

created 30 March 2007 in links tagged convert, data, json, rdf, rss and webservice.

Web service to convert from one data type to another - RSS to JSON, Triples to RSS, etc, etc. Shiny.

http://triplr.org/

Encoding RSS Titles ・ 詹姆斯

created 17 June 2006 in links tagged annoying and rss.

Quite aside from the URL, which is both awesome and breaks things, this is Yet Another Thing for me to be annoyed about.

http://www.xn--8ws00zhy3a.com/blog/2006/06/encoding-rss-t...

mozdev.org - forumzilla

created 22 May 2006 in links tagged firefox, rss and thunderbird.

Syndicate RSS feed entries into Thunderbird folders. Seems to work better than the built-in Thunderbird RSS support (which plain Doesn’t Work for me).

http://forumzilla.mozdev.org/

NewsGator API Homepage

created 27 November 2005 in links tagged blog, programming, reference, rss, web and xml.

http://www.newsgator.com/ngs/api/default.aspx

jerakeen on SuprGlu

created 04 November 2005 in links tagged rss.

Me. Aggregated. How meta.

http://jerakeen.suprglu.com/

Universal Feed Parser docs

created 08 October 2004 in links tagged docs, python, rss and xml.

http://feedparser.org/docs/

NNW subscriptions

created 16 August 2004 in blog tagged macos and rss.

So I wanted to see which of my NNW (NetNewsWire) subscriptions were dead. And I wanted to get the hang of AppleScript. Right.

set errorlog to ""

tell application "NetNewsWire"
  repeat with check in subscriptions
    set err to error string of check as string
    if length of err > 1 then
      set errorlog to errorlog & "Error for '" & ((display name of check) as string) & "' (" & (RSS URL of check as string) & "): '" & ((error string of check) as string) & "'\r"
    end if
  end repeat
end tell

tell application "BBEdit"
  make new text window with properties {contents:errorlog}
end tell

Pretty nifty. Course, you have to have BBEdit. But making it use TextEdit shouldn’t be hard.

referrer and agent mixup

created 15 August 2004 in blog tagged rss.

The blogging/RSS community has discovered HTTP headers actually have a defined purpose. Amazing. It’s like when they discovered that HTTP actually allows you to see if a page has changed since you last downloaded it and not get the whole thing. That was fun, too.
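For reference, the ‘has this changed since I last downloaded it’ trick is the conditional GET. A minimal Python 3 sketch follows, with invented function names; it assumes the caller has kept the `ETag` and `Last-Modified` values from the previous response somewhere.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved after the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the server says 304."""
    req = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req, timeout=30) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise
```

urllib reports a 304 as an `HTTPError`, which is why the ‘nothing changed’ case is handled in the `except` branch and not as a normal response.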

OK, that’s a little bit too bitter. But I can name one Linux RSS reader that’s done the Right Thing here for months. </smug>

sharpreader

created 13 August 2004 in blog tagged rss and windows.

sharpreader - a windows RSS feed reader. Uses .NET, which is all the rage nowadays, apparently.

It’s beautiful, easily the best RSS reader I’ve ever seen, and that includes the one I wrote :-). It has proper OPML export / import (it’s amazing how many readers get this wrong). The interface, although slightly hard to figure out, makes a lot of sense once you get the hang of it, and frankly usability and learning curves can go hang once I can use the thing.

The nicest feature, though, is the threading. It’ll notice which other blogs you read have linked to this one, and will do the little ‘+’ symbol thing so you can expand them and see all the interlinks. It’s niiiiiiiice. I’m suddenly tempted to go back to lectern and hack this in somehow, though it’ll be hard. Maybe I’ll write a mac one and steal the niche of NNW. Maybe I’ll write a bad alpha and get distracted by some other project. Yes, that seems to be the best idea.

Software interfaces evolve like this, and it’s wonderful to watch. Web browsers are another fairly immature tech that grows “tabs” and other interface things, and that’s nice to watch too, even if they’re stupid. Genuinely new types of apps are rare. I can’t think of many off the top of my head, although once they’re pointed out, they seem obvious…