jerakeen.org contains just about everything I produce, and I try to use sensible urls - every page has a unique url, of the form /category/slug/ or /category/yyyy/mm/dd/slug/. The index for a category is just /category/. Pages have tags, and you can see all pages belonging to a tag at /tags/tagname/ as well. You can search using /search?q=search_term. Finally, you can get a feed of any category at a /category-name/feed page.
Enough back-story. This system is just about sensible, but only because it’s so simple, and even then there are odd bits. For instance, why category/feed? Why not feed/category? Why does the search page have a CGI parameter in it, but not the tag pages? Suppose I wanted an RSS feed of pages tagged with ‘python’, what would that URL look like? /feed/tag/python? /tag/python/feed/? Feed types are not sub-headings under categories. The single-page permalinks make sense, but why do they encode only the category, date and slug? Why not the tags?
Essentially, I want to describe several dimensions of filtering as well as a view type in a single URL, and I’m feeling constrained by the requirement to have a linear path. I want to describe lists of pages filtered by category, tags, search terms and dates, and I’d like to view this list of pages as an HTML file, or as RSS, or Atom, or JSON… I’d like the innards of my site to be a pipeline - I perform searches, get a list of pages, sort them, then render the list. Each step has little to do with the other steps. I also shouldn’t have to do anything special to my code to add an RSS feed to a tagged page list - it should have a feed automatically just because it’s a list of pages.
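A minimal sketch of that pipeline in Python might look like the following. The Page fields, the function names and the sample data are all assumptions made for illustration; nothing here is the site's actual code. The point is only that filtering, sorting and rendering are independent steps, so any filtered list gets every view for free.

```python
from dataclasses import dataclass, field
from datetime import date
import json


@dataclass
class Page:
    # Hypothetical page record; the field names are assumptions.
    category: str
    slug: str
    published: date
    tags: set = field(default_factory=set)


def filter_pages(pages, category=None, tag=None):
    """Step 1: keep only the pages matching every filter that was given."""
    return [
        p for p in pages
        if (category is None or p.category == category)
        and (tag is None or tag in p.tags)
    ]


def sort_pages(pages):
    """Step 2: newest first."""
    return sorted(pages, key=lambda p: p.published, reverse=True)


def render_html(pages):
    items = "".join(f"<li>{p.slug}</li>" for p in pages)
    return f"<ul>{items}</ul>"


def render_json(pages):
    return json.dumps([p.slug for p in pages])


# Step 3: every view works on any list of pages, so a new view is one entry here.
RENDERERS = {"html": render_html, "json": render_json}


def handle(pages, category=None, tag=None, view="html"):
    return RENDERERS[view](sort_pages(filter_pages(pages, category, tag)))


# Made-up sample data.
PAGES = [
    Page("blog", "urls-suck", date(2007, 10, 25), {"web"}),
    Page("blog", "python-tricks", date(2007, 9, 1), {"python"}),
    Page("code", "feed-tool", date(2007, 11, 2), {"python"}),
]

print(handle(PAGES, tag="python", view="json"))
```

Adding an Atom or RSS view would mean writing one more render function and registering it in RENDERERS; the filtering code would not need to change.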
At this point, something similar to ?category=blog&tag=python&view=atom makes a lot more sense as a URL. It’s actually describing intent properly. I could always just put each of those words into a normal URL path, but there’s an implicit assumption that the ordering means something there, and it’s not true. There are many ways of ordering parameters, of course, so the uniqueness of the URL is broken to a certain extent, but anyone trying to uniquely identify URLs really should be normalizing parameter order anyway.
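Normalizing parameter order is cheap to do. As a rough sketch using Python's standard urllib.parse (the function name is an assumption, and sorting alphabetically is just one possible canonical order):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query_order(url):
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


a = normalize_query_order("http://example.org/?view=atom&tag=python&category=blog")
b = normalize_query_order("http://example.org/?category=blog&tag=python&view=atom")
print(a == b)
```

Two URLs that differ only in parameter order come out identical, so they can be compared or used as a cache key.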
This leaves me with a few problems. Parameter-using URLs are certainly a lot uglier than path-based ones. Google seem to be ok with parameters in URLs now, but only up to a point, and I have three parameters on the trivial example above already. As an alternative, blech suggests something akin to Perl’s hash interpolation - a URL like /category/blog/tag/python/view/html, which is an interesting idea, but still falls prey to the ordering problem. The ordering of the path atoms in a URL implies a strong hierarchy that doesn’t exist here.
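To make the ordering problem concrete, here is a small sketch that reads such a path as alternating keys and values. The function name is hypothetical, and this is only one reading of the suggestion.

```python
def path_to_filters(path):
    """Read /key/value/key/value/... as a dict of filters."""
    atoms = [a for a in path.split("/") if a]
    if len(atoms) % 2:
        raise ValueError(f"odd number of path atoms in {path!r}")
    return dict(zip(atoms[0::2], atoms[1::2]))


one = path_to_filters("/category/blog/tag/python/view/html")
two = path_to_filters("/view/html/tag/python/category/blog")
print(one == two)
```

Both paths decode to the same set of filters, which is exactly the issue: the path looks hierarchical, but the order of the pairs carries no meaning.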
Essentially, URLs suck. They’re not dimensional enough for my needs already, and this site is utterly trivial.
Aaah, you have me torn here. In a perfect world, yes. But almost no one actually _does_ this. For a start, how do I link to it? I have to have a resource that my web server serves to a web browser with an RSS mime type, so that the browser can automatically open it in the newsreader.
Hmm, sounds like you've actually got too much dimensionality. If the implied hierarchy of paths isn't needed, then why not serve the same stuff at tag/python and python/tag?
As Phil suggests, the specific representation served should depend on the headers, not the URI.
Ideally each page would have only one url pointing to it, if only so links to it from del.icio.us all link to the same thing. Every actual 'page' on this site has a single unique url, so it seems a shame to lose that for the indexes.
Personally, I'd have /search/search_term for the search, and have all feeds under /feeds/ - as you say, "feed" is not a sub-category (obviously for feeds it would be perfect if it was the same URL and all clients sent correct Accept headers, but I sadly don't live in magicland :) ). That seems to flow better in my head anyway.
I think if you don't want a long query string, and I agree it's ugly, then you just pick an ordering, even if it'll never be perfect (e.g. if you want all tagged posts in a particular category, I'd put the category first). And have /tags/python/feed redirect to /feeds/tags/python etc.
But approaching the question a completely different way, perhaps the little-known and never-used path parameters of URLs - see section 3.3 of RFC 2396 - might be interesting. Then you could have something mad like:
example.org/search;feed/kittens (RSS feed of a search for kittens)
example.org/blog;years=2006,2007/tag;python/ (entries in the blog category in 2006 or 2007 tagged with python)
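As a rough illustration, URLs like the two above could be pulled apart as follows. The function name and the returned structure are assumptions, as is the convention that a comma separates multiple values; the RFC only says that a segment may carry parameters after a semicolon.

```python
def parse_path_params(path):
    """Split each path segment into (name, params) at its semicolons."""
    result = []
    for segment in path.strip("/").split("/"):
        name, *raw_params = segment.split(";")
        params = {}
        for raw in raw_params:
            key, sep, value = raw.partition("=")
            # A bare parameter such as ";feed" is stored as a flag with no values.
            params[key] = value.split(",") if sep else []
        result.append((name, params))
    return result


print(parse_path_params("/search;feed/kittens"))
print(parse_path_params("/blog;years=2006,2007/tag;python/"))
```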
On http://landmarktrust.dracos.co.uk/ just for fun, I've gone for path components of the form /people=4/to=2008-02-01/ (order normalised) which I think looks quite nice. And /vaguely/ hierarchical ;)
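A sketch of what order normalisation could look like for key=value path components. Sorting alphabetically by key is an assumption made for the example, not necessarily the order that site uses, and the function name is hypothetical.

```python
def canonical_path(path):
    """Sort key=value path components by key so each filter set has one URL."""
    components = [c for c in path.split("/") if c]
    pairs = [c.partition("=") for c in components]
    if any(sep != "=" for _, sep, _ in pairs):
        raise ValueError(f"expected only key=value components in {path!r}")
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return "/" + "".join(f"{key}={value}/" for key, _, value in ordered)


print(canonical_path("/to=2008-02-01/people=4/"))
```

A server could redirect any request whose path differs from its canonical form, so that every combination of filters ends up with exactly one URL.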
Well, the best thing to do IMHO is to have the feeds available as a separate representation of the HTML resources.
The standard way to request a particular representation is by using the Accept headers - though nothing is stopping you from also accepting an ?accept= query string to override the HTTP headers. This provides you with a method to link to the feeds.
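A simplified sketch of that idea: pick the representation from the Accept header, but let an ?accept= parameter win when it is present. The SUPPORTED list, the function names and the HTML fallback are assumptions, and the parsing is deliberately minimal (it handles q-values and */*, but not partial wildcards such as text/*).

```python
from urllib.parse import parse_qs

# Hypothetical set of representations this site can produce, in order of preference.
SUPPORTED = ["text/html", "application/atom+xml", "application/json"]


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    weighted = []
    for item in header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if not media_type:
            continue
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        weighted.append((q, media_type))
    # sorted() is stable, so equal q-values keep the order the client sent.
    return [m for q, m in sorted(weighted, key=lambda pair: -pair[0]) if q > 0]


def choose_representation(accept_header, query_string=""):
    """Pick a media type; an ?accept= parameter overrides the header."""
    override = parse_qs(query_string).get("accept")
    candidates = override if override else parse_accept(accept_header)
    for media_type in candidates:
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return SUPPORTED[0]
    return SUPPORTED[0]  # Assumption: fall back to HTML rather than answer 406.


print(choose_representation("text/html", "accept=application/atom%2Bxml"))
```

One practical wrinkle: a literal + in a query string decodes as a space, so a link that overrides the header has to write application/atom%2Bxml rather than application/atom+xml.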
The "correct" way to handle it would be to use content negotiation to output different views of a single resource. So instead of jamming "feed" or "json" into the URL, you'd just do an "Accept: application/atom+xml" or "Accept: application/json" in the UA.
However, this leads to bad autodiscovery mechanisms as there is no way to tell a UA to actually inject that stuff into its request before it strolls along to fetch a resource, so perhaps the Rails people have found a nice compromise by separating the resource and its representation type with a semicolon.
This is a compromise and somewhat of a hack, though, and your URLs should theoretically look the same regardless of what representation you return to a user agent. So as far as you can, you should try to use 'Accept' in your requests to control the representation you receive, and only apply ';something' to the URL when it's absolutely necessary.
Phil Wilson
2007-10-25 20:53 on URLs suck in Blog
Isn't the desire for a feed rather than an HTML page meant to be expressed in the request?