A Call for Responsible URL Shortening Services (RUSS)
The rising popularity of Twitter has given rise to a whole new genre of services centered around URL shortening. Link shortening services such as http://bit.ly and http://is.gd have garnered a strong following. Twitter’s 140-character limit on message length (tweets) is the primary reason for the growth of these services. Besides saving precious characters, shortened URLs also reduce transcription and copy/paste errors in emails and IM. tinyurl.com, the granddaddy of URL shorteners, simply provided what’s termed a 301 redirect to the given address. Over time, however, these services have become highly innovative and now provide many more features. A recent article in searchengineland compares the features of various link shortening services in detail.
One of the most attractive features is link analytics, provided by bit.ly and several others. The link tracking is unobtrusive: I am still redirected to the target web page, although the quality of reporting varies by service provider.
However, some services, most notably ow.ly and digg.com, have taken a more extreme approach. Instead of redirecting the URL, they keep the user on their own domain and display the target website in a frame. One of the biggest problems with this approach is that a shortened link inserted in my Twitter message or other posts will not count as an incoming link to that website. Trackbacks on your blog will not work with these links. Google’s crawler will not give credit to the target website for this shortened URL, as it technically does not point to that target web page. This is a serious limitation in an SEO-conscious world. Secondly, they use up pricey real estate in your browser, for free, by placing a top bar with their own branding. Talk about being intrusive! And finally, do we need to be reminded yet again why frames should be avoided?
Digg succumbed to pressure from user complaints and removed the top bar for regular users. A Google search for “remove digg bar” shows that people just don’t want this feature, and in fact many of those motivated people have devised ways to disable it.
For applications on Twitter that analyse which links people are mostly talking about, URL shortening introduces a whole lot of challenges. Typically, just to know what a shortened URL is pointing at, you need to issue a HEAD request to the service provider (bit.ly, ow.ly). Since the response is usually a redirect, just taking the value of the Location header is adequate. A HEAD request returns only the response headers, whereas a GET request returns the whole body as well. This matters if you have to process, say, half a million links a day.
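The HEAD-based lookup described above can be sketched roughly as follows. This is a minimal illustration, not any particular application's code: the function names `target_from_head` and `resolve_short_url`, the set of status codes treated as redirects, and the timeout are all assumptions made for the example.

```python
import http.client
from urllib.parse import urlsplit

# Status codes treated as redirects here (an assumption for this sketch).
REDIRECT_CODES = {301, 302, 303, 307, 308}

def target_from_head(status, headers):
    """Return the redirect target from a HEAD response, or None if
    the response is not a redirect (for example a 200 or a 403)."""
    if status in REDIRECT_CODES:
        return headers.get("Location")
    return None

def resolve_short_url(short_url, timeout=5.0):
    """Issue a single HEAD request and read the Location header.
    http.client does not follow redirects, so only headers are fetched."""
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return target_from_head(response.status, response.headers)
    finally:
        conn.close()
```

A caller would pass a shortened link to `resolve_short_url` and treat a `None` result as "this service did not answer with a redirect".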
Now, if you apply the same logic to digg.com or ow.ly, things get squirrely. You get a 200 response from digg.com and a 403 (Forbidden) response from ow.ly. A 200 response means the service is serving its own HTML document instead of redirecting, whereas a 403 means the server refused the HEAD request outright. In either of these cases, do we get information about the target page? NO. For these services, you’d actually need to download the page (a GET request), parse the HTML, and extract the target address. Obviously this model is not only sub-optimal, it is also highly unscalable. You’d have to manually add parsing code for each such service. Moreover, if they change their layout tomorrow, your parser will fail. There are numerous other services which behave this way.
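To make the cost of that fallback concrete, here is a minimal sketch of the parsing step. It assumes the framing service puts the target address in the `src` attribute of the first `frame` or `iframe` on the page. That is a guess made for illustration, not the documented markup of digg.com or ow.ly, and the names `FrameSrcFinder` and `target_from_framed_page` are hypothetical.

```python
from html.parser import HTMLParser

class FrameSrcFinder(HTMLParser):
    """Remember the src of the first <frame> or <iframe> that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Assumption: the target page is loaded through a frame's src.
        if self.src is None and tag in ("frame", "iframe"):
            self.src = dict(attrs).get("src")

def target_from_framed_page(html_text):
    """Return the framed target address, or None if no frame is found."""
    finder = FrameSrcFinder()
    finder.feed(html_text)
    return finder.src
```

Even this small parser is tied to one layout assumption. Each service that frames its targets differently would need its own variant, and the whole page body has to be downloaded before any of it can run.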
URL shortening services are here to stay, and they touch almost every web user. If you provide such a service, you take on a certain responsibility towards the people using it. You just cannot be insensitive to them.


