It’s not a stream of water, it’s a stream of data. If you think of your Twitter feed as a (more or less) gently flowing brooklet of data, then the massive totality of Twitter data at any given moment is akin to the water brutally gushing out of a firehose. This firehose streams data – historical and/or real-time tweets – to partners who have a commercial or intellectual use for it.
Who would have such a use? Academics, for one. A number of recent papers presented at the Web Science Conference have examined the efficiency and accuracy of tweets in spreading information. The University of Southampton has created a searchable ‘Tweepository’ of archived data.
It’s not just Twitter that has a firehose; almost any social networking site does (including those that might not spring to mind as social networks, such as Tumblr and WordPress). Which brings us to our second group of consumers – commercial organisations. Yandex, a Russian search engine, hopes to improve the efficacy and accuracy of its search results by adding Facebook posts to them. Klout examines individuals’ social networking activity and ranks them on various dimensions, assessing who has the greatest networking ‘clout’. It’s essentially a market research company, and the individuals it ranks highly will often find themselves recipients of freebies from marketing departments.
So, if you feel you have an excellent idea for an app, can you just wander along and ask Twitter for access to its firehose? Unless you’re Sergey or Bill, not usually. However, you can purchase certain data sets from another group of users: firehose resellers. These provide value-added services, such as combining data from several firehoses and adding enrichment metadata such as language and geolocation.
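
To make the reseller idea a little more concrete, here is a rough sketch in Python of what combining several firehoses and enriching each post might look like. Everything in it is an assumption made for illustration: the post fields (`source`, `timestamp`, `text`), the function names, and especially `guess_language`, which is a toy stand-in rather than a real language detector. It is not how any actual reseller works.

```python
import heapq
from typing import Iterable, Iterator


def merge_firehoses(*streams: Iterable[dict]) -> Iterator[dict]:
    """Merge several streams into one, ordered by timestamp.

    Assumes each input stream is already sorted by its 'timestamp' field.
    """
    return heapq.merge(*streams, key=lambda post: post["timestamp"])


def guess_language(text: str) -> str:
    """Toy language guess based on a few common words (hypothetical helper)."""
    words = set(text.lower().split())
    if words & {"the", "and", "is"}:
        return "en"
    if words & {"le", "la", "et"}:
        return "fr"
    return "unknown"


def enrich(post: dict) -> dict:
    """Return a copy of the post with added enrichment metadata."""
    enriched = dict(post)
    enriched["language"] = guess_language(post["text"])
    return enriched


# Made-up sample posts standing in for two different firehoses.
twitter = [
    {"source": "twitter", "timestamp": 1, "text": "the firehose is open"},
    {"source": "twitter", "timestamp": 4, "text": "le chat et la souris"},
]
tumblr = [
    {"source": "tumblr", "timestamp": 2, "text": "reblog"},
]

combined = [enrich(post) for post in merge_firehoses(twitter, tumblr)]

for post in combined:
    print(post["timestamp"], post["source"], post["language"])
```

Geolocation could be attached in the same way, as one more field added in `enrich`, though a real service would need a proper lookup rather than a word list.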