Yesterday Dataminr, a big data startup based in New York, announced something pretty extraordinary: that it would become the news discovery platform for CNN. This seems like one of those watershed moments in the history of the news industry that could change the industry’s dynamics fundamentally, like the advent of news agencies or the launch of CNN itself.
How can a technology startup become the news discovery platform for the world’s leading news organization? Because today, breaking events typically leave discoverable digital signatures before they become news, and Dataminr discovers these signatures as soon as they become algorithmically recognizable.
Most of these signatures are on Twitter, since Twitter has become the natural place where hundreds of millions of people post things they deem interesting, important, surprising, funny, scary, scandalous, or otherwise worth sharing – anything, in short, they deem newsworthy. No matter how effective a company’s news-gathering organization is, it simply can’t beat the scale of this discovery system.
Most interestingly, Dataminr algorithmically discovers, qualifies, categorizes and communicates breaking events in real time, as they happen. This is an extremely difficult technological feat to pull off. There are half a billion to a billion tweets per day, and Dataminr’s algorithms process this stream of data and associated metadata in real time to discover even the smallest micro-events and determine their significance, relevance and actionability.
How Well Does It Work?
In short, very well, both because there is so much signal on Twitter and because Dataminr has spent the past three years developing and honing its algorithms with an outstanding team of data scientists.
One particularly memorable example of the kind of event discovery Dataminr excels at is the killing of Osama bin Laden. Dataminr’s algorithms discovered the news on the basis of 19 tweets in a 5-minute period on May 1, 2011. The algorithms used signal pattern recognition, linguistic analysis, sentiment classification and cross-referencing with third-party data sources to identify the news. Dataminr alerted its clients to the news at 10:20pm. At 10:24pm, Keith Urbahn, the former Chief of Staff to Defense Secretary Donald Rumsfeld (not the country music singer), provided partial confirmation in his own tweet: “So I’m told by a reputable person they have killed Osama bin Laden. Hot damn.” The first move in S&P futures caused by the news occurred at 10:39pm, and Bloomberg and the New York Times began reporting the news at 10:43pm. The news quickly spiraled into one of the most viral events in Twitter’s history, with messages increasing from 19 in a 5-minute period to 20,000 per minute 30 minutes later.
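To make the idea of a small early signal concrete, here is a toy sketch in Python of a sliding-window count: it flags the moment a given number of matching messages land inside one time window. This is emphatically not Dataminr’s method, which by the description above layers pattern recognition, linguistic analysis, sentiment classification and third-party data on top of raw volume. The function name `first_burst`, its parameters, and the timestamps are all hypothetical and purely illustrative.

```python
from collections import deque
from datetime import datetime, timedelta


def first_burst(timestamps, window=timedelta(minutes=5), threshold=19):
    """Return the time at which `threshold` messages first fall inside one
    sliding window of length `window`, or None if that never happens.

    Hypothetical helper for illustration only; the defaults echo the
    "19 tweets in a 5-minute period" figure but are otherwise arbitrary.
    """
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Discard messages that are a full window (or more) older than ts.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return ts
    return None


# Synthetic data: 19 messages spaced 15 seconds apart (not real tweet times).
start = datetime(2020, 1, 1, 12, 0, 0)
messages = [start + timedelta(seconds=15 * i) for i in range(19)]
print(first_burst(messages))  # 2020-01-01 12:04:30
```

A pure volume count like this would fire on every trending joke, which is presumably why the qualification and categorization steps matter as much as the detection itself.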
Through its use of very sophisticated event discovery technology, Dataminr beat major news sources to the punch by 23 minutes on the biggest story of the year, and one of the biggest of the decade. Pretty cool stuff.
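For anyone who wants to check the arithmetic, the lead times fall straight out of the timeline above. The short Python snippet below is just a worked calculation using the times quoted in this post; the variable names are mine, and the times are treated as plain clock times on May 1, 2011 with no time zone attached.

```python
from datetime import datetime

# Times as quoted in this post (evening of May 1, 2011, naive clock times).
alert = datetime(2011, 5, 1, 22, 20)  # Dataminr alert, 10:20pm
events = {
    "Keith Urbahn tweet": datetime(2011, 5, 1, 22, 24),
    "first S&P futures move": datetime(2011, 5, 1, 22, 39),
    "Bloomberg / NYT reports": datetime(2011, 5, 1, 22, 43),
}

# Minutes between the alert and each later event.
lead_minutes = {
    name: int((when - alert).total_seconds() // 60)
    for name, when in events.items()
}

for name, minutes in lead_minutes.items():
    print(f"{name}: {minutes} minutes after the alert")
# Keith Urbahn tweet: 4 minutes after the alert
# first S&P futures move: 19 minutes after the alert
# Bloomberg / NYT reports: 23 minutes after the alert
```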