Yesterday Dataminr, a big data startup based in New York, announced something pretty extraordinary: it would become the news discovery platform for CNN. This seems like one of those watershed moments in the history of the news industry that could fundamentally change its dynamics, like the advent of news agencies or the launch of CNN itself.
How can a technology startup become the news discovery platform for the world’s leading news organization? Because today, breaking events typically leave discoverable digital signatures before they become news, and Dataminr discovers these signatures as soon as they become algorithmically recognizable.
Most of these signatures are on Twitter, which has become the natural place for hundreds of millions of people to post things they deem interesting, important, surprising, funny, scary, scandalous, or otherwise worth sharing – anything, in short, they deem newsworthy. No matter how effective a company's news-gathering organization is, it simply can't match the scale of this discovery system.
Most interestingly, Dataminr algorithmically discovers, qualifies, categorizes and communicates breaking events in real time. As they happen. This is an extremely difficult technological feat. There are half a billion to a billion tweets per day, and Dataminr's algorithms process this stream of data and its associated metadata in real time to detect even the smallest micro-events as they happen and to determine their significance, relevance and actionability.
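Dataminr has not published the details of its algorithms, so what follows is not its method. It is a minimal, hypothetical Python sketch of one basic ingredient of real-time event discovery: flagging a term when the number of mentions in a short sliding window jumps well above what is normal for that term. The class name, parameters and thresholds are all illustrative assumptions, and the sketch assumes timestamps arrive in order.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Toy sliding-window burst detector (hypothetical; not Dataminr's method).

    Flags a term when its mention count in the last `window` seconds is at
    least `min_count` AND at least `ratio` times the term's expected count
    per window (`baseline`). Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window=300.0, min_count=10, ratio=5.0, baseline=None):
        self.window = window
        self.min_count = min_count
        self.ratio = ratio
        # Expected mentions per window for each term; unknown terms default to 1.
        self.baseline = baseline or {}
        # Per-term queue of mention timestamps currently inside the window.
        self._times = defaultdict(deque)

    def observe(self, term, timestamp):
        """Record one mention of `term` at `timestamp` (seconds); return True on a burst."""
        times = self._times[term]
        times.append(timestamp)
        # Evict mentions that have fallen out of the sliding window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        count = len(times)
        expected = self.baseline.get(term, 1.0)
        return count >= self.min_count and count >= self.ratio * expected


if __name__ == "__main__":
    # A normally quiet term mentioned 19 times in under five minutes (every 15 s).
    quiet = BurstDetector(window=300, min_count=10, ratio=5.0)
    alerts = [quiet.observe("explosion", t) for t in range(0, 285, 15)]
    print(alerts.index(True))  # first alert fires on the 10th mention (index 9)

    # The same volume for a term that is always busy does not trigger an alert.
    busy = BurstDetector(window=300, min_count=10, ratio=5.0, baseline={"weather": 50})
    print(any(busy.observe("weather", t) for t in range(0, 285, 15)))  # False
```

Comparing against a per-term baseline, rather than using a single global threshold, is what separates a genuine spike from a topic that is always heavily discussed. A real system would also need to learn those baselines, group different phrasings of the same event, and weigh the credibility of sources, none of which this toy attempts.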
How Well Does It Work?
How well does this work? In short, very well, both because there is so much signal on Twitter and because Dataminr has developed and honed its algorithms with an outstanding team of data scientists over the past three years.
One particularly memorable example of the kind of event discovery Dataminr excels at is the assassination of Osama bin Laden. Dataminr's algorithms discovered the news on the basis of 19 tweets in a 5-minute period on May 1, 2011. The algorithms used signal pattern recognition, linguistic analysis, sentiment classification and cross-referencing with third-party data sources to identify the news. Dataminr alerted its clients to the news at 10:20pm. At 10:24pm, Keith Urbahn, the former Chief of Staff to Defense Secretary Donald Rumsfeld (not the country music singer), provided partial confirmation in his own tweet: “So I’m told by a reputable person they have killed Osama bin Laden. Hot damn.” The first move in S&P Futures caused by the news occurred at 10:39pm, and Bloomberg and the New York Times began reporting the news at 10:43pm. The news quickly became one of the most viral events in Twitter’s history, with the message rate climbing from 19 in a 5-minute period to 20,000 per minute 30 minutes later.
Through its use of very sophisticated event discovery technology, Dataminr beat major news sources to the punch by 23 minutes on the biggest story of the year, and one of the biggest of the decade. Pretty cool stuff.
False Information
What happens when misleading signals or false information are posted? Twitter lets individuals post supporting evidence in the form of photos, videos and links to other materials, which can help with verification, but sometimes false signals gain a following. At 1:07pm on April 23, 2013, the Associated Press Twitter account posted the following tweet: “Breaking: Two Explosions in the White House and Barack Obama is injured.” Dataminr alerted its clients to this tweet immediately, and within minutes the stock market lost $121 billion of market value. Thus began the famous “Flash Crash” of 2013.
Dataminr’s algorithms were not fooled, however. The AP Twitter account had been hacked, and the tweet was false. While the market was still falling, Dataminr alerted its clients that the tweet was likely false because no corroborating evidence had emerged, and the market quickly recovered to where it had been trading before. As in the offline world, more intelligence helped sort out what really happened.
Dataminr beat the next closest news source on the hacked AP story by two full minutes – a millennium in the world of real-time financial data, where hedge funds spend tens of millions of dollars to get news signals several milliseconds faster than their competitors.
The Meaning For Journalism
The Dataminr announcement with Twitter and CNN raises significant new questions for a media industry already in radical flux, most notably what role news organizations and journalists will play in an age when news can be discovered electronically.
In my view, these roles will remain very significant, but their nature will shift. As in so many other industries, information technology is automating a growing number of tasks associated with data acquisition, analysis and distribution, but the resulting flood of information makes human insight, and the very complex analytical tasks that only humans can do, even more important.
The information analysis that machines are capable of remains limited. They are excellent at classifying events on the basis of clear rules, but one needs human intelligence to help people understand which events are important (curation) and what events mean (analysis and context setting). In a world of overflowing information and proliferating media outlets, these tasks will be more important than ever, and I believe consumers will continue to assign meaningful value to them.
In addition, there will continue to be many stories – often the most important stories – that only old-fashioned shoe-leather reporting can discover and analyze, particularly those that result from investigative reporting. The public would likely not have learned about the involvement of Chris Christie’s staff in the George Washington Bridge lane-closure scandal, for example, had enterprising journalists not started digging into why traffic had become so outrageously bad in Fort Lee last September.
It is in part because of this kind of thoughtful, investigative reporting that the press plays such an important role in protecting rights and liberties in democratic societies through its exposure of abuses of power. Real-time information discovery can play a major and unique role in exposing abuses of power as well – witness the Arab Spring and recent protests in Turkey, Ukraine and Thailand – but it will never be able to replace investigative journalism or journalism that helps us understand the context and importance of major events and what they ultimately mean.
Dataminr is a Venrock-backed company, and I serve on its board of directors.
Here is Ted Bailey, the founder and CEO of Dataminr, making the announcement:
http://www.youtube.com/watch?v=q_2jcwp1Amc