This will only get worse. The latest announcement from the Thomson Reuters GFMS service, the premier data analytics environment around gold and silver, indicates that their Copper commodity service on Eikon now moves from mining-company-level reporting to mine-by-mine performance. “It all adds another data-rich layer of fundamental research to our customers’ copper market analyses,” says their head of research. And there, in that line, we have a “fundamental” issue that lies behind the torrent of announcements we see in the B2B sector at the moment. Think only of Verisk buying Wood Mackenzie last week at a price (17x EBITDA) which went well beyond the expectations of counter-bidders like McGraw Hill, and which shocked private equity players who relish the data sector but find it hard to imagine 12x as an exceedable multiple. The question is this: risk management and due diligence are vital market drivers, but they are data-insatiable; any and all data that casts light on risk must be included in the process; it is the analysis, especially predictive analytics, that adds the value. So who will own the analytics – the data companies, the market intermediaries (Thomson Reuters, Bloomberg etc.), or the end-user customers?

Those of us who come from the content-driven world – they were out in force at Briefing Media’s splendid Digital Media Strategies event last week in London – find this understandably hard to accept, but our biggest single threat is commoditization. Even more than technology disruption, to which it is closely related, data commoditization is the antithesis of everything upon which the content world’s values were built. When I first began developing information services, in pre-internet dial-up Britain, we spoke lovingly of “proprietary data”, and value was expressed in intellectual property that we owned and which no one else had. For five years I fought alongside colleagues to obtain an EU directive on the “Legal Protection of Databases”, so it is in a sense discouraging to see the way things have gone. But it is now becoming very clear, to me at least, that the value does not lie in the accumulation of the data; it lies in the analytics derived from it, and even more in the application of those analytics within the workflow of a user company as a solution. Thus if I have the largest database of cowhide availability and quality on the planet, I now face clear and present danger. However near-comprehensive my data may be, and whatever price I can command today in the leather industry, I am going to come under attack in value terms from two directions: very small suppliers of marginal data on things like the effect of insect pests on animal hides, whose data is capable of rocking prices in markets that rely on my data as their base commodity; and the analytics players who buy my data under licence but resell the meaning of my data to third parties, my former end users, at a price level that I can only dream about. And those data analytics players, be they Bloomberg (who in some ways kicked off this acquisition frenzy five years ago when they bought Michael Liebreich’s New Energy Finance company) or others, must look over their shoulders in fear of the day when the analytics solution becomes an end-user App.

So can the data holding company fight back? Yes, of course; the market is littered with examples. In some ways the entire game of indexation, whereby the data company creates an indicative index as a benchmark for pricing or other data movement (and as a brand statement), was an attempt to do just that. Some data companies have invested heavily in their own sophisticated analytics, though there are real difficulties here: moving from that type of indicative analytics to predictive analysis shaped as a solution to a specific trader’s needs has proved very hard. Much easier was the game of supplying analysed data back to the markets from which it originated. Thus the data created by Platts or Argus Media, and the indexation applied to it, has wonderful value to Aramco when pricing or assessing competitive risk. But in the oil trading markets themselves, where the risk is missing something that someone else has noted, analysts have to look at everything, and tune it to their own dealing positions. Solutions are changing all the time and rapid customization is the order of the day.
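To make the indexation point concrete, here is a minimal sketch of the kind of indicative benchmark a data company might publish: a volume-weighted average of assessed prices over one pricing window. The trade values and field names are hypothetical illustrations only, and real benchmark methodologies (Platts, Argus and the rest) are far more elaborate than this.

```python
# A minimal sketch of the "indicative index" idea: a benchmark derived from
# assessed trades, here as a simple volume-weighted average price (VWAP).
# The trade data and field names are hypothetical, not any actual
# Platts or Argus methodology.

from dataclasses import dataclass


@dataclass
class AssessedTrade:
    price: float   # assessed price per unit, e.g. USD per barrel
    volume: float  # traded volume, e.g. thousands of barrels


def benchmark_index(trades: list[AssessedTrade]) -> float:
    """Volume-weighted average of assessed prices for one pricing window."""
    total_volume = sum(t.volume for t in trades)
    if total_volume == 0:
        raise ValueError("no volume reported in this pricing window")
    return sum(t.price * t.volume for t in trades) / total_volume


if __name__ == "__main__":
    window = [
        AssessedTrade(price=61.40, volume=250),
        AssessedTrade(price=61.55, volume=400),
        AssessedTrade(price=61.35, volume=150),
    ]
    print(f"Indicative benchmark for window: {benchmark_index(window):.2f}")
```

The point of such an index is less the arithmetic than the brand: once the market quotes your number as the reference price, the data behind it becomes harder to commoditize away.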

Back out on the blasted heath which once was B2B magazine publishing, I kept meeting publishers at DMS who said “Well, we are data publishers now”. I wonder if they really understand quite what has happened. Most of their “data” can be collected in half an hour on the Open Web. There is more data in their domains free on DBpedia or Open Data sources than they have collected in a lifetime of magazine production. And even if they come up with a “must-have” file that everyone needs, that market is now closing into a licensing opportunity, with prices effectively controlled, for the moment, by those people who control the analytics engines and the solution vending. Which brings me back to Verisk and the huge mystery of that extravagant pricing. Verisk obviously felt that its analytics would be improved in market appearance by the highly respectable Wood Mackenzie brand. Yet if a data corner shop, let alone Platts or Argus Media, were to produce reporting and data that contradicted Wood Mackenzie, anyone doing due diligence on their due diligence would surely demand that Verisk acquire the dissenting data and add it to the mix. If data really is a commodity business, far better to be a user than an owner.

Now, are you ready for this? I am not sure that I am, but I feel honour bound, having started a discussion last month under this heading about the Internet of Things/Internet of Everything (IoT/IoE), to finish it by relating it back to the information marketplace and the media and publishing world. It is easy enough to think of the universal tagging of the world around us as a revolution in logistics, but surely its only effect cannot be to speed the Amazon drone ever more rapidly to our door? Or to create a moving map of a battlefield which relates what we are reading about in a book to all of the places being mentioned as we turn the pages? Or to create digital catalogues as every book is tagged and can respond by position and availability?

You are right: there must be more to all of this. So let us start where we are now and move forward with the usual improbable claims that you expect to read here. Let’s begin with automated journalism and authorship, which was in its infancy when I wrote here about the early work of Narrative Science and the Hanley Wood deal; since then Automated Insights and its Wordsmith package (automatedinsights.com) have arrived. Here, it seemed to me, were the first steps in replacing the reporter who quarries the story from the press release with a flow of standardised analytics which could format the story and reproduce it in the journal in question just as if it had been laboriously crafted by Man. The end result is a rapid change in the newspaper or magazine cost base (and an extension to life on Earth for the traditional media?).
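For readers who have not met these systems, the underlying technique can be caricatured in a few lines: structured data goes in, templated narrative comes out. The sketch below is a deliberately toy example with an invented company and invented figures; it is not how Wordsmith or Narrative Science actually work, only an indication of the genre.

```python
# A toy illustration of template-driven automated journalism:
# a row of structured earnings data is turned into publishable copy.
# Company name and figures are invented; this is a hedged sketch of the
# general technique, not any vendor's actual system.

def earnings_story(company: str, quarter: str, revenue_m: float, prior_m: float) -> str:
    """Render one earnings data point as a short news paragraph."""
    change = (revenue_m - prior_m) / prior_m * 100
    direction = "rose" if change >= 0 else "fell"
    return (
        f"{company} reported revenue of ${revenue_m:.1f}m for {quarter}, "
        f"which {direction} {abs(change):.1f}% from ${prior_m:.1f}m a year earlier."
    )


if __name__ == "__main__":
    print(earnings_story("Acme Widgets", "Q1 2015", 142.3, 128.9))
```

Multiply that by every earnings release, box score and court filing, and the economics of routine reporting change very quickly.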

I no longer think this will be the case. As with the long history of the postponed glories of Artificial Intelligence itself, by the time fully automated journalism arrives, most readers will be machines as well as most writers, in fields as diverse as business news and sports reporting and legal informatics and diagnostic medicine and science research reporting. Machine 2 Me will be rapidly followed by real M2M – Machine to Machine. The question then sharpens crudely: if the reporting and analysis is data driven and machine moderated, will “publishing” be an intermediary role at all? Or will it simply become a data analysis service, directed by the needs of each user organisation and eventually each user? So the idea of holding content and generalizing it for users becomes less relevant, and is replaced by what I am told is called “Actionable Personalization”. In other words, we move rapidly from machine driven journalism to personalised reporting which drives user workflows and produces solutions.

Let’s stumble a little further along this track. In such a deeply automated world, most things that retain a human touch will assume a high value. Because of their rarity, perhaps, or sometimes because of the eccentric ability of the human brain to retain a detail that fails the jigsaw test until it can be fitted into a later picture. We may need few analysts of this type, but their input will have critical value. Indeed, the distinguishing factors in discriminating between suppliers may not be the speed or capacity or power of their machinery, but the value of their retained humans who have the erratic capacity to disrupt the smooth flow of analytical conclusion – retrospectively. Because we must remember that the share price or the research finding or the analytic comparison has been folded into the composite picture and adjustments made long before any human has had time to actually read it.

Is all this just futurizing? Is there any evidence that the world is beginning to identify objects consistently with markers which will enable a genuine convergence of the real and the virtual? I think that the geolocation people can point to just that happening in a number of instances, and not just to speed the path of driverless cars. The so-called BD2K initiatives feature all sorts of data-driven development around projects like the Neuroscience Information Framework. Also funded by the U.S. government, the GenBank initiatives and the development of the International Nucleotide Sequence Database Collaboration point to a willingness to identify objects in ways that combine processes on the lab workbench with the knowledge systems that surround them. As so often, the STM world becomes a harbinger of change, creating another dimension to the ontologies that already exist in biomedicine and the wider life sciences. With the speed of change steadily increasing, these things will not be long in leaving the research bench for a wider world.

Some of the AI companies that will make these changes happen are already on the move, as the recent dealings around Sentient (www.sentient.ai) make clear. Others are still pacing the paddock, though new players like Context Relevant (www.contextrelevant.com) and Scaled Inference (https://scaledinference.com) already have investment and valuations comparable to Narrative Science’s. Then look at the small fast-growth players – MetaMind, Vicarious, Nara or Kensho – or even Mastodon C in the UK – to see how quickly generation is now lapping generation. For a decade it has been high fashion for leading players in information marketplaces to set up incubators to grow new market presence. We who have content will build tools, they said. We will invest in value-add in the market and be ready for the inevitable commoditization of our content when it occurs. They were very right to take this view, of course, and it is very satisfying to see investments like ReadCube in the Holtzbrinck/Digital Science greenhouse, or figshare in the same place, beginning to accelerate. But if, as we must by now suspect, the next wave to crash on the digital beach is bigger than the last, then some of these incubations will be flooded out before they reach maturity. Perhaps there has never been a time when it was more important to keep one eye fixed on six months ahead and the other on three years out. The result will be a cross-eyed generation, but that may be the price for knowing when to disinvest in interim technology that may never have time to flower.
