Apr 3
Big Data: Six of the Best
Filed Under B2B, Big Data, Blog, data protection, Financial services, Industry Analysis, internet, privacy, Publishing, Reed Elsevier, Search, semantic web, social media, Thomson, Uncategorized, Workflow | 1 Comment
So the UK government has decided to monitor every tweet, every email and every social network connection, all in the good cause of the greater security of the citizen. While I am up to my eyes in articles defending the civil liberties of the citizen (at least some of whom are more afraid of the police than of the terrorists), I see little commentary on the logistics of all of this, and at best guess estimates that owe more to powerful imagination than to logistical reason. My mind goes to the software involved, and that prompts a wider question: while we are now familiar with Hadoop and the techniques used by the cloud-based systems of Yahoo!, Google, Amazon and Facebook, what deployable software is there in the market which works at a platform level and interfaces information systems with very large data aggregations on one side and user interfaces on the other?
In the media and information services area the obvious answer is MarkLogic (www.marklogic.com). Now a standard for performance in its sector, MarkLogic chose media, alongside government, as its two key areas of market exposure in its development years. Throughout those years I have worked with them and supported their efforts to “re-platform” the industry. MarkLogic 5.0 is just about as good as it gets for information services going the semantic discovery route, and the testimony to this is installations across the information divisions of every global, and many national, information service providers. So when MarkLogic roll out the consultancy sell these days, they do so with almost unparalleled experience of sector issues. I have no prior knowledge, but I am sure that they would be players in that Home Office contract.
Other potential players come from outside the media sector and outside its concentration on creating third-party solutions. In other words, rather than creating a platform for a content holder to develop client-side solutions, their experience is directly with the end-user organization. Scanning the field, the most obvious player is Palantir (www.palantir.com). A Palo Alto start-up of the 2004 vintage (Stanford and PayPal are in its genes), this company targeted government and finance as its key starter markets, and has doubled in size every year since foundation. It raised a further $90m in finance in the difficult year of 2010, and informal estimates of its worth are now over $3 billion. It does very familiar things in its ability to cross-search structured, unstructured, relational, temporal and geospatial data, and it now seems to be widening its scope around intelligence services, defense, cyber security, healthcare and financial services, where its partner on quant services is Thomson Reuters (QA Studio). This outfit is a World Economic Forum 2012 Tech pick – we all love an award – and as we hurry along to fill in the forms for the UK intelligence service, I expect to find them inside already, measuring the living space – and the storage capacity.
My next pick is something entirely different. Have a look at www.treato.com. This service, from First Life Research, is more Tel Aviv than Palo Alto, but it provides something that UK security will be wanting: a beautifully simple answer to a difficult question. Here the service analysed 160,000 US blog sites and the comment sections of health portals to try to pin down what people said about the drugs they were taking. They have now examined 600 million posts from 23 million patients commenting on 8,500 drugs, and the result, sieved through a clinical ontology-based system, is aggregated patient wisdom. When you navigate this, you know that it will have to find a place in evidence-based medicine before too long, and that the global service environment is on the way. In the meanwhile, since the UK National Health Service cannot afford this, let's apply it to the national email systems, and test the old theory that the British have only two subjects: their symptoms and the weather.
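For readers who like to see the wiring, here is a minimal Python sketch of the kind of pipeline Treato describes: match patient posts against a clinical vocabulary, then aggregate the mentions per drug. The vocabulary, the posts and the counting are invented for illustration only; this is not Treato's ontology or system.

```python
from collections import defaultdict

# Toy clinical vocabulary: drug names and side-effect terms (illustrative only).
DRUGS = {"aspirin", "ibuprofen"}
SIDE_EFFECTS = {"nausea", "headache", "rash"}

def tag_post(text):
    """Return the drugs and side effects mentioned in a patient post."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return words & DRUGS, words & SIDE_EFFECTS

def aggregate(posts):
    """Count how often each side effect is mentioned alongside each drug."""
    counts = defaultdict(lambda: defaultdict(int))
    for post in posts:
        drugs, effects = tag_post(post)
        for drug in drugs:
            for effect in effects:
                counts[drug][effect] += 1
    return counts

posts = [
    "Started aspirin last week and the headache is finally gone.",
    "Ibuprofen gives me terrible nausea, anyone else?",
    "Aspirin, nausea and a rash. Switching brands.",
]
for drug, effects in aggregate(posts).items():
    print(drug, dict(effects))
```

Scaled from three posts to 600 million, and from a word list to a clinical ontology, that simple shape is what turns patient chatter into aggregated patient wisdom.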
We started with two Silicon Valley companies, so it makes sense next to go to New Zealand. Pingar (www.pingar.com) starts where most of us start: getting the metadata to align and work properly. From automated meta-tagging to automatic taxonomy construction, this semantics-based solution, while clearly one of the newest players on the pitch, has a great deal to offer. As with the other players, I will come back to Pingar in more detail and give it the space it deserves, but in the meanwhile I am very impressed by some indicative uses. Its sentiment analysis features will surely come in useful in this Home Office application, as we search to find those citizens more or less likely to create a breach of the peace. If there are few unique features, here or anywhere among these services, there is a plenitude of tools that can make a real difference. Growing up in the shadow of MarkLogic and Palantir is a good place to be if you can move fast and stay agile.
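By way of illustration only, here is a minimal lexicon-based sentiment scorer in Python, the sort of building block such tagging tools rest on. The word lists and the scoring are my own invention and say nothing about Pingar's actual methods.

```python
# Minimal lexicon-based sentiment scoring. Word lists and weights are
# illustrative placeholders, not any vendor's lexicon.
POSITIVE = {"good", "great", "calm", "peaceful", "happy"}
NEGATIVE = {"angry", "riot", "breach", "furious", "threat"}

def sentiment(text):
    """Score a text from -1 (wholly negative) to +1 (wholly positive)."""
    words = text.lower().split()
    hits = [1 for w in words if w in POSITIVE] + [-1 for w in words if w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment("great peaceful march, everyone happy"))  # prints 1.0
print(sentiment("angry crowd and a breach of the peace"))  # prints -1.0
```

Real systems add negation handling, context and a proper taxonomy, but the principle of scoring citizens' text against a vocabulary is exactly what a Home Office application would lean on.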
But there are others. Also in the pack is Digital Reasoning (www.digitalreasoning.com), Tim Estes' company from Franklin, TN. Their Synthesys product has scored considerable success in (guess where?) the US government. Some analysts see them as Palantir's closest competitor in size terms, and here is how they define the problem:
“Synthesys is the flagship product from Digital Reasoning that delivers Automated Understanding for Big Data. Enterprise and Government customers are awash with too much data. This data has three demanding characteristics – it is too big (volume), it is accumulating too fast (velocity) and it is located in many locations and forms (variety). Solutions today have attempted to find ever better methods of getting the user to the “right” documents. As a result, data scientists and data analysts today are confronted with the dilemma of an ever-increasing need to read to understand. This is an untenable problem.”
I hear the UK department of spooks saying “hear, hear”, so I guess we shall see these gentlemen in the room. But I must turn now to welcome a wonderfully exciting player which, like Pingar, seems to have emerged at the right place at the right time. In 1985 I became a founder member of the Space Society. This could have been my recognition of the vital task of handling remotely sensed data, or the alluring nature of the Organizing Secretary who recruited me. She moved on, and so did I, ruefully reflecting that no software environment then existing could handle the terabytes of data that poured from even the early satellites. Now we have an order of magnitude more data, but at last there are practical solutions like SpaceCurve (www.spacecurve.com) from Seattle. Here is the conversation we all wanted then: pattern recognition systems, parallel joins between distributed systems, indexing of geospatial polygons… working on multi-dimensional, temporal and geospatial data, data derived from sensors, and analysis of social graphs. Now, if I thread together the third of the words on their website that I understand, I perceive that large-scale geospatial has its budding solutions too, and its early clients have been in commodities (the goal of all that geospatial thinking years ago) and defense. Of course.
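The name is a clue: space-filling curves are a standard way of turning two-dimensional geospatial data into a single sortable key. Purely as a generic sketch, and making no claim about SpaceCurve's own implementation, here is a Z-order (Morton) encoding of latitude and longitude in Python.

```python
def interleave(x, y, bits=16):
    """Interleave the bits of x and y to form a Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def morton_key(lat, lon, bits=16):
    """Map a lat/lon pair onto one integer; nearby points tend to get nearby keys."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    return interleave(x, y, bits)

# Points that are close on the ground tend to sort close together in the index,
# so one ordinary sorted index can serve rough spatial-proximity queries.
cities = {"London": (51.5, -0.1), "Cambridge": (52.2, 0.1), "Auckland": (-36.8, 174.8)}
for name, (lat, lon) in sorted(cities.items(), key=lambda kv: morton_key(*kv[1])):
    print(name, morton_key(lat, lon))
```

That single sortable key is what lets sensor feeds and satellite data be partitioned, joined and queried at scale without a bespoke spatial database for every question.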
So I hope to see them filling in their applications as well. In the meanwhile, I shall study hard and seek to produce, in the next few months, a more detailed analysis of each. And if you are gloomy about the ability of the great information companies to survive the current firestorm of change, reflect on this: three of my six – Palantir, Treato and SpaceCurve – share a common investor in Reed Elsevier Ventures. They should take a bow for keeping their owners anchored within the framework of change, and for making them money while they do it.
Mar 28
Abundance and Scarcity
Filed Under B2B, Big Data, Blog, Financial services, healthcare, Industry Analysis, internet, online advertising, Publishing, semantic web, social media, Uncategorized, Workflow | 1 Comment
I sat down to write a glowing note on the Digital Science conference at London's glorious Royal Institution last night. “Inventing the Future” was a huge success and underlined the creative quality of the debate on the digital future in this city. As I stared ruminatively at my blank screen, an alert crossed it: EMAP have decided to split themselves into three parts, to be called (no, I am not kidding) Top Right Group (something to do with graphs?) for the whole outfit, i2i Events for the (you guessed it!) events division, 4C Group for the information division (“Fore-see”, geddit?), and, triumphantly, EMAP Publishing for the magazines. Given that they did not waste any of that expensive rebranding budget on the magazines, we can guess that this lot are for sale first (though a rumour today also gives that honour to the CAP automotive data unit). The best guess is that everything is for sale, and some reports are already citing advisory appointments in a variety of places.
Meanwhile, the philosophers of the night before had been talking of the very nature of the digital, networked society. Their refrain was “Open”. JP Rangaswami, Chief Scientist at Salesforce.com (I have heard this man twice in a week and would happily go again tomorrow), set the tone. We have to realize that the network has turned our media picture on its head. Now we have to understand the ways in which consumers are re-using and reshaping content. The social networks are ways of amplifying and diminishing those responses, filtering and distilling them. The publisher's role is to get out of the way (this is not a push world anymore) and to act as a distributor and reproducer of excellence, without doing harm or trying to outbid the creativity of end users. Stian Westlake of NESTA, looking at this from a policy viewpoint, saw the need to rebalance investment, to innovate in areas of strength like the UK financial services markets, and to make education fit the requirements of a networked economy. As JP said, re-quoting Stewart Brand, “information wants to be free”. We have it in abundance, while we have scarce resources for shaping and forming it as users want it, and for enabling them to do that in their own contexts.
It turns out, of course, that some of the data we want is held by government. The third speaker was Professor Nigel Shadbolt, Professor of AI at Southampton, Director of the new Open Data Institute, and Sir Tim Berners-Lee's vicegerent and apostolic delegate to the UK government's Open Data programme here on earth. He mercifully skated across the difficulties of getting governments to do what they have said they will do, while pointing out that, despite the fad for Big Data, linked data is now a vital component at all levels, big and small, in delivering the liberating effect of making compatible data available for remixing. With these three speakers we were in the magic territory of platform publishing. Here it was unthinkable not to promulgate your APIs. Here was a collaborative world of licensing and data sharing. Here was a vision of many of the things we shall be doing to create a data-driven world in the networks, for the net benefit of all of us.
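To make "promulgating your APIs" concrete, here is a hedged Python sketch of pulling linked data from a SPARQL endpoint for remixing. The endpoint URL, the ontology terms and the query are placeholders of my own invention, not any real government service.

```python
import requests

# Hypothetical SPARQL endpoint and ontology terms: placeholders standing in for
# whatever open-data service actually publishes the figures.
ENDPOINT = "https://example.gov.uk/sparql"
QUERY = """
SELECT ?area ?population WHERE {
  ?area a <http://example.org/ontology/LocalAuthority> ;
        <http://example.org/ontology/population> ?population .
}
LIMIT 10
"""

def fetch_linked_data(endpoint, query):
    """Run a SPARQL query and return the JSON result bindings."""
    resp = requests.get(
        endpoint,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

if __name__ == "__main__":
    for row in fetch_linked_data(ENDPOINT, QUERY):
        print(row["area"]["value"], row["population"]["value"])
```

The point is less the ten lines of code than the posture: once the data is published in a compatible, linked form, anyone can remix it in their own context without asking the publisher's permission.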
And then I read the EMAP announcement, and it brings home the way in which the present and the future are pulling apart radically at the moment. No one looked at the EMAP holdings through the eyes of customers, buyers or users. Channel and format, the classifications of the past, are the only way that current managers can see their businesses. So we divide into three channels what needed to be seen as a platform environment, created by ripping out all the formats and making all of the data neutral and remixable in any context. So the building and construction marketplace at EMAP, which has magazines, data and events (events – the greatest source of data yet discovered on earth), becomes a way of shaping and customizing content for users large and small, directed by them and driven by their requirements. But the advisors cannot understand anything but ongoing businesses, the strategy has no place in the IM, and the McGraw-Hill failure to do this at Dodge and Sweets is not encouraging, so we divide the stuff into parcels that can be sold, and sell it off at a small portion of its worth, while blaming the technology that could save it for “disrupting” it to death.
Maybe this is right. Maybe the old world has to be purged before the new one takes over. Maybe we have to go through the waste of redundancies, the dissipation of content, the loss of continuity with users/readers/customers before they are able to show us once again what we really should be doing. But now, when we know so much about “inventing the future”, this seems a very rum way of proceeding. Incidentally, last night's conference host, Digital Science, is a very exciting Macmillan start-up whose business it is to invest in software developed by users in science research to support their work. Truly, then, a new player with more than a whiff of the zeitgeist of this conference in its nostrils. Those of us with long memories remember an older Macmillan, however: one that owned the healthcare and nursing magazine market, and lapped up the jobs-advertising cream in the days when users (or the NHS) could not use the web as an advertising environment. So Macmillan sold its magazine division before the advertising crash – to EMAP. It is people, decisions and the choices made by users that change things. It is hardly new to note that the lack of a tide table can create a serious risk of drowning, but it is no less true for that.