So the UK government has decided to monitor every tweet, every email and every social network connection, all in the good cause of the greater security of the citizen. While I am up to my eyes in articles defending the civil liberties of the citizen (at least some of whom are more afraid of the police than of the terrorists), I see little commentary on the logistics of all of this, and at best guesstimates that owe more to powerful imagination than to logistical reason. My mind goes to the software involved, and that prompts a wider question: while we are now familiar with Hadoop and the techniques used by the cloud-based systems of Yahoo!, Google, Amazon and Facebook, what deployable software is there in the market which works at a platform level and interfaces information systems with very large data aggregations on the one side, and user interfaces on the other?
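I claim no knowledge of what the Home Office would actually run, but for readers who know Hadoop only by name, here is a minimal single-process sketch of the map/reduce pattern it popularised – counting who-contacts-whom links in a pile of messages. The field names and data are invented for illustration; Hadoop’s contribution is to distribute exactly this shape of computation across thousands of machines.

```python
from collections import defaultdict
from itertools import chain

# Three invented messages standing in for an intercept feed.
messages = [
    {"from": "alice", "to": ["bob", "carol"]},
    {"from": "bob", "to": ["alice"]},
    {"from": "carol", "to": ["alice", "bob"]},
]

def map_phase(msg):
    # Emit one ((sender, recipient), 1) pair per link in a message.
    for recipient in msg["to"]:
        yield (msg["from"], recipient), 1

def reduce_phase(pairs):
    # Sum the counts for each key - the "reduce" side of the pattern.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

links = reduce_phase(chain.from_iterable(map_phase(m) for m in messages))
print(links)  # {('alice', 'bob'): 1, ('alice', 'carol'): 1, ('bob', 'alice'): 1, ...}
```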

In the media and information services area the obvious answer is MarkLogic (www.marklogic.com). Now a standard for performance in its sector, MarkLogic chose media, alongside government, as its two key areas of market exposure in its development years. Throughout those years I have worked with them and supported their efforts to “re-platform” the industry. MarkLogic 5.0 is just about as good as it gets for information services going the semantic discovery route, and the testimony to this is installations in the information divisions of every global, and many national, information service providers. So when MarkLogic roll out the consultancy sell these days, they do so with almost unparalleled experience of sector issues. I have no prior knowledge, but I am sure that they would be players in that Home Office contract.

Other potential players come from outside the media sector, and outside its concentration on creating third-party solutions. In other words, rather than creating a platform for a content holder to develop client-side solutions, their experience is directly with the end-user organization. Scanning the field, the most obvious player is Palantir (www.palantir.com). A Palo Alto start-up of the 2004 vintage (Stanford and PayPal are in its genes), this company targeted government and finance as its key starter markets, and has doubled in size every year since foundation. It raised a further $90m in finance in the difficult year of 2010, and informal estimates of its worth are now over $3 billion. It does very familiar things in its ability to cross-search structured, unstructured, relational, temporal and geospatial data, and it now seems to be widening its scope around intelligence services, defense, cyber security, healthcare and financial services, where its partner on quant services is Thomson Reuters (QA Studio). This outfit is a World Economic Forum 2012 Tech pick – we all love an award – and as we hurry along to fill in the forms for the UK intelligence service, I expect to find them inside already, measuring the living space – and the storage capacity.

My next pick is something entirely different. Have a look at www.treato.com. This service, from First Life Research, is more Tel Aviv than Palo Alto, but it provides something that UK security will be wanting – a beautifully simple answer to a difficult question. The service analysed 160,000 US blog sites and the comment sections of health portals to track down what people said about the drugs they were taking. They have now examined 600 million posts from 23 million patients commenting on 8,500 drugs, and the result, sieved through a clinical ontology-based system, is aggregated patient wisdom. When you navigate this, you know that it will have to find a place in evidence-based medicine before too long, and that the global service environment is on the way. In the meanwhile, since the UK National Health Service cannot afford this, let’s apply it to the national email systems, and test the old theory that the British have only two subjects: their symptoms and the weather.
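Treato publishes nothing about its internals that I can reuse here, but the core idea – sieving free text through a clinical ontology – can be sketched in miniature. The two-drug lexicon and the posts below are invented; a real clinical ontology runs to many thousands of concepts and far subtler matching.

```python
import re

# Toy ontology: canonical drug name -> known synonyms and brand names.
drug_ontology = {
    "ibuprofen": ["ibuprofen", "advil", "nurofen"],
    "paracetamol": ["paracetamol", "acetaminophen", "tylenol"],
}
# Invert it into a synonym -> canonical-name lookup, then build one regex.
synonyms = {s: drug for drug, terms in drug_ontology.items() for s in terms}
pattern = re.compile(r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b", re.I)

posts = [
    "Advil sorted my headache but upset my stomach.",
    "Tylenol did nothing for me; switched to nurofen.",
]

# Aggregate posts under the canonical drug they mention.
mentions = {}
for post in posts:
    for hit in pattern.findall(post):
        mentions.setdefault(synonyms[hit.lower()], []).append(post)

for drug, quotes in mentions.items():
    print(drug, "-", len(quotes), "post(s)")
```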

We started with two Silicon Valley companies, so it makes sense next to go to New Zealand. Pingar (www.pingar.com) starts where most of us start – getting the metadata to align and work properly. From automated meta-tagging to automatic taxonomy construction, this semantics-based solution, while clearly one of the newest players on the pitch, has a great deal to offer. As with the other players, I will come back to Pingar in more detail and give it the space it deserves, but in the meanwhile I am very impressed by some indicative uses. Its sentiment analysis features, sketched below, will surely come in useful in this Home Office application, as we search to find those citizens more or less likely to create a breach of the peace. If there are few unique features – here or anywhere in these services – then there is a plenitude of tools that can make a real difference. Growing up in the shadow of MarkLogic and Palantir is a good place to be if you can move fast and stay agile.
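I do not know how Pingar implements its sentiment analysis, so take the following for what it is: the simplest lexicon-based member of the family such features belong to, with invented word lists. Production systems layer machine-learned models, negation handling and context on top of this idea.

```python
# Invented word lists - a real lexicon runs to thousands of scored terms.
POSITIVE = {"calm", "peaceful", "content", "happy"}
NEGATIVE = {"angry", "riot", "breach", "furious"}

def sentiment(text: str) -> int:
    # Score = positive hits minus negative hits; >0 leans positive.
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("A calm and peaceful march"))  #  2
print(sentiment("Angry crowds riot in town"))  # -2
```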

But there are others. Also in the pack is Digital Reasoning (www.digitalreasoning.com), Tim Estes’ company from Franklin, TN. Their Synthesys product has scored considerable success in – guess where? – the US government. Some analysts see them as Palantir’s closest competitor in size terms, and here is how they define the problem:

“Synthesys is the flagship product from Digital Reasoning that delivers Automated Understanding for Big Data. Enterprise and Government customers are awash with too much data. This data has three demanding characteristics – it is too big (volume), it is accumulating too fast (velocity) and it is located in many locations and forms (variety). Solutions today have attempted to find ever better methods of getting the user to the “right” documents. As a result, data scientists and data analysts today are confronted with the dilemma of an ever-increasing need to read to understand. This is an untenable problem.”

I hear the UK department of spooks saying “hear, hear”, so I guess we shall see these gentlemen in the room. But I must turn now to welcome a wonderfully exciting player which, like Pingar, seems to have emerged at the right place at the right time. In 1985 I became a founder member of the Space Society. This could have been my recognition of the vital task of handling remotely sensed data, or the alluring nature of the Organizing Secretary who recruited me. She moved on, and so did I, ruefully reflecting that no software environment yet existing could handle the terabytes of data that poured from even the early satellites. Now we have an order of magnitude more data, but at last there are practical solutions, like SpaceCurve (www.spacecurve.com) from Seattle. Here is the conversation we all wanted then: pattern recognition systems, looking at parallel joins between distributed systems and indexing geospatial polygons… working on multi-dimensional, temporal, geospatial data, data derived from sensors, and analysis of social graphs. Now, if I thread together the third of the words on their website that I understand, I perceive that large-scale geospatial has its budding solutions too, and that its early clients have been in commodities (the goal of all that geospatial thinking years ago) and defense. Of course.
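SpaceCurve’s indexing will be vastly more sophisticated than anything I can reproduce, but the basic trick behind indexing geospatial shapes – bucket each polygon’s bounding box into grid cells so a query inspects a handful of candidates rather than every polygon – can be sketched. The cell size, place names and coordinates below are illustrative only.

```python
from collections import defaultdict

CELL = 1.0  # cell size in degrees (illustrative)

def cells_for_bbox(min_x, min_y, max_x, max_y):
    # Yield every grid cell a bounding box overlaps.
    for cx in range(int(min_x // CELL), int(max_x // CELL) + 1):
        for cy in range(int(min_y // CELL), int(max_y // CELL) + 1):
            yield (cx, cy)

index = defaultdict(list)

def insert(name, bbox):
    for cell in cells_for_bbox(*bbox):
        index[cell].append((name, bbox))

def candidates(x, y):
    # Only the point's own cell is searched; an exact point-in-polygon
    # test would then run over this short candidate list alone.
    cell = (int(x // CELL), int(y // CELL))
    return [name for name, (min_x, min_y, max_x, max_y) in index[cell]
            if min_x <= x <= max_x and min_y <= y <= max_y]

insert("Hyde Park", (-0.19, 51.50, -0.15, 51.51))
insert("Regent's Park", (-0.17, 51.52, -0.14, 51.54))
print(candidates(-0.16, 51.505))  # ['Hyde Park']
```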

So I hope to see them filling in their applications as well. Meanwhile, I shall study hard and seek to produce in the next few months a more detailed analysis of each. And if you are gloomy about the ability of the great information companies to survive the current firestorm of change, reflect on this: three of my six – Palantir, Treato and SpaceCurve – share a common investor in Reed Elsevier Ventures. They should take a bow for keeping their owners anchored within the framework of change, and for making them money while they do it.

As a Thomson man of the generation of ’67, I was well schooled in the dictum “it’s not what you buy, but what and when you sell, that makes the real difference.”* And having spent almost three decades button-holing anyone who would listen, like some crazed digital ancient mariner, on the importance of building digital presence in B2B publishing and information markets, I should probably be pleased to see headlines in the Financial Times (3 March 2012) heralding the sale of EMAP’s print assets (“Analysts say EMAP faces challenge to move away from print”). But I am not. I know exactly when these print assets should have been sold: in 2002, at the end of the Dotcom Bust. And I cannot persuade myself that a wrong move then will be rectified by a pointless move now, or that value will be added to anything by selling the subscription/advertising print stable at EMAP – or at UBM, or at Haymarket, or Centaur, or Incisive – to someone who is simply going to live on a declining annuity until it expires. There will in any case be few buyers, and those who do appear will not want the whole stable, just one or two of the old nags. The analysts who shriek the headline of this piece are simply transaction mongers with a firmer grip on deal commissions than on the current strategic realities of B2B. So let’s go back to 2002 and see what has happened since the management of B2B information and publishing and events decided that it was far too early to exit print subscriptions and that, like the regional press, the market would come back to them.

By 2005 it was becoming clear that the bits that worked in B2B, outside of events, were information services and solutions. By that year the controlled-circulation magazines and newsletters which had proliferated at the end of the previous decade – at times spawned by online itself – began to wilt. Just as in the pre-2005 period we had spoken of VANs and VADs, so we began to talk about “vertical search” (it turned out to be much the same anyway) and started providing tailored information to self-defined users in commerce and industry. We were beginning to experience for the first time what it was going to be like to live in a “networked society/economy”. A small revolution was taking place: managers were beginning to have to find out what their users did for a living and construct solutions around their daily lives. This meant specialization and expertise in particular verticals: managers could no longer be shifted from title to title on the basis that they knew journalists and advertisers, and that everything else was the same whether you were publishing in machine tools or in ladies’ fashions.

And then we came to workflow. If we were really entering an information solutions-type world (where Thomson Reuters had already gone in IP and GRC, and Lexis Risk in insurance), then we had to provide our content directly to the desk of the user, sliced so that it modelled his working patterns, and supported by software tools that added value to it and kept us essential to his processes – and thus too important to be lightly discontinued. And how did we plan to earn his trust in this guise? Either by inventing a new brand (think Globalspec in engineering) or by using our old print brands to ensure user confidence (think Bankers Almanac at RBI). Never mind that the print which supported those brands had eroded away: the brands were there for entirely different reasons.

And now we are laying another layer of digital development on top of all of this. We now talk of Big Data: of using the services we have created for users as a sort of focussing glass, so that we can go out from them to the client’s own content and to all sorts of other datasets, finding linkages through data mining and extraction, and squeezing fresh insight all the time into the workflow of users who, wherever they work, have increasingly become, like us, knowledge workers. And our events activities increasingly morph into always-on trading and learning experiences, where we introduce clients to the range of products and services in the sector, update and inform on new releases for people who have said they want to know, and move increasingly into the training and professional development of the sectors that we have chosen. Do you see where we are going? We are going to be the full-service providers to a handful of vertical markets which we feel confident about dominating.

Why are we confident about that domination? Because we have the brands, many of them over a hundred years old in this country, on which our verticals were brought up. And behind those brands are archival morgues, full of data with residual value in a Big Data sense. We did not sell those brands in 2002 when they were a going concern, so why sell them now when they are a cause for concern? By all means close the print; by all means reconstruct the service values using far fewer journalists in targeted niche environments online. By all means drive towards areas where you have real data intensity, but on the way remember the community and its existing brand affiliations. You want to take them with you.

Which brings us back round to EMAP. I see no point in hanging on to peripheral services, even data-based services like DeHavilland, bought as recently as 2007, if they have no strategic coherence in terms of the markets that give EMAP positions of strength. I take these to be construction, local government, broadcast media and fashion. If strength in automotive cannot be linked to the Guardian’s position in Trader Media, then sell that too. But hold onto brands where they can be used to give community credibility, and onto data where it can give archival searchability. Selling the peripherals leaves you a smaller but more profitable business – and that is also the result of digital network development of the type described here: smaller and more profitable businesses. Just don’t throw away something which is pretty worthless now on its own, but which may be needed on a journey to a much better place.

* Note that the companies that Thomson SOLD in the mid-1980s in the UK form the majority of EMAP and Trinity Mirror today, as well as large chunks of Springer and Infinitas, and, elsewhere and afterwards, the bulk of Cengage and a big portion of the US regional press. Were they right or not?
