Now that we are entering the post-competitive world (with a few exceptions!), it is worth pausing for a moment to consider how we are going to get all of the content together and create the sources of linked data which we shall need to fuel the service demand for data mining and data extraction. Of course, this is less of a problem if you are Thomson Reuters or Reed Elsevier. Many of the sources are relationships that you have had for a long time. Others can be acquired: reflect on the work put in by Complinet to source the regulatory framework for financial services prior to its acquisition by Thomson Reuters, and reflect that relatively little of this data is “owned” by the service provider. Then you can create expertise and scale in content sourcing, negotiating with government and agency sources, and forming third-party partnerships (as Lexis Risk Management did with Experian in the US). But what if you lack these resources, find that source development and licensing would create unacceptable costs, yet still feel under pressure to create solutions in your niche which reflect a very much wider data trawl than could be accomplished using your own proprietary content?

The answer to this will, perhaps, reflect developments already happening in the education sector. Services like Global Grid for Learning, or the TES Connect Resources which I have described in previous blogs, give users and third-party service developers (typically teachers' centres or other “new publishers”) the ability to find quality content and re-use it, while collaborations like Safari and CourseSmart allow customization of existing textbook products. So what sort of collaborations would we expect to find in B2B or professional publishing which would provide the quarries from which solutions could be mined? They are few and far between, but, with real appreciation for the knowledge of Bastiaan Deblieck at TenForce in Belgium, I can tell you that they are coming.

Let's first consider Factual Inc (www.factual.com). Here are impeccable credentials (Gil Elbaz, the founder, started Applied Semantics and worked at Google) and a VC-backed attempt to corner big datasets, apply linkage and develop APIs for individual applications. The target is the legion of mash-up developers and the technical departments of small and medium-sized players. Here is what they say about their data:

“Our data includes comprehensive Global Places data, with over 60MM entities in 50 countries, as well as deep dives in verticals such as U.S. Restaurants and U.S. Healthcare Providers. We are continually improving and adding to our data; feel free to explore and sign up to get started!

Factual aggregates data from many sources including partners, user community, and the web, and applies a sophisticated machine-learning technology stack to:

  1. Extract both unstructured and structured data from millions of sources
  2. Clean, standardize, and canonicalize the data
  3. Merge, de-dupe, and map entities across multiple sources.

We encourage our partners to provide edits and contributions back to the data ecosystem as a form of currency to reduce the overall transaction costs via exchange.”
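To make that pipeline concrete, here is a minimal sketch in Python of the kind of clean, canonicalize and de-dupe step Factual describes. The record fields and the exact-match key are my own illustrative assumptions, not anything from Factual's actual stack:

```python
# A minimal sketch (not Factual's actual stack) of the clean -> canonicalize
# -> de-dupe pipeline described above, using toy place records.

import re
from collections import defaultdict

def canonicalize(record):
    """Normalize a raw place record into a comparable form."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower()).strip()
    name = re.sub(r"\s+", " ", name)
    phone = re.sub(r"\D", "", record.get("phone", ""))  # keep digits only
    return {"name": name, "phone": phone, "source": record["source"]}

def dedupe(records):
    """Merge records that share the same canonical (name, phone) key."""
    merged = defaultdict(list)
    for rec in map(canonicalize, records):
        merged[(rec["name"], rec["phone"])].append(rec["source"])
    return [
        {"name": name, "phone": phone, "sources": sources}
        for (name, phone), sources in merged.items()
    ]

raw = [
    {"name": "Joe's Diner",    "phone": "(555) 010-2030", "source": "web"},
    {"name": "JOES DINER",     "phone": "555.010.2030",   "source": "partner"},
    {"name": "Mel's Drive-In", "phone": "555 010 9999",   "source": "community"},
]

for entity in dedupe(raw):
    print(entity)
# The two "Joe's Diner" rows collapse into one entity with two sources.
```

The real engineering, of course, lies in doing this probabilistically across millions of noisy sources rather than on an exact key, but the shape of the problem is the same.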

As mobile devices proliferate, this quarry is for the App trade, and here, in the opinion of Forbes (19 April 2012), is potentially another Google in the making in the field of business intelligence (http://www.forbes.com/sites/danwoods/2012/04/19/how-factual-is-building-an-data-stack-for-business/2/).
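And to see why the App trade cares, consider how little code a developer needs once someone else has done the sourcing and cleaning. The endpoint, parameters and response shape below are hypothetical stand-ins for the sketch, not Factual's documented API:

```python
# Illustrative only: how an app developer might consume a hosted places API.
# The host, parameters, and response shape are assumptions for this sketch.

import json
import urllib.parse
import urllib.request

API_BASE = "https://api.example-places.com/v1/places"  # hypothetical host
API_KEY = "YOUR_KEY"

def nearby_restaurants(lat, lon, radius_m=500):
    query = urllib.parse.urlencode({
        "category": "restaurant",
        "lat": lat,
        "lon": lon,
        "radius": radius_m,
        "key": API_KEY,
    })
    with urllib.request.urlopen(f"{API_BASE}?{query}") as resp:
        return json.load(resp)["results"]  # assumed response envelope

for place in nearby_restaurants(51.5074, -0.1278):
    print(place["name"], place.get("address", ""))
```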

But Los Angeles is not the only place where this thinking is maturing. Over in Iceland, now that the banking has gone, they are getting serious about data. DataMarket (http://datamarket.com), led by Hjalmar Gislason from a background of start-ups and of developing new media for the Icelandic telco, offers a very competitive deal, also replete with API services and revenue sharing with re-users. Here is what they say about their data:

“DataMarket’s unique data portal – DataMarket.com – provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The portal allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

DataMarket’s data publishing solutions allow data providers to easily publish their data on DataMarket.com and on their existing websites through embedded content and branded versions of DataMarket’s systems, enabling all the functionality of DataMarket.com on top of their own data collections.”
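Again, the attraction for a re-user is how quickly a unified portal turns into working analysis. A hedged sketch: the dataset URL and column names below are invented for illustration, not DataMarket's documented interface:

```python
# A sketch of pulling one DataMarket-style data set as CSV and comparing two
# countries' series. The URL and columns (Year, Country, Value) are assumed.

import csv
import io
import urllib.request

DATASET_URL = "https://datamarket.com/api/v1/series.csv?ds=EXAMPLE_ID"  # hypothetical

def load_series(url):
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    series = {}
    for row in csv.DictReader(io.StringIO(text)):
        series.setdefault(row["Country"], {})[row["Year"]] = float(row["Value"])
    return series

data = load_series(DATASET_URL)
for year in sorted(data.get("Iceland", {})):
    print(year, data["Iceland"][year], data.get("Norway", {}).get(year))
```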

And finally, in Europe we seem to take a more public-interest view of the issues. A certain amount of impetus seems to have come from the Open Data Foundation, a not-for-profit which has connections to, and has helped to stimulate, sites like OpenCharities, OpenSpending (how does your government spend your money?) and OpenlyLocal, designed to illuminate the dark corners of UK local and regional government. All of these sites have free data, available under a Creative Commons-style licence, but perhaps the most interesting, still in beta, is OpenCorporates. Claiming to have data on 42,165,863 companies (as of today) from 52 different jurisdictions, it is owned by Chrinon Ltd and run by Chris Taggart and Rob McKinnon, both of whom have long records in the Open Data field. This will be another site where the API service (as well as a Google Refine service) will earn the value-add revenues (http://api.opencorporates.com/). Much of the data is in XML, and this could form a vital source for some user- and publisher-generated value-add services. The site bears a recommendation from the EC Information Society Commissioner, Neelie Kroes, so we should also record that TenForce (http://www.tenforce.com/) themselves are leading players in the creation of the Commission's major Open Data Portal, which will progressively turn all that “grey literature”, the dandruff of bureaucracy, back into applicable information held as data.
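For the curious, a company search against the OpenCorporates API might look something like the sketch below. The path and response shape are my best guess at the interface, so verify them against the current docs at http://api.opencorporates.com/ before relying on them:

```python
# A small sketch against the OpenCorporates search API mentioned above.
# The exact path, version, and response fields may differ from the current
# docs, so treat the shapes below as assumptions to verify.

import json
import urllib.parse
import urllib.request

def search_companies(query, jurisdiction=None):
    params = {"q": query, "format": "json"}
    if jurisdiction:
        params["jurisdiction_code"] = jurisdiction  # e.g. "gb"
    url = ("http://api.opencorporates.com/companies/search?"
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # Assumed shape: {"results": {"companies": [{"company": {...}}, ...]}}
    return [c["company"] for c in payload["results"]["companies"]]

for company in search_companies("widgets", jurisdiction="gb"):
    print(company["company_number"], company["name"])
```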

We seem here to be at the start of a new movement, with a new range of intermediaries coming into existence to broker our content to third parties, and to enable us to get the licences and services we need to complete our own service developments. Of course, today we are describing start-ups: tomorrow we shall be wondering how we provided services and solutions without them.


As we in the information services market start to get our thinking right about the influence of Big Data and our current obsession with workflow, I am beginning to think that we will need to revise our whole approach to collaborative working in marketplaces. At the moment we are playing all the old tunes, but none of them seems to quite fit the customer mood. Like that old vaudeville star Jimmy “Schnozzola” Durante, we need to tinkle those ivories again and again until we find it: the Lost Chord!

So here is a sample of my keyboard doodling. I reason that we cannot “productize” information services for ever. Our customers are now too clever, and as we open our APIs and let them self-actualize or customize, we face real dangers. At the top end of most markets in most sectors, the top 10 customers are well equipped at the skills level, and are surrounded by systems integrators who can service them expensively but effectively. And amongst the medium and small enterprises in our client base, the cost of doing anything but allowing them to customize for themselves is prohibitive. And we are sitting in the middle of this, talking passionately about selling solutions and always seeking stickiness, while our client base shows dangerously independent tendencies.

There are two answers. We could sell less: just license everything, put the APIs in place, and let the user community get on with it. For me, this is like sleep-walking on a cliff edge. Our only potent quality as service providers has been our knowledge of what users do with our data and how they work. Make the relationship one of pure licensing and we cut off the feedback loop, isolating ourselves from the way in which workflow software is being tweaked and refined, and from the way our data grows, or diminishes, in importance as a result. Or we could go to the opposite extreme, way past the current middle ground where we build “solutions” and customers adopt and install them as applications, with all the difficulties described above. The opposite extreme is equally difficult, but at least it keeps us in the game.

So what is the opposite extreme? Simply this: we go on building solutions, but we increasingly customize them for our major customers, working in partnership with systems integrators and with the software solution partners whose Big Data environments, analytics or data mining are part of the key to our service specification. Setting up our own systems integration, by alliance or as an in-house capability, could be vital to our ability to stay sticky, to bring the client's own data and resources into play, and to learn where the market is going to go. I hear cries of “We are a content company, not a software house!” Not so for the major players in B2B and STM, who have been fully invested in software for five years or so, and who are these days more likely to buy a tool-set than a data-set. Much more cogent are the protests of those who do not want to get into ownership of major pieces of systems software: the answer there is strategic alliance. Discussing the pharma market the other day, where size is very important, I found myself advocating approaches to major customers for outsourcing large areas of non-research process, offering real productivity gains to the user and giving the services solutions player and his systems software partner the ability to work inside the firewall and grow with the client's needs.

There may be 1,000 major global clients across all verticals with whom this approach would work. It certainly works in government and financial services, traditionally the targets of the major players in Big Data software. But it again exposes two new problems. It leaves the bulk of the market behind: medium and small players unable to afford this type of soup-to-nuts solutioning. This, again, is a real opportunity for solution packaging with a systems integrator, either external or internal to the content player. This will enable 3-5 year contracts with upgrades, data updating and maintenance. And in some instances integration will go further and permit scaled-down custom solutions that parallel what the major players are doing. The trick will be to start by selling in the standard integration package, and then respond to the smaller customer's need for customization. And there is a market of small players and consortia where this type of solutioning has been working for some time: it's Education, and the service area to watch is Pearson Learning Solutions.

And the other problem for the bigger data content players? Simply that there are killer whales out there! As the major enterprise software vendors see what is happening, they will feel that this type of solutioning undermines some sacred territory. We see that with Oracle in particular, but IBM and SAP are also always ready to buy on a vast scale. Some of today's Big Data ex-start-ups, in the 5-10 year old Valley vintages, will be absorbed into these big players, which could be difficult – or an opportunity – if your content solution is tied to that newly acquired player. In fact, if the major content providers are not talking regularly to the mighty enterprise software players about how these worlds come together, then they are less smart than I think they are. At the moment, in my experience, some at least of the enterprise software players are saying “We should probably buy some of them – but we have no experience of managing content.” If ever you find yourself saying “I never imagined that Springer or Elsevier or Wiley would end up as part of the solutions division at Oracle”, then I hope that you will recall an article that went right to that point. And at least that would integrate all access at all points!
