Apr
25
Link Arms for Linked Data
Filed Under B2B, Big Data, Blog, data protection, Education, Financial services, healthcare, Industry Analysis, internet, mobile content, news media, Reed Elsevier, Search, semantic web, social media, Thomson, Uncategorized, Workflow
Now that we are entering the post-competitive world (with a few exceptions!) it is worth pausing for a moment to consider how we are going to get all of the content together and create the sources of linked data which we shall need to fuel the service demand for data mining and data extraction. Of course, this is less of a problem if you are Thomson Reuters or Reed Elsevier. Many of the sources are relationships that you have had for a long time. Others can be acquired: reflect on the work put in by Complinet to source the regulatory framework for financial services prior to its acquisition by Thomson Reuters, and reflect that relatively little of this data is “owned” by the service provider. Then you can create expertise and scale in content sourcing, negotiating with government and agency sources, and forming third-party partnerships (as Lexis Risk Management did with Experian in the US). But what if you lack these resources and find that source development and licensing would create unacceptable costs, yet still feel under pressure to create solutions in your niche which will reflect a very much wider data trawl than could be accomplished using your own proprietary content?
The answer to this will, perhaps, reflect developments already happening in the education sector. Services like Global Grid for Learning, or the TES Connect Resources which I have described in previous blogs, give users and third-party service developers (typically teachers’ centres or other “new publishers”) the ability to find quality content and re-use it, while collaborations like Safari and CourseSmart allow customization of existing textbook products. So what sort of collaborations would we expect to find in B2B or professional publishing which would provide the quarries from which solutions could be mined? They are few and far between, but, with real appreciation for the knowledge of Bastiaan Deblieck at TenForce in Belgium, I can tell you that they are coming.
Let’s first of all consider Factual Inc (www.factual.com). Here are impeccable credentials (Gil Elbaz, the founder, started Applied Semantics and worked at Google) and a VC-backed attempt to corner big datasets, apply linkage and develop APIs for individual applications. The target is the legion of mash-up developers and the technical departments of small and medium-sized players. Here is what they say about their data:
“Our data includes comprehensive Global Places data, with over 60MM entities in 50 countries, as well as deep dives in verticals such as U.S. Restaurants and U.S. Healthcare Providers. We are continually improving and adding to our data; feel free to explore and sign up to get started!
Factual aggregates data from many sources including partners, user community, and the web, and applies a sophisticated machine-learning technology stack to:
- Extract both unstructured and structured data from millions of sources
- Clean, standardize, and canonicalize the data
- Merge, de-dupe, and map entities across multiple sources.
We encourage our partners to provide edits and contributions back to the data ecosystem as a form of currency to reduce the overall transaction costs via exchange.”
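To make the three steps in that description a little more concrete, here is a minimal Python sketch of the "clean, standardize, merge and de-dupe" idea applied to place records from two sources. This is not Factual's technology, which they describe as machine-learning based; the record fields (`name`, `postcode`, `phone`), the sample data and the crude matching rule are all my own assumptions, chosen only to show the shape of the problem.

```python
import re
from collections import defaultdict


def canonical_key(record):
    """Build a crude match key: normalized name plus normalized postcode.

    Hypothetical rule for illustration only; real entity matching is
    far more sophisticated than exact comparison of cleaned strings.
    """
    name = record["name"].lower()
    name = re.sub(r"[^a-z0-9 ]", "", name)            # drop punctuation
    name = re.sub(r"\b(inc|ltd|llc|co)\b", "", name)  # drop common company suffixes
    name = " ".join(name.split())                     # collapse whitespace
    postcode = record.get("postcode", "").replace(" ", "").upper()
    return (name, postcode)


def merge_sources(*sources):
    """Group records that share a canonical key and merge their fields."""
    groups = defaultdict(list)
    for source in sources:
        for record in source:
            groups[canonical_key(record)].append(record)

    merged = []
    for records in groups.values():
        entity = {}
        for record in records:
            for field, value in record.items():
                entity.setdefault(field, value)  # first value seen for a field wins
        entity["source_count"] = len(records)    # how many source records were merged
        merged.append(entity)
    return merged


# Invented sample data standing in for two partner feeds.
source_a = [{"name": "Joe's Diner, Inc.", "postcode": "90210"}]
source_b = [
    {"name": "JOES DINER", "postcode": "90210", "phone": "555-0100"},
    {"name": "Blue Cafe", "postcode": "10001"},
]

merged = merge_sources(source_a, source_b)
```

With these sample feeds, the two spellings of the diner collapse into one entity that carries the phone number found in only one source, while the cafe stays separate. The point for a niche publisher is that the value lies in the linkage, not in owning either feed.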
As mobile devices proliferate, this quarry is for the App trade, and here is, in the opinion of Forbes (19 April 2012), another Google in potential in the field of business intelligence (http://www.forbes.com/sites/danwoods/2012/04/19/how-factual-is-building-an-data-stack-for-business/2/).
But Los Angeles is not the only place where this thinking is maturing. Over in Iceland, now that the banking has gone, they are getting serious about data. DataMarket (http://datamarket.com), led by Hjalmar Gislason from a background of startups and developing new media for the telco in Iceland, offers a very competitive deal, also replete with API services and revenue sharing with re-users. Here is what they say about their data:
“DataMarket’s unique data portal – DataMarket.com – provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The portal allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.
DataMarket’s data publishing solutions allow data providers to easily publish their data on DataMarket.com and on their existing websites through embedded content and branded versions of DataMarket’s systems, enabling all the functionality of DataMarket.com on top of their own data collections.”
And finally, in Europe we seem to take a more public interest-type view of the issues. Anyway, a certain amount of impetus seems to have come from the Open Data Foundation, a not-for-profit which also has a connection and has helped to stimulate sites like OpenCharities, OpenSpending (how does your government spend your money?), and OpenlyLocal, designed to illuminate the dark corners of UK local and regional government. All of these sites have free data, available under a Creative Commons-style licence, but perhaps the most interesting, still in beta, is OpenCorporates. Claiming to have data on 42,165,863 companies (as of today) from 52 different jurisdictions, it is owned by Chrinon Ltd, and run by Chris Taggart and Rob McKinnon, both of whom have long records in the open data field. This will be another site where the API service (as well as a Google Refine service) will earn the value-add revenues (http://api.opencorporates.com/). Much of the data is in XML, and this could form a vital source for some user- and publisher-generated value-add services. The site bears a recommendation from the EC Information Society Commissioner, Neelie Kroes, so we should also record that TenForce (http://www.tenforce.com/) themselves are leading players in the creation of the Commission’s major Open Data Portal, which will progressively turn all that “grey literature”, the dandruff of bureaucracy, back into applicable information held as data.
We seem here to be at the start of a new movement, with a new range of intermediaries coming into existence to broker our content to third parties, and to enable us to get the licences and services we need to complete our own service developments. Of course, today we are describing start-ups: tomorrow we shall be wondering how we provided services and solutions without them.