Now that we are entering the post-competitive world (with a few exceptions!), it is worth pausing for a moment to consider how we are going to get all of the content together and create the sources of linked data which we shall need to fuel the service demand for data mining and data extraction. Of course, this is less of a problem if you are Thomson Reuters or Reed Elsevier. Many of the sources are relationships that you have had for a long time. Others can be acquired: reflect on the work put in by Complinet to source the regulatory framework for financial services prior to its acquisition by Thomson Reuters, and reflect that relatively little of this data is “owned” by the service provider. Then you can create expertise and scale in content sourcing, negotiating with government and agency sources, and forming third party partnerships (as Lexis Risk Management did with Experian in the US). But what if you lack these resources and find that source development and licensing would create unacceptable costs, but still feel under pressure to create solutions in your niche which will reflect a very much wider data trawl than could be accomplished using your own proprietary content?

The answer to this will, perhaps, reflect developments already happening in the education sector. Services like Global Grid for Learning, or the TES Connect Resources which I have described in previous blogs, give users and third party service developers (typically teachers’ centres or other “new Publishers”) the ability to find quality content and re-use it, while collaborations like Safari and CourseSmart allow customization of existing textbook products. So what sort of collaborations would we expect to find in B2B or professional publishing which would provide the quarries from which solutions could be mined? They are few and far between, but, with real appreciation for the knowledge of Bastiaan Deblieck at TenForce in Belgium, I can tell you that they are coming.

Let’s first of all consider Factual Inc (www.factual.com). Here are impeccable credentials (Gil Elbaz, the founder, started Applied Semantics and worked at Google) and a VC-backed attempt to corner big datasets, apply linkage and develop APIs for individual applications. The target is the legion of mash-up developers and the technical departments of small and medium-sized players. Here is what they say about their data:

“Our data includes comprehensive Global Places data, with over 60MM entities in 50 countries, as well as deep dives in verticals such as U.S. Restaurants and U.S. Healthcare Providers. We are continually improving and adding to our data; feel free to explore and sign up to get started!

Factual aggregates data from many sources including partners, user community, and the web, and applies a sophisticated machine-learning technology stack to:

  1. Extract both unstructured and structured data from millions of sources
  2. Clean, standardize, and canonicalize the data
  3. Merge, de-dupe, and map entities across multiple sources.

We encourage our partners to provide edits and contributions back to the data ecosystem as a form of currency to reduce the overall transaction costs via exchange.”
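To make the three steps in that description a little more concrete, here is a minimal sketch in Python of what "clean, canonicalize, merge and de-dupe" can look like on a toy scale. It is emphatically not Factual's pipeline, which they describe as machine-learning based; the field names (`name`, `postcode`, `phone`), the sample records and the matching rule (normalized name plus postcode) are all my own assumptions for illustration.

```python
import re
from collections import defaultdict


def canonicalize(record):
    """Build a crude matching key: normalized name plus postcode.

    Assumption: two records describe the same entity when these agree.
    Real entity resolution would use far richer signals.
    """
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = re.sub(r"\s+", " ", name).strip()
    postcode = record.get("postcode", "").replace(" ", "").upper()
    return name, postcode


def merge_sources(*sources):
    """Merge records from several sources, de-duplicating on the key.

    For each field, the first non-empty value seen wins.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            key = canonicalize(record)
            for field, value in record.items():
                if value and field not in merged[key]:
                    merged[key][field] = value
    return list(merged.values())


# Hypothetical sample data: the same diner seen by two different sources.
partner_feed = [{"name": "Joe's Diner", "postcode": "90210", "phone": ""}]
web_crawl = [{"name": "JOES  DINER", "postcode": "90210", "phone": "555-0100"}]

entities = merge_sources(partner_feed, web_crawl)
print(entities)
```

The two inputs collapse into a single entity, with the phone number missing from the partner feed filled in from the web crawl. The hard commercial problem is doing this reliably across millions of sources, which is exactly where an intermediary earns its keep.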

As mobile devices proliferate, this quarry is for the App trade, and here is, in the opinion of Forbes (19 April 2012), another Google in potential in the field of business intelligence (http://www.forbes.com/sites/danwoods/2012/04/19/how-factual-is-building-an-data-stack-for-business/2/).

But Los Angeles is not the only place where this thinking is maturing. Over in Iceland, now that the banking has gone, they are getting serious about data. DataMarket (http://datamarket.com), led by Hjalmar Gislason from a background of startups and developing new media for the telco in Iceland, offers a very competitive deal, also replete with API services and revenue sharing with re-users. Here is what they say about their data:

“DataMarket’s unique data portal – DataMarket.com – provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The portal allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

DataMarket’s data publishing solutions allow data providers to easily publish their data on DataMarket.com and on their existing websites through embedded content and branded versions of DataMarket’s systems, enabling all the functionality of DataMarket.com on top of their own data collections.”

And finally, in Europe we seem to take a more public interest-type view of the issues. Anyway, a certain amount of impetus seems to have come from the Open Data Foundation, a not-for-profit which also has a connection and has helped to stimulate sites like OpenCharities, OpenSpending (how does your government spend your money?), and OpenlyLocal, designed to illuminate the dark corners of UK local and regional government. All of these sites have free data, available under a Creative Commons-style licence, but perhaps the most interesting, still in beta, is OpenCorporates. Claiming to have data on 42,165,863 companies (as of today) from 52 different jurisdictions, it is owned by Chrinon Ltd, and run by Chris Taggart and Rob McKinnon, both of whom have long records in the open data field. This will be another site where the API service (as well as a Google Refine service) will earn the value-add revenues (http://api.opencorporates.com/). Much of the data is in XML, and this could form a vital source for some user and publisher generated value-add services. The site bears a recommendation from the EC Information Society Commissioner, Neelie Kroes, so we should also record that TenForce (http://www.tenforce.com/) themselves are leading players in the creation of the Commission’s major Open Data Portal, which will progressively turn all that “grey literature”, the dandruff of bureaucracy, back into applicable information held as data.

We seem here to be at the start of a new movement, with a new range of intermediaries coming into existence to broker our content to third parties, and to enable us to get the licences and services we need to complete our own service developments. Of course, today we are describing start-ups: tomorrow we shall be wondering how we provided services and solutions without them.

 

The 14th Fiesole Retreat for academic librarians, publishers and researchers continues to provide an accurate gauge of the direction and rate of change. To look down over Florence from the European University Institute is to be reminded that renaissance and reformation come to all who wait, only in the digital world they come quicker. So the conference agenda had librarians morphing into anything but librarians, publishing defending the indefensible, and scholarship apparently rooted in the minds of both as pursuing a very narrow track of priorities and activities. While a new world was clearly waiting in the wings, we were all reluctant to signal the Last Post. And that’s just the problem with these civilized events – they are so civilized!

Bruno Racine, French cultural politician and Director of the Bibliothèque nationale de France, set the style from the kick-off. In an untroubled world, his great priorities, alongside building greater audio collections and newspaper archives, were developing the great French Gallica collection and furthering the cause of Europeana. Like a bandsman on the Titanic, so much in our media minds this week, this sounded like an invocation to keep playing. Fortunately the untroubled water was soon disturbed by Carol Tenopir, quiet revolutionary of many years’ standing, who started to throw some hand grenade facts into the water. Did we know how completely the scholars had deserted the library? Well, we do now. In a world where between 78 and 88 per cent of articles read are read digitally, 62% of those readings are in the laboratory, 26% at home and 10% while travelling. Only 2% are conducted on library premises. As each subsequent librarian presentation began with a picture of ever newer and more lavishly appointed buildings, one deep psychological gap yawned open. The scholars have gone nomadic, but the services that support them are rooted in expensive real estate.

But not always. In a brilliant demonstration of how lateral thinking is not confined to certain roles or age groups, Sylvia Van Peteghem, Chief Librarian at the University of Ghent, talked about relocating her library, or rather her users’ MyLibrary, in the Cloud. She underlined the importance of the Amazon announcement on CloudSearch (I have a fantasy of my grandchildren saying that I am so old that I could remember when Amazon was a bookseller!). She spoke of what her team had learned through the Los Alamos SharedCanvas experimentation, and she emphasized many times the collaborative nature of the whole enterprise. In fact, when her conclusions emerged as “provide detailed metadata for free; publish for machines; create stable and durable links and URLs”, I knew that I was listening to a publishing presentation after all. She said that when she first saw what Google could do “I became a Humble Librarian”. I find that very affecting, but not wholly true. I suspect that she found at that moment that the professional divisions of the real world had fallen away, and it was perfectly permissible for anyone to do anything now. Clear the way, we need to find this lady a place in the Titanic lifeboats right now!

But in some ways Sylvia’s theme had already been established. Deanna Marcum, now running Ithaka S+R after her years running the Library of Congress, got us thinking about Knowledge Navigators, and the importance of capturing the art and lore of collections specialists before it was lost. Mike Sweet of Credo had reminded the preconference that there is nothing wrong with discovery services that cannot be fixed in the reference layer, and Alix Vance of GeoScience World alongside Fiona Murphy of Wiley illustrated the collaborative nature of niche content provision. But it was one of the questions that triggered a key idea: do we brand library services successfully? Then I knew that a prognostication of the first Fiesole meeting that I attended 12 years ago was coming true: librarians were becoming publishers, but what on earth would publishers become?

If the presentations of Blaise Simqu and Stephen Barr, respectively CEO and International President of Sage, were anything to go by, then the answer would be “Really Nice People”. And responsible executives moving along the track of providing users with what they apparently want – several Open Access options, plenty of scope in pricing models to deal with individual or small scale users, quality peer review, and grateful authors willing to be interviewed on video expounding the importance of having risk capital available to support new journals. No trace here then of the facile commentary in last week’s Economist on journal publishing margins (for which that worthy journal should be deeply ashamed). Or of the price-gouging, excessive profitability commentary which has marked comment on this sector this year. Sara Miller McCune sold her air-con unit for $500 in 1965 to found Sage, and has left the company in trust to three charities. The problem is not here at all. It lies in the formats to which companies like Sage have become subject (journal, article etc) and the necessity to keep the present business model going until a new one can be put in its place. And while we all pay obeisance to the primacy of the research article, do we not sometimes fear its commoditization? What happens when Mendeley or ReadCube become the interfaces of choice – less full text reading, better current awareness, more visualization? And a powerful diminution of quality control exercised by peer review as the only indicative guideline to quality itself? We are on the very thin edge of a very long wedge.

But publishing is relatively easy to do and offers low barriers to entry. Later on in the agenda Svante Kristensson, Director of Sweden’s Boras University library, showed what a creative publisher can do online with collections that demand the full scope of digital resources – the Swedish School of Textiles. And Gino Roncaglia of Tuscia University demonstrated how layering enables more productive scholarly eBooks – and eLibraries. As we came to a giddy end I reflected that the challenge of the linked data world has not yet fully sunk in – but that a good number of librarians are as close to identifying the range of user expectations in the network as their publishing colleagues are. But for the researchers who in the last year have spoken to me, and doubtless to many others, of their need to discover quickly sources of unpublished articles which confirm experimental results, or to find and use data underlying published experiments, or to obtain lab videos on procedures, or to get updates on compliance and best practice procedures, I have no answers. The problem is, as Carol Tenopir reveals, that they are all at home or in the lab researching.

The Fiesole Retreats, which only meet in Fiesole every few years, are wonderful wherever they meet and always cast light into the gloom in the way that small meetings usually do. The Charleston Company and Casalini Libri started all this: to them goes the honour of a lifeboat all to themselves.
