We have had points of inflection and points of sustainability. Time, then, to propose a new “point”, one which applies universally throughout the world of information services and solutions, but which I found dramatically illustrated last week in the world of STM. Just as in the early years of the networked world we observed a point of disintermediation, at which the network effect removed real-world service entities and made processes cheaper and quicker, so we can now see places where re-intermediation is required, introducing a new service layer to streamline the network’s own inefficiencies, moving cost to a different place, but still often reducing it while increasing network efficiencies. And once this re-layering has taken place and the point of utility is satisfied, an opportunity is created for major increases in the collection of neutral data about how the network processes in question work, and from this to derive still greater efficiencies.

The ideal illustration of this is Mendeley (www.mendeley.com), and I listened with great appreciation last week when its co-founder, Jan Reichelt, described the problem which he and his colleagues had set out to solve. The difficulty of describing PDFs containing scholarly articles arose from the nature of network access and storage. Academic buyers knew what they had bought when they acquired it, but since the PDF envelope had no external description of what it contained, it was hard to cross-search one’s collected downloads or manage a research project’s collection. And publishers, being publishers, adopted different practices regarding the use of metadata, or even numbering systems. And as sources of grey literature, files of evidence, or collections of abstracts became involved, local content access and control became a major overhead. Jan and his colleagues knew – only a few years ago they were researchers themselves.
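
To make the problem concrete, here is a minimal sketch of the kind of repair work involved – not Mendeley’s actual pipeline, which Jan did not describe in code, but an illustration assuming the open-source pypdf library and the public Crossref REST API: scan a bare PDF for a DOI, then fetch the bibliographic description the file itself never carried.

```python
import re
import requests
from pypdf import PdfReader

# Pattern that matches most Crossref-registered DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def describe_pdf(path: str) -> dict:
    """Recover a bibliographic description for a 'bare' scholarly PDF."""
    # The PDF envelope tells us nothing, so scrape the first page for a DOI...
    first_page = PdfReader(path).pages[0].extract_text() or ""
    match = DOI_PATTERN.search(first_page)
    if match is None:
        # Grey literature, evidence files and the like often have no DOI at
        # all -- exactly the local content-control overhead described above.
        return {"path": path, "doi": None}
    doi = match.group(0).rstrip(".,;")
    # ...and ask Crossref for the metadata the download never came with.
    record = requests.get(f"https://api.crossref.org/works/{doi}",
                          timeout=10).json()["message"]
    return {
        "path": path,
        "doi": doi,
        "title": (record.get("title") or [""])[0],
        "authors": [a.get("family", "") for a in record.get("author", [])],
    }
```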

So Mendeley was launched as an environment to regularize this, and to ensure that academics are better able to exploit the acquisitions that they have made. As a result, those primary drivers of network efficiency can be accomplished – saving money, making better decisions, and ensuring cost-effective compliance. This Point of Utility exploitation then has certain network knock-on effects. The service, like Mendeley, becomes an important part of the navigation of end users, and indeed may become part of, or the base for, the user access dashboard. Once the Point of Utility has become an interface, it is able to pick up all sorts of feedback data from the way end users act through that interface. This data about workflow will indicate usage and popularity, the common processes that users employ in discovery, the way in which resources in the system relate to each other, and the subjects that researchers really search (as distinct from the disciplines that journal editors think they subscribe to). Once this activity gets underway, the new interface owner can begin to suggest workflow improvements, and resell across the market the high-value data which derives from the actual patterns of usage. There is a Point of Utility in every network environment, and Mendeley, through their knowledge of researcher proclivities, have camped on one of these exciting fault-lines in STM.
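
By way of a toy illustration – the libraries below are invented, and this is emphatically not Mendeley’s analytics – here is one such feedback signal: articles that keep appearing together in personal collections are related in practice, whatever their journals’ subject headings say.

```python
from collections import Counter
from itertools import combinations

# Hypothetical user libraries: user id -> the article DOIs saved by that user.
libraries = {
    "u1": {"10.1/a", "10.1/b", "10.1/c"},
    "u2": {"10.1/a", "10.1/b"},
    "u3": {"10.1/b", "10.1/c"},
}

# Count how often each pair of articles is saved together.
co_occurrence = Counter()
for articles in libraries.values():
    for pair in combinations(sorted(articles), 2):
        co_occurrence[pair] += 1

# Relatedness derived from actual usage, not from the disciplines
# journal editors think their readers subscribe to.
print(co_occurrence.most_common(2))
```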

Some of these opportunities arise from publisher activity – lack of collaboration, lack of standardization, lack of knowledge about what happens to articles post-sale – and some from the same features in the user community. This is not a blame game. Mendeley has taken the initiative and must be welcomed as one of the foremost workflow players in the sector, especially since the launch of the Mendeley Institutional Edition last month, which takes the intermediary role into the academic library, in conjunction with and powered by Swets, who have quickly grasped the point. This, as well as exposing the Mendeley API (http://www.mendeley.com/blog/design-research-tools/winners-of-the-first-binary-battle-apps-for-science-contest/), will turn fast growth into a torrent: Mendeley already have backlogs despite having captured 160 million references for 1.5 million users. Some publishers (Springer’s short article previews) clearly get it – others, as ever in this sector, plainly adopt a policy of “this too will pass”.

But of course it will not. Far from returning to normal, the science knowledge market is in a ferment. Visit Dryad (http://datadryad.org/) and go to Giga Science (http://www.gigasciencejournal.com/) and observe the impact of Big Data on this sector. My friend Mark Ware, in an excellent note for Outsell (https://clients.outsellinc.com/insights/?p=11693), has given chapter, verse and analysis. Content control in the scientist’s workflow is becoming a multiple-media nightmare. Does the metadata accurately describe the contents of video diaries and observations, or their audio equivalents? Can we source the data behind that report, and do our own analysis? How many unpublished, unreported studies have validated these results? What has been said, and at which conferences, about how far down the track this research team has gone? Where do we find the right mix of experience to staff this next enquiry? Regardless of its peer-reviewed status, who actually used this work – and, if they did not use it, what did they rely upon instead? Mendeley is a promising beginning, but there is a long road ahead. Stefan Glanzer (Last.fm) and Alejandro Zubillaga (lately head of Warner Digital Music – where he must have seen parallel problems) put in the seedcorn and should be congratulated. They have a real start-up (my spirits rise when I visit an office that has its bike racks and its table football in the foyer!) with the wind behind it.

One last check on where that wind is blowing. Visit ResearchGate (www.researchgate.net) and look at the ways in which scientists are beginning to indulge in meaningful social networking. I have been told for 25 years that scientists and academics are too competitive to use social networking. Like much of the received wisdom of the pre-networked society, this is at best a half-truth. The whole truth is that there are no longer simple generalizations that hold true about researcher behaviour. That is why they flock so quickly to the Point of Utility.

“Keep it Simple, Stupid” – the KISS principle – was a maxim I brought home from the first management course I ever attended, yet it has taken me years to find out what it really means. There are, clearly, few things more complex than simplicity, and one man’s “Simple” is another man’s Higgs Boson. So I was very energised to have a call last week from an information industry original who has been offering taxonomy and classification services to the information marketplace since 1983. When I first met Ross Leher in the late 1980s, we were both wondering how far we would have to go into the 1990s before information providers recognized that they needed high-quality metadata to make their content discoverable in a networked world. Ross had sold his camera shop to take the long bet on this, and he worked at his new cause with a near-religious persuasion, as I realised when I went to see him in the 1990s at his base in Denver, Colorado. Denver at that time was home to IHS, whose key product involved researching regulatory material from a morass of US government grey literature. Denver people did metadata. It was a revolution waiting to happen.

So when I heard his voice on the phone last week, my first emotion was relief – that he had not simply given up and retired to Florida – and then agreement. Yes, we were 15 years too early. And many of the people we thought were primary customers – the Yellow Pages companies, the phone books, the industrial directories – are now either dead or dying, or in the trauma of complete technological makeover. Ross’s company, WAND Inc (www.wandinc.com), is now very widely acknowledged as a market-leading player in horizontal and multilingual taxonomy and classification development. They are the player you go to if you have to classify content, if you are in a cross-over area between disciplines (he has a great case study around taxonomies for medical image libraries), or if you have real language problems (“make this search work just as effectively in Japanese and Spanish”). What they do is really simple.

Your taxonomy requirement is going to start with broad terms that define your content and its area of activity. These can then be narrowed and specified to give additional granularity in any particular field. These classifications can be incorporated into the WAND Preferred Term Code, given a number, and used in a programmatic, automated way to classify and mark up your content (www.datafacet.com). Preferred terms can be matched to synonyms, and the codes can be used to extend the process to very many different languages. So a company described in Spanish, for example, can be found in the same list as a Japanese outfit, as the result of a search made by a Chinese user working in Chinese.
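
A minimal sketch of that mechanism – with entirely hypothetical codes, labels and synonyms, not WAND’s actual data – might look like this: the code is language-neutral, and labels and synonyms in each language all resolve to it.

```python
# One language-neutral preferred-term code with per-language labels and synonyms.
PREFERRED_TERMS = {
    1042: {  # hypothetical code; all labels and synonyms are illustrative
        "labels": {"en": "camera shop", "es": "tienda de cámaras",
                   "ja": "カメラ店", "zh": "照相机商店"},
        "synonyms": {"en": ["photo store", "camera store"]},
    },
}

def resolve(query: str, lang: str) -> int | None:
    """Map a free-text query in any supported language to a term code."""
    q = query.strip().lower()
    for code, term in PREFERRED_TERMS.items():
        label = term["labels"].get(lang, "")
        synonyms = term["synonyms"].get(lang, [])
        if q == label.lower() or q in (s.lower() for s in synonyms):
            return code
    return None

# A Chinese query and a Spanish query resolve to the same code, so both
# retrieve the same classified records.
assert resolve("照相机商店", "zh") == resolve("tienda de cámaras", "es") == 1042
```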

And from synonyms we can extend the process to extended terms themselves, and then map the WAND system to third-party schemes – think of UNSPSC, Harmonized Codes or NAICS, as well as those superficial and now dwindling Yellow Pages classifications. WAND can isolate and list attributes for a term, and can then add brand information. All of these activities add value to commoditized data, and one would think that the newspaper industry at least would have been deep into this for 15 years. Yet few examples exist which demonstrate it – Factiva is an honourable exception.
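
Extending the same hypothetical sketch, a crosswalk to external schemes and an attribute layer might look like this (the NAICS and UNSPSC codes shown are illustrative placeholders, not a real WAND mapping):

```python
# Crosswalks from the internal term code to external classification schemes,
# plus attributes and brands that add value to otherwise commoditized listings.
CROSSWALK = {
    1042: {"naics": "443142", "unspsc": "45121500"},  # illustrative codes only
}
ATTRIBUTES = {
    1042: {"attributes": ["retail", "consumer electronics"],
           "brands": ["Nikon", "Canon"]},
}

def enrich(code: int) -> dict:
    """Bundle external codes, attributes and brands for a classified record."""
    return {"term_code": code,
            **CROSSWALK.get(code, {}),
            **ATTRIBUTES.get(code, {})}

print(enrich(1042))
# {'term_code': 1042, 'naics': '443142', 'unspsc': '45121500',
#  'attributes': ['retail', 'consumer electronics'], 'brands': ['Nikon', 'Canon']}
```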

Not the least interesting part of Ross’s account of the past few years was the interest now shown by major enterprise software and systems players in this field of activity. Reports from a variety of sources (IDC, Gartner) have highlighted the time being wasted in internal corporate search. Both Oracle and Microsoft have metadata initiatives relevant to this, and it still seems to me more likely that Big Software will see the point before the content industry itself does. With major players like Thomson Reuters (Open Calais) deeply concerned about mark-up, there are signs that an awareness of the role of taxonomy is almost in place; but as the major enterprise systems players bump and grunt competitively with the major, but much smaller, information services and solutions players, I think this is going to be one of the key competitive areas.

And there is a danger here. As we talk more and more about Big Data and analytics, we tend to forget that we cannot discard all sense of the component added value of our own information. We know that our content is becoming commoditized, but that is not improved by ignoring now-conventional ways of adding value to it. We also know that the lower and more generalized species of metadata are becoming commoditized; look, for instance, at the recent Thomson Reuters agreement with the European Commission to widen the ability of its competitors to utilize its RIC equity listing codes. This type of thing means that, as with content, we shall be forced to increase the value we add through metadata in order to maintain our hold on the metadata – and content – which we own.

And, one day, the only thing worth owning – because it is the only thing people search and it produces most of the answers that people want – will be the metadata itself. When that sort of sophisticated metadata becomes plugged into commercial workflow, and most discovery is machine-to-machine rather than person-to-machine, we shall have entered a new information age. Just let us not forget what people like Ross Leher did to get us there.
