In the cold, wet and dark of an English summer it can be hard to remember that elsewhere matters digital are progressing at breakneck speed. In these circumstances spending last week in the US and getting sundry updates from European colleagues was like a punch on the nose. Everything and everyone is getting cleverer, and everything that was once the standard and the value point is now commoditized. Even the poor old science journal, protected treasure of countless publishers, can now be launched out of a box by any university, research team or laboratory with an internet connection (http://www.scholasticahq.com/). The commoditized article, meanwhile, realizes new potential if you envisage it not as the final outcome of the science process, but as itself an interface to a deeper understanding of the knowledge pathway.

If I had doubted this then I had a rude shock when looking at what Jan Velterop and his colleagues have been up to at Utopia Docs (http://utopiadocs.com/media/introduction/). In its first manifestations this company was about getting more linkage and metadata value from science articles, and I wrote about it under the heading “I Can See So Clearly Now” (https://www.davidworlock.com/?p=903) in October 2011. Then I tucked it into the semantic publishing cubbyhole, until warned by one of Jan’s colleagues that I was in danger of not seeing very clearly now at all. And he was right. Utopia Docs 2.0 is worth consideration by anyone as an operational interface for lab research users. It is different to but just as valid as Mendeley, and indeed it incorporates information from that service as it shows users the relationships an article may have – from Altmetric as well as Mendeley, from Cross-Ref as well as Sherpa/RoMEO (the best recommendation for Open Access: use this to move from article to article, opening each one without recourse to a source site, permissions form or subscription barrier/validation). But the real joy is in the way it handles figures, graphs and chemical structures: if all of this referencing, rotation and reconstituting can be done with figures, then connecting in complex datasets should be easy as well. Add some “social” – bookmarks and comments from other readers/users – and some searchable aids like Wikipedia cross references and lab products directories, and then you can see the huge distance travelled from beta. As we all now contemplate the science desktop and the service interfaces which will dominate it, here is another real contender.
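
To make the “article as interface” idea a little more concrete, here is a minimal sketch – emphatically not Utopia Docs’ own code – of pulling linked metadata for an article from CrossRef’s public REST API. The DOI is a placeholder, and the reference list only comes back where the publisher has deposited it; Altmetric and Sherpa/RoMEO run comparable public lookups of their own.

```python
# A minimal sketch (not Utopia Docs' code) of the "article as interface" idea:
# given a DOI, ask CrossRef for the article's metadata and any deposited
# reference links. The DOI below is a placeholder.
import requests

def linked_records(doi):
    """Fetch CrossRef metadata for a DOI and list any linked references."""
    resp = requests.get("https://api.crossref.org/works/" + doi, timeout=30)
    resp.raise_for_status()
    work = resp.json()["message"]
    title = work.get("title") or ["(untitled)"]
    print("Title:", title[0])
    # The "reference" field is only present when the publisher deposited it.
    for ref in work.get("reference", []):
        print("  cites:", ref.get("DOI") or ref.get("unstructured", "?"))

if __name__ == "__main__":
    linked_records("10.1000/placeholder-doi")   # placeholder DOI
```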

It seems to me that science and technology are now moving rapidly down this knowledge handling and referencing track. Yet everything done here is fully applicable to B2B in general: just because “publishing” was different in these narrow segments, there is no reason why information services should be different at all. Looking at Innovadex (www.innovadex.com) this week, I realised that there are now very few business, industrial or scientific sectors without full service vertical search, hugely well attuned to definable client requirements, with service values attached to front and back end. We all still use Google/Bing et al, but when we have heavy duty knowledge-based work to do, there is usually a specialised can-opener to hand now ready to do the job. And these will begin to coalesce with content as the continuing consolidation of our industry takes place. Step up this week’s consolidation case study: IHS and GlobalSpec.

As one who has long carried a torch for GlobalSpec (www.globalspec.com), I want to congratulate Jeff Killeen and his team on an outstanding job, and Warburg Pincus, who have backed this company since 1996, for extraordinary foresight and resolve en route to this $136m reward. As someone who knew IHS when they had BRS Search and the biggest and most unwieldy filing cabinet in engineering history, I also want to offer recognition credits to the buyer. This really is a winning solution, in the sense that both of these services together now comprise the complete engineering workflow; that over 10 million design briefs and specifications, some 50k supplier catalogues, 70 e-newsletters, 15 online shows and 7 million registered users all provide a huge barrier to entry in this sector; that the barrier is as great for sector vendors as for sector buyers; and that none of this was based on technology unique to engineering, but on the tools and analytics that are available to all of us in every segment. And it all took 16 years to mature. And it worked because the difference between the winner and a number of losers was simple: the winners understood better than their competitors how engineers worked, how they communicated and how they solved problems and behaved.

And just time for a footnote along these themes. I was fascinated to see the merger of Yippy Inc (www.yippy.com) with MuseGlobal this week. I have known Muse for many years and admired their advocacy of Cloud-based solutions and their patient pursuit of data virtualization solutions at a time when it was only the spooks and the internal security people who were interested. Yippy, formerly Clusty, holds a licence to the Vivisimo patent (Vivisimo was recently bought by IBM, which owns 10% of the new company) for the data clustering vehicle, Velocity. So we are in Data-as-a-Service as well as SaaS country here. And here we locate the other trend line which we must watch with care. In this note we have seen user-based solutions bringing public and private content into intense analytical focus on the desktop; we have seen industrial-scale vertical search and content alignment resolve workflow issues for professionals; and here we have data solutions which enable major corporates and institutions to impose their own private order on information and intelligence regardless of source. All of these will deploy in all markets at the same time. The clever game will be second-guessing which prevail in which verticals and in what horizontals of organizational size over what time periods.

This may be the age of data, but the questions worth asking about the market viability of information service providers are no longer about content. They are about what you do to content-as-data as you seek to add value to it and turn it into some form of solution. So, in terms of Pope’s epigram, we could say that the proper study of Information Man is software. Data has never been more completely available. Admittedly, we have changed tack now on the idea that we could collect all that we need and put it into a silo and search it. Instead, in the age of big data, we prefer to take the programme to the data. Structured and unstructured. Larger collectively than anything tackled before the emergence of Google and Yahoo!, and then Facebook, and inspired by the data volumes thrown off by those services. And now we have Thomson Reuters and Reed Elsevier knee deep in the data businesses and throwing up new ways of servicing data appropriate to the professional and business information user. So shall we in future judge the strategic leadership of B2B, STM, financial services or professional information services companies by what they know about the decisions they need to make about implementing which generation of what software to have what strategic effect on their marketplaces? I hope not, since I fear that like me they may be found wanting.

And clearly having a CTO, but not knowing the right questions to ask him or what the answers mean, is not sufficient either. In order to get more firmly into this area myself I wrote a blog last month called “Big Data: Six of the Best”, in which I talked about a variety of approaches to Big Data issues. In media and information markets my first stop has always been MarkLogic, since working with them has taught me a great deal about how important the platform is, and how pulling together existing disparate services onto a common platform is often a critical first step. Anyone watching the London Olympics next month and using BBC Sport to navigate results and entries and schedules, with data, text and video, is looking at a classic MarkLogic 5 job (www.marklogic.com). But this is about scale internally, and about XML. In my six, I wanted to put alongside MarkLogic’s heavy-lifting capacities someone with a strong metadata management tradition, and a new entrant with exactly those characteristics is Pingar (www.pingar.com). Arguably, we tend to forget all the wonderful things we said about metadata a decade ago. From being the answer to all questions, it became a very expensive pursuit, with changing expectations from users and great difficulties in maintaining quality control, especially where authors created it, fudging the issue for many information companies.
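
On the “common platform” point above, here is a toy sketch of the pattern: disparate records loaded as XML documents into one store and searched through one service. Everything here is an assumption on my part – a MarkLogic REST endpoint of the /v1/documents and /v1/search kind on localhost:8011 with admin credentials – and none of it is drawn from the BBC build itself.

```python
# A toy sketch of the "common platform" pattern, not production code and not
# the BBC's build. Assumptions: a MarkLogic REST endpoint at localhost:8011
# exposing /v1/documents and /v1/search, with admin/admin digest credentials.
import requests
from requests.auth import HTTPDigestAuth

BASE = "http://localhost:8011/v1"           # assumed REST instance
AUTH = HTTPDigestAuth("admin", "admin")     # assumed credentials

def load(uri, xml):
    """Store one XML record under the given URI on the shared platform."""
    r = requests.put(BASE + "/documents", params={"uri": uri}, data=xml,
                     headers={"Content-Type": "application/xml"}, auth=AUTH)
    r.raise_for_status()

def search(term):
    """Run a simple string search across everything loaded so far."""
    r = requests.get(BASE + "/search", params={"q": term, "format": "json"},
                     auth=AUTH)
    r.raise_for_status()
    return r.json()

# Disparate feeds - results, schedules, video metadata - all land in one place.
load("/results/heat1.xml", "<result><event>100m</event><mark>9.86</mark></result>")
print(search("100m"))
```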

So Pingar, which started in New Zealand before going global, appropriately started its tools environment somewhere other than hand-crafted metadata. Using the progress made in recent years in entity extraction and pattern matching, they have created tools to manage the automatic extraction of metadata at scale and speed. Working with large groups of documents (we are talking about up to 6 terabytes – not “biggest” data, but large enough for very many of us), metadata development becomes a batch processing function. The Pingar API effectively unlocks a toolbox of metadata management solutions, from tagging and organization at the levels of consistency that we all now need, to integration of the results with enterprise content management, communications and collaboration platforms. SharePoint connectivity will be important for many users, as will the ability to output into CRM tools. Users can import their own taxonomies effectively, though over time Pingar will build facilities to allow taxonomy development from scratch.
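
Pingar’s own API is not reproduced here, but the batch pattern just described – pattern-based entity extraction turned into metadata records, document by document – can be sketched in a few lines. The folder name and the patterns below are purely illustrative.

```python
# A generic illustration (not Pingar's API) of metadata extraction as a batch
# job: walk a folder of text documents, pull candidate entities out with
# simple patterns, and write one metadata record per document.
import json
import pathlib
import re

PATTERNS = {
    # Deliberately simple, illustrative patterns only.
    "emails":   re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "years":    re.compile(r"\b(19|20)\d{2}\b"),
    "acronyms": re.compile(r"\b[A-Z]{2,6}\b"),
}

def extract_metadata(text):
    """Return a dict of entity lists found in one document."""
    return {name: sorted(set(m.group(0) for m in rx.finditer(text)))
            for name, rx in PATTERNS.items()}

def run_batch(folder):
    """Process every .txt file in a folder and emit JSON metadata records."""
    for path in pathlib.Path(folder).glob("*.txt"):
        record = {"document": path.name,
                  "entities": extract_metadata(path.read_text(errors="ignore"))}
        print(json.dumps(record, indent=2))

if __name__ == "__main__":
    run_batch("documents")   # assumed folder of plain-text documents
```

In a real deployment the patterns give way to trained extractors and the JSON records feed a content management or CRM system, but the batch shape of the work is the point.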

As members of the Pingar team talked me through this, two thoughts persisted. The first was the critical importance of metadata. Alongside Big Data, we will surely find that the fastest way to anything is searching metadata databases. They are not either/or, they are both/and. I am still stuck with the idea that however effective we make Big Data file searching, we will also need retained databases of metadata at every stage, as the sketch below suggests. And every time we need to move into some sort of ontology-based environment, the metadata and our taxonomy become critical elements in building out the system. Big Data as a fashion term must not distract us from the idea that we shall be building and extending and developing knowledge-based systems from now until infirmity (or whatever is the correct term for the condition that sparks the next great wave of software services development in 2018!).
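
As a minimal illustration of that “both/and” point – all names and records below are invented – the cheap query goes to a small metadata index first, and only the documents it points to are ever fetched from the bulk store.

```python
# A minimal sketch of the "both/and" point: keep a small metadata index
# alongside the bulk store, query the index first, and only then touch the
# full documents it points to. Names and records here are illustrative.
METADATA_INDEX = {
    "doc-001": {"tags": ["genomics", "open access"], "path": "store/doc-001.txt"},
    "doc-002": {"tags": ["materials", "catalysis"],  "path": "store/doc-002.txt"},
    "doc-003": {"tags": ["genomics", "proteomics"],  "path": "store/doc-003.txt"},
}

def find_by_tag(tag):
    """Cheap pass: answer from the metadata index without opening any file."""
    return [doc_id for doc_id, meta in METADATA_INDEX.items()
            if tag in meta["tags"]]

def fetch(doc_id):
    """Expensive pass: only now go to the bulk store for the full text."""
    with open(METADATA_INDEX[doc_id]["path"], encoding="utf-8") as fh:
        return fh.read()

print(find_by_tag("genomics"))   # ['doc-001', 'doc-003']
```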

And my other notion? If you are in New Zealand you see global markets so much more clearly. Pingar went quickly into Japanese and Chinese, in order to service major clients there, and then into Spanish, French and Italian. Cross-linguistic effort is thus critical. Marc Andreessen is credited with the saying “Software is eating the world” (which always reminds me of an early hero, William Cobbett, saying in the 1820s of rural depopulation through enclosures and grazing around the great heathland that now houses London’s greatest and slowest airport: “Here sheep do eat men”). I am coming to believe that Andreessen is right, and that Pingar is very representative of the best of what we should expect in our future diet.
