May 20
The Proper Study of Information Man
This may be the age of data, but the questions worth asking about the market viability of information service providers are no longer about content. They are about what you do to content-as-data as you seek to add value to it and turn it into some form of solution. So, in terms of Pope’s epigram, we could say that the proper study of Information Man is software. Data has never been more completely available. Admittedly, we have changed tack on the idea that we could collect all that we need, put it into a silo and search it. Instead, in the age of big data, we prefer to take the programme to the data: structured and unstructured, collectively larger than anything tackled before the emergence of Google and Yahoo!, and then Facebook, and inspired by the data volumes those services throw off. And now we have Thomson Reuters and Reed Elsevier knee-deep in the data businesses, throwing up new ways of servicing data appropriate to the professional and business information user. So shall we in future judge the strategic leadership of B2B, STM, financial services or professional information companies by how well they understand which generation of which software to implement, and to what strategic effect on their marketplaces? I hope not, since I fear that, like me, they may be found wanting.
And clearly having a CTO, but not knowing the right questions to ask him or what the answers mean, is not sufficient either. To get more firmly into this area myself, I wrote a blog last month called “Big Data: Six of the Best”, in which I talked about a variety of approaches to Big Data issues. In media and information markets my first stop has always been MarkLogic, since working with them has taught me a great deal about how important the platform is, and how pulling together existing disparate services onto a common platform is often a critical first step. Anyone watching the London Olympics next month and using BBC Sport to navigate results, entries and schedules, with data, text and video, is looking at a classic MarkLogic 5 job (www.marklogic.com). But this is about scale internally, and about XML. In my six, I wanted to put alongside MarkLogic’s heavy-lifting capacities someone with a strong metadata management tradition, and a new entrant with exactly those characteristics is Pingar (www.pingar.com). Arguably, we tend to forget all the wonderful things we said about metadata a decade ago. From being the answer to all questions, it became a very expensive pursuit, with changing expectations from users and great difficulties in maintaining quality control, especially where authors created it, fudging the issue for many information companies.
So Pingar, who started in New Zealand before going global, appropriately started its tools environment somewhere else. Using the progress made in recent years in entity extraction and pattern matching, they have created tools to manage the automatic extraction of metadata at scale and speed. Working with large groups of documents (we are talking about up to 6 terabytes – not “biggest” data, but large enough for very many of us), metadata development becomes a batch-processing function. The Pingar API effectively unlocks a toolbox of metadata management solutions, from tagging and organization at the levels of consistency that we all now need, to integration of the results with enterprise content management, communications and collaboration platforms. SharePoint connectivity will be important for many users, as will the ability to output into CRM tools. Users can import their own taxonomies, and over time Pingar will build facilities to allow taxonomy development from scratch.
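To make that workflow concrete, here is a minimal sketch of what batch metadata extraction of this kind might look like. To be clear, the endpoint, key and response fields are invented for illustration; this is not Pingar’s actual API, just the shape of the idea.

```python
import requests

# Illustrative only: the endpoint, key and response fields below are
# invented assumptions, not Pingar's actual API.
API_URL = "https://api.example.com/v1/extract"
API_KEY = "your-api-key"

def extract_metadata(text):
    """Send one document to an entity-extraction service and return
    its suggested metadata (entities, key phrases)."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"entities": [...], "keyphrases": [...]}

def tag_collection(documents):
    """Batch-process a whole collection: metadata generation becomes
    a pipeline step rather than a manual authoring task."""
    return {doc_id: extract_metadata(text)
            for doc_id, text in documents.items()}
```

The point of the batch function is the one made above: once extraction is automatic, consistency is a property of the pipeline, not of individual authors.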
As members of the Pingar team talked me through this, two thoughts persisted. The first was the critical importance of metadata. Alongside Big Data, we will surely find that the fastest way to anything is searching metadata databases. They are not either/or; they are both/and. I am still stuck with the idea that, however effective we make Big Data file searching, we will also need retained databases of metadata at every stage. And every time we need to move into some sort of ontology-based environment, the metadata and our taxonomy become critical elements in building out the system. Big Data as a fashion term must not distract us from the fact that we shall be building and extending and developing knowledge-based systems from now until infirmity (or whatever is the correct term for the condition that sparks the next great wave of software services development in 2018!).
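A toy illustration of that both/and point, with all data invented: a retained metadata index answers most queries with a single lookup, while the full text remains available for the queries the metadata did not anticipate.

```python
from collections import defaultdict

# Invented data throughout.
corpus = {
    "doc-1": "Quarterly results across the pharmaceutical sector...",
    "doc-2": "New entity-extraction tooling for financial filings...",
}

# Retained metadata database: tag -> document ids, built once by the
# kind of batch extraction sketched above.
metadata_index = defaultdict(set)
metadata_index["pharma"].add("doc-1")
metadata_index["finance"].add("doc-2")

def search(tag, fallback_term):
    hits = metadata_index.get(tag)  # fast path: one metadata lookup
    if hits:
        return sorted(hits)
    # slow path: scan the full text for what the metadata missed
    return sorted(d for d, text in corpus.items()
                  if fallback_term in text.lower())

print(search("pharma", "pharmaceutical"))  # -> ['doc-1']
```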
And my other notion? If you are in New Zealand you see global markets so much more clearly. Pingar went quickly into Japanese and Chinese, in order to service major clients there, and then into Spanish, French and Italian. Cross-linguistic effort is thus critical. Marc Andreessen is credited with the saying “Software is eating the world” (which always reminds me of an early hero, William Cobbett, saying in the 1820s of rural depopulation through enclosures and grazing around the great heathland that now houses London’s greatest and slowest airport: “Here sheep do eat men”). I am coming to believe that Andreessen is right, and that Pingar is very representative of the best of what we should expect in our future diet.
Apr 27
Open Up Your APIs!
In this industry, five years is enough to benchmark fundamental change. This week I have been at the 9th Publishers’ Forum, organized as always by Klopotek, in Berlin. This has become, for me, a must-attend event, largely because while the German information industry is one of the largest in Europe, German players have been marked by a conservative attitude to change, and a cautious approach to what their US and UK colleagues would now call the business model laws of the networked information economy. At some level this connects to a deep German cultural love affair with the book as an object, and how could that not be so in the land that produced Gutenberg? On another level, it demonstrates that German business needs an overwhelming business case to justify change, and that it takes time for these proofs to become available. Which is not to say that German businesses in this sector have not been inventive. An excellent two-part case study run jointly by Klopotek and de Gruyter was typical: de Gruyter are the most transformed player in the STM sector because they have seized upon distribution in the network and selling global access as a fast growth path, and Klopotek were able to supply the necessary eCommerce and back office attributes to make this ambition feasible. And above all, in a room of more than 300 newspaper, magazine and book executives, we were at last able to fully exploit the language and practice of the network in information handling terms. This dialogue would have been impossible in Germany five years ago. A huge attitudinal change has taken place. Now we can deploy our APIs and allow users to get the value and richness of our content, contextualised to their needs, instead of covering them with the stuff and hoping they get something they want.
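As a sketch of what “deploying our APIs” might mean at its simplest, here is a toy content endpoint in Python (using Flask) where the caller, not the publisher, supplies the context. The article store and field names are invented for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy content store; in practice this sits on a content platform.
ARTICLES = [
    {"id": 1, "title": "Open Access in STM", "topics": ["stm", "licensing"]},
    {"id": 2, "title": "eBook Workflows", "topics": ["publishing", "workflow"]},
]

@app.route("/api/articles")
def articles():
    # The caller supplies the context:
    # GET /api/articles?topic=stm returns only STM content.
    topic = request.args.get("topic")
    hits = [a for a in ARTICLES if topic is None or topic in a["topics"]]
    return jsonify({"articles": hits})

if __name__ == "__main__":
    app.run()
```

The design choice is the one in the paragraph above: the publisher exposes richness and lets the user pull what is contextual to them, rather than pushing everything and hoping.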
In some ways the Day 2 keynote from Andrew Jordan, CTO at Thomson Reuters GRU business, exemplified the extent of this. The incomparable Brian O’Leary had started us off on Day 1 in good guru-ish style by placing context in its proper role and reminding us that it is not content as such but its relationships that increasingly concern us. You could not listen to him and still believe that content was the living purpose of the industry, or that the word “publishing” had not changed meaning entirely. Michael Healy of CCC and Peter Clifton of +Strategy followed him to hammer home the new world of collaboration and licensing, and the increasing importance of metadata in identifying and describing tradeable entities. By then we were well on the way towards a recognition of new realities, and Jim Stock of MarkLogic ferried us there before dinner, using the connected content requirements of BBC Sport in an Olympic year to get us started in earnest on semantic approaches to discovery, and on our urgent need to create platform environments that allow us to use our content fluently in this context.
So the ground was well prepared for Andrew Jordan. He took us on a journey: from the acquisition of ClearForest by Reuters while Reuters was itself being acquired by Thomson; to the use of this software by the new company to create OpenCalais, allowing third parties (over 60 of them) to get into entity extraction (events and facts, essentially) and then into the creation of complex cross-referencing environments; and finally to the use of this technology by Thomson Reuters themselves in the OneCalais and ContentMarketplace environments. So here was living proof of the O’Leary thesis, on a vast scale: building business-orientated ontologies, and employing social tagging in a business context. Dragging together the whole data assets of a huge player to service the next customer set or market gap. And no longer feeling obliged to wrap all of this in a single-instance database, but searching across separately-held corporate datasets in a federated manner, using metadata to find and cross-reference entities or perform disambiguation mapping. Daniel Mayer of Temis was able to drive this further and provide a wide range and scale of cases from a technology provider of note. The case was made: whether what we are now doing is publishing or not, it is fundamentally changed once we realize that what we know about what we know is as important as our underlying knowledge itself.
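A minimal sketch of that federated, metadata-driven pattern, with invented entity identifiers and datasets throughout: each source is held and queried separately, and a shared disambiguation mapping is what allows results to be cross-referenced.

```python
# Invented identifiers and datasets throughout; this illustrates the
# pattern, not any Thomson Reuters system.
news_dataset = {"E:acme-corp": ["Acme beats Q2 forecast"]}
filings_dataset = {"E:acme-corp": ["Acme 10-K filing, 2011"]}

# Disambiguation mapping: surface forms -> one canonical entity id.
entity_map = {"acme": "E:acme-corp", "acme corp": "E:acme-corp"}

def federated_search(surface_form, datasets):
    """Resolve a name to an entity id, then query each separately-held
    dataset in place; no single-instance database is ever built."""
    entity_id = entity_map.get(surface_form.lower())
    if entity_id is None:
        return []
    results = []
    for dataset in datasets:
        results.extend(dataset.get(entity_id, []))
    return results

print(federated_search("Acme", [news_dataset, filings_dataset]))
# -> ['Acme beats Q2 forecast', 'Acme 10-K filing, 2011']
```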
And of course we also have to adjust our business models and our businesses to these new realities – patient Klopotek have been exercising expertise in enabling that systems re-orientation to take place for many years. And we must recognize that we have not arrived somewhere, but that we are now in perpetual trajectory. One got a real sense of this from an excellent presentation to a very crowded room by Professor Tim Bruysten of richtwert on the impact of social media, and, in another way, from Mike Tamblyn of Kobo when he spoke of the problems of vertical integration in digital media markets. And, in a blog earlier this week, I have already reported on the very considerable impact of Bastiaan Deplieck of Tenforce.
Speaking personally, I have never before attended a conference of this impact in Germany. Mix everything in the cocktail shaker of Frank Gehry’s great Axica conference centre alongside the Brandenburg Gate, with traditional book publishers rubbing shoulders with major information players and chatting to software gurus, industry savants, newspaper and magazine companies, enterprise software giants and business service providers, and you create a powerful brew in a small group. Put them through separate German and English streams, then mix them up in Executive Lounge seminars and discussion Summits, and the inventive organizers give everyone a chance to speak and to talk back. This meeting had real energy and, for those who look for it, an indication that the changes wrought by the networked economy, and its needs in information/publishing terms, now burn brightly in the heart of Europe.