The Proper Study of Information Man

Filed Under B2B, Big Data, Blog, Financial services, healthcare, Industry Analysis, internet, news media, Publishing, Reed Elsevier, Search, semantic web, STM, Thomson, Uncategorized, Workflow

This may be the age of data, but the questions worth asking about the market viability of information service providers are no longer about content. They are about what you do to content-as-data as you seek to add value to it and turn it into some form of solution. So, in terms of Pope’s epigram, we could say that the proper study of Information Man is software. Data has never been more completely available. Admittedly, we have changed tack now on the idea that we could collect all that we need and put it into a silo and search it. Instead, in the age of big data, we prefer to take the programme to the data. Structured and unstructured. Larger collectively than anything tackled before the emergence of Google and Yahoo!, and then Facebook, and inspired by the data volumes thrown off by those services. And now we have Thomson Reuters and Reed Elsevier knee deep in the data businesses and throwing up new ways of servicing data appropriate to the professional and business information user. So shall we in future judge the strategic leadership of B2B, STM, financial services or professional information services companies by what they know about the decisions they need to make about implementing which generation of what software to have what strategic effect on their marketplaces? I hope not, since I fear that like me they may be found wanting.

And clearly, having a CTO but not knowing the right questions to ask him, or what the answers mean, is not sufficient either. In order to get more firmly into this area myself I wrote a blog last month called “Big Data: Six of the Best”, in which I talked about a variety of approaches to Big Data issues. In media and information markets my first stop has always been MarkLogic, since working with them has taught me a great deal about how important the platform is, and how pulling together existing disparate services onto a common platform is often a critical first step. Anyone watching the London Olympics next month and using BBC Sport to navigate results and entries and schedules, with data, text and video, is looking at a classic MarkLogic 5 job (www.marklogic.com). But this is about scale internally, and about XML. In my six, I wanted to put alongside MarkLogic’s heavy-lifting capacities someone with a strong metadata management tradition, and a new entrant with exactly those characteristics is Pingar (www.pingar.com). Arguably, we tend to forget all the wonderful things we said about metadata a decade ago. From being the answer to all questions, it became a very expensive pursuit, with changing expectations from users and great difficulties in maintaining quality control, especially where authors created it, fudging the issue for many information companies.

So Pingar, who started in New Zealand before going global, appropriately started its tools environment somewhere else. Using the progress made in recent years in entity extraction and pattern matching, they have created tools to manage the automatic extraction of metadata at scale and speed. Working with large groups of documents (we are talking about up to 6 terabytes – not “biggest” data but large enough for very many of us), metadata development becomes a batch processing function. The Pingar API effectively unlocks a toolbox of metadata management solutions, from tagging and organization at the levels of consistency that we all now need, to integration of the results with enterprise content management, with communications and with collaboration platforms. SharePoint connectivity will be important for many users, as will the ability to output into CRM tools. Users can import their own taxonomies effectively, though over time Pingar will build facilities to allow taxonomy development from scratch.
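For readers who want to see what batch tagging against an imported taxonomy looks like in miniature, here is a sketch in Python. It is emphatically not the Pingar API, whose calls I am not reproducing here: the taxonomy, the function names and the simple whole-word pattern matching are all my own assumptions, chosen only to illustrate the idea of turning metadata creation into a batch job.

```python
import re

# Hypothetical imported taxonomy: preferred term -> surface forms to match.
TAXONOMY = {
    "Big Data": ["big data"],
    "Metadata": ["metadata", "meta-data"],
    "Search": ["search", "searching"],
}

def compile_patterns(taxonomy):
    """Build one case-insensitive, whole-word pattern per preferred term."""
    patterns = {}
    for term, variants in taxonomy.items():
        # Longest variants first so that "searching" is tried before "search".
        ordered = sorted(variants, key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in ordered)
        patterns[term] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def tag_document(text, patterns):
    """Return the sorted list of taxonomy terms found in one document."""
    return sorted(term for term, pattern in patterns.items() if pattern.search(text))

def tag_batch(documents, taxonomy):
    """Tag every document in a batch; returns {doc_id: [terms]}."""
    patterns = compile_patterns(taxonomy)
    return {doc_id: tag_document(text, patterns) for doc_id, text in documents.items()}
```

Because every document is tagged by the same compiled patterns, the output is consistent across the batch, which is exactly what author-created metadata so rarely was. A production tool would of course use real entity extraction rather than string matching.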

As members of the Pingar team talked me through this, two thoughts persisted. In the first instance, the critical importance of metadata. Alongside Big Data, we will surely find that the fastest way to anything is searching metadata databases. They are not either/or, they are both/and. I am still stuck with the idea that however effective we make Big Data file searching, we will also need retained databases of metadata at every stage. And every time we need to move into some sort of ontology-based environment, the metadata and our taxonomy become critical elements in building out the system. Big Data as a fashion term must not distract us from the idea that we shall be building and extending and developing knowledge-based systems from now until infirmity (or whatever is the correct term for the condition that sparks the next great wave of software services development in 2018!).
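The “fastest way to anything” point can be made concrete with a tiny sketch, again in Python and again under my own assumptions: the function names are hypothetical and the index is a plain in-memory dictionary, not any vendor's product. A retained metadata database is, at its simplest, an inverted index from term to documents, so a query becomes a set lookup rather than a scan of every file.

```python
from collections import defaultdict

def build_metadata_index(tagged_docs):
    """Invert {doc_id: [terms]} into {term: {doc_ids}}."""
    index = defaultdict(set)
    for doc_id, terms in tagged_docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def find_by_terms(index, terms):
    """Return the ids of documents that carry every one of the given terms."""
    # .get avoids creating empty entries for terms the index has never seen.
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches) if matches else set()
```

The full-text search over the files themselves does not go away; the index simply answers the common questions first, which is the both/and arrangement described above.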

And my other notion? If you are in New Zealand you see global markets so much more clearly. Pingar went quickly into Japanese and Chinese, in order to service major clients there, and then into Spanish, French and Italian. Cross-linguistic effort is thus critical. Marc Andreessen is credited with the saying “Software is eating the world” (which always reminds me of an early hero, William Cobbett, saying in the 1820s of rural depopulation through enclosures and grazing around the great heathland that now houses London’s greatest and slowest airport: “Here sheep do eat men”). I am coming to believe that Andreessen is right, and that Pingar is very representative of the best of what we should expect in our future diet.

May

11

Decline and Fall of the Google Empire: Revisited

Filed Under B2B, Big Data, Blog, Financial services, healthcare, Industry Analysis, internet, mobile content, news media, online advertising, Publishing, Search, semantic web, social media, STM, Uncategorized, Workflow

I have been waiting to write this post for four months. Ever since I wrote a piece with this title in January 2011, friends and colleagues have been asking “And now…?”, and this has intensified since Google’s results announcement in January 2012. 25% revenue growth? Breaking $10 billion revenue in a single quarter? In anyone else’s results statement this would have been sparkling news in a recession. Google’s shares dropped 10% on the news. And then the analysis. Cost-per-click – Google’s revenue from advertisers – fell 8% in the quarter, and by the same amount in the previous quarter. This is a company still totally dependent on advertising. Imagine a newspaper company whose yield from classifieds fell 8% per quarter to see the wonderful way in which “velocity”, as Larry Page describes growth, disguises performance.

When I last wrote on this subject I was trying to describe an advertising-based search company that was trying to kick the habit and migrate elsewhere. Clearly Android, now on 250 million handsets, is the most obvious escape hatch. Analysts forecast that 2012 will see Android account for 12% of gross revenues, which demonstrates that migration is slow and old habits die hard. So if my grandchildren do not grow up thinking of Google as a phone company, as I suggested in the original blog, what will they think of the mature Google, shuffling along in the carpet-slippers of 10% growth? Well, they could imagine it as an operating system – Chrome is still growing strongly and Chrome OS has not been fully exploited. Or they could think of it as a social network environment: Google+ is now up to 90 million members, still a fraction of Facebook, but up from 40 million the previous quarter. Indeed, social networking may be a “must win”, or at least a “must compete strongly” environment for Google if the search-advertising market is to be prolonged long enough for these other options to emerge from under the strategy umbrella. With Google taking the axe to so many of its product development fields directly related to search, this requirement is exacerbated.

However, what really gets me writing this evening is the strong suspicion that Google themselves think that the answer is elsewhere. An interview with Ben Fried, the Google CIO, in the Wall Street Journal yesterday has him saying that the Cloud is reaching a tipping point (http://blogs.wsj.com/cio/2012/05/10/google-cio-ben-fried-says-cloud-tipping-point-is-at-hand/?mod=google_news_blog). Google clearly feel that Cloud computing, in the age of ubiquitous broadband (whenever that happens), will be their route to a business base in individual and small business sectors. As Google has used the Cloud to take costs out of its own core business, which given the comments above it has needed to do, so it can use its global data centre coverage to do the same for others. In this world, where we can fondly imagine two remotely sited workers watching each other’s real time edits on a document in Google Docs, small development teams can access a wide range of tools and pursue the sort of “fail fast”, constantly re-iterating, development strategies beloved of major corporates.

But this is a place where the competition is established, hot and strong, and despite Google’s history as a solutions developer, Apple and Microsoft go back further. iCloud, dependent on a syncing environment rather than the broadband, moves all the files to the Cloud, with users retaining copies and, as Steve Jobs is always quoted as saying, demoting “the PC to be just a device”. There is a different philosophy of Cloud here, but one that seems more based on now than when. And then again there is Amazon, inspired, as was Google, by the long struggle to use the Cloud to solve its own back office issues, now offering AWS as a solution in the very markets that Google thinks should be its own.

So it cannot be just the Cloud that Google see as their exit-from-advertising-dependence platform. But the Cloud and Big Data? This article’s timing is much influenced by the announcement of Google BigQuery, which, although semi-publicly trialled since December last year, was formally launched on 1 May (http://www.zdnet.com/blog/big-data/googles-bigquery-goes-public/405). Since it covers databases of up to two terabytes (seems big to me!), this has been described as a business intelligence tool by some commentators, who expected larger database environments from the inventors of MapReduce (working in petabytes), who kicked off this Big Data thing to begin with and are clearly working here, as elsewhere, from the “solve our own problems, then generalize to solve yours” standpoint indicated above. But here is a real irony: if you are working in a Big Data context, much of what you will be looking for is indexed on Google, but not searchable in a Google Cloud context. Again, contrast Amazon, where they have now begun adding public databases to their Cloud offering, searchable in their EC2 (Elastic Compute Cloud) context. Here are some of the first offerings:

“Annotated Human Genome Data provided by ENSEMBL
The Ensembl project produces genome databases for human as well as almost 50 other species, and makes this information freely available.

Various US Census Databases from The US Census Bureau
United States demographic data from the 1980, 1990, and 2000 US Censuses, summary information about Business and Industry, and 2003-2006 Economic Household Profile Data.

UniGene provided by the National Center for Biotechnology Information
A set of transcript sequences of well-characterized genes and hundreds of thousands of expressed sequence tags (EST) that provide an organized view of the transcriptome.

Freebase Data Dump from Freebase.com
A data dump of all the current facts and assertions in the Freebase system. Freebase is an open database of the world’s information, covering millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available.”

In all, Google now face a struggle. As they move to a new service environment, we need to remember that they created the original company not by inventing search but improving it. Page ranking was a big step forward in its day and created a meteoric growth company. From this they built an Empire, now maturing. Edward Gibbon, commenting upon the fall of Rome and the rise of its rivals, marked a certain point of no return. “If all the barbarian conquerors had been annihilated in the same hour, their total destruction would not have restored the empire of the West: and if Rome still survived, she survived the loss of freedom, of virtue, and of honour.”

Is this where Google now is, and can its still youthful originators recreate it?


