Reading blogs is seriously bad for morale. As a blogger, I do it compulsively, instinctively, and, too often, with a complete suspension of disbelief. “I read it on a blog” has become this century’s equivalent of “well, I read it in the newspaper”. Neither stands up completely to scrutiny, though nor does any other media outlet. We may have to revert to “I met a man who knows about these things and he said...” And my finding from last week was that when you do meet a man who knows, it is really surprising how much there is to be learnt.

Last week there was an Open Day for IXXUS, an excellent UK integrator with a really good track record. They use MarkLogic, and in a very effective case study around the parliamentary publisher, Dods, Simon Thompson, who manages Dods, demonstrated how effectively you can reposition a media company like this once you have full control of all of its content, the ability to search unstructured data, and thus the ability to redefine services as solutions to user requirements. And if this was not a surprise as such, the neatness of the Dods solution was certainly impressive.

The day also brought in other elements. As an Alfresco user from time to time, I was quietly amazed to find that this UK-based operation, the second largest Open Source player in the world behind Red Hat, has now registered 3 million downloads and proudly boasts 19 quarters of consecutive revenue growth. If anyone doubts the importance of workflow then take a look at this, but the element that stuck out for me was Alfresco’s concern for content management in the context of social media. It is all at http://www.ixxus.com/webinar/ and well worth a look. It is also good to be reminded of the continually growing power of open source search, especially in vertical market contexts. On show here are Lucene/Solr. With customers like the Guardian, Cisco, Salesforce, Zappos, and publishers like Taylor and Francis, this presentation spoke volumes about how far open source search has travelled in the past few years. “They came for the cost, they stayed for the flexibility,” quoth the man. And Lucene is now 10 years old.

So in a week that jolted me out of received impressions in so many different ways I was not entirely surprised to get a note from my friend, Ian Nairn, on Internet TV. Now I am not a regular watcher. Glued to the rugby matches I find, like the current candidate for Chairman of the BBC, Lord Patten, that it is fairly hard to watch most of the time. Yet Ian is a provider of good leads, so I followed this one to http://www.ednetinsight.com/news-alerts/the-heller-report/you-on-the-tube–the-internet-tv–channel–to-the-family-flat-screen.html and found plenty of nourishment. In schools one could readily imagine the television screen becoming an engine of integration, backed by Cloud-based storage. After all, we have had two generations of LMS and VLE, and while the technology is widely deployed in western schools it is not simple to use for the demotivated (staff and students), and the service provision in schools is neither intuitive nor technician-free. In fact IT has created a new school power base and less than 10% of teachers are seen as natural users in the sense of creating, deploying and storing lessons online.

And one of the elements of this article which triggered my enthusiasm was its reference to Cambridge’s Global Grid for Learning (GGfL). Here is a context where a resource-creating, permissions-cleared global resource of actual and embryonic learning objects comes into its own. We know that Internet TV, strongly driven by Google, is happening and will change many relationships. I had not factored learning into this environment, but now that I have I certainly think it creates scenarios ready for dynamic change.

And do not even ask me what I was doing in the China Daily. But it produced this thought (http://www.chinadaily.com.cn/cndy/2011-03/13/content_12162539.htm) via the New York Times. There John Markoff has discovered e-discovery software for lawyers. No surprises for those with their heads in this space, but a graphic example of how a $2.2 million legal workflow process in 1978 can be done today for a fraction of the cost – $400,000 in fact. The fact that legal fees do not seem to decrease is a mystery that I may never crack, but here are witnesses to a truth that we must put into the centre of our considerations: the major professions are now rapidly automating, with an impact on society with which we have not yet come to terms. The article has some great examples of pattern recognition and linked me back to the IXXUS day, and to the man who said that 80% of corporate data is now email. Mike Lynch reckons that one lawyer will be able to do the work of 500… unless of course you met a man last week who said differently.

If you are an STM publisher reading this, then it may already be too late for you to act decisively enough to put yourself in the vanguard of change. For I am not the first to say what I am about to say, and there is now a good literature based around the idea that the network is a world of small beginnings, followed by mass change at unprecedented rates that catches whole industries unawares. We are coming to one of those points, and my growing realization was triggered into certainty by being sent a link to a Harvard Business Review article from November 2010 (thank you, Alexander van Boetzelaar, for making sure I saw this). Since HBR as an old world publisher makes a business of paid-for reprints I cannot give the link, but it is reprint R1011B.

The article is called “The Next Scientific Revolution”, by Tony Hey, a director of Microsoft Research and one of the Fourth Paradigm people who made such an impact in 2009. Their arguments, pioneered by the late Jim Gray, saw scientific enquiry gathering force as the experimental methods of early Greece and China were subsumed into the modern theoretical science of the Newtonian age, and then carried forward through computation and simulation into the age of high performance computing in the last century. So now we stand on the verge of a fourth step, the ability to concentrate unprecedented quantities of data and apply to it data mining and analytics that, unlike the rule-based enquiries of the previous period, are able to throw out unsuspected relationships and connections that in turn are the source of further enquiry.

All of this reminds me of Timo Hannay of Nature and his work with the Signalling Gateway consortium of cell science researchers based in San Diego. I am not sure how successful that was for all parties involved, and to an extent it does not matter (especially given the lead time in experience given to Nature by this work). To me this was a signal of something else: on the network the user will decide and make the revolutionary progress, and we “publishers” will have to be ready in an instant to follow, developing the service envelope in which users will be able to do what they need to do. At the moment we are all sitting around in STM talking about overpublishing, the impossibility of bench science absorbing the soaring output of research articles, or of libraries keeping up on restricted budgets, when the real underlying problem we are not seeing is the fact that the evidence behind those articles is “unpublished” and unconcentrated, and that as the advanced data mining and analytics tools become increasingly available they have insufficient scale targets in terms of collected data.

Of course, there are big data collections available. And their usage and profitability are significant. Many are non-profit and some are quasi-monopolistic. But I see huge growth in this area, especially in physics, chemistry and the life sciences, to the point where “evidence aggregation and access management and quality control” is the name of the business, not journal publishing. Mr Hey comments in his article “Critically, too, most of us believe scientific publishing will change dramatically in the future.” “We foresee the end product today – papers that discuss an experiment and its findings and just refer to datasets – morphing into a wrapper for the data themselves, which other researchers will be able to access directly over the internet, probe with their own questions, or even mash into their own datasets in creative ways that yield insights that the first researcher may never have dreamed of.”

What does “access directly” mean in this context? Well, it could mean that universities and researchers allow outside access to evidential data, but this poses other problems. Security and vetting loom large. Then again, evidential peer review may be a requirement – was the evidence created accurately, ethically or using reliable methodologies? Plenty of tasks for publishers here. Then again, can I hire tools to play in this sandpit? Is the unstructured content searchable, and is metadata consistent and reliable? These are all services “publishers” can offer, in a business model that attracts deposit fees for incoming data as well as usage fees. But there will be natural monopolies. It may be true, as Mr Hey claims, that “through data analysis scientists are zeroing in on a way to stop HIV in its tracks”, but how many human immunodeficiency virus data stores can there be? Right, only one.

So the new high ground will have fewer players. A few of those will be survivors from the journal publishing years, and I hope one at least will have the decency to blush when recalling the pressure put on people like me, in my EPS days, to remove the ever-growing revenues of the science database industry (human genomics, geospatial, environmental, for the most part) from the STM definition since it was not “real” science publishing – and reduced their share-of-market figures! But then again, maybe they should look around them. Isn’t what is being described here exactly what LexisNexis are doing with Seisint and Choicepoint, or Thomson Reuters with Clearforest? And why? Because their users dictate that this shall be so. For the same reason this is endemic in patent enquiry: see my erstwhile colleague David Bousfield anatomizing this fascinatingly only last week (https://clients.outsellinc.com/insights/index.php?p=11416). And why have market-leading technology companies in this space – think of MarkLogic and their work on XML and the problems of unstructured data – made such an impact in recent years in media and government (aka intelligence)? I see a pattern, and if I am right, or even half right, it poses problems for those who do not see it.

I rest my case. Next Friday I shall do the Rupert Murdoch 80th birthday edition, for which I plan to bake a special cake!
