It reminds one superficially of mineral extraction. Who owns the seam of diamonds – the miner or the landowner? And what happens when rights are not clear, or land ownership is in dispute? But this business of text or data mining is not really like that at all. I was reminded this week, by blogging contributions from two old friends, that one question remains at issue: who owns the results of data extraction from thousands or millions of unstructured files, where the data retrieved from any individual dataset may be tiny (well within most fair usage provisions) but its contribution to the value of the whole may be huge? Play this in the context of Big Data and real questions emerge.

Let’s go back to the beginning. Here are a couple of top-of-the-head examples of life on the planet that give a clue to what is worrying me:

* According to research quoted by the UK’s National Centre for Text Mining “fewer than 7.84% of scientific claims made in a full text article are reported in the abstract for that article”. This, they point out, makes cross-searching of articles using data mining and extraction techniques very important to science research. Fortunately the JISC organization, which licenses all journal article content from publishers on behalf of UK universities, permits researchers to data mine these files, and no doubt this was agreed with the publishers within the licence(?). But the question in my mind is this: who owns the product created by the data mining, and is this a new value which can be resold to someone else?

* Lexis Risk Management use many hundreds of public and private US data resources in their Big Data environment to profile people and companies. Both private and public data is researched, and, of course, it will often be the case that unique connections will be thrown up which encourage or discourage users from doing business with the data subject. Clearly Lexis own the result of the custom sweep of the data, and clearly it needs to be updated and amended over time as fresh data becomes available, or as more data is licensed into the mine. But do Lexis, or any other data extractor, own the result of the extraction process? They are able to sell a value derived from it, and that value emerges directly from the search activity and the weighting of the answers that they have accomplished. But do they own, or need to own, the content (which may be different in ten minutes’ time when another search is done on the same subject)? And can the insurance company which buys that result as part of its risk management model resell the data content itself to a third party?

I have put up two examples because I do not wish to polarize the argument into publishers v government. The issue arises in the UK, as the media lawyer’s lawyer, Laurie Kaye, has pointed out, because the Hargreaves Review of copyright law recommends the retention of rights with the data miner – so you can make new products by recombining other people’s data. The UK government has adopted this recommendation with its usual emphatic “maybe”. Elsewhere in the world of August, which I deserted to take a holiday, the UK government has come out with a storming approval of Open Data. As Shane O’Neill has repeatedly pointed out in his blogs, this contrasts sharply with the content retention policies pursued by UK civil servants, who are even now creating a Public Data Corporation in order to frustrate the political drive of their masters (how easily a licensing authority becomes a restricting body!).

There are two really troubling aspects of this to me. In the first instance, we are not going to get the data revolution, the Berners-Lee dream of linked data, the creation of hybrid workflow content modelling, or the Big Data promise of new product and service development unless there is a primary assumption in our society that all Open Web content, and all government- or taxpayer-funded content, is available for data cross-searching, unless there are national security considerations. It must also be a standard expectation for data leasing that discovery from multiple files creates new services for the person putting the intellectual effort into that discovery, and hopefully new wealth and employment in our society. If we simply continue to debate copyright as if it connotes the transfer of real world rights into the digital network, then we shall constrain the major hope of intellectual property development this century.

And the second thing? Well, I am realist enough to know, after 20 years of lobbying this point, that it is unreasonable to expect the UK government to change its attitude to an information society in my lifetime. So maybe we can undermine these guardians of “my information is my power” by saying that we do not want their content – just the right to search it. After all, if it is good enough for the universities and the progress of science, it should be good enough for Ordnance Survey and the Land Registry!

References

Making Open Data Real (http://www.data.gov.uk/opendataconsultation)

The Public Data Corporation (http://discuss.bis.gov.uk/pdc/)

Response to the Hargreaves Report (http://www.bis.gov.uk/assets/biscore/innovation/docs/g/11-1199-government-response-to-hargreaves-review)

National Centre for Text Mining (http://www.nactem.ac.uk/)

Laurence Kaye (http://laurencekaye.typepad.com/)

Shane O’Neill (http://www.shaneoneill.co.uk/)

Sometimes it takes a really big event to remind us of underlying changes that we should have recognized more prominently at the time. With BNA, The Bureau of National Affairs Inc, in Washington DC (and seldom is a location so important as this one), being acquired by Bloomberg, a real shift is recognized. It is not only the case that Bloomberg want to move closer to law practices in the US and around the world, or that many of those practices might at some future point become Bloomberg terminal users rather than Thomson Reuters WestlawNext users. It is that law and regulation pervade every branch of business, from finance outwards, and that the idea that paralegal or quasi-legal users had fundamentally different needs from “qualified” legal users is gone. My colleague at Outsell, David Curle, has been particularly good at pointing out this democratization of the law and the wide and free availability of primary legal content. BNA built a very successful company around the idea that lawyers and others should have a closer view of how law was being created in Congress, and how embryonic law might affect the interests of their clients and their companies.

First, the details. Bloomberg have apparently offered $990m for BNA, which is around 2.25x the current stock price, 3x current revenues of $331m and about 13x EBITDA. This is a very good price at this time, though a pre-recession valuation might have been a shade higher. BNA was an employee-owned company with an eighty-year history of democratic process (to attend an AGM, with its board election involving some 1,500 shareholders, was always an impressive demonstration of this). Its founders, New Deal lawyers, all shared a principled view of the importance of participation and the sharing of information. Now it joins another (intensely) private company, younger by 50 years but also founded on the idea that content should and could be shared more effectively.

So what does all of this do to the balance of power? For Thomson Reuters, comparatively little, given that it has moved decisively (through its GRC developments) into that wider view of legal and regulatory relevance stated above. BNA’s two great assets would be its brand, forever associated with the reporting of embryonic law in committee in DC, but actually much wider in content and significance, and its tax services, a market leader in conjunction with CCH (Wolters Kluwer) and Thomson Reuters Tax (RIA). It is notable that Thomson, Reed Elsevier (Lexis), and CCH all license content from BNA for access online. This will presumably end after current contracts expire. Thomson will be hurt least by this. But note how important contextualised news is now to everyone: BNA gives this to Bloomberg in a way which helps to neutralize the Reuters/West advantage.

But both Lexis and CCH will suffer collateral damage. The loss of the tax content will cause real hurt to both, and the wider impact of the loss of the BNA brand and full content set will be hard for Lexis in particular. BNA content was important in that context in particular, since previous attempts to absorb and use highly branded legal content (Matthew Bender) seem to have petered out in terms of user recognition. Given that private equity was unable to enter the contest at these valuations, Lexis would have been the obvious candidate as a counter bidder, and the fact that it felt unable to match a high but not astronomic bid points to possible future environments. It may be that Reed Elsevier see their future with Lexis in risk management rather than in legal as such, and if that were the case then we could well, in the next five years, see a new order of things, with Thomson Reuters and Bloomberg dominating legal and regulatory marketplaces, and CCH and Lexis forming a sort of second division in positions increasingly hard to maintain outside of specialist niches. There is only one shoe left to drop in US legal marketplaces. Analysts will now look closely at whether ALM (owned by Apax) will be the last major play.

Bloomberg appear to be indicating that they will hold BNA as a separate wholly-owned subsidiary in the first instance. This makes sense: the two companies have distinctive cultures and need time to get to know each other. It is, however, interesting to think where the optimum first linkages will take place. Certainly management in the nascent Bloomberg Government unit will be salivating: they will rightly see the congressional law reporting as a key element in bringing more widespread usage in government at all levels. And everyone involved in the business of proliferating Bloomberg terminals more widely in the tax advisory marketplace will be exultant, since this is a real game changer for them. If the claim that we are all moving to workflow is correct, then BNA is vital to Bloomberg in its wish to move into adjoining, content-related markets like legal and paralegal.

And a final and personal note on culture. As an advisory director to BNA’s international marketing (Bloomberg will transform that with their global coverage) I have, for almost 25 years, worked with quite the most civilized publisher on the planet. The values of the founders were exemplified by their successors, and while employee ownership sometimes caused problems of its own, those who worked there were imbued well beyond the normal with a sense of purpose, and indeed a lifetime commitment, to what they were doing, and a belief that their purpose was part of the public good. This cannot be bottled, so Bloomberg must be careful to preserve it. Having tried to enter security law in the early years of this century, and made very slow progress, they should know how difficult it is to get very high level editorial intervention and commentary to work properly. The biggest property they have so far bought is BusinessWeek, which was not strictly comparable. BNA is different, and to get the real value they will need to treat it very differently.
