Jun 7
VZBV is out to get you!
Filed Under B2B, Big Data, Blog, data protection, Financial services, healthcare, Industry Analysis, internet, news media, privacy, Search, social media, Uncategorized
However, I bet the US government gets you first! As the newspapers (Guardian, 07/06/13) reproduce slides used to train US security and FBI officials in the use of the data feeds they get from Google, Verizon, AOL, Apple, Facebook et al., from whom, under the FISA Amendments Act (FAA), they can now download usage data and user content, the German Federation of Consumer Organizations (VZBV) brought an action against Apple – and won in the Regional Court of Berlin. One of Apple’s crimes was sharing data with subsidiaries, and another was re-using data not directly gathered in the trading activity (for instance, the recipient’s details on a gift certificate sale). If I worked in Apple’s legal department, dedicated to taking no prisoners in any legal wrangle, I would be getting fairly schizoid by now. As indeed I am, whenever I use terms like Open Access, Open Data, Open Society, Open Sesame… and then reflect on the attempt by everyone in the networked society and the network economy to suborn and subvert every data instance into own-able IP.
And now, a word of explanation. My silence here over the last 10 days reflects my listening elsewhere. And speaking – to two Future of Publishing sessions, in London and New York, sponsored by IXXUS (www.Ixxus.com), and at a seminar organized by the University of Southampton’s Web Sciences Doctoral students group (slides are available here). At each of these sessions we discussed the networked society and its implications. And Big Data reared its ugly head, strengthening my resolution never to mention the apparently undefinable term again, but to talk instead of massive data components and the strengthening business of data analytics. But nowhere did we discuss data protection – or revelation – and I regret that, especially now that the brilliant latest issue of DataIQ, the journal of DQM Group (www.dqmgroup.com), has come to hand. On page after page it nails the issues that every data holder should have in mind, and which our networked content industry grievously neglects at its peril. If the FBI don’t get you, the German courts will!
But I am less surprised, on reflection, by the US revelations than I thought I would be, bearing in mind the huge amounts of high-level analytics and search software that agencies of the US government have bought over the years. On the one hand, we should be grateful that a degree of paranoia has spawned an industry: this is where Seisint (Lexis Risk) came from, and where Palantir and other software developers have flourished. These software developments were always intended for more than calculating the Presidential expenses or searching the library. The Military/Intelligence complex has been a rich patron for developing many of the tools that the networked society depends upon. On the other hand, we should reflect that mass observation on this scale is the Orwellian manifestation of a police state, and that those who battle for the liberty of the individual are betrayed if it becomes necessary to infringe that liberty in the cause of protecting it. In saying this, I should also say that I am sure that the UK government would be equally intrusive if it could afford it, and in times like these the natural tendency of governments to use National Security as the cloak for the erosion of civil liberties is global. But after the emergency, do you ever remember government giving privacy rights back?
Which brings me to the network protection of user data in non-security contexts. Here there can be no doubt about who owns what: the problem is getting people to admit to the obvious. Thus, it seems to me axiomatic that when I use a networked service, the transactional data that I input remains mine, unless or until I have accepted conditions of service that say otherwise. And even those conditions cannot rob me of my ownership: all they can do is define agreed conditions of re-use for the people I am dealing with at the time. Eventually, in the network, each of us will be able to isolate every data entry that we make anywhere and store it in a personal DropBox in the Cloud. We will then sell or gift rights of re-use over designated parts of it to Apple, to market researchers, or to the US government as part of a visa waiver application. But we shall at least be in control, and will have pushed back on the arrogance of data players who seem to believe that every sign-on is their property. It is this type of “unthinking”, I am sure, that lies behind Bloomberg’s huge intrusion into user rights when they allowed their news team to examine the access records of their clients. I know we do not like Bankers in our society now (don’t worry, Doctors and Lawyers and Journalists and Politicians – we shall be back for you again just as soon as we have finished off these financiers), but surely no one at Goldman Sachs deserves a call from a news reporter saying “I see you have not used your terminal this week, so are you still employed?”
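To make that personal data store idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the names (`PersonalDataStore`, `ReUseGrant`, `DataEntry`) describe no existing product or API, and the point is only that re-use would default to "no" unless I had explicitly granted it to a named party.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical sketch: one piece of transactional data that I created
# somewhere in the network, held in my own store rather than the vendor's.
@dataclass
class DataEntry:
    entry_id: str
    created_at: datetime
    content: dict  # e.g. {"order": "gift certificate", "recipient": "..."}

# A grant records which party may re-use which entry, for what purpose,
# and until when. No grant, no re-use.
@dataclass
class ReUseGrant:
    entry_id: str
    grantee: str   # e.g. "Apple", "market researcher", "US visa waiver"
    purpose: str
    expires: datetime

@dataclass
class PersonalDataStore:
    entries: List[DataEntry] = field(default_factory=list)
    grants: List[ReUseGrant] = field(default_factory=list)

    def may_reuse(self, grantee: str, entry_id: str, now: datetime) -> bool:
        """Default is 'no': re-use is allowed only where I have granted it."""
        return any(
            g.grantee == grantee and g.entry_id == entry_id and g.expires > now
            for g in self.grants
        )
```

A sketch under stated assumptions, not a design: the interesting questions (how grants are revoked, audited and enforced against the data players themselves) are exactly the ones the current regime never asks.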
Here in Europe we pride ourselves on ordering things differently. Our secret weapon is Germany, where, for fairly obvious historical reasons, privacy is now a fetish and data protection has become a goal pursued on behalf of citizens by a lobby of what can only be described as, well, privacy fundamentalists. The current revision of the European Data Protection Directive (95/46/EC) will effectively turn the EU’s opt-out regime into an opt-in world. Not necessarily a bad thing, says Mark Roy, CEO of the Data Agency, in an article in DataIQ. I agree, and I also agree that the right of erasure (the right to be forgotten) is pretty difficult to manage. But the real horror story is the bureaucracy, the checking, the courts and the fines that all of this entails. Somewhere here there has to be a balance between German fanaticism and US laissez-faire regarding the rights of individuals to the ownership of their own information. We have never seemed further from creating this essential building block of a networked society.
Apr 30
Is Open Access Over?
Filed Under Big Data, Blog, healthcare, Industry Analysis, internet, Publishing, Search, semantic web, social media, STM, Uncategorized, Workflow
A sudden thought. Doing an interview with some consultants yesterday (we are fast approaching the season when some major STM assets will come back into the marketplace), I was asked where I had estimated Open Access would be now when I advised the House of Commons Science and Technology Committee back in 2007 on the likely penetration of this form of article publishing. Around 25%, I answered. Well, responded the gleeful young PhD student on the end of the telephone, our research shows it to be between 5% and 7%. Now, I am not afraid of being wrong (like most forecasters, I have plenty of experience of it!). But it is good to know why, and I suspect that I have been writing about those reasons for the last two years. Open Access, defined around the historic debate twixt Green and Gold, when Quixote Harnad tilted at publishers waving their arms like windmills, is most definitely over. Open is not, if by that we begin to define what we mean by Open Data, or indeed Open Science. But Open Access is now simply open access.
In part this reflects the changing role of the Article. Once the publisher’s solace as the importance of low-impact journals declined, it is now the vital source of the things that make science tick: metadata, data, abstracting, cross-referencing, citation, and the rest. It is now in danger of becoming the rapid act at the beginning of the process which initiates the absorption of new findings into the body of science. Indeed, some scientists (Signalling Gateway provided examples years ago) prefer simply to have their findings cited – or release their data for scrutiny by their colleagues. Dr Donald Cooper of the University of Colorado, Boulder, used F1000Research to publish a summary of data collected in a study that investigated the effect of ion channels on reward behavior in mice. In response to public referee comments he emphasized that he published his data set in F1000Research “to quickly share some of our ongoing behavioral data sets in order to encourage collaboration with others in the field”. (http://f1000.com/resources/Open-Science-Announcement.pdf)
I have already indicated how important I think post-publication peer review will be in all of this. So let me now propose a four-stage Open Science “publication process” for your consideration:
1. Research team assembles the paper, using Endnote or another process tool of choice, but working in XML. They then make this available on the research programme or university repository, alongside the evidential data derived from the work.
2. They then submit it to F1000 or one of its nascent competitors for peer review at a fee of $1000. This review, over a period defined by them, will throw up queries, even corrections and edits, as well as opinion rating the worth of the work as a contribution to science.
3. Depending upon the worth of the work, it will be submitted/selected for inclusion in Nature, Cell, Science or one of the top flight branded journals. These will form an Athenaeum of top science, and continue to confer all of the career-enhancing prestige that they do today. There will be no other journals.
4. However, the people we used to call publishers, and the academics we used to call their reviewers, will continue to collect articles from open sources for inclusion in their database collections. Here they will perform entity extraction and other semantic analysis to build what they will claim are the classic environments which each specialist researcher needs to have online, while providing search tools that let users search here; or here plus all of the linked data available on the repositories where the original article was published; or here, on the data, and on all other articles plus data that have been post-publication reviewed anywhere (a rough sketch of this kind of metadata harvesting follows this list). They will become the Masters of Metadata, or they will become extinct. This is where, I feel, the entity or knowledge stores that I described recently at Wiley are headed. This is where old-style publishing gets embedded into the workflow of science.
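To give a flavour of what being a Master of Metadata might mean in practice, here is a minimal sketch, not anyone’s actual product: it harvests Dublin Core records from an institutional repository over OAI-PMH (the standard harvesting protocol most repositories already expose) and runs a deliberately crude, dictionary-based entity extraction over the titles. The endpoint URL and the term list are placeholders.

```python
# Minimal sketch: harvest Dublin Core metadata over OAI-PMH and run a naive,
# dictionary-based "entity extraction" over the titles. Placeholder endpoint
# and toy term list; no real service is being described here.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Placeholder: substitute any repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.ac.uk/oai"

# A toy dictionary standing in for real semantic analysis.
ENTITY_TERMS = {"ion channel", "alzheimer", "reward behavior"}

def harvest_titles(base_url: str):
    """Fetch one page of records and yield (identifier, title) pairs."""
    query = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urllib.request.urlopen(f"{base_url}?{query}") as response:
        tree = ET.parse(response)
    for record in tree.iter("{http://www.openarchives.org/OAI/2.0/}record"):
        identifier = record.findtext(".//oai:header/oai:identifier", namespaces=OAI_NS)
        title = record.findtext(".//dc:title", default="", namespaces=OAI_NS)
        yield identifier, title

def extract_entities(text: str):
    """Crude stand-in for entity extraction: match known terms in the text."""
    lowered = text.lower()
    return [term for term in ENTITY_TERMS if term in lowered]

if __name__ == "__main__":
    for identifier, title in harvest_titles(BASE_URL):
        print(identifier, title, extract_entities(title))
```

A real service would page through OAI-PMH resumption tokens, work on full text and data rather than titles, and use proper semantic tooling of the kind TEMIS sells; the point is simply that the raw material such "publishers" would aggregate is already openly harvestable.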
So here is a model for Open Science that removes copyright in favour of CC licenses, gives scope for “publishers” to move upstream in the value chain, and to compete increasingly in the data and enhanced workflow environments where their end-users now live. The collaboration and investment announced two months ago between Nature and Frontiers (www.frontiersin.org), the very fast-growing Swiss open access publisher, seems to me to offer clues about the collaborative nature of this future. And Macmillan Digital Science’s deal on data with SciBite is another collaborative environment heading in this direction. And in all truth, we are all now surrounded by experimentation and the tools to create more. TEMIS, the French data analytics practice, has an established base in STM (interestingly, their US competitor, AlchemyAPI, seems to work mostly in press and PR analysis). But if you need evidence of what is happening here, then go to www.programmableweb.com and look at the listings of science research APIs. A new one this month is the BioMortar API, “standardized packages of genetic patterns encoded to generate disparate biological functions”. We are at the edge of my knowledge here, but I bet this is a metadata game. Or ScholarlyIQ, a package to help publishers and librarians sort out what their COUNTER stats mean (endorsed by AIP); or the ReegleTagging API, designed for the auto-tagging of clean energy research; or, indeed, the OpenScience API, Nature Publishing’s own open access point for searching its own data.
And one thing I forgot. Some decades ago, I was privileged to watch one of the great STM publishers of this or any age, Dr Ivan Klimes, as he constructed Rapid Communications of Oxford. Then our theme was speed. In a world where conventional article publishing could take two years, by using a revolutionary technology called fax to work with remote reviewers, he could do it in four months. Dr Sam Gandy, an Alzheimer’s researcher, is quoted by F1000 as saying that his paper was published in 32 hours, and they point out that 35% of their articles take less than 4 days from submission to publication. As I prepare to stop writing this and press “publish” to instantly release it, I cannot fail to note that immediacy may be just as important as anything else for some researchers – and their readers.