Nov 29
Data: Re-Use, Waste and Neglect
Filed Under B2B, Big Data, Blog, data analytics, Industry Analysis, internet, machine learning, Publishing, semantic web, social media, STM, Uncategorized
We live in fevered times. What happens at the top cascades. This must be the explanation for why revered colleagues like Richard Poynder and Kent Anderson are conducting Mueller-style enquiries into OA (Open Access). And they do make a splendidly contrasting pair of prosecutors, like some Alice in Wonderland trial where off-with-his-head is a paragraph summary, not a judgement. Richard (https://poynder.blogspot.com/2018/11/the-oa-interviews-frances-pinter.html) wants to get for-profit out of OA, presumably not planning to be around when the foundation money dries up and new technology investment is needed. Kent vigorously defends the right of academic authors to make money from their work for people other than themselves, and is busy in the wonderful Geyser (thegeyser@substack.com) sniffing the dustbins of Zurich to find “collusion” between the Swiss Frontiers and the EU. Take a dash of Brexit, add some Trumpian bitters and the zest of rumour, shake well and pour into a scholarly-communications-sized glass. The perfect cocktail for the long winter nights. We should be grateful to them both.
But perhaps we should not be too distracted. For me, the month since I last blogged on Plan S and got a full postbag of polite dissension has been one of penitent reflection on the state of our new data-driven information marketplace as a whole. In the midst of this, Wellcome announced its Data Re-Use prize, which seems to me to exemplify much of the problem (https://wellcome.ac.uk/news/new-wellcome-data-re-use-prizes-help-unlock-value-research?utm_source=linkedin&utm_medium=o-wellcome&utm_campaign=). Our recognition of data has not properly moved on from our content years. The opportunities to merge, overlap, drill down through and mine together related data sets are huge. The ability to create new knowledge as a result has profound implications. But we are still on the nursery slopes when it comes to making real inroads into the issues, and while data and text mining techniques are evolving at speed, the licensing of access and the ownership of outcomes still pose real problems. We will not be a data-driven society until sector data sources have agreed protocols on these issues. Too much data behind paywalls creates ongoing issues for owners as well as users. Unexploited data is valueless.
It’s not as if we have collected all the data in the marketplace anyway. At this year’s NOAH conference in London at the beginning of the month I watched a trio of start-ups in the HR space present, and then realised that they were all using the same data, collected differently. There has to be an easier way of pooling data in our society: one that ensures privacy protection while aligning clean resources for re-use, so that different analytics and market targets can create different service entities. Let’s hope the Wellcome thinking is pervasive. But then my NOAH attention went elsewhere, as I found myself in a fascinating conversation about a project which is re-utilising a line of content-as-data that has been gratuitously ignored, and this in scholarly communication, one of the best-ploughed fields on the data farm.
Morressier, co-founded in Berlin by Sami Benchekroun, with whom I had the conversation, is a startling example of the cross-over utility of neglected data. With Justus Weweler, Sami has concerned himself with the indicative data you would need to give evaluated progress reporting on early-stage science. Posters, conference agendas, seminar announcements, links to slide sets – Morressier is exploring the hinterland of emerging science, enabling researchers and funders to gauge how advanced work programmes are and how they can map the emerging terrain in which they work. Just when we imagined that every centimetre of the scholarly communication workflow had been fully covered, here comes a further chapter, full of real promise, whose angels include four of the smartest minds in scholarly information. Morressier.com is clearly one to watch.
And one to give us heart. There really are no sectors where data has been so thoroughly eked out that no further possibilities remain, especially of adding value through recombination with other data. In fact, in my daily rounds, I usually find that the opposite is true. Marketing feedback data is still often held aloof from service data, and few can get an object-based view of how data is being consumed. And if this is true at the micro level in terms of feedback, events companies have been particularly profligate with data collection, assessment and re-use. And while this is changing, it still does not have the priority it needs. Calling user data “exhaust” does not help: we need a catalytic converter to make it effective when used with other data in a different context.
When we have all the data and we are re-combining it effectively, we shall begin to see the real problems emerge. And they will not be the access and re-use issues of today, but the quality, disambiguation and “fake” data problems we are all beginning to experience now, and which will not go away. Industry co-operation will be even more needed, and some players will have to build a business model around quality control. The arrival of the data-driven marketplace is not a press release, but a complex and difficult birth process.
Oct 26
Towards Self-Publishing in Scholarly Communications
Filed Under Artificial intelligence, Big Data, Blog, data analytics, eBook, Industry Analysis, internet, machine learning, Publishing, Reed Elsevier, semantic web, STM, Uncategorized, Workflow
Standing in the crowded halls of the Frankfurt Book Fair is as good a place as any to fantasise about the coming world of self-publishing. After detailed discussion about Plan S, or DUL, or Open Access books, one can easily become waterlogged by the huge social, political and commercial pressures built up in our methodologies of getting work to readers. In general terms, intermediaries arise where process is so complex that neither originators nor ultimate users can cope with it without facilitation. So publishing in Europe was a refinement of the eighteenth-century role of booksellers, adding selection and financing to the existing self-publishing model. In the next two centuries the new business model became so entrenched – and, for some, so profitable – that their successors behaved as if it were ordained by God and nature, and would live for ever. Intermediation will indeed probably be required as far as we can predict, but it is certain to change, and it is not certain to include all of the roles that publishers cherish most deeply.
Two episodes this week reinforce some of these feelings. In one instance, the scholarly market software company Redlink (https://redlink.com/university-of-california-press-adds-remarq/) announces an agreement to supply its software toolset to the University of California Press. Nothing unusual here, but something symptomatic: more and more publishers are using clever tools to heighten value and increase discoverability. But those software tools are becoming more and more “democratic” – they can be used in good machine learning contexts to help to generate more technical skill at different points in the network, both before and after the “publishing process”. In other words, the more it becomes clear to, say, a funder or a research unit or a university that the divine mystery of publishing devolves to a set of software protocols, the more likely it is, given that publishers cannot control digital dissemination, that the control point for content release will migrate elsewhere. In a previous note I mentioned UNSILO’s manuscript evaluation system with very much the same thought in mind – while the pressure is on traditional publishers to arm themselves with the new intelligent tools for competitive purposes, as well as to increase speed and reduce cost, these tools also contain the seeds of a transition to a place where research teams, institutions and funders can do the publishing bit effectively for themselves. So the question left on the table is: what other parts of the processes of scholarly communication are left requiring intermediary support?
And so the struggle now is to look at those other parts of the scholarly research and communications process that are hard to gather and bring into focus and analysis. It was interesting in this light to reflect that Web of Science Group and Digital Science are already well down this track. Gathering together peer review and making sense of it (Publons) is the sort of thing that only an outside agency can do effectively, just as collecting and analysing posters (Morressier) will release layers of value previously unrecognised. And while many bemoan the abject failures in optimizing research funding through effective dissemination and impact of the results, only Kudos have really grasped the nettle and begun to build effective dissemination planning processes. But how can these interventions be put together and scaled? And how can we ensure the dataflows are unpolluted by self-promotion or lack of verification and validation?
Some of these questions cannot be resolved now, but do we know that we are at least moving in a self-publishing direction? Well, the Gates Foundation and Wellcome – and perhaps Horizon 2020 – seem to think so, even if they use intermediaries to help them. Researchers and academics are substantially self-publishers already, producing posters, blogs, tweets, annotations, videos, evidential data, letters and articles online with little assistance. And it was interesting to see the Bowker report of last week, which indicated a 38% growth in self-publishing last year to over a million new books across print and e-publishing, though ebooks are doing far less impressively. And then:
“Since 2012, the number of ISBNs assigned to self-published titles has grown 156 percent.”
http://www.bowker.com/news/2018/New-Record-More-than-1-Million-Books-Self-Published-in-2017.html
Of course, this may just reflect consumer trends, but such trends alter attitudes to the possible in other sectors. Certainly the economic impossibility of the academic monograph in many fields will be affected by the growth of a library crowd-funding model (Knowledge Unlatched), and this will extend to departmental and institutional publishing in time.
So I left my 51st Frankfurt in buoyant mood, thinking of the day when it is renamed the Frankfurt Publishing Software and Solutions Fair, held on one floor, and I can once again get into the bar of the Hessischer Hof with a half-decent chance of getting a seat – and a drink!
And then, as I was finishing, came this: https://osc.universityofcalifornia.edu/2018/10/open-source-for-open-access-the-editoria-story-so-far/