Apr
19
Scholarly Communications: the View from Fiesole
Filed Under Artificial intelligence, Big Data, Blog, data analytics, Industry Analysis, internet, machine learning, mobile content, Publishing, Reed Elsevier, semantic web, Uncategorized, Workflow
You can see a long way from Fiesole. John Milton, in Paradise Lost, remembered the red orb of the sun sinking over the Tuscan hills and likened it to the burnished shield on Satan’s back as he is cast into Hell. Some of the delegates at the annual Fiesole Retreat, looking at Open Access and the future of scholarly communication, may have felt similarly cast down, but, if so, they kept it to themselves in a meeting, celebrating its 21st birthday, that lived up to a reputation for real debate, direct speaking, but total respect for the positions of delegates from all sides of the scholarly information workflow. This meeting, a joint venture of Casalini Libri and the Charleston library conference, was at its very best as the European Commission, critically important library interests, publishers of all disciplines, and OA providers alongside traditional subscription journals all contributed viewpoints on a developing situation in scholarly communications which desperately needs the debate engendered here.
As an observer of the debate and anchorman for the ensuing discussion I have waited ten days before adding my own view to all this. In truth, I cannot sum up the complexity and detail, or render the passion and eloquence of many of the arguments. But the cumulative effect on me was to sharpen the conclusion that I was witnessing something coming, however slowly, to an end. The debate about OA and Plan S is not an end in itself. Subscription publishing will never reassert itself and OA disappear. Nor will the world slowly become totally OA. The changes and the debate point to bigger and more fundamental changes. I was left feeling that just as we have been through Digital Replacement – all paper based content went digital – followed by Digital Transformation – the workflows and processes went digital and became wholly network interconnected – we now approach Digital Re-invention – in which the forms and artefacts of the analogue world themselves give way to digital connectivity which not only alters relationships in the network, but introduces the computer, the machine as reader and researcher, into the workflow.
We are now in a situation where the old generalities are becoming useless. STM and HSS are near meaningless, given the differences between Life Sciences and Physics, or Chemistry, as research communication fields. Likewise statistical social sciences and humanities. And when I asked what the identifiable critical information problems of scientists were I got two answers – Reproducibility and Methodology. In other words, researchers were anxious to repeat previous experiments using the same or different data or conditions in order to see if results were the same, and they wished to explore the methods used by successful experiments in order to justify a choice of methodology. Response to these demands requires that all of the data is available and connected by metadata, which is evidently not the case. And of course, specialist services will come into play to meet the needs – in these cases protocols.io, and Ripeta and Gigantum (both new members at Digital Science). These are the type of tools that researchers will use. So what about the books, journals, articles? Who will read them? The answer of course is the intelligent machine, and the nomenclature will change as it becomes obvious that the machine is only interested in content-as-data, not in format at all.
I asked, again and in vain, whether any publishers present had an idea of the current proportion of usage made by non-search-bot machines. But the fact is we are not measuring this. And we all nodded when someone said the next generation just want to get the preprint done and stop there – getting something into the network with a growing confidence that it will be found seems to be the thing. We are certainly getting smarter at measuring impact and dissemination, though still behind the curve in accomplishing those vital matters. And, Lordy, Lordy, we do have an industry hang-up about the way academics are rewarded with tenure and grant support. Is it so frightening for us to imagine change here because we have hung the future of academic publishing around the neck of an archaic system of academic rewards? Why is it that we always think that change only occurs in our sub-sector and the rest of the world stays constant? There is already movement around impact factors in academic review systems. The very fact of Plan S shows funders getting more interested in measuring impact and increasing dissemination. The only certainty about a network is that when one position alters, so do all the rest.
So my concerns about this sector remain more about the pace of change than the direction. Work like the eLife Reproducible Document Stack (RDS) is fascinating in this regard – will we interconnect the research lab manuals and review the work in progress at some point? Or will publishing be an automated function of the RDS in time? Whatever happens, we will always need the presence of cross-industry, multi-disciplinary groups like Fiesole to get the vital perspective, the view from a hill.
Dec
31
Simple Rules for New Year’s Blogging
Filed Under Artificial intelligence, Big Data, Blog, data analytics, Education, Industry Analysis, internet, machine learning, mobile content, news media, Uncategorized, Workflow
Apologies to those kind readers who expected an earlier interjection in December. Truth to tell, I was speechless. Caught somewhere between astonishment at my fellow countrymen’s mania for national self-harming, my own complete self-identification historically, culturally and psychologically as a “European”, and impatience with all the wise and honest Americans who I know and who cannot collectively somehow re-enact the Emperor’s clothes nursery tale, there suddenly seemed nothing left to say worth saying, least of all around the topic of electronic information and digital society.
But then I returned to Nova Scotia again for the holidays, and in its clear, cold, sunny air it seems a dereliction of a blogger’s duty not to have a message at New Year. And by dint of looking over everyone’s shoulders, I see that Rule One of the New Year message is to make a recommendation, preferably to nominate something as the something of the Year. And as it happens I do have a Book of the Year for this information industry. Please read The Catalogue of Shipwrecked Books, by Edward Wilson-Lee. The inevitable pesky publisher’s subtitle in the US purports to sell it as a book about Christopher Columbus and his son, but the UK edition hits the point – it is about the attempt by Columbus’s son to build a universal library in Seville, getting royal patronage and setting up buying agents in the great early cities of print to create an early Internet Archive, making available a stream of knowledge as rich as the gold and silver of Peru and Mexico just then flowing into the royal coffers.
The attempt fails of course, but it does set off arguments about the nature of Knowledge which we need to keep having as we dimly perceive the arrival of the leading edge of the development of knowledge products and solutions. And here comes Rule Two: Issue a Warning. And here is mine – Refrain in 2019 from labelling everything you see as AI sourced, related or derived. We are still in the Colon Columbus stage in building the universal knowledge base. Let’s save AI as a term for when AI arrives. Many people are doing really clever things, but they are at best embryonic knowledge products. We are really quite far away from new knowledge created in a machine-driven context without human intervention. Indeed we are still a long way from getting enough information as metadata in a machine understandable form, and when we do we usually do not understand what we have done.
So here comes Rule Three: declare a News Story of the Year. And here is mine. The gracious acknowledgement by Amazon that their automated recruitment system, which analyses thousands of CVs to produce the best candidates, had a male bias built in to it. And of course it did! Feed the past into an expert system and it replicates the flaws of the past. And it’s not that the systems doing the analytics are not clever, it’s just that the dumb data and the dumb documents are not as dumb as we think, and in fact they are larded with all of the mistakes we have ever made. And we need to know that before we evaluate the outcomes as Intelligent, or even believable.
And if we need to be careful about the nature of the information we are using, we need to deal in known quantities. Rule Four: try to make an insight. Mine concerns differentiating between data and documents. The other night, as one does on a cold and isolated coastline, we fell to discussing derivations. My wife produced her weightlifter’s copy of Merriam-Webster, and we got into derivations old-style. Datum, neuter, is always related to single objects of an incontrovertible nature. Documentum carries the idea of learning throughout its history. When we talk about content-as-data, what do we really mean? And when we talk about AI, do we speak of Intelligence created by machines deriving knowledge from pure data, or of machines learning from knowledge available, fallacies and all, in order to postulate new knowledge? We do need to be clear about our expectations, as derived from our inputs, or we will surely be disappointed by what happens next. We need to start listening very carefully to conversations about concept analysis, concept-based searching and conceptual analysis.
Which logically brings me to Rule Five. End with a prediction. Mine would concern a question I asked in several sessions at Frankfurt this year and have had little but confusion as a result. My question was “What proportion of your readership is machines, and what economic benefits does that readership bring to you?”. I think machine readership will become much more important in 2019, as we seek to monetise it and as we seek to evaluate what content in context means in the context of analytical systems. So just as none of us knew how many machines were reading us this year, next year I think most of us will be aware. And whether those were just browsers, or bots, or knowledge harvesters, or what?
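For anyone who wants to start answering that question from their own access logs, here is a minimal sketch in Python of one way it might be approached. It assumes the only evidence available is the User-Agent string of each request, and the list of telltale substrings (`MACHINE_HINTS`) is my own illustrative guess, not an authoritative registry. Machines that present themselves as ordinary browsers will be miscounted as human, so treat the result as a lower bound at best.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical substrings that often appear in the User-Agent headers of
# automated clients; an illustrative list, not an authoritative registry.
MACHINE_HINTS = ("bot", "crawler", "spider", "python-requests", "curl", "wget")


def classify(user_agent: str) -> str:
    """Label a request 'machine' or 'human' from its User-Agent string."""
    ua = user_agent.lower()
    # An empty User-Agent is treated as a machine: browsers always send one.
    if not ua or any(hint in ua for hint in MACHINE_HINTS):
        return "machine"
    return "human"


def machine_share(user_agents: list[str]) -> float:
    """Return the fraction of requests classified as machine-made."""
    if not user_agents:
        return 0.0
    counts = Counter(classify(ua) for ua in user_agents)
    return counts["machine"] / len(user_agents)


# A tiny made-up sample standing in for a real access log.
sample = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/64.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "python-requests/2.21.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) Safari/605.1.15",
]
print(f"machine share: {machine_share(sample):.0%}")
```

Separating search bots from knowledge harvesters would need a second list of hints, and putting an economic value on either is a question no log file will answer by itself.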
And then I notice there is a Rule Six. You end by wishing every kind reader who reaches this point a happy, healthy and prosperous New Year, which I do for all in 2019. After all, using my rule-based system this column could be written by a machine next year – and read by one too!