Dec 31
Simple Rules for New Years Blogging
Filed Under Artificial intelligence, Big Data, Blog, data analytics, Education, Industry Analysis, internet, machine learning, mobile content, news media, Uncategorized, Workflow
Apologies to those kind readers who expected an earlier interjection in December. Truth to tell, I was speechless. Caught somewhere between astonishment at my fellow countrymen’s mania for national self-harm, my own complete self-identification, historically, culturally and psychologically, as a “European”, and impatience with all the wise and honest Americans whom I know and who cannot collectively somehow re-enact the Emperor’s New Clothes nursery tale, there suddenly seemed nothing left to say worth saying, least of all around the topic of electronic information and digital society.
But then I returned to Nova Scotia again for the holidays, and in its clear, cold, sunny air it seems a dereliction of a blogger’s duty not to have a message at New Year. And by dint of looking over everyone’s shoulders, I see that Rule One of the New Year message is to make a recommendation, preferably to nominate something as the Something of the Year. And as it happens I do have a Book of the Year for this information industry. Please read The Catalogue of Shipwrecked Books, by Edward Wilson-Lee. The inevitable pesky publisher’s subtitle in the US purports to sell it as a book about Christopher Columbus and his son, but the UK edition hits the point – it is about the attempt by Columbus’s son to build a universal library in Seville, getting royal patronage and setting up buying agents in the great early cities of print to create an early Internet Archive, making available a stream of knowledge as rich as the gold and silver of Peru and Mexico just then flowing into the royal coffers.
The attempt fails of course, but it does set off arguments about the nature of Knowledge which we need to keep having as we dimly perceive the leading edge of knowledge products and solutions arriving. And here comes Rule Two: Issue a Warning. And here is mine – refrain in 2019 from labelling everything you see as AI-sourced, AI-related or AI-derived. We are still at the Hernando Colón stage of building the universal knowledge base. Let’s save AI as a term for when AI arrives. Many people are doing really clever things, but they are at best embryonic knowledge products. We are really quite far away from new knowledge created in a machine-driven context without human intervention. Indeed we are still a long way from getting enough information as metadata in a machine-understandable form, and when we do, we usually do not understand what we have done.
So here comes Rule Three: declare a News Story of the Year. And here is mine: the gracious acknowledgement by Amazon that their automated recruitment system, which analyses thousands of CVs to produce the best candidates, had a male bias built into it. And of course it did! Feed the past into an expert system and it replicates the flaws of the past. And it’s not that the systems doing the analytics are not clever; it’s just that the dumb data and the dumb documents are not as dumb as we think, and in fact they are larded with all of the mistakes we have ever made. And we need to know that before we evaluate the outcomes as Intelligent, or even believable.
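For the technically minded, a minimal toy sketch in Python makes the point. It assumes NumPy and scikit-learn, the data and column names are invented for illustration, and it is not any company’s actual pipeline: train a classifier on “historical” hiring decisions that carried a bias, and the model dutifully learns the bias back.

# A toy illustration, not a real recruitment system: the historical decisions were
# nudged by gender, and the fitted model recovers that nudge as a learned weight.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
skill = rng.normal(size=n)              # the signal we would like hiring to reflect
gender = rng.integers(0, 2, size=n)     # toy encoding: 0 or 1

# Past decisions: mostly skill, plus a thumb on the scale, plus noise.
past_hired = (skill + 0.8 * gender + rng.normal(scale=0.5, size=n)) > 1.0

model = LogisticRegression().fit(np.column_stack([skill, gender]), past_hired)
print("learned weight on skill :", round(model.coef_[0][0], 2))
print("learned weight on gender:", round(model.coef_[0][1], 2))   # clearly non-zero

The non-zero weight on the gender column is not the algorithm being stupid; it is the past being faithfully reproduced.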
And if we need to be careful about the nature of the information we are using, we need to deal in known quantities. Rule Four: try to make an insight. Mine concerns differentiating between data and documents. The other night, as one does on a cold and isolated coastline, we fell to discussing derivations. My wife produced her weightlifter’s copy of Merriam-Webster, and we got into derivations old-style. Datum, a neuter noun, is always related to single objects of an incontrovertible nature. Documentum carries the idea of learning throughout its history. When we talk about content-as-data, what do we really mean? And when we talk about AI, do we speak of Intelligence created by machines deriving knowledge from pure data, or of machines learning from knowledge available, fallacies and all, in order to postulate new knowledge? We do need to be clear about our outputs, as derived from our inputs, or we will surely be disappointed by what happens next. We need to start listening very carefully to conversations about concept analysis, concept-based searching and conceptual analysis.
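To make the keyword-versus-concept distinction concrete, here is a small hypothetical sketch in plain Python; the thesaurus and documents are invented for illustration, and real concept analysis is of course far richer than a synonym table.

# Hypothetical illustration: literal keyword search versus concept-based search.
CONCEPTS = {
    "heart attack": "C_MYOCARDIAL_INFARCTION",
    "myocardial infarction": "C_MYOCARDIAL_INFARCTION",
    "cardiac arrest": "C_CARDIAC_ARREST",
}

docs = {
    "doc1": "Early treatment after a heart attack improves outcomes.",
    "doc2": "Myocardial infarction incidence rose in the study cohort.",
    "doc3": "Survival rates after cardiac arrest vary widely.",
}

def keyword_search(query, corpus):
    return [d for d, text in corpus.items() if query.lower() in text.lower()]

def concepts_in(text):
    text = text.lower()
    return {cid for term, cid in CONCEPTS.items() if term in text}

def concept_search(query, corpus):
    wanted = concepts_in(query)
    return [d for d, text in corpus.items() if wanted & concepts_in(text)]

print(keyword_search("heart attack", docs))   # ['doc1'] - the words must match
print(concept_search("heart attack", docs))   # ['doc1', 'doc2'] - the concept matches

The second query finds the document that says the same thing in different words, which is the beginning of treating documents as carriers of learning rather than strings of data.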
Which logically brings me to Rule Five: end with a prediction. Mine concerns a question I asked in several sessions at Frankfurt this year, and I have had little but confusion as a result. My question was: “What proportion of your readership is machines, and what economic benefits does that readership bring to you?” I think machine readership will become much more important in 2019, as we seek to monetise it and as we seek to evaluate what content in context means in the context of analytical systems. So just as none of us knew how many machines were reading us this year, next year I think most of us will be aware – and will know whether those readers were just browsers, or bots, or knowledge harvesters, or something else entirely.
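By way of illustration only, here is the sort of crude first estimate a publisher might make from its own access logs. It is a hypothetical Python sketch assuming combined-log-format lines and a naive user-agent heuristic; real bot and harvester detection is considerably harder than this.

# Hypothetical sketch: estimate the share of "machine readers" in an access log
# by inspecting user-agent strings. The sample lines below are invented.
import re

BOT_PATTERN = re.compile(r"bot|crawler|spider|harvest|scrapy|curl|wget", re.IGNORECASE)

sample_log_lines = [
    '1.2.3.4 - - [31/Dec/2018:10:00:00 +0000] "GET /article/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '5.6.7.8 - - [31/Dec/2018:10:00:05 +0000] "GET /article/123 HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '9.9.9.9 - - [31/Dec/2018:10:00:09 +0000] "GET /article/456 HTTP/1.1" 200 4096 "-" "ResearchHarvester/0.3"',
]

def user_agent(line):
    # In combined log format the user agent is the final quoted field.
    return line.rsplit('"', 2)[-2]

total = len(sample_log_lines)
machine = sum(1 for line in sample_log_lines if BOT_PATTERN.search(user_agent(line)))
print(f"machine readers: {machine}/{total} ({100 * machine / total:.0f}%)")

Even a count this rough would be a start on answering the question; attributing economic value to that machine readership is the harder, and more interesting, part.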
And then I notice there is a Rule Six. You end by wishing every kind reader who reaches this point a happy, healthy and prosperous New Year, which I do for all in 2019. After all, using my rule-based system this column could be written by a machine next year – and read by one too!
Oct 26
Towards Self-Publishing in Scholarly Communications
Filed Under Artificial intelligence, Big Data, Blog, data analytics, eBook, Industry Analysis, internet, machine learning, Publishing, Reed Elsevier, semantic web, STM, Uncategorized, Workflow
Standing in the crowded halls of the Frankfurt Book Fair is as good a place as any to fantasize about the coming world of self-publishing. After detailed discussion about Plan S, or DUL, or Open Access books, one can easily become waterlogged by the huge social, political and commercial pressures built up in our methodologies of getting work to readers. In general terms, intermediaries arise where a process is so complex that neither originators nor ultimate users can cope with it without facilitation. So publishing in Europe was a refinement of the eighteenth-century role of the bookseller, adding selection and financing to booksellers’ existing self-publishing model. In the next two centuries the new business model became so entrenched – and, for some, so profitable – that their successors behaved as if it was ordained by God and nature, and would live for ever. Intermediation will indeed probably be required as far ahead as we can predict, but it is certain to change, and it is not certain to include all of the roles that publishers cherish most deeply.
Two episodes this week reinforce some of these feelings. In one instance, the scholarly market software company Redlink (https://redlink.com/university-of-california-press-adds-remarq/) announces an agreement to supply its software toolset to the University of California Press. Nothing unusual here, but something symptomatic. More and more publishers are using clever tools to heighten value and increase discoverability. But those software tools are becoming more and more “democratic” – they can be used in good machine learning contexts to help to generate more technical skill at different points in the network, both before and after the “publishing process”. In other words, the more it becomes clear to, say, a funder or a research unit or a university that the divine mystery of publishing devolves to a set of software protocols, the more likely it is, given that publishers cannot control digital dissemination, that the control point for content release will migrate elsewhere. In a previous note I mentioned UNSILO’s manuscript evaluation system with very much the same thought in mind – while the pressure is on traditional publishers to arm themselves with the new intelligent tools, for competitive purposes as well as to increase speed and reduce cost, these tools also contain the seeds of a transition to a place where research teams, institutions and funders can do the publishing bit effectively for themselves. So the question left on the table is: what other parts of the processes of scholarly communication are left requiring intermediary support?
And so the struggle now is to look at those other parts of the scholarly research and communications process that are hard to gather and bring into focus and analysis. It was interesting in this light to reflect that Web of Science Group and Digital Science are already well down this track. Gathering together peer review and making sense of it (Publons) is the sort of thing that only an outside agency can do effectively, just as collecting and analysing posters (Morressier) will release layers of value previously unrecognised. And while many bemoan the abject failures in optimizing research funding through effective dissemination and impact of the results, only Kudos have really grasped the nettle and begun to build effective dissemination planning processes. But how can these interventions be put together and scaled? And how can we ensure the dataflows are unpolluted by self-promotion or by lack of verification and validation?
Some of these questions cannot be resolved now, but do we know that we are at least moving in a self-publishing direction? Well, the Gates Foundation and Wellcome – and perhaps Horizon 2020 – seem to think so, even if they use intermediaries to help them. Researchers and academics are substantially self-publishers already, producing posters, blogs, tweets, annotations, videos, evidential data, letters and articles online with little assistance. And it was interesting to see the Bowker report of last week, which indicated 38% growth in self-publishing last year, to over a million new books across print and e-publishing, though ebooks are doing far less impressively. And then:
“Since 2012, the number of ISBNs assigned to self-published titles has grown 156 percent. “
http://www.bowker.com/news/2018/New-Record-More-than-1-Million-Books-Self-Published-in-2017.html
Of course, this may just reflect consumer trends, but such trends alter attitudes to the possible in other sectors. Certainly the economic impossibility of the academic monograph in many fields will be affected by the growth of a library crowd-funding model (Knowledge Unlatched), and this will extend to departmental and institutional publishing in time.
So I left my 51st Frankfurt in buoyant mood, thinking of the day when it is renamed the Frankfurt Publishing Software and Solutions Fair, held on one floor, and I can once again get into the bar of the Hessischer Hof with a half-decent chance of getting a seat – and a drink!
And then, as I was finishing, came this: https://osc.universityofcalifornia.edu/2018/10/open-source-for-open-access-the-editoria-story-so-far/