May 18
Facing up to Father: The pleasures and pains of a Cotswold childhood
New book by David Worlock. Pre-order now at Marble Hill Publishers or Amazon.
A small Cotswold farm is the setting for a classic struggle of wills. Robert Worlock, eccentric and demanding, resolutely maintains the old ways, determined above all to make his son into a farmer fit to take over the family acres. His son, David, is equally determined not to be bullied into something he neither wants nor likes. His childhood becomes a battleground: can he find a way to make his father love him without denying his right to determine his own life?
Jun 24
The conversation often goes like this:
“What do you think are the most important issues for the information industry today?”
“Well, of course it’s AI, and getting these AI developers to act responsibly around data.”
“You mean, act responsibly and transparently and identify the data used and held in their models?”
“Yes, of course, and acting responsibly also means paying a decent license fee for our data content!”
“Yes, they have to realise that they cannot ignore the powerful legal and moral position of those who hold copyrights in valuable data.”
I too am a firm advocate of data licensing for AI modelling reuse. When IP is used for any purpose, I believe that the use has to be recognised, it has to be by consent, and proper acknowledgement in monetary terms needs to be made for the effort and curation involved. In saying this, of course, I also want to make it clear that I know that most data owned by the B2B organisations that use it in information services was not the original IP of those owners: the current owners obtained the data in the course of creating an information service of some sort or another. In doing this, they often edited it, structured it, improved it, added metadata to it and created value as a result. The original owners – governments, private citizens, research organisations, corporate bodies and the like – create the data by virtue of their existence and their activity, and in some instances need it to be collected and manipulated for reasons of public policy, research and innovation, compliance activity or reputation management. For most information service providers, the data that they have collected is the most valuable commodity in their world – “the oil of the virtual world”. They prize it highly and they think it is unique. They are right to value it, but we are all becoming gradually aware that there is more data in the world than is contained in the world’s commercial databases, the Cloud or even the Internet.
In the course of looking for and trying to map the various emerging data licensing agencies, the breadth of possibility becomes clear. The powerhouse that is CCC, the Copyright Clearance Center (www.copyright.com), is central to everything and is concentrated around scientific and medical data. ProRata (prorata.ai) builds AI-based attribution and monetization technologies and solutions that credit and compensate content owners for the value of their work. Human Native (humannative.ai) says “Better AI starts with better data. We bring together suppliers of high quality, premium data with reputable AI developers – come join the ecosystem”. Created by Humans (createdbyhumans.ai) calls itself “The AI rights licensing platform for books”, while Narrativ (narritiv.ai) is a licensing site for voices and voice data. And the Data Licensing Alliance, run by Dave Myers, is more than four years old and seeks to build a marketplace of buyers and sellers in STEM data (www.diadata.com).
Yet all of this rich variety exists in the domain of human creativity. The needs in data terms of AI models are not confined to human creativity. The potential use of data derived from machine intelligence now becomes a factor in creating AI models, and just as we have heard about synthetic data in terms of financial services, so we are now beginning to think about synthetic data in terms of AI modelling. The announcement last week of the funding of SandboxAQ by Nvidia takes this former Google startup into new territory.
SandboxAQ (www.sandboxaq.com) is, it says, “leading the next wave of enterprise AI with Large Quantitative Models (LQMs) – grounded in physics and built to simulate real-world systems. Across biopharma, chemicals, advanced materials, cybersecurity, healthcare, navigation, and more – LQMs provide the scientific accuracy and computational scale to solve the world’s most complex challenges.” So, in financial services and in scientific research and innovation at least, we can make our own data and not be wholly dependent upon the world of owned and traded data. And as this new scenario becomes apparent, some of us will begin to wonder what its effect will be on real-world data valuations.
The use of AI in this way to create logical extensions of existing knowledge is already well established. I notice that the industry is beginning to refer to “synthetic” data as opposed to the “real world data” (inevitably, RWD) found in books and journals, government reports and newspapers. Of course, AI businesses, large or small, point to the licensing cost of data as a crippling tax which will restrict innovation. So far it does not seem to have strangled the competitive appetites of Silicon Valley, but will it stop start-up innovators in small markets and niche sectors?
It seems that the data industry is thinking about that already. There is serious activity now around the idea of Open Data in this context (as it already exists in Open Science), not just as a way of sharing datasets amongst researchers, but also as a way of using Open Data to help small-scale developers build effective models without severe licensing costs. Common Pile v0.1 is a development of this type (https://huggingface.co/blog). The duty of ensuring that data is complete, accurate, and has not been distorted or polluted is a vital one, and ensuring that building effective models is not limited to the developers with the deepest pockets is important as well. The huge collaboration that has built the Common Pile (the University of Toronto and the Vector Institute, Hugging Face, the Allen Institute for Artificial Intelligence, Teraflop AI, Cornell University, MIT, CMU, Lila Sciences, poolside, the University of Maryland, College Park, and Lawrence Livermore National Laboratory) is trying to build public standards in terms of both quality of data and transparency. We should all be grateful for their work.
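To make the point concrete for the technically minded: here is a minimal sketch, not drawn from the Common Pile announcement itself, of how a small-scale developer might sample such an openly licensed corpus from the Hugging Face Hub using the standard datasets library. The dataset identifier in the snippet is an assumption for illustration only; the real names should be checked on the Common Pile organisation page on the Hub.

```python
# Minimal sketch: streaming a slice of an open corpus from the Hugging Face Hub.
# The dataset identifier below is hypothetical; verify the actual Common Pile
# dataset names on the Hub before relying on it.
from itertools import islice
from datasets import load_dataset

stream = load_dataset(
    "common-pile/comma_v0.1_training",  # assumed identifier, for illustration
    split="train",
    streaming=True,  # iterate over records without downloading the whole corpus
)

for record in islice(stream, 3):
    # Schemas vary between corpora, so inspect the keys rather than assume them.
    print(sorted(record.keys()))
```

The point of streaming, in this context, is simply that a developer with no budget for storage or licensing can inspect and experiment with openly released data before committing to a full training run.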
So now we have data in a variety of forms. Information industry data that can be exchanged and traded shares the business of AI model development with Open Data resources built and released for the very purpose, and with AI-created data built as a way of testing probability and computing the logical extensions of the world we already know. Is this also a pointer towards the ability of the machines to create the resources required by the machine? Perhaps we should be thinking not just about the value of data and data licensing transactions, but also about the duration and lifespan of data licensing markets themselves.
May 26
This is what it comes down to, and you can take my word for it. I have been commentating upon the marketplace for information for 40 years now, and I know well enough that no one listens unless you either anticipate the death of something or predict a revolution in something else. No half measures will do. Gradual evolution? Progressive change? These cut no ice with investors or their bankers, or with heritage or legacy businesses or startup entrepreneurs. And unless we have death or revolution, we do not get the angst-ridden, guilt-induced introversion that afflicts the middle and senior management of successful, stable and progressive companies in slow-changing markets. The threat of death or revolution makes them worried and keeps strategy consultants employed. I have been wonderfully indebted to death or revolution these 40 long years.
This system, so beneficial to consultants, works well as long as we call out the right deaths and name the right revolutions. If we get it wrong, then harms far worse than a misguided strategy evaluation could be the result. And now, perhaps, we are in imminent danger of getting it very wrong. We seem to be concentrating upon the localised effects of generative AI in particular, both on publishing processes and on new product development. We are also, quite rightly, deeply concerned with data licensing and with trying to prop up the copyright conventions that have underpinned the information marketplace for so long. We do not seem to be looking at the effect of AI more broadly on the users of published information. The equivalent injunction to “follow the money” in information markets used to be “follow the eyeballs”. Has AI, and what it does for us, had the effect of distracting us from paying total attention to what our users are actually doing – with AI?
Perhaps then we need to look beyond the way in which “death or revolution” discussions have taken place in the past. Surely this whole idea of an information-based society, in which information creators are served by intermediaries who add value and aggregation before creating commercial relationships with users, seems like what it is – the last relic of the Gutenberg age. We need a new model that reflects the dynamic relationships of the network, the complete personalisation and customisation of content as data, and the ability of the ultimate user to add the ultimate values required.
Much of my past 40 years has been spent helping and advising, among others, the publishers of scholarly journals focused on scientific research and discovery. This sector has almost always been a bellwether, a place where signs of the future may be detected. Early and intelligent users of new technologies abound. Big issues include integrity, where AI plays a role on both sides of the equation. Other concerns are focused on processing and its costs: AI will be influential here. Relatively few people are concerned with an issue which seems to me quite central: every year fewer and fewer researchers are reading original articles. AI summarisation, and the overall increase in article generation, have meant that fewer researchers have the time or energy to read the bulk of new material published in their discipline. The resulting paradox – more and more articles produced but fewer and fewer human readers – seems to me to add up to a perfect “death of publishing” argument. My view, for the past five years, has been that we will create a self-publishing environment in which acts of verification, peer review and value estimation take place quite separately from the initial appearance of scholarly findings as self-posted articles on pre-print servers, in blogs or in other postings. The commercial activity will not be in publishing; it will be in software and data services.
Another field in which I have found myself working regularly over these years has been credit reporting and credit rating. Here is a world which seemingly depends upon standards, data frameworks and criteria, and strict rules of verification. Companies working in this field have been at pains to build trusted brands which allow users to have faith in their ability to maintain the standards. Ten years ago, at a conference in Hong Kong, I argued that intelligent software would one day replace branded services. Today I feel that door is being pushed open, though not yet to the point where trading companies are able to use their own evaluations to create trusted partnerships. But still, the question is now on the table – do we need an intermediary to establish trust between trading partners?
In 1979 I left what was then conventional publishing to start work in a legal information retrieval initiative. A start-up in almost every sense of the word. We put the entirety of the laws and statutes, and the historical case law, of the United Kingdom onto a computer, and lawyers searched it through a landline and modem. Things change. The world in which I once strived and struggled is no longer about information retrieval, but entirely concerned with the delivery of legal services and solutions. Today’s players make law practices and corporate counsel more effective, but, as has always been the threat, technological change may remove the intermediary. It is not surprising now to find London Magic Circle law practices whose technology is as advanced as that of their technology suppliers. AI does that: it removes the imbalance of knowledge and power between the user and the supplier.
And didn’t it always do that? Is this not the Internet revolution at last coming home to roost? A network of relationships changes those relationships. A virtual network is very different from a real-world network. In the late 1990s and in the early years of this century we spoke the language of “disintermediation”. Then we forgot to watch it happening. We see the network effect in our own lives every day. Very old people can clearly recall secretaries and typewriters; younger people can remember going to see a bank manager, or any of the host of in-person services which have now collapsed into the network. Intermediation is ending, and successful navigation of our world is increasingly driven by the time and effort expended by the individual end user, enabled and increasingly supported by AI. I am not saying that this is wrong or bad: I just want to notice the difference and the way in which AI develops and expands to fit the needs and requirements of ultimate users. It is at least possible that, at some future date, one of the quality of life factors which may be vitally important to each individual, alongside clean air and water, access to electricity and Internet bandwidth, will be the quality of AI available and affordable to each of us.