Phil Cotter’s comment on last week’s post here really got me going. Now that I know that suicide bombers max out their credit cards before setting off to do the deed, I somehow feel a gathering sympathy for the security services. So the starting point is 5 million up-to-the-limit cards? We need to funnel cash into predictive analytics urgently if anything we do is to show better results than airport security (to begin from a very low measure indeed). So I began to look for guidelines in the use and development of predictive analytics, thinking that while we wait for terrorist solutions we might at least get a better handle on marketing. I am surprised and impressed by how much good thinking is available, so in the spirit of a series of blogs last year (Big Data: Six of the Best) here are some starting points on innovative analytics players, all of whom have resonance for those of us who work in publishing, information and media markets. And a warning: the specialized media in these fields all seem to have lists of favoured start-ups entitled “50 Best Players in Data Analytics”, so I am guilty of scratching only lightly at the start-up surface here.

In the same spirit of self-denial that drives me to abstain from a love of eating croissants for breakfast, I have also decided to stop using the expression “B** D***”. I am so depressed by publishers asking what it means, and then finding that, because of “definition creep” or “meaning drift”, I have defined it differently from everyone else, including my own last attempted definition, that I am going to cease the usage until the term dies a natural, or gets limited to one sphere of activity. So Data Analytics is my new string bag, and Predictive Analytics is the first field of relevant activity to be placed inside it. Or do I mean Predictive behaviour analytics?

I was very impressed by analysts studying our use of electricity (http://www.datasciencecentral.com/profiles/blogs/want-to-predict-human-behavior-use-these-6-lessons-based-on-data-), since the work throws up some lessons which we should bear in mind as we push predictive analytics into advertising and marketing. The thought that it is easier to influence human populations through peer pressure and an appeal to altruism than through offers of “two for one”, cash bonuses and discounts is clearly true, yet our behaviour in marketing and advertising demonstrates that we act as if the opposite were the case. The emphasis on knowing the industry context (all analytics are contextualised) and the thought that, even today, we tend to try to make the analysis work on insufficient data both ring true for me. We also need to develop some scientific rigour around this type of work, using good scientific method to develop and disprove working hypotheses. Discerning the signal from the noise, like “never stop improving”, is vital, as well as being hard to do. I ended this investigation thinking that if the science is young, the attitudes of users as customers are even more immature. If we are to get good results we have to school ourselves to ask the right questions, and to know which of our expectations are least likely to be met.
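What does “good scientific method” look like in a marketing context? A minimal sketch, with entirely hypothetical campaign numbers: test the electricity-study claim that a peer-comparison message outperforms a discount offer by comparing conversion rates with a two-proportion z-test, rather than eyeballing the difference.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z-statistic for the difference between two conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical A/B result: peer-comparison message vs. discount offer
z = two_proportion_z(230, 2000, 180, 2000)  # 11.5% vs. 9.0% conversion
print(round(z, 2))  # about 2.61, beyond the usual 1.96 threshold
```

A z above roughly 1.96 lets you reject the hypothesis that the two messages perform equally; anything less and the honest conclusion is “insufficient data”, which is exactly the trap the electricity analysts warn about.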

Which brings me to the people we should be asking. Amongst the sites and companies that I looked at, many were devoted, from differing angles, to marketing and advertising. But they took such varied approaches that you could imagine using several in different but aligned contexts. Take a look for example at DataSift (www.datasift.com). It now claims some 70% accuracy (this is a high number) in sentiment tracking, creating an effective toolset for interpreting social data. Here is the answer to those many publishers in the last year who have asked me “what is social media data for, once you have harvested it?” Yet this is completely different from something like SumAll (https://sumall.com), which is a marketer’s toolset for data visualization, enabling users to detect and display the patterns that analysis creates in the data. Then again, marketing people will find MapR (www.mapr.com) fascinating, as a set of tools to support pricing decisions and develop customer experience analytics. Over at Rocket Fuel Inc (www.rocketfuel.com) you can see artificial intelligence being applied to digital advertising. As a great believer in sponsorship, I found their Sponsorship Booster modelling impressive. This player in predictive modelling has venture capital support from a range of players, from Summit to Nokia.
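For readers wondering what “sentiment tracking” actually does under the bonnet, here is a toy lexicon-based scorer. To be clear, this is my own illustrative sketch, not how DataSift works; commercial services use far richer models and training data, which is precisely why 70% accuracy is a high number.

```python
# Toy sentiment scorer: count positive and negative words from small
# hand-built lexicons. Real services model context, negation and slang.
POSITIVE = {"great", "love", "excellent", "impressive"}
NEGATIVE = {"bad", "hate", "awful", "disappointing"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this impressive new toolset"))  # positive
```

Even this crude approach shows why harvested social data has a use: run it over every tweet mentioning your brand and you have a trend line, which is the raw material for everything the marketing toolsets below visualize.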

When the data is flowing in real time, different analytical tools are called for, and MemSQL (www.memsql.com) has customers as diverse as Zynga, Credit Suisse and Morgan Stanley to prove it. Zoomdata (www.zoomdata.com) is a wonderful contextualization environment allowing users to connect data, stream it, visualize it and give end-user access to it – on the fly. This is technology which really could have a transformative effect on the way that you interface your content to end users, and you can demo it on the Data Palette on the site. And finally, do you have enough of the right data? Or does some government office somewhere have data that could immensely improve your results? Check it on Enigma (press.enigma.io), the self-styled “Google of Public Data”, a discovery tool which could radically change product offerings throughout the industry. Perhaps it is significant that the New York Times is an investor here.
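Why do real-time feeds demand different tools? Because you can never hold the whole stream: you maintain rolling aggregates over a window and discard the rest. A minimal sketch of that idea (my own illustration, not any vendor’s implementation):

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen in the last `window_seconds` -- the kind of
    rolling aggregate a real-time analytics engine maintains on the fly."""
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()

    def record(self, timestamp: float) -> None:
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self.events)

    def _evict(self, now: float) -> None:
        # Drop everything that has fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

counter = SlidingWindowCounter(window_seconds=60)
for t in (0, 10, 30, 75):
    counter.record(t)
print(counter.count(now=80))  # only the events at t=30 and t=75 remain -> 2
```

Batch analytics answers “what happened last quarter?”; this answers “what is happening right now?”, which is the question a trading floor or a game publisher is actually paying for.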

So, for the publisher who has built the platform and integrated search, and perhaps begun to develop some custom tools, there is a very heartening message in all of this. A prolific tool set industry is growing up around you at enormous pace, and if these seven culled from the data industry long lists are anything to judge by, the move from commoditized data, increasingly free on the network, to higher levels of value-add which preserve customer retention and enhance brand is well within our grasp.

However, I bet the US government gets you first! Even as the newspapers (Guardian 07/06/13) reproduce slides used to train US security and FBI officials in the use of the data feeds they get from Google, Verizon, AOL, Apple, Facebook et al (from whom, under the FAA enactment, they can now download usage data and user content), the German Federation of Consumer Organizations (VZBV) has brought an action against Apple – and won in the Regional Court of Berlin. One of Apple’s offences was sharing data with subsidiaries; another was re-using data not directly gathered in the trading activity (for instance, the recipient’s details on a gift certificate sale). If I worked in Apple’s legal department, dedicated to taking no prisoners in any legal wrangle, I would be getting fairly schizoid by now. As indeed I am, whenever I use terms like Open Access, Open Data, Open Society, Open Sesame… and then reflect on the attempt by everyone in the networked society and the network economy to suborn and subvert every data instance into own-able IP.

And now, a word of explanation. My silence here in the last 10 days reflects my listening elsewhere. And speaking: at two Future of Publishing sessions, in London and New York, sponsored by IXXUS (www.Ixxus.com), and at a seminar organized by the University of Southampton’s Web Sciences Doctoral students group (slides are available here). At each of these sessions we discussed the networked society and its implications. And Big Data reared its ugly head, strengthening my resolution never to mention the apparently undefinable term again, but to talk instead of massive data components and the strengthening business of data analytics. But nowhere did we discuss data protection – or revelation – and I regret that, especially now that the brilliant latest issue of DataIQ, the journal of DQM Group (www.dqmgroup.com), has come to hand. On page after page it nails the issues that every data holder should have in mind, and which our networked content industry grievously neglects at its peril. If the FBI don’t get you, the German courts will!

But I am less surprised, on reflection, by the US revelations than I thought I would be, bearing in mind the huge amounts of high-level analytics and search software that agencies of the US government have bought over the years. On the one hand we should be grateful that a degree of paranoia has spawned an industry. This is where Seisint (Lexis Risk) came from, here is where Palantir and other software developers have flourished. These software developments were always intended for more than calculating the Presidential expenses or searching the library. The Military/Intelligence complex has been a rich patron for developing many of the tools that the networked society depends upon. On the other hand we should reflect that mass observation on this scale is the Orwellian manifestation of a police state, and that those who battle for the liberty of the individual are betrayed if it becomes necessary to infringe that liberty in the cause of protecting it. In saying this, I should also say that I am sure that the UK government would be equally intrusive if they could afford it, and in times like these the natural tendency of governments to use National Security as the cloak for the erosion of civil liberties is global. But after the emergency, do you ever remember government giving privacy rights back?

Which brings me to the network protection of user data in non-security contexts. Here there can be no doubt about who owns what: the problem is getting people to admit to the obvious. Thus, it seems to me axiomatic that when I use a networked service, the transactional data that I input remains mine, unless or until I have accepted conditions of service that say otherwise. And even those conditions cannot rob me of my ownership: all they can do is define agreed conditions of re-use for the people I am dealing with at the time. Eventually, in the network, we will each of us be able to isolate every data entry that we make anywhere and store it in a personal DropBox in the Cloud. We will then sell or gift rights of re-use over designated parts of it to Apple, to market researchers, to the US government as part of a visa waiver application. But we shall at least be in control, and have pushed back on the arrogance of data players who seem to believe that every sign-on is their property. It is this type of “unthinking”, I am sure, that lies behind Bloomberg’s huge intrusion into user rights when they allowed their news team to examine the access records of their clients. I know we do not like Bankers in our society now (Don’t worry, Doctors and Lawyers and Journalists and Politicians – we shall be back for you again just as soon as we have finished off these financiers), but surely no one at Goldman Sachs deserves a call from a news reporter saying “I see you have not used your terminal this week, so are you still employed?”
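The personal-data-vault idea above can be made concrete. A speculative sketch, with invented names throughout: each record stays owned by the individual, and re-use is an explicit, revocable grant per party rather than a blanket surrender at sign-on.

```python
# Sketch of a personal data vault: ownership stays with the individual,
# and re-use is a per-party, revocable grant. Names are illustrative only.
class PersonalDataVault:
    def __init__(self):
        self.records = {}   # record_id -> data
        self.grants = {}    # record_id -> set of parties allowed re-use

    def store(self, record_id: str, data: str) -> None:
        self.records[record_id] = data
        self.grants[record_id] = set()

    def grant(self, record_id: str, party: str) -> None:
        self.grants[record_id].add(party)

    def revoke(self, record_id: str, party: str) -> None:
        self.grants[record_id].discard(party)

    def read(self, record_id: str, party: str) -> str:
        if party not in self.grants.get(record_id, set()):
            raise PermissionError(f"{party} has no re-use grant")
        return self.records[record_id]

vault = PersonalDataVault()
vault.store("purchase-2013-06", "gift certificate sale")
vault.grant("purchase-2013-06", "market-researcher")
print(vault.read("purchase-2013-06", "market-researcher"))
```

Under this model a Bloomberg reporter asking for a client’s terminal access records would simply get a PermissionError: no grant, no data.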

Here in Europe we pride ourselves on ordering things differently. Our secret weapon is Germany, where, for fairly obvious historical reasons, privacy is now a fetish and data protection has become a goal pursued on behalf of citizens by a lobby of what can only be described as, well, privacy fundamentalists. The current revision of the European Data Protection Directive (95/46/EC) into European law will effectively turn the current opt-out regime in the EU into an opt-in world. Not necessarily a bad thing, says Mark Roy, CEO of the Data Agency, in an article in DataIQ. I agree, and I also agree that the right of erasure (the right to be forgotten) is pretty difficult to manage. But the real horror story is the bureaucracy, the checking, the courts and the fines that all of this entails. Somewhere here there has to be a balance between German fanaticism and US laissez-faire regarding the rights of individuals to the ownership of their own information. We have never seemed further from creating this essential building block of a networked society.
