In the cold, wet and dark of an English summer it can be hard to remember that elsewhere matters digital are progressing at breakneck speed. In these circumstances spending last week in the US and getting sundry updates from European colleagues was like a punch on the nose. Everything and everyone is getting cleverer, and everything that was once the standard and the value point is now commoditized. Even the poor old science journal, protected treasure of countless publishers, can now be launched out of a box, by any university, research team or laboratory with an internet connection (http://www.scholasticahq.com/). The commoditized article, meanwhile, realizes new potential if you envisage it not as the final outcome of the science process, but as an interface itself to a deeper understanding of the knowledge pathway.

If I had doubted this then I had a rude shock when looking at what Jan Velterop and his colleagues have been up to at Utopia Docs (http://utopiadocs.com/media/introduction/). In its first manifestations this company was about getting more linkage and metadata value from science articles, and I wrote about it under the heading “I Can See So Clearly Now” (https://www.davidworlock.com/?p=903) in October 2011. Then I tucked it into the semantic publishing cubbyhole, until warned by one of Jan’s colleagues that I was in danger of not seeing very clearly now at all. And he was right. Utopia Docs 2.0 is worth consideration by anyone as an operational interface for lab research users. Different to but just as valid as Mendeley, and indeed incorporating information from that service as it shows users the relationships an article may have – from Altmetric as well as Mendeley, from Cross-Ref as well as Sherpa/RoMEO (best recommendation for Open Access – use this to move from article to article, opening each one without recourse to a source site or permissions form or subscription barrier/validation). But the real joy is in the way it handles figures, graphs and chemical structures: if all of this referencing, rotation and reconstituting can be done with figures, then connecting in complex datasets should be easy as well. Add some “social” – bookmarks and comments from other readers/users – and some searchable aids like Wikipedia cross-references and lab products directories, and then you can see the huge distance travelled from beta. As we all now contemplate the science desktop and the service interfaces which will dominate it, here is another real contender.
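To make the “article as interface” point concrete, here is a minimal sketch, emphatically not Utopia’s code, of the kind of cross-linking involved, using CrossRef’s public REST API (api.crossref.org); the DOI in the example is simply a placeholder for any article of interest.

```python
# Minimal sketch: treating an article as a doorway into its linkage graph,
# in the spirit of the services described above. Uses CrossRef's public
# REST API; this is illustrative only, not Utopia Docs' implementation.
import json
import urllib.request

def crossref_work(doi):
    """Fetch CrossRef metadata for a DOI: title, citation count, references."""
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"]

if __name__ == "__main__":
    # Placeholder DOI: replace with any DOI you care about.
    work = crossref_work("10.1000/xyz123")
    print("Title:", work.get("title", ["?"])[0])
    print("Cited by:", work.get("is-referenced-by-count", 0), "other works")
    # Each reference carrying its own DOI is another doorway: follow and repeat.
    for ref in work.get("reference", [])[:5]:
        print("  ->", ref.get("DOI", ref.get("unstructured", "no identifier")))
```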

It seems to me that science and technology are now moving rapidly down this knowledge handling and referencing track. Yet everything done here is fully applicable to B2B in general: just because “publishing” was different in these narrow segments, there is no reason why information services should be different at all. Looking at Innovadex (www.innovadex.com) this week, I realised that there are now very few business, industrial or scientific sectors without full service vertical search, hugely well attuned to definable client requirements, with service values attached to front and back end. We all still use Google/Bing et al, but when we have heavy duty knowledge-based work to do, there is usually a specialised can-opener to hand now ready to do the job. And these will begin to coalesce with content as the continuing consolidation of our industry takes place. Step up this week’s consolidation case study: IHS and GlobalSpec.

As one who has long carried a torch for GlobalSpec (www.globalspec.com), I want to congratulate Jeff Killeen and his team on an outstanding job, and Warburg Pincus, who have backed this company since 1996, for extraordinary foresight and resolve en route to this $136m reward. As someone who knew IHS when they had BRS Search and the biggest and most unwieldy filing cabinet in engineering history, I also want to offer recognition credits to the buyer. This really is a winning solution, in the sense that both of these services together now comprise the complete engineering workflow; that over 10 million design briefs and specifications, and some 50k supplier catalogues, 70 e-newsletters, 15 online shows and 7 million registered users all provide a huge barrier to entry in this sector; that the barrier is as great for sector vendors as for sector buyers; and that none of this was based on technology unique to engineering, but on the tools and analytics that are available to all of us in every segment. And it all took 16 years to mature. And it worked because the difference between the winner and a number of losers was simple: the winners understood better than their competitors how engineers worked, how they communicated and how they solved problems and behaved.

And just time for a footnote along these themes. I was fascinated to see the merger of Yippy Inc (www.yippy.com) with MuseGlobal this week. I have known Muse for many years and admired their advocacy of Cloud-based solutions and their patient pursuit of data virtualization solutions at a time when it was only the spooks and the internal security people who were interested. Yippy, formerly Clusty, holds a licence to the Vivisimo patent (Vivisimo was recently bought by IBM, which owns 10% of the new company) for the data clustering vehicle, Velocity. So we are in Data-as-a-Service as well as SaaS country here. And here we locate the other trend line which we must watch with care. In this note we have seen user-based solutions bringing public and private content into intense analytical focus on the desktop; we have seen industrial scale vertical search and content alignment resolve workflow issues for professionals; and here we have data solutions which enable major corporates and institutions to impose their own private order on information and intelligence regardless of source. All of these will deploy in all markets at the same time. The clever game will be second-guessing which of them prevail in which verticals, across which sizes of organization, and over what time periods.

My personal voyage in the world of software for search and data service development continues. I had the pleasure last week of hearing a Tableau (http://www.tableausoftware.com/) user talk about the benefits of visualization, and came away with a strong view that we do not need to visualize everything. After all, visualization is either a solution – a way of mapping relationships to demonstrate a point not previously understood – or a way of summarizing results in ways that enable us to take them in quickly. I had not thought of it as a communication language, and if that is what it is then clearly we are only in the foothills. Pictures do not always sustain narrative, and sometimes we kid ourselves that once we have the data in a graph then we all know what it means. Visualization needs a health warning: “The Surgeon General suggests that before inhaling any visualization you should first check the axes.” However, when data visualization gets focussed then it becomes really exciting. Check out HG Data (www.hgdata.com), a way of analysing a corporation’s complete span of relationships:

“While LinkedIn tracks the relationships between people in business, HG Data tracks the underlying relationships between the business entities themselves.”

Now that is a seriously big claim, but you can begin here to see plug-in service values from Big Data which will shape the way we look at companies in future. But my real object this week was elsewhere – in deep and shallow Space. A subject of speculation for me over 20 years ago was whether we would ever be able to analytically control the floods of data from satellites that were beginning to inundate space research centres. In its day, this was the first “drinking from the firehose” phenomenon, and it would appear to me retrospectively that we never really cracked this one, so much as learnt to live with our inadequacies. In the intervening time we have become experts at handling very large dataflows, because Google was forced to learn how to do it. And in the intervening years the flood has grown past tsunami, and ceased to be an issue about space research, and become an issue about how we run Earth.

So first, let’s update on the Space side of things. Those few research satellites that I encountered in 1985 have now been joined, according to Frost and Sullivan, by a vast telemetry and measurement exercise in the skies above us which will result in around 927 satellites by 2020. Some 405 will be for communication, with earth observation (151), navigation (including automatic aircraft landing) and reconnaissance figuring high. Only 75 will be devoted to the R&D which initially piqued my interest in this. But since the communication, navigation and observation functions will measure accurately down to one metre, we shall inevitably find our lives governed in similar micro-detail by what these digital observers discover.

Now step over and look at SpaceCurve (http://spacecurve.com/). I had the pleasure of speaking to its founder, Andrew Rogers, a week or so ago and came away deeply impressed by the position they have taken up. Andrew is a veteran of Google Earth (and a survivor of the UK Met Office!). He is also a problem solver, big time. Taking the view that Google may have cracked its own problems but was not going to crack anything of this scale, he left, and the result is SpaceCurve:

“Immediately Actionable Intelligence
SpaceCurve will deliver instantaneous intelligence for location-based services, commodities, defense, emergency services and other markets. The company is developing cloud-based Big Data solutions that continuously store and immediately analyze massive amounts of multidimensional geospatial, temporal, sensor network and social graph data.
The new SpaceCurve geospatial-temporal database and graph analysis tools will enable application developers and organizations to leverage the real-time models required for more powerful geospatial and other classes of applications and to extend existing applications.”

As I understand it, what SpaceCurve is about is solving the next generation problem before we have rolled out the current partial solution. This is 2.0 launching before 1.0 is fully out of beta. The problems that Andrew and his colleagues solved in interval indexing and graph analysis are not a part of the current Big Data market leaders’ output, but they are very much in line with the demands of geospatial data flows. Here real-time analytics just do not do the job if they are dependent on column stores assuming an order relationship. The thing to do is to abandon those relationships. SpaceCurve is not just looking at far bigger data environments: it suggests that they cannot be handled in ways that we currently envisage as being “big data”.
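As a back-of-the-envelope illustration of that last point, and very much a toy rather than anything resembling SpaceCurve’s actual engineering, here is a coarse space-time bucket index in Python: queries over a lat/lon box and a time window go straight to the relevant buckets, with no reliance on the data being globally sorted by any single column.

```python
# Toy sketch (not SpaceCurve's design): a coarse space-time bucket index.
# The point is the one made above: multidimensional geospatial-temporal
# queries need not depend on a column store's global sort order.
from collections import defaultdict

class SpaceTimeIndex:
    def __init__(self, cell_deg=1.0, slot_secs=3600):
        self.cell_deg = cell_deg        # spatial cell size in degrees
        self.slot_secs = slot_secs      # temporal bucket size in seconds
        self.buckets = defaultdict(list)

    def _key(self, lat, lon, t):
        return (int(lat // self.cell_deg),
                int(lon // self.cell_deg),
                int(t // self.slot_secs))

    def insert(self, lat, lon, t, payload):
        self.buckets[self._key(lat, lon, t)].append((lat, lon, t, payload))

    def query(self, lat_min, lat_max, lon_min, lon_max, t_min, t_max):
        """Return payloads inside the lat/lon box and the time window."""
        out = []
        for i in range(int(lat_min // self.cell_deg), int(lat_max // self.cell_deg) + 1):
            for j in range(int(lon_min // self.cell_deg), int(lon_max // self.cell_deg) + 1):
                for k in range(int(t_min // self.slot_secs), int(t_max // self.slot_secs) + 1):
                    for lat, lon, t, payload in self.buckets.get((i, j, k), []):
                        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max \
                                and t_min <= t <= t_max:
                            out.append(payload)
        return out

if __name__ == "__main__":
    idx = SpaceTimeIndex()
    idx.insert(51.5, -0.1, 1000, "sensor-A reading")   # hypothetical readings
    idx.insert(48.8, 2.35, 5000, "sensor-B reading")
    print(idx.query(50, 52, -1, 1, 0, 2000))            # -> ['sensor-A reading']
```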

Despite the increased scale of content handling, SpaceCurve see themselves searching in a partially federated manner, since many data holders, and in particular governments, will not allow the data off the premises. Government and corporations share the need to be able to see provenance and determine authenticity, so SpaceCurve’s role in these massive data collections may be in part as an outsourcing custodial authority, looking after the data on the owner’s site. And indeed, the problem for SpaceCurve may be one of which markets it chooses first and where the key interest comes from – government and public usage, or the enterprise markets.
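That custodial, keep-the-data-on-premises pattern can be sketched in a few lines too. The “sites” below are stand-in callables rather than any real SpaceCurve API, but they show the shape of it: the query travels to each data holder, the data stays put, and every result comes back tagged with its provenance.

```python
# Hedged sketch of a partially federated query, as described above.
# Each "site" stands in for a query API run on the data holder's own premises.
from concurrent.futures import ThreadPoolExecutor

def federated_query(query, sites):
    """sites: mapping of site name -> callable(query) executed at the data holder."""
    def ask(item):
        name, run = item
        # Tag every record with its source so provenance survives aggregation.
        return [{"source": name, "record": rec} for rec in run(query)]
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        batches = list(pool.map(ask, sites.items()))
    return [rec for batch in batches for rec in batch]

if __name__ == "__main__":
    # Hypothetical data holders, each answering from its own store.
    sites = {
        "gov-archive":  lambda q: [r for r in ["flood-2012", "storm-2011"] if q in r],
        "corp-sensors": lambda q: [r for r in ["storm-telemetry"] if q in r],
    }
    print(federated_query("storm", sites))
```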

The next major release is due in 2013, so we shall soon find out. Meanwhile, it is striking that a major investor here, Reed Elsevier Ventures, has a parent which invested, through Lexis, in Seisint, also a deeply government-aligned environment, and more recently in the Open Source Big Data environment, HPCC. Investing in the next generation is always going to make sense in these fast moving markets.
