Jan
12
Take the Program to the Data
It's Big Data week, yet again. In the last two months we have seen all of the dramas and confusions attendant upon emerging markets, yet none of the emerging clarity which one might expect when a total sea change is taking place in the way in which we extract value from data content. Then this week, with all the aplomb of an elephant determined not to be left behind in a world which has apparently decided that the hula hoop is the only route to sanity, Oracle announced its enterprise Big Data solution. Again. Only now it is called the Big Data Appliance. It started shipping on Tuesday. And the world will never be the same again.
At the heart of the Oracle launch is a Hadoop license. This baby elephant lies at the centre of almost everything. The two Hadoop-based commercializations both raised finance in the lead-up to 2012: Cloudera ($40m) and Hortonworks ($20m), while other sector players like MapR, who also exploit Hadoop, found 2011 a really good time to raise money. And this had a radiating effect on the whole data handling sector. Neo4j, a database technology for graph storage and resolution (from Neo Technology, based in Malmo and Menlo Park), raised $10m in a round led by Fidelity. Meanwhile, Microsoft signed a deal with Hortonworks, IBM said it would launch Hadoop in the Cloud, EMC (Greenplum) went for MapR, Dell announced a Hadoop-based initiative, and the world waits and wonders what Hewlett Packard will do, now that it has Autonomy for analytics.
So now we have plenty of initiatives and, as usual, not much idea of who the next generation of users will be. The first generation speak for themselves. We can see the benefits that Facebook derives from being able to use Hadoop-based tools to find connections and meanings in its content that would have been impossible to reveal cost-effectively in a prior age. And the same would be true of such unlikely bedfellows as the Department of Homeland Security, or Walmart, or Sony (think PlayStation Network), or the Israeli Defence Force, or the US insurance industry (via Lexis Risk), or Lexis Nexis (who announced a Big Data integration with MarkLogic), let alone the two players who effectively started all this: Yahoo! (Hadoop) and Google (MapReduce). So asking where it goes next is a legitimate question, but one which can only be answered if we accept that the next group of users are never going to recreate the Google server farms in order to break into these advantageous processing environments. The next group of intensive users will have their XML content on MarkLogic, or their graph data on Neo4j. They will want to use the US census data remotely (so will contract with Amazon for process time on the Amazon web presence), and will use a large variety of third party content held in similar ways. Some of them will still hold their own content locally on MySQL databases, like Facebook, while others will be working in part or fully in the Cloud, and combining that with their own NoSQL applications. But the essential point here is that no one will be building huge data warehousing operations governed by rigid and mechanistic filing structures. Literally, we are increasingly leaving the data where it is, and bringing the analytical software to it, in order to produce results that are independent of any single data source.
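For readers who think better in code, the idea of leaving the data where it is can be sketched in a few lines of Python. This is a toy illustration only, not how Hadoop, HPCC or MarkLogic actually work: the `DataSource` class, the store names and the `count_terms` program are all hypothetical stand-ins invented for the example. The point it shows is that the same small program is sent to each store, runs there, and only the compact results come back to be combined.

```python
from collections import Counter


class DataSource:
    """Hypothetical stand-in for a remote store (XML database, graph store, cloud dataset)."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)  # the data never leaves this object

    def run(self, program):
        # The program executes where the data lives; only its result travels back.
        return program(self._records)


def count_terms(records):
    """The 'program' we ship: count the words in whatever records it is handed."""
    counts = Counter()
    for text in records:
        counts.update(text.lower().split())
    return counts


def analyse(sources, program):
    """Send the same program to every source and merge the partial results."""
    total = Counter()
    for source in sources:
        total += source.run(program)
    return total


# Invented sample data, for illustration only.
sources = [
    DataSource("xml_store", ["big data week", "data appliance"]),
    DataSource("graph_store", ["graph data"]),
]
result = analyse(sources, count_terms)
```

Nothing in `analyse` copies the records into a central warehouse, so the combined answer does not depend on any single data source being moved or restructured first.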
And this too produces another sort of revolution. The front door to working in this way is now the organizational software itself. When Lexis Risk announced at the end of last year that they were going to take HPCC open source, a number of critics saw that as turning their backs on an exploitation opportunity. Yet it makes very real sense in the context of Oracle, Microsoft and IBM seeking to build their own “solutions”. Some businesses will want to run their own solutions, and will make a choice between open source Hadoop and open source HPCC. Others in systems integration will seek out open source environments to create unique propositions. But since it was always unlikely that Lexis Risk was going to challenge the enterprise software players in their own bailiwick, open source is a way of getting a following, harvesting vital feedback, and earning not insignificant returns in servicing and upgrading users.
I am also delighted to see that another likely winner is MarkLogic, since I have been proud of working with them and speaking at their meetings for a number of years. For publishers and information providers, it is now clear that XML remains the route forward, and MarkLogic 5 is clearly being positioned as the information service provider's socket for plugging into the Big Data environment. Anyone who believes that scientists will NOT want to analyse all data in a segment, or engineers source all relevant briefs with their ancillary information, or lawyers cross-examine all documentation regardless of location, or pharma companies examine research files in the context of contra-indications, should stop reading now and take up fishing. My observation is that Big Data is like Due Diligence: once someone does it, even if the first results are not impressive, all competitors have to do it. The risk of not trying to find the indicative answer by the most advanced methods is too great to take.