Of Big Data, Elephants, and Serendip(ity)

Big Data fascinates me

Ever since my technology adolescence days, when I was studying data structures, machine learning and knowledge-based systems at UTC (Compiegne, France), data has been core to all the hardware and software businesses I was involved in. Whether it was UNIX servers (and the new data structures, file systems and databases they brought with them), business activity monitoring, telecommunications billing systems, speech recognition, or mobile image processing, data was central.

Startup founders I advise these days will tell you how much of a b*** I am about the data model when developing a new service. With all due respect, those who start development with the UI and think they will just plug in some database to support their service are in for serious surprises, not to mention costs.

The latest example that comes to mind is Vitogo, a self-tracking workout app for iPhone. Great interface, great design, but… it doesn’t work. According to the company, the app is now undergoing a “major rewriting” that has taken months so far. I suspect the data model (or the lack of a well-thought-out model from inception) is central to the issue. Anyone familiar with what is going on at Vitogo, please comment or correct my assumption.

So when the industry started to talk about Big Data a few years ago, I got Big Interest. I won’t give a tutorial here on what Big Data is, for fear of boring most of you. Suffice it to quote Noreen Burlingame in her well-written, concise “Little Book of Big Data” (2012; available here in a Kindle reader version):

“Every day of the week, we create 2.5 quintillion bytes of data. This data comes from everywhere: from sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transactions records of online purchases, and from cell phone GPS signals — to name a few. In the 11 years between 2009 and 2020, the size of the “Digital Universe” will increase 44 fold. That’s a 41% increase in capacity every year. In addition, only 5% of this data being created is structured and the remaining 95% is largely unstructured, or at best semi-structured. This is Big Data.”
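The arithmetic in that quote is easy to check for yourself. Here is a minimal sketch, assuming the growth compounds at a constant rate over the 11 years; the function name is mine, not the book's.

```python
def annual_growth_rate(total_factor: float, years: int) -> float:
    """Constant yearly rate that compounds to total_factor over the given years."""
    # If size grows by (1 + r) each year, then (1 + r) ** years == total_factor.
    return total_factor ** (1.0 / years) - 1.0


# Figures taken from the quote: a 44-fold increase over 11 years.
rate = annual_growth_rate(44, 11)
print(f"Implied yearly growth: {rate:.0%}")
```

Under that assumption the result rounds to about 41% a year, which matches the figure quoted.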

Now for the elephants

One of the core emerging technologies for dealing with Big Data is Hadoop. Apache Hadoop is an open source platform built around Map/Reduce, a “seed” technology developed by Google (and inspired by LISP, so no wonder I like it… there’s no denying one’s childhood love) to address its needs for indexing and analyzing web data, and dealing with the flow of searches hitting its data.
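For readers who have never seen the idea, here is a toy sketch of the map and reduce pattern in plain Python. It is not Hadoop's API, and the function names and sample documents are made up for illustration; it only shows the shape of the computation: each input is mapped to a partial result independently, and the partial results are then merged pairwise.

```python
from collections import Counter
from functools import reduce


def map_phase(document: str) -> Counter:
    """Map step: turn one document into its own partial word counts."""
    return Counter(document.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial results into one."""
    return left + right


# Hypothetical sample input; in a real system each document could live on a different machine.
documents = ["big data big interest", "big elephants", "data about elephants"]

# Map every document independently, then fold the partial counts together.
word_counts = reduce(reduce_phase, map(map_phase, documents), Counter())
print(word_counts.most_common(3))
```

Because each map call looks at a single document and the merge does not depend on the order of its inputs, the work can in principle be spread over many machines, which is the property a platform like Hadoop exploits at scale.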

Several companies have sprouted up that enhance, simplify, or complement Hadoop with specialized layers and applications. A good overview of the current Hadoop landscape (suppliers and major users) can be found on Wikipedia here. And then there’s the analytics world, with the likes of ClearStory Data and Palantir Technologies.

Among these players, Hortonworks is particularly notable. Hortonworks was created by Yahoo! and Benchmark Capital in 2011 to take on the Yahoo! contributions to Hadoop. It is an independent company and one of the rising stars of Big Data, together with Cloudera and a few others.

Legend has it that Hadoop is named after a toy elephant that belonged to the son of one of its original authors. Hence the elephant in the Apache Hadoop logo. How inventive of Yahoo! and Benchmark to have named Hortonworks after Horton, Dr Seuss’ elephant… I guess ‘Babar’ was protected by a strong copyright.

But wait a minute

Why did Dr Seuss name his character Horton?

I would suggest that when “Horton Hears a Who!”, the book featuring the nicest elephant on Earth, was published in 1954, Dr Seuss might have been influenced by the elephants of the Horton Plains in Sri Lanka. A paradisiac high plateau at an altitude of 2,100 meters in the center of Sri Lanka, the Horton Plains were home to a large population of elephants until they were wiped out in the late 1940s by over-hunting under the British, who then occupied the region.

From Big Data to Serendip (Sri Lanka), the connection is now established. Serendipitously, thanks to Yahoo! and Benchmark Capital.

“… you don’t reach Serendib by plotting a course for it.
You have to set out in good faith for elsewhere
and lose your bearings … serendipitously.”

(John Barth, The Last Voyage of Somebody the Sailor)
