Category Archives: Big Data

It’s Hadoop’s world. We just live in it

It’s Hadoop’s world. We just live in it. Welcome to #hw2011!

That was the opening battle cry from Mike Olson, CEO of Cloudera, as he kicked off the third Hadoop World conference. Indeed, after drinking the Kool-Aid for two days, I’ve been almost fully ingested, stored, and transformed, even if I have yet to be accessed, let alone managed.

For it seems that within a few years, all digital information, including my electronic Freudian id and perhaps my ego as well, will be deposited into Hadoop, forever ready to be accessed via any number of different social and structured graphs.

Hadoop background

Some have speculated that within five years, Hadoop will hold 50% of the world’s information. I now believe that to be true, although much of that data may be a copy of the other 50% rather than stored uniquely in Hadoop.

Hadoop and its Google ancestors enable storage on a scale and scope never before possible, with built-in redundancy and resiliency, at lower operational cost than big-iron solutions. And the software is free, needing nothing more than common commodity hardware and a Java host.

Google created and shared the concept. Doug Cutting and a colleague started an independent, Apache-licensed implementation five years ago. Since then, it has been adopted by some of the largest web properties, including Facebook, Twitter, and eBay. Even major enterprises like JP Morgan and Disney have been using it in production for at least two years.

Commercially supported releases are available from Cloudera and Hortonworks.

The Conference

Hadoop is still in the early-adopter stage and has not yet crossed Geoff Moore’s Chasm. The situation is reminiscent of the web circa 1994: forward-looking companies are gaining significant competitive advantage with primitive tools and smart developers.

Cloudera is doing a great job of championing the ecosystem. They recognize that growing the overall market and adoption is the correct long-term path to riches. I look forward to #hw2012.

Economy of the Cloud

Ed Felten at Freedom to Tinker recently posted an article about the economics of cloud computing (partially in response to last week’s New York Times op-ed about the cloud). While I agree with Felten’s main contention, namely that the cost of resource management is a driving force in the movement toward the cloud, I would argue that much more important factors are also at play.

Within the last few years, startups and small companies have increasingly focused on web- or phone-based applications, whose customers expect much higher uptime, performance, and reliability. Even beyond the cost factor, there’s no real reason for a small company to grow its own servers* when alternatives exist with much better guarantees on all three of those expectations. This is particularly true near the start of an application’s life cycle, when the amount of data required to run effectively is often disproportionate to the size of the actual user base; a small company running its own servers would have to endure remarkably low resource utilization. But even for established applications, it makes much more practical sense not to waste CPU cycles during low-traffic times of the day or year. So ultimately I see the movement to the cloud as being driven mostly by four separate factors:

  • The desire for greater reliability
  • The fact that most companies would prefer to focus on their primary objective (e.g. developing software) rather than working on supportive infrastructure