Understanding the basic purpose of Hadoop is easy: it offers a way to quickly store, process, and extract meaning from vast datasets. It does this by breaking the datasets up into commodity-server-sized chunks, replicating them to guard against failure, and sending them out to a connected web (cluster) of commodity servers (nodes). Understanding how it integrates with the current big data landscape, and how it may integrate with the future one, is a little harder; for that, I’ve turned to the experts. Luckily for me, and for you if you’re in my boat, many of them maintain active Twitter and blogging presences. Even more luckily, the quality and clarity of their writing is really high. The following list is by no means exhaustive, but poking into the thoughts of even a few can elucidate everything from machine learning to data modeling and distributed systems.
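Before getting to the list, here is a rough sketch of the chunk-and-replicate idea described above. It is a toy model, not Hadoop's code: the function names, the node names, the tiny block size, and the round-robin placement are all assumptions made for illustration. Real HDFS uses much larger blocks and a rack-aware placement policy.

```python
# Toy illustration of "split into chunks, replicate, spread across nodes".
# All names are hypothetical; this is NOT how HDFS is implemented.

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: simple round-robin stands in for a real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


def readable_blocks(placement: dict, failed_node: str) -> set:
    """Return the blocks that still have at least one copy on a live node."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    for block_id, holders in placement.items():
        print(block_id, blocks[block_id], holders)
    print("still readable after node-a fails:",
          sorted(readable_blocks(placement, "node-a")))
```

Because every block lives on more than one node, losing any single node in this sketch leaves every block readable, which is the point of replicating in the first place.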
1.) Hilary Mason
- bio: Chief Scientist at Bit.ly, Co-Founder, hackNY.org
- twitter: @hmason
- blog: hilarymason.com
- Watch: Machine Learning: a Love Story
2.) Todd Lipcon
- bio: Engineer at Cloudera, Hadoop/HBase Committer
- twitter: @tlipcon
- Read: 7 Tips for Improving MapReduce Performance
3.) Daniel Abadi
- bio: Assistant Professor of Computer Science at Yale University, Chief Scientist at Hadapt.
- twitter: @daniel_abadi
- blog: dbmsmusings.blogspot.com/
- Read: Hadoop’s tremendous inefficiency on graph data management (and how to avoid it)
4.) Gary Helmling
- bio: HBase Committer at Apache Software Foundation, Hadoop Developer at Trend Micro
- twitter: @gario
- Watch: New HBase Features: Coprocessors and Security (from Hadoop Summit 2011)
5.) Josh Patterson
- bio: Solutions architect at Cloudera
- twitter: @jpatanooga
- blog: jpatterson.floe.tv/
- Read: What is Data Mining
6.) Chris Neumann
- bio: Founder, 2 Bettas Labs
- twitter: @ckneumann
- blog: baconwrappeddata.com
- Read: Thoughts on Hadoop World 2011
7.) Doug Cutting
- bio: Architect at Cloudera, Director, Apache Software Foundation, Creator of Hadoop
- twitter: @cutting
- Read: Hadoop Creator Doug Cutting Talks About Why He Got into Open Source
8.) Peter Skomoroch
- bio: Principal Data Scientist at LinkedIn, runs datawrangling.com and trendingtopics.org
- twitter: @peteskomoroch
- blog: datawrangling.com
- Read: Amazon Elastic MapReduce: A Web Service API for Hadoop
9.) Chris Mattmann
- bio: Adjunct Assistant Professor in the Computer Science Department within USC’s Viterbi School of Engineering, Senior Computer Scientist at Jet Propulsion Laboratory
- twitter: @chrismattmann
- Read: The Case for the Digital Babelfish
10.) Arun C Murthy
- bio: Founder of Hortonworks, VP of Apache Hadoop
- twitter: @acmurthy
- Read: Apache Hadoop: Best Practices and Anti-Patterns
11.) Dmitriy Ryaboy
- bio: Analytics Tech Lead at Twitter
- twitter: @squarecog
- blog: squarecog.wordpress.com
- Watch: How Hadoop Is Used at Twitter
12.) Andrew Ferguson
- bio: CS grad student at Brown University
- twitter: @adferguson
- blog: andrewferguson.net
- Read: Understanding Filesystem Imbalance in Hadoop
13.) Chad Metcalf
- bio: Infrastructure Operations Engineer at Cloudera
- twitter: @metcalf
14.) Florian Leibert
- bio: Software Engineer, Research at Twitter
- twitter: @flo
- blog: flori.posterous.com
- Read: Keyword Extraction Using Lexical Chains
15.) Jeff Darcy
- bio: Cloud filesystem software engineer at Red Hat
- twitter: @obdurodon
- blog: pl.atyp.us/
- Read: Stop the Hate
16.) Ryan Rawson
- bio: Architect at CX, Inc, HBase committer
- twitter: @ryanobjc
- Watch: Ryan Rawson on HBase at StumbleUpon (NoSQL Tapes Interview)
17.) Alex Feinberg
- bio: Senior Software Engineer at LinkedIn
- twitter: @strlen
- Read: Replication, atomicity and order in distributed systems
18.) Johan Oskarsson
- bio: Developer at Twitter, formerly at Last.fm.
- twitter: @skr
- blog: blog.oskarsson.nu
19.) Greg West
- bio: Salesforce R&D Engineer
- twitter: @gwestr
20.) Alexander Popescu
- bio: Software Architect and Founder/CTO of InfoQ.com
- twitter: @al3xandru
- blog: nosql.mypopescu.com
- Read: NoSQL Data Modeling
21.) Amr Awadallah
- bio: Founder/CTO at Cloudera
- twitter: @awadallah
- blog: awadallah.com/blog/
22.) Ted Dunning
- bio: Committer on Apache Mahout, Product Architect at MapR
- twitter: @ted_dunning
- blog: tdunning.blogspot.com
- Read: Buzzwords Keynote…blog edition
23.) Andrew McAfee
- bio: Author of Enterprise 2.0, co-author of Race Against the Machine, blogger for HBR
- twitter: @amcafee
- blog: andrewmcafee.org
- Read: My Scariest Graph
24.) James Kobielus
- bio: Senior Analyst, DW, Analytics and BI at Forrester
- twitter: @jameskobielus
- blog: jkobielus.blogspot.com
- Read: Data Scientist: Important New Role or Trendy Job Title Inflation?
25.) Amund Tveit
- bio: Founder, Atbrox
- twitter: @atveit
- blog: atbrox.com
- Read: MapReduce & Hadoop Algorithms in Academic Papers
26.) Thomas Brox Røst
- bio: Founder, Atbrox
- twitter: @brox
- blog: thomas.broxrost.com
27.) Vanessa Alvarez
- bio: Analyst at Forrester
- twitter: @vanessaalvarez1
- blog: blogs.forrester.com/vanessa_alvarez
- Read: Data is the Key that Ties it All Together
28.) Jeff Hammerbacher
- bio: Chief Scientist at Cloudera
- twitter: @hackingdata
- blog: jeffhammerbacher.com
- Read: Peer Reviewed Journals for Source Code and Data: Narrative Forms for the Modern Scientific Method
29.) Greg Wilson
- bio: Software Engineer at Side Effects Software, creator of Software Carpentry, author of Data Crunching: Solve Everyday Problems Using Java, Python and More
- blog: third-bit.com/blog
- Read: Empirical Software Engineering (co-written with Jorge Aranda)
30.) Jeff Kelly
- bio: Principal Research Contributor at Wikibon, blogger at SiliconANGLE
- twitter: @jeffreykelly
- blog: siliconangle.com/blog/author/jeffkelly/
- Read: The Stakes are High in the Hadoop Distribution Race