30 Hadoop and Big Data Spelunkers Worth Following

Understanding the basic purpose of Hadoop is easy: it offers a way to quickly store, process, and extract useful meaning from vast datasets. It does this by breaking a dataset into blocks, replicating each block on several machines so that a single hardware failure doesn’t lose data, and distributing the blocks across a connected cluster of commodity servers (nodes). Understanding how Hadoop fits into the current big data landscape, and how it may fit into the future one, is a little harder; for that, I’ve turned to the experts. Luckily for me (and for you, if you’re in my boat), many of them maintain active Twitter and blogging presences. Even more luckily, the quality and clarity of their writing is really, really high. The following list is by no means exhaustive, but poking into the thoughts of even a few of these people can shed light on everything from machine learning to data modeling and distributed systems.
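To make the split-and-replicate idea concrete, here is a minimal Python sketch. It is a toy model, not Hadoop code: the function names, the tiny block size, and the round-robin placement policy are all illustrative assumptions (real HDFS uses much larger blocks and a rack-aware placement policy). It cuts data into fixed-size blocks, places each block on several distinct nodes, and checks whether every block still has a surviving copy after some nodes fail.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into consecutive chunks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin policy for illustration only; HDFS itself
    uses rack-aware placement.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        # Consecutive nodes (wrapping around) hold this block's copies.
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


def survives_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a healthy node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    data = b"x" * 10
    blocks = split_into_blocks(data, 4)        # toy block size of 4 bytes
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    print(placement)
    print(survives_failure(placement, {"node2"}))
```

With three copies of each block spread over four nodes, losing any one or two nodes still leaves every block readable; only losing three nodes at once can make a block unavailable. That trade-off (extra storage in exchange for tolerating failures of cheap hardware) is the core reason replication is part of the design.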

1.) Hilary Mason

bio: Chief Scientist at Bit.ly, Co-Founder, hackNY.org
twitter: @hmason
blog: hilarymason.com
Watch: Machine Learning: a Love Story

2.) Todd Lipcon

bio: Engineer at Cloudera, Hadoop/HBase Committer
twitter: @tlipcon
Read: 7 Tips for Improving MapReduce Performance

3.) Daniel Abadi

bio: Assistant Professor of Computer Science at Yale University, Chief Scientist at Hadapt.
twitter: @daniel_abadi
blog: dbmsmusings.blogspot.com/
Read: Hadoop’s tremendous inefficiency on graph data management (and how to avoid it)

4.) Gary Helmling

bio: HBase Committer at Apache Software Foundation, Hadoop Developer at Trend Micro
twitter: @gario
Watch: New HBase Features: Coprocessors and Security (from Hadoop Summit 2011)

5.) Josh Patterson

bio: Solutions architect at Cloudera
twitter: @jpatanooga
blog: jpatterson.floe.tv/
Read: What is Data Mining

6.) Chris Neumann

bio: Founder, 2 Bettas Labs
twitter: @ckneumann
blog: baconwrappeddata.com
Read: Thoughts on Hadoop World 2011

7.) Doug Cutting

bio: Architect at Cloudera, Director, Apache Software Foundation, Creator of Hadoop
twitter: @cutting
Read: Hadoop Creator Doug Cutting Talks About Why He Got into Open Source

8.) Peter Skomoroch

bio: Principal Data Scientist at LinkedIn, runs datawrangling.com and trendingtopics.org
twitter: @peteskomoroch
blog: datawrangling.com
Read: Amazon Elastic MapReduce: A Web Service API for Hadoop

9.) Chris Mattmann

bio: Adjunct Assistant Professor in the Computer Science Department within USC’s Viterbi School of Engineering. Senior Computer Scientist, Jet Propulsion Laboratory
twitter: @chrismattmann
Read: The Case for the Digital Babelfish

10.) Arun C Murthy

bio: Founder of Hortonworks, VP of Apache Hadoop
twitter: @acmurthy
Read: Apache Hadoop: Best Practices and Anti-Patterns

11.) Dmitriy Ryaboy

bio: Analytics Tech Lead at Twitter
twitter: @squarecog
blog: squarecog.wordpress.com
Watch: How Hadoop Is Used at Twitter

12.) Andrew Ferguson

bio: CS grad student at Brown University
twitter: @adferguson
blog: andrewferguson.net
Read: Understanding Filesystem Imbalance in Hadoop

13.) Chad Metcalf

bio: Infrastructure Operations Engineer at Cloudera
twitter: @metcalf

14.) Florian Leibert

bio: Software Engineer, Research at Twitter
twitter: @flo
blog: flori.posterous.com
Read: Keyword Extraction Using Lexical Chains

15.) Jeff Darcy

bio: Cloud filesystem software engineer at Red Hat
twitter: @obdurodon
blog: pl.atyp.us/
Read: Stop the Hate

16.) Ryan Rawson

bio: Architect at CX, Inc, HBase committer
twitter: @ryanobjc
Watch: Ryan Rawson on HBase at StumbledUpon (NoSQL Tapes Interview)

17.) Alex Feinberg

bio: Senior Software Engineer at LinkedIn
twitter: @strlen
Read: Replication, atomicity and order in distributed systems

18.) Johan Oskarsson

bio: Developer at Twitter, formerly at Last.fm.
twitter: @skr
blog: blog.oskarsson.nu

19.) Greg West

bio: Salesforce R&D Engineer
twitter: @gwestr

20.) Alexander Popescu

bio: Software Architect and Founder/CTO of InfoQ.com
twitter: @al3xandru
blog: nosql.mypopescu.com
Read: NoSQL Data Modeling

21.) Amr Awadallah

bio: Founder/CTO at Cloudera
twitter: @awadallah
blog: awadallah.com/blog/

22.) Ted Dunning

bio: Committer on Apache Mahout, Product Architect at MapR
twitter: @ted_dunning
blog: tdunning.blogspot.com
Read: Buzzwords Keynote…blog edition

23.) Andrew McAfee

bio: Author of Enterprise 2.0, co-author of Race Against the Machine, blogger for HBR
twitter: @amcafee
blog: andrewmcafee.org
Read: My Scariest Graph

24.) James Kobielus

bio: Senior Analyst, DW, Analytics and BI at Forrester
twitter: @jameskobielus
blog: jkobielus.blogspot.com
Read: Data Scientist: Important New Role or Trendy Job Title Inflation?

25.) Amund Tveit

bio: Founder, Atbrox
twitter: @atveit
blog: atbrox.com
Read: MapReduce & Hadoop Algorithms in Academic Papers

26.) Thomas Brox Røst

bio: Founder, Atbrox
twitter: @brox
blog: thomas.broxrost.com

27.) Vanessa Alvarez

bio: Analyst at Forrester
twitter: @vanessaalvarez1
blog: blogs.forrester.com/vanessa_alvarez
Read: Data is the Key that Ties it All Together

28.) Jeff Hammerbacher

bio: Chief Scientist at Cloudera
twitter: @hackingdata
blog: jeffhammerbacher.com
Read: Peer Reviewed Journals for Source Code and Data: Narrative Forms for the Modern Scientific Method

29.) Greg Wilson

bio: Software Engineer at Side Effects Software, creator of Software Carpentry, author of Data Crunching: Solve Everyday Problems Using Java, Python and More
blog: third-bit.com/blog
Read: Empirical Software Engineering (co-written with Jorge Aranda)

30.) Jeff Kelly

bio: Principal Research Contributor at Wikibon, blogger at SiliconANGLE
twitter: @jeffreykelly
blog: siliconangle.com/blog/author/jeffkelly/
Read: The Stakes are High in the Hadoop Distribution Race


SoftArtisans
