All posts by Richard

Boston’s Big Datascape, Part 2: Nasuni, VoltDB, Lexalytics, Totutek, Cloudant

January 30, 2012 Richard 1 Comment

This ongoing series examines some of the key, exciting players in Boston’s emerging Big Data arena. The companies I’m highlighting differ in growth stages, target markets and revenue models, but converge around their belief that the data is the castle, and their tools the keys. You can read about the first five companies here.

6) Nasuni

Product: Nasuni is an enterprise cloud storage system. Its Nasuni Filers propagate data from a local disk cache to cloud storage, essentially giving users a unified file share in the cloud that doesn’t require replication of file servers.
Founder: Andres Rodriguez
Technologies used: on-premise storage, UniFS™ file system, VMs, cloud storage
Target Industries: Manufacturing, Construction, Legal, Education
Location: Natick, MA

7) VoltDB

Product: VoltDB is an in-memory relational database designed to handle millions of operations per second (125k TPS per commodity server) with near-perfect fault tolerance and automatic scale-out. It comes in three editions: Enterprise, startup/ISV, and community.
Founder: Michael Stonebraker
Technologies used: in-memory DBMS, OLTP, ACID, SQL
Target industries: Capital Markets, Digital Advertising, Online Games, Network Services
Location: Billerica, MA

[Read the full post]

Big Data

Mo’ Data, Mo’ Problems, E03: DynamoDB vs. HBase

January 27, 2012 Richard 1 Comment

In this episode of our big data series, we’re talking about DynamoDB, Amazon’s newly released managed distributed database, and how it stacks up against HBase in terms of features, data size, latency and ease of use.

Big Data, Events

Boston Hadoop Meetup Group: The Trumpet of the Elephant

January 26, 2012 Richard

Heheh. But seriously, if you live in the Boston area and are working with Hadoop, or interested in working with Hadoop, or just think the name is fun to say, you should absolutely clear your calendar the night of February 15. Why? Because it’s the first Boston Hadoop Meetup Group since November, and judging by the presenter line-up, it’s going to be a doozie (or an Oozie, if you want to get all topical).

First up, MapR’s Chief Application Architect Ted Dunning on using Machine Learning within Hadoop. I’m really excited about this one.

Second, Cloudera Systems Engineer Adam Smieszy on integrating Hadoop into your existing data management and analysis workflows.

Last, Hadapt’s CTO Philip Wickline “will give a high-level discussion about the differences between HBase and Hive, and about transactional versus analytical workloads more generally speaking, and dive into the systems required for each type of workload.”

Each talk will run about 15-20 minutes, with time for Q&A after, followed by (free) beer and mingling.

The Boston Hadoop MeetUp Group is organized by Hadapt’s Reed Shea. Hadapt is doing some very, very cool stuff with unstructured and structured data processing and analytics–cool enough that founder/Chief Scientist Daniel Abadi took teaching leave from Yale to turn his research into a product.

This particular MeetUp is sponsored by Hadapt, MapR, Cloudera and Fidelity, and is being held at Fidelity’s downtown office, from 6 to about 8:30 pm. For more information and to sign up, visit the event page.

See you there!

Social Media, SQL Server

#Meme15 Assignment 2: All A’Twitter

January 17, 2012 Richard 1 Comment

A new monthly blog series has entered the #sqlfamily. The brainchild of Jason Strate, “#Meme15” focuses on the ways social networks can further our professional development. This month’s assignment is one dear to my own heart (and brain. And fingers): Twitter. I’ve written before about what Twitter can do for your company: how it can give high-tech B2Bs personality, credibility and new leads. What I haven’t covered as much is what it can do for you, the employee. There are two questions in the assignment:

Why should average Jane or Joe professional consider using Twitter?
What benefit have you seen in your career because of Twitter?

As a person whose primary job responsibilities involve social media, I’m going to go with the first option—for an excellent answer to the second, check out Stacia Misner’s response.

So, why should you, the non-Social Media Marketer/Specialist/Strategist etc., use Twitter? In short, there are three main reasons: build relationships, gain knowledge and enhance your public image.

At slightly greater length: Twitter is a public conversation, a place to learn, share and connect. Someone posts a link to a blog post about Power View; you read it and learn something new about Power View (animated data points, oh my!). Someone asks a question about stored procedures, aka your pride and joy, and you answer them. Bonds form between the teachers and the taught, the @er and @ed, tweeter and retweeter, but they can also form, albeit more loosely, between all of the above and their networks of listeners. When you perform any activity on Twitter, from favoriting a Tweet to organizing a Tweetup, it deepens your digital profile to anyone who thinks to look or happens to listen at the right time.

Twitter allows you to join (or start!) non-geographically-restricted communities grouped around any interest or combination of interests. It lets you play pin the avatar on the body at conferences. It’s a virtual kickstarter for eventual IRL relationships. For all the banality of some of its content, Twitter’s function as a connector is far from trivial.

[#Meme15 logo by Matt Velic]

Big Data

Boston’s Big Datascape, Part 1

January 16, 2012 Richard 1 Comment

[Excerpted from the Riparian Data blog]
Big Data, or the technologies, languages, databases and platforms used to efficiently store, analyze and extract conclusions from massive data sets, is a Big Trend right now. Why? In a nutshell, because a) we are generating ever increasing amounts of data, and b) we keep learning faster, easier and more accurate ways of handling and extracting business value from it. On Wall Street, some investment banks and hedge funds are incorporating sentiment analysis of web documents into their trading strategies. In healthcare, companies like WellPoint, Explorys and Apixio are using distributed computing to mine health records, practice guidelines, studies and medical/service costs to more accurately and affordably insure, diagnose and treat patients.

Unsurprisingly, Silicon Valley is big data’s epicenter, but Boston, long a bastion of Life Sciences, Healthcare, High Tech and Higher Ed, is becoming an important player, particularly in the storage and analytics arenas. This series aims to spotlight some of the current and future game changers. These companies differ in growth stages, target markets and revenue models, but converge around their belief that the data is the castle, and their tools the keys.

1) Recorded Future

Product: Recorded Future is an API that scans, analyzes and visualizes the sentiment and momentum of specified references in publicly available web documents (news sites, blogs, govt. sites, social media sites etc.).
Founder/CEO: Christopher Ahlberg
Technologies used: JSON, real-time data feeds, predictive modeling, sentiment analysis
Target Industries: Financial Services, Competitive Intelligence, Defense Intelligence
Location: Cambridge, MA

2) Hadapt

Product: The Hadapt Adaptive Analytical Platform is a single system for processing, querying and analyzing both structured and unstructured data. The platform doesn’t need connectors, and supports SQL queries.
Founders: Justin Borgman (CEO); Dr. Daniel Abadi (Chief Scientist)
Technologies used: Hadoop, SQL, Adaptive Query Execution™
Target Industries: Financial Services, Healthcare, Telecom, Government

[Read the full post]

Big Data

Mo’ Data, Mo’ Problems, E02: HDFS

January 9, 2012 Richard

In this episode of our big data series, we’re talking Hadoop’s distributed file system (HDFS), which kind of acts like a virtual singles’ bar for the client, namenode and datanodes that map and reduce your data.

Office Life

The Tiger Springs in the New Year

December 30, 2011 Richard

For the apocalypsers, a bit of Eliot before we pop the corks.

See you in 2012!

Office Life

Season’s Greetings!

December 23, 2011 Richard

Nice and non-denominational, right? For all of you celebrating this weekend (and really, you should celebrate every weekend. And weekday, if you can manage it), may it be a merry one.

xoxo (and ho, ho, ho),
The SA Crew

Big Data

Mo Data, Mo Problems, Episode 01: HBase

December 21, 2011 Richard

In the inaugural episode of our big data series, we give you a high-level overview of HBase, the transactional database built on top of HDFS. Recommended for anyone who enjoys OLTP, random reads and writes and extended metaphors.

Big Data, Events

For the Data Scientists: 5 Upcoming Big Data Conferences You Shouldn’t Miss

December 20, 2011 Richard

Big data is a big deal right now, and it’s only going to become a bigger deal in the future, so it makes sense to learn about as many of its aspects as you can, as quickly as you can. Or pick one and learn it very well. Or don’t pick any, if you are a staunch believer in the shelf-life of traditional data warehouses. From a machine learning deep-dive to an open-source buffet, the following five conferences provide educational and networking opportunities for both the specialists and renaissance persons among you. Attending a cool one I’ve missed? Let me know in the comments!

O’Reilly Strata Conference
- What: Data science, data-driven business, visualization, Hadoop and big data, policy and privacy and domain-specific data, a startup showcase and an Expo Hall.
- Where: Santa Clara, CA
- When: February 28-March 1st, 2012
- Who: Doug Cutting, Ben Goldacre, Avinash Kaushik, Coco Krumme, Hal Varian, Pete Warden
- Why: Strata brings together practitioners, researchers, IT leaders and entrepreneurs to discuss big data, Hadoop, analytics, visualization and data markets.
- How much: from $99 for the Expo Hall only to $1795-2045 for the all-access pass
- @strataconf

[Read the full post]
