All posts by Richard

Boston’s Big Datascape, Part 2: Nasuni, VoltDB, Lexalytics, Totutek, Cloudant

[Excerpted from the Riparian Data blog]

This ongoing series examines some of the key, exciting players in Boston’s emerging Big Data arena. The companies I’m highlighting differ in growth stages, target markets and revenue models, but converge around their belief that the data is the castle, and their tools the keys. You can read about the first five companies here.

6) Nasuni

  • Product: Nasuni is a cloud enterprise storage system. Its Nasuni Filers propagate data from a local disk cache to cloud storage, essentially giving users a unified file share in the cloud that doesn’t require replication of file servers.
  • Founder: Andres Rodriguez
  • Technologies used: on-premise storage, UniFS™ file system, VMs, cloud storage
  • Target Industries: Manufacturing, Construction, Legal, Education
  • Location: Natick, MA


7) VoltDB

  • Product: VoltDB is an in-memory relational database designed to handle millions of operations per second (125k TPS per commodity server) with near-perfect fault tolerance and automatic scale-out. It comes in three editions: Enterprise, startup/ISV, and community.
  • Founder: Michael Stonebraker
  • Technologies used: in-memory DBMS, OLTP, ACID, SQL
  • Target Industries: Capital Markets, Digital Advertising, Online Games, Network Services
  • Location: Billerica, MA

[Read the full post]

Boston Hadoop Meetup Group: The Trumpet of the Elephant

Heheh. But seriously, if you live in the Boston area and are working with Hadoop, or interested in working with Hadoop, or just think the name is fun to say, you should absolutely clear your calendar the night of February 15. Why? Because it’s the first Boston Hadoop Meetup Group since November, and judging by the presenter line-up, it’s going to be a doozie (or an Oozie, if you want to get all topical).

First up, MapR’s Chief Application Architect Ted Dunning on using Machine Learning within Hadoop. I’m really excited about this one.

Second, Cloudera Systems Engineer Adam Smieszy on integrating Hadoop into your existing data management and analysis workflows.

Last, Hadapt’s CTO Philip Wickline “will give a high-level discussion about the differences between HBase and Hive, and about transactional versus analytical workloads more generally speaking, and dive into the systems required for each type of workload.”

Each talk will run about 15-20 minutes, with time for Q&A after, followed by (free) beer and mingling.

The Boston Hadoop Meetup Group is organized by Hadapt’s Reed Shea. Hadapt is doing some very, very cool stuff with unstructured and structured data processing and analytics, cool enough that founder/Chief Scientist Daniel Abadi took teaching leave from Yale to turn his research into a product.

This particular Meetup is sponsored by Hadapt, MapR, Cloudera and Fidelity, and is being held at Fidelity’s downtown office, from 6 to about 8:30 pm. For more information and to sign up, visit the event page.

See you there!

#Meme15 Assignment 2: All A’Twitter

A new monthly blog series has entered the #sqlfamily. The brainchild of Jason Strate, “#Meme15” focuses on the ways social networks can further our professional development. This month’s assignment is one dear to my own heart (and brain. And fingers): Twitter. I’ve written before about what Twitter can do for your company: how it can give high-tech B2Bs personality, credibility and new leads. What I haven’t covered as much is what it can do for you, the employee. There are two questions in the assignment:


  • Why should average Jane or Joe professional consider using Twitter?
  • What benefit have you seen in your career because of Twitter?

As a person whose primary job responsibilities involve social media, I’m going to go with the first question. For an excellent answer to the second, check out Stacia Misner’s response.

So, why should you, the non-Social Media Marketer/Specialist/Strategist, etc., use Twitter? In short, there are three main reasons: to build relationships, gain knowledge and enhance your public image.

At slightly greater length: Twitter is a public conversation, a place to learn, share and connect. Someone posts a link to a blog post about Power View; you read it and learn something new about Power View (animated data points, oh my!). Someone asks a question about stored procedures, aka your pride and joy, and you answer them. Bonds form between the teachers and the taught, the @er and @ed, tweeter and retweeter, but they can also form, albeit more loosely, between all of the above and their networks of listeners. When you perform any activity on Twitter, from favoriting a Tweet to organizing a Tweetup, it deepens your digital profile for anyone who thinks to look or happens to listen at the right time.

Twitter allows you to join (or start!) non-geographically-restricted communities grouped around any interest or combination of interests. It lets you play pin the avatar on the body at conferences. It’s a virtual kickstarter for eventual IRL relationships. For all the banality of some of its content, Twitter’s function as a connector is far from trivial.

 [#Meme15 logo by Matt Velic]

Boston’s Big Datascape, Part 1

[Excerpted from the Riparian Data blog]
Big Data, or the technologies, languages, databases and platforms used to efficiently store, analyze and extract conclusions from massive data sets, is a Big Trend right now. Why? In a nutshell, because a) we are generating ever-increasing amounts of data, and b) we keep learning faster, easier and more accurate ways of handling and extracting business value from it. On Wall Street, some investment banks and hedge funds are incorporating sentiment analysis of web documents into their trading strategies. In healthcare, companies like WellPoint, Explorys and Apixio are using distributed computing to mine health records, practice guidelines, studies and medical/service costs to more accurately and affordably insure, diagnose and treat patients.

Unsurprisingly, Silicon Valley is big data’s epicenter, but Boston, long a bastion of Life Sciences, Healthcare, High Tech and Higher Ed, is becoming an important player, particularly in the storage and analytics arenas. This series aims to spotlight some of the current and future game changers. These companies differ in growth stages, target markets and revenue models, but converge around their belief that the data is the castle, and their tools the keys.

1) Recorded Future

  • Product: Recorded Future is an API that scans, analyzes and visualizes the sentiment and momentum of specified references in publicly available web documents (news sites, blogs, government sites, social media sites, etc.).
  • Founder/CEO: Christopher Ahlberg
  • Technologies used: JSON, real-time data feeds, predictive modeling, sentiment analysis
  • Target Industries: Financial Services, Competitive Intelligence, Defense Intelligence
  • Location: Cambridge, MA

2) Hadapt

  • Product: The Hadapt Adaptive Analytical Platform is a single system for processing, querying and analyzing both structured and unstructured data. The platform doesn’t need connectors, and supports SQL queries.
  • Founders: Justin Borgman (CEO); Dr. Daniel Abadi (Chief Scientist)
  • Technologies used: Hadoop, SQL, Adaptive Query Execution™
  • Target Industries: Financial Services, Healthcare, Telecom, Government

[Read the full post]

For the Data Scientists: 5 Upcoming Big Data Conferences You Shouldn’t Miss

Big data is a big deal right now, and it’s only going to become a bigger deal in the future, so it makes sense to learn about as many of its aspects as you can, as quickly as you can. Or pick one and learn it very well. Or don’t pick any, if you are a staunch believer in the shelf-life of traditional data warehouses. From a machine learning deep-dive to an open-source buffet, the following five conferences provide educational and networking opportunities for both the specialists and renaissance persons among you. Attending a cool one I’ve missed? Let me know in the comments!