30 Hadoop and Big Data Spelunkers Worth Following

Understanding the basic purpose of Hadoop is easy: it offers a way to quickly store, process and extract deliverable meaning from vast datasets. It does this by breaking the datasets up into commodity-server-sized chunks, replicating those chunks to guard against failure, and distributing them across a connected web (cluster) of commodity servers (nodes). Understanding how it integrates with the current big data landscape, and how it may integrate with the future one, is a little harder; for that, I’ve turned to the experts. Luckily for me, and for you if you’re in my boat, many of them maintain active Twitter and blogging presences. Even more luckily, the quality and clarity of their writing is really, really high. The following list is by no means exhaustive, but poking into the thoughts of even a few of these folks can elucidate everything from machine learning to data modeling and distributed systems.

1.) Hilary Mason

Continue reading 30 Hadoop and Big Data Spelunkers Worth Following
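
As a quick illustration of the chunk-and-replicate idea described above, here is a toy PowerShell sketch (conceptual only, not Hadoop’s actual mechanics; every name in it is invented):

PowerShell

# Conceptual sketch: split a dataset into fixed-size chunks and assign each
# chunk to several nodes, loosely mimicking HDFS-style replication
$records     = 1..100
$chunkSize   = 25
$nodes       = 'node1', 'node2', 'node3', 'node4'
$replication = 3

for ($i = 0; $i -lt $records.Count; $i += $chunkSize) {
    $chunk = $records[$i..([Math]::Min($i + $chunkSize, $records.Count) - 1)]
    # Choose $replication distinct nodes to hold a copy of this chunk
    $targets = $nodes | Get-Random -Count $replication
    "Chunk of $($chunk.Count) records -> $($targets -join ', ')"
}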

Post-Processing SSRS Reports using OfficeWriter in .NET

Using OfficeWriter’s integration with SSRS in conjunction with the Designer is typically a straightforward process requiring no programmatic manipulation of the reports. A developer designs the report in Visual Studio BIDS, opens the .rdl using the Designer, designs the template in Word/Excel, and publishes the report. The report is then rendered inside the Report Manager using the custom OfficeWriter export option. However, there are situations that call for post-processing the report programmatically, and that’s where the ExcelApplication and WordApplication objects come in. Accessing and rendering the reports through the SSRS API is straightforward, and the resulting byte array can be turned into a MemoryStream and passed to OfficeWriter.

Adding the SSRS Web Service

The first step in tapping into the SSRS API is to add the Report Execution Service to your web references inside of Visual Studio. The URL for the web service is likely along the lines of http://localhost/reportserver/reportexecution2005.asmx, where localhost/reportserver is the hostname and virtual directory of the SSRS server. Note that this URL is correct for SQL Server 2008, despite the 2005 in the name. In the example instance I am using, this web service is located in the directory C:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\Reporting Services\ReportServer.
Continue reading Post-Processing SSRS Reports using OfficeWriter in .NET
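
Once the web reference is in place, the end-to-end flow the post describes might look roughly like the following PowerShell sketch (the report path, export format name, output path, and assembly path are all placeholder assumptions, and error handling is omitted):

PowerShell

# Sketch only; adjust paths and names for your environment
Add-Type -Path 'C:\Program Files (x86)\SoftArtisans\OfficeWriter\bin\dotnet\SoftArtisans.OfficeWriter.ExcelWriter.dll'

# Generate a proxy for the Report Execution Service
$rs = New-WebServiceProxy -Uri 'http://localhost/reportserver/reportexecution2005.asmx' -UseDefaultCredential

# Load the report and tie this session to the execution header
$rs.ExecutionHeaderValue = New-Object ($rs.GetType().Namespace + '.ExecutionHeader')
$execInfo = $rs.LoadReport('/Reports/SalesReport', $null)
$rs.ExecutionHeaderValue.ExecutionID = $execInfo.ExecutionID

# Render the report to a byte array; 'EXCEL' stands in for whatever export
# format (such as the custom OfficeWriter one) your report uses
$extension = $mimeType = $encoding = $null
$warnings = $null
$streamIDs = $null
$bytes = $rs.Render('EXCEL', $null, [ref]$extension, [ref]$mimeType, [ref]$encoding, [ref]$warnings, [ref]$streamIDs)

# Wrap the bytes in a MemoryStream and post-process with ExcelApplication
$stream = New-Object System.IO.MemoryStream -ArgumentList @(,$bytes)
$xla = New-Object 'SoftArtisans.OfficeWriter.ExcelWriter.ExcelApplication'
$wb = $xla.Open($stream)
$wb.Worksheets[0].Cells['A1'].Value = 'Post-processed'
$xla.Save($wb, 'C:\Reports\SalesReport-processed.xls')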

Create an Excel Spreadsheet in PowerShell

This post shows you how to create an Excel spreadsheet in PowerShell with OfficeWriter.

PowerShell


##################################################################
##
## Create an Excel Spreadsheet with ExcelApplication in PowerShell
##
## by Jim Stallings (http://www.officewriter.com)
##
##################################################################

# Add the OfficeWriter assembly
Add-Type -Path 'C:\Program Files (x86)\SoftArtisans\OfficeWriter\bin\dotnet\SoftArtisans.OfficeWriter.ExcelWriter.dll'

# Create a new ExcelApplication object
$xla = New-Object "SoftArtisans.OfficeWriter.ExcelWriter.ExcelApplication"

# Create a new workbook
$wb = $xla.Create()

# Get the first worksheet in the new workbook
$ws = $wb.Worksheets[0]

# Add some text to the first cell in the sheet
$ws.Cells["A1"].Value = "Welcome to SoftArtisans OfficeWriter!"

# Save the workbook to disk
$xla.Save($wb, "C:\myfile.xls")
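
If you want to run the above as a script, save it under a name of your choosing (createspreadsheet.ps1 here is just a placeholder) and invoke it from a PowerShell prompt:

.\createspreadsheet.ps1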

Stories from the WIT Trenches: Stacia Misner

[This is the sixth in a series of posts exploring the personal stories of real women in technology. Every woman in tech overcame at the very least statistical odds to be here; this blog series aims to find out why, and what they found along the way. Those of you who work in the SQL Server BI arena are most likely familiar with Stacia Misner; the consultant, instructor and prolific author is one of the MS BI stack’s greatest champions. Here, she talks tractors, the SQLBI community’s collective consciousness and growing up in the stars. For guidance and in-depth tutorials on all things SQL Server, SSRS, SharePoint and BI, check out Stacia’s blog and books! And if reading her story inspires you to share yours, please feel free to email me.]

I’m Stacia Misner, a business intelligence consultant, author, and instructor specializing in the Microsoft business intelligence stack. I have been working in the business intelligence field since 1999 and started my own consulting company in 2006.

1) Can you take us back to your “eureka!” moment—a particular instance or event that got you interested in technology?

I’ve always been interested in technology in one way or another. My parents were both programmers, although I don’t recall growing up thinking that I would follow in their footsteps. I was always very good at math and science, and was properly encouraged in those areas. I had the privilege of growing up in Houston, in the heart of the space industry, so all my friends’ parents (mostly fathers at the time, I suppose) were engineers or scientists. Technology seemed a normal part of life, and my friends and I grew up expecting that it would become more and more like Star Trek as time went on.
Continue reading Stories from the WIT Trenches: Stacia Misner

Dealing with Bugs in Scrum

One of the big questions we had to answer for our Scrum development process was whether or not to story point bugs. Development wanted to story point bugs to help with sprint planning. The Product Owner (me) didn’t want to story point bugs, in order to maintain an accurate velocity. We experimented with a few approaches and settled on one that works well for us. In this post I’ll run through the issues and the solutions we came up with.

Our Problems with Bugs in Scrum

Bugs pose a bit of a problem in Scrum because of how velocity and scheduling work. The measure of progress is the amount of value added to the product, and we use story points to estimate how much of that value we can fit into a release. This is tracked as “velocity.” For developers, velocity also acts as a benchmark for deciding how many stories to include in any given sprint.

This use of velocity is basic Agile practice and works pretty well for the most part. However, bugs are going to happen no matter how good your developers are. As a Product Owner, I evaluate the bugs that come in and decide if/when to fix them. From my point of view, the bugs are scheduled just like stories… the most important things get done first.

When we get into sprint planning, Development has to figure out how much to pull into the sprint, including both stories and bugs. This is a little tougher because bugs don’t have story points, so you have to “groom” each bug during sprint planning to get a sense of how much work it requires. Sprint planning gets much more painful when it also involves grooming. You might argue that there shouldn’t be enough bugs to make this a big problem, and I wouldn’t disagree. But we’re transitioning an existing project to use Scrum, so we have a number of bugs from the B.S. (before Scrum) times. Even for new projects, sometimes bad things happen to good people and you end up with more bugs than you’d like.

Development requested that we start story pointing bugs to make sprint planning a bit easier. There are a number of reasons why I strongly suggest you DON’T do that, but we’ll leave that for another day. So, short of story pointing bugs, how can we make dealing with bugs easier in Scrum?

A Possible Solution: Bug Points

We experimented with a few solutions and settled on using both Story Points and Bug Points. A Bug Point is a Story Point, but only gets used for bugs and does not get counted in the velocity. We use the same scale as Story Points and assign them during our normal grooming sessions. Really the only difference is that we put them in a different field in JIRA (our issue tracker). Development also keeps track of “total work” for sprints, which includes both stories and bugs. However, this metric is NEVER used outside of sprint planning. At no point will I (the product owner) ever look at that metric and say “Well, our velocity is 15, but really we’re doing 20 points of total work, so I bet we can squeeze in these extra stories for the next release.”
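
To make that distinction concrete, here is a minimal sketch (the point values are invented) of how velocity and total work diverge:

PowerShell

# Hypothetical completed work for one sprint
$storyPoints = 3, 5, 2, 5    # stories (counted in velocity)
$bugPoints   = 2, 3          # bugs (never counted in velocity)

# Velocity drives release planning; total work informs sprint planning only
$velocity  = ($storyPoints | Measure-Object -Sum).Sum             # 15
$totalWork = $velocity + ($bugPoints | Measure-Object -Sum).Sum   # 20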

When it comes to sprint planning, the team gets together and works out what can fit into the next sprint. The bugs in the backlog are prioritized along with the stories because some bugs need to be fixed now and others can wait for a while. Since bugs are also pointed, the team can easily look at the backlog and determine how far they can get. They don’t have to reevaluate each bug as it comes up in the backlog; they’ve already done that work and can just keep rolling.

This works well for us because it’s easier to know what’s going to fit in the sprint. It also allows the team to have a somewhat more realistic view of how much they can fit in a particular sprint. Note that I said a particular sprint, not a release. Remember: NEVER consider bug points during release planning! If the team has bugs in 4 straight sprints and then a sprint with none, they have a better idea of how much they can fit when there aren’t any bugs to work on.

There are a number of other benefits. For one, you get a better sense of how much extra work you’re creating by letting bugs slip through the cracks when implementing a story. This helps to spot quality issues. If you’re constantly creating more bugs than value then you know something is wrong.

What’s Not So Good About This?

It’s very easy to fall into the trap of using bug points like story points outside of sprint planning. As I mentioned earlier, including bugs in your long-term planning is a bad idea. Bug points are so similar to story points that it’s easy to forget that they AREN’T story points. Other than that, this seems like a pretty good solution to me.

It works for us, but how about you? I’d love to hear about how others are handling bugs in the backlog.

It’s Hadoop’s world. We just live in it

It’s Hadoop’s world. We just live in it. Welcome to #hw2011!

That was the opening battle cry from Mike Olson, CEO of Cloudera, as he kicked off the third Hadoop World conference. Indeed, after drinking the kool-aid for two days, I’ve been almost fully ingested, stored and transformed, even if I have yet to be accessed, let alone managed.

For it seems that within a few years, all digital information, including my electronic Freudian id and perhaps my ego as well, will be deposited into Hadoop, forever ready to be accessed via any number of different social and structured graphs.

Hadoop background

Some have speculated that within five years, Hadoop will hold 50% of the world’s information. I now believe that to be true, albeit potentially as a copy of the other 50%, if not as data unique to Hadoop.

Hadoop and its Google ancestors enable storage on a scale and scope never previously possible, with baked-in redundancy and resiliency, at lower operational cost than big iron solutions. And the software is free, needing nothing more than common commodity hardware and a Java host.

Google created and shared the concept. Doug Cutting and a colleague started an independent Apache-licensed implementation five years ago. Since then, it has been adopted by the largest web properties: Facebook, Twitter, and eBay, among others. Even major enterprises like JP Morgan and Disney have been using it in production for at least two years.

Commercially supported releases are available from Cloudera and Hortonworks.

The Conference

Hadoop is still in the early adopter stage and has not yet crossed Geoffrey Moore’s chasm. This is most reminiscent of the state of the web circa 1994. Forward-looking companies are making incredible strides in competitive advantage using primitive tools and smart developers.

Cloudera is doing a great job of championing the ecosystem. They recognize that growing the overall market and adoption is the correct long-term path to riches. I look forward to #hw2012.

NEUGS Part 11: Workflows, AKA Lifesavers for the Lazy

Not going to lie, guys, I’ve been putting this post off for a while. (Ironic, as procrastination is exactly what Workflows aim to prevent.) To me, the term connotes TPS reports and dingy cubicles and unsheathed fluorescent overhead lights and perpetually sweaty office workers in greasy button-downs and Bluetooth headsets. Also, blandly enthusiastic sales execs talking about connection and knowledge share and koi ponds, though I’m not sure where that last image comes from. Butttt, here’s the thing: workflows provide a pretty useful method for keeping individuals and teams on track, through a series of automated steps triggered by the initialization or completion of a designated action.

E.g., let’s say, completely hypothetically, that I am a fairly low-ranking business analyst at Kibble ‘n’ Krunchy Bits Corp. Let’s also say, again completely hypothetically, that I have this habit of uploading my weekly sales report to the sales team site, then wandering off to gchat for hours. So the reports just sit there without anyone looking at them for like, weeks at a time, and then at the end of the quarter everyone is surprised by how much sales of KrunchExtreme Lite with Passionfruit Extract ™ have grown. (Even though, hello, they should have known this because the factory workers and delivery men have all been putting in on average 13 hours of overtime a week for the past four months, figures which someone in a different department really should be keeping a better eye on.) A Workflow – in this case a modified Approval Workflow – provides me and my managers with an easy solution to this lack-of-awareness problem.
Continue reading NEUGS Part 11: Workflows, AKA Lifesavers for the Lazy

A Simple Bash Script to Clone a Drupal Core Git Repository

This is a simple bash script used to clone a Drupal core Git repository locally when passed a branch number and a new directory name (to host the Drupal repo). I use the issue number as the directory name to keep Drupal API documentation patches coordinated with the Drupal.org issue number.
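
A minimal sketch of such a script might look like this (the Drupal core repository URL is an assumption; adjust it to whatever remote you use):

Bash

#!/bin/bash
# Minimal sketch: clone Drupal core at a given branch into a named directory
# Usage: sh clonerepo.sh <branch> <directory>

BRANCH=$1
DIR=$2

# Clone only the requested branch into the new directory
git clone --branch "$BRANCH" http://git.drupal.org/project/drupal.git "$DIR"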

To execute the script, copy and paste the commands in the code block into a text editor, then save the script as filename.sh (e.g. clonerepo.sh).

Run it from a bash prompt (in the same directory) by typing sh clonerepo.sh, followed by the branch number and the new directory name.