OfficeWriter 8.0: Getting Started with XLSX in C#

XLSX in OfficeWriter 8.0

We just released OfficeWriter 8.0, and the biggest feature in this release is full support for Excel 2007/2010 files with the OfficeWriter API.  So, if you’ve written OfficeWriter applications that process XLS files, what do you need to do to get started with using XLSX files?  In short, nothing!

We took special care to keep compatibility in mind, so you can largely use the same code to process your XLSX files that you use with XLS.  There are a couple of scenarios in which you’ll need to be specific about the format you’re using, but they mainly involve creating new files, where you need to tell OfficeWriter what kind of file to create.

Creating New Workbooks

Use the Create method on an ExcelApplication instance to create a new, blank workbook. For backward compatibility, calling Create with no arguments creates an Excel 2003 (XLS) file.  To create an XLSX file, pass it a FileFormat enum value like this:

ExcelApplication xla = new ExcelApplication();
Workbook wb = xla.Create(ExcelApplication.FileFormat.Xlsx);

Continue reading OfficeWriter 8.0: Getting Started with XLSX in C#

Combiners: The Optional Step to MapReduce

Most of us know that Hadoop MapReduce is made up of mappers and reducers. A map task runs on a task tracker. Then all the data for each key is collected from all the mappers and sent to another task tracker for reducing, with the reduce function called once per key. What fewer of us know about are combiners. Combiners are an optimization that can run after mapping but before the data is shuffled to other machines based on key. Combiners often perform the exact same function as reducers, but only on the subset of data produced by one mapper. This gives the task tracker an opportunity to reduce the size of the intermediate data it must send along to the reducers.

For instance, take the ubiquitous word count example. Two mappers may produce results like this:

Mapper A    Mapper B
X - 1       X - 1
Y - 1       X - 1
Z - 1       Z - 1
X - 1       Y - 1
            Y - 1

All those key-value pairs will need to be passed to the reducers to tabulate the values. But suppose the reducer is also used as a combiner (which is quite often the case) and suppose it gets called on both results before they’re passed along:

Mapper A    Mapper B
X - 2       X - 2
Y - 1       Z - 1
Z - 1       Y - 2

The traffic load has been reduced. Now all that’s left to do is call the reducers on the keys across all map results to produce:

X - 4
Z - 2
Y - 3

An important point to keep in mind is that the combiner is not always called, even when you assign one. The mapper will generally only call the combiner if the intermediate data it’s producing is getting large, perhaps to the point that it must be written to disk before it can be sent. That’s why it’s important to make sure that the combiner does not change the inherent form of the data it processes. It must produce the same sort of content that it reads in. In the above example, the combiners read (word – sum) pairs and wrote out (word – sum) pairs.
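The word count walkthrough can be sketched in a few lines of plain Python. This is only a simulation under stated assumptions, not Hadoop's API: the names run_job and sum_counts are made up for illustration, and the split assumes Mapper A emitted the first four pairs listed above and Mapper B the remaining five. It shows that the final counts are the same whether or not the combiner runs, while fewer pairs travel to the reducers when it does.

```python
from collections import defaultdict


def sum_counts(pairs):
    """Reads (word, count) pairs and returns (word, count) pairs.

    Because input and output have the same form, the same function can
    serve as both the combiner and the reducer in this sketch.
    """
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def run_job(mapper_outputs, use_combiner):
    """Hypothetical driver: returns (final counts, pairs sent to reducers)."""
    sent = []
    for output in mapper_outputs:
        # The framework may or may not run the combiner on a mapper's output.
        sent.extend(sum_counts(output) if use_combiner else output)
    return dict(sum_counts(sent)), len(sent)


# Assumed split of the example data between the two mappers.
mapper_a = [("X", 1), ("Y", 1), ("Z", 1), ("X", 1)]
mapper_b = [("X", 1), ("X", 1), ("Z", 1), ("Y", 1), ("Y", 1)]

plain_counts, plain_sent = run_job([mapper_a, mapper_b], use_combiner=False)
combined_counts, combined_sent = run_job([mapper_a, mapper_b], use_combiner=True)

print(plain_counts, plain_sent)
print(combined_counts, combined_sent)
```

If sum_counts returned something other than (word, count) pairs, the job would give different answers depending on whether the combiner happened to run, which is exactly the situation to avoid.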


Speculative Execution: Proceed with Caution (or Not at All)


When a job tracker receives a MapReduce job, it will divvy out tasks to several task trackers in order to complete the job. If any of those tasks fails for whatever reason (perhaps it threw an exception), then it’s up to the job tracker to restart the task on another slave. This process can occur up to three times before the job tracker gives up. But what happens if a task doesn’t fail, but it doesn’t succeed either? What if it just hangs? Perhaps that map task received an extra large or extra tough block to work with. Maybe some other application on that task tracker is running and it’s hogging the entire CPU. Maybe the task has entered an infinite loop. Either way, the task tracker continues to check in from time to time, which prevents the task from being killed outright, but it just isn’t finishing. The job tracker can’t possibly know why this task is taking longer, nor can it know when or if it will finish. What does the job tracker do?

Speculative Execution!

Without shutting down the first task, it goes to another task tracker and gives it the same task. Then it’s a race. Whichever finishes first gets to submit its results. The other is killed (a most cutthroat race). That’s it.
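As a rough illustration of that race, here is a small Python sketch that uses threads in place of task trackers. It is not how Hadoop implements speculation: the names attempt and run_with_speculation are invented for this example, both attempts start at the same moment rather than the backup starting only once the original looks slow, and the "kill" is just a flag that the losing attempt checks.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def attempt(name, seconds, cancelled):
    """Pretend to work for `seconds`, stopping early if the attempt is killed."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if cancelled.is_set():
            return None  # this attempt lost the race and was killed
        time.sleep(0.01)
    return name


def run_with_speculation(original_seconds, backup_seconds):
    """Run two attempts at the same task and keep whichever finishes first."""
    cancelled = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(attempt, "original", original_seconds, cancelled),
            pool.submit(attempt, "backup", backup_seconds, cancelled),
        ]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        winner = next(iter(done)).result()
        cancelled.set()  # tell the slower attempt to stop
    return winner


# The original attempt is a straggler, so the backup wins the race.
winner = run_with_speculation(original_seconds=2.0, backup_seconds=0.1)
print(winner)
```

Only the winner's result is returned, and the loser's partial work is thrown away. That is harmless here, but it hints at why side effects in a task make running it twice risky.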

Speculative execution isn’t always appropriate. In fact, some people recommend that you disable it for reduce tasks entirely. Why? Continue reading Speculative Execution: Proceed with Caution (or Not at All)

Traversing Graphs with MapReduce

Hadoop can be used to perform breadth-first searches through graphs. One way to do this is with a series of MapReduce jobs, where each job is another layer of the breadth-first search. Here is a very high-level explanation of what I mean. Suppose we have the simple graph:

 E <-- C <-- F
 ^     ^     ^
 |     |     |
 A --> B --> D

This data would likely be represented in our Hadoop cluster as a list of connections, like: Continue reading Traversing Graphs with MapReduce
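The layer-per-job idea can be sketched in plain Python on the graph above. Everything in this sketch is an assumption made for illustration rather than the representation or code used in the full post: the edge list is read off the diagram, and bfs_map, bfs_reduce, run_layer and bfs are hypothetical names. Each call to run_layer stands in for one MapReduce job. The map step passes every node's record through and proposes a distance of d + 1 to each neighbor of a node already at distance d, and the reduce step keeps the smallest distance seen for each node.

```python
from collections import defaultdict

INF = float("inf")


def bfs_map(node, record):
    """Emit the node's own record plus a candidate distance for each neighbor."""
    distance, neighbors = record
    # Pass the record through so the adjacency list survives into the next job.
    yield node, ("node", distance, neighbors)
    if distance != INF:
        for neighbor in neighbors:
            yield neighbor, ("dist", distance + 1, None)


def bfs_reduce(node, values):
    """Keep the smallest distance seen and restore the adjacency list."""
    best, neighbors = INF, []
    for kind, distance, adjacency in values:
        if kind == "node":
            neighbors = adjacency
        best = min(best, distance)
    return node, (best, neighbors)


def run_layer(state):
    """One simulated MapReduce job: map every record, group by key, reduce."""
    grouped = defaultdict(list)
    for node, record in state.items():
        for key, value in bfs_map(node, record):
            grouped[key].append(value)
    return dict(bfs_reduce(node, values) for node, values in grouped.items())


def bfs(edges, source):
    """Repeat jobs until no distance changes; returns (distances, jobs run)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])  # nodes with no outgoing edges need a record too
    state = {n: (0 if n == source else INF, nbrs) for n, nbrs in adjacency.items()}
    jobs = 0
    while True:
        new_state = run_layer(state)
        jobs += 1
        if new_state == state:
            return {n: d for n, (d, _) in new_state.items()}, jobs
        state = new_state


# Edge list read off the diagram (an assumed representation).
edges = [("A", "B"), ("A", "E"), ("B", "C"), ("B", "D"),
         ("C", "E"), ("D", "F"), ("F", "C")]
distances, jobs = bfs(edges, "A")
print(distances, jobs)
```

The final job in this sketch changes nothing and only confirms that the search has converged, which is one simple way to decide when to stop launching jobs.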

OfficeWriter for the IT Pro: Automated Dell Warranty Lookup using Powershell and ExcelTemplate

OfficeWriter for the IT Pro posts are aimed at exploring ways to extend the use of OfficeWriter to the IT work space.

This script will dynamically query Dell’s Warranty web service via PowerShell and export the results to an Excel (XLSX) file using OfficeWriter’s ExcelTemplate object. I’ve added colored conditional formatting that depends on how many days are left before the warranty expires.

In the script, we leverage two external community-provided PowerShell functions, Out-DataTable and Get-DellWarranty. Get-DellWarranty accepts a computer name, then returns the results as a PowerShell object. The ExcelTemplate object will not bind a PowerShell object, so we use Out-DataTable to convert the object into a .NET DataTable.

You will need proper permissions and PowerShell access to run the script against remote servers. You will need to modify the $myComputerList variable to include the computers that you want to query. You will need to download the resources.zip file attached to this post. It contains the required PowerShell modules, the DellWarrantyExporttoExcel script, the DellWarrantyLook.xlsx Excel template, and a sample Excel output file (output.xlsx). The final requirement to run the script is a copy of OfficeWriter Standard. You can download a free evaluation here. Continue reading OfficeWriter for the IT Pro: Automated Dell Warranty Lookup using Powershell and ExcelTemplate

For the Data Scientists: 5 Upcoming Big Data Conferences You Shouldn’t Miss

Big data is a big deal right now, and it’s only going to become a bigger deal in the future, so it makes sense to learn about as many of its aspects as you can, as quickly as you can. Or pick one and learn it very well. Or don’t pick any, if you are a staunch believer in the shelf-life of traditional data warehouses. From a machine learning deep-dive to an open-source buffet,  the following five conferences provide educational and networking opportunities for both the specialists and renaissance persons among you. Attending a cool one I’ve missed? Let me know in the comments!

Ruminants’ Ruminations, or The Coolest Things We Ingested This Year

Another best-of list blog post! Another best-of list blog post whose preface warns you it is a best-of list blog post! So sue me. Or don’t read it. ‘Tis the season, and I’m a copycat.

2011’s been a kind of wild and crazy year, both for us as a company and for the software world as a whole. But rather than do a straight recap, I decided to poll our crew on the hands-down coolest thing/language/trick/product/comestible/visual symphony/regular symphony they’ve ingested this year, and let you extrapolate your own state-of-the-union conclusions from these. Alors:

  • Sean Kermes:
    • Sugru! Sugru is super frigging cool.  It starts life as putty that can be hand-molded at room temperature for a bit upwards of half an hour, then over the next 24 hours it adheres to whatever you stuck it to and becomes a flexible (but tough) and slightly grippy silicone.  I’ve used it to repair and craft drawer and cabinet handles and fix some random crap, and I’m planning on starting to make some custom-fitted mouse grips so that I’m not dragging my fingers across the desk all day. Continue reading Ruminants’ Ruminations, or The Coolest Things We Ingested This Year
