Joining DataTables in LINQ

[Image via Patou Fine Art]

The Problem

Recently, I encountered a situation where I had to join two Data Tables from databases on two different servers. Had they been on the same server, I would do this in SQL with a simple JOIN statement, but since the databases were on different servers, my options were limited. I could:

  1. Use a linked server (http://msdn.microsoft.com/en-us/library/ms188279.aspx)
    But: This wasn’t really an option because I wouldn’t be able to modify production servers
  2. Use System.Data and create a DataRelation
    But: DataRelations are clunky and don’t perform LEFT OUTER JOIN’s very well
  3. Create a new DataTable and copy rows over manually (with, for example, a foreach loop)
    But: This would be painfully slow
    And: I’d have to duplicate a lot of null-row handling logic that LINQ and SQL do themselves
  4. Join the tables in the codebehind using C#’s own SQL-like syntax, LINQ.
    But: I had no idea how to use LINQ.

The Solution

After some research, I decided on option #4. To get a basic overview of LINQ, I started by going through 101 LINQ Samples and trying to learn the LINQ syntax. This was my first attempt at getting a method to join two DataTables:

For the most part, this code worked well. It managed to join the two tables. There were two issues, however:

  1. The code return a LINQ result-set, not a DataTable, so I couldn’t bind the data to the report using OfficeWriter.
  2. The type of JOIN LINQ performed was an INNER JOIN, which only selects rows that are not null in both databases.
    1. In the case of this query, for certain dates, the Table1 had non-null rows where Table2 had null rows
    2. For certain dates, the information from the Table1 would disappear because of the INNER JOIN

The first issue seemed very difficult because there was an inherent lack in interoperability between System.Data and System.LINQ, since I couldn’t turn one result set in the other. Luckily for me, someone had already figured this out, so I used their method to convert the LINQ result set into a DataTable I could use. This method involves a really clever use of System.Reflection:

So, that method easily took care of the data structure conversion issue. The next problem was changing the kind of JOIN that LINQ performed. For that issue, I found a solution here. Now my method looked like this:

There was one final problem, however. In the last line of the LINQ “select new” satement, I used a ternary statement to turn null values into zeros (so that LINQ won’t be upset with null reference exceptions). Since I was only selecting one column from the DataTable that was going to be partially null (due to the LEFT OUTER JOIN), I thought I only had to perform the ternary statement on that column. Boy, was I wrong. During debugging, I kept getting errors along the lines of “The object reference is not set to an instance of an object“. I had no idea what was going on, and for a long time I kept slightly changing the ternary statement to try to get correct results. Then, I realized my error: it wasn’t just a column from the Table2 that was null, it would be an entire row, since no data in the table existed for certain dates. The object reference exception arose from LINQ trying to reference the “Row2” column from a row that simply didn’t exist. I changed the last line of the select statement to:

and, lo and behold, the report worked! I was elated.

Summary

Some of the things I’ll remember from this simple but interesting problem:

  1. You can’t ask for a column from a completely null DataRow
  2. LINQ is like backwards SQL, but it’s less forgiving.
  3. Using someone else’s method isn’t necessarily a bad thing, as long as they get credited
  4. Data structures in C# aren’t as interchangeable as I would like them to be.

If you’ve gotten this far into this blog post, congrats, I didn’t bore you to death. But thanks for reading anyway!

Related posts: