Importing CSV Data into SQL Server

On a fairly regular basis at work I have to take .csv files with data from a customer’s database and import them into my SQL Server instance. This should be a simple operation, but the design of the native SQL Server Data Import tool makes it much, much more complicated than it needs to be. The biggest pain point by far is having to specify the data types of every single column in the data source by hand. Not only does it assume that everything is a string, forcing me to go through and figure out which types are a date, an int, a float or whatever else I have in the database, but by default changing a column’s type to a decimal uses decimal(18,0), truncating all my floating point values.

It always seemed a bit absurd that I should have to manually specify the types of all my data since I was importing into a database where the data types of the columns are explicitly specified. Once the import tool knew what table I was importing into, shouldn’t it be able to figure out the types for me?

So I wrote a Python script to import data for me. It uses the pymssql module to connect to the database, and the main insert function requires only three arguments: a connection to a database, the name of the table to import into, and an object with a function to return the names of the columns and then be able to iterate over the data. Conveniently, I also wrote a CSVFile class that wraps around a .csv file and provides just that functionality.

This is a first draft of the code and is pretty feature-limited. It requires that the column names be in the first row of your csv file, and will simply crash if pymssql doesn’t recognize a piece of data as being of the type it expects. It’s also vulnerable to SQL injection attacks in a few places, so don’t use it on data you don’t trust. It only has a command-line interface with braindead argument parsing. It only accepts files with comma delimiters. Finally, it requires an extra call to the database to figure out the column types, but unless your database is awfully remote that’s probably okay.

What it can do is properly handle null values (despite this issue: http://sourceforge.net/forum/forum.php?thread_id=1633905&forum_id=124111) and correctly match columns that aren’t in the same order as they are in the database. For all it’s limitations, saving me five minutes of frustration every time I have to import a table with 30 columns is well worth it. Grab the code here.

T-SQL Tuesday 016: When is yesterday, exactly?

January Boston Tech Meetups to Attend

What's new in OfficeWriter 8.5

SoftArtisans

Importing CSV Data into SQL Server

Related posts:

Blogged