ASP.NET Cafe
Tips and Tricks

CSV to DataTable Parser

Wednesday, 6 February 2008 03:30 by Dmitriy

Recently I need to parse CSV (comma separated values) text data into datatable. Standard format where first line is column names, values are separated by commas and possible quoted.

I've used powerful regex in .net to parse that data:

"(\"([^\"]*|\"{2})*\"(,|$))|\"[^\"]*\"(,|$)|[^,]+(,|$)|(,)"

As usually it's really unreadable. :)

And as result I've a class to parse CSV to datatable. Not much comments, just want to post the class, because was really sad after search in Google... there are no ready solution for this common problem (if you know some - let me know in comments).

CSVParser.cs (2.41 kb)

Easy to use - just create object and pass your stream with CSV data as parameter. And call ParseToDataTable().

Currently rated 3.2 by 5 people

  • Currently 3.2/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags:  
Categories:   C#
Actions: E-mail | del.icio.us | Kick it! | DZone it! | Permalink | Comments (3) | Comment RSSRSS comment feed

Related posts

Comments

February 6. 2008 10:32

Will Sullivan

I just had to do this during my last iteration; parse a delimited file into a datatable. I didn't use regexes because the delimeters and qualifiers might be different at runtime and I didn't want to string.Format my regex to change the qualifiers and delimeters.

I also read the file into a string, then parsed it as the process runs in a worker thread within a service. I wanted to get in and get out of the file as quickly as possible and within a critical section so that I wouldn't leak any file handles if killed during the process.

Will Sullivan

February 8. 2008 20:53

Sam

I downloaded your .CS file and will look this over soon. I love these custom classes that can be used over and over again, and this sounds like it could be very useful in many, many projects.

Sam

May 1. 2008 12:46

Carl Kelley

// Thank you for your excellent post, Will.
// Below, I enhanced ParseCSV to:
// + support DataTables with pre-defined columns
// + support CSV files that do not have a header row
// + not crash on column overflow
// + open a file of your choosing to provide the stream
// Just trying to give back, bro.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;


namespace Streamline.Utilities
{
public class CSVHelper
{
private System.Text.RegularExpressions.Regex _CSVregEx = new System.Text.RegularExpressions.Regex("(\"([^\"]*|\"{2})*\"(,|$))|\"[^\"]*\"(,|$)|[^,]+(,|$)|(,)");
const string _DEFAULT_TABLENAME = "CSV";
private DataTable _DataTable = new DataTable(_DEFAULT_TABLENAME);
private char[] _DELIMITER = ",".ToCharArray();
private char[] _QUOTES = "\"'".ToCharArray();

public string TableName
{
get { return _DataTable.TableName; }
set { _DataTable.TableName = value; }
}

public CSVHelper()
{
}

public CSVHelper(DataTable dt)
{
_DataTable = dt;
if (string.IsNullOrEmpty(_DataTable.TableName))
{
_DataTable.TableName = _DEFAULT_TABLENAME;
}
}

public CSVHelper(string tableName, string[] columnNames)
{
_DataTable.TableName = tableName;
GenerateDataColumns(columnNames, _DataTable);
}

public DataTable CSVFileToDataTable(string csvFilePath, bool firstRowHasColumnNames)
{
// check that the file exists before opening it
if (File.Exists(csvFilePath))
{
using (FileStream stream = new FileStream(csvFilePath, FileMode.Open))
{
try
{
DataTable dt = ParseCSVStreamToDataTable(stream, firstRowHasColumnNames);
return dt;
}
finally
{
stream.Dispose();
}
}
}
else
{
throw new FileNotFoundException(string.Format("File '{0}' not found.", csvFilePath));
}
}

public DataTable ParseCSVStreamToDataTable(Stream stream, bool firstRowHasColumnNames)
{
using (StreamReader reader = new StreamReader(stream))
{
try
{
string line = reader.ReadLine();
string[] data = BreakCSV(line);
if (_DataTable.Columns.Count == 0)
{
if (firstRowHasColumnNames)
{
GenerateDataColumns(data, _DataTable);
}
else
{
// assume the first row has the correct number of columns
GenerateDataColumns(data.Length, _DataTable);
}
}
while (!reader.EndOfStream)
{
data = BreakCSV(line);
DataRow dr = _DataTable.NewRow();
// protect against overflow
int maxColumns = Math.Min(_DataTable.Columns.Count, data.Length);
for (int i = 0; i < maxColumns; i++)
{
dr[i] = data[i];
}
_DataTable.Rows.Add(dr);
line = reader.ReadLine();
}
return _DataTable;
}
finally
{
reader.Close();
reader.Dispose();
stream.Close();
}
}
}

protected string[] BreakCSV(string line)
{
MatchCollection matches = _CSVregEx.Matches(line);
string[] data = new string[matches.Count];
int i = 0;
foreach (Match m in matches)
{
data[i] = m.Groups[0].Value.TrimEnd(_DELIMITER).Trim(_QUOTES);
i++;
}
return data;
}

private static void GenerateDataColumns(int columnCount, DataTable dt)
{
for (int i = 0; i < columnCount; i++)
{
string columnName = String.Format("col{0:000}", i);
DataColumn stringColumn = new DataColumn(columnName, typeof(String));
stringColumn.ColumnMapping = MappingType.Attribute;
dt.Columns.Add(stringColumn);
}
}

private void GenerateDataColumns(string[] columnNames, DataTable dt)
{
foreach (string s in columnNames)
{
DataColumn stringColumn = new DataColumn(s, typeof(String));
stringColumn.ColumnMapping = MappingType.Attribute;
dt.Columns.Add(stringColumn);
}
}
}
}

Carl Kelley

Add comment


(Will show your Gravatar icon)  

  Country flag

[b][/b] - [i][/i] - [u][/u]- [quote][/quote]



Live preview

August 27. 2008 23:01