Wednesday, January 26, 2011

LINQ to SQL and Memory Leaks

This happened when I parsed several gigabytes of HTML pages. I've noticed that program's process ate my memory tragically - over 1 GB in 5-10 minutes.

The first thing I thought was memory leaks in HtmlAgilityPack, that I heard being updated not very often. But removing it was not successful. After several minutes of removing all components I figured out that the reason is LINQ to SQL. My code like the following:

var dcTableOne = new TableOneDataContext(_connectionString);
while (true)
{
    var row = new TableOne() { Status = 0, Name = stringToSave };
    dc.TableOnes.InsertOnSubmit(row);
    dc.SubmitChanges();
}

Besides of that I had notice that Entity Framework v4 has memory leaks too. The only method that woks fine was direct SQL executing - via SqlCommand.ExecuteNonQuery or via the same but over LINQ to SQL:

dcTableOne.ExecuteCommand("INSERT INTO TableOne(Status,Name) VALUES({0},{1})", 0, stringToSave);

After minutes of googling the reason was found. LINQ to SQL (and EF) has the feature, that tracks all objects that passed through it. And what I needed to do is turn it off. But it's caused my code to fail on inserts. So, I wend in further searches. They said me that DataContext object was designed for short lifetime and then has to be used in such way:

while (true)
{
    using (var dc = new TableOneDataContext(_connectionString))
    {
        var row = new TableOne() { Status = 0, Name = stringToSave };
        dc.TableOnes.InsertOnSubmit(row);
        dc.SubmitChanges();
    }
}

That fixed my problems, but after a moment of working on my project I got similar problem: I required to read big array of data from database, and it couldn't be solved with the method mentioned above by the technical reason - I had to read all data sequentially. Turning tracking off was the right decision:

dcTableOne.ObjectTrackingEnabled = false;
foreach (var elem in dcTableOne.TableOnes)
{
    // working with elem
}