Monday, January 28, 2008

LINQ To SQL Performance

In the interest of full disclosure, I must say that I like LINQ. I think the declarative approach to data manipulation is just wonderful! It makes code much cleaner and code maintenance much easier. There is an overhead, of course, but with LINQ to Objects all calls are in-process, so it's still going to be fast.

When it comes to LINQ to SQL, though, the game changes somewhat: not only database access is out-of-process, it is also notoriously tricky. Poorly designed queries can take minutes instead of seconds and drain server resources, so it is very important to know what T-SQL is being generated for a given LINQ query. But that is a topic for another post...

What I really wanted to know is - all other things being equal - how much performance overhead does LINQ to SQL add on top of ADO.NET. So, I built my test harness around one simple query against Northwind database:

SELECT [t0].[OrderID], [t0].[OrderDate], [t0].[CustomerID], [t1].[CompanyName], (
SELECT COUNT(*)
FROM [dbo].[Order Details] AS [t2]
WHERE [t0].[OrderID] = [t2].[OrderID]
) AS [ProdCount]
FROM [dbo].[Orders] AS [t0]
INNER JOIN [dbo].[Customers] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID]
WHERE [t0].[OrderID] = @OrderID

Basically, for a given order ID, we are retrieving a single row with information from order, customer, and order details tables:

OrderID OrderDate CustomerID CompanyName ProdCount
----------- ----------------------- ---------- ---------------------------------------- -----------
10248 1996-07-04 00:00:00.000 VINET Vins et alcools Chevalier 3

(1 row(s) affected)

In order to get measurable results, I wanted to run this query for every one of 830 orders (independently, to simulate a multi-user application). For benchmarking, I used three alternative approaches: dynamic SQL, dynamic SQL with parameters, and stored procedure. Because SQL Server caches query execution plans, I restarted it before switching to a different approach.

Dynamic SQL

In the method below, I am concatenating order ID value to the end of the WHERE clause. This technique is known to have poor performance, because SQL Server doesn't realize we are using the same query and will have to compile it every time. Indeed, the initial run took approximately 4100 ms on my laptop. SQL Server caches query plans, though, so subsequent executions of the test yielded a much better result of roughly 200 ms.
private TimeSpan RunADOTest()
{
using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["Northwind"].ConnectionString))
{
conn.Open();
SqlCommand cmd = conn.CreateCommand();
cmd.CommandType = CommandType.Text;
DateTime dtStart = DateTime.Now;
for (int orderId = 10248; orderId < 11078; orderId++)
{
cmd.CommandText = @"SELECT [t0].[OrderID], [t0].[OrderDate], ...
WHERE [t0].[OrderID] = " + orderId.ToString();
SqlDataReader dr = cmd.ExecuteReader();
dr.Close();
}
return DateTime.Now.Subtract(dtStart);
}
}

Parameterized Dynamic SQL

The only difference from previous method was that query WHERE clause changed to "WHERE [t0].[OrderID] = @OrderID". I also added these lines before calling cmd.ExecuteReader():

cmd.Parameters.Clear();
cmd.Parameters.Add(new SqlParameter("@OrderID", orderId));

As expected, performance has improved. After SQL Server restart, the method completed in 460 ms, and subsequent executions were around 190 ms. It's also important to realize that SQL Server has cached only one query plan and not 830 as in the previous example.

Stored Procedure

Ever since Microsoft added query plan caching for dynamic queries in SQL Server 2000, there really is no performance difference between stored procedure and a parameterized dynamic query. Of course, there are many good reasons for writing stored procedures (greater security, better code reuse, smaller network traffic).

In my test harness, results were nearly identical to parameterized dynamic SQL: initial run (after SQL Server restart) took 453 ms, and subsequent executions took 187 ms.

LINQ To SQL

I dropped Customers, Orders, and Order Details tables to the surface of Visual Studio object relational designer to create a LINQ to SQL classes. In order to ensure that all calls are executed using a single database connection, NorthwindDataContext is initialized with an open SqlConnection object. I also wanted to make sure T-SQL generated by LINQ is the same as in previous tests, so I installed SqlServerQueryVisualizer component and ran a SQLProfiler trace. Indeed, both Parameterized DSQL and LINQ to SQL tests issued the same exact exec sp_executesql command. I had to use query variable in a foreach statement to make it run (because of deferred execution).

Test results were, frankly, disappointing. After SQL Server restarted, the method took 4250 ms. Subsequent executions yielded between 3300 and 3400 ms, or more than 10 times slower than all other tests.

private TimeSpan RunLINQTest()
{
DateTime dtStart = DateTime.Now;
using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["Northwind"].ConnectionString))
{
conn.Open();
NorthwindDataContext db = new NorthwindDataContext(conn);

for (int orderId = 10248; orderId < 11078; orderId++)
{
var query = from o in db.Orders
join c in db.Customers on o.CustomerID equals c.CustomerID
where o.OrderID == orderId
select new
{
o.OrderID,
o.OrderDate,
o.CustomerID,
c.CompanyName,
ProdCount = o.Order_Details.Count
};

foreach (var item in query)
{
string s = item.CompanyName;
}
}

return DateTime.Now.Subtract(dtStart);
}
}

Test Summary

The table below summarizes test results. As you can see, after initial query plans are cached, dynamic SQL and stored procedure have essentially the same performance. Either approach is 94% faster than LINQ to SQL.

Can't say I'm happy with these results (after all, I do like LINQ), but at this point I don't see what else can be contributing to the delay. Unfortunately, I only have Professional version of Visual Studio which doesn't include a profiler, so I can't research further. However, I welcome any comments or corrections. Source code of the test harness is posted here: http://members.cox.net/rmamedov/blog/LINQ2SQLPerformanceTest.zip.

Initial Subsequent

Dynamic SQL 4100 200

Parameterized
Dynamic SQL 460 190

Stored Procedure 453 187

LINQ to SQL 4250 3350