Tuesday, December 4, 2007

ADO.NET performance unleashed

Introduction It seems that everyone has a different opinion about the best way to access data in a database and present it to the user for viewing or editing. Some say a DataTable or DataSet offer the most flexibility, less code, etc. Other people say no roll your own custom objects since they are the fastests and most flexible. There are certainly pros and cons of each, and I will highlight some of the them. This blog will not get into which is better since every situation is different. Sometimes it makes sense to choose one over the other. What I will cover is raw performance and generally why you may use one over the other. It is up to you decide if it makes sense in your situation which method to use. Below I will explore the performance of four different ways of accessing data from a SQL Server database. These include: Custom (objects that I used a code generator to create), Doodads (a framework that also uses code generator to create specific custom objects), DataTable, and finally DataSet. DataTable DataTable and a DataAdapter to load data from the database is a very common way to manage the access of your data. DataTables are disconnected objects meaning that they give the connection back after the object is loaded with data. This is important to handle many connections a second. You don't want your connections staying open very long because this is a limited resource. Once the limit is reached your application will fail. At the very least, it will get sluggish. DataTables require little coding (virtually none if using Designer in Visual Studio 2005), so they are nice for RAD. DataSet DataSet works very much the same as DataTable except a DataSet potentially contains multiple DataTables. This means that any performance DataTables has, DataSet will be a little slower just due to the overhead of creating the DataSet. Doodads Doodads is a framework that is open source and included with My Generation ( a code generation tool). Its biggest strength is that it handles complex transactions between tables at the application level, and that you can perform adhoc queries against a single table without any stored procedures or writing any SQL. This is a powerful feature to have when building some query pages. Doodads uses DataAdapters and DataTables under the hood so they are prone to the same performance issues as DataTables. Custom Objects Custom Objects can take a variety of forms. Typically they are created with My Generation or CodeSmith. If coded properly, these objects can extremely light weight and fast. This comes at the cost of limited functionality or at least functionality that you need to implement. In many cases, especially for read only data these objects perform the best. This assumes that your custom objects do not use DataTables and DataAdapters. If you do, they will perform no better than a DataTable and probably somewhere between a DataTable and a DataSet due to the overhead. With that said, the choice is yours. Here are some real numbers that I recorded from my tests. Results are taken from a SQL Server 2005 Developer Edition, and the database table had six rows to virtually guarantee each query comes back in the same amount of time regardless of SQL Server caching, etc. I ran two tests. In the first test, I loaded a record by primary key. In the second test, I loaded multiple records (select all rows). Here is what I found. Units for the numbers are ticks. Single Record Load Test Custom Objects (which use DataReader) is the fastests. They are approximately 30% faster than DataAdapter / DataTable / DataSet. Doodads does not perform as well in comparison because over its additional overhead.
Custom ObjectsDoodadsDataTableDataSet
Average 1403207517711825
Minimum1083156314471471
Maximum2717714633252909
Standard Deviation 298721227233
Below you can see that Custom Objects do much better overall that any of the other methods. Doodads is the worst as expected due to basically being DataTables with additional overhead. 6 Record Load Test

As you might expect DataReader is the best performer when loading 6 records into array of entities. The standard deviation is even the least which means less volitility. .Net does not give an API to determine how much memory each class uses, but I used Red Gate Ants profiling tool to see that the number of objects created by DataTable and DataSet is MUCH greater than a Custom Object. This was not a surprise to me since DataTable and DataSet have much more functionility for tracking changes, etc than my Custom Object has. This is important when caching data.

Custom ObjectsDoodadsDataTableDataSet
Average 1374198818331841
Minimum1179170015941644
Maximum2174498627802937
Standard Deviation 177364209186

Below you can see that Custom Objects out performs the other Methods in nearly all cases. The gap increases between Custom Objects and the other methods now that we are loading multiple records. You can see at the end of the tests the graph becomes more volatile. I believe this is because of contention for the connections. At that point, we are starting to run out of connections and waiting is starting to occur. Some of the spikes are most likely due to garbage collection and memory allocation, but I can't prove that theory. These tests are performed in a loop that loops for 100 times, so garbage collection is sure play a part.

The big picture While Custom Objects are the best performer in general, its limited feature set may require more features be added or a framework be used that supplies the additional functionality. It is important to remember that the biggest bottleneck on most websites is database connectivity. This is fairly obvious in my opinion, but can also be seen if you profile your existing application (no matter how you connect to the database) the slowests methods will be the ones that connect to the database. It makes sense to spend your time optimizing here. It is important to remember that the best way to optimize database connectivity is to limit the times it is executed. One way to do this is to cache results, perhaps use ViewState and only load data on page load the first time, cache pages, etc. This is the largest reward. However, this is not always an option. In those cases, it is good to know which method here is going to give you the best performance. References These references more or less mirror what I am saying, and also provide some other solutions that may be helpful in some cases. If you have long queries and lots of users, you may want to keep those worker threads free by using Asynchronoous queries against the database. http://msdn.microsoft.com/msdnmag/issues/07/03/WickedCode/ Are you scared of using ViewState, this article may change your mind (if you're using ASP.Net 2.0 or greater). http://msdn.microsoft.com/msdnmag/issues/04/10/ViewState/ This article does a similar comparison to what I did, but focuses on Requests per second and Response time when the application is put under different loads. This comes to very similar conclusions, but from a different angle. http://msdn2.microsoft.com/en-us/library/ms978388.aspx More of the same, but also takes OLEDB into account. Think twice before using OLEDB. http://www.devx.com/vb2themax/Article/19887/1954?pf=true

Best Practices from TechEd http://www.theserverside.net/news/thread.tss?thread_id=26210

No comments: