This is just a short article (more of a blog-ette than a full blog) about some things we as developers need to consider when sending and parameterizing queries in Microsoft’s SQL Server. While some of this information may also be true for other flavors of database servers, these things are known to be true for SQL Server.
By now, we should all be familiar with the dangers of inserting user-entered data directly into a string that will become an SQL query. Your code should never have something like this in it:
String sql = “select * from mySchema.myTable where keyField = ‘“ + userEnteredValue + “‘“;
You always want to parameterize such things, like:
String sql = “select * from mySchema.myTable where keyField = :userData“; sqlParms.put(“userData”, userEnteredValue);
This allows the query to be checked at multiple levels to ensure it is safe and usable.
But let’s say your validation routines have made sure that the user-entered data is safe from SQL injection. Where’s the harm in that? Is there any? Let’s discuss …
Execution Plan Optimization
Whenever you send a query to SQL Server, it wants to run that query in the most efficient way possible.
If it has never seen that particular query before, then it will compile a new execution plan for it and put that in its internal execution plan cache. If your query has inlined the data, then every time your query is called with different data, it will appear as a new query.
So, if your query looks something like this:
String sql = “select * from Permissions where userId=‘“ + userId + “‘“;
Then each of your application’s users will create a new execution plan in the cache. This takes time – not much perhaps, but time nonetheless, and time adds up quickly.
This is where parameterizing queries can come in handy for SQL servers. If you change your call to use a parameter instead, the query will always look the same to the server, and it will find an execution plan in the cache to use. This will save the execution plan compilation time every time after the first.
I want to make sure to point out that if the query has a value in it that never changes (for example, a particular query always looking for mobile phone numbers), then you wouldn’t have to parameterize the number type. Because it never changes, and it wouldn’t cause a new execution plan to be created, that could be hard coded in the query.
So, we should always parameterize everything and our problems are over, right? Or are they …
Data Sensitive Plans
Once SQL Server has an execution plan in the cache, it tries to use it. But sometimes it doesn’t fit the data very well. If you have a query that has the same structure every time but gets wildly different results based on the value of the parameters, this affects how well the cached execution plan will work.
For example, if you are returning a row for every address within a zip code, the number of records will vary widely between rural and urban codes. So, if the cached plan was created for rural western Kansas, it probably wouldn’t be very efficient for somewhere in the Chicago loop and vice versa.
In cases like this, you may want to spend the extra time to create a new execution plan every time. There are several ways you can make that happen. One of the simplest is to give SQL Server a hint about what you want it to do.
String sql = “select * from Addresses where zip=:zipcode option(recompile)“;
Passing the above line of code tells SQL Server that you always want to recompile a new execution plan.
Another option is to avoid that type of situation in the first place — if you can. For example, if you were looking for a breakdown of all the phone numbers for an area code by type of number, you would expect to get a lot more data looking for mobile numbers than faxes. If you were to create two queries in your code with the phone type hard coded rather than passed in as a variable, this would cause SQL Server to create two different execution plans. And each plan could be optimized for the expected difference in the size of the dataset.
Getting SQL Server to perform at its best means paying attention to what you are sending it and how it has to interpret that. Having a cached execution plan can help, especially if your query is relatively complex. (We have had some plans take up to 20 seconds to compile and cache.) But having the wrong plan cached can cause frustratingly intermittent performance problems.
Parameterizing queries correctly in SQL takes time and effort! My best advice to you is more a word of encouragement. Don’t give up, just keep digging. Eventually, you will prevail!
About the Author
Clayton Neff has been programming computers since you had to throw switches to get the darn things to boot up. He was introduced to object-oriented programming in the mid-1980s, and that has been so much fun for him that he still enjoys what he does.
If you have any questions or comments, please feel free to contact him at [email protected].[Editor’s Note: Never miss a post by subscribing to receive new blogs by email. No spam, opt-out anytime.]