SQL Smack down is back!
It's been a while since the last issue, and I don't have any major topics to cover at the moment, so I thought of a few tips that may be helpful to you. All examples use the Pubs sample database.
I know I have covered joins a couple times in the past, but there are a few things you may find useful here.
We all know how to use the ON clause of the JOIN statement right? Well if you have looked at many queries with joins you may have noticed something like what is below:
USE Pubs
SELECT t.title_id, t.title, s.qty
FROM titles t
LEFT OUTER JOIN sales s
ON (t.title_id = s.title_id
AND s.stor_id = '7131')
After looking at this you may have wondered why the stor_id field was evaluated in the ON clause instead of the WHERE clause. The answer is because the query needs to return all rows from the titles table, but only the matching rows from the sales table where the stor_id is 7131. If the stor_id field was validated in the WHERE clause, the query would only return the rows where the stor_id from sales equaled 7131, thereby not return all titles in the titles table, which would basically be the same as using an INNER JOIN instead of our OUTER JOIN.
I know many of you have come across this problem before. You have a query where you want to dynamically return a set number of rows, however T-SQL won't let you do the following:
SELECT TOP @rows title
FROM titles
Well there are 2 ways I know of to get around this. The first, and most elegant method is to use the SET ROWCOUNT statement. What this does is to tell the query engine to stop processing after a set number of rows. The basic usage is as follows:
USE Pubs
DECLARE @rows INT
SET @rows = 10
SET ROWCOUNT @rows
SELECT *
FROM titles
SET ROWCOUNT 0
This statement will return the top 10 rows from the Products table. Notice the last line in the above statements. The most important thing to remember when using SET ROWCOUNT is that you must set it back to 0 when you are done. Setting the rowcount to 0 tells the query engine to process all rows again. If you do not reset the rowcount, all queries on your database with this connection will be limited to the rowcount you set until it is reset.
You can use SET ROWCOUNT with SELECT, UPDATE, and INSERT statements, so it is very versatile.
The other method of selecting the top # rows is to use dynamic SQL. This is really ugly and very bad for performance. Here is a quick example that should be fairly easy to understand.
USE Pubs DECLARE @sql VARCHAR(1000), @rows INT SET @rows = 10 SET @sql = 'SELECT TOP ' + CONVERT(VARCHAR, @rows) + ' * FROM titles' EXEC(@sql)
Ugly, slow, not very exciting. I would recommend using the SET ROWCOUNT if possible.
Few of you have probably heard of the COALESCE function, and fewer yet have most likely used it. This function is one of the most useful built-in function in SQL Server in my opinion. In this section I will go over what this function does, and some of the common uses for it.
All the COALESCE function does is to return the first non-NULL value passed to it. If all the values passed to the function are NULL, COALESCE will return NULL. Here is the basic syntax:
COALESCE(expression1 [, ...n])
The expressions passed to the function can be variables, field names, or function results. Here is a quick example.
DECLARE @var1 VARCHAR(4), @var2 VARCHAR(4) SET @var1 = NULL SET @var2 = 'test' SELECT COALESCE(@var1, @var2)
---- test
So you can see that @var1 was skipped since it's value was NULL, and @var2 was returned since it was the first non-NULL value given to the COALESCE function. Now let's get to some more practical uses.
The most basic (and most likely common) use for COALESCE is for reports, computing aggregate values, and for displaying result sets. For this example we will use the query from the first example:
USE Pubs
SELECT t.title_id, t.title, s.qty
FROM titles t
LEFT OUTER JOIN sales s
ON (t.title_id = s.title_id
AND s.stor_id = '7131')
Notice when we execute this statement that the qty field has a lot of NULL values in it. These are rows where the sales table did not have any sales by the store with stor_id of '7131'. If we wanted to make a report for Rudy, we wouldn't want to have all those NULL values in it, so we want to replace them with zeros. We could do this in our code, but it is actually very easy to do it in T-SQL using the COALESCE function.
USE Pubs
SELECT t.title_id, t.title, COALESCE(s.qty, 0) AS qty
FROM titles t
LEFT OUTER JOIN sales s
ON (t.title_id = s.title_id
AND s.stor_id = '7131')
Now if we execute this statement you will see that the qty field has 0 instead of NULL where there was not a match with the sales table. The COALESCE function actually executed on every row of the result set, and if the result for the s.qty field was going to be NULL, went to the next expression passed to it, which happens to be a zero, which is not NULL, and therefore returned it instead.
To implement a dynamic WHERE clause without using dynamic SQL, we will employ the COALESCE function. Here is an example:
USE Pubs
CREATE PROCEDURE dbo.usp_PubsTest
(
@stor_id CHAR(4) = NULL,
@title_ID VARCHAR(6) = NULL,
@ord_num VARCHAR(20) = NULL
)
AS
SELECT t.title_id, t.title, s.qty, s.ord_num, s.stor_id
FROM titles t
INNER JOIN sales s
ON (t.title_id = s.title_id)
WHERE s.stor_id = COALESCE(@stor_id, s.stor_id)
AND s.title_id = COALESCE(@title_id, s.title_id)
AND s.ord_num = COALESCE(@ord_num, s.ord_num)
GO
So as we can see from the query above, we have 3 variables that our stored procedure will accept. These are the possible choices for our WHERE clause. Notice that after the data type declaration for each variable I have included " = NULL". This sets the default value for the variable, and makes it optional. In the WHERE clause you will see that we make use of the COALESCE statement. Lets break down what is happening there.
WHERE s.stor_id = COALESCE(@stor_id, s.stor_id)
What we are saying here is we only want rows where the stor_id is equal to the result of the right side of the statement. The COALESCE function will return the first non-NULL expression passed to it, so if @stor_id's value is not NULL, we will check stor_id against that variable. However, if @stor_id is NULL, then the field s.stor_id is returned, which happens to be the same as the field we are evaluating on the left side of our expression. This means that if @stor_id is NULL, we are basically saying this:
WHERE s.stor_id = s.stor_id
From this we can see that these two value obviously must be equal to one another, so all rows will be returned for this part of the WHERE clause. If we had a value of '7131' in the WHERE clause, this would be equivalent to what the query engine would evaluate:
WHERE s.stor_id = '7131'
I hope this gives you an idea of how dynamic WHERE clauses with COALESCE work. I know it can be a little confusing, but once you get a grasp on it you will find this very useful. Here is an example of how you would call the stored procedure if you wanted to pass just 1 value into it.
EXEC usp_PubsTest @stor_id = '7131'
This will return only the rows where the stor_id field is equal to the value we passed it, which is '7131'. This can be a bit confusing, but play with it, try some different things, and it will all start falling into place.
If you have ever had to concatenate some fields together in a query, you have probably noticed that if one of the fields is NULL, the entire result value will return as NULL. If this is not the intended result, simply do the following before executing your query:
SET CONCAT_YIELDS_NULL OFFThis just tells the query engine to treat NULL values as empty strings when they are concatenated to something else, rather than returning NULL for the entire value.
INNER LOOP JOINBy now we should all know what the different types of JOINs are, but how many of you know about INNER LOOP JOINs? Basically this tells the query engine to perform a nested loop comparison when doing the join rather than doing a hash join. Hash joins are great for when you have 2 data sets that are fairly close in size, or there is a large amount of data to be returned. However if one data set is small and the other is very large, using a loop join to join the larger table to the small one may be faster than using a hash join. The reason is hash and merge joins require a lot of processing and temporary storage (you got it, in our old friend tempdb), where a loop join simply iterates through one table pulling matching rows from the second.
There are some problems with this technique however. You really only want to use this hint when you are going to be returning smaller amounts of rows. Large data sets used to populate long reports or views are probably not good candidates for a loop join. You also must ensure that your table order in the query is optimal to see any improvement. If you are selecting from a very large table and joining a very small table to the large one, this hint will probably hurt performance more than help.
The most important thing to remember when deciding which way to implement a join, as with most things in SQL Server, is to try it out and see what happens. Test the results, modify the query, and test some more.
SET NOCOUNT ONI can't stress how many times this one option can has caused people problems by not using it. If you have multiple things going on in your stored procedure before returning a result set, be sure to set this option! It tells SQL Server not to return the row counts that you see in the messages in the query analyzer. Sometimes these row counts can be mistaken as a result set by ADO, which causes errors in your programs. By setting this option on at the beginning of your stored procedures you fix these problems before they happen.
An added bonus to using this command is that it cuts down on network traffic by not sending all the extra row count data over the network, when most likely you will ignore it anyway.
sp_executesqlThis stored procedure acts very much like EXEC or EXECUTE which many of us use on a daily basis. It accepts a parameter of a SQL statement, and executes it. The main difference between this and EXEC is that this stored procedure can cache execution plans if the same statement is executed, and only the parameters of the SQL statement passed in change. All object names in the SQL Statement must be fully qualified in order for the execution plan to be cached (i.e. pubs.dbo.titles). Try this out on anything that you run often and see if you get any performance gain.
Well that's it for this edition of SQL Smack down. As always, if you have any questions about these or any other topics, please talk to Barry.