Blog Archives
A viable alternative to dynamic SQL in administration scripts
As a DBA, you probably have your toolbox full of scripts, procedures and functions that you use for the day-to-day administration of your instances.
I’m no exception and my hard drive is full of scripts that I tend to accumulate and never throw away, even if I know I will never need (or find?) them again.
However, my preferred way to organize and maintain my administration scripts is a database called “TOOLS”, which contains all the scripts I regularly use.
One of the challenges involved in keeping the scripts in a database rather than in a script file is the inability to choose a database context for the execution. When a statement is encapsulated in a view, function or stored procedure in a database, every reference to a database-specific object is limited to the database that contains the programmable object itself. The only way to overcome this limitation is the use of dynamic sql.
For instance, if I want to query the name of the tables in a database, I can use the following statement:
SELECT name FROM sys.tables
The statement references a catalog view specific to a single database, so if I enclose it in a stored procedure, the table names returned by this query are those found in the database that contains the stored procedure itself:
USE TOOLS; GO CREATE PROCEDURE getTableNames AS SELECT name FROM sys.tables; GO EXEC getTableNames;
This is usually not an issue, since most stored procedures will not cross the boundaries of the database they are created in. Administration scripts are different, because they are meant as a single entry point to maintain the whole SQL server instance.
In order to let the statement work against a different database, you need to choose one of the following solutions:
- dynamic SQL
- sp_executesql
- marking as a system object
- … an alternative way
Each of these techniques has its PROs and its CONs and I will try to describe them in this post.
1. Dynamic SQL
It’s probably the easiest way to solve the issue: you just have to concatenate the database name to the objects names.
USE TOOLS;
GO
ALTER PROCEDURE getTableNames @db_name sysname
AS
BEGIN
DECLARE @sql nvarchar(max)
SET @sql = 'SELECT name FROM '+ QUOTENAME(@db_name) +'.sys.tables';
EXEC(@sql)
END
GO
EXEC getTableNames 'msdb';
PROS:
- very easy to implement for simple statements
CONS:
- can rapidly turn to a nightmare with big, complicated statements, as each object must be concatenated with the database name. Different objects have different ways to be related to the database: tables and views can be concatenated directly, while functions such as OBJECT_NAME accept an additional parameter to specify the database name.
- the statement has to be treated as a string and enclosed in quotes, which means that:
- quotes must be escaped, and escaped quotes must be escaped again and escaped and re-escaped quotes… ok, you know what I mean
- no development aids such as intellisense, just-in-time syntax checks and syntax coloring
2. sp_executesql
It’s a neater way to avoid concatenating the database name to each object referenced in the statement.
USE TOOLS;
GO
ALTER PROCEDURE getTableNames @db_name sysname
AS
BEGIN
-- use a @sql variable to store the whole query
-- without concatenating the database name
DECLARE @sql nvarchar(max);
SET @sql = 'SELECT name FROM sys.tables';
-- concatenate the database name to the
-- sp_executesql call, just once
DECLARE @cmd nvarchar(max);
SET @cmd = 'EXEC '+ QUOTENAME(@db_name) +'.sys.sp_executesql @sql';
EXEC sp_executesql @cmd, N'@sql nvarchar(max)', @sql
END
GO
EXEC getTableNames 'msdb';
PROS:
- the dynamic sql is taken as a whole and does not need to be cluttered with multiple concatenations
CONS:
- needs some more work than a straight concatenation and can be seen as “obscure”
- suffers from the same issues found with plain dynamic sql, because the statement is, again, treated as a string
3. System object
Nice and easy: every stored procedure you create in the master database with the “sp_” prefix can be executed from any database context.
Using the undocumented stored procedure sp_MS_marksystemobject you can also mark the stored procedure as a “system object” and let it reference the tables in the database from which it is invoked.
USE master;
GO
ALTER PROCEDURE sp_getTableNames
AS
BEGIN
SELECT name FROM sys.tables
END
GO
EXEC sys.sp_MS_marksystemobject 'sp_getTableNames'
GO
USE msdb;
GO
EXEC sp_getTableNames;
PROS:
- no need to use dynamic sql
CONS:
- requires creating objects in the “master” database, which is something I tend to avoid
- works with stored procedures only (actually, it works with other objects, such as tables and views, but you have to use the “sp_” prefix. The day I will find a view named “sp_getTableNames” in the master database it won’t be safe to stay near me)
An alternative method:
It would be really helpful if we could store the statement we want to execute inside an object that doesn’t involve dynamic sql and doesn’t need to be stored in the master database. In other words, we need a way to get the best of both worlds.
Is there such a solution? Apparently, there isn’t.
The ideal object to store a statement and reuse it later is a view, but there is no way to “execute” a view against a different database. In fact you don’t execute a view, you just select from it, which is quite a different thing.
What you “execute” when you select from a view is the statement in its definition (not really, but let me simplify).
So, what we would need to do is just read the definition from a view and use the statement against the target database. Sounds straightforward, but it’s not.
The definition of a view also contains the “CREATE VIEW” statement and stripping it off is not just as easy as it seems.
Let’s see the issue with an example: I will create a view to query the last update date of the index statistics in a database, using the query from Glenn Berry’s Diagnostic Queries.
USE TOOLS;
GO
-- When were Statistics last updated on all indexes? (Query 48)
CREATE VIEW statisticsLastUpdate
AS
SELECT
DB_NAME() AS database_name
,o.NAME AS stat_name
,i.NAME AS [Index Name]
,STATS_DATE(i.[object_id], i.index_id) AS [Statistics Date]
,s.auto_created
,s.no_recompute
,s.user_created
,st.row_count
FROM sys.objects AS o WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
ON o.[object_id] = i.[object_id]
INNER JOIN sys.stats AS s WITH (NOLOCK)
ON i.[object_id] = s.[object_id]
AND i.index_id = s.stats_id
INNER JOIN sys.dm_db_partition_stats AS st WITH (NOLOCK)
ON o.[object_id] = st.[object_id]
AND i.[index_id] = st.[index_id]
WHERE o.[type] = 'U';
I just had to remove ORDER BY and OPTION(RECOMPILE) because query hints cannot be used in views.
Querying the object definition returns the whole definition of the view, not only the SELECT statement:
SELECT OBJECT_DEFINITION(OBJECT_ID('statisticsLastUpdate')) AS definition
definition
-------------------------------------------------------------------
-- When were Statistics last updated on all indexes? (Query 48)
CREATE VIEW statisticsLastUpdate
AS
SELECT
DB_NAME() AS database_name
,o.NAME AS stat_name
,i.NAME AS [Index Name]
,STATS_DATE(i.[object_id], i.index_id) AS [Statistics Date]
,s.auto_created
,s.no_recompute
(1 row(s) affected)
In order to extract the SELECT statement, we would need something able to parse (properly!) the view definition and we all know how complex it can be.
Fortunately, SQL Server ships with an undocumented function used in replication that can help solving the problem: its name is fn_replgetparsedddlcmd.
This function accepts some parameters, lightly documented in the code: fn_replgetparsedddlcmd (@ddlcmd, @FirstToken, @objectType, @dbname, @owner, @objname, @targetobject)
Going back to the example, we can use this function to extract the SELECT statement from the view definition:
SELECT master.sys.fn_replgetparsedddlcmd(
OBJECT_DEFINITION(OBJECT_ID('statisticsLastUpdate'))
,'CREATE'
,'VIEW'
,DB_NAME()
,'dbo'
,'statisticsLastUpdate'
,NULL
) AS statement
statement
---------------------------------------------------------------------
AS
SELECT
DB_NAME() AS database_name
,o.NAME AS stat_name
,i.NAME AS [Index Name]
,STATS_DATE(i.[object_id], i.index_id) AS [Statistics Date]
,s.auto_created
,s.no_recompute
,s.user_created
,st.row_count
(1 row(s) affected)
The text returned by the function still contains the “AS” keyword, but removing it is a no-brainer:
DECLARE @cmd nvarchar(max)
SELECT @cmd = master.sys.fn_replgetparsedddlcmd(
OBJECT_DEFINITION(OBJECT_ID('statisticsLastUpdate'))
,'CREATE'
,'VIEW'
,DB_NAME()
,'dbo'
,'statisticsLastUpdate'
,NULL
)
SELECT @cmd = RIGHT(@cmd, LEN(@cmd) - 2) -- Removes "AS"
SELECT @cmd AS statement
statement
-------------------------------------------------------------------
SELECT
DB_NAME() AS database_name
,o.NAME AS stat_name
,i.NAME AS [Index Name]
,STATS_DATE(i.[object_id], i.index_id) AS [Statistics Date]
,s.auto_created
,s.no_recompute
,s.user_created
,st.row_count
(1 row(s) affected)
Now that we are able to read the SELECT statement from a view’s definition, we can execute that statement against any database we like, or even against all the databases in the instance.
-- =============================================
-- Author: Gianluca Sartori - spaghettidba
-- Create date: 2013-04-16
-- Description: Extracts the view definition
-- and runs the statement in the
-- database specified by @db_name
-- If the target database is a pattern,
-- the statement gets executed against
-- all databases matching the pattern.
-- =============================================
CREATE PROCEDURE [dba_execute_view]
@view_name sysname
,@db_name sysname
AS
BEGIN
SET NOCOUNT,
XACT_ABORT,
QUOTED_IDENTIFIER,
ANSI_NULLS,
ANSI_PADDING,
ANSI_WARNINGS,
ARITHABORT,
CONCAT_NULL_YIELDS_NULL ON;
SET NUMERIC_ROUNDABORT OFF;
DECLARE @cmd nvarchar(max)
DECLARE @sql nvarchar(max)
DECLARE @vw_schema sysname
DECLARE @vw_name sysname
IF OBJECT_ID(@view_name) IS NULL
BEGIN
RAISERROR('No suitable object found for name %s',16,1,@view_name)
RETURN
END
IF DB_ID(@db_name) IS NULL
AND @db_name NOT IN ('[USER]','[SYSTEM]')
AND @db_name IS NOT NULL
BEGIN
RAISERROR('No suitable database found for name %s',16,1,@view_name)
RETURN
END
SELECT @vw_schema = OBJECT_SCHEMA_NAME(OBJECT_ID(@view_name)),
@vw_name = OBJECT_NAME(OBJECT_ID(@view_name))
SELECT @cmd = master.sys.fn_replgetparsedddlcmd(
OBJECT_DEFINITION(OBJECT_ID(@view_name))
,'CREATE'
,'VIEW'
,DB_NAME()
,@vw_schema
,@vw_name
,NULL
)
SELECT @cmd = RIGHT(@cmd, LEN(@cmd) - 2) -- Removes "AS"
-- CREATE A TARGET TEMP TABLE
SET @sql = N'
SELECT TOP(0) * INTO #results FROM ' + @view_name + ';
INSERT #results
EXEC [dba_ForEachDB]
@statement = @cmd,
@name_pattern = @db_name;
SELECT * FROM #results;'
EXEC sp_executesql
@sql
,N'@cmd nvarchar(max), @db_name sysname'
,@cmd
,@db_name
END
The procedure depends on dba_ForEachDB, the stored procedure I posted a couple of years ago that replaces the one shipped by Microsoft. If you still prefer their version, you’re free to modify the code as you wish.
Now that we have a stored procedure that “executes” a view, we can use it to query statistics update information from a different database:
EXEC [dba_execute_view] 'statisticsLastUpdate', 'msdb'
We could also query the same information from all user databases:
EXEC [dba_execute_view] 'statisticsLastUpdate', '[USER]'
That’s it, very easy and straightforward.
Just one suggestion for the SELECT statements in the views: add a DB_NAME() column, in order to understand where the data comes from, or it’s going to be a total mess.
Next steps:
This is just the basic idea, the code can be improved in many ways.
For instance, we could add a parameter to decide whether the results must be piped to a temporary table or not. As you probably know, INSERT…EXEC cannot be nested, so you might want to pipe the results to a table in a different way.
Another thing you might want to add is the ability to order the results according to an additional parameter.
To sum it up, with a little help from Microsoft, we can now safely create a database packed with all our administration stuff and execute the queries against any database in our instance.
Moving system databases to the default data and log paths
Recently I had to assess and tune quite a lot of SQL Server instances and one the things that are often overlooked is the location of the system databases.
I often see instance where the system databases are located in the system drives under the SQL Server default installation path, which is bad for many reasons, especially for tempdb.
I had to move the system databases so many times that I ended up coding a script to automate the process.
The script finds all system databases that are not sitting in the default data and log paths and issues the ALTER DATABASE statements needed to move the files to the default paths.
Obviously, to let the script work, the default data and log paths must have been set in the instance properties:
You may also point out that moving all system databases to the default data and log paths is not always a good idea. And you would be right: for instance, if possible, the tempdb database should be working on a fast dedicated disk. However, very often I find myself dealing with low-end servers where separate data and log disks are a luxury, not to mention a dedicated tempdb disk. If you are concerned about moving tempd to the default data and log paths, you can modify the script accordingly.
-- =============================================
-- Author: Gianluca Sartori - spaghettidba
-- Create date: 2013-03-22
-- Description: Moves the system databases to the
-- default data and log paths and
-- updates SQL Server startup params
-- accordingly.
-- =============================================
SET NOCOUNT ON;
USE master;
-- Find default data and log paths
-- reading from the registry
DECLARE @defaultDataLocation nvarchar(4000)
DECLARE @defaultLogLocation nvarchar(4000)
EXEC master.dbo.xp_instance_regread
N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'DefaultData',
@defaultDataLocation OUTPUT
EXEC master.dbo.xp_instance_regread
N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'DefaultLog',
@defaultLogLocation OUTPUT
-- Loop through all system databases
-- and move to the default data and log paths
DECLARE @sql nvarchar(max)
DECLARE stmts CURSOR STATIC LOCAL FORWARD_ONLY
FOR
SELECT
' ALTER DATABASE '+ DB_NAME(database_id) +
' MODIFY FILE ( ' +
' NAME = '''+ name +''', ' +
' FILENAME = '''+
CASE type_desc
WHEN 'ROWS' THEN @defaultDataLocation
ELSE @defaultLogLocation
END +
'\'+ RIGHT(physical_name,CHARINDEX('\',REVERSE(physical_name),1)-1) +'''' +
' )'
FROM sys.master_files
WHERE DB_NAME(database_id) IN ('master','model','msdb','tempdb')
AND (
physical_name NOT LIKE @defaultDataLocation + '%'
OR physical_name NOT LIKE @defaultLogLocation + '%'
)
OPEN stmts
FETCH NEXT FROM stmts INTO @sql
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @sql
EXEC(@sql)
FETCH NEXT FROM stmts INTO @sql
END
CLOSE stmts
DEALLOCATE stmts
-- Update SQL Server startup parameters
-- to reflect the new master data and log
-- files locations
DECLARE @val nvarchar(500)
DECLARE @key nvarchar(100)
DECLARE @regvalues TABLE (
parameter nvarchar(100),
value nvarchar(500)
)
INSERT @regvalues
EXEC master.dbo.xp_instance_regenumvalues
N'HKEY_LOCAL_MACHINE',
N'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\Parameters'
DECLARE reg CURSOR STATIC LOCAL FORWARD_ONLY
FOR
SELECT *
FROM @regvalues
WHERE value LIKE '-d%'
OR value LIKE '-l%'
OPEN reg
FETCH NEXT FROM reg INTO @key, @val
WHILE @@FETCH_STATUS = 0
BEGIN
IF @val LIKE '-d%'
SET @val = '-d' + (
SELECT physical_name
FROM sys.master_files
WHERE DB_NAME(database_id) = 'master'
AND type_desc = 'ROWS'
)
IF @val LIKE '-l%'
SET @val = '-l' + (
SELECT physical_name
FROM sys.master_files
WHERE DB_NAME(database_id) = 'master'
AND type_desc = 'LOG'
)
EXEC master.dbo.xp_instance_regwrite
N'HKEY_LOCAL_MACHINE',
N'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\Parameters',
@key,
N'REG_SZ',
@val
FETCH NEXT FROM reg INTO @key, @val
END
CLOSE reg
DEALLOCATE reg
After running this script, you can shut down the SQL Server service and move the data and log files to the appropriate locations.
When the files are ready, you can bring SQL Server back online.
BE CAREFUL! Before running this script against a clustered instance, check what the xp_instance_regread commands return: I have seen cases with SQL Server not reading from the appropriate keys.
Data Collector Clustering Woes
During the last few days I’ve been struggling to work around something that seems to be a bug in SQL Server 2008 R2 Data Collector in a clustered environment. It’s been quite a struggle, so I decided to post my findings and my resolution, hoping I didn’t contend in vain.
SYMPTOMS:
After setting up a Utility Control Point, I started to enroll my instances to the UCP and everything was looking fine.
When an instance is enrolled to a UCP, the sysutility_mdw database can be used as a target Management Datawarehouse for non-utility collection sets, such as the built-in system collection sets. Actually, the utility database is the only possible target for any collection set, since the data collection configuration is shared between utility and non-utility collection sets.
That said, I enabled and started the system collection sets and eagerly waited for some data to show up on the built-in reports. As nothing turned up, I checked the data collector jobs and they were executing successfully, so the data had to be there, hidden somewhere.
In fact, the data had been collected and uploaded successfully, but it didn’t show up in the reports because of the way the data source had been registered in the Management Datawarehouse internal tables.
A quick look at [core].[source_info_internal] unveiled the root cause of the issue: the clustered instances had been registered with the physical name of the cluster node and not with the virtual server name of the instance.
The built-in Data Collector reports filter data in this table using the server name connected in SSMS, which is obviously very different from what found in the data sources table. For instance, when connected to the clustered instance VIRTUAL1\INST01, I didn’t see any performance data showing in the report because the data had been collected using the physical network name of the owner node (eg. PHYSICAL1\INST01).
I know it may sound confusing at this point, so keep the following picture in mind while looking at servers and instances in the example.
So, what was wrong with my setup? How could it be fixed?
I tried everything that came to my mind to no avail. In order, I tried:
- Wiping out the datawarehouse database (sysutility_mdw)
- Reinstalling the instance hosting the datawarehouse
- Changing the upload jobs (this one looked promising, because some job steps contained the $(MACH)\$(INST) tokens instead of the $(SRVR) token I would have expected, but it didn’t work either)
- Updating data in the tables directly
Nothing I tried solved the issue: every time the upload jobs ran at the enrolled instances, the wrong instance names turned up in the data sources table.
I suspect something was wrong in the Management Datawarehouse instance, since the same issue affected all the enrolled instances, no matter where they were installed. Question is I was unable to find a way to make it work.
The only thing that worked for me was forcing SQL Server to understand what my cluster setup looks like and preventing it from using cluster node names instead of virtual instance names.
As ugly as it can be, the only fix that worked was a view + INSTEAD OF TRIGGER combination.
First of all we need some tables to store the cluster layout, with nodes and instances.
-- Create a couple of tables in msdb to
-- describe the cluster topology
USE msdb;
GO
CREATE TABLE sysutility_ucp_managed_cluster_nodes (
cluster_id int NOT NULL,
node_name sysname NOT NULL PRIMARY KEY CLUSTERED
)
GO
CREATE TABLE sysutility_ucp_managed_cluster_servers (
cluster_id int NOT NULL,
virtual_server_name sysname NOT NULL PRIMARY KEY CLUSTERED,
instance_name sysname NULL,
server_name AS virtual_server_name + ISNULL('\' + NULLIF(instance_name,'MSSQLSERVER'),'')
)
GO
INSERT INTO sysutility_ucp_managed_cluster_nodes (cluster_id, node_name)
VALUES (1,'PHYSICAL1'),
(1,'PHYSICAL2'),
(2,'PHYSICAL3'),
(2,'PHYSICAL4')
GO
INSERT INTO sysutility_ucp_managed_cluster_servers (cluster_id, virtual_server_name, instance_name)
VALUES (1,'VIRTUAL1','INST01'),
(1,'VIRTUAL2','INST02'),
(2,'VIRTUAL3','INST03')
GO
GRANT SELECT ON object::sysutility_ucp_managed_cluster_nodes TO [dc_proxy];
GRANT SELECT ON object::sysutility_ucp_managed_cluster_servers TO [dc_proxy];
Then we need to update the data sources collected using the cluster node name instead of the virtual name:
USE [sysutility_mdw]
GO
UPDATE trg
SET trg.instance_name = REPLACE(trg.instance_name COLLATE database_default, nodes.node_name + '\', srv.virtual_server_name + '\')
FROM [core].[source_info_internal] AS trg
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_nodes AS nodes
ON nodes.node_name = SUBSTRING(trg.instance_name, 1, LEN(nodes.node_name)) COLLATE database_default
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_servers AS srv
ON nodes.cluster_id = srv.cluster_id
-- Now server names should be ok
SELECT *
FROM sysutility_mdw.[core].[source_info_internal]
GO
Now we will replace the data sources table with a view that multiplies virtual server names for each possible owner node. This is required because the collection sets keep trying to upload data using the cluster node name and they fail miserably when the data source is not found in the table (“the specified collection set is not valid in this data warehouse”)
USE [sysutility_mdw]
GO
-- Rename the data sources table
EXEC sp_rename 'core.source_info_internal', 'source_info_internal_ms'
USE [sysutility_mdw]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [core].[source_info_internal]
AS
SELECT sii.source_id
,sii.collection_set_uid
,instance_name = (nds.node_name + ISNULL('\' + NULLIF(srv.instance_name,'MSSQLSERVER'),'')) COLLATE Latin1_General_CI_AI
,sii.days_until_expiration
,sii.operator
FROM core.source_info_internal_ms AS sii
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_servers AS srv
ON sii.instance_name COLLATE database_default = srv.server_name
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_nodes AS nds
ON nds.cluster_id = srv.cluster_id
UNION ALL
SELECT *
FROM core.source_info_internal_ms
GO
And now the last thing we need to create is a trigger on the view, in order to control what gets written to the original table.
With this in place, we should have only “good” server names showing up in the instance_name column.
CREATE TRIGGER [core].[TR_source_info_internal_IU]
ON [core].[source_info_internal]
INSTEAD OF INSERT, UPDATE, DELETE
AS
BEGIN
SET NOCOUNT ON;
-- Update rows using the source_id
UPDATE trg
SET days_until_expiration = i.days_until_expiration
FROM core.source_info_internal_ms AS trg
INNER JOIN inserted AS i
ON trg.source_id = i.source_id
WHERE EXISTS (
SELECT 1
FROM deleted
WHERE source_id = i.source_id
)
-- Turn INSERTs into UPDATEs using the
-- cluster physical / virtual conversion
UPDATE trg
SET days_until_expiration = i.days_until_expiration
FROM core.source_info_internal_ms AS trg
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_servers AS srv
ON srv.server_name = trg.instance_name COLLATE database_default
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_nodes AS nodes
ON nodes.cluster_id = srv.cluster_id
INNER JOIN inserted AS i
ON trg.collection_set_uid = i.collection_set_uid
AND trg.operator = i.operator
AND nodes.node_name + ISNULL('\' + NULLIF(srv.instance_name,'MSSQLSERVER'),'') = i.instance_name COLLATE database_default
WHERE NOT EXISTS (
SELECT 1
FROM deleted
WHERE source_id = i.source_id
)
-- Handle proper INSERTs
;WITH newrows AS (
SELECT collection_set_uid, v_server.instance_name, days_until_expiration, operator
FROM inserted AS i
CROSS APPLY (
SELECT instance_name = COALESCE((
SELECT srv.server_name
FROM msdb.dbo.sysutility_ucp_managed_cluster_servers AS srv
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_nodes AS nodes
ON nodes.cluster_id = srv.cluster_id
WHERE srv.server_name = i.instance_name COLLATE database_default
),(
SELECT srv.server_name
FROM msdb.dbo.sysutility_ucp_managed_cluster_servers AS srv
INNER JOIN msdb.dbo.sysutility_ucp_managed_cluster_nodes AS nodes
ON nodes.cluster_id = srv.cluster_id
WHERE nodes.node_name + ISNULL('\' + NULLIF(srv.instance_name,'MSSQLSERVER'),'') = i.instance_name COLLATE database_default
), (
SELECT i.instance_name COLLATE database_default
)
) -- end coalesce
) AS v_server
WHERE NOT EXISTS (
SELECT 1
FROM deleted
WHERE source_id = i.source_id
)
)
INSERT INTO core.source_info_internal_ms (collection_set_uid, instance_name, days_until_expiration, operator)
SELECT collection_set_uid, instance_name, days_until_expiration, operator
FROM newrows
WHERE NOT EXISTS (
SELECT 1
FROM core.source_info_internal_ms
WHERE collection_set_uid = newrows.collection_set_uid
AND instance_name = newrows.instance_name
AND operator = newrows.operator
)
DELETE trg
FROM core.source_info_internal_ms AS trg
WHERE EXISTS (
SELECT 1
FROM deleted
WHERE source_id = trg.source_id
)
AND NOT EXISTS (
SELECT 1
FROM inserted
WHERE source_id = trg.source_id
)
END
Obviously I don’t have access to the source code of the Data Collector, but I suspect it uses a component which is not cluster-aware (dcexec.exe) and for some reason it ends up using the wrong IP address to communicate with the management datawarehouse. I have nothing to support my supposition: it only seems reasonable to me and it resembles something I experienced in the past with non-cluster-aware services, such as the SQL Browser.
Is this a bug? Definitely!
Should I report it on Connect? Maybe: the results with past items are so discouraging that I don’t think I’ll bother taking the time to file it. It must be said that reproducing this error is not easy: any other failover cluster instance I have laying around was not affected by this issue, so I guess it’s an edge case. Nevertheless, worth fixing.
dba_runCHECKDB v2(012)
If you are one among the many that downloaded my consistency check stored procedure called “dba_RunCHECKDB”, you may have noticed a “small” glitch… it doesn’t work on SQL Server 2012!
This is due to the resultset definition of DBCC CHECKDB, which has changed again in SQL Server 2012. Trying to pipe the results of that command in the table definition for SQL Server 2008 produces a column mismatch and it obviously fails.
Fixing the code is very easy indeed, but I could never find the time to post the corrected version until today.
Also, I had to discover the new table definition for DBCC CHECKDB, and it was not just as easy as it used to be in SQL Server 2008. In fact, a couple of days ago I posted a way to discover the new resultset definition working around the cumbersome metadata discovery feature introduced in SQL Server 2012.
Basically, the new output of DBCC CHECKDB now includes 6 new columns:
CREATE TABLE ##DBCC_OUTPUT(
Error int NULL,
[Level] int NULL,
State int NULL,
MessageText nvarchar(2048) NULL,
RepairLevel nvarchar(22) NULL,
Status int NULL,
DbId int NULL, -- was smallint in SQL2005
DbFragId int NULL, -- new in SQL2012
ObjectId int NULL,
IndexId int NULL,
PartitionId bigint NULL,
AllocUnitId bigint NULL,
RidDbId smallint NULL, -- new in SQL2012
RidPruId smallint NULL, -- new in SQL2012
[File] smallint NULL,
Page int NULL,
Slot int NULL,
RefDbId smallint NULL, -- new in SQL2012
RefPruId smallint NULL, -- new in SQL2012
RefFile smallint NULL, -- new in SQL2012
RefPage int NULL,
RefSlot int NULL,
Allocation smallint NULL
)
If you Google the name of one of these new columns, you will probably find a lot of blog posts (no official documentation, unfortunately) that describes the new output of DBCC CHECKDB, but none of them is strictly correct: all of them indicate the smallint columns as int.
Not a big deal, actually, but still incorrect.
I will refrain from posting the whole procedure here: I updated the code in the original post, that you can find clicking here. You can also download the code from the Code Repository.
As usual, suggestions and comments are welcome.
Discovering resultset definition of DBCC commands in SQL Server 2012
Back in 2011 I showed a method to discover the resultset definition of DBCC undocumented commands.
At the time, SQL Server 2012 had not been released yet and nothing suggested that the linked server trick could stop working on the new major version. Surprisingly enough it did.
If you try to run the same code showed in that old post on a 2012 instance, you will get a quite explicit error message:
DECLARE @srv nvarchar(4000);
SET @srv = @@SERVERNAME; -- gather this server name
-- Create the linked server
EXEC master.dbo.sp_addlinkedserver
@server = N'LOOPBACK',
@srvproduct = N'SQLServ', -- it’s not a typo: it can’t be “SQLServer”
@provider = N'SQLNCLI', -- change to SQLOLEDB for SQLServer 2000
@datasrc = @srv;
-- Set the authentication to "current security context"
EXEC master.dbo.sp_addlinkedsrvlogin
@rmtsrvname = N'LOOPBACK',
@useself = N'True',
@locallogin = NULL,
@rmtuser = NULL,
@rmtpassword = NULL;
USE tempdb;
GO
CREATE PROCEDURE loginfo
AS
BEGIN
SET NOCOUNT ON;
DBCC LOGINFO();
END
GO
SELECT *
INTO tempdb.dbo.loginfo_output
FROM OPENQUERY(LOOPBACK, 'SET FMTONLY OFF; EXEC tempdb.dbo.loginfo');
DROP PROCEDURE loginfo;
GO
Msg 11528, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1 The metadata could not be determined because statement 'DBCC LOGINFO();' in procedure 'loginfo' does not support metadata discovery.
This behaviour has to do with the way SQL Server 2012 tries to discover metadata at parse/bind time, when the resultset is not available yet for DBCC commands.
Fortunately, there is still a way to discover metadata when you have a SQL Server instance of a previous version available.
On my laptop I have a 2008R2 instance I can use to query the 2012 instance with a linked server:
-- run this on the 2008R2 instance
USE [master]
GO
EXEC master.dbo.sp_addlinkedserver
@server = N'LOCALHOST\SQL2012'
,@srvproduct = N'SQL Server'
EXEC master.dbo.sp_addlinkedsrvlogin
@rmtsrvname = N'LOCALHOST\SQL2012'
,@useself = N'True'
,@locallogin = NULL
,@rmtuser = NULL
,@rmtpassword = NULL
GO
SELECT *
INTO tempdb.dbo.loginfo_output
FROM OPENQUERY([LOCALHOST\SQL2012], 'SET FMTONLY OFF; EXEC tempdb.dbo.loginfo');
GO
This code pipes the results of the DBCC command into a table in the tempdb database in my 2008R2 instance. The table can now be scripted using SSMS:

Using the 2008R2 instance as a “Trojan Horse” for the metadata discovery, you can see that the resultset definition of DBCC LOGINFO() has changed again in SQL Server 2012:
CREATE TABLE [dbo].[loginfo_output]( [RecoveryUnitId] [int] NULL, -- new in SQL2012 [FileId] [int] NULL, [FileSize] [bigint] NULL, [StartOffset] [bigint] NULL, [FSeqNo] [int] NULL, [Status] [int] NULL, [Parity] [tinyint] NULL, [CreateLSN] [numeric](25, 0) NULL )
This trick will be particularly useful for an upcoming (and long overdue) post, so… stay tuned!
Manual Log Shipping with PowerShell
Recently I had to implement log shipping as a HA strategy for a set of databases which were originally running under the simple recovery model.
Actually, the databases were subscribers for a merge publication, which leaves database mirroring out of the possible HA options. Clustering was not an option either, due to lack of shared storage at the subscribers.
After turning all databases to full recovery model and setting up log shipping, I started to wonder if there was a better way to implement it. Log shipping provides lots of flexibility, which I didn’t need: I just had to ship the transaction log from the primary to a single secondary and have transaction logs restored immediately. Preserving transaction log backups was not needed, because the secondary database was considered a sufficient backup in this case.
Another thing that I observed was the insane amount of memory consumed by SQLLogShip.exe (over 300 MB), which ended up even failing due to OutOfMemoryException at times.
After reading Edwin Sarmiento‘s fine chapter on SQL Server MVP Deep Dives ”The poor man’s SQL Server log shipping”, some ideas started to flow.
First of all I needed a table to hold the configuration for my manual log shipping:
-- =============================================
-- Author: Gianluca Sartori - @spaghettidba
-- Create date: 2013-02-07
-- Description: Creates a table to hold manual
-- log shipping configuration
-- =============================================
CREATE TABLE msdb.dbo.ManualLogShippingConfig (
secondary sysname PRIMARY KEY CLUSTERED, -- Name of the secondary SQL Server instance
sharedBackupFolder varchar(255), -- UNC path to the backup path on the secondary
remoteBackupFolder varchar(255) -- Path to the backup folder on the secondary
-- It's the same path as sharedBackupFolder,
-- as seen from the secondary server
)
GO
INSERT INTO msdb.dbo.ManualLogShippingConfig (
secondary,
sharedBackupFolder,
remoteBackupFolder
)
VALUES (
'SomeServer',
'\\SomeShare',
'Local path of SomeShare on secondary'
)
GO
And then I just needed a PowerShell script to do the heavy lifting.
I think the code is commented and readable enough to show what happens behind the scenes.
## =============================================
## Author: Gianluca Sartori - @spaghettidba
## Create date: 2013-02-07
## Description: Ships the log to a secondary server
## =============================================
$ErrorActionPreference = "Stop"
$primary = "$(ESCAPE_DQUOTE(SRVR))"
#
# Read Configuration from the table in msdb
#
$SQL_Config = @"
SELECT * FROM msdb.dbo.ManualLogShippingConfig
"@
$info = Invoke-sqlcmd -Query $SQL_Config -ServerInstance $primary
$secondary = $info.secondary
$sharedFolder = $info.sharedBackupFolder
$remoteSharedFolder = $info.remoteBackupFolder
$ts = Get-Date -Format yyyyMMddHHmmss
#
# Read default backup path of the primary from the registry
#
$SQL_BackupDirectory = @"
EXEC master.dbo.xp_instance_regread
N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'BackupDirectory'
"@
$info = Invoke-sqlcmd -Query $SQL_BackupDirectory -ServerInstance $primary
$BackupDirectory = $info.Data
#
# Ship the log of all databases in FULL recovery model
# You can change this to ship a single database's log
#
$SQL_FullRecoveryDatabases = @"
SELECT name
FROM master.sys.databases
WHERE recovery_model_desc = 'FULL'
"@
$info = Invoke-sqlcmd -Query $SQL_FullRecoveryDatabases -ServerInstance $primary
$info | ForEach-Object {
$DatabaseName = $_.Name
Write-Output "Processing database $DatabaseName"
$BackupFile = $DatabaseName + "_" + $ts + ".trn"
$BackupPath = Join-Path $BackupDirectory $BackupFile
$RemoteBackupPath = Join-Path $remoteSharedFolder $BackupFile
$SQL_BackupDatabase = "BACKUP LOG $DatabaseName TO DISK='$BackupPath' WITH INIT;"
$SQL_NonCopiedBackups = "
SELECT physical_device_name
FROM msdb.dbo.backupset AS BS
INNER JOIN msdb.dbo.backupmediaset AS BMS
ON BS.media_set_id = BMS.media_set_id
INNER JOIN msdb.dbo.backupmediafamily AS BMF
ON BMS.media_set_id = BMF.media_set_id
WHERE BS.database_name = '$DatabaseName'
AND BS.type = 'L'
AND expiration_date IS NULL
ORDER BY BS.backup_start_date
"
#
# Backup log to local path
#
Invoke-Sqlcmd -Query $SQL_BackupDatabase -ServerInstance $primary -QueryTimeout 65535
Write-Output "LOG backed up to $BackupPath"
#
# Query noncopied backups...
#
$nonCopiedBackups = Invoke-Sqlcmd -Query $SQL_NonCopiedBackups -ServerInstance $primary
$nonCopiedBackups | ForEach-Object {
$BackupPath = $_.physical_device_name
$BackupFile = Split-Path $BackupPath -Leaf
$RemoteBackupPath = Join-Path $sharedFolder $BackupFile
$SQL_RestoreDatabase = "
RESTORE LOG $DatabaseName
FROM DISK='$RemoteBackupPath'
WITH NORECOVERY;
"
$SQL_ExpireBackupSet = "
UPDATE BS
SET expiration_date = GETDATE()
FROM msdb.dbo.backupset AS BS
INNER JOIN msdb.dbo.backupmediaset AS BMS
ON BS.media_set_id = BMS.media_set_id
INNER JOIN msdb.dbo.backupmediafamily AS BMF
ON BMS.media_set_id = BMF.media_set_id
WHERE BS.database_name = '$DatabaseName'
AND BS.type = 'L'
AND physical_device_name = '$BackupPath'
"
#
# Move the transaction log backup to the secondary
#
Write-Output "Moving $BackupPath to $sharedFolder"
if (Test-Path $BackupPath) {
Move-Item $BackupPath $sharedFolder
}
#
# Restore the backup on the secondary
#
Invoke-Sqlcmd -Query $SQL_RestoreDatabase -ServerInstance $secondary -QueryTimeout 65535
Write-Output "Restored LOG from $RemoteBackupPath"
#
# Delete the backup file
#
Write-Output "Deleting $RemoteBackupPath"
Remove-Item $RemoteBackupPath -ErrorAction SilentlyContinue
#
# Mark the backup as expired
#
Write-Output "Expiring backup set $BackupPath"
Invoke-Sqlcmd -Query $SQL_ExpireBackupSet -ServerInstance $primary
}
}
The script can be used in a SQLAgent PowerShell job step and it’s all you need to start shipping your transaction logs.
Obviously, you need to take a full backup on the primary server and restore it to the secondary WITH NORECOVERY.
Once you’re ready, you can schedule the job to ship the transaction logs.
Do you need sysadmin rights to backup a database?
Looks like a silly question, doesn’t it? - Well, you would be surprised to know it’s not.
Obviously, you don’t need to be a sysadmin to simply issue a BACKUP statement. If you look up the BACKUP statement on BOL you’ll see in the “Security” section that
BACKUP DATABASE and BACKUP LOG permissions default to members of the sysadmin fixed server role and the db_owner and db_backupoperator fixed database roles.
But there’s more to it than just permissions on the database itself: in order to complete successfully, the backup device must be accessible:
[...] SQL Server must be able to read and write to the device; the account under which the SQL Server service runs must have write permissions. [...]
While this statement sound sensible or even obvious when talking about file system devices, with other types of device it’s less obvious what “permissions” means. With other types of device I mean tapes and Virtual Backup Devices. Since probably nobody uses tapes directly anymore, basically I’m referring to Virtual Backup Devices.
VDI (Virtual Backup device Interface) is the standard API intended for use by third-party backup software vendors to perform backup operations. Basically, it allows an application to act as a storage device.
The VDI specification is available here (you just need the vbackup.chm help file contained in the self-extracting archive).
If you download and browse the documentation, under the “Security” topic, you will find a worrying statement:
The server connection for SQL Server that is used to issue the BACKUP or RESTORE commands must be logged in with the sysadmin fixed server role.
Wait, … what???!?!??!! Sysadmin???????
Sad but true, sysadmin is the only way to let an application take backups using the VDI API. There is no individual permission you can grant: it’s sysadmin or nothing.
Since most third-party backup sofwares rely on the VDI API, this looks like a serious security flaw: every SQL Server instance around the world that uses third-party backup utilities has a special sysadmin login used by the backup tool, or, even worse, the tool runs under the sa login.
In my opinion, this is an unacceptable limitation and I would like to see a better implementation in a future version, so I filed a suggestion on Connect.
If you agree with me, feel free to upvote and comment it.
SQL Server and Custom Date Formats
Today SQL Server Central is featuring my article Dealing with custom date formats in T-SQL.
There’s a lot of code on that page and I thought that making it available for download would make it easier to play with.
You can download the code from this page or from the Code Repository.
- CustomDateFormat.cs
- formatDate_Islands_iTVF.sql
- formatDate_Recursive_iTVF.sql
- formatDate_scalarUDF.sql
- parseDate_Islands_iTVF.sql
I hope you enjoy reading the article as much as I enjoyed writing it.
How to Eat a SQL Elephant in 10 Bites
Sometimes, when you have to optimize a poor performing query, you may find yourself staring at a huge statement, wondering where to start.
Some developers think that a single elephant statement is better than multiple small statements, but this is not always the case.
Let’s try to look from the perspective of software quality:
- Efficiency
The optimizer will likely come up with a suboptimal plan, giving up early on optimizations and transformations. - Reliability
Any slight change in statistics could lead the optimizer to produce a different and less efficient plan. - Maintainability
A single huge statement is less readable and maintainable than multiple small statements.
With those points in mind, the only sensible thing to do is cut the elephant into smaller pieces and eat them one at a time.
This is how I do it:
- Lay out the original code and read the statement carefully
- Decide whether a full rewrite is more convenient
- Set up a test environment
- Identify the query parts
- Identify the main tables
- Identify non correlated subqueries and UNIONs
- Identify correlated subqueries
- Write a query outline
- Break the statement into parts with CTEs, views, functions and temporary tables
- Merge redundant subqueries
- Put it all together
- Verify the output based on multiple different input values
- Comment your work thoroughly
1. Lay out the original code and read the statement carefully
Use one of the many SQL formatters you can find online. My favorite one is Tao Klerk’s Poor Man’s T-SQL Formatter: it’s very easy to use and configure and it comes with a handy SSMS add-in and plugins for Notepad++ and WinMerge. Moreover, it’s free and open source. A must-have.
Once your code is readable, don’t rush to the keyboard: take your time and read it carefully.
- Do you understand (more or less) what it is supposed to do?
- Do you think you could have coded it yourself?
- Do you know all the T-SQL constructs it contains?
If you answered “yes” to all the above, you’re ready to go to the next step.
2. Decide whether a full rewrite is more convenient
OK, that code sucks and you have to do something. It’s time to make a decision:
- Take the business rules behind the statement and rewrite it from scratch
When the statement is too complicated and unreadable, it might be less time-consuming to throw the old statement away and write your own version.
Usually it is quite easy when you know exactly what the code is supposed to do. Just make sure you’re not making wrong assumptions and be prepared to compare your query with the original one many times. - Refactor the statement
When the business rules are unclear (or unknown) starting from scratch is not an option. No, don’t laugh! The business logic may have been buried in the sands of time or simply you may be working on a query without any will to understand the business processes behind it.
Bring a big knife: you’re going to cut the elephant in pieces. - Leave the statement unchanged
Sometimes the statement is too big or too complicated to bother taking the time to rewrite it. For instance, this query would take months to rewrite manually.
It works? Great: leave it alone.
3. Set up a test environment
It doesn’t matter how you decide to do it: at the end of the day you will have to compare the results of your rewritten query with the results of the “elephant” and make sure you did not introduce errors in your code.
The best way to do this is to prepare a script that compares the results of the original query with the results of your rewritten version. This is the script I am using (you will find it in the code repository, as usual).
-- =============================================
-- Author: Gianluca Sartori - spaghettidba
-- Create date: 2012-03-14
-- Description: Runs two T-SQL statements and
-- compares the results
-- =============================================
-- Drop temporary tables
IF OBJECT_ID('tempdb..#original') IS NOT NULL
DROP TABLE #original;
IF OBJECT_ID('tempdb..#rewritten') IS NOT NULL
DROP TABLE #rewritten;
-- Store the results of the original
-- query into a temporary table
WITH original AS (
<original, text, >
)
SELECT *
INTO #original
FROM original;
-- Add a sort column
ALTER TABLE #original ADD [______sortcolumn] int identity(1,1);
-- Store the results of the rewritten
-- query into a temporary table
WITH rewritten AS (
<rewritten, text, >
)
SELECT *
INTO #rewritten
FROM rewritten;
-- Add a sort column
ALTER TABLE #rewritten ADD [______sortcolumn] int identity(1,1);
-- Compare the results
SELECT 'original' AS source, *
FROM (
SELECT *
FROM #original
EXCEPT
SELECT *
FROM #rewritten
) AS A
UNION ALL
SELECT 'rewritten' AS source, *
FROM (
SELECT *
FROM #rewritten
EXCEPT
SELECT *
FROM #original
) AS B;
The script is a SSMS query template that takes the results of the original and the rewritten query and compares the resultsets, returning all the missing or different rows. The script uses two CTEs to wrap the two queries: this means that the ORDER BY predicate (if any) will have to be moved outside the CTE.
Also, the results of the two queries are piped to temporary tables, which means that you can’t have duplicate column names in the result set.
Another thing worth noting is that the statements to compare cannot be stored procedures. One simple way to overcome this limitation is to use the technique I described in this post.
The queries inside the CTEs should then be rewritten as:
SELECT * FROM OPENQUERY(LOOPBACK,'<original, text,>')
Obviously, all the quotes must be doubled, which is the reason why I didn’t set up the script this way in the first place. It’s annoying, but it’s the only way I know of to pipe the output of a stored procedure into a temporary table without knowing the resultset definition in advance. If you can do better, suggestions are always welcome.
4. Identify the query parts
OK, now you have everything ready and you can start eating the elephant. The first thing to do is to identify all the autonomous blocks in the query and give them a name. You can do this at any granularity and repeat the task as many times as you like: the important thing is that at the end of this process you have a list of query parts and a name for each part.
Identify the main tables
Usually I like the idea that the data comes from one “main” table and all the rest comes from correlated tables. For instance, if I have to return a resultset containing some columns from the “SalesOrderHeader” table and some columns from the “SalesOrderDetail” table, I consider SalesOrderHeader the main table and SalesOrderHeader a correlated table. It fits well with my mindset, but you are free to see things the way you prefer.
Probably these tables are already identified by an alias: note down the aliases and move on.
Identify non correlated subqueries and UNIONs
Non-correlated subqueries are considered as inline views. Often these subqueries are joined to the main tables to enrich the resultset with additional columns.
Don’t be scared away by huge subqueries: you can always repeat all the steps for any single subquery and rewrite it to be more compact and readable.
Again, just note down the aliases and move to the next step.
Identify correlated subqueries
Correlated subqueries are not different from non-correlated subqueries, with the exception that you will have less freedom to move them from their current position in the query. However, that difference doesn’t matter for the moment: give them a name and note it down.
5. Write a query outline
Use the names you identified in the previous step and write a query outline. It won’t execute, but it gives you the big picture.
If you really want the big picture, print the query. It may seem crazy, but sometimes I find it useful to be able to see the query as a whole, with all the parts with their names highlighted in different colors.
Yes, that’s a single SELECT statement, printed in Courier new 8 pt. on 9 letter sheets, hanging on the wall in my office.
6. Break the statement in parts with CTEs, views, functions and temporary tables
SQL Server offers a fair amount of tools that allow breaking a single statement into parts:
- Common Table Expressions
- Subqueries
- Views
- Inline Table Valued Functions
- Multi-Statement Table Valued Functions
- Stored procedures
- Temporary Tables
- Table Variables
Ideally, you will choose the one that performs best in your scenario, but you could also take usability and modularity into account.
CTEs and subqueries are a good choice when the statement they contain is not used elsewhere and there is no need to reuse that code.
Table Valued functions and views, on the contrary, are most suitable when there is an actual need to incapsulate the code in modules to be reused in multiple places.
Generally speaking, you will use temporary tables or table variables when the subquery gets used more than once in the statement, thus reducing the load.
Though I would really like to go into deeper details on the performance pros and cons of each construct, that would take an insane amount of time and space. You can find a number of articles and blogs on those topics and I will refrain from echoing them here.
7. Merge redundant subqueries
Some parts of your query may be redundant and you may have the opportunity to merge those parts. The merged query will be more compact and will likely perform significantly better.
For instance, you could have multiple subqueries that perform aggregate calculations on the same row set:
SELECT ProductID
,Name
,AverageSellOutPrice = (
SELECT AVG(UnitPrice)
FROM Sales.SalesOrderDetail
WHERE ProductID = PR.ProductID
)
,MinimumSellOutPrice = (
SELECT MIN(UnitPrice)
FROM Sales.SalesOrderDetail
WHERE ProductID = PR.ProductID
)
,MaximumSellOutPrice = (
SELECT MAX(UnitPrice)
FROM Sales.SalesOrderDetail
WHERE ProductID = PR.ProductID
)
FROM Production.Product AS PR;
The above query can be rewritten easily to avoid hitting the SalesOrderDetail table multiple times:
SELECT ProductID
,Name
,AverageSellOutPrice
,MinimumSellOutPrice
,MaximumSellOutPrice
FROM Production.Product AS PR
CROSS APPLY (
SELECT AVG(UnitPrice), MIN(UnitPrice), MAX(UnitPrice)
FROM Sales.SalesOrderDetail
WHERE ProductID = PR.ProductID
) AS SellOuPrices (AverageSellOutPrice, MinimumSellOutPrice, MaximumSellOutPrice);
Another typical situation where you can merge some parts is when multiple subqueries perform counts on slightly different row sets:
SELECT ProductID
,Name
,OnlineOrders = (
SELECT COUNT(*)
FROM Sales.SalesOrderHeader AS SOH
WHERE SOH.OnlineOrderFlag = 1
AND EXISTS (
SELECT *
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = SOH.SalesOrderID
AND ProductID = PR.ProductID
)
)
,OfflineOrders = (
SELECT COUNT(*)
FROM Sales.SalesOrderHeader AS SOH
WHERE SOH.OnlineOrderFlag = 0
AND EXISTS (
SELECT *
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = SOH.SalesOrderID
AND ProductID = PR.ProductID
)
)
FROM Production.Product AS PR;
The only difference between the two subqueries is the predicate on SOH.OnlineOrderFlag. The two queries can be merged introducing a CASE expression in the aggregate:
SELECT ProductID
,Name
,ISNULL(OnlineOrders,0) AS OnlineOrders
,ISNULL(OfflineOrders,0) AS OfflineOrders
FROM Production.Product AS PR
CROSS APPLY (
SELECT SUM(CASE WHEN SOH.OnlineOrderFlag = 1 THEN 1 ELSE 0 END),
SUM(CASE WHEN SOH.OnlineOrderFlag = 0 THEN 1 ELSE 0 END)
FROM Sales.SalesOrderHeader AS SOH
WHERE EXISTS (
SELECT *
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = SOH.SalesOrderID
AND ProductID = PR.ProductID
)
) AS Orderscount (OnlineOrders, OfflineOrders);
There are infinite possibilities and enumerating them all would be far beyond the scope of this post. This is one of the topics that my students often find hard to understand and I realize that it really takes some experience to identify merge opportunities and implement them.
8. Put it all together
Remember the query outline you wrote previously? It’s time to put it into action.
Some of the identifiers may have gone away in the merge process, some others are still there and have been transformed into different SQL constructs, such as CTEs, iTVFs or temporary tables.
9. Verify the output based on multiple different input values
Now it’s time to see if your new query works exactly like the original one. You already have a script for that: you can go on and use it.
Remember that the test can be considered meaningful only if you repeat it a reasonably large number of times, with different parameters. Some queries could appear to be identical, but still be semantically different. Make sure the rewritten version handles NULLs and out-of-range parameters in the same way.
10.Comment your work thoroughly
If you don’t comment your work, somebody will find it even more difficult to maintain than the elephant you found when you started.
Comments are for free and don’t affect the query performance in any way. Don’t add comments that mimic what the query does, instead, write a meaningful description of the output of the query.
For instance, given a code fragment like this:
SELECT SalesOrderID, OrderDate, ProductID
INTO #orders
FROM Sales.SalesOrderHeader AS H
INNER JOIN Sales.SalesOrderDetail AS D
ON H.SalesOrderID = D.SalesOrderID
WHERE OrderDate BETWEEN @StartDate AND @EndDate
a comment like “joins OrderHeader to OrderDetail” adds nothing to the clarity of the code. A comment like “Selects the orders placed between the @StartDate and @EndDate and saves the results in a temporary table for later use” would be a much better choice.
Elephant eaten. (Burp!)
After all, it was not too big, was it?
My first year in blogging
WordPress.com prepared a 2011 annual report for this blog and I want to share it with you.
Here’s an excerpt:
A San Francisco cable car holds 60 people. This blog was viewed about 2,900 times in 2011. If it were a cable car, it would take about 48 trips to carry that many people.
Click here to see the complete report.
This is my first year as a blogger and I’m very happy with it. I didn’t post very often (25 posts means that I could put together only 2 posts per month), but it was challenging and entertaining at the same time.
Blogging takes a hell lot of time. Moreover, my blog is written in English, which is a foreign language for me. Blogging on technical topics requires checking the code you post thoroughly. Finding something new to blog about is not always easy and I want to avoid writing about topics that are already covered in thousands of articles and blogs.
Posting on this blog is difficult and time consuming, but I enjoy it a lot. I hope I will be able to keep writing at this pace (or a better one) in 2012.
Just a few words on WordPress: it’s a great blogging platform, hosted in an outstanding manner at wordpress.com. If you want to start a blog, choosing the right platform today is a no-brainer. Thanks to all the folks at wordpress.com for the great work.






















