Blog Archives
Uniquifiers: all rows or the duplicate keys only?
Some days ago I was talking with my friend Davide Mauri about the uniquifier that SQL Server adds to clustered indexes when they are not declared as UNIQUE.
We were not completely sure whether this behaviour applied to duplicate keys only or to all keys, even when unique.
The best way to discover the truth is a script to test what happens behind the scenes:
-- =============================================
-- Author:      Gianluca Sartori - @spaghettidba
-- Create date: 2014-03-15
-- Description: Checks whether the UNIQUIFIER column
--              is added to a column only on
--              duplicate clustering keys or all
--              keys, regardless of uniqueness
-- =============================================
USE tempdb
GO

IF OBJECT_ID('sizeOfMyTable') IS NOT NULL
    DROP VIEW sizeOfMyTable;
GO

-- Create a view to query table size information
-- Not very elegant, but saves a lot of typing
CREATE VIEW sizeOfMyTable
AS
SELECT OBJECT_NAME(si.object_id) AS table_name,
       si.name AS index_name,
       SUM(total_pages) AS total_pages,
       SUM(used_pages) AS used_pages,
       SUM(data_pages) AS data_pages
FROM sys.partitions AS p
INNER JOIN sys.allocation_units AS AU
    ON P.hobt_id = AU.container_id
INNER JOIN sys.indexes AS si
    ON si.index_id = p.index_id
    AND si.object_id = p.object_id
WHERE si.object_id = OBJECT_ID('#testUniquifier')
GROUP BY OBJECT_NAME(si.object_id), si.name
GO

IF OBJECT_ID('#testUniquifier') IS NOT NULL
    DROP TABLE #testUniquifier;

-- Create a test table
CREATE TABLE #testUniquifier (
    i int NOT NULL
)

-- Results table: will receive table size
-- in different scenarios
DECLARE @results TABLE (
    description varchar(500),
    table_name sysname,
    index_name sysname,
    total_pages int,
    used_pages int,
    data_pages int
);

-- INSERTS 100K UNIQUE VALUES
INSERT INTO #testUniquifier
SELECT TOP(100000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM sys.all_columns AS AC
CROSS JOIN sys.all_columns AS AC1;

-- -----------------------------------------------------------------
-- TEST1: CREATES A UNIQUE CLUSTERED INDEX (NO UNIQUIFIER)
-- -----------------------------------------------------------------
CREATE UNIQUE CLUSTERED INDEX UK_test ON #testUniquifier(i);

INSERT @results
SELECT 'Unique clustered index' AS description, *
FROM sizeOfMyTable;

DROP INDEX UK_test ON #testUniquifier

-- -----------------------------------------------------------------
-- TEST2: CREATES A NON-UNIQUE CLUSTERED INDEX
--        NO DUPLICATES ARE PRESENT YET
-- -----------------------------------------------------------------
CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i)

INSERT @results
SELECT 'Non-Unique clustered index, no duplicates' AS description, *
FROM sizeOfMyTable

DROP INDEX IX_test ON #testUniquifier

-- -----------------------------------------------------------------
-- TEST3: CREATES A NON-UNIQUE CLUSTERED INDEX
--        10000 DUPLICATE VALUES ARE PRESENT
-- -----------------------------------------------------------------
UPDATE TOP(10000) #testUniquifier SET i = 1

CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i)

INSERT @results
SELECT 'Non-Unique clustered index, some duplicates' AS description, *
FROM sizeOfMyTable

DROP INDEX IX_test ON #testUniquifier

-- -----------------------------------------------------------------
-- TEST4: CREATES A NON-UNIQUE CLUSTERED INDEX
--        ALL ROWS CONTAIN THE SAME VALUE (1)
-- -----------------------------------------------------------------
UPDATE #testUniquifier SET i = 1

CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i)

INSERT @results
SELECT 'Non-Unique clustered index, all duplicates' AS description, *
FROM sizeOfMyTable

-- -----------------------------------------------------------------
-- Display results
-- -----------------------------------------------------------------
SELECT * FROM @results;
As you can see, the uniquifier is added only to the keys that are duplicated:
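A quick way to cross-check this is sys.dm_db_index_physical_stats in DETAILED mode, which exposes minimum and maximum record sizes. While the non-unique clustered index with duplicates is in place, the duplicated keys should carry the uniquifier overhead (4 bytes plus the variable-length record overhead). This is just a sketch, not part of the original test:

-- Compare record sizes: with duplicates present, max_record_size_in_bytes
-- should exceed min_record_size_in_bytes by the uniquifier overhead
SELECT index_level,
       min_record_size_in_bytes,
       max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
    DB_ID('tempdb'),
    OBJECT_ID('tempdb..#testUniquifier'),
    NULL, NULL, 'DETAILED');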
Another way to verify these results is to look at the output of DBCC PAGE().
In the text output of DBCC PAGE, uniquifiers are displayed as 0 (zero) when they are not set, but those values are actually missing from the page.
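If you want to eyeball a page yourself, here is a minimal sketch; the file and page numbers passed to DBCC PAGE are illustrative, take real ones from the first query:

-- Locate the pages that hold the rows
SELECT TOP (10) i, file_id, page_id, slot_id
FROM #testUniquifier
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%);

-- Dump one page in text format; trace flag 3604
-- redirects the DBCC output to the client
DBCC TRACEON(3604);
DBCC PAGE(tempdb, 1, 153, 3); -- database, file_id, page_id, print option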
This becomes even clearer when using DBCC PAGE WITH TABLERESULTS:
IF OBJECT_ID('tempdb..#formatteddata') IS NOT NULL
    DROP TABLE #formatteddata;

SELECT *, ROWNUM = ROW_NUMBER() OVER (ORDER BY page_id, slot_id)
INTO #formatteddata
FROM #testUniquifier
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%);

IF OBJECT_ID('tempdb..#dbccpage') IS NOT NULL
    DROP TABLE #dbccpage;

CREATE TABLE #dbccpage (
    page_id int,
    ParentObject varchar(128),
    Object varchar(128),
    Field varchar(128),
    value varchar(4000),
    Slot AS SUBSTRING(
        Object,
        NULLIF(CHARINDEX('Slot ', Object, 1), 0) + 5,
        ISNULL(NULLIF(CHARINDEX(' ', Object, 6), 0), 0) - 5
    )
)

DECLARE @current_page_id int;

DECLARE pages CURSOR STATIC LOCAL FORWARD_ONLY READ_ONLY
FOR
SELECT DISTINCT page_id
FROM #formatteddata

OPEN pages

FETCH NEXT FROM pages INTO @current_page_id

WHILE @@FETCH_STATUS = 0
BEGIN

    INSERT INTO #dbccpage (ParentObject, Object, Field, value)
    EXEC sp_executesql
        N'DBCC PAGE (2, 1, @pageid, 3) WITH TABLERESULTS;',
        N'@pageid int',
        @current_page_id

    UPDATE #dbccpage
    SET page_id = @current_page_id
    WHERE page_id IS NULL

    FETCH NEXT FROM pages INTO @current_page_id

END

CLOSE pages;
DEALLOCATE pages;

WITH PageData AS (
    SELECT page_id, slot, field, value
    FROM #dbccpage
    WHERE field IN ('i', 'UNIQUIFIER')
),
Uniquifiers AS (
    SELECT *
    FROM PageData
    PIVOT (MAX(value) FOR field IN ([i], [UNIQUIFIER])) AS pvt
),
sourceData AS (
    SELECT *
    FROM #formatteddata
)
SELECT src.ROWNUM,
       src.i,
       src.page_id,
       src.slot_id,
       UNIQUIFIER
FROM sourceData AS src
LEFT JOIN Uniquifiers AS unq
    ON src.slot_id = unq.slot
    AND src.page_id = unq.page_id
ORDER BY ROWNUM;
If you run the code in the different situations outlined before (unique clustered index, non-unique clustered index with or without duplicate keys), you will find the uniquifiers associated with each duplicate key, and you will also notice that no uniquifier is generated for the keys that are unique.
SQL2014: Defining non-unique indexes in the CREATE TABLE statement
Now that my SQL Server 2014 CTP1 virtual machine is ready, I have started to play with it, and some new features and differences from previous versions are starting to appear.
What I want to write about today is a T-SQL enhancement to DDL statements that brings in some new interesting considerations.
SQL Server 2014 now supports a new T-SQL syntax that allows defining an index in the CREATE TABLE statement without having to issue separate CREATE INDEX statements.
Up to now, the same could be achieved only with PRIMARY KEY and UNIQUE constraints, thus allowing UNIQUE indexes only.
For instance, the following statement creates a table with a unique clustered index on order_id and a unique nonclustered index on PO_number:
CREATE TABLE #orders (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    ,order_date datetime NOT NULL
    ,total_amount decimal(18,3)
)
OK, but what if I want to add a non-unique index to my table?
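Before SQL Server 2014, the only option was a separate CREATE INDEX statement issued after the CREATE TABLE, something like this:

CREATE TABLE #orders (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    ,order_date datetime NOT NULL
    ,total_amount decimal(18,3)
)

-- a separate statement for the non-unique index
CREATE NONCLUSTERED INDEX IX_order_date ON #orders(order_date);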
SQL Server 2014 offers a new syntax to do that inline with the table DDL:
CREATE TABLE #orders (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    -- creates a nonclustered index on order_date
    ,order_date datetime NOT NULL INDEX IX_order_date
    ,total_amount decimal(18,3)
)
A similar syntax can be used to create a compound index:
CREATE TABLE #orders (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    ,order_date datetime NOT NULL INDEX IX_order_date
    ,total_amount decimal(18,3)
    -- creates a compound index on PO_number and order_date
    ,INDEX IX_orders_compound(PO_number, order_date)
)
An interesting aspect of this new syntax is that it allows creating non-unique nonclustered indexes on table variables, something that couldn’t be done in previous versions.
The syntax to use is the same as for permanent tables:
DECLARE @orders TABLE (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    ,order_date datetime NOT NULL INDEX IX_order_date
    ,total_amount decimal(18,3)
)
Cool! But, wait: does this mean that table variables will now behave in the same way permanent tables do?
Not exactly.
Table variables don’t have statistics, and being able to create indexes on them won’t change anything in this regard.
Do you want proof? OK, the skeptics can run the following code. Please make sure you capture the actual execution plan.
SET NOCOUNT ON;

-- create the table variable
DECLARE @orders TABLE (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    ,order_date datetime NOT NULL INDEX IX_order_date
    ,total_amount decimal(18,3)
)

-- insert some data
INSERT INTO @orders (order_date, PO_number, total_amount)
SELECT order_date   = DATEADD(second, CHECKSUM(NEWID()), GETDATE())
      ,PO_number    = CAST(NEWID() AS varchar(50))
      ,total_amount = CHECKSUM(NEWID()) / 1000.0
FROM sys.all_columns

SELECT COUNT(*)
FROM @orders
WHERE order_date > GETDATE()
OPTION (
    -- activate some (undocumented) trace flags to show
    -- statistics usage. More information on the flags
    -- can be found on Paul White's blog:
    -- http://sqlblog.com/blogs/paul_white/archive/2011/09/21/how-to-find-the-statistics-used-to-compile-an-execution-plan.aspx

    -- redirect output to the messages tab
    QUERYTRACEON 3604
    -- show "interesting" statistics
    ,QUERYTRACEON 9292
    -- show loaded statistics
    ,QUERYTRACEON 9402
    -- add RECOMPILE to let the optimizer "see"
    -- the table cardinality
    ,RECOMPILE
)
The output of the above batch is empty. Looks like no stats were loaded.
The actual execution plan confirms that no stats were loaded and the estimated cardinality of the table variable is way off:
If we repeat the test with a temporary table, we see a different behaviour.
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#orders') IS NOT NULL
    DROP TABLE #orders;

CREATE TABLE #orders (
     order_id uniqueidentifier NOT NULL
         PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID()
    ,PO_number varchar(50) NOT NULL UNIQUE
    ,order_date datetime NOT NULL INDEX IX_order_date
    ,total_amount decimal(18,3)
)

INSERT INTO #orders (order_date, PO_number, total_amount)
SELECT order_date   = DATEADD(second, CHECKSUM(NEWID()), GETDATE())
      ,PO_number    = CAST(NEWID() AS varchar(50))
      ,total_amount = CHECKSUM(NEWID()) / 1000.0
FROM sys.all_columns

SELECT COUNT(*)
FROM #orders
WHERE order_date > GETDATE()
OPTION (
    QUERYTRACEON 3604
    ,QUERYTRACEON 9292
    ,QUERYTRACEON 9402
)
This time the messages tab contains some output:
Stats header loaded: DbName: tempdb, ObjName: #orders, IndexId: 2, ColumnName: order_date, EmptyTable: FALSE
Stats header loaded: DbName: tempdb, ObjName: #orders, IndexId: 2, ColumnName: order_date, EmptyTable: FALSE
The optimizer identified the statistics on the order_date column as “interesting” and then loaded the stats header.
Again, the actual execution plan confirms that a better estimation is available:
The lack of statistics has always been the most significant difference between table variables and temporary tables and SQL2014 doesn’t appear to change the rules (yet).
Discovering resultset definition of DBCC commands in SQL Server 2012
Back in 2011 I showed a method to discover the resultset definition of DBCC undocumented commands.
At the time, SQL Server 2012 had not been released yet and nothing suggested that the linked server trick could stop working in the new major version. Surprisingly enough, it did.
If you try to run the same code showed in that old post on a 2012 instance, you will get a quite explicit error message:
DECLARE @srv nvarchar(4000);
SET @srv = @@SERVERNAME; -- gather this server name

-- Create the linked server
EXEC master.dbo.sp_addlinkedserver
    @server = N'LOOPBACK',
    @srvproduct = N'SQLServ', -- it's not a typo: it can't be "SQLServer"
    @provider = N'SQLNCLI', -- change to SQLOLEDB for SQLServer 2000
    @datasrc = @srv;

-- Set the authentication to "current security context"
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'LOOPBACK',
    @useself = N'True',
    @locallogin = NULL,
    @rmtuser = NULL,
    @rmtpassword = NULL;

USE tempdb;
GO

CREATE PROCEDURE loginfo
AS
BEGIN
    SET NOCOUNT ON;
    DBCC LOGINFO();
END
GO

SELECT *
INTO tempdb.dbo.loginfo_output
FROM OPENQUERY(LOOPBACK, 'SET FMTONLY OFF; EXEC tempdb.dbo.loginfo');

DROP PROCEDURE loginfo;
GO
Msg 11528, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1
The metadata could not be determined because statement 'DBCC LOGINFO();' in procedure 'loginfo' does not support metadata discovery.
This behaviour has to do with the way SQL Server 2012 tries to discover metadata at parse/bind time, when the resultset is not available yet for DBCC commands.
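In fact, you can reproduce the failure without any linked server by calling the metadata discovery procedure directly; this quick check should return the same Msg 11528:

-- DBCC commands do not expose their metadata at parse/bind time
EXEC sp_describe_first_result_set @tsql = N'DBCC LOGINFO();';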
Fortunately, there is still a way to discover metadata when you have a SQL Server instance of a previous version available.
On my laptop I have a 2008R2 instance I can use to query the 2012 instance with a linked server:
-- run this on the 2008R2 instance
USE [master]
GO

EXEC master.dbo.sp_addlinkedserver
     @server = N'LOCALHOST\SQL2012'
    ,@srvproduct = N'SQL Server'

EXEC master.dbo.sp_addlinkedsrvlogin
     @rmtsrvname = N'LOCALHOST\SQL2012'
    ,@useself = N'True'
    ,@locallogin = NULL
    ,@rmtuser = NULL
    ,@rmtpassword = NULL
GO

SELECT *
INTO tempdb.dbo.loginfo_output
FROM OPENQUERY([LOCALHOST\SQL2012], 'SET FMTONLY OFF; EXEC tempdb.dbo.loginfo');
GO
This code pipes the results of the DBCC command into a table in the tempdb database in my 2008R2 instance. The table can now be scripted using SSMS:
Using the 2008R2 instance as a “Trojan Horse” for the metadata discovery, you can see that the resultset definition of DBCC LOGINFO() has changed again in SQL Server 2012:
CREATE TABLE [dbo].[loginfo_output](
    [RecoveryUnitId] [int] NULL, -- new in SQL2012
    [FileId] [int] NULL,
    [FileSize] [bigint] NULL,
    [StartOffset] [bigint] NULL,
    [FSeqNo] [int] NULL,
    [Status] [int] NULL,
    [Parity] [tinyint] NULL,
    [CreateLSN] [numeric](25, 0) NULL
)
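Once the definition is scripted, the helper objects can be dropped. A minimal cleanup sketch, using the same names as above:

-- run this on the 2008R2 instance
DROP TABLE tempdb.dbo.loginfo_output;
EXEC master.dbo.sp_dropserver
     @server = N'LOCALHOST\SQL2012'
    ,@droplogins = 'droplogins';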
This trick will be particularly useful for an upcoming (and long overdue) post, so… stay tuned!
Using QUERYTRACEON in plan guides
Yesterday the CSS team made the QUERYTRACEON hint publicly documented.
This means that now it’s officially supported and you can use it in production code.
After reading the post on the CSS blog, I started to wonder whether there is some actual use in production for this query hint, given that it requires the same privileges as DBCC TRACEON, which means you have to be a member of the sysadmin role.
In fact, if you try to use that hint when connected as a low privileged user, you get a very precise error message that leaves no room for interpretation:
SELECT *
FROM [AdventureWorks2012].[Person].[Person]
OPTION (QUERYTRACEON 4199)
Msg 2571, Level 14, State 3, Line 1
User 'guest' does not have permission to run DBCC TRACEON.
How can a query hint available to sysadmins only be possibly useful for production?
I posted my doubt on Twitter using the #sqlhelp hashtag and got interesting replies from Paul Randal, Paul White and Robert Davis.
My concerns were not about the usefulness of the hint per se, but about its usefulness in production code. Often 140 characters are not enough to express your thoughts clearly, so I decided to write this blog post to clarify what I mean.
As we have seen, the QUERYTRACEON query hint cannot be used directly by users not in the sysadmin role, but it can be used in stored procedures with “EXECUTE AS owner” and in plan guides.
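For the stored procedure flavour, a minimal sketch could look like this; the procedure name is illustrative, and it assumes the owner is a member of the sysadmin role:

CREATE PROCEDURE dbo.get_persons
WITH EXECUTE AS OWNER
AS
BEGIN
    -- the hint is evaluated in the owner's security context,
    -- so low privileged users can call the procedure
    SELECT *
    FROM [Person].[Person]
    OPTION (QUERYTRACEON 4199);
END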
While it’s completely clear what happens when the hint is used in procedures executed in the context of the owner, what happens in plan guides is not so obvious (at least, not to me). In fact, given that the security context is not changed when the plan guide is matched and applied, I would have expected it to fail miserably when executed by a low privileged user, but that is not the case.
Let’s try and see what happens:
First of all we need a query “complex enough” to let the optimizer take plan guides into account. A straight “SELECT * FROM table” and anything else that results in a trivial plan won’t be enough.
SELECT *
FROM [Person].[Person] AS P
INNER JOIN [Person].[PersonPhone] AS H
    ON P.BusinessEntityID = H.BusinessEntityID
INNER JOIN [Person].[BusinessEntity] AS BE
    ON P.BusinessEntityID = BE.BusinessEntityID
INNER JOIN [Person].[BusinessEntityAddress] AS BEA
    ON BE.BusinessEntityID = BEA.BusinessEntityID
WHERE BEA.ModifiedDate > '20080101'
Then we need a plan guide to apply the QUERYTRACEON hint:
EXEC sp_create_plan_guide
    @name = N'[querytraceon]',
    @stmt = N'SELECT *
FROM [Person].[Person] AS P
INNER JOIN [Person].[PersonPhone] AS H
    ON P.BusinessEntityID = H.BusinessEntityID
INNER JOIN [Person].[BusinessEntity] AS BE
    ON P.BusinessEntityID = BE.BusinessEntityID
INNER JOIN [Person].[BusinessEntityAddress] AS BEA
    ON BE.BusinessEntityID = BEA.BusinessEntityID
WHERE BEA.ModifiedDate > ''20080101''',
    @type = N'SQL',
    @hints = N'OPTION (QUERYTRACEON 4199)'
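Plan guides are created in the enabled state; should you need to toggle or drop this one later, sp_control_plan_guide does it:

-- disable, re-enable or drop the plan guide by name
EXEC sp_control_plan_guide N'DISABLE', N'[querytraceon]';
EXEC sp_control_plan_guide N'ENABLE', N'[querytraceon]';
-- EXEC sp_control_plan_guide N'DROP', N'[querytraceon]';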
If we enable the plan guide and try to issue this query in the context of a low privileged user, we can see no errors thrown any more:
CREATE LOGIN testlogin WITH PASSWORD = 'testlogin123';
GO

USE AdventureWorks2012;
GO

CREATE USER testlogin FOR LOGIN testlogin;
GO

GRANT SELECT TO testlogin;
GO

EXECUTE AS USER = 'testlogin';
GO

SELECT *
FROM [Person].[Person] AS P
INNER JOIN [Person].[PersonPhone] AS H
    ON P.BusinessEntityID = H.BusinessEntityID
INNER JOIN [Person].[BusinessEntity] AS BE
    ON P.BusinessEntityID = BE.BusinessEntityID
INNER JOIN [Person].[BusinessEntityAddress] AS BEA
    ON BE.BusinessEntityID = BEA.BusinessEntityID
WHERE BEA.ModifiedDate > '20080101';
GO

REVERT;
GO
If we open a profiler trace and capture the “Plan Guide Successful” and “Plan Guide Unsuccessful” events, we can see that the optimizer matches the plan guide and enforces the use of the query hint.
Lesson learned: even if users are not allowed to issue that particular query hint directly, adding it to a plan guide is a way to let anyone use it indirectly.
Bottom line: QUERYTRACEON can indeed be very useful when we identify queries that obtain a decent plan only when a specific trace flag is active and we don’t want to enable that flag for the whole instance. In those cases, a plan guide or a stored procedure in the owner’s context can be the answer.
Changing Server Collation
In order to avoid collation conflict issues with TempDB, all user databases on a SQL Server instance should be set to the same collation.
Temporary tables and table variables are stored in TempDB, which means that, unless explicitly defined, all their character-based columns are created using the TempDB collation, which is the same as that of the master database. If user databases have a different collation, joining physical tables to temporary tables may cause a collation conflict.
For instance you could have a user database with collation SQL_Latin1_General_CP1_CI_AS on a server with collation Latin1_General_CI_AS:
SELECT name,
       collation_name,
       is_system = CASE WHEN name IN ('master', 'model', 'tempdb', 'msdb') THEN 1 ELSE 0 END
FROM sys.databases
ORDER BY is_system DESC, name ASC
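The instance-level collation, in turn, can be checked with SERVERPROPERTY:

SELECT SERVERPROPERTY('Collation') AS server_collation;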
Every time you create a temporary table and join it to a permanent table in this database, you have to ensure that the collations match, or you will run into a collation conflict:
USE dblocal
GO

CREATE TABLE TestTable (
    testColumn varchar(10)
)

INSERT INTO TestTable VALUES ('conflict')

CREATE TABLE #TempTable (
    tempColumn varchar(10)
)

INSERT INTO #TempTable VALUES ('conflict')

SELECT *
FROM TestTable
INNER JOIN #TempTable
    ON testColumn = tempColumn
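The last statement fails with an error similar to this:

Msg 468, Level 16, State 9, Line 1
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS" in the equal to operation.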
The only ways to avoid the conflict are:
- Define explicitly the collation on the temporary table with a collation name or database_default
- Add the COLLATE clause to the join predicate, using a collation name or database_default
-- OPTION 1: define the collation when creating the table
CREATE TABLE #TempTable (
    tempColumn varchar(10) COLLATE database_default
)

-- OPTION 2: force a specific collation when querying the table
SELECT *
FROM TestTable
INNER JOIN #TempTable
    ON testColumn = tempColumn COLLATE database_default
The first option should be preferred whenever possible, because it avoids the scalar computations in the join predicate that end up making any index on the column virtually useless.
However, both methods require editing the code, which is not always possible. In those cases, changing the server collation is the only possible solution.
In the SQL Server 2000 days, there was a “nice” tool called rebuildm, that recreated the master database and could achieve this task simply. From SQL Server 2005 on, the only tool to rebuild system databases is the setup utility.
This is what Microsoft recommends to rebuild system databases: http://msdn.microsoft.com/en-us/library/ms179254.aspx
Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=InstanceName /SQLSYSADMINACCOUNTS=accounts [ /SAPWD=StrongPassword ] /SQLCOLLATION=CollationName
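For instance, a rebuild to Latin1_General_CI_AS on a default instance might look like this (the account and password are placeholders):

Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=MSSQLSERVER /SQLSYSADMINACCOUNTS="MYDOMAIN\DBAdmins" /SAPWD="Str0ngP@ssw0rd" /SQLCOLLATION=Latin1_General_CI_AS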
Be warned that blogs and forums are filled with bad advice on this topic. The most frequent ones are these:
- Change the collation of the model database.
  TempDB is created from model when the service starts, which makes this method look smart. It’s not! Changing the database collation for model doesn’t change the collation of the objects inside the database: you would end up with a messy template for every new database.
- Copy the model database from another instance with the desired collation.
  It could work, given that you copy the database from an instance with the exact same @@version. Even if you managed to copy the database, tempdb would have a collation other than those of master and msdb, which would mean moving the original problem among the system databases. Believe me, this is not what you want.
The recommended method requires running the setup and usually is not a big deal. This blog on MSDN (CSS Support Engineers) describes in depth how the setup works and how to troubleshoot some common issues: http://blogs.msdn.com/b/psssql/archive/2008/08/29/how-to-rebuild-system-databases-in-sql-server-2008.aspx
An undocumented method
There seems to be another undocumented method, mentioned (and advertised as safe) by Paul Randal (blog | twitter) in one of his presentations: http://devconnections.com/updates/LasVegas_Fall10/SQL/Randal-SQL-SDB407-Undocumented.pdf (page 17). The method relies on a startup parameter that allows changing the server collation when combined with a couple of trace flags:
sqlservr -m -T4022 -T3659 -q"new collation"
- Trace flag 3659 allows logging all errors to the SQL Server logs
- Trace flag 4022 forces SQL Server to skip startup stored procedures (if you have any)
- Startup option “-m” forces single user mode
- Startup option “-q” rebuilds all databases and contained objects with the specified collation, without reinstalling the instance
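For instance, rebuilding everything to Latin1_General_CI_AS would look like this (run from the instance’s Binn directory with the service stopped; the collation name is illustrative):

sqlservr -m -T4022 -T3659 -q"Latin1_General_CI_AS"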
Besides being undocumented, this method seems to have some pitfalls. Gail Shaw (blog | twitter) raises a couple of warnings on this thread at SqlServerCentral: http://www.sqlservercentral.com/Forums/Topic269173-146-1.aspx
So, why recommend a method that could fail or cause harm to your instance? Because there is one scenario where this method comes in extremely handy, mainly because the documented and supported method fails: a big SQL Server 2005 cluster.
One of the biggest innovations in the SQL Server 2008 setup concerned cluster installation: the SQL Server 2005 setup process ran against all passive nodes and then against the active node for the instance, while the SQL Server 2008 setup runs only against the local machine and has to be repeated for each node in the cluster.
Running the setup on all the passive nodes, as the SQL Server 2005 setup does, can be a pain in the rear: the more nodes you have, the more things can go wrong.
Despite all my efforts, I was unable to run the setup successfully with the REBUILDDATABASE action on a SQL Server 2005 cluster. The log files did not report anything useful for troubleshooting: they only said that one of the passive nodes could not complete the setup. I was ready to uninstall the instance and install from scratch when I came across the undocumented -q parameter. I tried (I had nothing to lose, after all) and it worked.
I hope it can work for you too.