In my last post I showed a query to identify non-unique indexes that should be unique.
You maybe have some other indexes that could be unique based on the data they contain, but are not.
To find out, you just need to query each of those indexes and group by the whole key, filtering out those that have duplicate values. It may look like an overwhelming amount of work, but the good news is I have a script for that:
DECLARE @sql nvarchar(max); WITH indexes AS ( SELECT QUOTENAME(OBJECT_SCHEMA_NAME(uq.object_id)) AS [schema_name] ,QUOTENAME(OBJECT_NAME(uq.object_id)) AS table_name ,uq.name AS index_name ,cols.name AS cols FROM sys.indexes AS uq CROSS APPLY ( SELECT STUFF(( SELECT ',' + QUOTENAME(sc.name) AS [text()] FROM sys.index_columns AS uc INNER JOIN sys.columns AS sc ON uc.column_id = sc.column_id AND uc.object_id = sc.object_id WHERE uc.object_id = uq.object_id AND uc.index_id = uq.index_id AND uc.is_included_column = 0 FOR XML PATH('') ),1,1,SPACE(0)) ) AS cols (name) WHERE is_unique = 0 AND has_filter = 0 AND is_hypothetical = 0 AND type IN (1,2) AND object_id IN ( SELECT object_id FROM sys.objects WHERE is_ms_shipped = 0 AND type = 'U' ) ) -- Build a big statement to query index data SELECT @sql = ( SELECT 'SELECT ''' + [schema_name] + ''' AS [schema_name], ''' + table_name + ''' AS table_name, ''' + index_name + ''' AS index_name, can_be_unique = CASE WHEN ( SELECT COUNT(*) FROM ( SELECT ' + cols + ',COUNT(*) AS cnt FROM ' + [schema_name] + '.' + [table_name] + ' GROUP BY ' + cols + ' HAVING COUNT(*) > 1 ) AS data ) > 0 THEN 0 ELSE 1 END;' FROM indexes FOR XML PATH(''), TYPE ).value('.','nvarchar(max)'); -- prepare a table to receive results DECLARE @results TABLE ( [schema_name] sysname, [table_name] sysname, [index_name] sysname, [can_be_unique] bit ) -- execute the script and pipe the results INSERT @results EXEC(@sql) -- show candidate unique indexes SELECT * FROM @results WHERE can_be_unique = 1 ORDER BY [schema_name], [table_name], [index_name]
The script should complete quite quickly, since you have convenient indexes in place. However, I suggest that you run it against a non production copy of your database, as it will scan all non unique indexes found in the database.
The results will include all the indexes that don’t contain duplicate data. Whether you should make those indexes UNIQUE, only you can tell.
Some indexes may contain unique data unintentionally, but could definitely store duplicate data in the future. If you know your data domain, you will be able to spot the difference.
Defining the appropriate primary key and unique constraints is fundamental for a good database design.
One thing that I often see overlooked is that all the indexes with a key that includes completely another UNIQUE index’s key should in turn be created as UNIQUE. You could argue that such an index has probably been created by mistake, but it’s not always the case.
If you want to check your database for indexes that can be safely made UNIQUE, you can use the following script:
SELECT OBJECT_SCHEMA_NAME(uq.object_id) AS [schema_name], OBJECT_NAME(uq.object_id) AS table_name, uq.name AS unique_index_name, nui.name AS non_unique_index_name FROM sys.indexes AS uq CROSS APPLY ( SELECT name, object_id, index_id FROM sys.indexes AS nui WHERE nui.object_id = uq.object_id AND nui.index_id <> uq.index_id AND nui.is_unique = 0 AND nui.has_filter = 0 AND nui.is_hypothetical = 0 ) AS nui WHERE is_unique = 1 AND has_filter = 0 AND is_hypothetical = 0 AND uq.object_id IN ( SELECT object_id FROM sys.tables ) AND NOT EXISTS ( SELECT column_id FROM sys.index_columns AS uc WHERE uc.object_id = uq.object_id AND uc.index_id = uq.index_id AND uc.is_included_column = 0 EXCEPT SELECT column_id FROM sys.index_columns AS nuic WHERE nuic.object_id = nui.object_id AND nuic.index_id = nui.index_id AND nuic.is_included_column = 0 ) ORDER BY [schema_name], table_name, unique_index_name
You may wonder why you should bother making those indexes UNIQUE.
The answer is that constraints help the optimizer building better execution plans. Marking an index as UNIQUE tells the optimizer that one and just one row can be found for each key value: it’s a valuable information that can actually help estimating the correct cardinality.
Does the script return any rows? Make those indexes UNIQUE, you’ll thank me later.
Today I noticed in my WordPress site stats that somebody landed on this blog from a search engine with the keywords “spaghetti dba”.
I found it hilarious that somebody was really searching for me that way, and I performed the search on Google to see how this blog would rank.
The third result from Google made me chuckle: The Spaghetti-Western Database Trailer. And here is the video showing on that page:
Awesome. I love Spaghetti-Western movies and I love databases, I couldn’t ask for more!
The video is a trailer to the site www.spaghetti-western.net and… Oh! Wait: is that a NoSQL database? #fail