Monitoring blocking and deadlocking with Extended T-SQL Collector


Monitoring blocking and deadlocking hasn’t always been easy. Sometimes, the tools available in SQL Server are not as friendly as we would like them to be.

Some are very useful for alerting, but don’t provide a simple way to capture additional information about the event. For instance, a simple SQLAgent alert for “deadlocks/sec > 0” is enough to shoot an email to the DBAs whenever a deadlock happens, but it doesn’t provide a simple way to capture the deadlock graph and include it in the email or save it to a destination table.
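
Setting up such an alert takes just a couple of calls to msdb; here is a minimal sketch, where the operator name “DBA Team” is hypothetical and, on a named instance, the counter object would be MSSQL$InstanceName:Locks instead of SQLServer:Locks:

-- Fire an alert whenever the "Number of Deadlocks/sec" counter goes above 0
EXEC msdb.dbo.sp_add_alert
    @name = N'Deadlock detected',
    @performance_condition = N'SQLServer:Locks|Number of Deadlocks/sec|_Total|>|0';

-- Notify the DBAs by email (assumes an operator named 'DBA Team' already exists)
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Deadlock detected',
    @operator_name = N'DBA Team',
    @notification_method = 1; -- 1 = email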

Some other methods are very useful for capturing all the information we need, but don’t provide a way to fire an alert when the event is captured. For instance, capturing blocked process reports with an Extended Events session is very easy, but firing an alert when the event is captured is not.

In this post I will show you an easy way to combine monitoring and alerting using the cutting-edge monitoring technology available in SQL Server: Extended Events.

Extended Events are cool: they can capture events not available otherwise and they impact the monitored server much less than other capture technologies (e.g. SQL Trace). However, as far as alerting is concerned, they are still missing a piece of the puzzle: Extended Events don’t provide a built-in way to fire an alert whenever an event is captured. With SQL Trace we had Event Notifications, which did exactly this using a Service Broker queue.
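
For comparison, this is roughly what the Event Notifications approach looks like. It’s a minimal sketch with hypothetical queue and service names, and it only enqueues the events: you still need something (an activation procedure, for instance) to read the queue and send the email.

-- Queue and service that will receive the deadlock notifications
-- (must live in a database with Service Broker enabled)
CREATE QUEUE dbo.DeadlockNotificationQueue;

CREATE SERVICE DeadlockNotificationService
    ON QUEUE dbo.DeadlockNotificationQueue
    ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

-- Server-level event notification for the DEADLOCK_GRAPH trace event
CREATE EVENT NOTIFICATION DeadlockNotification
    ON SERVER
    FOR DEADLOCK_GRAPH
    TO SERVICE 'DeadlockNotificationService', 'current database';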

It would be great if there was a way to capture events with an XE session, store them to a table for later analysis and alert as soon as the event occurs.

The good news is that a tool that can do all of this already exists; the even better news is that it’s free and open source! Its name is Extended T-SQL Collector.

Extended T-SQL Collector allows you to combine Extended Events and Data Collector, with the addition of alerting for interesting events. All you have to do is download your copy of Extended T-SQL Collector, install it on your server and then set up a collection set for blocking and deadlocking.

If you ever had the chance to play with Data Collector and tried to create a custom collection set, you have probably noticed that it’s not as easy as it should be. This part of the Data Collector has no graphical user interface and requires messing with XML. Nobody likes XML. If you do, be aware that you’re in a very small minority.

Creating the collection set with the GUI

With Extended T-SQL Collector you can create a collection set to capture blocked processes and deadlocks using the GUI.
Here’s how you do it:

1. Start the Collectionset Manager and connect it to your instance.

2. If the custom collector types provided by Extended T-SQL Collector are not installed, install them


3. Right click the “Collection Sets” node and select “Add”


4. Fill the information about your Collection Set and hit “Save”


5. Right click the Collection Set “Blocking and Deadlocking” and select “Add”. Now enter the information for the first collection item (Blocked Processes)


6. Click the “Parameters” button and fill the dialog with the information requested. The most important part of this step is providing a definition for the Extended Events session that captures blocked processes. This is the session I am using:

CREATE EVENT SESSION [blocked_processes] ON SERVER ADD EVENT sqlserver.blocked_process_report
WITH (
	MAX_MEMORY = 2048 KB
	,EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
	,MAX_DISPATCH_LATENCY = 30 SECONDS
	,MAX_EVENT_SIZE = 0 KB
	,MEMORY_PARTITION_MODE = NONE
	,TRACK_CAUSALITY = OFF
	,STARTUP_STATE = ON
)

You also have to enter values for the output table (the name of the table that will hold the data in the MDW database), an optional filter to further limit the rows you want to save to the database and the comma-separated list of the event session fields you want to add to the destination table. In this case, the filter is “duration <= 40000000” to capture the “blocked_process” event just once, because a new event fires every time the blocked duration exceeds another multiple of the “blocked process threshold” configuration option.
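
Keep in mind that the blocked_process_report event does not fire at all unless the “blocked process threshold” option is set at the instance level. The script later in this post sets the same option; the 20-second value here is just an example:

-- Enable the blocked process report: an event fires every 20 seconds a process stays blocked
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 20;
RECONFIGURE;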


7. Now click the “Alerting” tab and enter the information needed to fire an email every time an event is captured. Extended T-SQL Collector uses Database Mail, so make sure you have a profile that you can use for this purpose. Now hit OK to close this dialog and hit “Save” in the main window.


8. Now you need a collection item to collect deadlocks. Right click the collection set on the tree on the left and click “Add”. Now fill the information as you did for the collection item for blocked processes and hit “Save”


9. Hit “Parameters” and provide all the information needed for the collection item. Again, you need a session definition:

CREATE EVENT SESSION [deadlocks] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
WITH (
	MAX_MEMORY = 2048 KB
	,EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
	,MAX_DISPATCH_LATENCY = 30 SECONDS
	,MAX_EVENT_SIZE = 0 KB
	,MEMORY_PARTITION_MODE = NONE
	,TRACK_CAUSALITY = OFF
	,STARTUP_STATE = ON
)

Enter the information for the Output Table and the Columns List as well.


10. Click the “Alerting” tab and fill the information for the email setup.


11. Now just hit OK to close the dialog and save in the main window.

12. Hit “Start” in the collection set window to start the collection set.

Creating the collection set with a script

Too complicated? Don’t worry, you can perform the exact same process using a script:

-- Enable editing advanced configuration options
EXEC sp_configure 'advanced', 1
RECONFIGURE
GO

-- Set the blocked process threshold
EXEC sp_configure 'blocked process threshold (s)', 20
RECONFIGURE
GO

BEGIN TRANSACTION
BEGIN TRY

DECLARE @collection_set_id_1 int
DECLARE @collection_set_uid_2 uniqueidentifier
EXEC [msdb].[dbo].[sp_syscollector_create_collection_set]
    @name=N'Blocking and Deadlocking',
    @collection_mode=0,
    @description=N'Collects Blocked Process Reports and Deadlocks using Extended Events',
    @logging_level=1,
    @days_until_expiration=30,
    @schedule_name=N'CollectorSchedule_Every_5min',
    @collection_set_id=@collection_set_id_1 OUTPUT,
    @collection_set_uid= '19AE101D-B30F-4447-8233-1314AEF0A02F'

DECLARE @collector_type_uid_3 uniqueidentifier

SELECT @collector_type_uid_3 = collector_type_uid
FROM [msdb].[dbo].[syscollector_collector_types]
WHERE name = N'Extended XE Reader Collector Type';

DECLARE @collection_item_id_4 int
EXEC [msdb].[dbo].[sp_syscollector_create_collection_item]
    @name=N'Blocked Processes',
    @parameters=N'
<ns:ExtendedXEReaderCollector xmlns:ns="DataCollectorType">
    <Session>
        <Name>blocked_processes</Name>
        <OutputTable>blocked_processes</OutputTable>
        <Definition>
        CREATE EVENT SESSION [blocked_processes] ON SERVER ADD EVENT sqlserver.blocked_process_report
        WITH (
            MAX_MEMORY = 2048 KB
            ,EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
            ,MAX_DISPATCH_LATENCY = 30 SECONDS
            ,MAX_EVENT_SIZE = 0 KB
            ,MEMORY_PARTITION_MODE = NONE
            ,TRACK_CAUSALITY = OFF
            ,STARTUP_STATE = ON
            )
        </Definition>
        <Filter>duration &lt;= 40000000</Filter>
        <ColumnsList>blocked_process</ColumnsList>
    </Session>
    <Alert Enabled="true" WriteToERRORLOG="false" WriteToWindowsLog="false">
        <Sender>MailProfile</Sender>
        <Recipient>dba@localhost.localdomain</Recipient>
        <Subject>Blocked process detected</Subject>
        <Importance>High</Importance>
        <ColumnsList>blocked_process</ColumnsList>
        <Filter>duration &lt;= 40000000</Filter>
        <Mode>Atomic</Mode>
        <Delay>60</Delay>
    </Alert>
</ns:ExtendedXEReaderCollector>',
    @collection_item_id=@collection_item_id_4 OUTPUT,
    @frequency=60,
    @collection_set_id=@collection_set_id_1,
    @collector_type_uid=@collector_type_uid_3

Declare @collection_item_id_6 int
EXEC [msdb].[dbo].[sp_syscollector_create_collection_item] @name=N'Collect deadlocks', @parameters=N'
<ns:ExtendedXEReaderCollector xmlns:ns="DataCollectorType">
    <Session>
        <Name>deadlocks</Name>
        <OutputTable>deadlocks</OutputTable>
        <Definition>
        CREATE EVENT SESSION [deadlocks] ON SERVER
        ADD EVENT sqlserver.xml_deadlock_report
        WITH (
            MAX_MEMORY = 2048 KB
            ,EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
            ,MAX_DISPATCH_LATENCY = 30 SECONDS
            ,MAX_EVENT_SIZE = 0 KB
            ,MEMORY_PARTITION_MODE = NONE
            ,TRACK_CAUSALITY = OFF
            ,STARTUP_STATE = ON
            )
        </Definition>
        <ColumnsList>xml_report</ColumnsList>
    </Session>
    <Alert Enabled="true" WriteToERRORLOG="false" WriteToWindowsLog="false">
        <Sender>MailProfile</Sender>
        <Recipient>dba@localhost.localdomain</Recipient>
        <Subject>Deadlock detected</Subject>
        <Importance>High</Importance>
        <ColumnsList>xml_report</ColumnsList>
        <Mode>Atomic</Mode>
        <Delay>60</Delay>
    </Alert>
</ns:ExtendedXEReaderCollector>',
    @collection_item_id=@collection_item_id_6 OUTPUT,
    @frequency=60,
    @collection_set_id=@collection_set_id_1,
    @collector_type_uid=@collector_type_uid_3

COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
    DECLARE @ErrorMessage NVARCHAR(4000);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;
    DECLARE @ErrorNumber INT;
    DECLARE @ErrorLine INT;
    DECLARE @ErrorProcedure NVARCHAR(200);
    SELECT @ErrorLine = ERROR_LINE(),
           @ErrorSeverity = ERROR_SEVERITY(),
           @ErrorState = ERROR_STATE(),
           @ErrorNumber = ERROR_NUMBER(),
           @ErrorMessage = ERROR_MESSAGE(),
           @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');
    RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, @ErrorLine, @ErrorMessage);

END CATCH;

GO
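
The script creates the collection set but does not start it. The equivalent of hitting “Start” in the GUI is a call to sp_syscollector_start_collection_set; a minimal sketch, assuming the collection set name used above:

-- Start the "Blocking and Deadlocking" collection set
DECLARE @collection_set_id int;

SELECT @collection_set_id = collection_set_id
FROM msdb.dbo.syscollector_collection_sets
WHERE name = N'Blocking and Deadlocking';

EXEC msdb.dbo.sp_syscollector_start_collection_set
    @collection_set_id = @collection_set_id;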

Getting Alerts

Now that your session is running, you can be notified via email whenever blocking or deadlocking occurs.

This is a sample of the email you will receive, containing the columns you selected in the setup.

[screenshot of the alert email]

You can also use the Collectionset Manager UI to query the information captured in the MDW database. You just need to right-click a collection item and select “Query” from the context menu. Select a table from the dropdown and adjust the date/time range with the sliders:

[screenshot of the query window]

Bottom line

Extended T-SQL Collector makes monitoring and alerting easy. It uses Extended Events to capture information from your server and alert you when problematic events arise.

Go download your copy! What are you waiting for?

Another good reason to avoid AUTO_CLOSE


Does anybody need another good reason to avoid setting AUTO_CLOSE on a database? Looks like I found one.

Some days ago, all of a sudden, a database started to throw errors along the lines of “The log for database MyDatabase is not available”. The instance was an old 2008 R2 Express (don’t get me started on why an Express Edition is in production…) with some small databases.

The log was definitely there and the database looked online. Actually, I was able to query the tables, but every attempt to update the contents ended up with the “log unavailable” error.

Then I opened the ERRORLOG and found something really interesting: lots and lots of entries similar to “Starting up database MyDatabase” over and over… Does it ring a bell?

Yes, it’s AUTO_CLOSE

Looks like SQL Server closed the database and failed to open it completely, hence the “log unavailable” errors.

What should be done now to bring the database back to normal behaviour? Simply take the database offline and then bring it back online:

ALTER DATABASE MyDatabase SET OFFLINE;
ALTER DATABASE MyDatabase SET ONLINE;

And while we’re at it, let’s disable AUTO_CLOSE:

ALTER DATABASE MyDatabase SET AUTO_CLOSE OFF;

How can such a situation be prevented? There are many ways to accomplish this, ranging from PBM (Policy Based Management) to scheduled T-SQL health checks (see sp_blitz for instance).
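
If you just want a quick check right now, a query against sys.databases is enough; this sketch also generates the statements to fix the offending databases:

-- Find databases with AUTO_CLOSE enabled and build the statement to fix them
SELECT name,
    N'ALTER DATABASE ' + QUOTENAME(name) + N' SET AUTO_CLOSE OFF;' AS fix_statement
FROM sys.databases
WHERE is_auto_close_on = 1;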

See? Best practices are not for losers!

Installing SQL Server 2014 Language Reference Help from disk


Some weeks ago I had to wipe my machine and reinstall everything from scratch, SQL Server included.

For some reason that I still don’t understand, SQL Server Management Studio installed fine, but I couldn’t install Books Online from the online help repository. Unfortunately, installing offline is not an option with SQL Server 2014, because the installation media doesn’t include the Language Reference documentation.

The issue is well known: Aaron Bertrand blogged about it back in April when SQL Server 2014 came out and he updated his post in August when the documentation was finally completely published. He also blogged about it at SQLSentry.

However, I couldn’t get that method to work: the Help Library Manager kept firing errors as soon as I clicked the “Install from Online” link. The error message was “An exception has occurred. See the event log for details.”

Needless to say, the event log had no interesting information to add.

If you are experiencing the same issue, here is a method to install the language reference from disk without downloading the help content from the Help Library Manager:

1. Open a web browser and point it to the following URL: http://services.mtps.microsoft.com/ServiceAPI/products/dd433097/dn632688/books/dn754848/en-us

2. Download the individual .cab files listed in that page to a location on your disk (e.g. c:\temp\langref\)

3. Create a text file named HelpContentSetup.msha in the same folder as the .cab files and paste in the following HTML:

<html xmlns="http://www.w3.org/1999/xhtml">
<head />
<body class="vendor-book">
    <div class="details">
        <span class="vendor">Microsoft</span>
        <span class="locale">en-us</span>
        <span class="product">SQL Server 2014</span>
        <span class="name">Microsoft SQL Server Language Reference</span>
    </div>
    <div class="package-list">
        <div class="package">
            <span class="name">SQL_Server_2014_Books_Online_B4164_SQL_120_en-us_1</span>
            <span class="deployed">False</span>
            <a class="current-link" href="sql_server_2014_books_online_b4164_sql_120_en-us_1(0b10b277-ad40-ef9d-0d66-22173fb3e568).cab">sql_server_2014_books_online_b4164_sql_120_en-us_1(0b10b277-ad40-ef9d-0d66-22173fb3e568).cab</a>
        </div>
        <div class="package">
            <span class="name">SQL_Server_2014_Microsoft_SQL_Server_Language_Reference_B4246_SQL_120_en-us_1</span>
            <span class="deployed">False</span>
            <a class="current-link" href="sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_1(5c1ad741-d0e3-a4a8-d9c0-057e2ddfa6e1).cab">sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_1(5c1ad741-d0e3-a4a8-d9c0-057e2ddfa6e1).cab</a>
        </div>
        <div class="package">
            <span class="name">SQL_Server_2014_Microsoft_SQL_Server_Language_Reference_B4246_SQL_120_en-us_2</span>
            <span class="deployed">False</span>
            <a class="current-link" href="sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_2(24815f90-9e36-db87-887b-cf20727e5e73).cab">sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_2(24815f90-9e36-db87-887b-cf20727e5e73).cab</a>
        </div>
    </div>
</body>
</html>

4. Open the Help Library Manager and select “Install content from disk”

5. Browse to the .msha you just created and click Next


6. The SQL Server 2014 node will appear. Click the Add link


7. Click the Update button and let the installation start


8. Installation will start and process the cab files


9. Installation finished!


10. To check whether everything is fine, click on the “remove content” link and you should see the documentation.


Done! It was easy after all, wasn’t it?

Database Free Space Monitoring – The right way


Lately I spent some time evaluating some monitoring tools for SQL Server and one thing that struck me very negatively is that none of them (to date) reports database free space correctly.
I was actively evaluating one of those tools when one of my production databases ran out of space without any sort of warning.
I was so upset that I decided to code my own monitoring script.

Some things to take into account:

  • Hard limits set for file growth have to be considered: a drive with lots of free space is useless if the database file cannot grow and take it.
  • If fixed growth is used, there must be enough space on the drive to accommodate the growth amount you set.
  • If percent growth is used, you have to calculate recursively how much your database file will grow before taking up all the space on the drive.
  • Some scripts found in blogs and books don’t account for mount points. Use sys.dm_os_volume_stats to include mount points in your calculation (unless you’re running SQL Server versions prior to 2012).
  • Database free space alone is not enough: NTFS performance starts degrading when the drive free space drops below 20%. Make sure you’re monitoring that as well.
  • 20% of a huge database can be lots of space. You can change that threshold to whatever you find appropriate (for instance, less than 20% AND less than 20 GB).

That said, here is my script. I hope you find it useful.


-- create a temporary table to hold data from sys.master_files
IF OBJECT_ID('tempdb..#masterfiles') IS NOT NULL
    DROP TABLE #masterfiles;

CREATE TABLE #masterfiles (
    database_id int,
    type_desc varchar(10),
    name sysname,
    physical_name varchar(255),
    size_mb int,
    max_size_mb int,
    growth int,
    is_percent_growth bit,
    data_space_id int,
    data_space_name nvarchar(128) NULL,
    drive nvarchar(512),
    mbfree int
);

-- extract file information from sys.master_files
-- and correlate each file to its logical volume
INSERT INTO #masterfiles
SELECT
     mf.database_id
    ,type_desc
    ,name
    ,physical_name
    ,size_mb = size / 128
    ,max_size_mb =
        CASE
            WHEN max_size = 268435456 AND type_desc = 'LOG' THEN -1
            ELSE
                CASE
                    WHEN max_size = -1 THEN -1
                    ELSE max_size / 128
                END
        END
    ,mf.growth
    ,mf.is_percent_growth
    ,mf.data_space_id
    ,NULL
    ,d.volume_mount_point
    ,d.available_bytes / 1024 / 1024
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(database_id, file_id) AS d;

-- add an "emptyspace" column to hold empty space for each file
ALTER TABLE #masterfiles ADD emptyspace_mb int NULL;

-- iterate through all databases to calculate empty space for its files
DECLARE @name sysname;

DECLARE c CURSOR FORWARD_ONLY READ_ONLY STATIC LOCAL
FOR
SELECT name
FROM sys.databases
WHERE state_desc = 'ONLINE'

OPEN c
FETCH NEXT FROM c INTO @name

WHILE @@FETCH_STATUS = 0
BEGIN

    DECLARE @sql nvarchar(max)
    DECLARE @statement nvarchar(max)
    SET @sql = '
        UPDATE mf
        SET emptyspace_mb = size_mb - FILEPROPERTY(name,''SpaceUsed'') / 128,
            data_space_name =
                ISNULL(
                    (SELECT name FROM sys.data_spaces WHERE data_space_id = mf.data_space_id),
                    ''LOG''
                )
        FROM #masterfiles AS mf
        WHERE database_id = DB_ID();
    '
    SET @statement = 'EXEC ' + QUOTENAME(@name) + '.sys.sp_executesql @sql'
    EXEC sp_executesql @statement, N'@sql nvarchar(max)', @sql

    FETCH NEXT FROM c INTO @name
END

CLOSE c
DEALLOCATE c

-- create a scalar function to simulate the growth of the database in the drive's available space
IF OBJECT_ID('tempdb..calculateAvailableSpace') IS NOT NULL
    EXEC tempdb.sys.sp_executesql N'DROP FUNCTION calculateAvailableSpace'

EXEC tempdb.sys.sp_executesql N'
CREATE FUNCTION calculateAvailableSpace(
    @diskFreeSpaceMB float,
    @currentSizeMB float,
    @growth float,
    @is_percent_growth bit
)
RETURNS int
AS
BEGIN
    IF @currentSizeMB = 0
        SET @currentSizeMB = 1
    DECLARE @returnValue int = 0
    IF @is_percent_growth = 0
    BEGIN
        SET @returnValue = (@growth /128) * CAST((@diskFreeSpaceMB / (@growth / 128)) AS int)
    END
    ELSE
    BEGIN
        DECLARE @prevsize AS float = 0
        DECLARE @calcsize AS float = @currentSizeMB
        WHILE @calcsize < @diskFreeSpaceMB
        BEGIN
            SET @prevsize = @calcsize
            SET @calcsize = @calcsize + @calcsize * @growth / 100.0
        END
        SET @returnValue = @prevsize - @currentSizeMB
        IF @returnValue < 0
            SET @returnValue = 0
    END

    RETURN @returnValue
END
'

-- report database filegroups with less than 20% available space
;WITH masterfiles AS (
    SELECT *
        ,available_space =
            CASE mf.max_size_mb
                WHEN -1 THEN tempdb.dbo.calculateAvailableSpace(mbfree, size_mb, growth, is_percent_growth)
                ELSE max_size_mb - size_mb
            END
            + emptyspace_mb
    FROM #masterfiles AS mf
),
spaces AS (
    SELECT
         DB_NAME(database_id) AS database_name
        ,data_space_name
        ,type_desc
        ,SUM(size_mb) AS size_mb
        ,SUM(available_space) AS available_space_mb
        ,SUM(available_space) * 100 /
            CASE SUM(size_mb)
                WHEN 0 THEN 1
                ELSE SUM(size_mb)
            END AS available_space_percent
    FROM masterfiles
    GROUP BY DB_NAME(database_id)
        ,data_space_name
        ,type_desc
)
SELECT *
FROM spaces
WHERE available_space_percent < 20
ORDER BY available_space_percent ASC

IF OBJECT_ID('tempdb..#masterfiles') IS NOT NULL
    DROP TABLE #masterfiles;

IF OBJECT_ID('tempdb..calculateAvailableSpace') IS NOT NULL
    EXEC tempdb.sys.sp_executesql N'DROP FUNCTION calculateAvailableSpace'
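
If you want the script to warn you proactively, you can schedule it in an Agent job and send an email when it returns rows. Here is just a sketch of the idea, assuming a previous job step saved the final result set into a permanent table (dbo.LowSpaceFilegroups is a hypothetical name) and that a Database Mail profile named “MailProfile” exists:

-- Send a notification if any filegroup is below the free space threshold
IF EXISTS (SELECT 1 FROM dbo.LowSpaceFilegroups)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'MailProfile',
        @recipients   = 'dba@localhost.localdomain',
        @subject      = 'Database free space below threshold',
        @body         = 'At least one filegroup is below the free space threshold. Run the monitoring script for details.';
END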

I am sure that there are smarter scripts around that calculate it correctly and I am also sure that there are other ways to obtain the same results (PowerShell, to name one). The important thing is that your script takes every important aspect into account and warns you immediately when the database space drops below your threshold, not when the space has already run out.

Last time it happened to me it was a late Saturday night and, while I really love my job, I can come up with many better ways to spend my Saturday night.

I’m pretty sure you do as well.

Announcing ExtendedTSQLCollector


I haven’t been blogging much lately; actually I haven’t been blogging at all in the last 4 months. The reason is that I have been putting all my efforts into a new project I started recently, which absorbed all my attention and spare time.

I am proud to announce that my project is now live and available to everyone for download.

The project name is ExtendedTSQLCollector and you can find it at http://extendedtsqlcollector.codeplex.com. As you may have already guessed, it’s a bridge between two technologies that were not meant to work together, but that can bring great advantages when combined: Extended Events and Data Collector.

ExtendedTSQLCollector is a set of two Collector Types built to overcome some of the limitations found in the built-in collector types and extend their functionality to include the ability to collect data from XE sessions.

The first Collector Type is the “Extended T-SQL Query” collector type, which was my initial goal when I started the project. If you have had the chance to play with the built-in “Generic T-SQL Query” collector type, you may have noticed that not all datatypes are supported. For instance, it’s impossible to collect data from XML or varchar(max) columns. This is due to the intermediate format used by this collector type: the SSIS raw files.

The “Extended T-SQL Query” collector type uses a different intermediate format, which allows collecting data of any data type. This is particularly useful, because SQL Server exposes lots of information in XML format (just think of the execution plans!) and you no longer need to code custom SSIS packages to collect that data.
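
To give an idea, here is a sketch of a query you could now use as a collection item, something the built-in collector type cannot upload because of the xml column:

-- Top cached plans by total worker time: the query_plan column is xml,
-- which the "Generic T-SQL Query" collector type cannot collect
SELECT TOP (20)
    qs.total_worker_time,
    qs.execution_count,
    st.text       AS query_text,
    qp.query_plan AS query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;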

The second Collector Type is the “Extended XE Reader” collector type, which takes advantage of the Extended Events streaming APIs to collect data from an Extended Events session, without the need to specify additional targets such as .xel files or ring buffers. This means no file system bloat due to .xel rollover files and no memory consumption for additional ring buffers: all the events are read directly from the session and processed in near real-time.

In addition to the filter predicates defined in the XE session, you can add more filter predicates on the data to collect and upload to the MDW and decide which columns (fields and actions) to collect. The collector will take care of creating the target table in your MDW database and upload all the data that satisfies the filter predicates.

The near real-time behavior of this collector type allowed me to include an additional feature to the mix: the ability to fire alerts in response to Extended Events. The current release (1.5) allows firing email alerts when the events are captured, with additional filter predicates and the ability to include event fields and actions in the email body. You can find more information on XE alerts in the documentation.

Here is an example of the email alerts generated by the XEReader collector type for the blocked_process event:

[screenshot of the alert email]

Another part of the project is the CollectionSet Manager, a GUI to install the collector types to the target servers and configure collection sets and collection items. I think that one of the reasons why the Data Collector is very underutilized by DBAs is the lack of a Graphical UI. Besides the features specific to the ExtendedTSQLCollector, such as installing the collector type, this small utility aims at providing the features missing in the SSMS Data Collector UI. This part of the project is still at an early stage, but I am planning to release it in the next few months.

My journey through the ins and outs of the Data Collector allowed me to understand deeply how it works and how to set it up and troubleshoot it. Now I am planning to start a blog series on this topic, from the basics to the advanced features. Stay tuned :-)

I don’t want to go into deep details on the setup and configuration of this small project: I just wanted to ignite your curiosity and make you rush to codeplex to download your copy of ExtendedTSQLCollector.

What are you waiting for?

Hangout #18 with Boris Hristov


Yesterday evening I had the honour and pleasure of recording one of his famous SQL Hangouts with my friend Boris Hristov (b|t).

We discussed some of the new features in SQL Server 2014, in particular the new Cardinality Estimator and the Delayed Durability. Those are definitely interesting innovations and something everybody should be checking out when planning new work on SQL Server 2014. The two features have nothing to do with each other, but we decided to speak about both of them nevertheless.

I really like the hangout format: it’s informal, fun to record and to watch. I hope you’ll have the same fun watching it.

Enjoy!

Uniquifiers: all rows or the duplicate keys only?


Some days ago I was talking with my friend Davide Mauri about the uniquifier that SQL Server adds to clustered indexes when they are not declared as UNIQUE.

We were not completely sure whether this behaviour applied to duplicate keys only or to all keys, even when unique.

The best way to discover the truth is a script to test what happens behind the scenes:

-- =============================================
-- Author:      Gianluca Sartori - @spaghettidba
-- Create date: 2014-03-15
-- Description: Checks whether the UNIQUIFIER column
--              is added to a column only on
--              duplicate clustering keys or all
--              keys, regardless of uniqueness
-- =============================================

USE tempdb
GO

IF OBJECT_ID('sizeOfMyTable') IS NOT NULL
    DROP VIEW sizeOfMyTable;
GO

-- Create a view to query table size information
-- Not very elegant, but saves a lot of typing

CREATE VIEW sizeOfMyTable
AS
SELECT OBJECT_NAME(si.object_id) AS table_name,
    si.name AS index_name,
    SUM(total_pages) AS total_pages,
    SUM(used_pages) AS used_pages,
    SUM(data_pages) AS data_pages
FROM sys.partitions AS p
INNER JOIN sys.allocation_units AS AU
    ON P.hobt_id = AU.container_id
INNER JOIN sys.indexes AS si
    ON si.index_id = p.index_id
    AND si.object_id = p.object_id
WHERE si.object_id = OBJECT_ID('#testUniquifier')
GROUP BY OBJECT_NAME(si.object_id),
    si.name
GO

IF OBJECT_ID('#testUniquifier') IS NOT NULL
    DROP TABLE #testUniquifier;

-- Create a test table

CREATE TABLE #testUniquifier (
    i int NOT NULL
)

-- Results table: will receive table size
-- in different scenarios

DECLARE @results TABLE(
    description varchar(500),
    table_name sysname,
    index_name sysname,
    total_pages int,
    used_pages int,
    data_pages int
);

-- INSERTS 100K UNIQUE VALUES

INSERT INTO #testUniquifier
SELECT TOP(100000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM sys.all_columns AS AC
CROSS JOIN sys.all_columns AS AC1;

-- -----------------------------------------------------------------
-- TEST1: CREATES A UNIQUE CLUSTERED INDEX (NO UNIQUIFIER)
-- -----------------------------------------------------------------

CREATE UNIQUE CLUSTERED INDEX UK_test ON #testUniquifier(i);

INSERT @results
SELECT 'Unique clustered index' AS description, * FROM sizeOfMyTable;

DROP INDEX UK_test ON #testUniquifier

-- -----------------------------------------------------------------
-- TEST2: CREATES A NON-UNIQUE CLUSTERED INDEX
--        NO DUPLICATES ARE PRESENT YET
-- -----------------------------------------------------------------

CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i)

INSERT @results
SELECT 'Non-Unique clustered index, no duplicates' AS description, * FROM sizeOfMyTable

DROP INDEX IX_test ON #testUniquifier

-- -----------------------------------------------------------------
-- TEST3: CREATES A NON-UNIQUE CLUSTERED INDEX
--        10000 DUPLICATE VALUES ARE PRESENT
-- -----------------------------------------------------------------

UPDATE TOP(10000) #testUniquifier SET i = 1

CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i)

INSERT @results
SELECT 'Non-Unique clustered index, some duplicates' AS description, * FROM sizeOfMyTable

DROP INDEX IX_test ON #testUniquifier

-- -----------------------------------------------------------------
-- TEST4: CREATES A NON-UNIQUE CLUSTERED INDEX
--        ALL ROWS CONTAIN THE SAME VALUE (1)
-- -----------------------------------------------------------------

UPDATE #testUniquifier SET i = 1

CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i)

INSERT @results
SELECT 'Non-Unique clustered index, all duplicates' AS description, * FROM sizeOfMyTable

-- -----------------------------------------------------------------
-- Display results
-- -----------------------------------------------------------------

SELECT *
FROM @results;

As you can see, the uniquifier is added only to the keys that are duplicated:

[screenshot of the test results]

Another way to verify this is to look at the output of DBCC PAGE().

Looking at the text output of DBCC PAGE, uniquifiers are displayed as 0 (zero) when the values are not set, but the values are actually missing from the page.

This becomes even clearer when using DBCC PAGE WITH TABLERESULTS:

IF OBJECT_ID('tempdb..#formatteddata') IS NOT NULL	
    DROP TABLE #formatteddata;

SELECT *, ROWNUM = ROW_NUMBER() OVER (ORDER BY page_id, slot_id)
INTO #formatteddata
FROM #testUniquifier
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%);

IF OBJECT_ID('tempdb..#dbccpage') IS NOT NULL
    DROP TABLE #dbccpage;


CREATE TABLE #dbccpage (
    page_id int,
    ParentObject varchar(128),
    Object varchar(128),
    Field varchar(128),
    value varchar(4000),
    Slot AS SUBSTRING(Object, NULLIF(CHARINDEX('Slot ',Object,1),0) + 5, ISNULL(NULLIF(CHARINDEX(' ',Object,6),0),0) - 5)
)

DECLARE @current_page_id int;

DECLARE pages CURSOR STATIC LOCAL FORWARD_ONLY READ_ONLY
FOR
SELECT DISTINCT page_id 
FROM #formatteddata

OPEN pages

FETCH NEXT FROM pages INTO @current_page_id

WHILE @@FETCH_STATUS = 0
BEGIN 
    INSERT INTO #dbccpage (ParentObject, Object, Field, value)
    EXEC sp_executesql N'DBCC PAGE (2, 1, @pageid, 3) WITH TABLERESULTS;', N'@pageid int', @current_page_id

    UPDATE #dbccpage 
    SET page_id = @current_page_id
    WHERE page_id IS NULL

    FETCH NEXT FROM pages INTO @current_page_id
END

CLOSE pages;
DEALLOCATE pages;


WITH PageData AS  (
    SELECT page_id, slot, field, value
    FROM #dbccpage
    WHERE field IN ('i', 'UNIQUIFIER')
),
Uniquifiers AS (
    SELECT *
    FROM PageData
    PIVOT (MAX(value) FOR field IN ([i], [UNIQUIFIER])) AS pvt
),
sourceData AS (
    SELECT *
    FROM #formatteddata
)
SELECT src.ROWNUM, 
    src.i, 
    src.page_id,
    src.slot_id,
    UNIQUIFIER
FROM sourceData AS src
LEFT JOIN Uniquifiers AS unq
    ON src.slot_id = unq.slot
    AND src.page_id = unq.page_id
ORDER BY ROWNUM;

If you run the code in the different situations outlined before (unique clustered index, non-unique clustered index with or without duplicate keys) you will find the uniquifiers associated with each duplicate key and you will also notice that no uniquifier is generated for the keys that are unique.

Verdasys Digital Guardian and SQL Server


I’m writing this post as a reminder for myself and possibly to help out the poor souls that may suffer the same fate as me.

There’s a piece of software out there called “Digital Guardian”, which is a data loss prevention tool. Your computer may be running this software without you knowing: your system administrators may have installed it in order to prevent users from performing operations that don’t comply with corporate policies and may lead to data loss incidents.

For instance, Digital Guardian can prevent users from writing to USB pendrives and walking out of the office with a copy of the data in their pocket. Actually, this is just one of the policies that can be enforced by Digital Guardian: it’s a complete data protection framework that offers many powerful features.

The bad news is that Digital Guardian relies on an agent daemon that runs very deep in the operating system and modifies the OS behaviour based on the policies defined by the system administrators. Most of the time, the user is notified of the tool’s intervention with explicit messages stating that the operation is not permitted by corporate policies.

Sometimes (here comes the painful part) things randomly fail without any meaningful indication that Digital Guardian is responsible for the failure. Instead of getting sensible policy violation messages, you may get generic error messages that won’t be at all easy to troubleshoot. Sometimes the errors are not even due to policy violations, but are caused by the modifications to the OS behaviour introduced by Digital Guardian itself.

For instance, when installing SQL Server, you may be presented with this error message:

[screenshot of the setup error]

Is the error message “No more data is available” at all helpful? Not really.

I spent countless hours trying to understand what went wrong and I finally understood the cause of the failure when a coworker pointed out that Digital Guardian was running on that particular server.

What happened here?

Digital Guardian clumsily tries to hide itself. If you look for it in the installed programs applet in Control Panel you won’t find it. It also tries to hide itself in the registry, so when you enumerate the registry key “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\Digital Guardian Agent” you will get an error.

In one of its early stages, SQL Server’s setup verifies what software is installed on the machine and, when it encounters Digital Guardian’s registry key, it fails miserably.

The only way to get past the error is to disable Digital Guardian.

Are you comfortable with running SQL Server on a machine with such a tool installed?

OK, you managed to install SQL Server by disabling Digital Guardian: now what?

  • What if SQL Server crashes?
  • What if everything turns horribly slow?
  • What if you get data corruption?
  • What if…?

Tools that interact with the OS at such a low level scare the hell out of me. Anything that you install and run on a machine with such a tool becomes completely unreliable in my opinion. SQL Server was not intended to run against a modified OS and it was not tested to run like that.

SQL Server has its own security tools. They may not be perfect, but it’s how the product was intended to work and, frankly, they’re largely sufficient for 99% of the use cases. Probably, enabling TDE is better than preventing everyone from writing to USB drives.
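
For the record, this is roughly what it takes to enable TDE on a database; a minimal sketch with hypothetical names (and remember that TDE requires Enterprise Edition and that backing up the certificate is not optional):

USE master;

-- Master key and certificate that protect the database encryption key
-- (skip the master key if one already exists in master)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<use a strong password here>';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';

-- Back up the certificate and its private key and store them safely:
-- without them the database cannot be restored elsewhere

USE MyDatabase;

CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;

ALTER DATABASE MyDatabase SET ENCRYPTION ON;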

If you think SQL Server security features are not enough for you, go on and activate one of those pesky tools. But let me ask: are you sure that you fall in that 1% ?

Non-unique indexes that COULD be unique


In my last post I showed a query to identify non-unique indexes that should be unique.

You may have some other indexes that could be unique based on the data they contain, but are not.

To find out, you just need to query each of those indexes and group by the whole key, filtering out the ones that contain duplicate values. It may look like an overwhelming amount of work, but the good news is that I have a script for that:

DECLARE @sql nvarchar(max);

WITH indexes AS (
    SELECT
         QUOTENAME(OBJECT_SCHEMA_NAME(uq.object_id)) AS [schema_name]
        ,QUOTENAME(OBJECT_NAME(uq.object_id)) AS table_name
        ,uq.name AS index_name
        ,cols.name AS cols
    FROM sys.indexes AS uq
    CROSS APPLY (
        SELECT STUFF((
            SELECT ',' + QUOTENAME(sc.name) AS [text()]
            FROM sys.index_columns AS uc
            INNER JOIN sys.columns AS sc
                ON  uc.column_id = sc.column_id
                AND uc.object_id = sc.object_id
            WHERE uc.object_id = uq.object_id
                AND uc.index_id = uq.index_id
                AND uc.is_included_column = 0
            FOR XML PATH('')
        ),1,1,SPACE(0))
    ) AS cols (name)
    WHERE is_unique = 0
        AND has_filter = 0
        AND is_hypothetical = 0
        AND type IN (1,2)
        AND object_id IN (
            SELECT object_id
            FROM sys.objects
            WHERE is_ms_shipped = 0
            AND type = 'U'
        )
)
-- Build a big statement to query index data
SELECT @sql = (
    SELECT
        'SELECT ''' + [schema_name] + ''' AS [schema_name],
            ''' + table_name + ''' AS table_name,
            ''' + index_name + ''' AS index_name,
            can_be_unique =
                CASE WHEN (
                    SELECT COUNT(*)
                    FROM (
                        SELECT ' + cols + ',COUNT(*) AS cnt
                        FROM ' + [schema_name] + '.' + [table_name] + '
                        GROUP BY ' + cols + '
                        HAVING COUNT(*) > 1
                    ) AS data
                    ) > 0
                THEN 0
                ELSE 1
                END;'
    FROM indexes
    FOR XML PATH(''), TYPE
).value('.','nvarchar(max)');

-- prepare a table to receive results
DECLARE @results TABLE (
    [schema_name] sysname,
    [table_name] sysname,
    [index_name] sysname,
    [can_be_unique] bit
)

-- execute the script and pipe the results
INSERT @results
EXEC(@sql)

-- show candidate unique indexes
SELECT *
FROM @results
WHERE can_be_unique = 1
ORDER BY [schema_name], [table_name], [index_name]

The script should complete quite quickly, since it can take advantage of the very indexes it is checking. However, I suggest that you run it against a non-production copy of your database, as it will scan all the non-unique indexes found in the database.

The results will include all the indexes that don’t contain duplicate data. Whether you should make those indexes UNIQUE, only you can tell.

Some indexes may contain unique data unintentionally, but could definitely store duplicate data in the future. If you know your data domain, you will be able to spot the difference.
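
If you decide that one of them deserves to be unique, you can recreate it in place with DROP_EXISTING; a minimal sketch with hypothetical names (keep the original key and included columns):

-- Recreate an existing non-unique index as UNIQUE without dropping it first
-- (the index name must match the existing index)
CREATE UNIQUE NONCLUSTERED INDEX IX_MyTable_MyColumn
ON dbo.MyTable (MyColumn)
WITH (DROP_EXISTING = ON);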

Non-unique indexes that should be unique


Defining the appropriate primary key and unique constraints is fundamental for a good database design.


One thing that I often see overlooked is that any index whose key completely includes the key of another UNIQUE index should in turn be created as UNIQUE. You could argue that such an index has probably been created by mistake, but that’s not always the case.

If you want to check your database for indexes that can be safely made UNIQUE, you can use the following script:

SELECT OBJECT_SCHEMA_NAME(uq.object_id) AS [schema_name],
    OBJECT_NAME(uq.object_id) AS table_name,
    uq.name AS unique_index_name,
    nui.name AS non_unique_index_name
FROM sys.indexes AS uq
CROSS APPLY (
    SELECT name, object_id, index_id
    FROM sys.indexes AS nui
    WHERE nui.object_id = uq.object_id
        AND nui.index_id <> uq.index_id
        AND nui.is_unique = 0
        AND nui.has_filter = 0
        AND nui.is_hypothetical = 0
) AS nui
WHERE is_unique = 1
    AND has_filter = 0
    AND is_hypothetical = 0
    AND uq.object_id IN (
        SELECT object_id
        FROM sys.tables
    )
    AND NOT EXISTS (

        SELECT column_id
        FROM sys.index_columns AS uc
        WHERE uc.object_id = uq.object_id
            AND uc.index_id = uq.index_id
            AND uc.is_included_column = 0

        EXCEPT

        SELECT column_id
        FROM sys.index_columns AS nuic
        WHERE nuic.object_id = nui.object_id
            AND nuic.index_id = nui.index_id
            AND nuic.is_included_column = 0

    )
ORDER BY [schema_name], table_name, unique_index_name

You may wonder why you should bother making those indexes UNIQUE.

The answer is that constraints help the optimizer build better execution plans. Marking an index as UNIQUE tells the optimizer that one and just one row can be found for each key value: it’s valuable information that can actually help estimate the correct cardinality.

Does the script return any rows? Make those indexes UNIQUE, you’ll thank me later.
