Category Archives: SQL Server

10 DBA recipes with XESmartTarget

Some time ago, I started a project called XESmartTarget. I find it super useful and you should probably know about it. If you’re not using it, it’s totally my fault: I did not promote it enough, and I apologize for all the pain that it could have saved you.

Now I want to remedy my mistake with a 10-day series of blog posts on XESmartTarget, showing how useful it can be and how you can use it to accomplish your daily DBA tasks with Extended Events.

In this first post of the series, I will introduce XESmartTarget and show how it works and how to configure it. Over the next 10 days, I will publish a series of posts, each showing you how to solve a specific problem with XESmartTarget. Let’s go!

What is XESmartTarget?

XESmartTarget is a small command line utility that can connect to an Extended Events session using the streaming API and perform actions in response to the events captured by the session. The actions range from saving events to a table in a database, writing them to a CSV file, sending alerts and much more.

You can think of XESmartTarget as a processing engine for Extended Events that you can run from the command line, without writing a single line of code.

Where does it run?

XESmartTarget does not need to run on the server: it can run on any Windows machine that can connect to the target SQL Server instance. You can certainly run it on the server, but you don’t need to. XESmartTarget depends on the Microsoft Visual C++ 2013 Redistributable: if you have the client utilities (SSMS) installed on your computer, you’re good to go; otherwise, you can download it from Microsoft. Sorry, it doesn’t run on Linux.

How do I get it?

It’s open-source software: you can download it from GitHub and install it. There is an x64 setup kit and an x86 setup kit: make sure to pick the correct one for your operating system. Your browser may complain that the file is unsafe, even though it is signed with a code signing certificate (sigh…). Don’t worry, go ahead and download it. Windows may also complain when running the .msi, so you will have to bypass SmartScreen as well. By default, the software is installed to c:\Program Files\XESmartTarget.

Why do I need it?

The built-in targets for Extended Events are great, but they don’t cover 100% of the spectrum. Some targets, like writing to a database table, would be extremely useful but are not available. There are multiple reasons for this, but the main one is performance: Extended Events have been designed to be fast and to have a low impact on the server being monitored. Writing to a file or to a memory buffer is a fast operation; writing to a table or applying additional logic can end up slowing down the collection process and the SQL Server instance. However, Microsoft gave us the ability to post-process the events in the .xel files, or to process them in near real-time using the streaming API for Extended Events. XESmartTarget uses the streaming API to receive the events from the server, and the API itself has a built-in protection mechanism that prevents the server from being choked by the client: if the client can’t keep up with the data rate from the server, it gets disconnected.
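
The disconnection mechanism can be pictured with a toy model. This is only an illustration, not how SQL Server implements the streaming API: the sketch assumes a hypothetical fixed-size buffer per client (buffer_size) and drops the client as soon as that buffer would overflow, so the server never has to wait for a slow reader.

```python
from collections import deque


class ToyStreamingSession:
    """Toy model of a server streaming events to one client.

    Hypothetical: the server keeps at most `buffer_size` undelivered
    events; if the client falls further behind, it is disconnected
    instead of slowing down the server.
    """

    def __init__(self, buffer_size):
        self.buffer = deque()
        self.buffer_size = buffer_size
        self.connected = True

    def publish(self, event):
        # The server never waits for the client.
        if not self.connected:
            return
        if len(self.buffer) >= self.buffer_size:
            # The client can't keep up: drop it and free the buffer.
            self.connected = False
            self.buffer.clear()
            return
        self.buffer.append(event)

    def read(self, max_events):
        # The client drains up to `max_events` events per call.
        batch = []
        while self.buffer and len(batch) < max_events:
            batch.append(self.buffer.popleft())
        return batch


def simulate(events_per_tick, reads_per_tick, ticks, buffer_size=100):
    """Return True if the client is still connected at the end."""
    session = ToyStreamingSession(buffer_size)
    for tick in range(ticks):
        for i in range(events_per_tick):
            session.publish((tick, i))
        session.read(reads_per_tick)
    return session.connected


print(simulate(events_per_tick=50, reads_per_tick=60, ticks=100))  # True: fast client
print(simulate(events_per_tick=50, reads_per_tick=40, ticks=100))  # False: slow client
```

The slow client falls behind by 10 events per tick, so its backlog eventually exceeds the buffer and it gets dropped, while the server keeps running at full speed.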

Having an API to process events means that we can write code to perform common actions on the events. I created 8 types of Response classes that receive data from the events and process it, letting you perform actions that you can’t perform with the built-in targets:

CsvAppenderResponse – writes event data to a CSV file
EmailResponse – sends alerts via email based on event data
ExecuteTSQLResponse – runs T-SQL commands for each event captured
TableAppenderResponse – writes event data to a table in a SQL Server database
GroupedTableAppenderResponse – aggregates event data in memory and then merges it with existing data in the target table
ReplayResponse – replays sql_batch_completed and rpc_completed events
GelfTcpResponse – writes events to a Graylog server
TelegrafAppenderResponse – writes to an InfluxDB database using Telegraf
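
To picture what “aggregates in memory and then merges with existing data” means, here is a minimal Python sketch. It is not XESmartTarget’s code: the grouping columns (database_name, client_app_name), the count/duration counters and the dictionary standing in for the target table are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate(events, key_columns):
    """Group events in memory: count them and sum their duration."""
    groups = defaultdict(lambda: {"count": 0, "duration": 0})
    for event in events:
        key = tuple(event[col] for col in key_columns)
        groups[key]["count"] += 1
        groups[key]["duration"] += event["duration"]
    return groups


def merge(target_table, groups):
    """Upsert: add to existing rows, insert the missing ones."""
    for key, totals in groups.items():
        row = target_table.setdefault(key, {"count": 0, "duration": 0})
        row["count"] += totals["count"]
        row["duration"] += totals["duration"]
    return target_table


# Hypothetical existing content of the target table.
target = {("Sales", "App1"): {"count": 10, "duration": 5000}}
batch = [
    {"database_name": "Sales", "client_app_name": "App1", "duration": 100},
    {"database_name": "Sales", "client_app_name": "App1", "duration": 300},
    {"database_name": "HR", "client_app_name": "App2", "duration": 50},
]
merge(target, aggregate(batch, ["database_name", "client_app_name"]))
print(target)
```

The advantage of this design is that the target table receives one row per group for each upload, instead of one row per event.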

Will I have to write code?

If you really, really want to write code, you can: XESmartTarget is also a .dll library that you can incorporate in your own project. That’s what we did with dbatools. The license is super permissive, so go ahead and do it!

However, one of the strengths of XESmartTarget is that it requires absolutely no coding: all you have to do is configure XESmartTarget to do what you want. It is a command line tool and it accepts some parameters:

-F|--File <path to the .JSON configuration file>
        Uses the supplied .json file to configure the source of the events and the list of responses
-N|--NoLogo 
        Hides copyright banner at startup
-Q|--Quiet
        Suppresses output to console
-G|--GlobalVariables <variable1=value1 variableN=valueN>
        Replaces $variableN with valueN in configuration files
-L|--LogFile <path to log file>
        Writes the log to the file specified
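
The -G switch performs a substitution on the configuration file. As a rough illustration of the idea (not the actual XESmartTarget code, and ignoring the quoting rules of the real command line parser), a Python sketch could look like this:

```python
def parse_global_variables(arg):
    """Turn 'name1=value1 name2=value2' into a dict."""
    variables = {}
    for pair in arg.split():
        name, _, value = pair.partition("=")
        variables[name] = value
    return variables


def apply_global_variables(config_text, variables):
    """Replace every $name in the configuration text with its value.

    Longer names are replaced first, so that $server10 is not
    clobbered by the replacement for $server1.
    """
    for name in sorted(variables, key=len, reverse=True):
        config_text = config_text.replace("$" + name, variables[name])
    return config_text


template = '{ "Target": { "ServerName": "$ServerName", "SessionName": "$SessionName" } }'
variables = parse_global_variables("ServerName=SQLPROD01 SessionName=commands")
print(apply_global_variables(template, variables))
# { "Target": { "ServerName": "SQLPROD01", "SessionName": "commands" } }
```

With the real tool, the same kind of template (here a hypothetical c:\temp\template.json) would be used with a command line such as "C:\Program Files\XESmartTarget\XESmartTarget.exe" --File c:\temp\template.json --GlobalVariables ServerName=SQLPROD01 SessionName=commands, which lets you reuse one configuration file for many servers.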

As you can see, you can use a .json file to provide the configuration. Not everyone likes JSON for configuration files, but I find it easy to use and good enough for the purpose. A nice addition to the standard JSON format is the ability to add comments using JavaScript notation.
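
Comments are not part of standard JSON, so generic JSON tools will reject these files. If you want to validate a configuration file with such a tool, one option is to strip the comments first. The following Python helper is a sketch of my own, not part of XESmartTarget: it removes // line comments and /* */ block comments that appear outside string literals, leaving strings (including escaped backslashes such as "(local)\\SQLEXPRESS" and URLs containing //) untouched.

```python
import json


def strip_json_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    result = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            result.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep escaped characters (e.g. \" or \\) as they are.
                result.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            result.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            result.append(ch)
            i += 1
    return "".join(result)


config = r'''
{
    "Target": {
        // server to monitor
        "ServerName": "(local)\\SQLEXPRESS",
        "SessionName": "commands" /* session name */
    }
}
'''
parsed = json.loads(strip_json_comments(config))
print(parsed["Target"]["ServerName"])  # (local)\SQLEXPRESS
```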

A typical .json configuration file looks like this:

{
    "Target": {
        "ServerName": "server to monitor, where the session is running",
        "SessionName": "name of the session",
        "Responses": [
            {
                // Properties for Response1
            },
            {
                // Properties for ResponseN
            }
        ]
    }
}

Each Response subclass has a set of public properties that can be set in the configuration file. You can visit the documentation page for each Response type to discover which properties are available and to see an example .json file.

For instance, TableAppenderResponse has some properties to set the target server/database/table for the events and you can set them like this:

{
    "Target": {
        "ServerName": "(local)\\SQLEXPRESS",
        "SessionName": "commands",
        "Responses": [
            {
                "__type": "TableAppenderResponse",
                "ServerName": "(local)\\SQLEXPRESS",
                "DatabaseName": "DBAStuff",
                "TableName": "queries",
                "AutoCreateTargetTable": true,
                "UploadIntervalSeconds": 10,
                "OutputColumns": [
                    "name", 
                    "collection_time", 
                    "client_app_name", 
                    "server_principal_name", 
                    "database_name",
                    "batch_text",
                    "statement"
                ],
                "Events": [
                    "rpc_completed",
                    "sql_batch_completed"
                ]        
            }
        ]
    }
}
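
To make the meaning of Events, OutputColumns and UploadIntervalSeconds more concrete, here is a simplified Python model of what a table-appending response has to do: ignore events that are not listed in Events, keep only the OutputColumns, buffer the rows and upload them every UploadIntervalSeconds. The class, its method names and the flush callback are hypothetical; they are not the TableAppenderResponse internals, which write to the table configured with DatabaseName and TableName.

```python
class BufferedTableAppender:
    """Simplified model of a response that appends events to a table.

    Hypothetical: `flush` stands in for the insert into the target
    table; `now` is the current time in seconds.
    """

    def __init__(self, events, output_columns, upload_interval_seconds, flush):
        self.events = set(events)
        self.output_columns = list(output_columns)
        self.interval = upload_interval_seconds
        self.flush = flush
        self.buffer = []
        self.last_upload = None

    def process(self, event, now):
        if self.last_upload is None:
            self.last_upload = now
        # Events not listed in "Events" are ignored.
        if event["name"] in self.events:
            # Keep only "OutputColumns"; missing fields become None (NULL).
            self.buffer.append({col: event.get(col) for col in self.output_columns})
        # Upload the buffered rows once the interval has elapsed.
        if now - self.last_upload >= self.interval:
            if self.buffer:
                self.flush(self.buffer)
            self.buffer = []
            self.last_upload = now


uploaded = []
appender = BufferedTableAppender(
    events=["rpc_completed", "sql_batch_completed"],
    output_columns=["name", "database_name", "batch_text"],
    upload_interval_seconds=10,
    flush=uploaded.append,
)
appender.process({"name": "sql_batch_completed", "database_name": "DBAStuff", "batch_text": "SELECT 1"}, now=0)
appender.process({"name": "login", "database_name": "DBAStuff"}, now=5)
appender.process({"name": "rpc_completed", "database_name": "DBAStuff", "statement": "exec dbo.p"}, now=12)
print(uploaded)  # one upload containing two rows; the login event was ignored
```

Buffering rows and uploading them in batches keeps the number of round trips to the target database low, at the cost of a small delay before the rows appear in the table.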

Once your .json configuration file is ready and your Extended Events session is running, you can start XESmartTarget. It’s a command line tool, so it won’t show any GUI, but it will print messages to the console or to the log file to show that it’s doing some work.

As an example, you can save the above as c:\temp\capture_commands.json and run it with this command line:

"C:\Program Files\XESmartTarget\XESmartTarget.exe" --File c:\temp\capture_commands.json

You will see something similar to this:

If you look in your database, you will see some rows in the target table:

If you want to stop XESmartTarget, you can press CTRL+C.

What else can it do?

The sky is the limit. In the next posts of this series, I will demonstrate how to accomplish typical DBA tasks using XESmartTarget and you will learn how to use the appropriate Response type for every need. You will also see how to unleash the most advanced features of the configuration files, to filter events, group and summarize data, use fields and actions as parameters and more.

Keep an eye on the xesmarttarget tag on this blog!

Where are my recipes?

There you go:

Posted in SQL Server


Tags: XESmartTarget

Replaying Workloads to a different Database

Mar 31

Posted by spaghettidba

One of the features I was asked to implement for WorkloadTools is the ability to replay commands against a database whose name differs from the one recorded in the source workload.

This is something that I had been planning to implement for a while, and it totally makes sense. Usually, you have two identical environments for the workload capture and replay, both with the same databases. However, sometimes it makes sense to use different databases as the source and target for the workload, for reasons such as resource constraints, ease of testing and so on.

WorkloadTools now supports replaying commands to a different database, using the DatabaseMap property of the ReplayConsumer.

DatabaseMap is a dictionary of strings, so in the .json file it can be expressed as a set of key/value pairs, where each key is an original database and its value is the new target database for the commands.

Here is an example:

{
    "__type": "ReplayConsumer",
    "ConnectionInfo": {
        "ServerName": "somedatabase.database.windows.net",
        "DatabaseName": "mario",
        "UserName": "itsame",
        "Password": "itsamario"
    },
    "DatabaseMap": {
        "Mario": "Luigi",
        "Peach": "Bowser"
    }
}

In this case, whenever a command from the database “Mario” is found, it is replayed against the database “Luigi”. Similarly, when the database “Peach” is found, the command gets replayed on “Bowser”.

Please note that DatabaseMap only changes the database context and does not substitute any reference to the original database name in the code. For instance, if you had something like EXEC Mario.sys.sp_executesql N'SELECT 1', this would not be intercepted by DatabaseMap and would remain unchanged in your code.
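
The behaviour described above can be summarized with a small Python sketch. It is not the WorkloadTools source code: the ReplayCommand class and the apply_database_map function are hypothetical, and the sketch assumes that databases missing from the map keep their original name and that keys are matched exactly (case-sensitively). Only the database context changes; the command text passes through untouched, which is why the three-part name in the example above still points to Mario.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReplayCommand:
    database: str  # database context recorded in the source workload
    text: str      # T-SQL text of the command


def apply_database_map(command, database_map):
    """Return the command with its database context remapped.

    Databases not found in the map are left unchanged.
    The command text is never modified.
    """
    target = database_map.get(command.database, command.database)
    return replace(command, database=target)


database_map = {"Mario": "Luigi", "Peach": "Bowser"}
cmd = ReplayCommand("Mario", "EXEC Mario.sys.sp_executesql N'SELECT 1'")
print(apply_database_map(cmd, database_map))
# ReplayCommand(database='Luigi', text="EXEC Mario.sys.sp_executesql N'SELECT 1'")
```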

Happy benchmarking with WorkloadTools!

Posted in SQL Server


Tags: Replay, WorkloadTools

Performing a real-time replay with WorkloadTools

Mar 3

Posted by spaghettidba

In a previous blog post, I showed you how to use WorkloadTools to replay a workload in two different scenarios. However, there is a third scenario that is worth exploring: the real-time replay.

Before we jump to how, I’d better spend a few words on why a real-time replay is needed.

The main reason is the complexity involved in capturing and analyzing a workload for extended periods of time. Especially when performing migrations and upgrades, it is crucial to capture the entire business cycle, in order to cover all possible queries issued by the applications. All existing benchmarking tools require you to capture the workload to a file before it can be analyzed and/or replayed, but this becomes increasingly complicated as the length of the business cycle grows.

The first complication has to do with the size of the trace files, which have to be stored somewhere on disk, either local or remote. It is not reasonable to expect to capture a workload on a busy server for, let’s say, two weeks, because the trace files can easily grow to a few hundred GBs in less than one hour.
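
To put that in perspective, take the lower end of that range as an assumed rate of 200 GB of trace data per hour (an illustrative figure, not a measurement) and multiply it over a two-week capture:

```latex
200\,\frac{\text{GB}}{\text{h}} \times 24\,\frac{\text{h}}{\text{day}} \times 14\,\text{days} = 67{,}200\,\text{GB} \approx 67\,\text{TB}
```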

The second complication has to do with the ability of the benchmarking tools to process the trace files: bigger and more numerous files greatly increase the chances of breaking the tools. If you ever captured a big workload to a set of trace files to feed it to ReadTrace, you probably know what I’m talking about, and chances are that you witnessed a crash or two. If you tried it with DReplay, you probably now have an ample collection of exotic and unhelpful error messages.

In this context, being able to process the events as soon as they occur is a plus, so that storing them to a file of any type is not needed. This is exactly what WorkloadTools does with the real-time replay feature.

Performing a real-time replay

All the considerations made for replaying a saved workload also apply to this scenario. First of all, you will need to set up a target environment that contains an up-to-date copy of the production database. Log shipping is a great tool for this: you can restore a full backup from production and then restore all log backups until the two databases are in sync. Immediately after restoring the last log backup with recovery, you can start the capture and replay on the production server.

The .json file for this activity will probably look like this:

{
    "Controller": {

        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "SourceInstance"
            },
            "DatabaseFilter": "YourDatabase"
        },

        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "TargetInstance",
                    "DatabaseName": "YourDatabase"
                }
            },
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

On the target server, you can use SqlWorkload again to capture the performance data produced by the replay, using a .json file similar to the one used when analyzing the replay of a saved workload:

{
    "Controller": {
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "YourDatabase"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    // different schema from SqlWorkload 1
                    "SchemaName": "replay"                 
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

The overall architecture of the real-time replay looks like this:

It is crucial to start both instances of SqlWorkload at the same time, as the time dimension is always measured as the offset from the start of the analysis: starting both instances at the same time ensures that the same queries get executed around the same offset, so that you can compare apples to apples.
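
To see why the start time matters, consider how events are assigned to analysis intervals when time is measured as an offset from the start. The Python sketch below is a simplified model, not the AnalysisConsumer code: it assumes fixed 60-second intervals (the same value used for UploadIntervalSeconds above, chosen here only for illustration) and, for simplicity, the same execution time for the query on both sides.

```python
from datetime import datetime, timedelta


def interval_index(event_time, start_time, interval_seconds=60):
    """Number of the interval an event falls into, counted from the start."""
    offset = (event_time - start_time).total_seconds()
    return int(offset // interval_seconds)


query_time = datetime(2024, 3, 1, 10, 5, 30)

# Both instances started together: the query lands in the same interval.
prod_start = datetime(2024, 3, 1, 10, 0, 0)
replay_start = datetime(2024, 3, 1, 10, 0, 0)
print(interval_index(query_time, prod_start), interval_index(query_time, replay_start))  # 5 5

# Replay side started 45 seconds late: the same query shifts by one interval.
late_replay_start = prod_start + timedelta(seconds=45)
print(interval_index(query_time, prod_start), interval_index(query_time, late_replay_start))  # 5 4
```

With misaligned starts, every interval compares slightly different sets of queries, which is exactly the apples-to-oranges comparison you want to avoid.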

It is also extremely important to make sure that the target environment can keep up with the workload being replayed; otherwise, the number of queries found in the same interval will never match between the two environments and the two workloads will diverge more and more. You can observe the data in WorkloadViewer while it gets written by the two analysis consumers and compare the number of batches per second, to make sure that the target environment does not get overwhelmed by the workload. To refresh the data in WorkloadViewer, simply press F5.
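
If you export the per-interval batch counts from the two analysis schemas, a check like the following can highlight the intervals where the replay falls behind. This is a hypothetical Python helper: the input lists, the 60-second interval and the 10% tolerance are assumptions, and nothing like it is built into WorkloadViewer.

```python
def batches_per_second(batch_counts, interval_seconds=60):
    """Convert per-interval batch counts to batches per second."""
    return [count / interval_seconds for count in batch_counts]


def lagging_intervals(baseline_counts, replay_counts, tolerance=0.10, interval_seconds=60):
    """Indexes of intervals where the replay is more than `tolerance` behind."""
    baseline = batches_per_second(baseline_counts, interval_seconds)
    replay = batches_per_second(replay_counts, interval_seconds)
    return [
        i
        for i, (b, r) in enumerate(zip(baseline, replay))
        if b > 0 and (b - r) / b > tolerance
    ]


baseline = [6000, 6600, 7200, 7800]  # batches per interval in "baseline"
replay = [5950, 6500, 6000, 5400]    # batches per interval in "replay"
print(lagging_intervals(baseline, replay))  # [2, 3]
```

A growing gap in consecutive intervals, as in the last two here, is the sign that the target environment is being overwhelmed rather than just experiencing a momentary hiccup.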

The analysis and comparison of a real-time replay is no different from that of a deferred replay: you can use the same tools and apply the same considerations to both situations.

The interesting part of a real-time replay is the ability to perform the replay for extended periods of time, without storing the workload data in any intermediate format and without having to analyze the workload data as a whole before you can proceed with the replay. The possibilities this approach opens up are really interesting and can go beyond the usual scope of benchmarking tools.

As an example, you could decide to have a staging environment where you want to test the performance impact of new implementations directly against a production workload, gaining immediate insights regarding performance and catching runaway queries before they hit production. The traditional approach to this problem has always been based on test harnesses that simulate the critical parts of the workload, but building and maintaining these tools can be time consuming. With WorkloadTools you can measure the performance impact of your changes without having to build new tools and you can focus on what matters to you the most: your business.

Posted in SQL Server


Tags: Replay, SQLServer, WorkloadTools

Collecting Diagnostic data from multiple SQL Server instances with dbatools

Nov 24

Posted by spaghettidba

Keeping SQL Server instances under control is a crucial part of a DBA’s job. SQL Server offers a wide variety of DMVs that you can query to check the health of the instance and establish a performance baseline.

My favourite DMV queries are the ones crafted and maintained by Glenn Berry: the SQL Server Diagnostic Queries. These queries already pack the right amount of information and can be used to take a snapshot of the instance’s health and performance.

Piping the results of these queries to a set of tables at regular intervals can be a good way to keep an eye on the instance. Automation in SQL Server rhymes with dbatools, so today I will show you how to automate the execution of the diagnostic queries and the storage of the results to a centralized database that you can use as a repository for your whole SQL Server estate.

The script

The script I’m using for this can be found on GitHub and you can download it, modify it and adapt it to your needs.

I won’t include it here: there is really no need, as you can already find it on GitHub. So go, grab it from this address, save it and open it in your favourite code editor.

Done? Excellent! Let’s go through it together.

The script, explained

What I really love about PowerShell is how simple it is to filter, extend and manipulate tabular data using the pipeline, in a way that resonates a lot with the experience of T-SQL developers.

The main part of the script is the one that invokes all the diagnostic queries included in the list $queries. This is done by invoking the cmdlet Invoke-DbaDiagnosticQuery, which takes care of choosing the version of the diagnostic queries that matches the version of the target server and selecting the data. As usual with dbatools, the -SqlInstance parameter accepts a list of servers, so you can pass in the list of all the SQL Servers in your infrastructure.

Invoke-DbaDiagnosticQuery -SqlInstance $SourceServer  -QueryName $queries

Sometimes the queries do not generate any data, so it is important to filter out the empty result sets.

Where-Object { $_.Result -ne $null }

In order to store the data collected at multiple servers and multiple points in time, you need to attach some additional columns to the result sets before writing them to the destination tables. This is a very simple task in PowerShell and it can be accomplished by using the Select-Object cmdlet.

Select-Object accepts a list of columns taken from the input object and can also add calculated columns using hashtables with label/expression pairs. The syntax is not the friendliest possible (in fact, I have to look it up every time I need it), but it gets the job done.

In this case, you need to add a column for the server name, one for the database name (only for database-scoped queries) and one for the snapshot id. I decided to use a timestamp in the yyyyMMdd format as the snapshot id. This is what the code to define the properties looks like:

        $TableName = $_.Name
        $DatabaseName = $_.Database
        $ServerName = $_.SqlInstance

        $snapshotProp = @{
            Label = "snapshot_id"
            Expression = {$SnapshotId}
        }
        $serverProp = @{
            Label = "Server Name"
            Expression = {$ServerName}
        }
        $databaseProp = @{
            Label = "Database Name"
            Expression = {$DatabaseName}
        }

Now that the hashtables that define the additional properties are ready, you need to decide whether the input dataset requires the new properties or not: if a property with the same name is already present you need to skip adding the new property.

Unfortunately, this has to be done in two different ways, because the dataset produced by the diagnostic queries could be returned as a collection of System.Data.DataRow objects or as a collection of PSCustomObject.

        if(-not (($_.Result.PSObject.Properties | Select-Object -Expand Name) -contains "Server Name")) {
            if(($_.Result | Get-Member -MemberType NoteProperty -Name "Server Name" | Measure-Object).Count -eq 0) {
                $expr += ' $serverProp, '
            }
        }
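
If PowerShell calculated properties are new to you, the logic of this part of the script can be summarized with a rough Python equivalent: put snapshot_id, Server Name and (for database-scoped queries) Database Name in front of every row, unless a column with that name is already there. It only illustrates the technique; the function name and the sample rows are made up, and it is not a port of the script.

```python
from datetime import date


def add_context_columns(rows, server_name, database_name, snapshot_day):
    """Prepend snapshot/server/database columns unless already present."""
    extra = {
        "snapshot_id": snapshot_day.strftime("%Y%m%d"),  # yyyyMMdd
        "Server Name": server_name,
    }
    if database_name is not None:  # only for database-scoped queries
        extra["Database Name"] = database_name
    result = []
    for row in rows:
        new_row = {k: v for k, v in extra.items() if k not in row}
        new_row.update(row)  # original columns keep their values (like '*')
        result.append(new_row)
    return result


rows = [{"Server Name": "SQL01", "Version": "15.0"}]
print(add_context_columns(rows, "SQL01", None, date(2024, 11, 24)))
# [{'snapshot_id': '20241124', 'Server Name': 'SQL01', 'Version': '15.0'}]
```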

Now comes the interesting part of the script: the data has to get written to a destination table in a database. Dbatools has a cmdlet for that called Write-DbaDataTable.

Among the abilities of this nifty cmdlet, you can auto create the destination tables based on the data found in the input object, thus making your life much easier. In order to pass all the parameters to this cmdlet, I will use a splat, which improves readability quite a bit.

        $expr += '*'

        $param = @{
            SqlInstance     = $DestinationServer
            Database        = $DestinationDatabase
            Schema          = $DestinationSchema
            AutoCreateTable = $true
            Table           = $TableName
            InputObject     = Invoke-Expression $expr
        }
        Write-DbaDataTable @param

As you can see, you need to pass a destination server name, a database name, a schema name and a table name. As I already mentioned, Write-DbaDataTable will take care of creating the target table.

One thing to note is how the data is passed to the cmdlet: the InputObject is the result of an expression, based on the dynamic select list generated inside the ForEach-Object cmdlet. This is very similar to building a dynamic query in T-SQL.

Conclusion

This script can be downloaded from GitHub and you can schedule it on a centralized management server in order to collect diagnostic data across your entire SQL Server estate.

Dbatools is the ultimate toolset for the DBA: if you’re still using the GUI or overly complicated T-SQL scripts to administer and maintain your SQL Server estate, you’re missing out.

Dbatools is also a great opportunity for me to learn new tricks in PowerShell, which is another great productivity tool that DBAs can’t afford to overlook. What are you waiting for? Go to dbatools.io now and start your journey: you won’t regret it.

Posted in PowerShell, SQL Server


Tags: Dbatools, Diagnostic Queries, PowerShell, SQLServer

Troubleshooting “Could not obtain information about Windows NT group/user”

Aug 16

Posted by spaghettidba

This is one of those typical blog posts that I write for my future self, the guy who keeps fixing the same stuff over and over and forgets what he did the next minute.

If you want to query information about a Windows user or group and its access path in SQL Server, you can use the extended stored procedure “xp_logininfo”. Here’s an example:

EXEC xp_logininfo 'MyDomain\SomeUser','all';

If everything is configured correctly, you will see a list of Windows accounts and the login(s) they are mapped to in SQL Server.

However, in some cases, the command fails with the infamous error message:

Could not obtain information about Windows NT group/user 'MyDomain\SomeUser', error code 0x5

This happens whenever SQL Server tries to query information about the Windows user from Active Directory and receives an error.

Understanding where the error comes from can be tricky, but troubleshooting becomes easier once you understand what happens behind the scenes and what the most likely causes are.

The user does not exist

This is very easy to check: does the user exist in Windows? Did you misspell the name?

You can check this from a cmd window, issuing this command:

net user SomeUser /domain

If you spelled the user correctly, the command will return information about it, like description, password settings, group membership and so on.

If the user name is incorrect and cannot be found in AD, you will get this error message:

The user name cannot be found.

Easy peasy: check your spelling and check your AD.

The service account does not have enough privileges to query AD

As I said, SQL Server needs to query AD to retrieve information about the user: if its service account doesn’t have enough privileges, the query will fail.

The most likely cause for this is a misconfiguration of the service account settings in SQL Server. More specifically, it is very likely that SQL Server is configured to run as a local account that has no access to Active Directory at all. This happens when SQL Server runs as a per-service SID or one of the built-in local accounts (Local Service or Local System).

It is very easy to check what account is being used to run SQL Server: all you need to do is query sys.dm_server_services.

SELECT servicename, service_account 
FROM sys.dm_server_services;

If a local account is returned, go ahead and change the service account to a domain account using SQL Server Configuration Manager.

If you still can’t query AD, maybe something is wrong with the permissions on your AD objects. Try impersonating the SQL Server service account: open a cmd window and issue the net user command.

> net user SomeUser /domain
The request will be processed at a domain controller for domain MyDomain

System error 5 has occurred.
Access is denied

If you get the “Access is denied” error message, you need to go to your AD and grant read permissions on that user/OU to the service account.

The service account does not have enough privileges to impersonate the windows user

This was a bit of a surprise for me. In order to retrieve information about the Windows user, SQL Server first needs to impersonate it and then contacts AD as that user.

In order to impersonate a user, SQL Server needs to run under a service account user that has enough privileges to impersonate another user. This privilege is granted through a local policy.

Open the local security policy MMC (secpol.msc) and expand “Local Policies”, “User Rights Assignment”. Find the policy named “Impersonate a client after authentication” and double click it. You can verify whether the service account for SQL Server is granted this privilege, directly or through one of its groups.

Generally speaking, you don’t have to change this, because by default Windows grants this privilege to the “SERVICE” special identity. Any process running as a service is acting as the SERVICE special identity, including SQL Server. If you don’t find it listed here, add it back.

Windows permissions can get tricky at times. I hope that this post helps you (and me!) tame the beast.

Posted in SQL Server


Workload replay with WorkloadTools

Jun 20

Posted by spaghettidba

In my last post, I described how to capture a workload to a file, in order to run a replay against your target environment at a later time. Well, that later time has come and you’re ready to roll.

Of course, WorkloadTools has got you covered.

Before I show you how SqlWorkload can run the replay, reading all data from the workload file, I need to spend some time describing how to set up your target environment. It may look superfluous, but getting this part right is the key to a successful benchmarking activity and ensures that you are comparing apples with apples.

Choosing a methodology

First of all, you need to decide what you want to discover and make sure you fully understand how performing the replay will help you in your investigation. There are two main methodologies:

Capture in production, analyze the workload, replay in test, analyze and compare the results
Capture in production, replay and analyze in test to establish a baseline, change something and replay again in test to obtain a second benchmark, then compare the results

The first method is useful when you are interested in comparing two different scenarios that cannot be easily reproduced in a test environment. As an example of this situation, imagine a production server that sits on SAN storage with no more space available to create a test environment. Management wants to buy a new SAN and obtains a box to conduct a POC. In this case, you can set up a test environment on the new SAN and compare the benchmarks on the two different storage systems.

This way of benchmarking is not always ideal, because it tries to compare a workload captured in production with a workload captured as the replay of the production one. The two are not the same: they depend on the filters applied while capturing in production and can be affected by the conditions under which the replay is being performed. For this reason, this methodology should be used only when it is possible to accept the approximation due to resource constraints.

The second method is more convoluted, but it is often able to deliver more accurate results. With this method, both benchmarks are obtained by measuring the replay of the original workload in a controlled test environment, so that the way the replay itself is performed does not affect the comparison.

This second method is easier to use in situations when the test environment can be reused to obtain the two scenarios to measure. Imagine that you want to observe the effect of changing compatibility level or some other database level options: in this case you would need to replay the original workload, change compatibility level, run a second replay and compare the performance in the two scenarios.

However, not even this method is perfect and you really need to make sure that you understand what you want to measure. If you are looking for plan regressions due to changing something at the instance, database or object level, you probably don’t care much about the relative performance of the hardware, because it is unlikely to affect query performance more than the plan regression itself.

Setting up the environment

Another thing that has to be taken into account is what data the replay will be performed against. In order to obtain meaningful performance information, the workload should ideally be replayed against the same database, with the data in exactly the same state in both environments.

Working on data in different states can produce misleading results. Imagine that the production workload contains thousands of commands that make changes to a particular order in the database of an e-commerce website: if you tried to replay that workload against a copy of the database taken one week before the order was created, you would not produce the same amount of reads and writes found in the production workload. This means that the two databases have to be synchronized, by performing a point-in-time restore in the test environment up to the moment when the capture of the production workload started.

If you have to replay the workload multiple times, it is recommended to take a database snapshot before you start the replay, so that you can revert to that snapshot before repeating the replay.

Replaying a Workload from production

In this case, the workload that you capture in production will act as the baseline and will be compared to the workload captured in test when performing the replay. WorkloadTools lets you choose when to analyze the source workload: you can do that during the workload capture, you can do that while performing the replay or you can do that at a later moment. In the first case, you just need to add a second consumer to the listener and let it write the performance data to a schema in the analysis database.

{
    "Controller": {

        // This listener connects to the source instance
        // using Extended Events
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "SourceInstance"
            },
            "DatabaseFilter": "YourDatabase"
        },

        "Consumers":
        [
            // This consumer analyzes the workload and saves
            // the analysis to a database, in the schema “baseline”
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            },
            // This consumer writes the workload to a file
            {
                "__type": "WorkloadFileWriterConsumer",
                "OutputFile": "C:\\temp\\SqlWorkload.sqlite"
            }
        ]
    }
}

If you decide to analyze the workload later, you can start a file listener and feed the events to an analysis consumer. This setup can come in handy when the analysis database is not reachable from the machine where the capture is being performed. This is an example of how to perform the analysis using a workload file as the source:

{
    "Controller": {

        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\SqlWorkload.sqlite",
            "SynchronizationMode": "false"
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

Another option is to analyze the source workload while performing the replay. Here is a sample json file for that:

{
    "Controller": {

        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\SqlWorkload.sqlite",
            // in this case you want to simulate the original query rate
            "SynchronizationMode": "true" 
        },

        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "TargetInstance",
                    "DatabaseName": "YourDatabase"
                }
            },
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

The replay workload has to be captured and analyzed as well, but you don’t need to record the queries to a workload file, because you are only after the performance data and you don’t need to replay the queries captured in this environment. All you need in this case is an instance of SqlWorkload with a listener connected to the test environment and a consumer to perform the analysis.

{
    "Controller": {

        // This listener points to the target instance
        // where the replay is being performed
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "DS3"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "replay"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

The analysis data can be saved to the same target database used for the production workload, but it is not a requirement. In case you decide to use the same database, the target schema needs to be different.

Recording multiple benchmarks for the same workload

In this case, the workload captured in production will not be used as the baseline, but the baseline will be obtained by replaying it. This means that you don’t need to analyze the source workload and all you need to do is record it to a file.

Pointing to the target environment, you will need an instance of SqlWorkload with a listener configured to read the workload file and replay the events using a replay consumer.

{
    "Controller": {

        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\SqlWorkload.sqlite",
            // in this case you want to simulate the original query rate
            "SynchronizationMode": "true" 
        },

        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "TargetInstance",
                    "DatabaseName": "YourDatabase"
                }
            }
        ]
    }
}

In the same environment, you will have another instance of SqlWorkload with a listener capturing the events being replayed and an analysis consumer to write the performance data to an analysis database.

{
    "Controller": {

        // This listener points to the target instance
        // where the replay is being performed
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "DS3"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "benchmark01"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

In order to obtain the second benchmark, you will now need to rewind the database to its initial state by performing a restore (using backups or a snapshot) and then you are ready to perform replay and capture once again. The .json files to use are almost identical to the ones that you used to obtain the first benchmark, except that you will need to specify a different schema to save the workload analysis.

{
    "Controller": {

        // This listener points to the target instance
        // where the replay is being performed
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "DS3"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "benchmark02"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

Comparing benchmarks using WorkloadViewer

Regardless of the method that you decided to use, at the end of the replays, you will have two distinct sets of tables containing the workload analysis data, sitting in different schemas in the same database or in completely different databases.

WorkloadViewer will let you visualize performance over time, as we have seen for a single workload analysis, but this time it will be able to show you data from both workloads, so that you can compare them.

The first tab will still contain the charts for total duration, CPU and number of batches per second, this time with two series, one per benchmark.

The grid in the second tab will now show performance data by query for both benchmarks, so that you can easily spot regressions by sorting on the difference.

The third tab will show you the details for a single query, broken down by application, host name, user name and database name. It will also contain a chart showing the behavior of the query over time.
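WorkloadViewer does this comparison for you. If you ever want to reproduce the "sort by difference" logic outside the viewer, for example in a script that flags regressions automatically, the idea fits in a few lines. The Python sketch below is a hypothetical helper, not part of WorkloadTools. It assumes you have already extracted one metric per query from each analysis schema (say, average duration in milliseconds) into dictionaries keyed by query hash. It makes no assumption about the actual table layout.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, float]  # (query, baseline, benchmark, difference)


def regressions(baseline: Dict[str, float], benchmark: Dict[str, float]) -> List[Row]:
    """Compare one metric per query between two runs.

    Only queries present in both runs are compared; the result is sorted
    by difference, worst regression (largest increase) first.
    """
    common = baseline.keys() & benchmark.keys()
    rows = [(q, baseline[q], benchmark[q], benchmark[q] - baseline[q]) for q in common]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Queries that appear in only one of the two runs are left out on purpose, because they usually deserve a separate look.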

Conclusions

Even when replaying a workload, WorkloadTools keeps the promise of low complexity and allows you to perform all the activities involved in your benchmarking scenarios.

In the next post I will show you how to leverage the most interesting feature of WorkloadTools: the real-time replay. Stay tuned!

Posted in SQL Server


Tags: Replay, WorkloadTools

Life behind a proxy server: downloading content for SSMS Help Viewer

May 9

Posted by spaghettidba

Life behind a proxy server can be problematic. Not all software out there is properly tested for proxy interaction, and oftentimes the experience for the corporate user is a bit frustrating.

I blogged about this before, regarding the PowerShell Gallery and regarding how to download and install content for the SSMS Help Viewer in SQL Server 2014 and SQL Server 2016.

When I tried to update my post for SQL Server 2017, I got stuck, because my “hack” stopped working with Help Viewer 2.3 and nothing I tried worked. Bummer.

The problem:

Microsoft Help Viewer is unable to download the help content from the Microsoft website, and if you click the error message at the bottom left of the status bar, it shows an error similar to this: “The web server has reported an error for https://services.mtps.microsoft.com/ServiceAPI/catalogs/Dev15/en-US: ProtocolError/ProxyAuthenticationRequired”

How to fix it:

There had to be a better way, so I fiddled with it until I got it to work. Basically, all you have to do is instruct your applications to use a proxy server with default authentication.

Find out which proxy server you are using: sometimes the proxy configuration only contains the URL of the autoconfiguration script (the PAC file), and you don’t know which proxy is actually in use. To display this information, open a cmd prompt and run this:

netsh winhttp show proxy

You should see an output similar to this:
Current WinHTTP proxy settings:

    Proxy Server(s) :  http=proxy.mycompany.lan:8090
    Bypass List     :  (none)
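If you need to do this on more than one machine, you can script the extraction of the address. The Python sketch below is a hypothetical helper and is not part of Windows or Help Viewer. It takes the text printed by netsh winhttp show proxy and returns a value suitable for the proxyaddress attribute. It assumes the output follows the format shown above.

```python
import re
from typing import Optional


def proxy_address_from_netsh(output: str) -> Optional[str]:
    """Build a proxyaddress value from `netsh winhttp show proxy` output.

    Accepts both a single "host:port" entry and per-protocol lists such as
    "http=host:port;https=host:port". Returns None when no proxy is listed
    (netsh then reports "Direct access (no proxy server).").
    """
    match = re.search(r"Proxy Server\(s\)\s*:\s*(\S+)", output)
    if match is None:
        return None
    for entry in match.group(1).split(";"):
        scheme, sep, address = entry.partition("=")
        if not sep:
            # One proxy for every protocol: "host:port"
            return f"http://{entry}"
        if scheme.lower() == "http":
            return f"http://{address}"
    return None
```

On Windows, you could feed it `subprocess.run(["netsh", "winhttp", "show", "proxy"], capture_output=True, text=True).stdout`.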

Add the proxy information to the following text fragment and copy it to the clipboard:

<system.net>
    <settings>
        <ipv6 enabled="true" />
    </settings>
    <defaultProxy enabled="true" useDefaultCredentials="true">
        <proxy bypassonlocal="True" proxyaddress="http://MyProxyServer:MyProxyPort"/>
    </defaultProxy>
</system.net>

Run your favourite text editor as Administrator and open the following files in the Help Viewer installation folder (on my computer it’s "C:\Program Files (x86)\Microsoft Help Viewer\v2.3"):

HlpCtntMgr.exe.config
HlpViewer.exe.config

Add the text fragment to both files, inside the <configuration> tag.
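Editing the two files by hand works fine. If you prefer to script the change, the sketch below uses Python's standard xml.etree.ElementTree module to append the fragment as a child of <configuration>. The function name is hypothetical. The sketch assumes that <configuration> is the root element of the file, as it is in .NET application config files. ElementTree drops XML comments and the XML declaration when it rewrites a document, so keep a backup of the original files.

```python
import xml.etree.ElementTree as ET

# The fragment shown above; put your own proxy address in proxyaddress
PROXY_FRAGMENT = """<system.net>
    <settings>
        <ipv6 enabled="true" />
    </settings>
    <defaultProxy enabled="true" useDefaultCredentials="true">
        <proxy bypassonlocal="True" proxyaddress="http://MyProxyServer:MyProxyPort"/>
    </defaultProxy>
</system.net>"""


def add_proxy_fragment(config_xml: str, fragment: str = PROXY_FRAGMENT) -> str:
    """Return config_xml with the <system.net> fragment added under <configuration>.

    If a <system.net> element already exists, the input is returned unchanged
    so that existing network settings are not duplicated.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    if any(child.tag == "system.net" for child in root):
        return config_xml
    root.append(ET.fromstring(fragment))
    return ET.tostring(root, encoding="unicode")
```

Both files live under Program Files, so the process that writes the result back needs to run elevated, just like the text editor.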

This is enough to let the Help Viewer UI download and display the list of available content from the Microsoft website. Unfortunately, the actual transfer is performed by the BITS service, which also has to be instructed to use the proxy server; otherwise it fails with the following error message: “an error occurred while the bits service was transferring”.

This is done by changing a registry value. The key is the following:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\BITS
The value is UseLmCompat, which has to be set to 0.

You can do this easily by saving the following lines to a text file with the .reg extension and merging it into your registry by double-clicking it.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\BITS]
"UseLmCompat"=dword:00000000

Restart the BITS service (Background Intelligent Transfer Service).

Now you can go ahead and update your help library. Enjoy!

Posted in SQL Server

Generating a Jupyter Notebook for Glenn Berry’s Diagnostic Queries with PowerShell

Mar 20

Posted by spaghettidba

The March release of Azure Data Studio now supports Jupyter Notebooks with SQL kernels. This is a very interesting feature that opens new possibilities, especially for presentations and for troubleshooting scenarios.

For presentations, it is fairly obvious what the use case is: you can prepare notebooks to show in your presentations, with code and results combined in a convenient way. It helps when you have to establish a workflow in your demos that the attendees can repeat at home when they download the demos for your presentation.

For troubleshooting scenarios, the interesting feature is the ability to include results inside a Notebook file: you can create an empty Notebook, send it to your client, have them run the queries and send it back to you with the results populated. For this particular usage scenario, the first thing that came to my mind was running Glenn Berry’s diagnostic queries in a Notebook.

Obviously, I don’t want to create such a Notebook manually by adding all the code cells one by one. Fortunately, PowerShell is my friend and can do the heavy lifting for me.

Unsurprisingly, dbatools comes to the rescue: André Kamman contributed a cmdlet to dbatools that downloads, parses and executes Glenn Berry’s diagnostic queries. The part that can help me is not a public function available to the user, but I can still go to GitHub and download the internal function Invoke-DbaDiagnosticQueryScriptParser for my needs.
The function returns a list of queries that I can use to generate the Jupyter Notebook:

#
# Purpose: take the diagnostic queries from Glenn Berry
# and generate a Jupyter Notebook to run in Azure Data Studio
#
# Example usage:
# .\create-diagnostic-notebook.ps1 -diagnosticScriptPath "C:\Program Files\WindowsPowerShell\Modules\dbatools\0.9.777\bin\diagnosticquery\SQLServerDiagnosticQueries_2019_201901.sql" -notebookOutputPath "diagnostic-notebook.ipynb"
#
[CmdletBinding()]
Param(
    [parameter(Mandatory)]
    [System.IO.FileInfo]$diagnosticScriptPath,
    [System.IO.FileInfo]$notebookOutputPath
)

#
# Function taken from dbatools https://github.com/sqlcollaborative/dbatools/blob/development/internal/functions/Invoke-DbaDiagnosticQueryScriptParser.ps1
# Parses the diagnostic script and breaks it into individual queries,
# with text and description
#
function Invoke-DbaDiagnosticQueryScriptParser {
    [CmdletBinding(DefaultParameterSetName = "Default")]

    Param(
        [parameter(Mandatory)]
        [ValidateScript( {Test-Path $_})]
        [System.IO.FileInfo]$filename,
        [Switch]$ExcludeQueryTextColumn,
        [Switch]$ExcludePlanColumn,
        [Switch]$NoColumnParsing
    )

    $out = "Parsing file {0}" -f $filename
    Write-Verbose -Message $out

    $ParsedScript = @()
    [string]$scriptpart = ""

    $fullscript = Get-Content -Path $filename

    $start = $false
    $querynr = 0
    $DBSpecific = $false

    if ($ExcludeQueryTextColumn) {$QueryTextColumn = ""} else {$QueryTextColumn = ", t.[text] AS [Complete Query Text]"}
    if ($ExcludePlanColumn) {$PlanTextColumn = ""} else {$PlanTextColumn = ", qp.query_plan AS [Query Plan]"}

    foreach ($line in $fullscript) {
        # Skip the script header until the first query section
        if ($start -eq $false) {
            if (($line -match "You have the correct major version of SQL Server for this diagnostic information script") -or ($line.StartsWith("-- Server level queries ***"))) {
                $start = $true
            }
            continue
        }

        # From this point on, queries run in the context of a user database
        if ($line.StartsWith("-- Database specific queries *") -or ($line.StartsWith("-- Switch to user database "))) {
            $DBSpecific = $true
        }

        # Replace the "Excel" comment lines with the query text and query plan columns
        if (!$NoColumnParsing) {
            if (($line -match "-- uncomment out these columns if not copying results to Excel") -or ($line -match "-- comment out this column if copying results to Excel")) {
                $line = $QueryTextColumn + $PlanTextColumn
            }
        }

        # Header comment of each query: -- <description> (Query <n>) (<name>)
        if ($line -match "-{2,}\s{1,}(.*) \(Query (\d*)\) \((\D*)\)") {
            $prev_querydescription = $Matches[1]
            $prev_querynr = $Matches[2]
            $prev_queryname = $Matches[3]

            # A new header closes the previous query: save it
            if ($querynr -gt 0) {
                $properties = @{QueryNr = $querynr; QueryName = $queryname; DBSpecific = $DBSpecific; Description = $queryDescription; Text = $scriptpart}
                $newscript = New-Object -TypeName PSObject -Property $properties
                $ParsedScript += $newscript
                $scriptpart = ""
            }

            $querydescription = $prev_querydescription
            $querynr = $prev_querynr
            $queryname = $prev_queryname
        } else {
            # Accumulate the query text, skipping comments and blank lines
            if (!$line.StartsWith("--") -and ($line.Trim() -ne "") -and ($null -ne $line) -and ($line -ne "\n")) {
                $scriptpart += $line + "`n"
            }
        }
    }

    # Save the last query
    $properties = @{QueryNr = $querynr; QueryName = $queryname; DBSpecific = $DBSpecific; Description = $queryDescription; Text = $scriptpart}
    $newscript = New-Object -TypeName PSObject -Property $properties
    $ParsedScript += $newscript
    $ParsedScript
}


$cells = @()

# One markdown cell (name and description) plus one code cell for each server-level query
Invoke-DbaDiagnosticQueryScriptParser $diagnosticScriptPath |
    Where-Object { -not $_.DBSpecific } |
    ForEach-Object {
        $cells += [pscustomobject]@{cell_type = "markdown"; source = "## $($_.QueryName)`n`n$($_.Description)" }
        $cells += [pscustomobject]@{cell_type = "code"; source = $_.Text }
    }

# Notebook metadata for the SQL kernel; the cells array is appended after "cells":
# (the closing "@ of a here-string must start at column 0)
$preamble = @"
{
    "metadata": {
        "kernelspec": {
            "name": "SQL",
            "display_name": "SQL",
            "language": "sql"
        },
        "language_info": {
            "name": "sql",
            "version": ""
        }
    },
    "nbformat_minor": 2,
    "nbformat": 4,
    "cells":
"@


$preamble | Out-File $notebookOutputPath
$cells | ConvertTo-Json | Out-File -FilePath $notebookOutputPath -Append
# Close the top-level JSON object opened in the preamble
"}" | Out-File -FilePath $notebookOutputPath -Append


create-diagnostic-notebook.ps1


In order to use the script, you need to provide the path to the file that contains the diagnostic queries and the path where the new Jupyter Notebook should be generated. Dbatools includes the latest version of the diagnostic scripts already, so you just need to choose which flavor you want to use. You will find all available scripts in the module directory of dbatools:

$dbatoolsPath = Split-Path -Parent (Get-Module -ListAvailable dbatools | Sort-Object Version -Descending | Select-Object -First 1).Path
$dbatoolsPath
Get-ChildItem "$dbatoolsPath\bin\diagnosticquery" | Select-Object Name

The script above produces this output:

C:\Program Files\WindowsPowerShell\Modules\dbatools\0.9.777

Name
----
SQLServerDiagnosticQueries_2005_201901.sql
SQLServerDiagnosticQueries_2008R2_201901.sql
SQLServerDiagnosticQueries_2008_201901.sql
SQLServerDiagnosticQueries_2012_201901.sql
SQLServerDiagnosticQueries_2014_201901.sql
SQLServerDiagnosticQueries_2016SP2_201901.sql
SQLServerDiagnosticQueries_2016_201901.sql
SQLServerDiagnosticQueries_2017_201901.sql
SQLServerDiagnosticQueries_2019_201901.sql
SQLServerDiagnosticQueries_AzureSQLDatabase_201901.sql

Once you decide which file to use, you can pass it to the script:

.\create-diagnostic-notebook.ps1 `
    -diagnosticScriptPath "C:\Program Files\WindowsPowerShell\Modules\dbatools\0.9.777\bin\diagnosticquery\SQLServerDiagnosticQueries_2019_201901.sql" `
    -notebookOutputPath "diagnostic-notebook.ipynb"

What you obtain is a Jupyter Notebook that you can open in Azure Data Studio:

[Screenshot: diagnostic-notebook.ipynb open in Azure Data Studio]

This is a nice way to combine code and results in a single file that you can review offline later. It also allows you to send the empty notebook to a remote client, ask them to run one or more queries and send back the notebook, including the results, for you to review.
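Before you review a notebook that comes back from a client, or one you just generated, a quick sanity check can confirm that the file is well-formed JSON and contains the expected cells. The Python sketch below is not part of dbatools or Azure Data Studio, and the function name is hypothetical. It assumes the file may be UTF-16, which is what Windows PowerShell's Out-File writes by default, or UTF-8, which newer PowerShell versions write.

```python
import codecs
import json
from collections import Counter


def check_notebook(path: str) -> Counter:
    """Load a .ipynb file and count its cells by type.

    Raises json.JSONDecodeError if the file is not valid JSON, for example
    when there is an extra closing brace at the end.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # Pick the decoding from the byte order mark, falling back to UTF-8
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8-sig")
    notebook = json.loads(text)
    if notebook.get("nbformat") != 4:
        raise ValueError("expected an nbformat 4 notebook")
    return Counter(cell["cell_type"] for cell in notebook["cells"])
```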

Happy Notebooking!

Posted in PowerShell, SQL Server, Uncategorized


Tags: Azure Data Studio, Dbatools, Diagnostic Queries, Jupyter Notebooks, PowerShell

Capturing a Workload with WorkloadTools

Mar 12

Posted by spaghettidba

Last week I showed you how to use WorkloadTools to analyze a workload. As you have seen, using SqlWorkload to extract performance data from your workload is extremely easy and it just takes a few keystrokes in your favorite text editor to craft the perfect .json configuration file.

Today I’m going to show you how to capture a workload and save it to a file. If you’ve ever tried to perform this task with any other traditional benchmarking tool, like RML Utilities or Distributed Replay, your palms are probably sweaty already, but fear not: no complicated traces to set up, no hypertrophic scripts to create extended events captures. WorkloadTools makes it as easy as it can get.

Saving a workload to a file might look superfluous when you think that WorkloadTools has the ability to perform replays in real-time (I’ll discuss this feature in a future post), but there are situations when you want to replay the same exact workload multiple times, maybe changing something in the target database between each benchmark to see precisely what performance looks like under different conditions.

Another scenario where saving the workload to a file comes in handy is when the source and destination servers are on different isolated networks and cannot communicate directly: in this case, you will save the workload to a file, copy it to the network where the target server resides and perform the replay there.

Choosing a file format

Before I show you how to do it, please allow me to explain what type of file we are going to write to and why I decided to use this format. If you’re not interested in this kind of geeky stuff, you can skip to the next section, no offense taken.

As already mentioned in my previous blog post, SqlWorkload uses a Listener object that relies on the underlying technology to capture the execution events that form the workload, so we have specialized Listeners for SqlTrace and for Extended Events. However, the Listener reads the events from the underlying source and then discards the source immediately.

When the workload is captured with an Extended Events session and the events are processed with the streaming API for Extended Events, it is pretty obvious how this works and why there is no .xel file at the end of the capture: the events are never written to a file and they are processed on the fly. In the same way, when the workload is captured with SqlTrace, SqlWorkload makes sure to keep the trace files on the server’s file system for as short a time as possible, so it reads the events from the tail of the active rollover file and forwards them to the Consumers immediately: again, at the end of the capture, there will be no trace file available.

Long story short: using one of the traditional capture formats would require instructing SQL Server to leave the files on the file system, which is not what we want and often something we cannot do.

Moreover, choosing one of the traditional formats (SqlTrace or Extended Events) would be problematic: Extended Events only works with SQLServer 2012 and newer and SqlTrace can only be read and written with an old 32 bit COM API.

What I decided to use as the intermediate format for SqlWorkload is a SqLite database, that contains a handful of tables that persist all the queries captured by the Listener. Yeah, sure: when you’re a database professional it feels very natural to use a database to store information (if all you have is a hammer, everything looks like a nail), but I can assure you that it is far from an unreasonable decision. Using a SqLite database allows you to inspect, modify and filter the workload data with extreme ease, using industry standard tools.

But enough talking about my decisional process: let’s go straight to the meat of this post.

Capturing a Workload

As you probably have guessed, capturing a workload is a matter of preparing a .json file with all the parameters and feeding it to SqlWorkload.

All you need is a Listener that captures the workload using SqlTrace or Extended Events and then a Consumer that takes care of writing to a SqLite file.

The .json file will be similar to this:

{
    "Controller": {

        "Listener":
        {
            // I'm using Extended Events here, but it could
            // be any type of listener
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "(local)"
            },
            "DatabaseFilter": "MyDatabase"
        },

        "Consumers":
        [
            {
                // The File Writer consumer takes care
                // of saving the workload to a file
                "__type": "WorkloadFileWriterConsumer",
                "OutputFile": "C:\\temp\\SqlWorkload.sqlite"
            }
        ]
    }
}

Once the configuration file is ready, you just need to pass its path as a command line argument to SqlWorkload and the Listener will start to forward all the events to the File Writer Consumer. You will see output similar to this:

[Screenshot: SqlWorkload output while capturing the workload]

When you are done capturing the workload, you can stop SqlWorkload by pressing CTRL+C and you will be left with a SqLite file containing the whole workload. You can actually open that file with a SqLite client of your choice and see what it contains. SqLite Browser is one option, but there are countless others on Windows, Linux and Mac.

[Screenshot: the workload file opened in SqLite Browser]
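Because the workload file is a regular SQLite database, you can also inspect it from code. The Python sketch below uses the standard sqlite3 module and reads only the sqlite_master catalog. It therefore assumes nothing about the names or columns of the tables that SqlWorkload creates. The function name is hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def describe_workload_file(path: str) -> Dict[str, int]:
    """Return {table name: row count} for every table in a SQLite file.

    Only the sqlite_master catalog is queried, so nothing is assumed about
    the tables that SqlWorkload actually creates.
    """
    # Open read-only through a URI, so a wrong path fails instead of creating a file
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote as an identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, `describe_workload_file(r"C:\temp\SqlWorkload.sqlite")` returns each table with its row count, which is a quick way to see how much data a capture produced.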

Converting an existing SqlTrace or Extended Events file

You are probably wondering what to do if you already have a set of SqlTrace or Extended Events files that describe your workload perfectly: you probably don’t want to capture the workload again just for the sake of using it with SqlWorkload. Don’t worry: WorkloadTools also contains a command line tool to convert .trc and .xel files to the SqLite format understood by SqlWorkload.

All you have to do is run ConvertWorkload and pass the path to the source file and the path to the output file.

Conclusions

Again, it couldn’t get easier than that: SqlWorkload provides a Consumer to capture your workload to a file for later use, be it a replay or analysis. All you have to do is fill in some information in a .json file and SqlWorkload takes care of the rest for you.

In the next post we will see how to replay a workload using WorkloadTools, and how one of the possible sources for the replay can in fact be one of these SqLite files produced by the File Writer Consumer.

Stay tuned!

Posted in SQL Server


Tags: Performance Tuning, WorkloadTools

Workload analysis with WorkloadTools

Feb 26

Posted by spaghettidba

Last week I introduced WorkloadTools and promised additional posts to describe what it can do in more detail. So, here we go: in this post I will show you how to use WorkloadTools to capture a workload and analyze it, in order to find things like the most resource-intensive queries or discover how the application behaves over time.

Capturing the workload

First of all, you need to capture the workload using SqlWorkload. As stated in the documentation, you need to provide a .json file that contains all the parameters needed to configure the Listener and the Consumers.

Configuring the Listener

You have two types of listeners to choose from in this case, and the choice will be determined mainly by the version of SQLServer that you’re working on: with older versions (up to SQLServer 2008 R2), you will have no choice but to use a SqlTraceWorkloadListener, because Extended Events do not provide the required events (sql_batch_completed in particular).
With newer versions (from SQLServer 2012 onwards), you can use an ExtendedEventsWorkloadListener.

Regardless of the type of Listener that you choose, there is not much information that you need to provide: the name of the server to connect to and the credentials are usually more than enough. The following json fragment shows a sample Listener configuration:

"Listener":
{
    // This line decides which type of Listener to use
    // Put the name of the class of your choice in this param
    "__type": "ExtendedEventsWorkloadListener",
    "ConnectionInfo":
    {
        // Server name and credentials
        "ServerName": "SQLDEMO\\SQL2014",
        "DatabaseName": "master",
        // Omit UserName and Password to use
        // Windows Authentication
        "UserName": "sa",
        "Password": "P4$$w0rd!"
    },
    // If you want to reduce the workload,
    // put some filters here
    "DatabaseFilter": "DS3"
}

Some things to note:

The .json file accepts comments, even though the JSON standard does not allow them. The file gets minified before reading, so all comments get stripped away. Comments are useful for describing what the file does and why it configures things in that particular way.
Each Listener type accepts a different set of parameters. See the documentation for more details. Not all parameters are mandatory (optional parameters and their default value are described in square brackets).
You do not need to configure, start and stop a particular trace or Extended Events session: SqlWorkload takes care of everything for you. If you really wanted to, you could override the default trace/session script (but most of the time you don’t want to do that).
Filtering the source events is a good idea in several situations: it helps break down complex workloads by database, application, login or host.
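The Python sketch below illustrates the comment-stripping step. It removes // and /* */ comments but leaves string literals alone, so values such as "C:\\temp\\SqlWorkload.sqlite" or blob storage URLs that contain // survive. It is an independent illustration of the technique, not the code SqlWorkload actually uses, and the function names are hypothetical.

```python
import json
import re

# Either a JSON string literal (kept as-is) or a // or /* */ comment (dropped)
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that are not inside string literals."""
    return _TOKEN.sub(lambda m: m.group(0) if m.group(0).startswith('"') else "", text)


def load_commented_json(text: str):
    """Parse a JSON document that may contain comments."""
    return json.loads(strip_json_comments(text))
```

Matching string literals in the same pattern is the important design choice: the regular expression scans left to right, so a // that appears inside a quoted value is consumed as part of the string and never mistaken for a comment.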

Regardless of the type of Listener in use, the workload does not need to be completely written to a SqlTrace or Extended Events file before the Listener can use it: the events are read from the source incrementally, with a technique that depends on the Listener type.

For instance, for an ExtendedEventsWorkloadListener, the Listener uses the XE streaming API to read the events without having to save them to a file target first. For a SqlTraceWorkloadListener, the events need to go to a trace file first, but SqlWorkload makes sure to use small rollover files and always reads the tail of the last available file. The same happens with ExtendedEventsWorkloadListener when the FileTargetPath property is specified (this is mandatory for Azure SqlDatabase and must point to a blob storage URL).
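The "read the tail of the last file" idea can be sketched generically. The Python class below is a hypothetical illustration, not SqlWorkload's implementation. It treats files as opaque bytes, whereas real .trc and .xel files are binary and need a proper reader. It assumes that file names sort in creation order. On each poll, it returns only the bytes appended since the previous poll. When a new file appears, it first drains what is left of the old one. If more than one rollover happens between two polls, this simplified version skips the files in between.

```python
import os
from typing import List, Optional


class RolloverTailer:
    """Return, on each poll, only the bytes appended since the previous poll.

    Illustration only: files are opaque bytes and their names are assumed
    to sort in creation order (capture_0001, capture_0002, ...).
    """

    def __init__(self, directory: str, suffix: str) -> None:
        self.directory = directory
        self.suffix = suffix
        self.current: Optional[str] = None  # file being tailed
        self.offset = 0                     # bytes of it already consumed

    def _newest(self) -> Optional[str]:
        names = sorted(n for n in os.listdir(self.directory) if n.endswith(self.suffix))
        return os.path.join(self.directory, names[-1]) if names else None

    @staticmethod
    def _read_from(path: str, offset: int) -> bytes:
        try:
            with open(path, "rb") as f:
                f.seek(offset)
                return f.read()
        except FileNotFoundError:
            return b""  # the old file was already removed after rollover

    def poll(self) -> bytes:
        newest = self._newest()
        if newest is None:
            return b""
        chunks: List[bytes] = []
        if newest != self.current:
            # Rollover: drain what is left of the previous file, then restart at 0
            if self.current is not None:
                chunks.append(self._read_from(self.current, self.offset))
            self.current, self.offset = newest, 0
        data = self._read_from(self.current, self.offset)
        self.offset += len(data)
        chunks.append(data)
        return b"".join(chunks)
```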

Configuring the Consumers

All the events captured by the Listener are forwarded to all the Consumers registered on the Listener itself. You could have multiple Consumers for the same Listener, but in this example one Consumer is sufficient.
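Conceptually, the relationship between a Listener and its Consumers is a simple fan-out: every event goes to every registered Consumer. The Python sketch below only illustrates that idea, and the class is hypothetical. The real SqlWorkload is a .NET application and is free to use queues, threads or batching behind the scenes.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Consumer = Callable[[Event], None]


class FanOutListener:
    """Hypothetical Listener that hands every captured event to every Consumer."""

    def __init__(self) -> None:
        self._consumers: List[Consumer] = []

    def register(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def publish(self, event: Event) -> None:
        # Consumers receive each event in registration order
        for consume in self._consumers:
            consume(event)
```

This is what allows a single capture to feed, for example, both an AnalysisConsumer and a WorkloadFileWriterConsumer at the same time.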

Your goal is to capture the workload on the source server and extract some information about the shape of the workload itself, looking for resource-intensive queries, long-running queries or queries that have a high number of executions. The correct Consumer type for this is the AnalysisConsumer.

This Consumer type can be configured in many aspects, but in its simplest form, it just needs to receive the name of the server, database and schema where the workload analysis tables will be created and populated. The schema name is important, because you may want to compare different benchmarks and the easiest way to do this is to save each benchmark to a different schema.

// Consumers are contained in a list
"Consumers":
[
    {
        // Choose the appropriate type of Consumer
        "__type": "AnalysisConsumer",
        // Fill in connection information
        "ConnectionInfo":
        {
            "ServerName": "SQLDEMO\\SQL2016",
            "DatabaseName": "SqlWorkload01",
            "SchemaName": "capture",
            "UserName": "sa",
            "Password": "P4$$w0rd!"
        },
        // This parameter controls how often the consumer
        // aggregates performance data and writes to the
        // analysis database
        "UploadIntervalSeconds": 60
    }
]

UploadIntervalSeconds is another important parameter, because it controls how often the workload analysis data is written to the destination database. AnalysisConsumer uses this parameter to time the creation of “intervals”: inside each interval, the performance data is aggregated by query, application, database, host name and login name.
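To make the idea of intervals concrete, the Python sketch below assigns each event to a fixed-length interval and sums its metrics per (interval, query hash, application, database, host, login). It is a simplified, hypothetical model of the grouping described above, not AnalysisConsumer's code. The event fields and metric names are assumptions. A real consumer would also track other counters, such as reads and writes, and would flush each interval to the analysis database as it closes rather than keeping everything in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ExecutionEvent:
    timestamp: float   # seconds since the start of the capture
    query_hash: int    # hash of the normalized query text
    application: str
    database: str
    host: str
    login: str
    duration_ms: float
    cpu_ms: float


# (interval number, query hash, application, database, host, login)
GroupKey = Tuple[int, int, str, str, str, str]


def new_totals() -> Dict[str, float]:
    return {"executions": 0, "duration_ms": 0.0, "cpu_ms": 0.0}


def aggregate(events: Iterable[ExecutionEvent],
              interval_seconds: int = 60) -> Dict[GroupKey, Dict[str, float]]:
    """Assign each event to a fixed-length interval and sum its metrics per group."""
    totals: Dict[GroupKey, Dict[str, float]] = defaultdict(new_totals)
    for e in events:
        interval = int(e.timestamp // interval_seconds)
        key = (interval, e.query_hash, e.application, e.database, e.host, e.login)
        totals[key]["executions"] += 1
        totals[key]["duration_ms"] += e.duration_ms
        totals[key]["cpu_ms"] += e.cpu_ms
    return dict(totals)
```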

The queries also get normalized during the analysis process: each constant or literal inside the query text gets replaced by a placeholder, so that queries that differ only by their constant values can be aggregated together. During the normalization process, SqlWorkload also calculates a 64-bit query hash, which can be used to retrieve the normalized query text and a sample of the query text captured before normalization.
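SqlWorkload's normalizer is not shown in this post, so the following is only a rough Python approximation of the two ideas described above. First, it replaces string, binary and numeric literals with placeholders. Then it hashes the normalized text to a 64-bit value. The placeholder names, the regular expressions and the use of truncated SHA-256 as the hash function are all assumptions made for this example. A real T-SQL normalizer also has to deal with comments, IN lists, dates and more.

```python
import hashlib
import re

# Order matters: strings first, so digits inside string literals are not touched
_PATTERNS = [
    (re.compile(r"N?'(?:[^']|'')*'"), "{STR}"),                   # 'text', N'text', 'it''s'
    (re.compile(r"\b0x[0-9A-Fa-f]+\b"), "{BINARY}"),               # 0x1F
    (re.compile(r"(?<![\w@#$])-?\d+(?:\.\d+)?\b"), "{NUMBER}"),    # 42, -5, 3.14 (not col1, @p1)
]


def normalize(sql: str) -> str:
    """Replace literals with placeholders and collapse runs of whitespace."""
    for pattern, placeholder in _PATTERNS:
        sql = pattern.sub(placeholder, sql)
    return " ".join(sql.split())


def query_hash(sql: str) -> int:
    """Signed 64-bit hash of the normalized text (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(normalize(sql).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)
```

The hash is returned as a signed value so that it would fit in a bigint column; that, too, is a choice made for this sketch.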

Please note that, at the time of writing, the destination database is not created automatically and must be created upfront. I am planning to remove this limitation, so keep an eye on this issue.

Putting it all together and starting SqlWorkload

Now that the Listener and the Consumer are configured, you just need to put them together in a single .json file and pass it to SqlWorkload.

Here is the complete configuration file:

{
    "Controller": {

        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "SQLDEMO\\SQL2014",
                "DatabaseName": "master",
                "UserName": "sa",
                "Password": "P4$$w0rd!"
            },
            "DatabaseFilter": "DS3"
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo":
                {
                    "ServerName": "SQLDEMO\\SQL2016",
                    "DatabaseName": "SqlWorkload01",
                    "SchemaName": "capture",
                    "UserName": "sa",
                    "Password": "P4$$w0rd!"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}
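If you record several benchmarks, each one should go to its own schema, so it can be convenient to generate the configuration instead of editing it by hand. The Python sketch below rebuilds the same structure as the file above with the standard json module, using only the keys that appear in it; the function `build_config` and the schema name `benchmark02` are hypothetical. Note that `json.dumps` escapes the backslash in the instance name, so the Python strings contain a single backslash.

```python
import json


def build_config(schema_name,
                 listener_server="SQLDEMO\\SQL2014",
                 analysis_server="SQLDEMO\\SQL2016",
                 user="sa", password="P4$$w0rd!",
                 database_filter="DS3", upload_interval_seconds=60):
    """Return a SqlWorkload configuration dict whose AnalysisConsumer
    writes to the given schema (hypothetical helper)."""
    return {
        "Controller": {
            "Listener": {
                "__type": "ExtendedEventsWorkloadListener",
                "ConnectionInfo": {
                    "ServerName": listener_server,
                    "DatabaseName": "master",
                    "UserName": user,
                    "Password": password,
                },
                "DatabaseFilter": database_filter,
            },
            "Consumers": [
                {
                    "__type": "AnalysisConsumer",
                    "ConnectionInfo": {
                        "ServerName": analysis_server,
                        "DatabaseName": "SqlWorkload01",
                        "SchemaName": schema_name,
                        "UserName": user,
                        "Password": password,
                    },
                    "UploadIntervalSeconds": upload_interval_seconds,
                }
            ],
        }
    }


# One configuration per benchmark, each writing to a different schema
text = json.dumps(build_config("benchmark02"), indent=4)
print(text)
```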

Now that the file is ready, save it (e.g. as analyze.json) and use it to run SqlWorkload:

"%programfiles%\workloadtools\sqlworkload.exe" --File "%cd%\analyze.json"

If everything is OK, SqlWorkload will start printing messages to the standard output of your cmd window:

[Screenshot: SqlWorkload console output]

SqlWorkload also writes to a log file (SqlWorkload.log), which you will find in the working directory. You can control the log level and other logging parameters by editing NLog.config in the installation folder.

Every UploadIntervalSeconds seconds, SqlWorkload creates a new interval, aggregates the performance data and uploads everything to the analysis database. You will see a message similar to this when the upload occurs:

[Screenshot: SqlWorkload console output when an interval is uploaded]

When you think the capture has lasted long enough, stop SqlWorkload by pressing CTRL+C. Do not simply close the command window: if you do, SqlWorkload will not clean up the objects created for the capture, such as Extended Events sessions or SqlTrace captures.

Another option is to specify a value for the optional parameter TimeoutMinutes on the Listener configuration: when the timeout expires, the capture will stop automatically.
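If you prefer not to edit the file by hand, the Python sketch below loads analyze.json, sets TimeoutMinutes on the Listener and writes the file back. It assumes that the file contains plain JSON without // comments (the complete file above has none, but Python's json module would reject the commented fragment shown earlier) and that TimeoutMinutes takes a whole number of minutes; the function name `set_listener_timeout` is hypothetical.

```python
import json
from pathlib import Path


def set_listener_timeout(config_path, minutes):
    """Add or update the optional TimeoutMinutes parameter on the Listener
    (assumed here to be an integer number of minutes)."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["Controller"]["Listener"]["TimeoutMinutes"] = int(minutes)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


# Hypothetical usage: stop the capture automatically after 30 minutes
# set_listener_timeout("analyze.json", 30)
```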

Visualizing performance data with WorkloadViewer

Now that the analysis database contains the performance data, you can use WorkloadViewer to visualize it and draw your conclusions.

WorkloadViewer is a GUI tool that reads performance data from the analysis database and represents it graphically with charts and grids. It accepts a number of command line arguments that let you automate its behavior, but it can also be opened without any arguments: in this case, WorkloadViewer presents a form where you can fill in the missing information.

WorkloadViewer can be used to visualize information about a single benchmark (analysis mode) or to compare two benchmarks (comparison mode). Here you only need to work with a single benchmark, so it is enough to enter the connection info on the left, including the name of the schema that contains the tables. When using Windows Authentication, you can leave UserName and Password blank.

[Screenshot: WorkloadViewer connection parameters form]

WorkloadViewer shows data in three tabs:

Workload
Queries
Query Details

The “Workload” tab shows three charts for total CPU, total duration and batches per second for each interval. This is useful to see how the workload evolves over time and identify any correlation between the three metrics.

[Screenshot: WorkloadViewer, Workload tab]

The charts support panning (click and drag with the right mouse button) and zooming (mouse wheel), as well as zooming to a particular area (click and drag with the middle mouse button). To reset zoom and pan, double-click with the middle mouse button.
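The charts on the Workload tab are per-interval totals. As a sketch of how such series can be derived, and of how you might put a number on a correlation you spot between them, the Python code below assumes hypothetical per-interval rows of (interval_id, executions, cpu_ms, duration_ms), which is not the actual layout of the analysis tables, and computes batches per second as executions divided by the interval length. The Pearson coefficient is computed by hand to keep the example dependency-free.

```python
from collections import defaultdict
from math import sqrt


def interval_series(rows, interval_seconds=60):
    """rows: iterable of (interval_id, executions, cpu_ms, duration_ms).
    Returns {interval_id: (total_cpu_ms, total_duration_ms, batches_per_sec)}
    ordered by interval."""
    totals = defaultdict(lambda: [0, 0, 0])  # executions, cpu, duration
    for interval_id, executions, cpu_ms, duration_ms in rows:
        t = totals[interval_id]
        t[0] += executions
        t[1] += cpu_ms
        t[2] += duration_ms
    return {
        i: (cpu, dur, execs / interval_seconds)
        for i, (execs, cpu, dur) in sorted(totals.items())
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences
    (raises ZeroDivisionError if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rows = [(0, 120, 300, 900), (0, 60, 100, 300), (1, 60, 150, 400), (2, 240, 600, 1500)]
series = interval_series(rows)
cpu = [v[0] for v in series.values()]
batches = [v[2] for v in series.values()]
r = pearson(cpu, batches)
```

A coefficient close to 1 means that CPU rises and falls together with the batch rate; a weak correlation may point to a few expensive queries rather than sheer volume.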

The “Queries” tab contains a grid that displays important metrics about the individual queries: avg/sum CPU, avg/sum Duration, avg/sum reads, total number of executions. These metrics are calculated over the entire workload.

[Screenshot: WorkloadViewer, Queries tab]

You can sort by a particular column by clicking its header: this lets you rank queries by that metric and immediately identify the top queries.
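Since the metrics in the grid cover the entire workload, an average must be weighted by the number of executions rather than taken as the mean of per-interval averages. The Python sketch below assumes hypothetical per-interval rows of (query_hash, executions, cpu_ms, duration_ms, reads), computes totals and weighted averages per query and returns the top N queries by a chosen metric, which mirrors what sorting a column does; the function names are illustrative, not part of WorkloadTools.

```python
from collections import defaultdict

METRICS = ("cpu_ms", "duration_ms", "reads")


def query_stats(rows):
    """rows: iterable of (query_hash, executions, cpu_ms, duration_ms, reads),
    one per query per interval. Returns per-query totals and averages."""
    sums = defaultdict(lambda: {"executions": 0, "cpu_ms": 0,
                                "duration_ms": 0, "reads": 0})
    for query_hash, executions, cpu_ms, duration_ms, reads in rows:
        s = sums[query_hash]
        s["executions"] += executions
        s["cpu_ms"] += cpu_ms
        s["duration_ms"] += duration_ms
        s["reads"] += reads
    stats = {}
    for query_hash, s in sums.items():
        row = dict(s)
        for m in METRICS:
            # Weighted average: total divided by total executions
            row["avg_" + m] = s[m] / s["executions"] if s["executions"] else 0.0
        stats[query_hash] = row
    return stats


def top_queries(stats, metric, n=10):
    """Rank queries by a metric such as 'cpu_ms', 'avg_cpu_ms' or 'executions'."""
    return sorted(stats, key=lambda q: stats[q][metric], reverse=True)[:n]


rows = [
    (1, 10, 200, 400, 1000),    # query 1, first interval
    (1, 90, 900, 1800, 9000),   # query 1, second interval
    (2, 1, 500, 600, 50),       # query 2: rare but expensive
    (3, 1000, 2000, 3000, 100), # query 3: cheap but very frequent
]
stats = query_stats(rows)
```

For query 1 the weighted average CPU is 1100 / 100 = 11 ms, whereas averaging the two interval averages (20 and 10) would give 15 ms. Ranking by total CPU puts query 3 first, while ranking by average CPU puts query 2 first: the metric you sort by changes which queries look worst.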

Clicking a row displays the query text in the bottom panel. Double-clicking a row takes you to the “Query Details” tab.

[Screenshot: WorkloadViewer, Query Details tab]

This tab displays information about the individual query: it contains the query text and the overall performance metrics, broken down by application name, database name, host name and login name.

The chart displays the average CPU usage over time.

WorkloadViewer is still under heavy development and does not yet show all the data available in the analysis database, but I hope to include more information in the future. If you feel adventurous enough, you can point Power BI at the analysis database and build a dashboard tailored to your specific needs.

Conclusions

As you can see, WorkloadTools can be a valuable help when you need to analyze a workload. SqlWorkload allows you to capture all the data that you need with extreme ease, without having to configure anything other than basic connection information.

WorkloadViewer allows you to visualize the data in a convenient way and can give you an immediate overview of your workload, ranking queries by the most important performance metrics.

But this is only one of the many tasks that you can accomplish with WorkloadTools: in the weeks to come I will continue to write on this topic and I will show you how to unleash the full potential of WorkloadTools.

Stay tuned!

Posted in SQL Server


Tags: Open Source, Performance Tuning, SQL Server, WorkloadTools

spaghettidba

Italian recipes for your daily database cooking
