Workload replay with WorkloadTools


In my last post, I described how to capture a workload to a file, in order to run a replay against your target environment at a later time. Well, that later time has come and you’re ready to roll.

Of course, WorkloadTools has got you covered.

Before I show you how SqlWorkload can run the replay, reading all the data from the workload file, I need to spend some time describing how to set up your target environment. It may look superfluous, but getting this part right is the key to a successful benchmarking activity: it allows you to make sure that you are comparing apples with apples.

Choosing a methodology

First of all, you need to decide what you want to discover and make sure you understand exactly how performing the replay will help your investigation. There are two main methodologies:

  1. Capture in production, analyze the workload, replay in test, analyze and compare the results
  2. Capture in production, replay and analyze in test to establish a baseline, change something and replay again in test to obtain a second benchmark, then compare the results

The first method is useful when you are interested in comparing two scenarios that cannot be easily reproduced in a test environment. As an example, imagine a production server that sits on SAN storage with no more space available to create a test environment. Management wants to buy a new SAN and obtains a box to conduct a POC. In this case you can set up a test environment on the new SAN and compare the benchmarks on the two different storage systems.

This way of benchmarking is not always ideal, because it tries to compare a workload captured in production with a workload captured as the replay of the production one. The two are not the same: they depend on the filters applied while capturing in production and can be affected by the conditions under which the replay is performed. For this reason, this methodology should be used only when resource constraints make this approximation acceptable.

The second method is more convoluted, but it is often able to deliver more accurate results. With this method, both benchmarks are obtained by measuring the replay of the original workload in a controlled test environment, so that the way the replay itself is performed does not affect the comparison.

This second method is easier to use in situations where the test environment can be reused to produce the two scenarios to measure. Imagine that you want to observe the effect of changing the compatibility level or some other database-level option: in this case you would replay the original workload, change the compatibility level, run a second replay and compare the performance of the two scenarios.
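
As an example, the change between the two replays could be as small as a single statement. This is just a minimal sketch, assuming you want to test a hypothetical YourDatabase under compatibility level 140:

-- run this between the first and the second replay,
-- then compare the two benchmarks
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 140;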

However, not even this method is perfect and you really need to make sure that you understand what you want to measure. If you are looking for plan regressions due to changing something at the instance, database or object level, you probably don’t care much about the relative performance of the hardware, because it is unlikely to affect query performance more than the plan regression itself.

Setting up the environment

Another thing that has to be taken into account is the data that the replay will be performed against. In order to obtain meaningful performance information, the workload should ideally be replayed against the same database, with the data in exactly the same state in both environments.

Working on data in different states can produce misleading results. Imagine that the production workload contains thousands of commands that modify a particular order in the database of an e-commerce website: if you replayed that workload against a copy of the database taken one week before the order was created, you would not produce the same amount of reads and writes found in the production workload. This means that the two databases have to be synchronized, by performing a point-in-time restore in the test environment up to the moment when the capture of the production workload started.
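
In T-SQL terms, the synchronization is an ordinary restore sequence with STOPAT. Here is a minimal sketch, where the database name, the backup paths and the capture start time are all placeholders to adapt to your environment:

-- restore the last full backup taken before the capture started
RESTORE DATABASE YourDatabase
FROM DISK = 'C:\backups\YourDatabase_full.bak'
WITH NORECOVERY, REPLACE;

-- roll the log forward and stop at the moment the capture started
RESTORE LOG YourDatabase
FROM DISK = 'C:\backups\YourDatabase_log.trn'
WITH STOPAT = '2019-06-20 08:00:00', RECOVERY;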

If you have to replay the workload multiple times, it is recommended to take a database snapshot before you start the replay, so that you can revert to it before repeating the replay.
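
Again, just a sketch: the snapshot name, the logical file name and the path are placeholders.

-- take the snapshot before the first replay;
-- NAME must match the logical name of the data file
CREATE DATABASE YourDatabase_Snapshot
ON ( NAME = YourDatabase_Data, FILENAME = 'C:\temp\YourDatabase_Data.ss' )
AS SNAPSHOT OF YourDatabase;

-- before repeating the replay, rewind the database to its initial state
RESTORE DATABASE YourDatabase
FROM DATABASE_SNAPSHOT = 'YourDatabase_Snapshot';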

Replaying a workload from production

In this case, the workload that you capture in production will act as the baseline and will be compared to the workload captured in test when performing the replay. WorkloadTools lets you choose when to analyze the source workload: during the capture, while performing the replay, or at a later moment. In the first case, you just need to add a second consumer to the listener and let it write the performance data to a schema in the analysis database.

{
    "Controller": {

        // This listener connects to the source instance
        // using Extended Events
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "SourceInstance"
            },
            "DatabaseFilter": "YourDatabase"
        },

        "Consumers":
        [
            // This consumer analyzes the workload and saves
            // the analysis to a database, in the schema "baseline"
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            },
            // This consumer writes the workload to a file
            {
                "__type": "WorkloadFileWriterConsumer",
                "OutputFile": "C:\\temp\\SqlWorkload.sqlite"
            }
        ]
    }
}
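
To run this configuration you launch SqlWorkload passing the .json file on the command line, with something like SqlWorkload.exe --File capture.json (the file name here is just a placeholder).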

If you decide to analyze the workload later, you can start a file listener and feed the events to an analysis consumer. This setup can come in handy when the analysis database is not reachable from the machine where the capture is being performed. This is an example of how to perform the analysis using a workload file as the source:

{
    "Controller": {

        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\SqlWorkload.sqlite",
            "SynchronizationMode": "false"
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

Another option is to analyze the source workload while performing the replay. Here is a sample json file for that:

{
    "Controller": {

        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\SqlWorkload.sqlite",
            // in this case you want to simulate the original query rate
            "SynchronizationMode": "true" 
        },

        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "TargetInstance",
                    "DatabaseName": "YourDatabase"
                }
            },
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "baseline"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

The replay workload has to be captured and analyzed as well, but you don’t need to record the queries to a workload file, because you are only after the performance data and you don’t need to replay the queries captured in this environment. All you need in this case is an instance of SqlWorkload with a listener connected to the test environment and a consumer to perform the analysis.

{
    "Controller": {

        // This listener points to the target instance
        // where the replay is being performed
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "DS3"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "replay"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

The analysis data can be saved to the same analysis database used for the production workload, but it is not a requirement. If you decide to use the same database, the target schema needs to be different.

Recording multiple benchmarks for the same workload

In this case, the workload captured in production will not act as the baseline: the baseline will be obtained by replaying it. This means that you don't need to analyze the source workload and all you need to do is record it to a file, as shown in the configuration below.
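
A minimal capture configuration for this scenario can be obtained from the first example above by dropping the analysis consumer and keeping only the file writer:

{
    "Controller": {

        // Same source listener as in the first example
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "SourceInstance"
            },
            "DatabaseFilter": "YourDatabase"
        },

        "Consumers":
        [
            // No analysis consumer here: the workload
            // is only recorded to a file
            {
                "__type": "WorkloadFileWriterConsumer",
                "OutputFile": "C:\\temp\\SqlWorkload.sqlite"
            }
        ]
    }
}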

In the target environment, you will need an instance of SqlWorkload with a listener configured to read the workload file and a replay consumer to replay the events against the target instance.

{
    "Controller": {

        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\SqlWorkload.sqlite",
            // in this case you want to simulate the original query rate
            "SynchronizationMode": "true" 
        },

        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "TargetInstance",
                    "DatabaseName": "YourDatabase"
                }
            }
        ]
    }
}

In the same environment, you will have another instance of SqlWorkload with a listener capturing the events being replayed and an analysis consumer to write the performance data to an analysis database.

{
    "Controller": {

        // This listener points to the target instance
        // where the replay is being performed
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "DS3"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "benchmark01"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}
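
Note that this means running two separate instances of SqlWorkload at the same time, each with its own .json file: one executes the replay and the other captures and analyzes it. For instance (the file names are placeholders), SqlWorkload.exe --File replay.json in one console and SqlWorkload.exe --File analyze.json in another.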

In order to obtain the second benchmark, you will now need to rewind the database to its initial state by performing a restore (from backups or from the snapshot taken earlier) and then you are ready to perform the replay and capture once again. The .json files to use are almost identical to the ones that you used to obtain the first benchmark, except that you will need to specify a different schema to save the workload analysis.

{
    "Controller": {

        // This listener points to the target instance
        // where the replay is being performed
        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TargetInstance",
                "DatabaseName": "DS3"
            }
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "AnalysisInstance",
                    "DatabaseName": "SqlWorkload",
                    "SchemaName": "benchmark02"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

Comparing benchmarks using WorkloadViewer

Regardless of the method that you decided to use, at the end of the replays, you will have two distinct sets of tables containing the workload analysis data, sitting in different schemas in the same database or in completely different databases.

WorkloadViewer will let you visualize performance over time, as we have seen for a single workload analysis, but this time it will be able to show you data from both workloads, so that you can compare them.

The first tab will still contain the charts for total duration, CPU and number of batches per second, this time with two different series, one for each benchmark.

The grid in the second tab will now show performance data by query for both benchmarks, so that you can easily spot regressions by sorting on the difference.

The third tab will show you the details for a single query, with the detail broken down by application, hostname, username and database name. It will also contain a chart to show you the behavior of the query over time.

Conclusions

Even when replaying a workload, WorkloadTools keeps its promise of low complexity and allows you to perform all the activities involved in your benchmarking scenarios.

In the next post I will show you how to leverage the most interesting feature of WorkloadTools: the real-time replay. Stay tuned!

Comments

  1. Hi, I tried following your example in “Recording multiple benchmarks for the same workload”,
    but the instance of SqlWorkload recording the replay stops with the following error after it has successfully started:

    Info – SqlWorkload.Program : SqlWorkload, Version=1.2.14.0, Culture=neutral, PublicKeyToken=null 1.2.14
    Info – SqlWorkload.Program : Reading configuration from ‘c:\DBA\WorkloadTools\RecordReplay_Kasper_Bechmark01.json’
    Info – WorkloadTools.Listener.ExtendedEvents.ExtendedEventsWorkloadListener : Reading Extended Events session definition from C:\Program Files\WorkloadTools\Listener\ExtendedEvents\sqlworkload.sql
    Info – WorkloadTools.WorkloadController : Listener of type ExtendedEventsWorkloadListener initialized correctly. Waiting for events.
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : Error converting XE data from the stream: Unable to cast object of type ‘System.Int32’ to type ‘System.String’.
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : event type : Error
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : client_app_name : WorkloadTools-ReplayWorker
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : database_name : KASPER_CLONE_ted
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : client_hostname : SQL77
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : server_principal_name : SPK\teda
    Error – WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader : session_id : 278
    Error – WorkloadTools.Listener.ExtendedEvents.ExtendedEventsWorkloadListener : Unable to cast object of type ‘System.Int32’ to type ‘System.String’.
    Error – WorkloadTools.Listener.ExtendedEvents.ExtendedEventsWorkloadListener : at WorkloadTools.Listener.ExtendedEvents.StreamXEventDataReader.ReadEvents() in C:\GitHub\WorkloadTools\WorkloadTools\Listener\ExtendedEvents\StreamXEventDataReader.cs:line 198
    at WorkloadTools.Listener.ExtendedEvents.ExtendedEventsWorkloadListener.ReadEvents() in C:\GitHub\WorkloadTools\WorkloadTools\Listener\ExtendedEvents\ExtendedEventsWorkloadListener.cs:line 251
    Info – WorkloadTools.Listener.ExtendedEvents.ExtendedEventsWorkloadListener : Extended Events session [sqlworkload] stopped successfully.

    What shall I do? I have the same json files as you, just different values.

  2. Hi Gianluca – How did you get on with this?

    I am also getting the same error. I am using your latest version 1.3.1 using ExtendedEventsWorkloadListener to replay and capture the baseline and benchmark to the same DB but different schemas.

    Error – WorkloadTools.Listener.ExtendedEvents.ExtendedEventsWorkloadListener : Unable to cast object of type ‘System.Int32’ to type ‘System.String

  3. Hi John, please try version 1.3.3, freshly released 🙂 Hope this works for you!

  4. I have a 50 MB .trc file that I captured earlier.

    When I use ConvertWorkload to convert it, nothing happens. I can see it reading from disk and consuming some CPU, but there is no output. The log file only contains this:

    Info – ConvertWorkload.Program : ConvertWorkload, Version=1.3.3.0, Culture=neutral, PublicKeyToken=null 1.3.3
    Info – WorkloadTools.Listener.Trace.TraceFileWrapper : SMO Version: Microsoft.SqlServer.ConnectionInfoExtended, Version=14.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91

    • What’s the command line you used? Can you share the input file?

      • Alas, the trace contains sensitive information. I made another trace and the same thing happens: it reads the input data but writes nothing.

        “C:\Program Files\WorkloadTools\ConvertWorkload.exe” -I Z:\tracefiles\prod-tracing\2019-08-12\2019-08-12-production.trc -O Z:\tracefiles\prod-tracing\2019-08-12-production.sqlite -L convert_log.txt

    • The same problem with version 1.3.4: nothing happens when converting a .trc file.

      • You’re right. Microsoft killed this feature when it started shipping trace DLLs with SMO and now the new versions of SMO do not support reading / writing trace files. I’m working on a fix, but it is going to take time, I’m sorry.

  5. Can you advise how to correctly take a saved workload (sqlite file) and replay it against a target environment?

    When I follow the examples on how to set up the json files for a ReplayConsumer and an AnalysisConsumer, the final WorkloadViewer always shows the exact same CPU and duration information.

    Here are my 2 json file settings for recording a baseline and replaying on a target server.

    {
        "Controller": {

            "Listener": {
                "__type": "ExtendedEventsWorkloadListener",
                "ConnectionInfo": {
                    "ServerName": "BaseLineServer",
                    "UserName": "BaseLineAdmin",
                    "Password": "BaseLinePassword"
                },
                "DatabaseFilter": "DATABASENAME"
            },

            "Consumers": [
                // This consumer analyzes the workload and saves
                // the analysis to a database, in the schema "baseline"
                {
                    "__type": "AnalysisConsumer",
                    "ConnectionInfo": {
                        "ServerName": "ANALYSISSERVER",
                        "DatabaseName": "DATABASENAME",
                        "SchemaName": "baseline",
                        "UserName": "AnalysisServerAdmin",
                        "Password": "AnalysisServerPassword"
                    },
                    "UploadIntervalSeconds": 60
                },
                // This consumer writes the workload to a file
                {
                    "__type": "WorkloadFileWriterConsumer",
                    "OutputFile": "C:\\temp\\SqlWorkload.sqlite"
                }
            ]
        }
    }

    Replay json

    {
        "Controller": {

            "Listener":
            {
                "__type": "FileWorkloadListener",
                "Source": "C:\\temp\\SqlWorkload.sqlite",
                "SynchronizationMode": "true"
            },

            "Consumers": [
                {
                    "__type": "ReplayConsumer",
                    "ConnectionInfo":
                    {
                        "ServerName": "TARGETSERVER",
                        "DatabaseName": "DATABASENAME",
                        "UserName": "TargetServerAdmin",
                        "Password": "TargetServerPassword"
                    }
                },
                {
                    "__type": "AnalysisConsumer",
                    "ConnectionInfo": {
                        "ServerName": "ANALYSISSERVER",
                        "DatabaseName": "DATABASENAME",
                        "SchemaName": "replay",
                        "UserName": "AnalysisServerAdmin",
                        "Password": "AnalysisServerPassword"
                    },
                    "UploadIntervalSeconds": 60
                }
            ]
        }
    }

    • The issue is how you set up the replay. You need two json files for the replay: one takes care of the replay itself and one takes care of analyzing the replayed events on the target server. Follow the examples and you should be fine. If you get in trouble, shoot me an email.

      • OK, I think I understand. But for a replay, am I running those 2 json files as command line parameters for 1 instance of SqlWorkload, like the following?

        --File appsettingsReplayOne.json appsettingsReplayTwo.json

        Or am I trying to run 2 different instances of SqlWorkload at the same time, each with its own --File command line json parameter?

      • You need to run two separate instances of SqlWorkload, each with its own json file.

      • Great! Thank you. I have it working now.

  6. When I run the replay JSON file, how am I going to determine that all events have been applied on the target server? It shows them starting but never says complete. I have attached my JSON file for your reference.
    {
        "Controller": {

            "Listener":
            {
                "__type": "FileWorkloadListener",
                "Source": "C:\\temp\\SqlWorkload.sqlite",
                "SynchronizationMode": "true"
            },

            "Consumers":
            [
                {
                    "__type": "ReplayConsumer",
                    "ConnectionInfo":
                    {
                        "ServerName": "targetserver",
                        "DatabaseName": "targetDB",
                        "UserName": "user",
                        "Password": "password"
                    }
                }
            ]
        }
    }

    CMD output:

    C:\Program Files\WorkloadTools> "SqlWorkload.exe" --File "Replay.json"
    Info – SqlWorkload.Program : SqlWorkload, Version=1.5.14.0, Culture=neutral, PublicKeyToken=null 1.5.14
    Info – SqlWorkload.Program : Reading configuration from 'C:\Program Files\WorkloadTools\Replay.json'
    Info – WorkloadTools.Listener.File.FileWorkloadListener : The source file contains 44 events.
    Info – WorkloadTools.WorkloadController : Listener of type FileWorkloadListener initialized correctly. Waiting for events.
    Info – WorkloadTools.Consumer.Replay.ReplayConsumer : Worker [502] – Starting
    Info – WorkloadTools.Consumer.Replay.ReplayConsumer : Worker [246] – Starting

    • I am sorry it didn’t work as expected. I would need to have more info in order to answer your question. Probably I would also need to have a look at what you have in your .sqlite file.

  7. Hi! I am trying to do a replay from a SQLite file that I captured. My application is very cursor-based.

    {
        "Controller": {

            "Listener":
            {
                "__type": "FileWorkloadListener",
                "Source": "E:\\Workload_Tool\\Trace3\\SqlWorkload_01.sqlite",
                "SynchronizationMode": "true"
            },

            "Consumers":
            [
                {
                    "__type": "ReplayConsumer",
                    "ConnectionInfo":
                    {
                        "ServerName": "TEST1",
                        "DatabaseName": "test3"
                    }
                },
                {
                    "__type": "AnalysisConsumer",
                    "ConnectionInfo":
                    {
                        "ServerName": "LOADTESTER",
                        "DatabaseName": "DEMO",
                        "SchemaName": "baseline"
                    },
                    "UploadIntervalSeconds": 60
                }
            ]
        }
    }

    I am having these errors:

    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529452] – Error: sp_cursoroption: The cursor identifier value provided (b4dff33) is not valid.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529454] – Error: sp_cursoroption: The cursor identifier value provided (b4dff33) is not valid.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529456] – Error: sp_cursor: The cursor identifier value provided (b4dff33) is not valid.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529460] – Error: Could not find prepared statement with handle 1073858554.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529462] – Error: Could not find prepared statement with handle 1073858555.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529464] – Error: sp_cursoroption: The cursor identifier value provided (b4dff37) is not valid.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529466] – Error: sp_cursoroption: The cursor identifier value provided (b4dff37) is not valid.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529467] – Error: sp_cursor: The cursor identifier value provided (b4dff37) is not valid.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529472] – Error: Could not find prepared statement with handle 1073858554.
    Warn – WorkloadTools.Consumer.Replay.ReplayWorker : Worker [110] – Sequence[3529473] – Error: Could not find prepared statement with handle 1073858555.

    These errors are repeated throughout the replay. Is there any way to correct it?

    Thanks!
    JN

    • Hi Jose, WorkloadTools has issues with cursors. I ran into this problem multiple times and I have some work to do in this area. I’m afraid I don’t have much to offer right now. Sorry.

