Blog Archives

Installing SQL Server 2016 Language Reference Help from disk


A couple of years ago I blogged about Installing the SQL Server 2014 Language Reference Help from disk.

With SQL Server 2016 things changed significantly: we have the new Help Viewer 2.2, which is shipped with the Management Studio setup kit.

However, despite all the changes in the way help works and is shipped, I am still unable to download and install help content from the web, so I resorted to using the same trick that I used for SQL Server 2014.

This time the URLs and the files to download are different:

    1. Point your browser to http://services.mtps.microsoft.com/ServiceAPI/catalogs/sql2016/en-us
    2. Download the Language Reference Files:

      If you’re a PowerShell person, these three commands will do:

Invoke-WebRequest -Uri "http://packages.mtps.microsoft.com/sql_2016_branding_en-us(1bd6e667-f159-ac3b-f0a5-964c04ca5a13).cab" `
	-OutFile "sql_2016_branding_en-us(1bd6e667-f159-ac3b-f0a5-964c04ca5a13).cab"
Invoke-WebRequest -Uri "http://packages.mtps.microsoft.com/v2sql_shared_language_reference_b4621_sql_130_en-us_1(83748a56-8810-751f-d453-00c5accc862d).cab" `
	-OutFile "v2sql_shared_language_reference_b4621_sql_130_en-us_1(83748a56-8810-751f-d453-00c5accc862d).cab"
Invoke-WebRequest -Uri "http://packages.mtps.microsoft.com/v2sql_shared_language_reference_b4621_sql_130_en-us_2(ccc38276-b744-93bd-9008-fe79b294ff41).cab" `
	-OutFile "v2sql_shared_language_reference_b4621_sql_130_en-us_2(ccc38276-b744-93bd-9008-fe79b294ff41).cab"
      3. Create a text file named HelpContentSetup.msha in the same folder as the .cab files and paste the following HTML:
<html xmlns="http://www.w3.org/1999/xhtml">
<head />
<body class="vendor-book">
    <div class="details">
        <span class="vendor">Microsoft</span>
        <span class="locale">en-us</span>
        <span class="product">SQL Server 2016</span>
        <span class="name">Microsoft SQL Server Language Reference</span>
    </div>
    <div class="package-list">
        <div class="package">
            <span class="name">SQL_2016_Branding_en-US</span>
            <span class="deployed">False</span>
            <a class="current-link" href="sql_2016_branding_en-us(1bd6e667-f159-ac3b-f0a5-964c04ca5a13).cab">sql_2016_branding_en-us(1bd6e667-f159-ac3b-f0a5-964c04ca5a13).cab</a>
        </div>
        <div class="package">
            <span class="name">v2SQL_Shared_Language_Reference_B4621_SQL_130_en-us_1</span>
            <span class="deployed">False</span>
            <a class="current-link" href="v2sql_shared_language_reference_b4621_sql_130_en-us_1(83748a56-8810-751f-d453-00c5accc862d).cab">v2sql_shared_language_reference_b4621_sql_130_en-us_1(83748a56-8810-751f-d453-00c5accc862d).cab</a>
        </div>
        <div class="package">
            <span class="name">v2SQL_Shared_Language_Reference_B4621_SQL_130_en-us_2</span>
            <span class="deployed">False</span>
            <a class="current-link" href="v2sql_shared_language_reference_b4621_sql_130_en-us_2(ccc38276-b744-93bd-9008-fe79b294ff41).cab">v2sql_shared_language_reference_b4621_sql_130_en-us_2(ccc38276-b744-93bd-9008-fe79b294ff41).cab</a>
        </div>
    </div>
</body>
</html>
      4. First, set the Help Viewer to open help from the local sources:
        [screenshot: Help Viewer settings]
      5. Then select the “Add and Remove Help Content” command:
        [screenshot: Add and Remove Help Content command]
      6. This command opens the Help Viewer and asks for the content to add.
        Browse to the file you created in step 3.
        Click “Add” on all the items you wish to add to the library. In this case you will have only 1 item.
        When done, click the “Update” button.
        [screenshot: adding the content]
      7. Unfortunately, during the installation phase of the library item, something crashes and the installation won’t proceed until you tell it to ignore or report the error.
        [screenshot: error dialog]
      8. Despite the crash, everything works as expected and you will find the topic installed in your help library:
        [screenshot: the installed content]

Here it is, nice and easy. Hope it works for you too.


My Feedback on my PASS Abstracts Feedback


Brent Ozar recently published the feedback he got on his abstract submissions for PASS Summit 2016 and, as he often does, started a trend which some others followed. I think the feedback is interesting and useful for speakers that plan to improve their own submissions for the years to come, so I decided I will publish my feedback as well.

I did not get selected and, obviously, I would have been happier if I had. This doesn’t mean I have problems with the selection process or with the reviewers who implemented it. Instead, I’m grateful for the enormous amount of work they did for the community, so a huge thank you goes to the committee.

Every year I see people publicly complain about the selection process or getting mad about being rejected. Folks, get over it: organizers have the right to choose whichever sessions they find more suitable for the Summit. It’s their responsibility to put together the best possible agenda and that doesn’t necessarily include your session.

That said, this is my feedback:

Responding to Extended Events in near real-time

Not Accepted: Higher rated session selected.

Category: General Session (75 minutes)

Track: Enterprise Database Administration & Deployment

Topic: Performance Monitoring / Tuning / Extended Events / Waits

Level: 300

Abstract: Extended Events provide deep insight into SQL Server’s behavior and allow us to gather information not available by other means. However, compared to other technologies such as SQL Trace and Event Notifications, a way to react to the events as soon as they happen seems to be lacking.
In this session we will see how the Extended Events streaming API can be used to process events in a near real-time fashion. We will demonstrate how this technology enables new possibilities to solve real world problems, such as capturing and notifying deadlocks or blocking sessions.

Prerequisites: Extended Events basics, C# basics, Powershell basics. The amount of coding required is in line with the average DBA skills.

Goal1:  Introduce Extended Events and compare them with other monitoring technologies available in SQL Server

Goal2:  Introduce the Extended Events streaming API and demonstrate how it can be used to process the events as soon as they occur, without shredding XML.

Goal3:  Demonstrate how the Extended Events streaming API can be used to solve real world problems, such as building a monitoring and alerting solution for deadlocks and blocked sessions.

Comments:

Needs more specifics in abstract.

A : 4 The abstract digs a little deeper on what’s being offered by XEs, and what needs to be done to efficiently capture events in near real-time without bogging down the system

T : 5 The topic matches with the goals – and many people hesitate capturing large amounts of .xel files fearing shredding the xml. This session offers a new approach to capturing the same using streaming APIs, with C#/PowerShell

S : 5 Prereqs are a little steep, but given the outcome – justified, and will interest people.

Level looks ok for the prereqs and goals. Topic is ok. Abstract is good overall. Only issue is the second sentence reads a little oddly.

Abstract: Well written abstract with strong supportive goals. Topic: Great topic!            Subjective: This sounds like a very interesting session. Taking xEvents to the next level. I would attend this session. I think it will be a large draw!

Feedback on comments:

“Needs more specifics in abstract”. What kind of specifics is needed in the abstract? A clue would be helpful here.

A/T/S: what do these letters stand for? I suppose they mean Abstract, Topic, Subjective. What do the numbers stand for? A mark maybe? Again, no explanation.

“The second sentence reads a little oddly”. I really don’t understand grammar or language comments. It’s not a conference on the English language, and non-native speakers are already disadvantaged enough without being constantly reminded that their language skills can’t compare to Shakespeare’s. Speakers’ names are stripped from the sessions, so the reviewers may not be aware that the session comes from a foreigner. If the topic is good and fits what the organizers are looking for, minor grammar/language mistakes can be fixed. BTW, one of the reviewers told me that when multiple sessions are tied, language is the tie-breaker. Questionable, but understandable.

The feedback itself is not very useful for improving this submission for next year.

SQL Server Infernals: Worst Practices in Action

Not Accepted: Other sessions selected based on building a balanced program for track coverage, speaker coverage, topic coverage, and session rating.

Category: General Session (75 minutes)

Track: Enterprise Database Administration & Deployment

Topic: Internals: Storage Engine / Query Engine / Compression

Level: 100

Abstract: Let’s face it: Best Practices are too many to really know them all and choose which ones should be applied first. Does your telephone ring all the time? Do your users ask for that “quick report” that instead takes ages and keeps changing every time you think it’s done?
Have you ever thought that in dire times avoiding Worst Practices could be a good starting point and you can leave fine tuning for a better future? If the answer is “yes”, then this session is for you: we will discover together how not to torture a SQL Server instance and we will see how to avoid making choices that in the long run could turn out to be not as smart as they looked initially.

Prerequisites: Basic database design skills, basic development concepts, basic database administration skills.

Goal1: Demonstrate how bad Database Design decisions (irresponsible denormalization, EAV, bad data types, wrong or missing primary keys) can haunt a project through its whole lifetime.

Goal2: Illustrate how bad development practices (SQL Injection, RBAR, poor or no testing) can hurt performance, put security at risk and pose serious threats to the success of our projects.

Goal3: Enumerate and explain the worst installation and administration practices for SQL Server instances, offering the correct alternatives. Topics covered: HW choice, OS policies, security, ongoing administration, monitoring and tuning.

Comments:

Abstract: compelling

Topic: I like goals

Subjective rating: interesting, but level too low

Abstract: The outline and details of this abstract are well written

Topic: This is a great topic

Subjective: I may attend this session

The outline does not seem to clearly describe the contents of the presentation. The level of detail seems low – more detail might help attendees decide on value of attending. The title may not attract the appropriate attendees – it seems a little vague.

Abstract: Abstract is a little muddled. Goals are clearly laid out. Goals seem to contain the entire gamut of SQL Server. Perhaps a more focused area of SQL Server would help.

Topic: Title is cute.

Subjective: Could be a fun session.

Feedback on comments:

“Level too low”. The level is intentionally low: it’s meant as an introductory session. Teaching what to avoid is exactly that: an introductory topic for beginners. Have you ever noticed that basic introductory sessions pack the room? Should we take it as a clue that attendees want 100-level sessions? Apparently not.

“The outline does not seem to clearly describe the contents”. True, I agree. Next time I submit this session I will include in the abstract more details about the contents. It seems that including those details in the goals is not enough.

“Goals seem to contain the entire gamut of SQL Server”. True. I don’t see how this is a bad thing for an introductory session.

This feedback is much more useful than the other ones.

The shape of your Workload: Benchmarking and Baselining

Not Accepted: Higher rated session selected.

Category: General Session (75 minutes)

Track: Enterprise Database Administration & Deployment

Topic: Performance Monitoring / Tuning / Extended Events / Waits

Level: 300

Abstract: The key to optimizing SQL Server performance is to establish a performance baseline and thoroughly analyze the workload on the server. Collecting a baseline is not enough: it is important to analyze the workload in order to intervene effectively exactly where performance issues lie.
In this session we will describe the techniques and tools to analyze SQL Server performance and we will introduce benchmarking techniques that allow us to rate our tuning efforts. We will also introduce some tools included in SQL Server 2014, such as the Distributed Replay, and several third-party applications that come at little or no cost, but provide the highest benefit.

Prerequisites: Basic SQL Server performance tuning techniques (DMVs, Performance Counters), basic monitoring techniques (SQL Trace, Extended Events).

Goal1: Introduce SQL Server performance analysis tools (DMVs, Performance Counters, Data Collector) and demonstrate how to use them to collect a baseline.

Goal2: Introduce workload analysis techniques, using SQL Server and third party tools, including cache analysis queries, RML Utilities and ClearTrace.

Goal3: Demonstrate benchmarking techniques which will allow us to compare performance before and after applying tuning measures, using RML utilities, Distributed Replay and Qure Analyzer.

Comments:

Needs more specifics in abstract.

A : 5 The abstract clearly spells out the need for baseling, and benchmarking, to identify performance workload.

T : 5 The topic maps well with the session goals – and makes use of the Dist Replay Controller/Client to replay the prod workload on lower environments, and making use of RML utils and ClearTrace

S : 5 The prereqs are not steep, and the session offers a lot to replay prod workload on lower envs.

Topic is a good and relevant choice that appeals to a distinct attendee pool. Level appears good for the prereqs and goals.

Abstract is ok. Some word choices make the sentences difficult to follow (ie. ‘intervene effectively exactly’).

Abstract: Well written abstract with strong, supportive goals. Enough information is provided from which an attendee can make an informed decision on whether or not to attend this session.

Topic: Excellent topic. Likely a 200 level session, not 300.

Subjective: Great session which should draw in attendees. This is a skill every database professional needs.

Feedback on comments:

“Needs more specifics in abstract” and “enough information is provided” in the same feedback. Not very helpful.

“Some word choices make the sentences difficult to follow”. I’ve already said what I think about language comments.

Reading the comments, you would think it made the cut, but it didn’t. What can I say? Competition is tough: there are lots of great speakers with lots of great sessions.

“Likely a 200 level session, not 300”. I disagree: I’ve never seen anyone less than experienced fiddling with RML Utilities or distributed replay.

General feedback

The comments I received this time are less helpful than the ones I got last year on the same exact sessions. Yep: I submitted the same sessions last year and I got very useful comments that I used to improve my submissions this year. Unfortunately, it will be hard to improve next year.

The level discussion is completely subjective. I was under the impression that here in Europe we tend to rate sessions lower than Americans do, so a 200 session in Europe could easily be a 300 session in the USA. Maybe my impression was wrong. However, it’s interesting to see how the session that I proposed at level 100 was deemed to be underrated and the one that I proposed at level 300 was deemed to be overrated. Levels are part of the session: the speaker chooses which level to go for and I would have a really hard time as a reviewer arguing against that decision without having seen the session.

Regarding the reason for rejection, I am not completely sure it can be helpful for improving next year’s submissions. For instance, my session on Extended Events got “Higher rated session selected” as the reason for rejection, but looking at the schedule there is no other session on Extended Events (which is quite surprising) and no other session on monitoring (which is even more surprising and maybe a bit disappointing). This means that the higher rated session does not have to be on the same topic, which makes the other reason for rejection (Other sessions selected based on building a balanced program for track coverage, speaker coverage, topic coverage, and session rating) difficult to decipher.

Lessons learned

  1. You’re selling the abstract, not the session
    The reviewers are evaluating your abstract, not your session. All your efforts should be pushed towards perfecting your abstract. Craft it as your masterpiece. Start well in advance, don’t wait for the last minute.
  2. Take as many shots as you can
    Looking at the selection process from the outside it’s really hard to tell how it works, so submitting multiple abstracts can increase your chances of scoring a session. Maybe that session that you don’t like is exactly what the committee is looking for. Again, you’re selling only the abstract and if you get selected you have several months to improve the session.
  3. There’s a review service offered by PASS. Use it.
    There is no guarantee that it is going to be the best feedback you will get, but it’s some feedback at least.
  4. You have friends in your #sqlfamily: ask for advice.
    If you don’t trust the feedback that you get from the PASS review service, trust your friends. If you’re submitting for PASS Summit, chances are that you already have spoken at SQLSaturdays or other community events (if not, maybe you should think twice before submitting: PASS Summit is not for inexperienced speakers). In this case, you got in touch with the community leaders in your chapter, who probably keep scoring sessions at Summit year after year: ask them to review your sessions. I am sure that they will have precious suggestions for you.
  5. If you’re not an English native speaker, ask a native speaker to proofread it
    Sure, your English is great. Sure, the reviewers have something to say about native speakers’ English as well. Sure, language should not be the focus of the reviewers. However, see lesson learned #1.
  6. Speaker/Abstract separation is a joke
    If you read Brent’s feedback, one thing emerges quite clearly: in many cases, the reviewers know exactly whose abstract they are reviewing. This might be because they already have seen the session somewhere else or because the abstract contains some “distinctive features” (sp_AskXXX anyone?). This means that some speakers have an advantage over the average Joe, but rightly so: those speakers are awesome and well known for being awesome. I’m not saying that the “Top Guns” can submit their grocery list: what I’m saying is that they are not selling only their abstracts. Now that Brent has published his abstracts, I wonder what would happen next year if somebody submitted one of his sessions… 🙂

Bottom line

Being selected for speaking at the PASS Summit is damn hard. I hope that sharing my feedback will help you improve your submission next year. I also hope that you will find my advice useful, if you dare accept advice from somebody who never managed to score a session himself.

Upcoming Speaking Engagements


The next few months will be crazy for me. I will be travelling a lot around Europe, speaking about SQL Server topics.

Here is where you will find me in the upcoming weeks:

SQLNexus is a new conference in Copenhagen and will be the official launch event in Denmark for SQL Server 2016.
I will speak about “Responding to Extended Events in near Real-Time”.

SQLBits doesn’t need to be introduced. It’s the biggest SQL Server event in Europe and it will also be the official SQL Server 2016 launch event in UK.
Again, I will speak about “Responding to Extended Events in near Real-Time”.

NTK is a famous conference in Slovenia, which has been running for many years now. This year it will be held in Portorosz, near the Italian border.
It’s going to be fun: I will speak about “Benchmarking, Baselining and Workload Analysis”.

INSIDE-SQL is also a new event, run this year for the first time. It’s an event focused on in-depth concepts, delivered in 75-minute sessions, with long breaks between sessions, so that attendees can go to the next session without rushing and can hang out with the speakers and ask questions.
I will deliver two sessions:
“Responding to Extended Events in near Real-Time”
“Advanced T-SQL Techniques”

Free SQL Server training from the 40th floor of the Tour Montparnasse in Paris. What else can I say?
I will be speaking about “New Security Features in SQL Server 2016”

One of the coolest SQLSats around!
This time I’ll be speaking about “SQL Server Infernals”, AKA “how to torture your SQL Server instances to death with worst practices”. It’s going to be fun!

I hope to see you at these super cool events!

An annoying Bug in the Restore Dialog


Today, thanks to a customer, I discovered one of those annoying little things that can really drive you nuts.

Basically, they were trying to restore a backup using the SSMS Restore Database window and they kept getting “No backupset selected to be restored” whenever a backup file was selected.

You just had to select a file for restore and click OK…

[screenshot: selecting the backup file]

… to be met with an error message in the Restore Database window:

[screenshot: “No backupset selected to be restored” error]

The weird thing about it is that the backup file restored perfectly fine from a T-SQL script:

[screenshot: the restore script in T-SQL]

So it had to be something wrong with SSMS, but what?

Looking closer at the restore script, one thing stands out. Look at the file name:

[screenshot: the restore script with the file name highlighted]
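
To give you an idea, the script looked more or less like this (the database name and path here are made up):

RESTORE DATABASE [MyDatabase]
FROM DISK = N'C:\Backups\ MyDatabase.bak'  -- look closely at the file name
WITH FILE = 1, STATS = 5;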

Yep, there’s a leading whitespace in the file name. Could that be the source of the problem?

Let’s try again with the GUI in a slightly different way. This time I will copy the folder path from the “Backup File Location” textbox…

[screenshot: copying the folder path from the “Backup File Location” textbox]

… and paste it directly in the “File name” textbox, right before the file name:

[screenshot: pasting the folder path into the “File name” textbox]

This time everything works as expected.

Bottom line:

  1. This is a bug in SSMS: go on and vote this Connect item to have it fixed in a future version.
  2. Don’t use the GUI to restore a database.
  3. Don’t use the GUI at all.

Native Client Aliases don’t like Trailing Spaces


I usually don’t post small things like this, but today I fought with this obnoxious problem long enough to convince me that it deserved a shout out to the community.

When you create an alias in the SQL Server Configuration Manager, make sure that the alias name contains no spaces, otherwise it won’t work as you expect.

In my case, I had a Reporting Services instance with many data sources pointing to a SQL Server instance (let’s call it MyServer) and I wanted to redirect connections to a different instance using an alias. So I opened Configuration Manager and created an alias like this:

[screenshot: the alias in SQL Server Configuration Manager]

To my great surprise, the alias didn’t work and it took quite some time to notice that something was wrong. The server “MyServer” was a perfectly working existing instance of SQL Server, so no connection was dropped: they all just happened to contact the wrong server. If spotting the problem was hard, fixing it turned out to be even harder: why on earth did the alias refuse to work, while all other aliases were working perfectly?

It turned out to be the simplest of all answers: a trailing space in the alias name.

Just looking at the alias properties it wasn’t too obvious that something was off, but clicking on the alias name field, the cursor appeared slightly further to the right than it should have:

[screenshot: the alias name field with a trailing space]

Bottom line is: always check your assumptions, because problems like to hide where you won’t search for them.

Counting the number of rows in a table


Don’t be fooled by the title of this post: while counting the number of rows in a table is a trivial task for you, it is not trivial at all for SQL Server.

Every time you run your COUNT(*) query, SQL Server has to scan an index or a heap to calculate that seemingly innocuous number and send it to your application. This means a lot of unnecessary reads and unnecessary blocking.
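
If you want to see those reads for yourself, here is a minimal sketch (the table name is just a placeholder):

SET STATISTICS IO ON;

-- forces a full scan of the clustered index or of the heap
SELECT COUNT(*) FROM dbo.MyTable;

SET STATISTICS IO OFF;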

Jes Schultz Borland blogged about it some time ago and Aaron Bertrand also has a blog post on this subject. I will refrain from repeating here what they both said: go read their blogs to understand why COUNT(*) is not a good tool for this task.

The alternative to COUNT(*) is reading the count from the table metadata, querying sys.partitions, something along these lines:

SELECT SUM(p.rows)
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('MyTable')
    AND p.index_id IN (0,1); -- heap or clustered index

Many variations of this query include JOINs to sys.tables, sys.schemas or sys.indexes, which are not strictly necessary in my opinion. However, even the shortest version of the count is still quite verbose and error-prone.
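
For the record, one of those JOIN-heavy variations might look like this (just a sketch: the extra joins add nothing to the number you get back):

SELECT SUM(p.rows) AS row_count
FROM sys.tables AS t
INNER JOIN sys.schemas AS s
    ON t.schema_id = s.schema_id
INNER JOIN sys.partitions AS p
    ON t.object_id = p.object_id
WHERE s.name = 'dbo'
    AND t.name = 'MyTable'
    AND p.index_id IN (0,1); -- heap or clustered index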

Fortunately, there’s a shorter version of this query that relies on the system function OBJECTPROPERTYEX:

SELECT OBJECTPROPERTYEX(OBJECT_ID('MyTable'),'cardinality')

Where does it read data from? STATISTICS IO doesn’t return anything for this query, so I had to set up an Extended Events session to capture lock_acquired events and find out the system tables read by this function:
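
Something along these lines does the job (a minimal sketch: the ring_buffer target is my choice and the session_id 53 stands in for your own spid):

CREATE EVENT SESSION [capture_count_locks] ON SERVER
ADD EVENT sqlserver.lock_acquired (
    ACTION (sqlserver.sql_text)
    WHERE ([sqlserver].[session_id] = (53)) -- replace with your own session_id
)
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO
ALTER EVENT SESSION [capture_count_locks] ON SERVER STATE = START;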

[screenshot: lock_acquired events on the system tables]

Basically, it’s just sysallocunits and sysrowsets.

It’s nice, short and easy to remember. Enjoy.

Tracking Table Usage and Identifying Unused Objects


One of the things I hate the most about “old” databases is the fact that unused tables are kept forever, because nobody knows whether they’re used or not. Sometimes it’s really hard to tell. Some databases are accessed by a huge number of applications, reports, ETL tools and God knows what else. In these cases, deciding whether you should drop a table or not is a tough call.

Search your codebase

The easiest way to know if a table is used is to search the codebase for occurrences of the table name. However, finding the table name in the code does not mean it is used: there may be code branches that in turn are never executed. Modern languages and development tools can help you identify unused methods and objects, but it’s not always feasible or 100% reliable (binary dependencies, scripts and dynamic code are, off the top of my head, some exceptions).
On the other hand, not finding the table name in the code does not mean you can delete it with no issues. The table could be used by dynamic code and the name retrieved from a configuration file or a table in the database.

In other cases, the source code is not available at all.

Index usage: clues, not evidence

Another way to approach the problem is by measuring the effects of the code execution against the database, in other words, by looking at the information stored by SQL Server whenever a table is accessed.

The DMV sys.dm_db_index_usage_stats records information on all seeks, scans, lookups and updates against indexes and is a very good place to start the investigation. If something is writing to the table or reading from it, you will see the numbers go up and the dates moving forward.
Great, so we’re done and this post is over? Not exactly: there are some more facts to take into account.

First of all, the DMV gets cleared every time SQL Server is restarted, so the accuracy of the data returned is heavily dependent on how long the instance has been running. Moreover, some actions (rebuilding an index, to name one) reset the index usage stats, so if you want to rely on sensible stats, your only option is to persist the data somewhere on a regular basis.

To achieve this goal, I coded this simple stored procedure that reads the stats from the DMV and stores it in a table, updating the read and write counts for each subsequent execution.

-- You have a TOOLS database, right?
-- If not, create one, you will thank me later
USE TOOLS;
GO

-- A place for everything, everything in its place
IF SCHEMA_ID('meta') IS NULL
	EXEC('CREATE SCHEMA meta;')
GO

-- This table will hold index usage summarized at table level
CREATE TABLE meta.index_usage(
       db_name sysname,
       schema_name sysname,
       object_name sysname,
       read_count bigint,
       last_read datetime,
       write_count bigint,
       last_write datetime,
       PRIMARY KEY CLUSTERED (db_name, schema_name, object_name)
)

-- This table will hold the last snapshot taken
-- It will be used to capture the snapshot and
-- merge it with the destination table
CREATE TABLE meta.index_usage_last_snapshot(
       db_name sysname,
       schema_name sysname,
       object_name sysname,
       read_count bigint,
       last_read datetime,
       write_count bigint,
       last_write datetime,
       PRIMARY KEY CLUSTERED (db_name, schema_name, object_name)
)
GO

-- This procedure captures index usage stats
-- and merges the stats with the ones already captured
CREATE PROCEDURE meta.record_index_usage
AS
BEGIN

SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#stats') IS NOT NULL
       DROP TABLE #stats;

-- We will use the index stats multiple times, so parking
-- them in a temp table is convenient
CREATE TABLE #stats(
       db_name sysname,
       schema_name sysname,
       object_name sysname,
       read_count bigint,
       last_read datetime,
       write_count bigint,
       last_write datetime,
       PRIMARY KEY CLUSTERED (db_name, schema_name, object_name)
);

-- Reads index usage stats and aggregates stats at table level
-- Aggregated data is saved in the temporary table
WITH index_stats AS (
       SELECT DB_NAME(database_id) AS db_name,
              OBJECT_SCHEMA_NAME(object_id,database_id) AS schema_name,
              OBJECT_NAME(object_id, database_id) AS object_name,
              user_seeks + user_scans + user_lookups AS read_count,
              user_updates AS write_count,
              last_read = (
                  SELECT MAX(value)
                  FROM (
                      VALUES(last_user_seek),(last_user_scan),(last_user_lookup)
                  ) AS v(value)
              ),
              last_write = last_user_update
       FROM sys.dm_db_index_usage_stats
       WHERE DB_NAME(database_id) NOT IN ('master','model','tempdb','msdb')
)
INSERT INTO #stats
SELECT db_name,
       schema_name,
       object_name,
       SUM(read_count) AS read_count,
       MAX(last_read) AS last_read,
       SUM(write_count) AS write_count,
       MAX(last_write) AS last_write
FROM index_stats
GROUP BY db_name,
       schema_name,
       object_name;

DECLARE @last_date_in_snapshot datetime;
DECLARE @sqlserver_start_date datetime;

-- reads maximum read/write date from the data already saved in the last snapshot table
SELECT @last_date_in_snapshot = MAX(CASE WHEN last_read > last_write THEN last_read ELSE last_write END)
FROM meta.index_usage_last_snapshot;

-- reads SQL Server start time
SELECT @sqlserver_start_date = sqlserver_start_time FROM sys.dm_os_sys_info;

-- handle restarted server: last snapshot is before server start time
IF (@last_date_in_snapshot) < (@sqlserver_start_date)
       TRUNCATE TABLE meta.index_usage_last_snapshot;

-- handle snapshot table empty
IF NOT EXISTS(SELECT * FROM meta.index_usage_last_snapshot)
       INSERT INTO meta.index_usage_last_snapshot
       SELECT * FROM #stats;

-- merges data in the target table with the new collected data
WITH offset_stats AS (
       SELECT newstats.db_name,
              newstats.schema_name,
              newstats.object_name,
              -- if new < old, the stats have been reset
              newstats.read_count -
                  CASE
                      WHEN newstats.read_count < ISNULL(oldstats.read_count,0) THEN 0
                      ELSE ISNULL(oldstats.read_count,0)
                  END
                  AS read_count,
              newstats.last_read,
              -- if new < old, the stats have been reset
              newstats.write_count -
                  CASE
                      WHEN newstats.write_count < ISNULL(oldstats.write_count,0) THEN 0
                      ELSE ISNULL(oldstats.write_count,0)
                  END
              AS write_count,
              newstats.last_write
       FROM #stats AS newstats
       LEFT JOIN meta.index_usage_last_snapshot AS oldstats
              ON newstats.db_name = oldstats.db_name
              AND newstats.schema_name = oldstats.schema_name
              AND newstats.object_name = oldstats.object_name
)
MERGE INTO meta.index_usage AS dest
USING offset_stats AS src
       ON src.db_name = dest.db_name
       AND src.schema_name = dest.schema_name
       AND src.object_name = dest.object_name
WHEN MATCHED THEN
       UPDATE SET read_count += src.read_count,
              last_read = src.last_read,
              write_count += src.write_count,
              last_write = src.last_write
WHEN NOT MATCHED BY TARGET THEN
       INSERT VALUES (
           src.db_name,
           src.schema_name,
           src.object_name,
           src.read_count,
           src.last_read,
           src.write_count,
           src.last_write
       );

-- empty the last snapshot
TRUNCATE TABLE meta.index_usage_last_snapshot;

-- replace it with the new collected data
INSERT INTO meta.index_usage_last_snapshot
SELECT * FROM #stats;

END

GO

You can schedule the execution of the stored procedure every hour or so and you will see data flow into the meta.index_usage table. Last read/write dates will be updated and the read/write counts will be incremented by comparing the saved counts with the captured ones: if I had 1000 reads in the previous snapshot and I capture 1200 reads, the total reads column must be incremented by 200.
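
To eyeball the results, a quick sketch over the collected data, least recently touched tables first:

-- Least recently used tables first
SELECT db_name,
       schema_name,
       object_name,
       read_count,
       last_read,
       write_count,
       last_write
FROM TOOLS.meta.index_usage
ORDER BY CASE WHEN last_read > last_write THEN last_read ELSE last_write END;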

So, if I don’t find my table in this list after monitoring for some days, is it safe to assume that it can be deleted? Probably yes. More on that later.
What these stats don’t tell you is what to do when you do find the table in the list. It would be reasonable to think that the table is used, but there are several reasons why it may have ended up being read or written and not all of them will be ascribable to an application.
For instance, if a table is merge replicated, the replication agents will access it and read counts will go up. What the index usage stats tell us is that something is using a table but it says nothing about the nature of that something. If you want to find out more, you need to set up some kind of monitoring that records additional information about where reads and writes come from.

Extended Events to the rescue

For this purpose, an audit is probably too verbose, because it will record an entry for each access to each table being audited. The audit file will grow very quickly if not limited to a few objects to investigate. Moreover, audits have to be set up for each table and kept running for a reasonable time before drawing conclusions.

Audits are based on Extended Events: is there another way to do the same thing Audits do using extended events directly? Of course there is, but it’s trickier than you would expect.

First of all, the Extended Events used by the audit feature are not available directly. You’ve been hearing several times that audits use Extended Events but nobody ever told you which events they are using: the reason is that those events are not usable in a custom Extended Events session (the SecAudit package is marked as “private”). As a consequence, if you want to audit table access, you will have to use some other kind of event.
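
You can verify this yourself by looking at the package metadata: the audit events live in a package whose capabilities are flagged as private.

-- The SecAudit package is there, but it is marked as private,
-- so its events cannot be referenced in a custom session
SELECT name, capabilities_desc, description
FROM sys.dm_xe_packages
WHERE name = 'SecAudit';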

In order to find out which Extended Events provide information at the object level, we can query the sys.dm_xe_object_columns DMV:

SELECT object_name, description
FROM sys.dm_xe_object_columns
WHERE name = 'object_id'

As you will see, the only event that could help in this case is the lock_acquired event. Whenever a table is accessed, a lock will be taken and capturing those locks is a quick and easy way to discover activity on the tables.

Here is the definition of a session to capture locking information:

CREATE EVENT SESSION [audit_table_usage] ON SERVER
ADD EVENT sqlserver.lock_acquired (
    SET collect_database_name = (0)
        ,collect_resource_description = (1)
    ACTION(sqlserver.client_app_name, sqlserver.is_system, sqlserver.server_principal_name)
    WHERE (
        [package0].[equal_boolean]([sqlserver].[is_system], (0)) -- user SPID
        AND [package0].[equal_uint64]([resource_type], (5)) -- OBJECT
        AND [package0].[not_equal_uint64]([database_id], (32767))  -- resourcedb
        AND [package0].[greater_than_uint64]([database_id], (4)) -- user database
        AND [package0].[greater_than_equal_int64]([object_id], (245575913)) -- user object
        AND (
               [mode] = (1) -- SCH-S
            OR [mode] = (6) -- IS
            OR [mode] = (8) -- IX
            OR [mode] = (3) -- S
            OR [mode] = (5) -- X
        )
    )
)
WITH (
     MAX_MEMORY = 20480 KB
    ,EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS
    ,MAX_DISPATCH_LATENCY = 30 SECONDS
    ,MAX_EVENT_SIZE = 0 KB
    ,MEMORY_PARTITION_MODE = NONE
    ,TRACK_CAUSALITY = OFF
    ,STARTUP_STATE = OFF
);
GO

If you start this session and monitor the data captured with the “Watch live data” window, you will soon notice that a huge number of events gets captured, which means that the output will also be huge and analyzing it can become a daunting task. Saving this data to a file target is not the way to go here: is there another way?

The main point here is that there is no need for the individual events: the interesting information is the aggregated data from those events. Ideally, you would group by object_id and get the maximum read or write date. If possible, counting reads and writes by object_id would be great too. At first glance it seems like a good fit for the histogram target; however, you will soon discover that the histogram target can “group” on a single column only, which is not what you want. Object_ids are not unique and you can have the same object_id in different databases. Moreover, the histogram target can only count events and is not suitable for other types of aggregation, such as MAX.
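
For the record, the histogram attempt would look more or less like this, and it shows the limitation: you pick a single bucketing column and all you get back is a count per bucket.

-- A sketch of the histogram approach: object_id is the only bucketing column,
-- so objects with the same object_id in different databases share a bucket,
-- and the target returns nothing but a count per bucket
ALTER EVENT SESSION [audit_table_usage] ON SERVER
ADD TARGET package0.histogram (
    SET filtering_event_name = N'sqlserver.lock_acquired',
        source_type = (0),   -- 0 = event field, 1 = action
        source = N'object_id',
        slots = (16384)
);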

Streaming the events with Powershell

Fortunately, when something is not available natively, you can code your own implementation. In this case, you can use the Extended Events streaming API to attach to the session and evaluate the events as soon as they show up in the stream.

In this example, I will show you how to capture the client application name along with the database and object id and group events on these 3 fields. If you are interested in additional fields (such as host name or login name), you will need to group by those fields as well.

In the same way, if you want to aggregate additional fields, you will have to implement your own logic. In this example, I am computing the MAX aggregate for the read and write events, without computing the COUNT. The reason is that it’s not easy to predict whether the count would be accurate, because different kinds of locks are taken in different situations (under snapshot isolation no shared locks are taken, so you have to rely on SCH-S locks; when no dirty pages are present SQL Server takes IS locks and not S locks…).

Before going to the Powershell code, you will need two tables to store the information:

USE TOOLS;
GO
CREATE TABLE meta.table_usage_xe(
       db_name sysname,
       schema_name sysname,
       object_name sysname,
       client_app_name nvarchar(128),
       last_read datetime,
       last_write datetime,
       PRIMARY KEY(db_name, schema_name, object_name, client_app_name)
);

CREATE TABLE meta.table_usage_xe_last_snapshot(
       database_id int,
       object_id int,
       client_app_name nvarchar(128),
       last_read datetime,
       last_write datetime,
       PRIMARY KEY(database_id, object_id, client_app_name)
);

Now that you have a nice place to store the aggregated information, you can start this script to capture the events and persist them.

sl $Env:Temp

#For SQL Server 2014:
Add-Type -Path 'C:\Program Files\Microsoft SQL Server\120\Shared\Microsoft.SqlServer.XE.Core.dll'
Add-Type -Path 'C:\Program Files\Microsoft SQL Server\120\Shared\Microsoft.SqlServer.XEvent.Linq.dll'
#For SQL Server 2012:
#Add-Type -Path 'C:\Program Files\Microsoft SQL Server\110\Shared\Microsoft.SqlServer.XEvent.Linq.dll'

$connectionString = 'Data Source = YourServerNameGoesHere; Initial Catalog = master; Integrated Security = SSPI'

$SessionName = "audit_table_usage"

# loads all object ids for table objects and their database id
# table object_ids will be saved in order to rule out whether
# the locked object is a table or something else.
$commandText = "
DECLARE @results TABLE (
       object_id int,
       database_id int
);

DECLARE @sql nvarchar(max);

SET @sql = '
       SELECT object_id, db_id()
       FROM sys.tables t
       WHERE is_ms_shipped = 0
';

DECLARE @statement nvarchar(max);

SET @statement = (
       SELECT 'EXEC ' + QUOTENAME(name) + '.sys.sp_executesql @sql; '
       FROM sys.databases d
       WHERE name NOT IN ('master','model','msdb','tempdb')
       FOR XML PATH(''), TYPE
).value('.','nvarchar(max)');

INSERT @results
EXEC sp_executesql @statement, N'@sql nvarchar(max)', @sql;

SELECT *
FROM @results
"

$objCache = @{}

$conn = New-Object -TypeName System.Data.SqlClient.SqlConnection -ArgumentList $connectionString
$cmd = New-Object -TypeName System.Data.SqlClient.SqlCommand
$cmd.CommandText = $commandText
$cmd.Connection = $conn
$conn.Open()
$conn.ChangeDatabase("master")
$rdr = $cmd.ExecuteReader()

# load table object_ids and store them in a hashtable

while ($rdr.Read()) {
	$objId = $rdr.GetInt32(0)
	$dbId = $rdr.GetInt32(1)
	if(-not $objCache.ContainsKey($objId)){
		$objCache.add($objId,@($dbId))
	}
	else {
		$arr = $objCache.Get_Item($objId)
		$arr += $dbId
		$objCache.set_Item($objId, $arr)
	}
}

$conn.Close()

# create a DataTable to hold lock information in memory
$queue = New-Object -TypeName System.Data.DataTable
$queue.TableName = $SessionName

[Void]$queue.Columns.Add("database_id",[Int32])
[Void]$queue.Columns.Add("object_id",[Int32])
[Void]$queue.Columns.Add("client_app_name",[String])
[Void]$queue.Columns.Add("last_read",[DateTime])
[Void]$queue.Columns.Add("last_write",[DateTime])

# create a DataView to perform searches in the DataTable
$dview = New-Object -TypeName System.Data.DataView
$dview.Table = $queue
$dview.Sort = "database_id, client_app_name, object_id"

$last_dump = [DateTime]::Now

# connect to the Extended Events session
[Microsoft.SqlServer.XEvent.Linq.QueryableXEventData] $events = New-Object -TypeName Microsoft.SqlServer.XEvent.Linq.QueryableXEventData `
    -ArgumentList @($connectionString, $SessionName, [Microsoft.SqlServer.XEvent.Linq.EventStreamSourceOptions]::EventStream, [Microsoft.SqlServer.XEvent.Linq.EventStreamCacheOptions]::DoNotCache)

$events | % {
    $currentEvent = $_

	$database_id = $currentEvent.Fields["database_id"].Value
	$client_app_name = $currentEvent.Actions["client_app_name"].Value
	if($client_app_name -eq $null) { $client_app_name = [string]::Empty }
	$object_id = $currentEvent.Fields["object_id"].Value
	$mode = $currentEvent.Fields["mode"].Value

	# search the object id in the object cache
	# if found (and database id matches) ==> table
	# otherwise ==> some other kind of object (not interesting)
	if($objCache.ContainsKey($object_id) -and $objCache.Get_Item($object_id) -contains $database_id)
	{
		# search the DataTable by database_id, client app name and object_id
		$found_rows = $dview.FindRows(@($database_id, $client_app_name, $object_id))

		# if not found, add a row
		if($found_rows.Count -eq 0){
			$current_row = $queue.Rows.Add()
			$current_row["database_id"] = $database_id
			$current_row["client_app_name"] = $client_app_name
			$current_row["object_id"] = $object_id
		}
		else {
			$current_row = $found_rows[0]
		}

		if(($mode.Value -eq "IX") -or ($mode.Value -eq "X")) {
			# Exclusive or Intent-Exclusive lock: count this as a write
			$current_row["last_write"] = [DateTime]::Now
		}
		else {
			# Shared or Intent-Shared lock: count this as a read
			# SCH-S locks counted here as well (snapshot isolation ==> no shared locks)
			$current_row["last_read"] = [DateTime]::Now
		}
    }

	$ts = New-TimeSpan -Start $last_dump -End (get-date)

	# Dump to database every 5 minutes
	if($ts.TotalMinutes -gt 5) {
		$last_dump = [DateTime]::Now

		# BCP data to the staging table TOOLS.meta.table_usage_xe_last_snapshot
		$bcp = New-Object -TypeName System.Data.SqlClient.SqlBulkCopy -ArgumentList @($connectionString)
		$bcp.DestinationTableName = "TOOLS.meta.table_usage_xe_last_snapshot"
		$bcp.Batchsize = 1000
		$bcp.BulkCopyTimeout = 0

		$bcp.WriteToServer($queue)

		# Merge data with the destination table TOOLS.meta.table_usage_xe
		$statement = "
			BEGIN TRANSACTION

			BEGIN TRY

				MERGE INTO meta.table_usage_xe AS dest
				USING (
					SELECT db_name(database_id) AS db_name,
						object_schema_name(object_id, database_id) AS schema_name,
						object_name(object_id, database_id) AS object_name,
						client_app_name,
						last_read,
						last_write
					FROM meta.table_usage_xe_last_snapshot
				) AS src
					ON src.db_name = dest.db_name
					AND src.schema_name = dest.schema_name
					AND src.object_name = dest.object_name
					AND src.client_app_name = dest.client_app_name
				WHEN MATCHED THEN
					UPDATE SET last_read = src.last_read,
						last_write = src.last_write
				WHEN NOT MATCHED THEN
					INSERT (db_name, schema_name, object_name, client_app_name, last_read, last_write)
					VALUES (db_name, schema_name, object_name, client_app_name, last_read, last_write);

				TRUNCATE TABLE meta.table_usage_xe_last_snapshot;

				COMMIT;

			END TRY
			BEGIN CATCH
				ROLLBACK;
				THROW;
			END CATCH
		"

		$conn = New-Object -TypeName System.Data.SqlClient.SqlConnection -ArgumentList $connectionString
		$cmd = New-Object -TypeName System.Data.SqlClient.SqlCommand
		$cmd.CommandText = $statement
		$cmd.Connection = $conn
		$conn.Open()
		$conn.ChangeDatabase("TOOLS")
		[Void]$cmd.ExecuteNonQuery()
		$conn.Close()

		$queue.Rows.Clear()

	}

}

WARNING: Be careful running this script against a production server: I tried it with a reasonably busy server and the CPU/memory load of powershell.exe is non-negligible. On the other hand, the load imposed by the session itself is very low: make sure you run this script from a different machine and not on the database server.

What to do with unused objects

After monitoring for a reasonable amount of time, you will start to notice that some objects are never used and you will probably want to delete them. Don’t!

In my experience, as soon as you delete an object, something that uses it (and you didn’t capture) pops up and fails. In those cases, you want to restore the objects very quickly. I usually move everything to a “trash” schema and have it sitting there for some time (six months/one year) and eventually empty the trash. If somebody asks for a restore, it’s just as simple as an ALTER SCHEMA … TRANSFER statement.
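
In case you want to try it, the whole “trash schema” dance is just a couple of statements (schema and table names here are placeholders):

-- Park the suspect table in a "trash" schema instead of dropping it
CREATE SCHEMA trash AUTHORIZATION dbo;
GO
ALTER SCHEMA trash TRANSFER dbo.SuspectedUnusedTable;
GO

-- If something breaks, putting it back is just as quick
ALTER SCHEMA dbo TRANSFER trash.SuspectedUnusedTable;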

Bottom line

Cleaning up clutter from a database is not simple: hopefully the techniques in this post will help you in the task. Everything would be much simpler if the Extended Events histogram target was more flexible, but please keep in mind that it’s not about the tools: these techniques can help you identify unused objects when no other information is available, but nothing is a good substitute for a correct use of the database. When new tables are added to a database, keep track of the request and take notes about who uses the tables for which purpose: everything will be much easier in the long run.

Speaking at SQLSaturday Pordenone


Next week, on Saturday 28, make sure you don’t miss SQLSaturday Pordenone!

Pordenone is the place where the Italian adventure with SQLSaturday started, more than two years ago. It was the beginning of a journey that brought many SQLSaturdays to Italy, with our most successful one in Parma last November.

Now we’re back in Pordenone to top that result!

We have a fantastic schedule for this event, with a great speaker lineup and great topics for the sessions. Everything is set in the right direction to be a great day of free learning and fun.

I will have two sessions this time:

SQL Server Security in an Insecure World

In this session I will talk about security, with a general introduction to the topic, and then I’ll go straight to demonstrating some of the vulnerabilities that attackers could use to take over your server. Yes, I’ll be demonstrating SQL-Injection attacks: SQL-Injection is still a top security issue, even in 2015. Everyone must be aware of the risks and take action immediately.
I will also describe the security features available in SQL Server to lock down the server as much as possible, but the key concept I will try to drive home is that security is a process, not a feature.
If you want to find out more, join me at 12:00 PM in room S7.

Extending the Data Collector to Monitor SQL Server effortlessly

In this session I will try to promote one of the least used features in SQL Server: the Data Collector. It doesn’t have all the bells and whistles of the expensive monitoring suites, but it does the job pretty well. Things start to be painfully difficult when you try to extend it with additional collection sets, but the good news is that there’s an open-source project that provides a GUI to manage and customize the collection sets. The project is called ExtendedTSQLCollector and it does much more than just adding a GUI to the Data Collector: it also provides two additional collector types to collect data from LOB columns (in case you’re wondering, no – the vanilla Data Collector doesn’t support LOB columns) and Extended Events sessions.
I will also demonstrate a convenient way to centralize and extend the Data Collector reports to create a reporting and alerting solution for free.
Sounds interesting? Join me at 4:30 PM in room S7.

So, what are you waiting for? Register now and join us in Pordenone!

Blame it on Connect


Some weeks ago I blogged about the discouraging signals coming from Connect and my post started a discussion that didn’t go very far. Instead it died quite soon: somebody commented on the post and ranted about his Connect experience. I’m blogging again about Connect, but I don’t want to start a personal war against Microsoft: today I want to look at what happened from a new perspective.

What I find disappointing is a different aspect of the reactions from the SQL Server community, which made me think that maybe it’s not only Connect’s fault.

My post was in the headlines of SQL Server Central and was also included in the weekly links that Brent Ozar sends out with the Brent Ozar Unlimited newsletter, so it got a lot of views that day. Looking at my WordPress stats, I see that thousands of people read my post (to be fair, I can only say that they opened the page, I cannot tell whether they read it or not) and some hundreds of people clicked the link to the original Connect item that started my rant.

Nobody upvoted the item. Yup, nobody.

Ok, very few people love the Data Collector and I rarely see it used in the wild, so, yes: I can understand how nobody cares about a bug in it. But, hey, it’s not my only Connect item that got no love from the community. Here’s another one, involving data corruption when using linked servers. See? Only 9 upvotes.

Here’s another one yet, that involves the setup program. No upvotes except mine.

What’s the point I want to drive? The voting system and the comments are the only way we have to improve the content on Connect. If we disregard the tools we have in our hands, there’s no use in complaining about the feedback system at all.

We need more community engagement

Filing our own items on Connect is not enough: we have to get involved in the platform to make our voice heard in more ways. When we find an item that we’d like to get fixed, we should definitely upvote it. At the same time, when we find items that are poorly described or are related to an issue that can be solved without bothering the support team, we should interact with the OP and ask for clarification or provide an alternative answer. When appropriate, we should also downvote poor questions.

Some popular Q&A sites like StackOverflow have built successful models based on this paradigm, like it or not. Moreover, the “points” system has proved successful at driving user engagement, which is something totally missing from Connect: you file your complaint and never come back.

Some online communities have moderators, who can play a fundamental role in the community. They can flag inappropriate items, edit and format questions and comments. They can also close questions or put them on hold. If part of the problem with Connect is the signal/noise ratio, more power to moderators is a possible answer.

Can PASS help?

In this post, Kevin Kline says that one of the ways PASS could improve itself is by playing a better role in advocacy, telling Microsoft which features we would really like to see in SQL Server vNext and which bugs we really need to get fixed in the product. The idea is that Microsoft would (or at least should) listen more attentively to a whole community of users rather than to single individuals.

It’s a great idea and I think that PASS should really go for it. Unfortunately, something like that will never replace Connect, because Connect is a platform that collects feedback for all Microsoft products, not only for SQL Server. Moreover, how PASS is planning to gather the user feedback is still unclear: would it use a voting system like Connect’s? How would that be different from Connect itself then?

Speed matters

Another thing that I think drives people away from Connect is its dreadful slowness. Connect is slow and nobody uses slow sites. It seems to be getting better lately, but we’re still not there. StackOverflow is probably using a fraction of Microsoft’s hardware and money to run all the StackExchange network at the speed of light. Part of its success is the responsiveness and Connect has a long way to go to catch up.

Bottom line

Connect has its issues, we all know it, but it’s not all Microsoft’s fault. The individual users can do something to improve the quality of the feedback and they definitely should. Everybody can start now! More votes mean more attention, fewer votes mean less love. Simple and straightforward.

On the other hand, the communities can contribute too. How they can contribute is not clear yet, but some communities (like PASS) have lots of people that volunteer and make their voice heard. It would really be a shame if that voice got lost.

Microsoft, please do your part. Users and communities want to contribute: help yourself by helping them and you won’t regret it. Responsiveness is the keyword here: we need a more responsive site and more responsive support engineers.

Who’s up to the challenge?