Next week, on Saturday 28, make sure you don’t miss SQLSaturday Pordenone!
Pordenone is the place where the Italian adventure with SQLSaturday started, more than two years ago. It was the beginning of a journey that brought many SQLSaturdays to Italy, with our most successful one in Parma last November.
Now we’re back in Pordenone to top that result!
We have a fantastic schedule for this event, with a great speaker lineup and great topics for the sessions. Everything is set in the right direction to be a great day of free learning and fun.
I will have two sessions this time:
SQL Server Security in an Insecure World
In this session I will talk about security, with a general introduction to the topic and then I’ll go straight to demonstrate some of the vulnerabilities that attackers could use to take over your server. Yes, I’ll be demonstrating SQL-Injection attacks: SQL-I is still a top security issue, even if we’re in 2015. Everyone must be aware of the risks and take action immediately.
I will also describe the security features available in SQL Server to lock down the server as much as possible, but the key concept I will try to drive is that security is a process, not a feature.
If you want to find out more, join me at 12:00 PM in room S7.
Extending the Data Collector to Monitor SQL Server effortlessly
In this session I will try to promote one of the least used features in SQL Server: the Data Collector. It doesn’t have all the bells and whistles of the expensive monitoring suites, but it does the job pretty well. Things start to be painfully difficult when you try to extend it with additional collection sets, but the good news is that there’s an open-source project that provides a GUI to manage and customize the collection sets. The project is called ExtendedTSQLCollector and it does much more than just adding a GUI to the Data Collector: it also provides two additional collector types to collect data from LOB columns (in case you’re wondering, no – the vanilla Data Collector doesn’t support LOB columns) and Extended Events sessions.
I will also demonstrate a convenient way to centralize and extend the Data Collector reports to create a reporting and alerting solution for free.
Sounds interesting? Join me at 4:30 PM in room S7.
So, what are you waiting for? Register now and join us in Pordenone!
Some weeks ago I blogged about the discouraging signals coming from Connect and my post started a discussion that didn’t go very far. Instead it died quite soon: somebody commented the post and ranted about his Connect experience. I’m blogging again about Connect, but I don’t want to start a personal war against Microsoft: today I want to look at what happened from a new perspective.
What I find disappointing is a different aspect of the reactions from the SQL Server community, which made me think that maybe it’s not only Connect’s fault.
My post was in the headlines of SQL Server Central and was also included in the weekly links that Brent Ozar sends out with the Brent Ozar Unlimited newsletter, so it got a lot of views that day. Looking at my wordpress stats, I see that thousands of people read my post (to be fair, I can only say that they opened the page, I cannot tell whether they read the post or not) and some hundreds of people clicked the link to the original Connect item that started my rant.
Nobody upvoted the item. Yup, nobody.
Ok, very few people love the Data Collector and I rarely see it used in the wild, so, yes: I can understand how nobody cares about a bug in it. But, hey, it’s not my only Connect item that got no love from the community. Here’s another one, involving data corruption when using linked servers. See? Only 9 upvotes.
Here’s another one yet, that involves the setup program. No upvotes except mine.
What’s the point I want to drive? The voting system and the comments are the only way we have to improve the content on Connect. If we disregard the tools we have in our hands, there’s no use in complaining about the feedback system at all.
We need more community engagement
Filing our own items on Connect is not enough: we have to get involved in the platform to make our voice heard in more ways. When we find an item that we’d like to get fixed, we should definitely upvote it. At the same time, when we find items that are poorly described or are related to an issue that can be solved without bothering the support team, we should interact with the OP and ask for clarification or provide an alternative answer. When appropriate, we should also downvote poor questions.
Some popular Q&A sites like StackOverflow have built successful models based on this paradigm, like it or not. Moreover, the “points” system has proved successful at driving user engagement, which is something totally missing from Connect: you file your complaint and never come back.
Some online communities have moderators, who can play a fundamental role in the community. They can flag inappropriate items, edit and format questions and comments. The can also close questions or put them on hold. If part of the problem with Connect is the signal/noise ratio, more power to moderators is a possible answer.
Can PASS help?
In this post, Kevin Kline says that one of the ways that PASS should improve itself could be playing a better role in advocacy, telling Microsoft what are the features we really would like to see in SQL Server vNext and what are the bugs we really need to get fixed in the product. The idea is that Microsoft would (or at least should) listen more attentively to a whole community of users rather than to single individuals.
It’s a great idea and I think that PASS should really go for it. Unfortunately, something like that will never substitute Connect, because it’s a platform to collect feedback for all Microsoft products and not only for SQL Server. Moreover, how PASS is planning to gather the user feedback is still unclear: would it be using a voting system like Connect’s? How would that be different from Connect itself then?
Another thing that I think drives people away from Connect is its dreadful slowness. Connect is slow and nobody uses slow sites. It seems to be getting better lately, but we’re still not there. StackOverflow is probably using a fraction of Microsoft’s hardware and money to run all the StackExchange network at the speed of light. Part of its success is the responsiveness and Connect has a long way to go to catch up.
Connect has its issues, we all know it, but it’s not all Microsoft’s fault. The individual users can do something to improve the quality of the feedback and they definitely should. Everybody can start now! More votes means more attention, less votes means less love. Simple and straightforward.
On the other hand, the communities can contribute too. How they can contribute is not clear yet, but some communities (like PASS) have lots of people that volunteer and make their voice heard. It would really be a shame if that voice got lost.
Microsoft, please do your part. Users and communities want to contribute: help yourself by helping them and you won’t regret it. Responsiveness is the keyword here: we need a more responsive site and more responsive support engineers.
Who’s up to the challenge?
As you probably know, SQL Server allows only one default instance per server. The reason is not actually something special to SQL Server, but it has to do with the way TCP/IP endpoints work.
In fact, a SQL Server default instance is nothing special compared to a named instance: it has a specific instance id (MSSQLSERVER) and listens on a well-known TCP port (1433), but it has no other intrinsic property or feature that makes it different from any other instance.
Let’s look closely to these properties: the instance id is specific to a SQL Server instance and it has to be unique. In this regard, MSSQLSERVER makes no exception. Similarly, a TCP endpoint must be unique and there can be only one socket listening on a specific endpoint.
Nevertheless, I will show you a way to have multiple “default” instances installed on the same server, even if it might look impossible at a first look.
Install two instances of SQL Server
First of all, you need to have two (or more) instances installed on your server. In this example I will use the server “FANGIO” and I will install two named instances: INST01 and INST02.
Here’s what my Configuration Manager looks like once the two instances are ready:
In this case I used two named instances, but it would have worked even if I used a default instance and a named instance. Remember? Default instances are nothing special.
Provision IP addresses
Each SQL Server instance must listen on a different TCP endpoint, but this does not mean that each instance has to listen on a different port: a TCP endpoint is made of an IP address and a port. This means that two instances can listen on the same port, as long as the IP addresses are different.
In this case, you just need to add a new IP address to the server, one for each SQL Server instance that you want to listen on port 1433.
Configure network protocols
Now that you have multiple IP addresses, you just have to tell SQL Server to listen on that specific address, port 1433.
Open the Configuration Manager and enable TCP/IP:
Now open the properties applet and disable “Listen All”:
In the IP Addresses tab, configure the IP address and the port:
In this case I enabled the address 10.0.1.101 for INST01 and I disabled all the remaining addresses. For INST02 I enabled 10.0.1.102.
Now the server has two IP addresses and they both resolve to its network name (FANGIO). In order to let clients connect to the appropriate SQL Server instance, you need to create two separate “A” records in DNS to resolve to each IP address.
In this case I don’t have a DNS server (it’s my home lab) so I will use the hosts file:
Now the example setup looks like this:
When a client connects to the default instance on ASCARI, it is connecting to FANGIO\INST01 instead. Similarly, the default instance on VILLENEUVE corresponds to FANGIO\INST02.
Why would I want to do this?
If you had only default instances in your servers, moving databases around for maintenances, upgrades or consolidations would be just a matter of adding a CNAME to your DNS.
With named instances, the only way to redirect connections to a different server is by using a SQLClient alias. Unfortunately, aliases are client-side settings and have to be deployed to each and every client in order to work. Group policies can deploy aliases to multiple machines at once, but policies are not evaluated immediately, while a DNS entry can propagate very quickly.
Another reason to use this setup is the ability to bypass the SQLBrowser: when a named instance is specified, the client has to contact the SQLBrowser service on port 1434 with a small UDP datagram and receive back the list of instances, along with the port they’re listening on. When the default instance is specified, there is no need to contact the SQLBrowser, because we already know the port it is listening on (it’s 1433, unless it has been changed).
Sometimes the firewall settings for SQLBrowser are tricky to set up, especially with clusters. Another thing I recently discovered is that SQLBrower allows attackers to create huge DDOS attacks using a 440x amplification factor.
Some setup guides recommend that you change the port SQL Server listens on to something different from 1433, which is a well-known port, more likely to be discovered by attackers. I think that an attacker skilled enough to penetrate your server needs much more resistance than just “hiding” your instance to a non-default port. A quick port scan would immediately reveal any SQL Server instance listening on any port, so this is really a moot point in my opinion.
SQL Server allows only one default instance to be installed on a machine, but with a few simple steps every instance can be made a “default” instance. The main advantage of such a setup is the ability to redirect client connections to a database instance with a simple change in the DNS configuration.
A couple of years ago I blogged about a bug on the Data Collector that I couldn’t resolve but with an ugly workaround. At the end of that post, I stated that I wouldn’t have bothered filing the bug on Connect, due to prior discouraging results. Well, despite what I wrote there, I took the time to open a bug on Connect (the item can be found here), which was promptly closed as “won’t fix”.
Nothing new under the sun: “won’t fix” is the most likely answer on Connect, no matter how well you document your issue and how easy is the bug to reproduce. I really am sorry to say that, but it’s a widespread feeling that Connect has become totally pointless, if it ever had a point. The common feeling about Connect is that bugs are usually closed as “won’t fix” or “by design” without any further communication, while suggestions are completely disregarded.
How did we get here? Why is Microsoft spending money on a service that generates frustration on users? Where does this idiosyncrasy come from?
If I had to give Microsoft advice on how to improve Connect, I would focus on one simple point:
One of the things I see over and over on Connect is the lack of communication between users and support engineers. Once the item is closed, it’s closed (with few notable exceptions). You can ask for more information, add details to the item, do anything you can think of, but the engineers will stop responding. Period.
This means that there is no way to steer the engineer’s understanding of the bug: if (s)he read it wrong, (s)he won’t read it again.
I can understand that anybody with a Microsoft account can create bugs on Connect without having to pay for the time spent on the problem by the engineers: this can easily lead to a very low signal/noise rate, which is not sustainable. In other words, the support engineers seem to be flooded by an overwhelming amount of inaccurate submissions, which makes mining for noteworthy bugs an equally overwhelming task.
However, I think that the current workflow for closing bugs is too abrupt and a more reasonable workflow would at least require responding to the first comment received after the item is marked for closure.
How is CSS different?
In this particular case, I decided to conduct a small experiment and I opened the same exact bug with CSS. Needless to say that the outcome was totally different.
The bug was recognized as a bug, but this is not the main point: the biggest difference was the amount and the quality of communication with the support engineer. When you file a bug with CSS, a support engineer is assigned to your case and you can communicate with him/her directly by email. If the engineer needs more information on the case, (s)he will probably call you on the phone and ask for clarification. In our case, we also have a TAM (Technical Account Manager) that gets CC’ed to all emails between us and CSS.
Where does the difference lay? Just one: money.
If you want to contact the CSS, you have to pay for support. If the bug turns out to be a documented behavior instead, you pay for the time spent by the engineers working on it. This is totally absent from Connect, where everyone can file bugs without having to pay attention to what they do: there will be nothing to pay at the end of the day.
Is Connect really pointless?
One thing I discovered with my experiment may surprise you: CSS reads Connect items and if there is one matching your case, they will take it into account. This is really good news in my opinion and sheds a totally new light over Connect.
Another thing I discovered is that there is much more information behind a Connect item than it’s visible to users. When the engineers process items, they produce comments that are attached to the different workflow steps involved in the triage. Unfortunately, this is invisible to the end users, that are left with the minimal information that the engineers decide to share.
However, the important lesson learned from this experiment is that Connect may be frustrating for end users, but it is far from pointless: the information gathered while triaging bugs contributes to the quality of the paid support and, ultimately, to the quality of SQL Server itself. What still is unsatisfactory is the feedback to Connect users, that are getting more and more discouraged to file new items.
An appeal to Microsoft
Dear Microsoft, please, please, please improve the feedback on Connect: more feedback means less frustration for users that submit legitimate and reasonable bugs. Less frustration means more sensible feedback from your users, which in turn helps your CSS and improves the quality of SQL Server. Not everybody can open cases with CSS: this doesn’t mean that they are not contributing positevely to your product (and you know it), so please reward them with a better communication.
I am honoured to join a community of people that I highly respect and have always been my inspiration. The MVPs I had the pleasure to meet are a model to strive for: exceptional technical experts and great community leaders that devote their own time to spread their knowledge. I have never considered myself nearly as good as those exceptional people and receiving this award means that now I have to live up to the overwhelming expectations that it sets.
So, now what?
This award maybe means that I’m on the right track. I will continue to help the community with my contribution, hoping that somebody find it useful in the journey with SQL Server. I will continue to spread whatever I know about SQL Server and all the technologies around it with my blog posts, my articles and my forum answers. I will continue to speak at conferences, SQL Saturdays and technology events around me.
The award opens new possibilities and new ways to contribute and I won’t miss the opportunity to do more!
I am really grateful to those who made it happen, in particular the exceptional people at sqlservercentral.com, where my journey with the SQL Server community began many years ago.
A huge thank you goes also to the Italian #sqlfamily that introduced me to speaking at SQL Server events.
And now, let’s rock this 2015!
Does anybody need another good reason to avoid setting AUTO_CLOSE on a database? Looks like I found one.
Some days ago, all of a sudden, a database started to throw errors along the lines of “The log for database MyDatabase is not available”. The instance was an old 2008 R2 Express (don’t get me started on why an Express Edition is in production…) with some small databases.
The log was definitely there and the database looked online. Actually, I was able to query the tables, but every attempt to update the contents ended up with the “log unavailable” error.
Then I opened the ERRORLOG and found something really interesting: lots and lots of entries similar to “Starting up database MyDatabase” over and over… Does it ring a bell?
Yes, it’s AUTO_CLOSE
Looks like SQL Server closed the database and failed to open it completely, hence the “log unavailable” errors.
What should be done now to bring the database back to normal behaviour? Simply bring the database offline and then back online:
ALTER DATABASE MyDatabase SET OFFLINE; ALTER DATABASE MyDatabase SET ONLINE;
And while we’re at it, let’s disable AUTO_CLOSE:
ALTER DATABASE MyDatabase SET AUTO_CLOSE OFF;
See? Best practices are not for losers!
Some weeks ago I had to wipe my machine and reinstall everything from scratch, SQL Server included.
For some reason that I still don’t understand, SQL Server Management Studio installed fine, but I couldn’t install Books Online from the online help repository. Unfortunately, installing from offline is not an option with SQL Server 2014, because the installation media doesn’t include the Language Reference documentation.
The issue is well known: Aaron Bertrand blogged about it back in april when SQL Server 2014 came out and he updated his post in august when the documentation was finally completely published. He also blogged about it at SQLSentry.
However, I couldn’t get that method to work: the Help Library Manager kept firing errors as soon as I clicked the “Install from Online” link. The error message was “An exception has occurred. See the event log for details.”
Needless to say that the event log had no interesting information to add.
If you are experiencing the same issue, here is a method to install the language reference from disk without downloading the help content from the Help Library Manager:
1 . Open a web browser and point it to the following url: http://services.mtps.microsoft.com/ServiceAPI/products/dd433097/dn632688/books/dn754848/en-us
2. Download the individual .cab files listed in that page to a location in your disk (e.g. c:\temp\langref\)
3. Create a text file name HelpContentSetup.msha in the same folder as the .cab files and paste the following html:
<html xmlns="http://www.w3.org/1999/xhtml"> <head /> <body class="vendor-book"> <div class="details"> <span class="vendor">Microsoft</span> <span class="locale">en-us</span> <span class="product">SQL Server 2014</span> <span class="name">Microsoft SQL Server Language Reference</span> </div> <div class="package-list"> <div class="package"> <span class="name">SQL_Server_2014_Books_Online_B4164_SQL_120_en-us_1</span> <span class="deployed">False</span> <a class="current-link" href="sql_server_2014_books_online_b4164_sql_120_en-us_1(0b10b277-ad40-ef9d-0d66-22173fb3e568).cab">sql_server_2014_books_online_b4164_sql_120_en-us_1(0b10b277-ad40-ef9d-0d66-22173fb3e568).cab</a> </div> <div class="package"> <span class="name">SQL_Server_2014_Microsoft_SQL_Server_Language_Reference_B4246_SQL_120_en-us_1</span> <span class="deployed">False</span> <a class="current-link" href="sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_1(5c1ad741-d0e3-a4a8-d9c0-057e2ddfa6e1).cab">sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_1(5c1ad741-d0e3-a4a8-d9c0-057e2ddfa6e1).cab</a> </div> <div class="package"> <span class="name">SQL_Server_2014_Microsoft_SQL_Server_Language_Reference_B4246_SQL_120_en-us_2</span> <span class="deployed">False</span> <a class="current-link" href="sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_2(24815f90-9e36-db87-887b-cf20727e5e73).cab">sql_server_2014_microsoft_sql_server_language_reference_b4246_sql_120_en-us_2(24815f90-9e36-db87-887b-cf20727e5e73).cab</a> </div> </div> </body> </html>
4 . Open the Help Library Manager and select “Install content from disk”
5. Browse to the .msha you just created and click Next
6. The SQL Server 2014 node will appear. Click the Add link
7. Click the Update button and let the installation start
8. Installation will start and process the cab files
9. Installation finished!
9. To check whether everything is fine, click on the “remove content” link and you should see the documentation.
Done! It was easy after all, wasn’t it?
Lately I spent some time evaluating some monitoring tools for SQL Server and one thing that struck me very negatively is how none of them (to date) has been reporting database free space correctly.
I was actively evaluating one of those tools when one of my production databases ran out of space without any sort of warning.
I was so upset that I decided to code my own monitoring script.
Some things to take into account:
- Hard set limits for file growth have to be considered: a drive with lots of space is useless if the database file cannot grow and take it.
- If fixed growth is used, there must be enough space in the drive to accomodate the growth amount you set.
- If percent growth is used, you have to calculate recursively how much your database file will grow before taking all the space in the drive
- Some scripts found in blogs and books don’t account for mount points. Use
sys.dm_os_volume_statsto include mount points in your calculation (unless you’re running SQL Server versions prior to 2012).
- Database free space alone is not enough. NTFS performance start degrading when the drive free space drops below 20%. Make sure you’re monitoring that as well.
- 20% of a huge database can be lots of space. You can change that threshold to whatever you find appropriate (for instance, less than 20% AND less than 20 GB)
That said, here is my script, I hope you find it useful.
-- create a temporary table to hold data from sys.master_files IF OBJECT_ID('tempdb..#masterfiles') IS NOT NULL DROP TABLE #masterfiles; CREATE TABLE #masterfiles ( database_id int, type_desc varchar(10), name sysname, physical_name varchar(255), size_mb int, max_size_mb int, growth int, is_percent_growth bit, data_space_id int, data_space_name nvarchar(128) NULL, drive nvarchar(512), mbfree int ); -- extract file information from sys.master_files -- and correlate each file to its logical volume INSERT INTO #masterfiles SELECT mf.database_id ,type_desc ,name ,physical_name ,size_mb = size / 128 ,max_size_mb = CASE WHEN max_size = 268435456 AND type_desc = 'LOG' THEN -1 ELSE CASE WHEN max_size = -1 THEN -1 ELSE max_size / 128 END END ,mf.growth ,mf.is_percent_growth ,mf.data_space_id ,NULL ,d.volume_mount_point ,d.available_bytes / 1024 / 1024 FROM sys.master_files AS mf CROSS APPLY sys.dm_os_volume_stats(database_id, file_id) AS d; -- add an "emptyspace" column to hold empty space for each file ALTER TABLE #masterfiles ADD emptyspace_mb int NULL; -- iterate through all databases to calculate empty space for its files DECLARE @name sysname; DECLARE c CURSOR FORWARD_ONLY READ_ONLY STATIC LOCAL FOR SELECT name FROM sys.databases WHERE state_desc = 'ONLINE' OPEN c FETCH NEXT FROM c INTO @name WHILE @@FETCH_STATUS = 0 BEGIN DECLARE @sql nvarchar(max) DECLARE @statement nvarchar(max) SET @sql = ' UPDATE mf SET emptyspace_mb = size_mb - FILEPROPERTY(name,''SpaceUsed'') / 128, data_space_name = ISNULL( (SELECT name FROM sys.data_spaces WHERE data_space_id = mf.data_space_id), ''LOG'' ) FROM #masterfiles AS mf WHERE database_id = DB_ID(); ' SET @statement = 'EXEC ' + QUOTENAME(@name) + '.sys.sp_executesql @sql' EXEC sp_executesql @statement, N'@sql nvarchar(max)', @sql FETCH NEXT FROM c INTO @name END CLOSE c DEALLOCATE c -- create a scalar function to simulate the growth of the database in the drive's available space IF OBJECT_ID('tempdb..calculateAvailableSpace') IS NOT NULL EXEC tempdb.sys.sp_executesql N'DROP FUNCTION calculateAvailableSpace' EXEC tempdb.sys.sp_executesql N' CREATE FUNCTION calculateAvailableSpace( @diskFreeSpaceMB float, @currentSizeMB float, @growth float, @is_percent_growth bit ) RETURNS int AS BEGIN IF @currentSizeMB = 0 SET @currentSizeMB = 1 DECLARE @returnValue int = 0 IF @is_percent_growth = 0 BEGIN SET @returnValue = (@growth /128) * CAST((@diskFreeSpaceMB / (@growth / 128)) AS int) END ELSE BEGIN DECLARE @prevsize AS float = 0 DECLARE @calcsize AS float = @currentSizeMB WHILE @calcsize < @diskFreeSpaceMB BEGIN SET @prevsize = @calcsize SET @calcsize = @calcsize + @calcsize * @growth / 100.0 END SET @returnValue = @prevsize - @currentSizeMB IF @returnValue < 0 SET @returnValue = 0 END RETURN @returnValue END ' -- report database filegroups with less than 20% available space ;WITH masterfiles AS ( SELECT * ,available_space = CASE mf.max_size_mb WHEN -1 THEN tempdb.dbo.calculateAvailableSpace(mbfree, size_mb, growth, is_percent_growth) ELSE max_size_mb - size_mb END + emptyspace_mb FROM #masterfiles AS mf ), spaces AS ( SELECT DB_NAME(database_id) AS database_name ,data_space_name ,type_desc ,SUM(size_mb) AS size_mb ,SUM(available_space) AS available_space_mb ,SUM(available_space) * 100 / CASE SUM(size_mb) WHEN 0 THEN 1 ELSE SUM(size_mb) END AS available_space_percent FROM masterfiles GROUP BY DB_NAME(database_id) ,data_space_name ,type_desc ) SELECT * FROM spaces WHERE available_space_percent < 20 ORDER BY available_space_percent ASC IF OBJECT_ID('tempdb..#masterfiles') IS NOT NULL DROP TABLE #masterfiles; IF OBJECT_ID('tempdb..calculateAvailableSpace') IS NOT NULL EXEC tempdb.sys.sp_executesql N'DROP FUNCTION calculateAvailableSpace'
I am sure that there are smarter scripts around that calculate it correctly and I am also sure that there are other ways to obtain the same results (PowerShell, to name one). The important thing is that your script takes every important aspect into account and warns you immediately when the database space drops below your threshold, not when the available space is over.
Last time it happened to me it was a late saturday night and, while I really love my job, I can come up with many better ways to spend my saturday night.
I’m pretty sure you do as well.
I haven’t been blogging much lately, actually I haven’t been blogging at all in the last 4 months. The reason behind is I have been putting all my efforts in a new project I started recently, which absorbed all my attention and spare time.
I am proud to announce that my project is now live and available to everyone for download.
The project name is ExtendedTSQLCollector and you can find it at http://extendedtsqlcollector.codeplex.com. As you may have already guessed, it’s a bridge between two technologies that were not meant to work together, that could instead bring great advantages when combined: Extended Events and Data Collector.
ExtendedTSQLCollector is a set of two Collector Types built to overcome some of the limitations found in the built-in collector types and extend their functionality to include the ability to collect data from XE sessions.
The first Collector Type is the “Extended T-SQL Query” collector type, which was my initial goal when I started the project. If you have had the chance to play with the built-in “Generic T-SQL Query” collector type, you may have noticed that not all datatypes are supported. For instance, it’s impossible to collect data from XML or varchar(max) columns. This is due to the intermediate format used by this collector type: the SSIS raw files.
The “Extended T-SQL Query” collector type uses a different intermediate format, which allows collecting data of any data type. This is particularly useful, because SQL Server exposes lots of information in XML format (just think of the execution plans!) and you no longer need to code custom SSIS packages to collect that data.
The second Collector Type is the “Extended XE Reader” collector type, which takes advantage of the Extended Events streaming APIs to collect data from an Extended Events session, without the need to specify additional targets such as .xel files or ring buffers. This means no file system bloat due to .xel rollover files and no memory consumption for additional ring buffers: all the events are read directly from the session and processed in near real-time.
In addition to the filter predicates defined in the XE session, you can add more filter predicates on the data to collect and upload to the MDW and decide which columns (fields and actions) to collect. The collector will take care of creating the target table in your MDW database and upload all the data that satisfies the filter predicates.
The near real-time behavior of this collector type allowed me to include an additional feature to the mix: the ability to fire alerts in response to Extended Events. The current release (1.5) allows firing email alerts when the events are captured, with additional filter predicates and the ability to include event fields and actions in the email body. You can find more information on XE alerts in the documentation.
Here is an example of the email alerts generated by the XEReader collector type for the blocked_process event:
Another part of the project is the CollectionSet Manager, a GUI to install the collector types to the target servers and configure collection sets and collection items. I think that one of the reasons why the Data Collector is very underutilized by DBAs is the lack of a Graphical UI. Besides the features specific to the ExtendedTSQLCollector, such as installing the collector type, this small utility aims at providing the features missing in the SSMS Data Collector UI. This part of the project is still at an early stage, but I am planning to release it in the next few months.
My journey through the ins and outs of the Data Collector allowed me to understand deeply how it works and how to set it up and troubleshoot it. Now I am planning to start a blog series on this topic, from the basics to the advanced features. Stay tuned :-)
I don’t want to go into deep details on the setup and configuration of this small project: I just wanted to ignite your curiosity and make you rush to codeplex to download your copy of ExtendedTSQLCollector.
What are you waiting for?
Some days ago I was talking with my friend Davide Mauri about the uniquifier that SQL Server adds to clustered indexes when they are not declared as UNIQUE.
We were not completely sure whether this behaviour applied to duplicate keys only or to all keys, even when unique.
The best way to discover the truth is a script to test what happens behind the scenes:
-- ============================================= -- Author: Gianluca Sartori - @spaghettidba -- Create date: 2014-03-15 -- Description: Checks whether the UNIQUIFIER column -- is added to a column only on -- duplicate clustering keys or all -- keys, regardless of uniqueness -- ============================================= USE tempdb GO IF OBJECT_ID('sizeOfMyTable') IS NOT NULL DROP VIEW sizeOfMyTable; GO -- Create a view to query table size information -- Not very elegant, but saves a lot of typing CREATE VIEW sizeOfMyTable AS SELECT OBJECT_NAME(si.object_id) AS table_name, si.name AS index_name, SUM(total_pages) AS total_pages, SUM(used_pages) AS used_pages, SUM(data_pages) AS data_pages FROM sys.partitions AS p INNER JOIN sys.allocation_units AS AU ON P.hobt_id = AU.container_id INNER JOIN sys.indexes AS si ON si.index_id = p.index_id AND si.object_id = p.object_id WHERE si.object_id = OBJECT_ID('#testUniquifier') GROUP BY OBJECT_NAME(si.object_id), si.name GO IF OBJECT_ID('#testUniquifier') IS NOT NULL DROP TABLE #testUniquifier; -- Create a test table CREATE TABLE #testUniquifier ( i int NOT NULL ) -- Results table: will receive table size -- in different scenarios DECLARE @results TABLE( description varchar(500), table_name sysname, index_name sysname, total_pages int, used_pages int, data_pages int ); -- INSERTS 100K UNIQUE VALUES INSERT INTO #testUniquifier SELECT TOP(100000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM sys.all_columns AS AC CROSS JOIN sys.all_columns AS AC1; -- ----------------------------------------------------------------- -- TEST1: CREATES A UNIQUE CLUSTERED INDEX (NO UNIQUIFIER) -- ----------------------------------------------------------------- CREATE UNIQUE CLUSTERED INDEX UK_test ON #testUniquifier(i); INSERT @results SELECT 'Unique clustered index' AS description, * FROM sizeOfMyTable; DROP INDEX UK_test ON #testUniquifier -- ----------------------------------------------------------------- -- TEST2: CREATES A NON-UNIQUE CLUSTERED INDEX -- NO DUPLICATES ARE PRESENT YET -- ----------------------------------------------------------------- CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i) INSERT @results SELECT 'Non-Unique clustered index, no duplicates' AS description, * FROM sizeOfMyTable DROP INDEX IX_test ON #testUniquifier -- ----------------------------------------------------------------- -- TEST3: CREATES A NON-UNIQUE CLUSTERED INDEX -- 10000 DUPLICATE VALUES ARE PRESENT -- ----------------------------------------------------------------- UPDATE TOP(10000) #testUniquifier SET i = 1 CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i) INSERT @results SELECT 'Non-Unique clustered index, some duplicates' AS description, * FROM sizeOfMyTable DROP INDEX IX_test ON #testUniquifier -- ----------------------------------------------------------------- -- TEST4: CREATES A NON-UNIQUE CLUSTERED INDEX -- ALL ROWS CONTAIN THE SAME VALUE (1) -- ----------------------------------------------------------------- UPDATE #testUniquifier SET i = 1 CREATE CLUSTERED INDEX IX_test ON #testUniquifier(i) INSERT @results SELECT 'Non-Unique clustered index, all duplicates' AS description, * FROM sizeOfMyTable -- ----------------------------------------------------------------- -- Display results -- ----------------------------------------------------------------- SELECT * FROM @results;
As you can see, the uniquifier is added only to the keys that are duplicated:
Another way to discover the same results would be looking at the output of DBCC PAGE().
Looking at the text output of DBCC PAGE, uniquifiers are displayed as 0 (zero) when the values are not set, but the values are actually missing from the page.
This becomes even clearer when using DBCC PAGE WITH TABLERESULTS:
IF OBJECT_ID('tempdb..#formatteddata') IS NOT NULL DROP TABLE #formatteddata; SELECT *, ROWNUM = ROW_NUMBER() OVER (ORDER BY page_id, slot_id) INTO #formatteddata FROM #testUniquifier CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%); IF OBJECT_ID('tempdb..#dbccpage') IS NOT NULL DROP TABLE #dbccpage; CREATE TABLE #dbccpage ( page_id int, ParentObject varchar(128), Object varchar(128), Field varchar(128), value varchar(4000), Slot AS SUBSTRING(Object, NULLIF(CHARINDEX('Slot ',Object,1),0) + 5, ISNULL(NULLIF(CHARINDEX(' ',Object,6),0),0) - 5) ) DECLARE @current_page_id int; DECLARE pages CURSOR STATIC LOCAL FORWARD_ONLY READ_ONLY FOR SELECT DISTINCT page_id FROM #formatteddata OPEN pages FETCH NEXT FROM pages INTO @current_page_id WHILE @@FETCH_STATUS = 0 BEGIN INSERT INTO #dbccpage (ParentObject, Object, Field, value) EXEC sp_executesql N'DBCC PAGE (2, 1, @pageid, 3) WITH TABLERESULTS;', N'@pageid int', @current_page_id UPDATE #dbccpage SET page_id = @current_page_id WHERE page_id IS NULL FETCH NEXT FROM pages INTO @current_page_id END CLOSE pages; DEALLOCATE pages; WITH PageData AS ( SELECT page_id, slot, field, value FROM #dbccpage WHERE field IN ('i', 'UNIQUIFIER') ), Uniquifiers AS ( SELECT * FROM PageData PIVOT (MAX(value) FOR field IN ([i], [UNIQUIFIER])) AS pvt ), sourceData AS ( SELECT * FROM #formatteddata ) SELECT src.ROWNUM, src.i, src.page_id, src.slot_id, UNIQUIFIER FROM sourceData AS src LEFT JOIN Uniquifiers AS unq ON src.slot_id = unq.slot AND src.page_id = unq.page_id ORDER BY ROWNUM;
If you run the code in the different situations outlined before (unique clustered index, non-unique clustered index with or without duplicate keys) you will find the uniquifiers associated with each duplicate key and you will also notice that no uniquifier is generated for the keys that are unique.