Brent Ozar recently published the feedback he got on his abstract submissions for PASS Summit 2016 and, as he often does, started a trend which some others followed. I think the feedback is interesting and useful for speakers that plan to improve their own submissions for the years to come, so I decided I will publish my feedback as well.
I did not get selected and, obviously, I would be happier if I got selected instead. This doesn’t mean I have problems with the selection process or the reviewers that implemented it. Instead I’m grateful for the enormous amount of work they did for the community, so a huge thank you goes to the committee.
Every year I see people publicly complain about the selection process or getting mad about being rejected. Folks, get over it: organizers have the right to choose whichever sessions they find more suitable for the Summit. It’s their responsibility to put together the best possible agenda and that doesn’t necessarily include your session.
That said, this is my feedback:
Responding to Extended Events in near real-time
Not Accepted: Higher rated session selected.
Category: General Session (75 minutes)
Track: Enterprise Database Administration & Deployment
Topic: Performance Monitoring / Tuning / Extended Events / Waits
Abstract: Extended Events provide deep insight into SQL Server’s behavior and allow us to gather information not available by other means. However, compared to other technologies such as SQL Trace and Event Notifications, a way to react to the events as soon as they happen seems to be lacking.
In this session we will see how the Extended Events streaming API can be used to process events in a near real-time fashion. We will demonstrate how this technology enables new possibilities to solve real world problems, such as capturing and notifying deadlocks or blocking sessions.
Prerequisites: Extended Events basics, C# basics, Powershell basics. The amount of coding required is in line with the average DBA skills.
Goal1: Introduce Extended Events and compare them with other monitoring technologies available in SQL Server
Goal2: Introduce the Extended Events streaming API and demonstrate how it can be used to process the events as soon as they occur, without shredding XML.
Goal3: Demonstrate how the Extended Events streaming API can be used to solve real world problems, such as building a monitoring and alerting solution for deadlocks and blocked sessions.
Needs more specifics in abstract.
A : 4 The abstract digs a little deeper on what’s being offered by XEs, and what needs to be done to efficiently capture events in near real-time without bogging down the system
T : 5 The topic matches with the goals – and many people hesitate capturing large amounts of .xel files fearing shredding the xml. This session offers a new approach to capturing the same using streaming APIs, with C#/PowerShell
S : 5 Prereqs are a little steep, but given the outcome – justified, and will interest people.
Level looks ok for the prereqs and goals. Topic is ok. Abstract is good overall. Only issue is the second sentence reads a little oddly.
Abstract: Well written abstract with strong supportive goals. Topic: Great topic! Subjective: This sounds like a very interesting session. Taking xEvents to the next level. I would attend this session. I think it will be a large draw!
Feedback on comments:
“Needs more specifics in abstract”. What kind of specifics is needed in the abstract? A clue would be helpful here.
A/T/S: what do these letters stand for? I suppose they mean Abstract, Topic, Subjective. What do the numbers stand for? A mark maybe? Again, no explanation.
“The second sentence reads a little oddly”. I really don’t understand grammar or language comments. It’s not a conference on the English language and non-native speakers are already disadvantaged enough without being constantly reminded that their language skills can’t compare to Shakespeare’s. Speakers names are stripped away from the sessions, so the reviewers may not be aware that the session comes from a foreigner. If the topic is good and fits what the organizers are looking for, minor grammar/language mistakes can be fixed. BTW, one of the reviewers told me that when multiple sessions are on a tie, language is the tie breaker. Questionable, but understandable.
The feedback itself is not very useful for improving this submission for next year.
SQL Server Infernals: Worst Practices in Action
Not Accepted: Other sessions selected based on building a balanced program for track coverage, speaker coverage, topic coverage, and session rating.
Category: General Session (75 minutes)
Track: Enterprise Database Administration & Deployment
Topic: Internals: Storage Engine / Query Engine / Compression
Abstract: Let’s face it: Best Practices are too many to really know them all and choose which ones should be applied first. Does your telephone ring all the time? Do your users ask for that “quick report” that instead takes ages and keeps changing every time you think it’s done?
Have you ever thought that in dire times avoiding Worst Practices could be a good starting point and you can leave fine tuning for a better future? If the answer is “yes”, then this session is for you: we will discover together how not to torture a SQL Server instance and we will see how to avoid making choices that in the long run could turn out to be not as smart as they looked initially.
Prerequisites: Basic database design skills, basic development concepts, basic database administration skills.
Goal1: Demonstrate how bad Database Design decisions (irresponsible denormalization, EAV, bad data types, wrong or missing primary keys) can haunt a project through its whole lifetime.
Goal2: Illustrate how bad development practices (SQL Injection, RBAR, poor or no testing) can hurt performance, put security at risk and pose serious threats to the success of our projects.
Goal3: Enumerate and explain the worst installation and administration practices for SQL Server instances, offering the correct alternatives. Topics covered: HW choice, OS policies, security, ongoing administration, monitoring and tuning.
Topic: I like goals
Subjective rating: interesting, but level too low
Abstract: The outline and details of this abstract are well written
Topic: This is a great topic
Subjective: I may attend this session
The outline does not seem to clearly describe the contents of the presentation. The level of detail seems low – more detail might help attendees decide on value of attending. The title may not attract the appropriate attendees – it seems a little vague.
Abstract: Abstract is a little muddled. Goals are clearly laid out. Goals seem to contain the entire gamut of SQL Server. Perhaps a more focused area of SQL Server would help.
Topic: Title is cute.
Subjective: Could be a fun session.
Feedback on comments:
“Level too low”. Level is intentionally low: it’s meant as an introductory session. What to avoid is exactly that: an introductory topic for beginners. Have you ever noticed that basic introductory sessions pack the room? Should we take it as a clue that attendees want 100-level sessions? Apparently not.
“The outline does not seem to clearly describe the contents”. True, I agree. Next time I submit this session I will include in the abstract more details about the contents. It seems that including those details in the goals is not enough.
“Goals seem to contain the entire gamut of SQL Server”. True. I don’t see how this is a bad thing for an introductory session.
This feedback is much more useful than the other ones.
The shape of your Workload: Benchmarking and Baselining
Not Accepted: Higher rated session selected.
Category: General Session (75 minutes)
Track: Enterprise Database Administration & Deployment
Topic: Performance Monitoring / Tuning / Extended Events / Waits
Abstract: The key to optimizing SQL Server performance is to establish a performance baseline and thoroughly analyze the workload on the server. Collecting a baseline is not enough: it is important to analyze the workload in order to intervene effectively exactly where performance issues lie.
In this session we will describe the techniques and tools to analyze SQL Server performance and we will introduce benchmarking techniques that allow us to rate our tuning efforts. We will also introduce some tools included in SQL Server 2014, such as the Distributed Replay, and several third-party applications that come at little or no cost, but provide the highest benefit.
Prerequisites: Basic SQL Server performance tuning techniques (DMVs, Performance Counters), basic monitoring techniques (SQL Trace, Extended Events).
Goal1: Introduce SQL Server performance analysis tools (DMVs, Performance Counters, Data Collector) and demonstrate how to use them to collect a baseline.
Goal2: Introduce workload analysis techniques, using SQL Server and third party tools, including cache analysis queries, RML Utilities and ClearTrace.
Goal3: Demonstrate benchmarking techniques which will allow us to compare performance before and after applying tuning measures, using RML utilities, Distributed Replay and Qure Analyzer.
Needs more specifics in abstract.
A : 5 The abstract clearly spells out the need for baseling, and benchmarking, to identify performance workload.
T : 5 The topic maps well with the session goals – and makes use of the Dist Replay Controller/Client to replay the prod workload on lower environments, and making use of RML utils and ClearTrace
S : 5 The prereqs are not steep, and the session offers a lot to replay prod workload on lower envs.
Topic is a good and relevant choice that appeals to a distinct attendee pool. Level appears good for the prereqs and goals.
Abstract is ok. Some word choices make the sentences difficult to follow (ie. ‘intervene effectively exactly’).
Abstract: Well written abstract with strong, supportive goals. Enough information is provided from which an attendee can make an informed decision on whether or not to attend this session.
Topic: Excellent topic. Likely a 200 level session, not 300.
Subjective: Great session which should draw in attendees. This is a skill every database professional needs.
Feedback on comments:
“Needs more specifics in abstract” and “enough information is provided” in the same feedback. Not very helpful.
“Some word choices make the sentences difficult to follow”. I’ve already said what I think about language comments.
Reading the comments, you would think it made the cut, but it didn’t. What can I say? Competition is tough: there are lots of great speakers with lots of great sessions.
“Likely a 200 level session, not 300”. I disagree: I’ve never seen anyone less than experienced fiddling with RML Utilities or distributed replay.
The comments I received this time are less helpful than the ones I got last year on the same exact sessions. Yep: I submitted the same sessions last year and I got very useful comments that I used to improve my submissions this year. Unfortunately, it will be hard to improve next year.
The level discussion is completely subjective. I was under the impression that here in Europe we tend to rate sessions lower than Americans do, so a 200 session in Europe could easily be a 300 session in USA. Maybe my impression was wrong. However, it’s interesting to see how the session that I proposed with level 100 was deemed to be underrated and the one that I proposed with level 300 was deemed to be overrated. Levels are part of the session: the speaker chooses which level to go for and I would have a really hard time as a reviewer arguing against that decision without having seen the session.
Regarding the reason for rejection, I am not completely sure it can be helpful for improving next year’s submissions. For instance, my session on Extended Events got “Higher rated session selected” as the reason for rejection, but looking at the schedule there is no other session on Extended Events (which is quite surprising) and no other session on monitoring (which is even more surprising and maybe a bit disappointing). This means that the higher rated session does not have to be on the same topic, which makes the other reason for rejection (Other sessions selected based on building a balanced program for track coverage, speaker coverage, topic coverage, and session rating) difficult to decipher.
- You’re selling the abstract, not the session
The reviewers are evaluating your abstract, not your session. All your efforts should be pushed towards perfecting your abstract. Craft it as your masterpiece. Start well in advance, don’t wait for the last minute.
- Take as many shots as you can
Looking at the selection process from the outside it’s really hard to tell how it works, so submitting multiple abstracts can increase your chances of scoring a session. Maybe that session that you don’t like is exactly what the committee is looking for. Again, you’re selling only the abstract and if you get selected you have several months to improve the session.
- There’s a review service offered by PASS. Use it.
There is no guarantee that it is going to be the best feedback you will get, but it’s some feedback at least.
- You have friends in your #sqlfamily: ask for advice.
If you don’t trust the feedback that you get from the PASS review service, trust your friends. If you’re submitting for PASS Summit, chances are that you already have spoken at SQLSaturdays or other community events (if not, maybe you should think twice before submitting: PASS Summit is not for inexperienced speakers). In this case, you got in touch with the community leaders in your chapter, who probably keep scoring sessions at Summit year after year: ask them to review your sessions. I am sure that they will have precious suggestions for you.
- If you’re not an English native speaker, ask a native speaker to proofread it
Sure, your English is great. Sure, the reviewers have something to say about native speakers’ English as well. Sure, language should not be the focus of the reviewers. However, see lesson learned #1.
- Speaker/Abstract separation is a joke
If you read Brent’s feedback, one thing emerges quite clearly: in many cases, the reviewers know exactly whose abstract they are reviewing. This might be because they already have seen the session somewhere else or because the abstract contains some “distinctive features” (sp_AskXXX anyone?). This means that some speakers have an advantage over the average Joe, but rightly so: those speakers are awesome and well known for being awesome. I’m not saying that the “Top Guns” can submit their grocery list: what I’m saying is that they are not selling only their abstracts. Now that Brent has published his abstracts, I wonder what would happen next year if somebody submitted one of his sessions… 🙂
Being selected for speaking at the PASS Summit is damn hard. I hope that sharing my feedback will help you improving your submission next year. I also hope that you will find my advice useful, if you dare accepting advice from somebody who never managed to score a session himself.
The next few months will be crazy for me. I will be travelling a lot around Europe, speaking about SQL Server topics.
Here is where you will find me in the upcoming weeks:
- May 4: SQLNexus
SQLNexus is a new conference in Copenhagen and will be the official launch event in Denmark for SQL Server 2016.
I will speak about “Responding to Extended Events in near Real-Time”.
- May 5: SQLBits
SQLBits doesn’t need to be introduced. It’s the biggest SQL Server event in Europe and it will also be the official SQL Server 2016 launch event in UK.
Again, I will speak about “Responding to Extended Events in near Real-Time”.
- May 16: NTK
NTK is a famous conference in Slovenia, which has been running for many years now. This year it will be held in Portorosz, near the Italian border.
It’s going to be fun: I will speak about “Benchmarking, Baselining and Workload Analysis”.
- June 14: INSIDE-SQL
INSIDE-SQL is also a new event, run this year for the first time. It’s an event focused on in-depth concepts, delivered in 75 minutes sessions, with long breaks between sessions, so that attendees can go to the next session without rushing and can hang out with the speakers and ask questions.
I will deliver two sessions:
“Responding to Extended Events in near Real-Time”
“Advanced T-SQL Techniques”
- June 24: SQL Saturday Paris
Free SQL Server training from the 40th floor of the Tour Montparnasse in Paris.What else can I say?
I will be speaking about “New Security Features in SQL Server 2016”
- September 10: SQL Saturday Cambridge
One of the coolest SQLSats around!
This time I’ll be speaking about “SQL Server Infernals”, AKA “how to torture your SQL Server instances to death with worst practices”. It’s going to be fun!
I hope to see you at these super cool events!
This will be no surprise to those who have been working with SQL Server for a long time, but it can be puzzling at first and actually I was a bit confused myself when I stumbled upon this behavior for the first time.
SQL Server treats windows users in a special way, a way that could lead us to some interesting observations.
First of all, we need a test database and a couple of windows principals to perform our tests:
1. In a command prompt, create a windows group named ‘testGroup’
net localgroup testGroup /ADD
2. In a command prompt, create a windows user named ‘testUser’ and add it to the group
net user testUser "testUser" /ADD net localgroup testGroup testUser /ADD
3. In SSMS, create a test database:
CREATE DATABASE testWindowsUser;
Now that everything is set up, we can start our investigation.
You can create a database user for a Windows user with no corresponding login
When dealing with windows users, you don’t need to create a login in SQL Server in order to create a database user, but you can create it directly:
USE testWindowsUser; GO CREATE USER [XPS13-SQLC\testUser]; GO
We just created a user in the database, without creating a login first and without having to add “WITHOUT LOGIN” to the CREATE USER statement. If you search for a login with the same SID in sys.logins, nothing will turn up:
SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%testUser';
login_name user_name default_schema_name ----------- -------------------- -------------------- NULL XPS13-SQLC\testUser dbo
It is interesting to note that the default schema for the user, if you don’t specify one, will be “dbo”.
At this point, the Windows user cannot log in to SQL Server, but the user can be impersonated to perform operations against the databases, in the same exact way as with any other user without login.
Before we proceed with the investigation, let’s clean up our mess:
DROP USER [XPS13-SQLC\testUser];
You can create a database user for a Windows group with no corresponding login
If we try to do the same thing with a Windows group, we get even more interesting findings.
USE testWindowsUser; GO CREATE USER [XPS13-SQLC\testGroup] GO
Again, we did not have to specify an existing login name and we did not have to add “WITHOUT LOGIN”. Looking at the user data in sys.database_principals and sys.logins shows again that no login is associated with this user, but this time we can see that no default schema was set.
SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%testGroup';
login_name user_name default_schema_name ----------- --------------------- -------------------- NULL XPS13-SQLC\testGroup NULL
As common sense suggests, we cannot impersonate a database user that corresponds to a Windows group. If we try that, we are met with an error message.
EXECUTE AS USER = 'XPS13-SQLC\testGroup'; -- '
Msg 15517, Level 16, State 1, Line 85 Cannot execute as the database principal because the principal "XPS13-SQLC\testGroup" does not exist, this type of principal cannot be impersonated, or you do not have permission.
Windows users are granted access through groups in surprising ways
Now the fun begins. Turns out that you can impersonate a Windows user that is not a user in the database, as long as the Windows user is a member of the group.
Let’s grant some permissions to the group and see what happens:
ALTER ROLE db_owner ADD MEMBER [XPS13-SQLC\testGroup]; GO EXECUTE AS USER = 'XPS13-SQLC\testUser'; -- ' EXEC('CREATE VIEW testView AS SELECT 1 AS one'); REVERT; GO
Surprisingly enough, we were able to impersonate a database principal that doesn’t exist at all. In fact we dropped it right before we started to play with the groups, right? Let’s check again the database principals:
SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%test%'; GO
login_name user_name default_schema_name ----------- --------------------- -------------------- NULL XPS13-SQLC\testGroup NULL NULL XPS13-SQLC\testUser XPS13-SQLC\testUser
Wait, what? Who created that user? Looks like SQL Server creates the database user automatically when it needs to impersonate one of the group members that are not created in the database yet.
Another interesting fact is that the default schema of the newly created user matches the user name. If we don’t like that, we can change it afterwards, but wouldn’t it be cool if we could simply change that on the Windows group and let it propagate to the users created during this process? Let’s try and see if this is possible.
-- First let's drop the user and dependent objects DROP VIEW [XPS13-SQLC\testUser].testView; DROP SCHEMA [XPS13-SQLC\testUser]; DROP USER [XPS13-SQLC\testUser]; GO -- Then let's change the default schema for the group ALTER USER [XPS13-SQLC\testGroup] WITH default_schema = dbo; GO -- Let's re-create the view impersonating the user EXECUTE AS USER = 'XPS13-SQLC\testUser'; -- ' EXEC('CREATE VIEW testView AS SELECT 1 AS One'); REVERT GO -- Check if the view is there SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name, name FROM sys.views WHERE name = 'testView'; GO -- Check database users: SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%test%'; GO
schema_name name ------------ --------- dbo testView login_name user_name default_schema_name ------------ --------------------- -------------------- NULL XPS13-SQLC\testGroup dbo
This time the view was placed in the “dbo” schema as expected (remember? We set the default schema on the group), but the really weird thing is that no new user was added. Why? I couldn’t find any answer in the documentation. It is even more puzzling that SQL Server was able to impersonate a user that is not present at all. Nevertheless, as far as I can remember it has always been like this, at least starting from SQL Server 2005.
To be sure we’re not imagining things, let’s change again the default schema of the Windows group:
-- Note the weird syntax ALTER USER [XPS13-SQLC\testGroup] WITH default_schema = NULL; GO -- Let's re-create the view impersonating the user EXECUTE AS USER = 'XPS13-SQLC\testUser'; -- ' EXEC('CREATE VIEW testView AS SELECT 1 AS One'); REVERT GO -- Check if the view is there SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name, name FROM sys.views WHERE name = 'testView'; GO -- Check database users: SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%test%'; GO
schema_name name -------------------- --------------------- dbo testView XPS13-SQLC\testUser testView login_name user_name default_schema_name -------------------- --------------------- -------------------- NULL XPS13-SQLC\testGroup NULL NULL XPS13-SQLC\testUser XPS13-SQLC\testUser
Again, SQL Server creates a new user and assigns it a default schema with the same name. The view was placed in the user’s default schema.
Now let’s revert to a clean database and drop some objects.
DROP VIEW dbo.testView; DROP VIEW [XPS13-SQLC\testUser].testView; DROP SCHEMA [XPS13-SQLC\testUser]; DROP USER [XPS13-SQLC\testUser]; GO
In order to have a complete picture, we can now check what happens if we create a login for the windows user.
CREATE LOGIN [XPS13-SQLC\testUser] FROM WINDOWS; GO EXECUTE AS LOGIN = 'XPS13-SQLC\testUser'; EXEC('CREATE VIEW testView AS SELECT 1 AS One'); REVERT GO -- Check if the view is there SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name, name FROM sys.views WHERE name = 'testView'; GO -- Check database users: SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%test%'; GO
schema_name name -------------------- --------------------- XPS13-SQLC\testUser testView login_name user_name default_schema_name -------------------- --------------------- -------------------- NULL XPS13-SQLC\testGroup NULL XPS13-SQLC\testUser XPS13-SQLC\testUser XPS13-SQLC\testUser
Again, SQL Server creates a new database user for this login and a schema with the same name. The view gets added to the user’s default schema.
What is interesting to note is that we could access the database by logging in as the Windows user, without having an explicit permission path to allow it. Only after accessing the database an explicit database user is created.
If we try the same thing with the default schema set to “dbo” on the windows group, the behavior matches what we got for the user:
-- let's try again with a default schema on the group ALTER USER [XPS13-SQLC\testGroup] WITH default_schema = dbo; GO -- cleanup DROP VIEW dbo.testView; DROP VIEW [XPS13-SQLC\testUser].testView; DROP SCHEMA [XPS13-SQLC\testUser]; DROP USER [XPS13-SQLC\testUser]; GO EXECUTE AS LOGIN = 'XPS13-SQLC\testUser'; -- ' EXEC('CREATE VIEW testView AS SELECT 1 AS One'); REVERT GO -- Check if the view is there SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name, name FROM sys.views WHERE name = 'testView'; GO -- Check database users: SELECT svp.name AS login_name, dbp.name AS user_name, dbp.default_schema_name FROM sys.database_principals AS dbp LEFT JOIN sys.server_principals AS svp ON dbp.sid = svp.sid WHERE dbp.name LIKE '%test%'; GO
This means that this behavior has to be taken into account when we’re querying permissions on the databases to see who can access what. In order to know whether a windows user is member of a windows group, we can use the system function IS_MEMBER().
In our case, it would be enough to call the function in this way:
EXECUTE AS LOGIN = 'XPS13-SQLC\testUser'; -- ' SELECT IS_MEMBER('XPS13-SQLC\testGroup'); REVERT GO
Unfortunately, there is no additional parameter available to specify which windows user to test for group membership and the function will always test against the current login. This has two important consequences:
- To test multiple logins, we need to impersonate them one at a time, which is difficult to achieve in scripts.
- There is no way to test database users: impersonating the database user will not be enough to test the windows group membership for the windows user associated to the database user.
Summing it up, this is what we have:
|Windows user impersonated as||Group’s Default Schema set||New user added||New user’s schema|
|Database User||No||Yes||User name|
The dbo user can be mapped to a non-existing Windows User
This is extremely tricky and I was bitten by this behavior recently, wasting countless hours trying to figure out what was happening.
Imagine that a developer creates a database in the development environment, while logged in with Windows authentication. This is a very common scenario, both if you’re using a centralized development server or developers’ workstations.
When development is complete, the developer hands us a database backup to restore in production, because the database already contains some data (built-in values in lookup tables or something else required by the application). We restore the database in production, where the developer does not have a login, confident that he will be unable to access it. Well, that’s not exactly what is going to happen.
Let’s see it with an example:
-- Grant dbcreator permissions to the windows user ALTER SERVER ROLE dbcreator ADD MEMBER [XPS13-SQLC\testUser] GO -- The user creates a database. It may happen in development, right? EXECUTE AS LOGIN = 'XPS13-SQLC\testUser'; CREATE DATABASE testWindowsGroup REVERT GO -- Let's pretend that developers finish working with the database -- and ask to move it from development to production -- He takes a backup and we restore it to production BACKUP DATABASE testWindowsGroup TO DISK = 'c:\temp\testWindowsGroup.bak'; GO -- We don't have two separate servers to test, we will simulate -- the same situation by dropping some objects. Let's drop the database. DROP DATABASE testWindowsGroup; GO -- Now let's drop the associated windows login DROP LOGIN [XPS13-SQLC\testUser] GO -- The database gets restored in production (again, we will use the same instance, -- but the main point is that the login of the database owner is not present). RESTORE DATABASE testWindowsGroup FROM DISK = 'c:\temp\testWindowsGroup.bak' GO -- Who's the owner of the database? You, the DBA. Looking good. SELECT db.name, svp.name FROM sys.databases AS db INNER JOIN sys.server_principals AS svp ON db.owner_sid = svp.sid WHERE database_id = DB_ID('testWindowsGroup');
Everything is set up the way it should: the database is restored to production, the DBA is the database owner, the developer has no access to the database and everybody is happy.
Now imagine that a new totally unrelated database gets added to the production instance, a database where everybody in the company must have read-only access. We create a new login for a Windows group that includes all authorized logins and we grant access to the new database.
Again, let’s see it with an example (using tempdb):
-- Let's add a login for the group and grant permissions on another database -- I will use tempdb here, but it could be any other database. CREATE LOGIN [XPS13-SQLC\testGroup] FROM WINDOWS; GO USE tempdb; GO -- The group gets added to tempdb with read-only permissions CREATE USER [XPS13-SQLC\testGroup] FOR LOGIN [XPS13-SQLC\testGroup]; ALTER ROLE db_datareader ADD MEMBER [XPS13-SQLC\testGroup]; GO -- Let’s go back to the database we restored previously USE testWindowsGroup; GO -- Now see what happens: EXECUTE AS LOGIN = 'XPS13-SQLC\testUser'; -- ' EXEC('CREATE VIEW testView AS SELECT 1 AS One'); REVERT GO
WTF? A windows user that has no matching SQL Server login could create a view in a database where no matching user exists? How could that happen?
-- Let's check the permissions of this user: EXECUTE AS LOGIN = 'XPS13-SQLC\testUser'; -- ' SELECT IS_ROLEMEMBER('db_owner') AS IAmDbOwner, IS_MEMBER('db_owner') AS AgainDbOwner; REVERT GO
IAmDbOwner AgainDbOwner ----------- ------------- 1 1
Weird. The login of the Windows user seems to be the database owner, but we checked upon restore that the DBA is the database owner. Moreover, there is no explicit login in the server for this Windows user and the only Windows group that contains this user does not have access at all to the database: how is this possible?
The answer is buried inside the database principals table:
-- The sid of the dbo user is still the sid of the user that created the database, -- even if the windows user is not a login in SQL Server SELECT sid FROM sys.database_principals WHERE name = 'dbo';
We got access to the database thanks to a windows group added to grant access to a totally unrelated database. The group has no permissions whatsoever on the database, but it allows its members to log in: the actual permissions on the database are granted through the SID of the dbo user.
This can be spotted immediately with the GUI, which is a fact that hides the resolution even more from experienced DBAs that don’t use the GUI at all.
Windows authentication comes handy most of the times and it offers better protection compared to SQL Server authentication, but sometimes the way SQL Server treats Windows users can be puzzling.
Don’t take things for granted and check now if you really understand who has access to your databases: the answer might be very different from what you expect.
P.S. If you’re wondering why I placed a comment with a quote after every “EXECUTE AS” statement, it’s just to fix this WordPress wacky syntax highlighter going nuts.
Today, thanks to a customer, I discovered one of those annoying little things that can really drive you nuts.
Basically, they were trying to restore a backup using the SSMS Restore Database window and they kept getting “No backupset selected to be restored” whenever a backup file was selected.
You just had to select a file for restore and click OK…
… to be met with an error message in the Restore Database window:
The weird thing about it is that the backup file restored perfectly fine from a T-SQL script:
So it had to be something wrong with SSMS, but what?
Looking closer at the restore script, one thing stands out. Look at the file name:
Yep, there’s a leading whitespace in the file name. Could that be the source of the problem?
Let’s try again with the GUI in a slightly different way. This time I will copy the folder path from the “Backup File Location” textbox…
… and paste it directly in the “File name” textbox, right before the file name:
This time everything works as expected.
- This is a bug in SSMS: go on and vote this Connect item to have it fixed in a future version.
- Don’t use the GUI to restore a database.
- Don’t use the GUI at all.
I spent the last few weeks travelling a lot compared to my usual routine and I got quickly fed up with bringing my mobile workstation with me. My Dell precision M4600 is a powerhouse, but it’s definitely too big and too heavy for travelling, so I decided it was time to buy a small laptop just for travelling and presenting at SQL Saturdays and events around Europe.
After lots of research, I pulled the trigger on a new XPS13. It’s a very nice, tiny and light machine, beefy enough to run my virtual machines and slim enough to fit in a small bag.
So, everything fine? Not quite. The XPS13 2015 sports a stunning QHD touch display, with almost no bezel: a small wonder. The only complaint with it is windows and its (in)ability to scale applications when running at High DPI.
Windows got better at scaling applications on HiDPI displays and Windows 10 is doing a remarkably good job at scaling applications that can actually scale. I am running at maximum resolution (3200 x 1800) with 200% scale factor and, for instance, Visual Studio 2015 renders perfectly:
Other applications scale horribly. For instance, SSMS (SQL Server Management Studio) 2014 is a complete disaster:
At a first glance, it looks ok, but look at those icons on the Object Explorer tree: aren’t they a little bit too small compared to the labels next to them?
I can live with small icons, but you just need to open some dialog windows to understand how bad things really are:
As you can see, text gets scaled correctly, but the size of the window and some controls are completely wrong.
While some dialog windows are easily fixed by resizing them, some other are completely unusable. For instance, the restore database dialog, when locating a backup file, looks like this:
A new hope from an old technique
In Windows Vista, you had two possible ways of scaling applications: with the first one (the default) applications were instructed to scale their objects using the scaling factor imposed by the operating system. The results, depending on the quality of the application and the Windows version, could vary a lot. Some scaled correctly, some other look very similar to what we are seeing in SSMS, with some weird-looking GUIs. In Vista, this option was called “XP style DPI scaling”.
The second option, which you could activate by unchecking the “XP style” checkbox, involved drawing the graphical components of the GUI to an off-screen buffer and then drawing them back to the display, scaling the whole thing up to the screen resolution. This option is called “bitmap scaling” and the result is a perfectly laid out GUI:
In order to enable this option in Windows 10, you need to merge this key to your registry:
Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\SideBySide] "PreferExternalManifest"=dword:00000001
Then, the application has to be decorated with a manifest file that instructs Windows to disable DPI scaling and enable bitmap scaling, by declaring the application as DPI unaware. The manifest file has to be saved in the same folder as the executable (ssms.exe) and its name must be ssms.exe.manifest. In this case, for SSMS 2014, the file path is “C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Binn\ManagementStudio\Ssms.exe.manifest”.
Paste this text inside the manifest file and save it in UTF8 encoding:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3"> <dependency> <dependentAssembly> <assemblyIdentity type="win32" name="Microsoft.Windows.Common-Controls" version="18.104.22.168" processorArchitecture="*" publicKeyToken="6595b64144ccf1df" language="*"> </assemblyIdentity> </dependentAssembly> </dependency> <dependency> <dependentAssembly> <assemblyIdentity type="win32" name="Microsoft.VC90.CRT" version="9.0.21022.8" processorArchitecture="amd64" publicKeyToken="1fc8b3b9a1e18e3b"> </assemblyIdentity> </dependentAssembly> </dependency> <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3"> <security> <requestedPrivileges> <requestedExecutionLevel level="asInvoker" uiAccess="false"/> </requestedPrivileges> </security> </trustInfo> <asmv3:application> <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings"> <ms_windowsSettings:dpiAware xmlns:ms_windowsSettings="http://schemas.microsoft.com/SMI/2005/WindowsSettings">false</ms_windowsSettings:dpiAware> </asmv3:windowsSettings> </asmv3:application> </assembly>
This “Vista style” bitmap scaling is very similar to what Apple is doing on his Retina displays, except that Apple uses a different font rendering algorithm that looks better when scaled up. If you use this technique in Windows, ClearType rendering is performed on the off-screen buffer before upscaling, so the final result might look a bit blurry.
The amount of blurriness you will see depends on the scale factor you set in the control panel or in the settings app in Windows 10. Needless to say that exact pixel scaling looks better, so prefer 200% over 225% or 250% scale factors, because there is no such thing as “half pixel”.
Seeing is believing:
That’s it: nice and easy. Enjoy your High DPI lifestyle, until Microsoft comes up with a better solution (or a better SSMS).
Don’t tell me that you didn’t see it coming: at some point, Developers end up being put to hell by a DBA!
I don’t want to enter the DBA/Developer wars, but some sins committed by Developers really deserve a ticket to the SQL Server hell. In particular, some of those sins are perpetrated when not even a single line of code is written yet and they have to do with the way the development environment is set up.
What they say in Heaven
Before starting a software project, the angelic developers set up their environment in the best of all ways, with proper environment isolation and definition. In particular they will have:
- Development Environment: this is the place (ideally a dev’s desktop) where the development work is performed. It should resemble the production environment as much as possible.
- Test Environment (QA): This is where the testers ensure the quality of the application, open bugs and review bug fixes. It should be identical to the production environment (in Heaven it is).
- User Acceptance Test Environment (UAT): this is where the clients test the quality of third-party applications, request features and file bugs.
- Staging Environment (Pre-Production): this environment is used to assemble, test and review newer versions of the database before it is moved into production. The hardware mirrors that of the production environment.
- Production Environment: This is where the real database lives. It can be updated from the staging environment, when available, as well as new functionality and bug fixes release from UAT or staging environment.
If your organization or the project are small, you probably don’t need all of these environments. In Heaven, where time and money are not a constraint, they have all of them and they’re all identical to production. Heh, Heaven is Heaven after all…
Environmental sinners will face SQL Server’s judgement
Setting up your development environment in the wrong way can harm SQL Server (and your software) in many ways, right from the start of the project, throughout its whole lifetime. Let’s see some of the most common sins:
- Using the production environment for development: frankly, I don’t think this sin needs any further explanation. On the other hand, don’t assume that nobody’s doing it, despite we’re in 2015: lots of damned developers’ souls confess this sin while entering the SQL Server hell!
- Using the test environment for development: Again, this seems so obvious that there should be no need to discuss it: development is development and test is test. The test environment(s) should be used to test the application, not to see it breaking every minute because you changed something. Developing the code and testing it are two different things and, even if you happen to be in charge of both, this is not a good reason to confuse the two tasks.
- Using a shared instance for development: Back in the old days, when I was working as an ASP classic developer in a software house, we had a shared development environment on a central IIS server, where everyone saved their code on a shared folder and just had to hit F5 in Internet Explorer to see the changes immediately in action.
If you think this model is foolish you’re 100% right, but in the 90s’ we didn’t know better. However, while everyone today agrees that it’s a terrible idea for code, you will still find hordes of developers not completely convinced that it’s an equally terrible idea as far as the database is concerned. Having a shared development database greatly simplifies the process of creating a consistent development database, which is a problem only if you have no authoritative source to build it from (which brings us to the next sin).
- No source control: Nobody in their right mind would start a software project today without using source control, yet source control for the database is still an esoteric topic, despite the plethora of tools to accomplish this task.
- Granting sysadmin rights to the application: If you’re using a local development instance (and you should), you probably are the administrator of that instance. Hey, nothing wrong with that, unless you use windows authentication in your application. In that case, whenever you debug the application in Visual Studio (or whatever you’re using), the application impersonates you (a sysadmin) when hitting the database, so there is no need to grant any permission in order to let the app perform anything on the instance.
So, what happens when you’re done with development and you have to deploy in test (or, worse, production)? Exactly: nothing works, because (hopefully) the application won’t run with sysadmin privileges in production. At that point, extracting the complete lists of permissions needed by the application is an overwhelming task that you could have happily avoided by developing with a non-privileged user in the first place. When using a regular user, each time the application needs additional permissions, you simply have to add a GRANT statement to the deployment script, which also acts as the documentation the DBA will ask for.
If you fail to provide this documentation, two things could happen: a) the DBA may refuse to deploy the database b) you could end up needing sysadmin privileges, which means a dedicated instance, which could in turn bring us back to a).
- Developing on a different version/edition from production: if your application is targeting SQL Server 2008 R2, developing on SQL Server 2012 could mean that you will discover incompatible T-SQL features after development. The same can be said for the SQL Server edition: if you are using a Developer Edition for development but you are targeting Standard Edition, you will discover the use of enterprise-only features when it’s too late. You can save yourself all the pain by using in development the same exact SQL Server version and edition you are targeting in production.
In the next episodes of SQL Server Infernals I’m afraid I will have to put more developers to hell. If you’re a developer, stay tuned to find out if your soul is a at risk! If you’re a DBA, stay tuned to enjoy seeing more developers damned!
There’s a place in the SQL Server hell where you can find poor souls wandering the paths of their circle, shouting nonsense table names or system-generated constraint names, trying to baptize everything they find on their way in a different manner. They might seem innocuous at a first glance, but beware those damned souls, as they can raise confusion and endanger performance.
What they say in Heaven
Guided by the Intelligent Designer’s hands, database architects in Heaven always name their tables, columns and all database objects following the rules in the ISO 11179 standard. However, standards aside, the most important thing they do is adhere to a single naming convention, so that every angelic DBA and developer can sing in the same language.
It has to be said that even in Heaven some angels prefer specific naming conventions and some other angels might prefer different ones (say plural or singular table names), but as soon as they start to design a database, every disagreement magically disappears and they all sing in harmony.
Damnation by namification
Some naming conventions are better than others, but many times it all comes down to personal preference. It’s a highly debatable subject and I will refrain from posting here what my preference is. If you want to learn more about naming conventions, take advice from one of the masters.
That said, some naming conventions are really bad and adopting them is a one way ticket to the SQL Server hell:
- Hungarian Notation: my friends in Hungary will forgive me if I say that their notation doesn’t play well with database objects. In fact, the Hungarian Notation was conceived in order to overcome the lack of proper data types in the BCPL language, putting a metadata prefix in each variable name. For instance, a variable holding a string would carry the “str” prefix, while a variable holding a long integer would carry the “l” prefix.
SQL Server (and all modern relational databases) have proper data type support and all sorts of metadata discovery features, so there is no point in naming a table “tbl_customer” or a view “vwSales”. Moreover, if the DBA decides to break a table in two and expose its previous structure as a view (in order to prevent breaking existing code), having the “tbl” prefix in the view name completely defeats the purpose of identifying the object type by its prefix.
Next time you’re tempted to use the Hungarian Notation ask yourself: “is my name John or DBA_John?”
- Using insanely short object names: Some legacy databases (yes, you, DB2/400) used to have a hard maximum of 10 characters for object names. It wasn’t uncommon to see table names such as “VN30SKF0OF” or “PRB10SPE4F”: good luck figuring out what those tables represented!
Fortunately, those days are gone and today there is no single reason to use alphabet soup names for your objects. The object name is a contract between the object and its contents and it should be immediately clear what the contents are by just glancing at the name.
- Using insanely long object names: On the other hand, table names such as “ThisIsTheViewThatContainsOrdersWhichAreYetToBeShipped” adds nothing to clarity of the schema. “UnshippedOrders” will do just as well.
- Mixing Languages: if you’re fortunate enough to be a native English speaker, you have no idea what this means. In countries such as Italy or Spain, this is a real issue. Many people may end up designing different parts the database schema and each designer may be inclined to use English (the lingua franca of Information Technology) or his/her first language. Needless to say that the result is a mess.
- Using the “sp_” prefix for stored procedures: it’s a special case of Hungarian Notation, with severe performance implications. In his blog, Aaron Bertrand discussed the notorious negative impact of the “sp_” prefix, offering a performance comparison with charts and crunchy numbers.
TL;DR version: SQL Server looks up objects with the sp_ prefix in the master database first, then in the user database. While it may look like a negligible performance issue, it can explode at scale.
- Using reserved keywords or illegal characters: While it’s still possible to include almost anything inside square brackets, the use of spaces, quotes or any other illegal character is a totally unneeded masochistic habit. Reserved keywords may also add a thrilling touch of insane confusion to your T-SQL code:
SELECT * FROM [TRUNCATE] [TABLE]
- Using system-generated names for constraints, indexes and so on: When you don’t name your constraints and indexes explicitly, SQL Server is kind enough as to do it for you, using a semi-random system-generated name. That’s great! Uh, wait a moment: this means that two databases deployed to two different instances will contains the same index with a different name, making all your deployment scripts nearly useless. Do yourself a favor and take the time to name all your objects explicitly.
- No naming convention or multiple, inconsistent naming conventions: The worst of all mistakes is having multiple naming conventions, or no naming convention at all (which is equal to “as many naming conventions as objects in the database”). Naming conventions is a sort of religious subject and there are multiple valid reasons to adopt one or another: the only thing you should absolutely avoid is turning your database into a sort of Babel tower, where multiple different languages are spoken and nobody understands what the others say.
This is the last circle of SQL Server hell dedicated to Database Design sins: in the next episode of SQL Server Infernals we will venture into the first circle dedicated to development. Stay tuned!
Constraints are sometimes annoying in real life, but no society can exist without rules and regulations. The same concept is found in Database Design: no good data can exist without constraints.
What they say in Heaven
Constraints define what is acceptable in the database and what does not comply with business rules. In Heaven, where the perfect database runs smoothly, no constraint is overlooked and all the data obeys to the rules of angels:
- Every column accepts only the data it was meant for, using the appropriate data type
- Every column that requires a value has a NOT NULL constraint
- Every column that references a key in a different table has a FOREIGN KEY constraint
- Every column that must comply with a business rule has a CHECK constraint
- Every column that must be populated with a predefined value has a DEFAULT constraint
- Every table has a PRIMARY KEY constraint
- Every group of columns that does not accept duplicate values has a UNIQUE constraint
Chaos belongs to hell
OK: Heaven is Heaven, but what about hell? Let’s see what will get you instant damnation:
- Using the wrong data type: we already found out that the SQL Server hell is full of Shaky Typers. The data type is the first constraint on your data: choose it carefully.
- No PRIMARY KEY constraints: In the relational model, tables have primary keys. Without a primary key, a table is not even a table (exception made for staging tables and other temporary objects). Do you want duplicate data and unusable data? Go on and drop your primary key.
- NULL and NOT NULL used interchangeably: NOT NULL is a constraint on your data: failing to mark required columns with NOT NULL will inevitably mean that you’ll end up having missing information in your rows. At the same time, marking all columns as NOT NULL will bring garbage data in the database, because users will start using dummy data to circumvent the stupid constraint. We already met these sinners in the First Circle of the SQL Server hell.
- No Foreign Key constraints: Foreign Keys can be annoying, because they force you to modify the database in the correct order, but following the rules pays off. Without proper constraints, what would happen if you tried to delete from a lookup table a key referenced in other tables? Unfortunately, it would work, silently destroying the correctness of your data.
What would happen if you tried to sneak in a row that references a non-existing key? Again, it would bring in invalid data.
- No CHECK constraints: Many columns have explicit or implicit constraints: failing to add them to the database schema means that values forbidden by the business rules will start to flow into the database. Some constraints are implicit, but equally important as the explicit ones. For instance:
- an order should never be placed in a future date
- a stock quantity should never be negative
- a ZIP code should only contain numeric characters
- a Social Security Number should be exactly 9 digits long
- Relying on the application to validate data: If I had €0.01 for every time I found invalid data in a database and the developers said “the application will guarantee consistency”, I would be blogging from my castle in Mauritius. Maybe the application can guarantee consistency for the data that it manipulates (and it won’t, trust me), but it can do nothing for other applications using the same database. Often the database is a hub for many applications, each with its own degree of complexity and each with its level of quality. Pretending that all these applications will independently guarantee that no invalid data is brought in is totally unrealistic.
The last circle of SQL Server hell dedicated to Database Design sins is the circle of Inconsistent Baptists, those who fail to comply to sensible naming conventions. Stay tuned!
Choosing the right data type for your columns is first of all a design decision that has tremendous impact on the correctness of the database schema. It is not just about performance or space usage: the data type is the first constraint on your data and it decides what can be persisted in your columns and what is not acceptable.
Choosing the wrong data type for your columns is a mistake that might make your life as a DBA look like hell.
What they say in Heaven
Guided by angelic spells, the hands that design databases in Heaven always choose the right data type. Database architects always look at the logical schema and ask the right questions about each attribute and they always manage to understand what the attribute is used for and what it will be used for in the future.
What will put you to hell
Choosing the wrong data type is like trying to fit a square peg in a round hole. The worst thing about it is that you end up damaging the peg… ahem… the data.
- Using numeric data types for non-numeric attributes: Even if a telephone number contains only digits and it’s called telephone number, it is not a number at all. It does not allow mathematical operations and it has no order relation (saying that a telephone number is greater than another one makes no sense). In fact, a telephone number is a code you have to dial to contact a telephone extension. The same can be said for ZIP codes, which only allow numeric digits, but are nothing like a number. Storing this data in a numeric column is looking for trouble.
- Storing data as their human-readable representation: A Notable example is dates stored as (var)char. The string representation of a date is not a date at all: without the validation rules included in the date types, any invalid date could be saved in your column, including ‘2015-02-30’ or ‘2015-33-99’. Moreover, varchar columns do not allow date manipulation functions, such as DATEADD, DATEDIFF, YEAR, MONTH and so on. Another reason why this is a terrible idea is that dates have their own sorting rules, which you lose when you store them as strings. You also need more storage space to save a string representation of a date compared to the proper date type. If you really want to convert a date to a string, you can find many algorithms and functions to perform the conversion in this article I wrote for SQLServerCentral in 2012, but please do it in your presentation layer, not when storing the data in your tables.
Another surprisingly common mistake in the AS/400 world is storing dates in three separate integer columns for year, month and day. I have no idea where this pattern comes from, but it definitely belongs to hell.
While much more uncommon in the wild, the same applies to numbers: storing them as varchars is a terrible idea.
Extra evil bonus: you get double evil points for storing dates and numbers as nvarchar: double the storage, double the pain.
- Using deprecated data types: (n)text and image are things of the past: get over it. The replacement (n)varchar(max) and varbinary(max) are much more powerful and flexible.
- Using “extended” data type just to “be safe”: This applies both to numeric and character columns: using a bigger data type just to play it safe can be a good idea at times, but not when the size of the column is well known upfront and is instead a vital constraint on the data itself. For instance, a ZIP code longer than 5 characters is obviously an error. A social security number longer than 9 digits is not valid.
Along the same lines, storing years in a int column is only going to be a waste of storage space. The same can be said about small lookup tables with just a handful of rows in them, where the key column can be a smallint or even a tinyint: it won’t save much space in the lookup table itself, but it can save lots of space in the main tables (with many more rows) where the code is referenced.
- Storing fixed-size information in varchar columns: Similarly to the previous sin, when your attribute has a fixed character size, there is no point in using a varying character type. If your attribute has exactly 3 characters, why use varchar(3)?
Extra evil bonus: varchar(1) will get you double points.
- Storing duration in time or datetime columns: Datetime and time represent points in time and they are not meant for storing durations. If you really want to store a duration, use a numeric column to store the number of seconds (it’s the ANSI standard unit measure for representing a duration). Even better, you could store the start/end date and time in two separate datetime columns. SQL Server 2016 also supports periods.
- Getting Unicode wrong: Choosing nvarchar for attributes that will never contain Unicode data and choosing varchar for attributes that can contain Unicode data are equally evil and will get you instant damnation. For instance, a ZIP code will only contain numeric characters, so using Unicode data types will have the only outcome of wasting space. At the same time, storing customer business names or annotations in varchar columns means that you won’t be able to persist international characters. While it may appear quite unlikely that such characters will ever appear in your database, you will regret your decision when that happens (and it will).
- Messing with XML: I’m not a big fan of XML in the database, but sometimes it can come handy. Storing XML data in a plain varchar column is a very bad idea. The XML data type provides validation rules that won’t allow in invalid or malformed XML and also provides functions to manipulate the XML data. Storing schema-less XML is another bad idea: if you have an XML schema use it, otherwise you will end up saving invalid data. On the other hand, using XML to go “beyond relational” and mimic Oracle’s nested tables will only get you damned. Fun times.
- Using different data types in different tables for the same attribute: there’s only one thing worse than getting your data types wrong: getting them wrong in multiple places. Once you decided the data type to store an attribute, don’t change your mind when designing new tables. If it is a varchar(10), don’t use varchar(15) in your next table. Usually proper foreign key constraints help you avoid this issue, but it’s not always the case.
If this query returns rows, chances are that you have schizophrenic columns in your database schema:
WITH my_schema AS ( SELECT OBJECT_NAME(c.object_id) AS table_name, c.name AS column_name, t.name AS type_name, c.max_length, c.precision, c.scale FROM sys.columns AS c INNER JOIN sys.types AS t ON c.system_type_id = t.system_type_id ), incarnations AS ( SELECT *, DENSE_RANK() OVER ( PARTITION BY column_name ORDER BY type_name, max_length, precision, scale ) AS incarnation_number FROM my_schema ), incarnation_count AS ( SELECT *, MAX(incarnation_number) OVER ( PARTITION BY column_name ) AS incarnation_count FROM incarnations ) SELECT * FROM incarnation_count WHERE incarnation_count > 1 ORDER BY incarnation_count DESC, column_name, type_name, max_length, precision, scale;
The lack of proper constraints will be the topic of the next post, when we will meet the anarchic designers. Stay tuned!
Object-Oriented programming taught us that generalizing is a good thing and, whenever possible, we should do it. Complex class hierarchies are a good way of reusing code, hitting the specialized classes only when a special implementation is needed.
In the database world, the concept doesn’t play exactly well.
What they say in Heaven
In Heaven, there is a lookup table for each attribute, no matter how simple and no matter how small is the lookup table.
For instance, if your database is about sales, you probably have a Customers table and an Orders table, each with its own attributes resolved through a Foreign Key. The lookup tables are usually very small, with just a handful of rows in them:
Temptation comes from our own desires
Wouldn’t that be great if you could stop adding small, insignificant tables to your database schema? Wouldn’t it be a lot easier if you had ONE table to store all that lookup nonsense? “Less is more” after all, isn’t it?
If you had a “One True Lookup Table”, everything would be more elegant and simple. Look at this database schema:
Isn’t it elegant and clean?
INSERT INTO LookupTable (table_name, lookup_code, lookup_description) VALUES ('Order_Status', 'OP', 'Open'), ('Order_Status', 'CL', 'Closed'), ('Order_Status', 'SH', 'Shipped');
Devil’s in the details
You may have less tables to deal with now, but there’s a price to pay. A bigger price than you would have expected.
- No foreign keys: Did you notice that the foreign keys are gone? In order to create a foreign key, you would have to add the lookup table name to the Orders and Customers tables, for each attribute stored in the lookup table. I don’t think you would like it.
- Generic data type: In order to merge all lookup tables in one, you need to choose a “generic” data type that fits for all. The most generic data type is a character-based type, so you’ll probably end up with a huge nvarchar column. You probably don’t want the same huge column in the referencing tables and you could end up having different data types between the main tables and the lookups. One more not-so-good idea. Moreover, when you’re joining your tables with the lookup table, you will have implicit (or explicit) conversions happening, which is a performance nightmare.
- Single Hotspot: Instead of hitting multiple tables for lookups, everyone will hit the same table over and over. This will create a hotspot in the database, with locking and latching issues all over the place.
- Acrobatic constraints: Defining constraints on a generic table becomes very difficult. Not an impossible deal, but very difficult. For the schema in this example, you could define a CHECK constraint to enforce the use of the correct data type, but the syntax of the constraint will not be very straightforward:
CHECK( CASE WHEN lookup_code = 'states' AND lookup_code LIKE '[A-Z][A-Z]' THEN 1 WHEN lookup_code = 'priorities' AND lookup_code LIKE '[0-9]' THEN 1 WHEN lookup_code = 'countries' AND lookup_code LIKE '[0-9][0-9][0-9]' THEN 1 WHEN lookup_code = 'status' AND lookup_code LIKE '[A-Z][A-Z]' THEN 1 ELSE 0 END = 1 )
It could get even worse
As soon as you start to realize that trading multiple lookup tables for an OTLT is not a good deal, devil will raise the bid and offer the ultimate generalization: the Entity Attribute Value, also known as “EAV”.
If you come to think of it, who needs fixed attributes in a table when you can have as many attributes as you want in a general-purpose table? Why messing with ALTER TABLE statements when you can have a single table that can store an infinite number of attributes that you can bind to any row in any table?
A typical EAV schema looks like this:
This way, you can have any type of attribute bound to your main entities. For instance, to define a “ship_date” attribute in your Orders table, you just have to insert a couple of rows in your EAV schema:
INSERT INTO Entities (entity_id, entity_name) VALUES (1, 'Orders'); INSERT INTO AttributeNames (attribute_id, entity_id, attribute_name) VALUES (1, 1, 'ship_date'); INSERT INTO AttributeValues (attribute_id, entity_id, id, value) VALUES (1, 1, 123, '2015-06-24 22:10:00.000');
Looks like a great idea, doesn’t it? Unfortunately, it is not.
- Generic data types: again, what would prevent a date such as ‘2015-02-30 18:30:00.000’ from being assigned to the ship date? Uh-oh: nothing.
- No foreign keys: again, enforcing foreign key constraints would be impossible.
- A single hotspot in the database: every attribute for every table involved in this nonsense would have to be looked up in the same table.
- No constraints: how would you enforce a constraint as simple as “NOT NULL”? Good luck with that.
- Dreadful reporting queries: when you will be asked to create a report on a table that uses this paradigm (I said “when”, not “if”, because it will happen), you will have to OUTER JOIN to the EAV table for each and every attribute that you want to retrieve. In case you are wondering if this is good or bad, take into account that the optimizer starts to freak out when it finds too many JOINS in a query and will likely timeout looking for a decent execution plan, feeding you the best it could come up with (usually, a mess).
Some software solutions are entirely based on user-defined attributes and the ability to define them is a central feature. For instance, many CRM solutions are heavily dependent on user-defined attributes. However, there are many ways to achieve the same results without resorting to an EAV design. For instance, one could wonder why ALTERing the database schema seems to be a less desirable solution.
There are also many flavors of EAV, with different degrees of evil involved. Some implementations at least provide different columns for different data types, some others use XML or JSON.
The EAV design comes with the intent of solving a real world problem that doesn’t have a definitive answer in the relational model. In partial defense of the “generalizers”, it has to be said that this is a challenging problem. Nevertheless, like Dante put his political enemies to hell, I am the “poet” and I’m afraid that the generalizers will have to get accustomed to sulfur. It just takes a couple of thousand years, after all.
Who will be damned next?
In the next circle of the SQL Server hell we will meet the shaky typers – the poor souls that chose the wrong data types for their columns. Stay tuned for more!