Blog Archives

SQL Server Infernals – Circle 4: Anarchic Designers


Infernals

Constraints are sometimes annoying in real life, but no society can exist without rules and regulations. The same concept is found in Database Design: no good data can exist without constraints.

What they say in Heaven

Constraints define what is acceptable in the database and what does not comply with business rules. In Heaven, where the perfect database runs smoothly, no constraint is overlooked and all the data obeys the rules of the angels:

  • Every column accepts only the data it was meant for, using the appropriate data type
  • Every column that requires a value has a NOT NULL constraint
  • Every column that references a key in a different table has a FOREIGN KEY constraint
  • Every column that must comply with a business rule has a CHECK constraint
  • Every column that must be populated with a predefined value has a DEFAULT constraint
  • Every table has a PRIMARY KEY constraint
  • Every group of columns that does not accept duplicate values has a UNIQUE constraint
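A hypothetical table that follows all of these rules might look like the sketch below. The table, column and constraint names are invented for illustration, and the referenced Customers table is assumed to exist:

```sql
-- A sketch of a fully constrained table (names are hypothetical)
CREATE TABLE Orders (
    order_id     int           NOT NULL,     -- appropriate data type everywhere
    customer_id  int           NOT NULL,
    order_number char(10)      NOT NULL,
    order_date   date          NOT NULL
        CONSTRAINT DF_Orders_date DEFAULT (CAST(GETDATE() AS date)),
    total_amount decimal(10,2) NOT NULL
        CONSTRAINT CK_Orders_amount CHECK (total_amount > 0),
    CONSTRAINT PK_Orders PRIMARY KEY (order_id),
    CONSTRAINT UQ_Orders_number UNIQUE (order_number),
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (customer_id)
        REFERENCES Customers (customer_id)
);
```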

Chaos belongs to hell

OK: Heaven is Heaven, but what about hell? Let’s see what will get you instant damnation:

Anarchy

  1. Using the wrong data type: we already found out that the SQL Server hell is full of Shaky Typers. The data type is the first constraint on your data: choose it carefully.
  2. No PRIMARY KEY constraints: in the relational model, tables have primary keys. Without a primary key, a table is not even a table (exception made for staging tables and other temporary objects). Do you want duplicate, unusable data? Go on and drop your primary key.
  3. NULL and NOT NULL used interchangeably: NOT NULL is a constraint on your data. Failing to mark required columns as NOT NULL inevitably means that you’ll end up with missing information in your rows. At the same time, marking all columns as NOT NULL will bring garbage data into the database, because users will start entering dummy data to circumvent the stupid constraint. We already met these sinners in the First Circle of the SQL Server hell.
  4. No FOREIGN KEY constraints: foreign keys can be annoying, because they force you to modify the database in the correct order, but following the rules pays off. Without proper constraints, what would happen if you tried to delete a key from a lookup table while it is still referenced in other tables? Unfortunately, it would work, silently destroying the correctness of your data.
    What would happen if you tried to sneak in a row that references a non-existing key? Again, it would bring in invalid data.
  5. No CHECK constraints: many columns have explicit or implicit constraints. Failing to add them to the database schema means that values forbidden by the business rules will start to flow into the database. Some constraints are implicit, but equally important as the explicit ones. For instance:
    • an order should never be placed on a future date
    • a stock quantity should never be negative
    • a ZIP code should only contain numeric characters
    • a Social Security Number should be exactly 9 digits long
  6. Relying on the application to validate data: if I had €0.01 for every time I found invalid data in a database and the developers said “the application will guarantee consistency”, I would be blogging from my castle in Mauritius. Maybe the application can guarantee consistency for the data that it manipulates (and it won’t, trust me), but it can do nothing for other applications using the same database. Often the database is a hub for many applications, each with its own degree of complexity and each with its own level of quality. Pretending that all these applications will independently guarantee that no invalid data gets in is totally unrealistic.
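The implicit rules listed above translate directly into CHECK constraints. The statements below are a sketch against hypothetical tables and column names:

```sql
-- Table and column names are hypothetical; adjust to your schema
ALTER TABLE Orders ADD CONSTRAINT CK_Orders_no_future_date
    CHECK (order_date <= CAST(GETDATE() AS date));

ALTER TABLE Stock ADD CONSTRAINT CK_Stock_non_negative
    CHECK (quantity >= 0);

ALTER TABLE Addresses ADD CONSTRAINT CK_Addresses_zip_numeric
    CHECK (zip_code NOT LIKE '%[^0-9]%');

-- Assuming ssn is declared as char(9): nine digit wildcards match it exactly
ALTER TABLE Persons ADD CONSTRAINT CK_Persons_ssn_9_digits
    CHECK (ssn LIKE REPLICATE('[0-9]', 9));
```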

 The last circle of SQL Server hell dedicated to Database Design sins is the circle of Inconsistent Baptists, those who fail to comply with sensible naming conventions. Stay tuned!


Non-unique indexes that COULD be unique


In my last post I showed a query to identify non-unique indexes that should be unique.

You may have other indexes that could be unique based on the data they contain, but are not.

To find out, you just need to query each of those indexes and group by the whole key, filtering out those that have duplicate values. It may look like an overwhelming amount of work, but the good news is I have a script for that:

DECLARE @sql nvarchar(max);

WITH indexes AS (
    SELECT
         QUOTENAME(OBJECT_SCHEMA_NAME(uq.object_id)) AS [schema_name]
        ,QUOTENAME(OBJECT_NAME(uq.object_id)) AS table_name
        ,uq.name AS index_name
        ,cols.name AS cols
    FROM sys.indexes AS uq
    CROSS APPLY (
        SELECT STUFF((
            SELECT ',' + QUOTENAME(sc.name) AS [text()]
            FROM sys.index_columns AS uc
            INNER JOIN sys.columns AS sc
                ON  uc.column_id = sc.column_id
                AND uc.object_id = sc.object_id
            WHERE uc.object_id = uq.object_id
                AND uc.index_id = uq.index_id
                AND uc.is_included_column = 0
            FOR XML PATH('')
        ),1,1,SPACE(0))
    ) AS cols (name)
    WHERE is_unique = 0
        AND has_filter = 0
        AND is_hypothetical = 0
        AND type IN (1,2)
        AND object_id IN (
            SELECT object_id
            FROM sys.objects
            WHERE is_ms_shipped = 0
            AND type = 'U'
        )
)
-- Build a big statement to query index data
SELECT @sql = (
    SELECT
        'SELECT ''' + [schema_name] + ''' AS [schema_name],
            ''' + table_name + ''' AS table_name,
            ''' + index_name + ''' AS index_name,
            can_be_unique =
                CASE WHEN (
                    SELECT COUNT(*)
                    FROM (
                        SELECT ' + cols + ',COUNT(*) AS cnt
                        FROM ' + [schema_name] + '.' + [table_name] + '
                        GROUP BY ' + cols + '
                        HAVING COUNT(*) > 1
                    ) AS data
                    ) > 0
                THEN 0
                ELSE 1
                END;'
    FROM indexes
    FOR XML PATH(''), TYPE
).value('.','nvarchar(max)');

-- prepare a table to receive results
DECLARE @results TABLE (
    [schema_name] sysname,
    [table_name] sysname,
    [index_name] sysname,
    [can_be_unique] bit
)

-- execute the script and pipe the results
INSERT @results
EXEC(@sql)

-- show candidate unique indexes
SELECT *
FROM @results
WHERE can_be_unique = 1
ORDER BY [schema_name], [table_name], [index_name]

The script should complete quite quickly, since each generated query is supported by the very index it examines. However, I suggest that you run it against a non-production copy of your database, as it will scan all the non-unique indexes found in the database.

The results will include all the indexes that don’t contain duplicate data. Whether you should make those indexes UNIQUE, only you can tell.

Some indexes may contain unique data unintentionally, but could definitely store duplicate data in the future. If you know your data domain, you will be able to spot the difference.
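If you decide a candidate really is unique by design, you can rebuild it as UNIQUE in one step with the DROP_EXISTING option, keeping the same name and key. The index, table and column names below are hypothetical:

```sql
-- Rebuild an existing non-unique index as UNIQUE without dropping it first
-- (index, table and column names are hypothetical)
CREATE UNIQUE NONCLUSTERED INDEX IX_Customers_email
    ON dbo.Customers (email)
    WITH (DROP_EXISTING = ON);
```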

Non-unique indexes that should be unique


Defining the appropriate primary key and unique constraints is fundamental for a good database design.

unique

One thing that I often see overlooked is that any index whose key completely includes the key of another UNIQUE index should in turn be created as UNIQUE. You could argue that such an index has probably been created by mistake, but that’s not always the case.

If you want to check your database for indexes that can be safely made UNIQUE, you can use the following script:

SELECT OBJECT_SCHEMA_NAME(uq.object_id) AS [schema_name],
    OBJECT_NAME(uq.object_id) AS table_name,
    uq.name AS unique_index_name,
    nui.name AS non_unique_index_name
FROM sys.indexes AS uq
CROSS APPLY (
    SELECT name, object_id, index_id
    FROM sys.indexes AS nui
    WHERE nui.object_id = uq.object_id
        AND nui.index_id <> uq.index_id
        AND nui.is_unique = 0
        AND nui.has_filter = 0
        AND nui.is_hypothetical = 0
) AS nui
WHERE is_unique = 1
    AND has_filter = 0
    AND is_hypothetical = 0
    AND uq.object_id IN (
        SELECT object_id
        FROM sys.tables
    )
    AND NOT EXISTS (

        SELECT column_id
        FROM sys.index_columns AS uc
        WHERE uc.object_id = uq.object_id
            AND uc.index_id = uq.index_id
            AND uc.is_included_column = 0

        EXCEPT

        SELECT column_id
        FROM sys.index_columns AS nuic
        WHERE nuic.object_id = nui.object_id
            AND nuic.index_id = nui.index_id
            AND nuic.is_included_column = 0

    )
ORDER BY [schema_name], table_name, unique_index_name

You may wonder why you should bother making those indexes UNIQUE.

The answer is that constraints help the optimizer build better execution plans. Marking an index as UNIQUE tells the optimizer that one and only one row can be found for each key value: it’s valuable information that can actually help estimate the correct cardinality.

Does the script return any rows? Make those indexes UNIQUE, you’ll thank me later.

Enforcing Complex Constraints with Indexed Views


Some days ago I blogged about a weird behaviour of “table-level” CHECK constraints. You can find that post here.

Somehow, I did not buy the idea that a CHECK constraint with a scalar UDF or a trigger was the only possible solution. Scalar UDFs are dog-slow, and triggers are evil.

I also read this interesting article by Alexander Kuznetsov (blog) and some ideas started to flow.

Scalar UDFs are dog-slow because the function gets invoked RBAR (Row-By-Agonizing-Row, for those who don’t know this “Modenism”). If the UDF performs data access, the statement in the scalar function gets invoked for each row, hitting performance badly.

Triggers are evil, according to Glenn Berry (blog|twitter), because they are a “bad” way to implement referential integrity. Moreover, from a performance standpoint, even though triggers work on sets instead of rows (unlike UDFs), they fire an additional query (or even more than one).

However, I seem to have found a way to merge the “business logic” query plan into the same execution plan of the external DML statement that modifies the data.

The method I will discuss here makes use of Indexed Views.

First of all, we will need some tables.

And now some business rules:

  1. Users with a premium account can place orders with an unlimited total amount. Users with a normal account can place orders limited to a $1000 total amount.
  2. Minors cannot buy products in the ‘ADULT’ category.

Let’s create the tables and populate them with some sample data:

USE tempdb;
GO

-- Create users table
CREATE TABLE Users (
        user_id int PRIMARY KEY,
        user_name nvarchar(30) NOT NULL,
        birth_date date
)
GO

CREATE TABLE AccountTypes (
        account_type_id int PRIMARY KEY,
        account_type_code char(3) NOT NULL UNIQUE,
        account_type_description nvarchar(255) NOT NULL
)
GO

-- Create account table
CREATE TABLE Accounts (
        account_id int PRIMARY KEY,
        user_id int FOREIGN KEY REFERENCES Users(user_id),
        balance decimal(10,2),
        account_type_id int FOREIGN KEY REFERENCES AccountTypes(account_type_id)
)
GO

-- Create product categories table
CREATE TABLE ProductCategories (
        product_category_id int PRIMARY KEY,
        product_category_code char(5) NOT NULL UNIQUE,
        product_category_description nvarchar(255) NOT NULL
)
GO

-- Create products table
CREATE TABLE Products (
        product_id int PRIMARY KEY,
        EAN_code char(18) NOT NULL,
        product_description nvarchar(255) NOT NULL,
        product_category_id int FOREIGN KEY REFERENCES ProductCategories(product_category_id),
)
GO

-- Create orders table
CREATE TABLE Orders (
        order_id int PRIMARY KEY,
        user_id int FOREIGN KEY REFERENCES Users(user_id),
        total_amount decimal(10,2) NOT NULL CHECK(total_amount > 0),
        order_date datetime NOT NULL
)
GO

-- Create order details table
CREATE TABLE OrderDetails (
        order_id int NOT NULL FOREIGN KEY REFERENCES Orders(order_id),
        order_line int NOT NULL CHECK(order_line > 0),
        product_id int NOT NULL FOREIGN KEY REFERENCES Products(product_id),
        quantity int NOT NULL CHECK(quantity > 0),
        PRIMARY KEY(order_id, order_line)
)
GO

-- Insert sample data
INSERT INTO Users(user_id, user_name, birth_date)
        VALUES (1, N'Gianluca Sartori', '1977-11-25') -- This is me
INSERT INTO Users(user_id, user_name, birth_date)
        VALUES (2, N'Mladen Prajdić',   '1980-08-16') -- I suspect this is not Mladen’s birthday
INSERT INTO Users(user_id, user_name, birth_date)
        VALUES (3, N'Giulia Sartori',   '2009-07-02') -- This is my 2 year old baby girl
INSERT INTO AccountTypes(account_type_id, account_type_code, account_type_description)
        VALUES (1, 'NOR', N'Normal account')
INSERT INTO AccountTypes(account_type_id, account_type_code, account_type_description)
        VALUES (2, 'PRE', N'Premium account')

INSERT INTO Accounts(account_id, user_id, balance, account_type_id) VALUES (1, 1, 520, 2)
INSERT INTO Accounts(account_id, user_id, balance, account_type_id) VALUES (2, 2, 376, 2)
INSERT INTO Accounts(account_id, user_id, balance, account_type_id) VALUES (3, 3, 31,  1)

INSERT INTO ProductCategories(product_category_id, product_category_code, product_category_description)
        VALUES (1, 'MSCCD', N'Music CDs')
INSERT INTO ProductCategories(product_category_id, product_category_code, product_category_description)
        VALUES (2, 'TOONS', N'Disney Cartoons')
INSERT INTO ProductCategories(product_category_id, product_category_code, product_category_description)
        VALUES (3, 'ADULT', N'Adult stuff')

INSERT INTO Products(product_id, EAN_code, product_description, product_category_id)
        VALUES (1, 'MMFAFGRCDGKDGQEJ10', N'AC/DC – Back in Black', 1)
INSERT INTO Products(product_id, EAN_code, product_description, product_category_id)
        VALUES (2, 'DD245FS6D3KBNSDWNF', N'Finding Nemo', 2)
INSERT INTO Products(product_id, EAN_code, product_description, product_category_id)
        VALUES (3, 'B87S0NFDKSDFSAP2IS', N'Pics of hot chicks with little or no clothes to share with your friends on Twitter', 3)

Now that sample data is ready, let’s enforce the business rule #1: orders from users with a normal account must be limited to $1000.
To achieve this, we have to create an additional “dummy” table that holds exactly two rows. The sole purpose of this table is to implement a Cartesian product and violate a UNIQUE constraint in the indexed view.

-- Create dummy table to store exactly two rows
CREATE TABLE TwoRows (
        N int NOT NULL PRIMARY KEY
)

INSERT INTO TwoRows VALUES(1)
INSERT INTO TwoRows VALUES(2)
GO
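The trick works because any row that violates the business rule gets multiplied by the two rows of TwoRows, producing two identical values of the constant ONE, which a UNIQUE index cannot accept. A quick illustration of the multiplication, independent of the rest of the schema:

```sql
-- Each 'violating' row, cross joined with two rows,
-- yields two copies of the constant ONE
SELECT 1 AS ONE
FROM (VALUES ('some violating row')) AS violation(v)
CROSS JOIN (VALUES (1),(2)) AS TwoRows(N);
-- Two rows come back with ONE = 1: a UNIQUE index on ONE rejects them
```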

Everything is ready to create the view and the UNIQUE index bound to it:

CREATE VIEW CHECK_Orders_Amount
WITH SCHEMABINDING
AS
SELECT 1 AS ONE
FROM dbo.Orders AS ORD
INNER JOIN dbo.Accounts AS ACCT
        ON ORD.user_id = ACCT.user_id
INNER JOIN dbo.AccountTypes AS ACTY
        ON ACCT.account_type_id = ACTY.account_type_id
CROSS JOIN dbo.TwoRows AS TR
WHERE ORD.total_amount >= 1000
        AND ACTY.account_type_code <> 'PRE'

GO

CREATE UNIQUE CLUSTERED INDEX IX_CHECK_Orders_Accounts ON dbo.CHECK_Orders_Amount(ONE)
GO

We can now insert some sample data to test if the business rule gets enforced:

-- Insert order #1 for user #1 (me) and total amount $2500. Works.
INSERT INTO Orders (order_id, user_id, total_amount, order_date) VALUES (1, 1, 2500.00, GETDATE())
-- Insert order #2 for user #2 (Mladen) and total amount $500. Works
INSERT INTO Orders (order_id, user_id, total_amount, order_date) VALUES (2, 2, 500.00, GETDATE())

-- Insert order #3 for user #3 (My 2 year-old daughter) and total amount $5000. OUCH! Violates the UNIQUE constraint.
INSERT INTO Orders (order_id, user_id, total_amount, order_date) VALUES (3, 3, 5000.00, GETDATE())
-- Insert order #3 for Giulia with total amount $100. Works
INSERT INTO Orders (order_id, user_id, total_amount, order_date) VALUES (3, 3, 100.00, GETDATE())

If we look at the execution plan of the INSERT statements, we can see that the indexed view maintenance is merged into the INSERT query plan:

It may be interesting to note that SQL Server is smart enough to identify which statements require updating the indexed view. If we update a column that is not used in the indexed view, we won’t see any index maintenance in the query plan.
For instance, we could update order_id and examine the query plan:

-- This statement does not touch any column used in the indexed view,
-- so no index maintenance appears in the plan
-- (order_id 4 avoids a clash with the primary key of the existing order #3)
UPDATE Orders SET order_id = 4 WHERE order_id = 2
GO

As you can see, there is no need to maintain the index on the view. To achieve the same with a trigger, you would have to explicitly define the behaviour of the code using IF UPDATE(ColumnName).
Moreover, the UNIQUE constraint gets evaluated whenever ANY table used in the indexed view gets modified: this would be very hard to achieve with a trigger.

Now that the first business rule is set, we can proceed with the second one: no ‘ADULT’ products can be ordered by minors.
This is a tricky requirement, as we might be tempted to calculate the age of the user by comparing the birth date to GETDATE(). Unfortunately, non-deterministic functions cannot be used in indexed views. We will have to work around it by using the order_date column, which was set to GETDATE() previously.
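As a sketch of what does NOT work: if the view referenced GETDATE() directly, the view itself could be created, but indexing it would fail because the view is not deterministic. The view name below is invented for illustration:

```sql
-- This view can be created, but it cannot be indexed:
-- GETDATE() is non-deterministic
CREATE VIEW CHECK_Orders_Adult_Broken
WITH SCHEMABINDING
AS
SELECT 1 AS ONE
FROM dbo.Orders AS ORD
CROSS JOIN dbo.TwoRows AS TR
INNER JOIN dbo.Users AS USR
        ON ORD.user_id = USR.user_id
WHERE DATEADD(year, 18, USR.birth_date) > GETDATE()
GO

-- Fails: a unique clustered index cannot be created
-- on a view that references a non-deterministic function
CREATE UNIQUE CLUSTERED INDEX IX_CHECK_Orders_Adult_Broken
    ON dbo.CHECK_Orders_Adult_Broken(ONE)
GO
```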

CREATE VIEW CHECK_Orders_Adult
WITH SCHEMABINDING
AS
SELECT 1 AS ONE
FROM dbo.Orders AS ORD
CROSS JOIN dbo.TwoRows
INNER JOIN dbo.OrderDetails AS ODT
        ON ORD.order_id = ODT.order_id
INNER JOIN dbo.Products AS PR
        ON ODT.product_id = PR.product_id
INNER JOIN dbo.ProductCategories AS PRC
        ON PR.product_category_id = PRC.product_category_id
INNER JOIN dbo.Users AS USR
        ON ORD.user_id = USR.user_id
WHERE PRC.product_category_code = 'ADULT'
        AND DATEADD(year, 18, USR.birth_date) > ORD.order_date
GO

CREATE UNIQUE CLUSTERED INDEX IX_CHECK_Orders_Adult ON dbo.CHECK_Orders_Adult(ONE)
GO

With the constraint in place, we can try to verify if the business rule gets enforced:

-- I order the AC/DC album. I will listen to it in my car while driving.
INSERT INTO OrderDetails (order_id, order_line, product_id, quantity) VALUES (1, 1, 1, 1)
-- Mladen orders the hot chicks DVD to send the pics via Twitter.
INSERT INTO OrderDetails (order_id, order_line, product_id, quantity) VALUES (2, 1, 3, 1)
-- Giulia tries to buy the hot chicks DVD as well. She likes boobs. For the milk, I suspect.
-- Fortunately, the INSERT statement fails.
INSERT INTO OrderDetails (order_id, order_line, product_id, quantity) VALUES (3, 1, 3, 1)
-- OK, Giulia: you'd better buy a Disney DVD
INSERT INTO OrderDetails (order_id, order_line, product_id, quantity) VALUES (3, 1, 2, 1)

Conclusion

Indexed Views provide an elegant way to enforce business rules that go beyond the scope of a single row in a table, without the kludge of CHECK constraints with scalar UDFs or the pain of DML triggers.
However, some limitations apply:

  • Not all queries can be expressed in a way that can be used in an Indexed View. You can’t use non-deterministic functions, common table expressions, subqueries or self joins.
  • Indexed Views cannot perform cross-database queries. If the business rule must be verified against a table stored in a different database, this method cannot be used.

Picking the right tool among CHECK constraints and triggers can be a hard decision. But now, hopefully, you have another option. 😉

P.S. : Mladen Prajdić (blog|twitter) kindly gave his blessing to the publishing of this post.

Table-level CHECK constraints


EDITED 2011-08-05: This post is NOT about the “correct” way to implement table-level check constraints. If that is what you’re looking for, see this post instead.

Today on SQL Server Central I stumbled upon an apparently simple question on CHECK constraints. The question can be found here.

The OP wanted to know how to implement a CHECK constraint based on data from another table. In particular, he wanted to prohibit modifications to records in a detail table based on a datetime column on the master table. A simple way to achieve it is to use a trigger, but he was concerned about performance and wanted to implement it with a CHECK constraint.

Let’s see how this can be done. First of all, we will need a couple of test tables, with some sample data:

USE tempdb;
GO

-- Create master table
CREATE TABLE masterTable (
        id int identity(1,1) PRIMARY KEY,
        dateColumn datetime
)
GO

-- Create referenced table
CREATE TABLE detailTable (
        id int identity(1,1) PRIMARY KEY,
        master_id int FOREIGN KEY REFERENCES masterTable(id),
        valueColumn varchar(50)
)
GO

-- Insert sample data
INSERT INTO masterTable(dateColumn) VALUES(GETDATE())
INSERT INTO masterTable(dateColumn) VALUES(DATEADD(day,-1,GETDATE()))
INSERT INTO masterTable(dateColumn) VALUES(DATEADD(day,-2,GETDATE()))

SELECT * FROM masterTable

-- Insert sample data in referenced table
INSERT INTO detailTable(master_id, valueColumn) VALUES (1,'Value for id 1')
INSERT INTO detailTable(master_id, valueColumn) VALUES (2,'Value for id 2')
INSERT INTO detailTable(master_id, valueColumn) VALUES (3,'Value for id 3')

SELECT * FROM detailTable

Now we need to create the CHECK constraint. The only way to make it validate data against a different table is to use a scalar UDF which, to answer the original question, makes it a poor choice from a performance standpoint.

-- Create a scalar UDF: you will need this in the CHECK constraint
CREATE FUNCTION getMasterDate(@id int)
RETURNS datetime
AS
BEGIN
        RETURN ISNULL((SELECT dateColumn FROM masterTable WHERE id = @id),'30110101')
END
GO

-- Add the constraint WITH NOCHECK: existing rows are not affected
ALTER TABLE detailTable WITH NOCHECK ADD CONSTRAINT chk_date
        CHECK (DATEADD(day,-1,GETDATE()) > dbo.getMasterDate(master_id))

No surprises: the constraint gets added to the table and SQL Server does not complain about the existing rows that don’t pass the check, since we decided to use the NOCHECK switch.

Now, with the constraint in place, we should be unable to modify the data in a way that violates the constraint:

UPDATE detailTable
SET valueColumn = 'New ' + valueColumn

Surprise! Some rows conflict with the CHECK constraint, but no complaints from SQL Server, because the constraint is attached to a single column (master_id), and we left that column untouched.
In fact, if we query sys.check_constraints, we can see that this is a column-scoped constraint:

SELECT parent_column_id, definition, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('detailTable')

parent_column_id = 0 means a table-scoped constraint; parent_column_id > 0 means a column-scoped constraint.
If we try to update the column “master_id”, the constraint prevents the modification:

-- If you try to update master_id it fails
UPDATE detailTable
SET master_id = master_id + 1 - 1
Msg 547, Level 16, State 0, Line 2
The UPDATE statement conflicted with the CHECK constraint "chk_date". The conflict occurred in database "tempdb", table "dbo.detailTable", column 'master_id'.
The statement has been terminated.

What is really surprising is how SQL Server behaves if we decide to make the constraint table-scoped, adding another predicate to the CHECK:

-- Drop the constraint
ALTER TABLE detailTable DROP CONSTRAINT chk_date
GO

-- Create the constraint referencing multiple columns
ALTER TABLE detailTable WITH NOCHECK ADD CONSTRAINT chk_date
        CHECK (
                DATEADD(day,-1,GETDATE()) > dbo.getMasterDate(master_id)
                AND
                ISNULL(valueColumn,'') = ISNULL(valueColumn,SPACE(0))
        )
GO

As we might expect, it doesn’t work any more:

UPDATE detailTable
SET valueColumn = 'New ' + valueColumn
Msg 547, Level 16, State 0, Line 2
The UPDATE statement conflicted with the CHECK constraint "chk_date". The conflict occurred in database "tempdb", table "dbo.detailTable", column 'master_id'.
The statement has been terminated.

But, wait: what is REALLY attaching the constraint to the column we are trying to modify? Does a thing such as a “table-scoped” constraint really exist?

We just need to add another column and see how it changes the behaviour of the CHECK constraint:

ALTER TABLE detailTable ADD anotherColumn int

Now, if we try to update the newly created column, since we have a “table-scoped” CHECK constraint, we should get an error:

UPDATE detailTable
SET anotherColumn = 1

… but it’s not so!

(3 row(s) affected)

The constraint does not include any reference to “anotherColumn”, so it does not even get executed. If you don’t believe it, you can check with Profiler and capture calls to scalar UDFs: you won’t see any in this case.

This means that what Microsoft calls a table-level CHECK constraint does not really exist; a better name for it would be “multicolumn CHECK constraint”.

The main thing to keep in mind is that if we want the constraint to check the data regardless of the column getting modified we MUST include ALL the columns of the table in the constraint definition.
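To make chk_date fire for every update, each column of the table must appear in its definition. A sketch, reusing the always-true ISNULL trick from the constraint above and including the anotherColumn column added earlier:

```sql
-- Recreate chk_date so that it references every column of the table.
-- The extra predicates are always true; their only purpose is to bind
-- the constraint to valueColumn and anotherColumn as well.
ALTER TABLE detailTable DROP CONSTRAINT chk_date
GO

ALTER TABLE detailTable WITH NOCHECK ADD CONSTRAINT chk_date
        CHECK (
                DATEADD(day,-1,GETDATE()) > dbo.getMasterDate(master_id)
                AND ISNULL(valueColumn,'') = ISNULL(valueColumn,SPACE(0))
                AND ISNULL(anotherColumn,0) = ISNULL(anotherColumn,0)
        )
GO
```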