Counting the number of rows in a table
Don’t be fooled by the title of this post: while counting the number of rows in a table is a trivial task for you, it is not trivial at all for SQL Server.
Every time you run your COUNT(*) query, SQL Server has to scan an index or a heap to calculate that seemingly innocuous number and send it to your application. This means a lot of unnecessary reads and unnecessary blocking.
Jes Schultz Borland blogged about it some time ago and also Aaron Bertrand has a blog post on this subject. I will refrain from repeating here what they both said: go read their blogs to understand why COUNT(*) is a not a good tool for this task.
The alternative to COUNT(*) is reading the count from the table metadata, querying sys.partitions, something along these lines:
SELECT SUM(p.rows) FROM sys.partitions p WHERE p.object_id = OBJECT_ID('MyTable') AND p.index_id IN (0,1); -- heap or clustered index
Many variations of this query include JOINs to sys.tables, sys.schemas or sys.indexes, which are not strictly necessary in my opinion. However, the shortest version of the count is still quite verbose and error prone.
Fortunately, there’s a shorter version of this query that relies on the system function OBJECTPROPERTYEX:
SELECT OBJECTPROPERTYEX(OBJECT_ID('MyTable'),'cardinality')
Where does it read data from? STATISTICS IO doesn’t return anything for this query, so I had to set up an Extended Events session to capture lock_acquired events and find out the system tables read by this function:
Basically, it’s just sysallocunits and sysrowsets.
It’s nice, short and easy to remember. Enjoy.
Posted on May 18, 2015, in SQL Server, T-SQL and tagged SQL, SQL Server, SQLServer, T-SQL. Bookmark the permalink. 14 Comments.
For stored procedures, I use the method your good article points out. In person, I use COUNT(*) for one and only one reason… it gives me time to take a sip of Scotch. 😉
Hahaha! Now that’s a good reason to have slow code around! 🙂
Jeff, thanks for stopping by and for the good laugh.
Reblogged this on Alessandro Alpi's Blog.
It looks really useful, but on my DB (SQL Server 2008R2)
select count(*) from VarOrFormValue
gives
23650227
and
SELECT OBJECTPROPERTYEX(OBJECT_ID(‘VarOrFormValue’),’cardinality’)
gives
NULL
Clearly I’m missing something obvious????
My bad, I forgot to point out that it only works from 2012 onwards.
In SQL Server 2005 thru 2008, you can do the following to quickly get a row count. Note that prior to 2005, this wasn’t accurate.
SELECT *
FROM sys.sysindexes
WHERE id = OBJECT_ID(‘dbo.YourTableNameHere’)
AND indid IN (0,1)
;
The “indid” of 0 or 1 will pick up either on the heap or the clustered index depending, of course, on whether there’s a clustered index or not. This could easily be converted to a function to provide similar utility to the 2012 function of OBJECTPROPERTYEX().
Thank you. Years of using count(*) and didn’t realise this was an issue – and I thought I knew at least something about SQL!
SELECT rowcnt
FROM sys.sysindexes
WHERE id = OBJECT_ID(‘cpr.VarOrFormValue’)
AND indid IN (0,1)
24,444,019 in 75 ms first time, about 25ms thereafter
select count(*) from cpr.VarOrFormValue
24,444,019 in 14,569 ms first time, about 750ms thereafter.
That makes it worthwhile using.
Do you know at what row count it makes a meaningful difference?
Whenever sys.partitions contains a smaller number of rows than the user table. “Meaningful” is highly subjective, but you will see a noticeable difference when the user table is an order of magnitude bigger than sys.partirions.
Very easy and fast lookups.
But I’m hesitant to use it, because we have a table where
COUNT_BIG(*) >> OBJECTPROPERTYEX(…,’cardinality’)
How can that be?
I think it has to do with metadata not being in sync with the table. You will find more information in the blog by Aaron Bertrand that I linked in the third paragraph. http://sqlperformance.com/2014/10/t-sql-queries/bad-habits-count-the-hard-way
@Robert Sterbal,
If it makes no meaningful difference, then no harm done. It’s one of those things that cause no harm and increases in benefit as the data scales up.
The only time that you might have to do a significant test to see which is best is if you have something that needs to do a COUNT tens or hundreds times per second.. A COUNT on a short table may be more efficient for such heavy hits.
Using the OBJECTPROPERTYEX method may be a problem from 2014 onwards as it seems to return 0 rows for memory optimized tables e.g. when I run the following against the AdventureWorks2016 database:
SELECT
t.[name]
, t.[object_id]
, p.[rows] AS [sys.partitions]
, CONVERT(int,OBJECTPROPERTYEX(t.[object_id],’Cardinality’)) AS [OBJECTPROPERTYEX]
FROM sys.tables t
INNER JOIN sys.partitions p ON p.[object_id] = t.[object_id] AND p.[index_id] IN (0,1)
WHERE t.[is_memory_optimized] = 1
ORDER BY t.[name]
I get the following results:
name object_id sys.partitions OBJECTPROPERTYEX
—————————— ———– ——————– —————-
DemoSalesOrderDetailSeed 1556200594 538 0
DemoSalesOrderHeaderSeed 1588200708 31465 0
Product_inmem 2039678314 504 0
SalesOrderDetail_inmem 468196718 121317 0
SalesOrderHeader_inmem 212195806 31465 0
SpecialOffer_inmem 1895677801 16 0
SpecialOfferProduct_inmem 132195521 538 0
Thanks for the heads up! This seems to be a bug. I will report it.