Pretty frequently, I want to know the size of a table:
```sql
SELECT COUNT(id) FROM table_name;
```
But on some database engines (e.g. Postgres), and if the table is on the larger side (>10^7 records), this query can take longer than expected even when the table is properly indexed. To explain why, we first have to look at another problem in the database space.
Let's say we have a database of users.
id | username |
---|---|
1 | jon |
And two transactions begin around the same time: one updates "Jon" to "John", and another, while executing, needs to read the name of ID 1. What name should be returned? ACID-compliant databases need an answer, because otherwise they would violate the I of ACID (Atomicity, Consistency, Isolation, Durability). Defined briefly, isolation means "two concurrent transactions should leave the database in the same state as if they had run sequentially".
So how do databases solve this problem? Early on, a number of databases would simply lock the data being modified, so the read transaction above would have to wait until the update transaction finished. This introduces a lot of locking overhead, though (what if a transaction modifies a bunch of data at once? every affected row needs a lock), and it keeps reads waiting in line, potentially for a long time. If our database has a lot of read traffic (and we are okay with delivering stale data), we need a different solution.
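As a minimal sketch of that lock-based world, imagine the two transactions above running in separate sessions (the table name `users` is an assumption; only the `id`/`username` columns are from the example):

```sql
-- Session A: starts updating the record but hasn't committed yet
BEGIN;
UPDATE users SET username = 'john' WHERE id = 1;

-- Session B: under a purely lock-based scheme, this read blocks
-- until Session A commits or rolls back
SELECT username FROM users WHERE id = 1;

-- Session A: finishes, releasing its locks
COMMIT;
```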
In walks Multiversion Concurrency Control (MVCC) to solve the problem.
The solution is quite simple: keep multiple versions of the same record. When the database goes to update a record, it creates a copy of the record and updates the copy instead. While the update is in progress, reads go to the record with the highest version, ignoring the copy until the update is finished.
id | username | version |
---|---|---|
1 | jon | 1 |
1 | john | |
After the update is finished, the version id is incremented for the copied record.
id | username | version |
---|---|---|
1 | jon | 1 |
1 | john | 2 |
So now all subsequent reads for the name will return "John". When a new update is made, a new version is created and so on. This strategy is used in quite a few popular databases such as Postgres and MySQL.
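If you want to peek at this bookkeeping in Postgres, every row exposes hidden system columns that track which transaction created and which superseded each version. A rough way to look, again assuming the table is named `users`:

```sql
-- xmin is the ID of the transaction that created this row version,
-- xmax the one that deleted or superseded it (zero if still current)
SELECT xmin, xmax, id, username FROM users WHERE id = 1;
```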
This provides a number of benefits:
- Better read performance
- Fewer database locks
- Fewer database contention issues
- Record isolation for write operations
- Fewer database deadlocks
Nothing is free, though. There are some downsides.
- Increased complexity on the database engine side
- Increased database size due to duplicate version records
  - Postgres mitigates this via the VACUUM process, which identifies and removes the dead, superseded row versions left behind by MVCC (a rough example follows this list)
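For reference, here is roughly what a manual run looks like in Postgres (autovacuum normally does this in the background; `users` is the assumed table name from earlier):

```sql
-- Reclaims space from dead row versions and refreshes planner statistics;
-- mostly useful by hand after large batch updates or deletes
VACUUM (VERBOSE, ANALYZE) users;
```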
So with this in mind, why can counts be slow? Because the table potentially holds many more row versions than visible rows, and the database has to check each version to decide whether it belongs in the count. We don't want these extra versions inflating the result.
It would be remiss of me not to mention Transaction Isolation Levels, given all this talk of MVCC reading data while a transaction is underway.
The ANSI/ISO SQL standard describes the following as undesirable:
- Dirty reads
- Non-repeatable reads
- Phantom reads
A dirty read is when a transaction retrieves a record that has been updated by another transaction that hasn't committed yet.
A non-repeatable read is when, during a transaction, a row is retrieved twice and the values within the row differ between reads.
A phantom read is when, during a transaction, two identical queries are executed and the set of rows returned by the second query differs from the first (for example, because another transaction inserted new rows in between).
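To make the middle one concrete, here is a sketch of how a non-repeatable read can play out across two sessions, assuming the `users` table from earlier and the default Read committed level:

```sql
-- Session A
BEGIN;
SELECT username FROM users WHERE id = 1;  -- returns 'jon'

-- Session B, meanwhile
BEGIN;
UPDATE users SET username = 'john' WHERE id = 1;
COMMIT;

-- Session A, same transaction, same query
SELECT username FROM users WHERE id = 1;  -- now returns 'john': a non-repeatable read
COMMIT;
```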
Preventing these situations does come with a performance penalty, so databases provide levels of isolation (aka Transaction Isolation Levels) to allow developers to decide which scenarios they are comfortable with.
- Read uncommitted
- Read committed
- Repeatable read
- Serializable
Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
---|---|---|---|
Read uncommitted | Possible | Possible | Possible |
Read committed | Not possible | Possible | Possible |
Repeatable read | Not possible | Not possible | Possible |
Serializable | Not possible | Not possible | Not possible |
Side note: Postgres accepts all four levels but implements only three distinct behaviors (Read uncommitted behaves the same as Read committed), with Read committed as the default. InnoDB supports all four, with Repeatable read as the default.
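For completeness, this is roughly how you opt into a stricter level for a single transaction (Postgres syntax shown; MySQL uses `SET TRANSACTION ISOLATION LEVEL ...` before starting the transaction):

```sql
-- Pick the isolation level for this transaction only
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(id) FROM table_name;
-- ...more statements that need a consistent snapshot...
COMMIT;
```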
So now we know why COUNT can be slow. Sometimes we need to know how many records a table has, though. How can we get these counts in a decent time frame?
First ask yourself, do you need to know the exact number? Or is an estimate okay?
I need to know exact numbers: You're a bit out of luck then. Split the data up via sharding to reduce table size and, if available, run cleanup processes such as VACUUM regularly to keep your table sizes down.
I am fine with estimates: You've got more options here:
- Query database metadata
- Use a materialized view (if supported)
- Store the count yourself
Query database metadata: Database engines often maintain metadata about tables for query planning and cleanup purposes. For instance, Postgres stores an approximate row count for each table in the pg_catalog.pg_class table.
```sql
SELECT reltuples::bigint FROM pg_catalog.pg_class WHERE relname = 'table_name';
```
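MySQL keeps a comparable estimate in `information_schema`; something along these lines should work for InnoDB tables (where the stored row count is itself approximate):

```sql
-- TABLE_ROWS is a rough estimate for InnoDB tables
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'table_name';
```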
Use a materialized view: Since count queries can be slow, another solution is to define a materialized view that stores the count. (Not all databases support materialized views, though.) A materialized view is a view where the data returned by the view's query is persisted in the database, at the cost of that data being potentially stale. This provides speedy access to counts, similar to the metadata approach above.
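In Postgres, for example, that might look something like this (the view name here is made up for illustration):

```sql
-- One-time setup: persist the count
CREATE MATERIALIZED VIEW table_name_count AS
  SELECT COUNT(id) AS row_count FROM table_name;

-- Reads are now a cheap lookup...
SELECT row_count FROM table_name_count;

-- ...but only as fresh as the last refresh
REFRESH MATERIALIZED VIEW table_name_count;
```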
Store the count yourself: Not all databases support materialized views, but you always have the option to build similar functionality yourself. See Summary Tables and How to Implement Materialized Views in MySQL/MSSQL for some ways to do so.
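As a rough sketch of the summary-table approach (Postgres-flavored SQL; the table and column names are made up, and you would refresh it from a scheduled job or triggers on the source table):

```sql
-- A summary table that plays the role of a materialized view
CREATE TABLE row_counts (
  table_name   TEXT PRIMARY KEY,
  row_count    BIGINT NOT NULL,
  refreshed_at TIMESTAMPTZ NOT NULL
);

-- Refresh on a schedule (cron, pg_cron, a background worker, etc.)
INSERT INTO row_counts (table_name, row_count, refreshed_at)
VALUES ('table_name', (SELECT COUNT(id) FROM table_name), NOW())
ON CONFLICT (table_name)
DO UPDATE SET row_count = EXCLUDED.row_count,
              refreshed_at = EXCLUDED.refreshed_at;
```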