MULTIVERSION CONCURRENCY CONTROL: WHY COUNTS ON SOME DATABASES CAN BE SLOW

Pretty frequently, I want to know the size of a table:


        SELECT COUNT(id)
        FROM table_name;

But on some database engines (e.g. Postgres), if the table is on the larger side (>10^7 records), this query can take longer than expected even when the table is properly indexed. To explain why, we first have to look at another problem in the database space.

WHAT IF YOU READ A RECORD THAT IS CURRENTLY BEING UPDATED

Let's say we have a database of users.

id   username
1    jon

Now two transactions begin at around the same time. One updates "jon" to "john"; the other, while executing, needs to read the username of ID 1. What name should be returned? ACID-compliant databases need an answer here, because otherwise they would violate the I of ACID (Atomicity, Consistency, Isolation, Durability). To define isolation briefly: two concurrent transactions should leave the database in the same state as if they had arrived sequentially.
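
To make the race concrete, here is a sketch of the interleaving, assuming the users table above (the session labels are just commentary, not SQL):

      -- Session A
      BEGIN;
      UPDATE users SET username = 'john' WHERE id = 1;
      -- (transaction A is still open, not yet committed)

      -- Session B, meanwhile
      SELECT username FROM users WHERE id = 1;
      -- should this see 'jon' or 'john'?

      -- Session A
      COMMIT;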

So how do databases solve this problem? Early on, a number of databases would lock the data being modified, so the read transaction above would have to wait until the update transaction finished modifying the data. This introduces a lot of locking overhead, though (what if a transaction modifies a lot of data at once? every affected row would need an associated lock), and it keeps our reads waiting in line, potentially for a long time. If our database has a lot of read traffic (and we are okay with delivering stale data), we need a new solution.

In walks Multiversion Concurrency Control to solve the problem.

MULTIVERSION CONCURRENCY CONTROL

The solution is quite simple: keep multiple versions of the same record. When the database goes to update a record, it creates a copy and updates the copy instead. While the update is in progress, reads go to the record with the highest version number, ignoring the copy until the update is finished.

id   username   version
1    jon        1
1    john

After the update finishes, the copy is assigned the next version number.

id   username   version
1    jon        1
1    john       2

So now all subsequent reads for the name will return "john". When a new update is made, a new version is created, and so on. This strategy is used by quite a few popular databases, such as Postgres and MySQL (InnoDB).
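
The version column above is a simplification. Postgres, for instance, tracks versions with hidden system columns rather than a literal version counter, and you can inspect them yourself (assuming the users table from earlier):

      -- xmin is the ID of the transaction that created this row version;
      -- xmax is the ID of the transaction that deleted/updated it (0 if live).
      SELECT xmin, xmax, id, username
      FROM users
      WHERE id = 1;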

This provides a number of benefits: readers no longer block writers, writers no longer block readers, and each transaction gets a consistent snapshot of the data without the locking overhead described above.

Nothing is free, though. There are some downsides: old versions pile up in the table until a cleanup process (such as Postgres's VACUUM) removes them, and every read has to work out which version of a row is visible to it.

So with this in mind, why can counts be slow? Because the table potentially holds far more row versions than live rows, and we don't want the extra versions in the count. The database can't just read off a stored total; it has to scan the rows and check, for each version, whether it is visible to the current transaction.
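
You can see this in Postgres with EXPLAIN (table_name is the same placeholder as before):

      -- The plan typically shows a Seq Scan (or an Index Only Scan that
      -- still consults the visibility map) feeding an Aggregate node --
      -- every visible row version gets touched one way or another.
      EXPLAIN SELECT COUNT(id) FROM table_name;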

TRANSACTION ISOLATION LEVELS

It would be remiss of me not to mention Transaction Isolation Levels, given all this talk of MVCC reading data while a transaction is underway.

The ANSI/ISO SQL standard describes the following as undesirable:

Dirty read is when a transaction retrieves a record that has been updated by another transaction that hasn't been committed yet.

Non-repeatable read is when, during a transaction, a row is retrieved twice and the values within the row are different between reads.

Phantom read is when, during a transaction, two identical queries are executed and the collection of rows returned by the second query is different, for example because another transaction inserted matching rows in between.

Preventing these situations does come with a performance penalty, so databases provide levels of isolation (aka Transaction Isolation Levels) to allow developers to decide which scenarios they are comfortable with.

Isolation Level    Dirty Read     Non-Repeatable Read   Phantom Read
Read uncommitted   Possible       Possible              Possible
Read committed     Not possible   Possible              Possible
Repeatable read    Not possible   Not possible          Possible
Serializable       Not possible   Not possible          Not possible

Side note: Postgres lets you request any of the four levels, but Read uncommitted behaves the same as Read committed (the default), so it effectively implements three; InnoDB supports all four (with Repeatable read as the default).
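
Choosing a level is done per transaction with standard SQL. For example (Postgres semantics shown, with the hypothetical users table again):

      BEGIN;
      SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      -- Dirty and non-repeatable reads are now ruled out; the standard
      -- still permits phantoms at this level (Postgres happens to
      -- prevent them here too).
      SELECT username FROM users WHERE id = 1;
      COMMIT;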

HOW CAN I GET COUNTS QUICKLY?

So now we know why COUNT can be slow. Sometimes we need to know how many records a table has, though. How can we get these counts in a decent time frame?

First ask yourself, do you need to know the exact number? Or is an estimate okay?

I need to know exact numbers

You're a bit out of luck then. Split the data up via sharding to reduce table size and, if available, run cleanup processes such as VACUUM regularly to keep the number of dead row versions (and thus your table sizes) down.

I am fine with estimates

You've got more options here:

Query database metadata: Database engines will often maintain metadata about tables for the purposes of query planning and database cleanup. For instance, Postgres stores an approximate row count for each table in the pg_catalog.pg_class table.


      -- reltuples is an estimate, refreshed by VACUUM, ANALYZE, and a few
      -- DDL commands such as CREATE INDEX, so it can lag the true count
      SELECT reltuples::bigint
      FROM pg_catalog.pg_class
      WHERE relname = 'table_name';
      

Use a materialized view: Since count queries can be slow, another solution is to define a materialized view that stores the count. A materialized view is a view whose query results are persisted in the database, at the cost of the data being potentially stale. This provides speedy access to counts, similar to the metadata approach above. (Not all databases support materialized views, though.)
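
In Postgres that might look like the following (the view name table_name_count is just an example):

      -- Persist the count once...
      CREATE MATERIALIZED VIEW table_name_count AS
      SELECT COUNT(*) AS row_count
      FROM table_name;

      -- ...read it back instantly...
      SELECT row_count FROM table_name_count;

      -- ...and recompute it on whatever schedule you can tolerate.
      REFRESH MATERIALIZED VIEW table_name_count;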

Store the count yourself: Not all databases support materialized views, but you always have the option to build similar functionality yourself, as sketched below. See Summary Tables and How to Implement Materialized Views in MySQL/MSSQL for some ways to do so.
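
As a minimal sketch of the idea in Postgres (the row_counts table and adjust_row_count function are hypothetical names; EXECUTE FUNCTION needs Postgres 11+, older versions use EXECUTE PROCEDURE): keep a counter row up to date with triggers, accepting a small write overhead on every insert and delete.

      -- One row per table whose count we maintain ourselves.
      CREATE TABLE row_counts (
          table_name text PRIMARY KEY,
          row_count  bigint NOT NULL DEFAULT 0
      );

      -- Seed the counter with a one-time (possibly slow) exact count.
      INSERT INTO row_counts (table_name, row_count)
      SELECT 'table_name', COUNT(*) FROM table_name;

      CREATE FUNCTION adjust_row_count() RETURNS trigger AS $$
      BEGIN
          IF TG_OP = 'INSERT' THEN
              UPDATE row_counts SET row_count = row_count + 1
              WHERE table_name = TG_TABLE_NAME;
          ELSIF TG_OP = 'DELETE' THEN
              UPDATE row_counts SET row_count = row_count - 1
              WHERE table_name = TG_TABLE_NAME;
          END IF;
          RETURN NULL;  -- AFTER triggers ignore the return value
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER track_table_name_count
      AFTER INSERT OR DELETE ON table_name
      FOR EACH ROW EXECUTE FUNCTION adjust_row_count();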

ADDITIONAL READING