Pretty frequently, I want to know the size of a table:
```sql
SELECT COUNT(id) FROM table_name;
```
But on some database engines (e.g. Postgres), and if the table is on the larger side (>10^7 records), this query can take longer than expected even when the table is properly indexed. To explain why, we first have to look at another problem in the database space.
Let's say we have a database of users.
id | username |
---|---|
1 | jon |
And two transactions begin around the same time: one updates "Jon" to "John", and another, while executing, needs to read the name of ID 1. What name should be returned? ACID-compliant databases need an answer, because otherwise they would violate the I of ACID (Atomicity, Consistency, Isolation, Durability). Defined briefly, isolation means "two concurrent transactions should leave the database in the same state as if they had run sequentially".
So how do databases solve this problem? Early on, a number of databases would simply lock the data being modified, so the read transaction above would have to wait until the update transaction finished. This introduces a lot of locking overhead, though (what if a transaction modifies a bunch of data at once? every affected row needs a lock), and it keeps reads waiting in line, potentially for a long time. If our database has a lot of read traffic (and we are okay with delivering stale data), we need a different solution.
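As a minimal sketch of that lock-based world, imagine the two transactions above running in separate sessions (the table name `users` is an assumption; only the `id`/`username` columns are from the example):

```sql
-- Session A: starts updating the record but hasn't committed yet
BEGIN;
UPDATE users SET username = 'john' WHERE id = 1;

-- Session B: under a purely lock-based scheme, this read blocks
-- until Session A commits or rolls back
SELECT username FROM users WHERE id = 1;

-- Session A: finishes, releasing its locks
COMMIT;
```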
In walks Multiversion Concurrency Control (MVCC) to solve the problem.
The solution is quite simple: keep multiple versions of the same record. When the database goes to update a record, it creates a copy of the record and updates the copy instead. While the update is in progress, reads go to the record with the highest version, ignoring the copy until the update is finished.
id | username | version |
---|---|---|
1 | jon | 1 |
1 | john | |
After the update is finished, the version id is incremented for the copied record.
id | username | version |
---|---|---|
1 | jon | 1 |
1 | john | 2 |
So now all subsequent reads for the name will return "John". When a new update is made, a new version is created and so on. This strategy is used in quite a few popular databases such as Postgres and MySQL.
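If you want to peek at this bookkeeping in Postgres, every row exposes hidden system columns that track which transaction created and which superseded each version. A rough way to look, again assuming the table is named `users`:

```sql
-- xmin is the ID of the transaction that created this row version,
-- xmax the one that deleted or superseded it (zero if still current)
SELECT xmin, xmax, id, username FROM users WHERE id = 1;
```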
This provides a number of benefits:
- Better read performance
- Fewer database locks
- Fewer database contention issues
- Record isolation for write operations
- Fewer database deadlocks
Nothing is free, though. There are some downsides.
- Increased complexity on the database engine side
- Increased database size due to duplicate version records
  - Postgres mitigates this via the VACUUM process, which identifies and removes the dead, superseded row versions left behind by MVCC (a rough example follows this list)
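For reference, here is roughly what a manual run looks like in Postgres (autovacuum normally does this in the background; `users` is the assumed table name from earlier):

```sql
-- Reclaims space from dead row versions and refreshes planner statistics;
-- mostly useful by hand after large batch updates or deletes
VACUUM (VERBOSE, ANALYZE) users;
```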
So with this in mind, why can counts be slow? Because the table potentially holds many more row versions than visible rows, and the database has to check each version to decide whether it belongs in the count. We don't want these extra versions inflating the result.
It would be remiss of me not to mention Transaction Isolation Levels, given all this talk of MVCC reading data while a transaction is underway.
The ANSI/ISO SQL standard describes the following as undesirable:
- Dirty reads
- Non-repeatable reads
- Phantom reads
A dirty read is when a transaction retrieves a record that has been updated by another transaction that hasn't committed yet.
A non-repeatable read is when, during a transaction, a row is retrieved twice and the values within the row differ between reads.
A phantom read is when, during a transaction, two identical queries are executed and the set of rows returned by the second query differs from the first (for example, because another transaction inserted new rows in between).
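To make the middle one concrete, here is a sketch of how a non-repeatable read can play out across two sessions, assuming the `users` table from earlier and the default Read committed level:

```sql
-- Session A
BEGIN;
SELECT username FROM users WHERE id = 1;  -- returns 'jon'

-- Session B, meanwhile
BEGIN;
UPDATE users SET username = 'john' WHERE id = 1;
COMMIT;

-- Session A, same transaction, same query
SELECT username FROM users WHERE id = 1;  -- now returns 'john': a non-repeatable read
COMMIT;
```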
Preventing these situations does come with a performance penalty, so databases provide levels of isolation (aka Transaction Isolation Levels) to allow developers to decide which scenarios they are comfortable with.
- Read uncommitted
- Read committed
- Repeatable read
- Serializable
Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
---|---|---|---|
Read uncommitted | Possible | Possible | Possible |
Read committed | Not possible | Possible | Possible |
Repeatable read | Not possible | Not possible | Possible |
Serializable | Not possible | Not possible | Not possible |
Side note: Postgres accepts all four levels but implements only three distinct behaviors (Read uncommitted behaves the same as Read committed), with Read committed as the default. InnoDB supports all four, with Repeatable read as the default.
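For completeness, this is roughly how you opt into a stricter level for a single transaction (Postgres syntax shown; MySQL uses `SET TRANSACTION ISOLATION LEVEL ...` before starting the transaction):

```sql
-- Pick the isolation level for this transaction only
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(id) FROM table_name;
-- ...more statements that need a consistent snapshot...
COMMIT;
```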
So now we know why COUNT can be slow. Sometimes we need to know how many records a table has, though. How can we get these counts in a decent time frame?
First ask yourself, do you need to know the exact number? Or is an estimate okay?
I need to know exact numbers: You're a bit out of luck then. Split the data up via sharding to reduce table size and, if available, run cleanup processes such as VACUUM regularly to keep your table sizes down.
I am fine with estimates: You've got more options here:
- Query database metadata
- Use a materialized view (if supported)
- Store the count yourself
Query database metadata: Database engines often maintain metadata about tables for query planning and cleanup purposes. For instance, Postgres stores an approximate row count for each table in the pg_catalog.pg_class table.
```sql
SELECT reltuples::bigint FROM pg_catalog.pg_class WHERE relname = 'table_name';
```
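MySQL keeps a comparable estimate in `information_schema`; something along these lines should work for InnoDB tables (where the stored row count is itself approximate):

```sql
-- TABLE_ROWS is a rough estimate for InnoDB tables
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'table_name';
```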
Use a materialized view: Since count queries can be slow, another solution is to define a materialized view that stores the count. (Not all databases support materialized views, though.) A materialized view is a view where the data returned by the view's query is persisted in the database, at the cost of that data being potentially stale. This provides speedy access to counts, similar to the metadata approach above.
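In Postgres, for example, that might look something like this (the view name here is made up for illustration):

```sql
-- One-time setup: persist the count
CREATE MATERIALIZED VIEW table_name_count AS
  SELECT COUNT(id) AS row_count FROM table_name;

-- Reads are now a cheap lookup...
SELECT row_count FROM table_name_count;

-- ...but only as fresh as the last refresh
REFRESH MATERIALIZED VIEW table_name_count;
```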
Store the count yourself: Not all databases support materialized views, but you always have the option to build similar functionality yourself. See Summary Tables and How to Implement Materialized Views in MySQL/MSSQL for some ways to do so.
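As a rough sketch of the summary-table approach (Postgres-flavored SQL; the table and column names are made up, and you would refresh it from a scheduled job or triggers on the source table):

```sql
-- A summary table that plays the role of a materialized view
CREATE TABLE row_counts (
  table_name   TEXT PRIMARY KEY,
  row_count    BIGINT NOT NULL,
  refreshed_at TIMESTAMPTZ NOT NULL
);

-- Refresh on a schedule (cron, pg_cron, a background worker, etc.)
INSERT INTO row_counts (table_name, row_count, refreshed_at)
VALUES ('table_name', (SELECT COUNT(id) FROM table_name), NOW())
ON CONFLICT (table_name)
DO UPDATE SET row_count = EXCLUDED.row_count,
              refreshed_at = EXCLUDED.refreshed_at;
```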