Concepts in detail
Motivation
We want to make all database reads referentially transparent - that is, repeatable - such that at any time in the future, we can re-run a query from the past and produce exactly the same result. This is not to say that our data never changes; our data may be updated constantly! Rather, it means acknowledging that data enters the database over time, so we must account both for what the database knows and for when the database learned it.
Consider, for example, tax reporting. While a business may do everything possible to ensure accurate numbers prior to submission, some reports may require amendments; some may even require additional tax payments. In this situation, taxing authorities may ask for proof that the business acted in good faith in prior reports. A referentially transparent database makes it possible to show that data from earlier reports were accurate to the best of the business's knowledge at the time.
Requirements
In order for reads to be repeatable, it’s clear that data cannot be updated in place, which means that the database must be append-only. Bitemporal modeling provides a principled append-only structure for storing and retrieving data in a repeatable manner.
What is bitemporal modeling?
Bitemporal modeling allows you to track data along two time axes, hence the name “bitemporal.” The first axis is the valid time, or Vt, which is when an event took place in the real world. This is the time that most people think of when they think of recording an event in a database, and the user has the freedom to set the Vt to whatever value is appropriate.
The second axis is the transaction time, or Tt, which is when the database recorded the event. Unlike the Vt, the Tt is set by the database itself, never by the user. This allows the database to keep an accurate record of when data was recorded.
Having two time axes allows us to plot data on a 2D plane that we call the bitemporal plane. Conceptually, data is stored in rectangles on this Tt-Vt plane:
    Vt
     ^   ^
     |   |/////////
 Vt_p|   |/// p ///
     |   |/////////
     |   ---------->
 Vt_x|     x
     -------------> Tt
          Tt_p
In the diagram above, the point p at coordinates (Tt_p, Vt_p) falls inside the shaded rectangle. Therefore, a read at p will return the data associated with that rectangle. Contrast that with the point x at coordinates (Tt_p, Vt_x). x falls outside the rectangle, so no data is returned for a read at x.
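A read at a point on the bitemporal plane can be sketched as a containment test against stored rectangles. The `Rect` type and `read` function below are hypothetical, times are plain numbers for brevity, and treating each bound as half-open (start included, end excluded, `None` meaning still open) is an assumption, since the text does not say how boundaries are handled.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Rect:
    """A rectangle on the Tt-Vt plane (hypothetical type)."""
    data: Any
    tt_from: float
    vt_from: float
    tt_to: Optional[float] = None  # None: unbounded along Tt
    vt_to: Optional[float] = None  # None: unbounded along Vt

    def contains(self, tt: float, vt: float) -> bool:
        # Assumed half-open bounds: [from, to) on both axes.
        in_tt = self.tt_from <= tt and (self.tt_to is None or tt < self.tt_to)
        in_vt = self.vt_from <= vt and (self.vt_to is None or vt < self.vt_to)
        return in_tt and in_vt


def read(rects: List[Rect], tt: float, vt: float) -> Optional[Any]:
    """Return the data of the first rectangle containing (tt, vt), else None."""
    for rect in rects:
        if rect.contains(tt, vt):
            return rect.data
    return None


# One rectangle, open-ended on both axes, like the shaded area in the diagram.
rects = [Rect(data="recorded value", tt_from=1, vt_from=2)]

at_p = read(rects, tt=3, vt=4)  # p lies inside the rectangle
at_x = read(rects, tt=3, vt=1)  # x lies below the rectangle's lower Vt edge
```

With these made-up coordinates, `at_p` is the rectangle's data and `at_x` is `None`, mirroring the reads at p and x in the diagram.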
To modify existing data, we need to close an existing rectangle and insert one or more new rectangles, as in the following diagram:
         Vt
          ^   ^
          |   |//////////|\\\\\\\\\\\\\\\
     Vt_p2|   |//////////|\\\\\\\\\ p2 \\
          |   |//////////|\\\\\\\\\\\\\\\
 Vt_update|   |//////////|--------------->
          |   |//////////|///////////////
      Vt_y|   |//////////|///////// y ///
      Vt_p|   |/// p ////|///////////////
          |   |//////////|///////////////
          |   --------------------------->
      Vt_x|     x
          ------------------------------> Tt
               Tt_p   Tt_update   Tt_p2
At a Tt between Tt_p and Tt_p2 and a Vt between Vt_p and Vt_p2, the data was updated, which you can see from the difference in shading. The update resulted in 3 operations:
- The first rectangle was bounded at the Tt of the update.
- A new rectangle aligned with the bottom of the first rectangle was created with the same data as the first rectangle.
- A new rectangle was created at the Vt of the update with the new data.
There are 3 things to notice:
1. A query for p at (Tt_p, Vt_p) will return the old data as it did before the update.
2. A query for p2 at (Tt_p2, Vt_p2) where Tt_p2 > Tt_update and Vt_p2 > Vt_update will return the new data.
3. A query for y at (Tt_p2, Vt_y) where Tt_p2 > Tt_update and Vt_p < Vt_y < Vt_update will return the old data.
While (2) is likely expected, (1) and (3) are unique to bitemporal data. This bitemporal behavior allows us to reproduce the world as it was known at any point in history.
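The update and the reads at p, p2, and y can be sketched together in a few lines. This is an illustration under stated assumptions rather than how any particular database implements it: `Rect`, `read`, and `update` are hypothetical names, times are plain numbers, bounds are assumed to be half-open, and the update is assumed to apply to every rectangle that is still open along Tt and covers the update's Vt.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional


@dataclass(frozen=True)
class Rect:
    """A rectangle on the Tt-Vt plane (hypothetical type)."""
    data: Any
    tt_from: float
    vt_from: float
    tt_to: Optional[float] = None  # None: still open along Tt
    vt_to: Optional[float] = None  # None: unbounded along Vt

    def contains(self, tt: float, vt: float) -> bool:
        # Assumed half-open bounds: [from, to) on both axes.
        in_tt = self.tt_from <= tt and (self.tt_to is None or tt < self.tt_to)
        in_vt = self.vt_from <= vt and (self.vt_to is None or vt < self.vt_to)
        return in_tt and in_vt


def read(rects: List[Rect], tt: float, vt: float) -> Optional[Any]:
    """Return the data of the first rectangle containing (tt, vt), else None."""
    for rect in rects:
        if rect.contains(tt, vt):
            return rect.data
    return None


def update(rects: List[Rect], tt_update: float, vt_update: float, new_data: Any) -> List[Rect]:
    """Return a new rectangle list reflecting an update at (tt_update, vt_update)."""
    result: List[Rect] = []
    for r in rects:
        is_open = r.tt_to is None
        covers_vt = r.vt_from <= vt_update and (r.vt_to is None or vt_update < r.vt_to)
        if not (is_open and covers_vt):
            result.append(r)  # unaffected rectangles are carried over as-is
            continue
        # 1. Bound the existing rectangle at the Tt of the update.
        result.append(replace(r, tt_to=tt_update))
        # 2. Same data, aligned with the bottom of the old rectangle, up to the update's Vt.
        result.append(Rect(r.data, tt_from=tt_update, vt_from=r.vt_from, vt_to=vt_update))
        # 3. New data from the Vt of the update upward.
        result.append(Rect(new_data, tt_from=tt_update, vt_from=vt_update, vt_to=r.vt_to))
    return result


before = [Rect("old", tt_from=1, vt_from=2)]
after = update(before, tt_update=5, vt_update=6, new_data="new")

at_p = read(after, tt=3, vt=4)   # Tt before the update
at_p2 = read(after, tt=8, vt=9)  # Tt and Vt both after the update
at_y = read(after, tt=8, vt=5)   # Tt after the update, Vt below it
```

With these made-up coordinates, `at_p` and `at_y` both see the old data and only `at_p2` sees the new data, which matches the three observations. Returning a new list instead of mutating `before` keeps the pre-update state available for comparison.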