Proving Liveness with TLA
The TLA Toolbox now has support for proving liveness properties (i.e. that something will eventually happen).
I try it out on the Xen vchan protocol.
Working on a liveness proof with the TLA+ Toolbox.
Table of Contents
- Background
- A simple channel specification
- Temporal logic
- Proving liveness (simple case)
- Multi-step liveness
- The real protocol
- Work-arounds for bugs
- Conclusions
Background
The vchan protocol is used for efficient communication between Xen virtual machines.
In 2018, I made a TLA+ specification of vchan:
I created a specification of the protocol from the C code,
used the model checker (TLC) to test that the protocol worked on small models,
and wrote a machine-verified proof of Integrity (that the data received always matched what was sent).
I also outlined a proof of Availability (that data sent will eventually arrive, rather than the system deadlocking or going around in circles). But:
Disappointingly, we can’t actually prove
Availabilityusing TLAPS, because
currently it understands very little temporal logic
However, newer versions of TLAPS (the TLA Proof System) have added proper support for temporal logic,
so I decided to take another look.
In this post, I’ll start with a simple example of proving a liveness property,
and then look at the real protocol.
If you’re not familiar with TLA, you might want to read the earlier post first,
though I’ll briefly introduce concepts as we meet them here.
A simple channel specification
We’ll start with a simple model of a one-way channel.
There is a sender and a receiver, and they have access to a shared buffer of size BufferSize
(which is a non-zero natural number):
1 2 |
|
The model just keeps track of the total number of bytes sent and received (not the actual data):
1 2 |
|
The amount of data currently in the buffer (BufferUsed) is the difference between these counters,
and the free space (BufferFree) is the difference between that and the total buffer size:
1 2 |
|
The property we are interested in (Liveness) is that data sent eventually arrives:
1 2 3 4 |
|
Specification in terms of actions
Liveness is the property we want, but we also need to say how we plan to achieve this.
In TLA, this is done by describing the initial state and the actions that can be performed
at each step.
Initially, no bytes have been sent or received:
1 2 3 |
|
An action only looks at a single atomic step of the algorithm, and relates the values of the variables
at the start of the step (e.g. Sent) to their values at the end (written primed, e.g. Sent').
The Send action is true when the sender increases Sent
(by some amount n, limited by the free space in the buffer):
1 2 3 4 |
|
The Recv action is true when the receiver reads all of the data currently in the buffer:
1 2 3 4 |
|
Every action of our system is either a Send or a Recv:
1 2 3 |
|
Finally, Spec describes the behaviours of the whole system in these terms:
1 2 3 4 |
|
Invariants
It’s always useful to prove some basic invariants about the system
(in particular, the types of the variables):
1 2 3 4 5 |
|
Before trying to prove anything, it’s good to use the model checker (TLC) to test it first.
You’ll need to put some limits on the number of bytes sent and checked in the model,
so it isn’t a full proof, but it’s likely to spot most problems.
In this example, it passes easily.
The general idea for proving things in TLA is to get away from temporal logic as quickly as possible
and do most of the proof with simple actions.
The following pattern should work for proving any invariant:
1 2 3 4 |
|
This says that Spec implies that I is always true because:
Iis true initially.Nextsteps preserveI(and so does leavingvarsunchanged).- Therefore
SpecensuresIwill always be true.
Note: The <1> at the start of each step is the indentation level,
which for some reason TLAPS can’t work out by itself.
The final QED BY PTL (“by Propositional Temporal Logic”) is the only step that uses temporal logic.
Everything else is just regular logic.
Now all we need to do is prove that the Next action preserves I, which is straight-forward:
1 2 3 4 5 6 |
|
So for proving invariants like I we can mostly ignore temporal logic.
By for proving liveness, we’ll need to understand more about it…
Temporal logic
Temporal logic is a type of modal logic.
In a modal logic we imagine there are many worlds, and we are in one of them.
Ordinary mathematical statements refer to our world,
so e.g. Sent = 4 refers to the value of Sent in our world.
However, by using the modal operators, you can refer to other worlds.
For example, [](Sent = 4) means that Sent = 4 is true in all worlds “reachable” from the current one.
In TLA, each world corresponds to a moment in time,
and current and future times are reachable.
So []X means “X will always be true”.
<>X means “X will eventually be true”.
For example:
[](Sent = 4)means thatSent = 4is true now and always will be.<>(Sent = 4)means thatSent = 4is either true now or will be true at some point in the future.
Modal operators nest, so you can say things like <>[](Sent = 4)
(eventually Sent will equal 4 and will remain so from then onwards),
or [](F => <>G) (F always leads to G, which can also be written as F ~> G).
Proving temporal claims with TLAPS
In the proof of Spec => []I above, only one step required temporal logic (BY PTL).
Having few such steps is good because these steps can be tricky.
The main problem is that TLA propositions are written in First Order Modal Logic,
but TLAPS doesn’t have any solvers that understand this!
Instead, it has a number of solvers that understand regular non-modal logic,
plus the PTL decision procedure that handles modal logic but not much else.
For example, consider this apparently simple claim:
1
|
|
This says that if Sent is always equal to 4 then it is currently greater than 0.
TLAPS can’t prove this in a single step:
- The regular solvers don’t understand the
[]bit. - The PTL procedure doesn’t understand numbers.
Coalescing for Reasoning in First-Order Modal Logics explains in detail what’s going on,
but my rough understanding is that TLAPS replaces things a solver won’t understand with
new fresh variables.
For example, [](Sent = 4) => Sent > 0 is received by the solver as something like:
Blob1 => Sent > 0(for a regular solver), or[]Blob2 => Blob3(for PTL)
Instead, you have to break it down into separate steps:
1 2 3 |
|
In the second step, the PTL solver will receive something like this, which it can prove:
1 2 |
|
Annoyingly, TLAPS doesn’t have an easy way to show you what it replaced in this way.
If it did, I think it would be much easier to learn how to use it.
You can use tlapm spec.tla --debug tempfiles and look at the files generated for each
solver backend, but they’re quite hard to read.
Generalising
Things that can be proved without assumptions specific to the current time are always true.
In the above example, PTL converted Blob2 => Blob3 from the regular solver
to [](Blob2 => Blob3).
This was only possible because the proof of Blob2 => Blob3 didn’t depend on the current world/time.
For example:
1 2 3 4 5 |
|
We’re trying to prove that if Sent is always 4 then it’s always greater than 0.
The problem is that we proved Sent = 4 => Sent > 0 in a context that assumed Spec,
so TLAPS won’t let us generalise it (even though it looks identical to what we proved before,
and didn’t actually use anything from Spec).
When PTL fails, the Interesting Obligations view shows e.g.
ASSUME ...
Sent = 4 => Sent > 0 (* non-[] *),
PTL
PROVE [](Sent = 4) => [](Sent > 0)
The (* non-[] *) indicates that this can’t be generalised to all times; we only know it’s true now.
Instead, we can do:
1 2 3 4 5 6 7 |
|
Doing the Sent = 4 => Sent > 0 step before introducing Spec as an assumption allowed it to be generalised.
We could also have proved it as a separate lemma.
Here’s an example where we make use of a non-[] assumption:
1 2 3 4 5 |
|
If Spec is true (we’re at the start of the algorithm) then Sent = 0.
We can’t generalise to saying that Sent is always 0,
but we can still use it to prove that Sent is (trivially) eventually zero.
But having non-[] assumptions usually indicates that something has gone wrong.
Some assumptions are safe to have in the context however.
You can assume constants (e.g NEW n in Nat), and also things that will always be true (e.g. []I).
This is fine:
1 2 3 4 5 |
|
Hiding definitions
Here’s a simplified version of a problem I had:
1 2 3 4 5 6 7 8 9 |
|
As usual, the regular solvers can’t handle the [] and PTL can’t handle the numbers or the for-all.
We could create a function (F) for the temporal formula and then hide its definition:
1 2 3 4 5 6 7 8 |
|
However, due to a bug in TLAPS, this is unreliable.
Often a proof fails, then you rename things a bit and it passes, then you rename them back and it continues to pass!
I reported this at https://github.com/tlaplus/tlapm/issues/247.
Here’s a screenshot of an extreme case, where TLAPS is failing to prove a conclusion that matches an assumption
character for character!
TLAPS failing to prove something very obvious.
Until that’s fixed, I found that the following pattern works well as a work-around:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Even when this bug is fixed, I think it would be really nice if
you could say e.g. BY L1(m+1) to explicitly use L1 with a particular value of n.
Proving liveness (simple case)
OK, let’s use some temporal logic to prove Liveness:
1 2 3 4 |
|
One way to prove A ~> B (A always eventually leads to B)
is from an action A => B' (A leads to B in one step), plus a few side conditions.
We’ll do that here because we conveniently have a Recv action that immediately gets us to our goal.
To prove A ~> B a generally-useful pattern is:
1 2 3 4 5 6 7 |
|
- If we’re at
Aand perform our useful actionRecvthen afterwards we’ll be atB. - When we’re at
Awe can perform the useful action. - If the useful action is continually possible then eventually it will happen.
- If we’re at
Aand perform any actionNext, we’ll either stay atAor get toB. - The system always performs
Nextsteps.
Some examples of why these are necessary to prove Liveness:
- Without 2:
Recvrequires at least 2 bytes in the buffer; sending a single byte fails. - Without 3: someone else has to perform
Recvand we can’t be sure they’ll actually do it. - Without 4: the sender can
Sendsome data and thenRetractit before it is read. - Without 5: 4 wouldn’t be useful.
(3) and (5) come directly from the definition of Spec.
We need to write proofs for the others, but they’re all non-temporal proofs and fairly simple.
The only slightly tricky one is proving ENABLED:
1 2 3 4 5 |
|
The magic ExpandENABLED causes the definition of ENABLED to be expanded in the version given to the solver.
This is only useful if the solver also has access to the definitions of all the actions used;
AutoUSEprovides them automatically so you don’t have to list them manually.
However, the solver has to work quite hard to prove this and I found it times out on more realistic examples.
The following trick is more generally useful:
1 2 3 4 5 6 7 8 9 |
|
Recvalways modifiesvars, so<is the same as>_vars Recv.- Therefore,
ENABLED <is the same as>_vars ENABLED Recv. - Therefore, we just need to prove
ENABLED Recv, which is easier for the solver.
The above uses the magic BY ENABLEDaxioms.
This doesn’t seem to be documented; I just found it in one of the examples.
It’s really fussy about the syntax used.
e.g. it works if you ask to prove it in both directions with <=>,
but if you ask for e.g. just => then it fails!
Also, it requires < rather than <,
though you can prove the first from the second.
The ENABLEDaxioms_test.tla test file is useful for getting the syntax right.
See example.tla for the full proof.
Multi-step liveness
The above proof was easy because the Recv step gets us to our goal immediately.
It’s more complicated if we only say that each Recv step reads some of the buffer:
1 2 3 4 |
|
The trick is to define a distance metric:
1
|
|
Dist(n, i) says we are within i bytes of having received the first n bytes.
Then a simple inductive argument will do:
1 2 3 4 5 6 7 8 9 |
|
Here, R(i) says that the algorithm works if we start at most i bytes away from our goal.
R(0) is obviously true, and we show that R(i) => R(i+1).
By induction, R is true for any distance.
Notice again the use of HIDE to avoid confusing the solvers with temporal logic.
Then we just need to show Progress: we eventually get closer to our goal (Dist(n, i+1) ~> Dist(n, i)).
That can be proved from Recv in a similar way to before. See example.tla for the full proof.
The real protocol
I’ve now updated vchan.tla to prove liveness for the real system.
It was more complicated than the example above (because it also looks at the data being sent,
has many steps, models sending interrupts, etc), but the basic approach is the same.
Note: In the real system, Sent and Got are the actual messages, so the
property talks about their lengths. Sent is what the sending application has
asked the vchan library to transmit (even if it hasn’t been written to the shared
buffer yet), and Got is what the receiving application has received from its
vchan library.
Ideally, we’d like to prove something like this:
1 2 3 |
|
However, that doesn’t work because the sender (and receiver) can decide to shut down the connection at any moment.
If the sender closes the connection before transmitting all the data then obviously it might not be received.
So it’s actually defined like this:
1 2 3 4 5 6 7 8 9 10 11 |
|
Note: SenderLive and ReceiverLive are badly named.
SenderLive = FALSE means that the sender has written all data successfully and there is nothing more to come.
ReceiverLive = FALSE means that the receiver has decided to terminate the connection early,
and probably indicates an error (think EPIPE).
Although TLAPS couldn’t prove liveness properties back in 2018, I had got most of the way there:
I’d defined ReadLimit as:
The number of bytes that the receiver will eventually get without further action
from the sender (assuming the receiver doesn’t decide to close the connection).
and likewise WriteLimit as:
The number of bytes that the sender will eventually send without further
action from the other processes or the client application, assuming the
connection isn’t closed by either end.
I’d proved some useful invariants about them. For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
If ReadLimit and WriteLimit accurately predict what will happen then these properties
are good evidence that the protocol works:
WriteLimitpredicts that the sender will be able to fill the buffer initially.- If the receiver doesn’t read it then the buffer will become full and the sender will block,
at which pointReadAllIfSenderBlockedpredicts all the data will be read. - If the receiver runs out of data then it will then eventually block,
at which point,WriteAllIfReceiverBlockedpredicts more data can be written.
I’d used TLC to check that both predictions were correct (on small models),
but I hadn’t been able to prove them with the old version of TLAPS.
Proving ReadLimit
The actual proof followed the same outline as the example above.
The main extra bit was showing that if we haven’t yet read up to the ReadLimit
then the receiver process would eventually reach the code that did the read,
and with the belief that there was at least one byte available.
Mixing BY PTL and regular solvers was particularly annoying here and required stating
lots of obvious facts like PC in {"recv_got_len"} => PC = "recv_got_len" because PTL
doesn’t understand sets, etc.
It seems a bit strange to have to prove that each line of code leads to the next.
That’s something you can mostly just assume when writing software, but in TLA you need to be explicit.
The only interesting bits were the usual things about showing code terminates:
- For loops, show that they will make progress each iteration.
- Show the code doesn’t crash (e.g. no buffer overflow).
- Show that any explicit
awaitwill eventually resume.
To help with the proofs I defined RSpec as a variant of Spec:
1 2 3 4 5 |
|
Instead of being true only at the start (like Spec, which uses Init),
this only requires our invariant ([]I) to be true.
Things proved using RSpec are therefore true from any point in the protocol,
which lets them be reused in other proofs.
I was able to show that ReadLimit‘s prediction will come true:
1 2 3 4 |
|
and from that, rather magically:
1 2 3 4 5 6 7 8 9 10 11 |
|
That eliminates all mention of ReadLimit, and expresses the property in terms meaningful to the sender.
Proving Availability
I was expecting to have to prove WriteLimit was correct too.
But it turned out I didn’t.
Availability only said that the receiver would receive data that had been transmitted;
it never claimed that the sending process would succeed in doing that!
It’s right there in the comment even:
1 2 3 4 5 6 |
|
Well, this is pretty bad. The point of proofs is that instead of having to check hundreds of lines of C,
or dozens of lines of TLA actions, we only have to check that the definition of Availability is what we want.
My definition of Availability is still true if the sender blocks forever waiting for more space in the buffer.
In fact, if I change the definition of the sender process so that
the application asking for data to be sent is the only possible action,
then TLC reports that Availability still holds (which is true).
The original blog post appeared on Reddit, Hacker News and Lobsters,
and nobody else seems to have spotted this either.
Though in fairness to my past self, I did mention checking an earlier version of Availability
that did require that it works end-to-end (but didn’t make any claims about what happens on shutdown),
and I had also tested WriteLimit, so I’m not really worried that the system doesn’t work.
And so, having learnt how to do liveness proofs, which was my real goal, I decided to stop here.
Proving WriteLimit
But proof assistants are addictive, and I found myself continuing anyway.
The proof of the correctness of WriteLimit was very similar to the one for ReadLimit and I got it done much faster
now I knew more about how to do these proofs.
However, I also wanted to fix Availability to something useful and updated it to this:
1 2 3 4 5 6 7 8 9 10 11 |
|
(msg is the data yet to be transmitted)
That required strengthening some of the existing proofs to talk about clean shutdowns,
not just the connection never being closed, but it was fairly straight-forward.
End-to-end liveness
I used the ReadLimit and WriteLimit lemmas to prove some things about the system as a whole, e.g.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
and was then finally able to prove my improved version of Availability:
1
|
|
See vchan.tla for the final version with the proofs.
Work-arounds for bugs
The biggest problem I had was the bug when mixing forall and temporal formulas,
until I found the work-around above.
Here are some other problems I hit and their solutions:
SUFFICES doesn’t always generalise
This doesn’t work:
1 2 3 4 5 6 |
|
Even though the SUFFICES used BY DEF Foo, Foo still counts as a time-specific assumption.
A work-around is to ASSUME []Foo and then assert Foo in a separate step:
1 2 3 4 5 6 7 |
|
CASE with a temporal goal
This doesn’t work:
1 2 3 4 5 |
|
It fails to parse, with Non-constant CASE for temporal goal.
However, if you replace CASE with the expanded form then it works:
1 2 3 4 5 |
|
PTL and primes
This doesn’t work:
1 2 |
|
A work-around:
1 2 3 |
|
Syntax matters
Things you might expect to be the same after parsing are treated differently.
This fails:
1 2 3 4 5 |
|
But this passes:
1 2 3 4 5 |
|
Other bugs
-
If keyboard short-cuts stop working, move the focus out of the window and back in again.
Sometimes some keys stop working (e.g. Delete) while others continue (e.g. Backspace). -
If the toolbox GUI fails to launch the prover due to
NullPointerException,
try closing the specification and reopening it. -
The solver doesn’t actually tell you if everything passed; you have to look in the scrollbar for little red regions.
If a theorem is folded in the editor, the top-level bit will still show as red if any sub-step fails, but it will
NOT appear in the scrollbar!
Either expand everything and check by eye, or runtlapmfrom the command-line as a final check.
Note that the bright greenSpec Statusindicator in the status bar only means that parsing the spec succeeded,
not that the proof-checking passed.
Conclusions
Checking the specification with TLC was quick and easy and immediately found the shutdown bug.
Actually proving the specification correct using TLAPS was much more difficult,
took way longer, and didn’t uncovered any additional bugs.
It did reveal a mistake in my definition of Availability, but there are surely quicker ways of finding that.
So, in terms of value for time spent, I guess it’s only worth it for very high-value code
(or if, like me, you just like proving stuff for fun).
However, I think it’s good to know how to verify things, even if you don’t usually do it.
It has certainly made me think more carefully about exactly what you need to check to verify liveness.
