12 KiB

Introduction

This repo contains my attempts at learning Coq. I will be following coq in a hurry by Yves Bertot and Pierre Castéran.

When I feel confident enough, I will film a 1-2h tutorial on Coq and attach it to this repo.

ToC

Motivation

As a pretext to this document, I am currently "working" for a company that is developing a programming language. They describe it as a DSL on top of Solidity (i.e. a DSL for smart contracts).

I am not a fan of Solidity, it's too javascripty for my taste.

The company deals with formally verifying the smart contracts written in this DSL. The point is to make sure that the contracts are correct and that they do not have any bugs.

We are to:

  1. Develop a Coq framework to express the specification of smart contracts using high-level concepts such as identity, item, ownership, ...
  2. Apply it to formally verify a list of Ethereum smart contracts

This is why, from this point onwards, most examples will be related to smart contracts. Sorry ;C.

Installation

This one took me a while. Compiled on Arch Linux 6.13.2-arch1-1.

$ sudo pacman -S ocaml opam
$ opam init
$ opam pin add coq 8.14.0

Important

Note how I pinned the version of Coq to 8.14.0, which is NOT the latest version. This is because:

Install the VSCode extension VSCoq legacy.

Tweak around until you get it right. Only took me 10 minutes to set the path correctly, since coqtop was not in my path for whatever reason.

When it comes to usage, this guy's video is pretty good:

CLICK FOR VIDEO

Basic Coq

Introduction

Coq is a proof assistant. It is a formal proof management system. It provides a formal language to write mathematical definitions, executable algorithms and theorems together with an environment for semi-interactive development of machine-checked proofs.

Coq is not fully automated. It is based on a logical framework called the calculus of inductive constructions. This framework is a formal language that is expressive enough to define mathematical theories and to express and prove mathematical theorems in those theories.

Comment on semantics

Coq is a bit weird in the sense that it has different semantics than most programming languages.

It is similar to Haskell in the sense that it is based on the lambda calculus.

It is also similar to Dafny in the sense that it is based on the calculus of inductive constructions.

Type theory

Let's take a step back and talk about type theory.

Take this implementation of a stack in C:

typedef struct {
    int data[100];
    int top;
} stack;

void push(stack *s, int x) {
    s->data[s->top] = x;
    s->top++;
}
// etc.

You'd say that stack is a type, and push is a function that takes a stack and an int and returns nothing.

In Coq, we would define the stack as an inductive type:

Inductive stack : Type :=
    | empty : stack
    | push : stack -> nat -> stack.

Notice how stack is a Type! This is because in Coq, everything is a type. Even functions are types! This is a part of the Curry-Howard correspondence.

However, this isn't a type in the sense of a data type. It is a category of types.

Category theory states that a category is a collection of objects and morphisms. In Coq, the objects are types and the morphisms are functions.

Important

tl;dr, uppercase type is a category of types, lowercase type is a specific data type.

Terms are the inhabitants of types, a.k.a. the actual values, i.e. 0 is a nat, true is a bool, etc.

Function applications are also terms. For example, f x is a term, where f is a function and x is a term.

It makes sense mathematically, since


f(x) = \boxed{y} \leftarrow \text{This is clearly a term}

while f is a function and x is a term.

Basic Concepts + Syntax

As a preface, the . at the end of each command is a delimiter. Similar to writing a ; at the end of a line in C.

Operators

  • + : addition
  • - : subtraction
  • * : multiplication
  • <= : less than or equal to
  • >= : greater than or equal to
  • -> : implication
  • /\ : conjunction (logical AND)
  • \/ : disjunction (logical OR)
  • ~ : negation (logical NOT)
  • <> : inequality
  • = : equality

Other stuff

  • [ ] : list
  • [| |] : tuple
  • [| ] : array
  • [ x : A | P ] : set comprehension

Common Statements

  • forall x : A, P : for all x of type A, P is true
  • exists x : A, P : there exists an x of type A such that P is true
  • fun x : A => B : a function that maps x of type A to B
  • let x := ___ in ___ : local definition, equivalent to Haskell
  • match ___ with ___ end : pattern matching, equivalent to Haskell
  • if ___ then ___ else ___ : if-then-else statement

Expressions and formulas

To check if a formula is well-formed, we can use the Check command.

Check 3. (* This will return nat *)
Check true. (* This will return bool *)
Check (3 + 4). (* This will return nat *)

Expanding on this, we can check lkogical propositions as well.

(* Augmented example from the book *)
(* Reads as: *)
Check (fun x:nat => x = 3). (* x is a natural number and x = 3 *)
Check (forall x : nat, x < 3 \/ (exists y, x = 2 + y)). 
(* x is a natural number and x < 3 or there exists a y such that x = 2 + y *)

Syntax

(* This is a comment *)
(* This is a definition *)
Definition my_first_function (x : nat) : nat := x + 1.

The difference between this and fun is that Definition is a global declaration, while fun is a local declaration, almost like a lambda in Python or \x -> <expr> in Haskell.

(* This is a theorem + proof *)
Theorem my_first_theorem : forall x : nat, my_first_function x = x + 1.
Proof.
    intros x.
    unfold my_first_function.
    reflexivity.
Qed.

Primitives and Inductive Types

Primitives are the basic types in Coq. These include nat, bool, unit, list, etc.

Constructors are used to build new types from primitives. These are functions that create values of a type.

For example

  • O is the constructor for nat
  • true and false are the constructors for bool
  • S is the constructor for nat
  • nil and cons are the constructors for list

Inductive types allow us to define new types in a way that can be recursively constructed. This is exactly the same as algebraic data types in Haskell.

Inductive nat : Type :=
  | O : nat               (* Zero *)
  | S : nat -> nat.       (* Successor function *)

Base case is O (i.e. zero), and the recursive case is S (i.e. the successor function - S O is 1, S (S O) is 2, etc.)

data Nat = Zero | Succ Nat

Another example, the list type:

Inductive list (A : Type) : Type :=
  | nil : list A               (* Empty list *)
  | cons : A -> list A -> list A. (* Constructor for adding an element to the front of a list *)

Where A is a type parameter. This is similar to generics in Java or abstract types in Haskell.

  • nil is the empty list
  • cons is the constructor for adding an element to the front of a list (i.e. : in Haskell)

To use an inductive type, we can pattern match on it, similar to Haskell.

Fixpoint length {A : Type} (l : list A) : nat :=
  match l with
  | nil => O
  | cons _ l' => S (length l')
  end.
  • Fixpoint is used to define recursive functions
  • length takes a list l of type A and returns its length (nat)
  • pattern matching handles the two cases of the list: nil and cons
  • _ is a wildcard, used when we don't care about the value

This is the same as the following Haskell code:

length :: [a] -> Int
length [] = 0
length (_:xs) = 1 + length xs

Reasoning about Inductive Types

Inductive types are defined by a set of constructors. These constructors are the only way to build values of the type.

They are called inductive types because they are defined inductively, hence they can be reasoned about inductively.

For example, we can prove that the length of a list is always non-negative, since the length of an empty list is 0 and the length of a non-empty list is 1 + the length of the tail.

Theorem length_cons : forall {A : Type} (x : A) (l : list A),
  length (cons x l) = S (length l).
Proof.
  intros A x l.
  simpl. (* Simplifies the goal *)
  reflexivity. (* Proves that both sides are equal *)
Qed.

Reasoning this way is called structural induction. It is similar to mathematical induction, but instead of proving a property for all natural numbers, we prove it for all values of an inductive type.

Let's try to express the above proof mathematically:


\text{Let } P(n) = \text{length } (x : A) (l : \text{list } A) = S(\text{length } l)

Base case: P(\text{nil}) = \text{length } (\text{nil}) = 0 = S(0)

Inductive step: P(\text{cons } x \text{ l}) = \text{length } (\text{cons } x \text{ l}) = 1 + \text{length } l = S(\text{length } l)

\therefore P(n) \text{ is true } \forall n \in \text{list } A \blacksquare

To further show this, let's take a list of the form {1, 2, 3}:


\text{length } (\text{cons } 1 (\text{cons } 2 (\text{cons } 3 \text{ nil}))) \\
= 1 + \text{length } (\text{cons } 2 (\text{cons } 3 \text{ nil})) \\
= 1 + 1 + \text{length } (\text{cons } 3 \text{ nil}) \\
= 1 + 1 + 1 + \text{length } \text{nil} \\
= 1 + 1 + 1 + 0 \\
= 3

Pretty cool, right?

I think we're ready to tackle some Ethereum-related problems, so-

Primitives in Ethereum (ERC)

We're going to be looking at the ERC-20 token standard. This is a standard for fungible tokens on the Ethereum blockchain.

The standard defines a set of functions that a token contract must implement in order to be considered ERC-20 compliant.

The standard defines the following functions:

Function Description
name Returns the name of the token
symbol Returns the symbol of the token
decimals Returns the number of decimals the token uses
totalSupply Returns the total supply of the token
balanceOf Returns the balance of a given address
transfer Transfers tokens from the sender to another address
transferFrom Transfers tokens from one address to another
approve Approves another address to spend tokens on your behalf
allowance Returns the amount of tokens that an owner has approved for a spender