Case Study: Regular Expressions
The property Ev provides a simple example to illustrate inductive definitions and the basic techniques for reasoning about them, but it's not very exciting - after all, it's equivalent to the two non-inductive definitions of parity we've already seen, and it doesn't seem to offer any concrete benefits over them. To give a better sense of the power of inductive definitions, let's show how to use them to model a classic concept from computer science: regular expressions.
Regular expressions are a simple language for describing strings, defined as follows:
type Regexp (t: Type) {
emptyset
emptystr
chr (h: t)
app (st1: Regexp t) (st2: Regexp t)
union (st1: Regexp t) (st2: Regexp t)
star (st1: Regexp t)
}
Note that this definition is polymorphic: regular expressions in Reg_exp t
describe strings with characters drawn from t
- that is, lists of elements of t
.
(We deviate slightly from standard practice by not requiring the type t to be finite. This results in a slightly different theory of regular expressions, but the difference is not significant for our purposes.)
We connect regular expressions and strings through the following rules, which define when a regular expression matches a given string:
• The EmptySet expression does not match any string. • The EmptyStr expression matches the empty string []. • The Chr x expression matches the single-character string [x]. • If re1 matches s1 and re2 matches s2, then (App re1 re2) matches (App s1 s2). • If at least one of the expressions re1 and re2 matches s, then Union re1 re2 matches s. • Finally, if we can write a string s as the concatenation of a sequence of strings s = (App s_1 (App... s_k...)), and the expression re matches each of the strings s_i, then Star re matches s.
As a special case, the sequence of strings may be empty, so Star re
always matches the empty string [] regardless of what re
is.
We can easily translate this informal definition into a data definition as follows:
type Expmatch <t> ~(xs: Data.List t) (r: Regexp t) {
mempty : Expmatch t [] Regexp.emptystr
mchar (x: t) : Expmatch t [x] (Regexp.chr x)
mapp
<s1: Data.List t>
<s2: Data.List t>
<re1: Regexp t>
<re2: Regexp t>
(e1: Expmatch t s1 re1)
(e2: Expmatch t s2 re2)
: Expmatch t (Data.List.concat t s1 s2) (Regexp.app re1 re2)
munionl <s1: Data.List t> (re1: Regexp t) (re2: Regexp t)
(e1: Expmatch t s1 re1)
: Expmatch t s1 (Regexp.union re1 re2)
munionr (re1: Regexp t) <s2: Data.List t> (re2: Regexp t)
(e1: Expmatch t s2 re2)
: Expmatch t s2 (Regexp.union re1 re2)
mstarz (r: Regexp t) : Expmatch t [] (Regexp.star r)
mstarapp
<s1: Data.List t>
<s2: Data.List t>
<r1: Regexp t>
(e1: Expmatch s1 r1)
(e2: Expmatch s2 (Regexp.star r1))
: Expmatch t (Data.List.concat t s1 s2) (Regexp.star r1)
}
Let's illustrate these rules with some examples.
Regexp_ex1 : Expmatch [1] (Regexp.chr 1)
Regexp_ex1 = Expmatch.mchar 1
Regexp_ex2 : Expmatch [1, 2] (Regexp.app (Regexp.chr 1) (Regexp.chr 2))
Regexp_ex2 = Expmatch.mapp [1] (Regexp.chr 1) (Expmatch.mchar 1) [2] (Regexp.chr 2) (Expmatch.mchar 2)
Using pattern matching, we can also show that certain strings do not match a regular expression:
Regexp_ex3 : Not (Expmatch [1, 2] (Regexp.chr 1))
Regexp_ex3 = Empty.absurd _
We can define auxiliary functions to help write regular expressions. The reg_exp_of_list function constructs a regular expression that matches exactly the list it receives as an argument:
Regexp_of_list <t> (xs: List t) : Regexp t
Regexp_of_list t List.nil = Regexp.emptystr
Regexp_of_list t (List.cons xs.h xs.t) = Regexp.app (Regexp.chr xs.h) (Regexp_of_list t xs.t)
Regexp_ex4 : Expmatch [1, 2, 3] (Regexp_of_list [1, 2, 3])
Regexp_ex4 =
(Expmatch.mapp
[1]
(Regexp.chr 1)
(Expmatch.mchar 1)
[2, 3]
(Regexp_of_list [2, 3])
(Expmatch.mapp
[2]
(Regexp.chr 2)
(Expmatch.mchar 2)
[3]
(Regexp_of_list [3])
(Expmatch.mapp
[3]
(Regexp.chr 3)
(Expmatch.mchar 3)
[]
(Regexp_of_list [])
(Expmatch.mempty)
)
)
)
We can also prove general facts about Exp_match. For example, the following lemma shows that every string s
that matches re
also matches Star re
.
Mstar1 <t> <s: List t> <re: Regexp t> (e: Expmatch s re) : Expmatch s (Regexp.star re)
Mstar1 t s re e =
let msz = Expmatch.mstarz re
let mss = Expmatch.mstarapp s [] re e msz
let lst = List_concat s
let rwt = Equal.rewrite lst (x => (Expmatch t x (Regexp.star re))) mss
rwt
(Note the use of appendNilRightNeutral*** to change the goal of the theorem to exactly the same form expected by MStarApp.)
Exp_match_ex1
The following lemmas show that the informal matching rules provided at the beginning of the chapter can be derived from the formal inductive definition.
Munion
<t>
<s: List t>
<re1: Regexp t>
<re2: Regexp t>
(p: Pair (Expmatch s re1) (Expmatch s re2))
: Expmatch s (Regexp.union re1 re2)
Munion t s re1 re2 (Pair.new fst snd) =
Expmatch.munionl t s [] re1 re2
The next lemma is stated in terms of the fold function from the Poly chapter: If ss: List (List t)
represents a sequence of strings s1, ..., sn, then fold (++) ss []
is the result of concatenating all of them.
Fold <x> <y> (f: x -> y -> y) (l: Data.List x) (b: y) : y
Fold x y f Data.List.nil b = b
Fold x y f (Data.List.cons xs.h xs.t) b = f xs.h (Fold f xs.t b)
MStar_ <t>
(ss : Data.List (Data.List t))
(re : Regexp t)
(construct_match : (s : Data.List t) -> (i : In s ss) -> Expmatch s re)
: Expmatch (Concats ss) (Regexp.star re)
MStar_ ss re construct_match = ?
Reg_exp_of_list
Prove that reg_exp_of_list satisfies the following specification:
Reg_exp_of_list_spec <t> (s1: Data.List t) (s2: Data.List t) : Iff (Expmatch s1 (Regexp_of_list s2)) (Prop.Equal s1 s2)
Reg_exp_of_list_spec s1 s2 = ?
Since the definition of Exp_match has a recursive structure, it is expected that proofs involving regular expressions often require induction based on evidence. For example, suppose we wanted to prove the following intuitive result: if a regular expression re matches some string s, then every element of s must occur somewhere in re. To state this theorem, we first define a function re_chars that lists all characters occurring in a regular expression:
Re_chars <t> (re: Regexp t) : Data.List t
Re_chars t Regexp.emptyset = []
Re_chars t Regexp.emptystr = []
Re_chars t (Regexp.chr x) = [x]
Re_chars t (Regexp.app r0 r1) = Data.List.concat (Re_chars r0) (Re_chars r1)
Re_chars t (Regexp.union r0 r1) = Data.List.concat (Re_chars r0) (Re_chars r1)
Re_chars t (Regexp.star r0) = Re_chars r0
Re_star <t> (re: Regexp t) : Prop.Equal (Re_chars (Regexp.star re)) (Re_chars re)
Re_star e re = Prop.Equal.refl
We can then formulate our theorem as follows:
Destruct <a> (x: a) (lx: Data.List a) (ly: Data.List a) (i: In x (Data.List.concat a lx ly)) : (Data.Either (In x lx) (In x ly))
Destruct a x lx ly i =
let Data.Pair.new fst snd = (In_app_iff x lx ly)
let f = (fst :: (_) -> (_)) i
f
Construct <a> (x: a) (lx: Regexp a) (ly: Regexp a) (e: Data.Either (In x (Re_chars lx)) (In x (Re_chars ly))) : In x (Data.List.concat (Re_chars lx) (Re_chars ly))
Construct a x lx ly e = (Data.Pair.snd ( In_app_iff x (Re_chars lx) (Re_chars ly)) :: (_) -> (_)) e
In_re_match <a> (x: a) (re: Regexp a) (s: Data.List a) (e: Expmatch s re) (i: In x s) : In x (Re_chars re)
In_re_match a x (Regexp.emptyset) Data.List.nil e i = i
In_re_match a x (Regexp.emptyset) (Data.List.cons head tail) e i = Data.Empty.absurd _ //todo
In_re_match a x (Regexp.emptystr) (Data.List.nil) Expmatch.mempty i = i
In_re_match a x (Regexp.chr u s) (Data.List.cons t re (Data.List.nil _)) (Expmatch.mchar c) i =
let e0 = Prop.Equal.refl :: Prop.Equal c re
let e1 = Prop.Equal.refl :: Prop.Equal s c
let chn = Prop.Equal.chain e1 e0
let rrt = Prop.Equal.rewrite (Prop.Equal.mirror chn) (y => (Data.Either (Prop.Equal _ x y) _)) i
rrt
In_re_match a x (Regexp.app t z y) (Data.List.concat u l1 l2) e i =
let e0 = Prop.Equal.refl :: Prop.Equal u a
let l3 = l1 :: Data.List a
let l4 = l2 :: Data.List a
let e1 = Prop.Equal.refl :: Prop.Equal l1 l3
let e2 = Prop.Equal.refl :: Prop.Equal l2 l4
let e3 = Prop.Equal.refl :: Prop.Equal (Data.List.concat u l1 l2) (Data.List.concat a l3 l4)
let i = i :: (In a x (Data.List.concat u l1 l2))
let rrt = Prop.Equal.rewrite e3 ((k: (Data.List a))=> (In a x k)) i
let des = Destruct x l3 l4 rrt
des
In_re_match a x re Data.List.nil (Expmatch.mstarz z) i = Data.Empty.absurd _ //todo
Re_not_empty
Write a recursive function re_not_empty that tests whether a regular expression matches any string. Prove that your function is correct.
Re_not_empty <t> (re: Regexp t) : Data.Bool
Re_not_empty t re = ?
Re_not_empty_correct <t> <re: Regexp t>
: Equivalence (Data.Sigma (Data.List t) (s => Expmatch s re)) (Prop.Equal (Re_not_empty re) Data.Bool.true)
Re_not_empty_correct t re = ?
The remember Tactic
Rewriting the section, dependent pattern matching solves all of this.
A potentially confusing feature of the induction tactic is that it easily allows you to attempt setting up an induction on a term that is not sufficiently general. The effect of this is to lose information (just as destruct can do) and leave you unable to conclude the proof. Here's an example:
Star_app <a> (re: Regexp a) (s1: Data.List a) (s2: Data.List a) (exp0: Expmatch a s1 (Regexp.star re)) (exp1: Expmatch a s2 (Regexp.star re))
: (Expmatch (Data.List.concat s1 s2) (Regexp.star re))
Star_app a re (Data.List.nil _) s2 (Expmatch.mstarz e) exp1 = exp1
Star_app a re s1 s2 (Expmatch.mstarapp t r s r1 m ms) exp1 =
let e0 = Prop.Equal.refl :: Prop.Equal r1 re
let e1 = Prop.Equal.refl :: Prop.Equal (Data.List.concat r s) s1
let ind = Expmatch.mstarapp m (Star_app r1 s s2 ms exp1)
let rrt = Prop.Equal.rewrite e0 (x => (Expmatch _ _ (Regexp.star _ x))) ind
let aux = App_assoc r s s2
let rrt = Prop.Equal.rewrite aux (x => (Expmatch _ (x) (_))) rrt
let rrt = Prop.Equal.rewrite e1 (x => (Expmatch t (Data.List.concat _ x s2) (Regexp.star t re))) rrt
rrt
Exp_match_ex2
MStar2 <t> (re: Regexp t) (s: Data.List t) (ss: Data.List (Data.List t)) (exp: Expmatch s (Regexp.star re))
: (Data.Pair (Prop.Equal s (Fold (x => y => Data.List.concat x y ) ss [])) ((s2: Data.List t) -> (In s2 ss) -> (Expmatch s2 re)))
The lemma MStar'' below (along with its inverse, the exercise MStar' above), shows that our definition of Exp_match for Star is equivalent to the one informally given earlier.
pumping
One of the earliest truly interesting theorems in the theory of regular expressions is the so-called pumping lemma, which states, informally, that any sufficiently long string s matching a regular expression re can be "pumped" by repeating some middle section of s an arbitrary number of times to produce a new string also matching re.
To start, we need to define what it means to be "sufficiently long". Since we are working in a constructive logic, we actually need to be able to compute, for each regular expression re, the minimum length for strings s to guarantee "pumpability".
pumping_constant :
Next, it is useful to define an auxiliary function that repeats a string (appends itself) a certain number of times.
napp :
Now, the pumping lemma itself states that if s => re
and if the length of s
is at leas't the pumping constant of re
, then s
can be divided into three substrings s1 ++ s2 ++ s3
such that s2
can be repeated any' number of times, and the result, when combined with s1
and s3
, will still match re
. Since D is also guaranteed to be non-empty, this gives us a (constructive!) way of generating strings matching re
that are as long as we desire.
pumping :
To speed up the proof (which you should fill in), the omega tactic, which is enabled by the Require below, is helpful in several places to automatically complete tedious low-level arguments involving equalities or inequalities over natural numbers. We will revisit the omega tactic in a later chapter, but feel free to try it now if you'd like. The first case of the induction gives an example of how it is used.
pumping m le = ?pumping_rhs