Case Study: Regular Expressions

The property Ev provides a simple example to illustrate inductive definitions and the basic techniques for reasoning about them, but it's not very exciting - after all, it's equivalent to the two non-inductive definitions of parity we've already seen, and it doesn't seem to offer any concrete benefits over them. To give a better sense of the power of inductive definitions, let's show how to use them to model a classic concept from computer science: regular expressions.

Regular expressions are a simple language for describing strings, defined as follows:

type Regexp (t: Type) { emptyset emptystr chr (h: t) app (st1: Regexp t) (st2: Regexp t) union (st1: Regexp t) (st2: Regexp t) star (st1: Regexp t) }

Note that this definition is polymorphic: regular expressions in Reg_exp t describe strings with characters drawn from t - that is, lists of elements of t.

(We deviate slightly from standard practice by not requiring the type t to be finite. This results in a slightly different theory of regular expressions, but the difference is not significant for our purposes.)

We connect regular expressions and strings through the following rules, which define when a regular expression matches a given string:

• The EmptySet expression does not match any string. • The EmptyStr expression matches the empty string []. • The Chr x expression matches the single-character string [x]. • If re1 matches s1 and re2 matches s2, then (App re1 re2) matches (App s1 s2). • If at least one of the expressions re1 and re2 matches s, then Union re1 re2 matches s. • Finally, if we can write a string s as the concatenation of a sequence of strings s = (App s_1 (App... s_k...)), and the expression re matches each of the strings s_i, then Star re matches s.

As a special case, the sequence of strings may be empty, so Star re always matches the empty string [] regardless of what re is.

We can easily translate this informal definition into a data definition as follows:

type Expmatch <t> ~(xs: Data.List t) (r: Regexp t) { mempty : Expmatch t [] Regexp.emptystr mchar (x: t) : Expmatch t [x] (Regexp.chr x) mapp <s1: Data.List t> <s2: Data.List t> <re1: Regexp t> <re2: Regexp t> (e1: Expmatch t s1 re1) (e2: Expmatch t s2 re2) : Expmatch t (Data.List.concat t s1 s2) (Regexp.app re1 re2) munionl <s1: Data.List t> (re1: Regexp t) (re2: Regexp t) (e1: Expmatch t s1 re1) : Expmatch t s1 (Regexp.union re1 re2) munionr (re1: Regexp t) <s2: Data.List t> (re2: Regexp t) (e1: Expmatch t s2 re2) : Expmatch t s2 (Regexp.union re1 re2) mstarz (r: Regexp t) : Expmatch t [] (Regexp.star r) mstarapp <s1: Data.List t> <s2: Data.List t> <r1: Regexp t> (e1: Expmatch s1 r1) (e2: Expmatch s2 (Regexp.star r1)) : Expmatch t (Data.List.concat t s1 s2) (Regexp.star r1) }

Let's illustrate these rules with some examples.

Regexp_ex1 : Expmatch [1] (Regexp.chr 1) Regexp_ex1 = Expmatch.mchar 1 Regexp_ex2 : Expmatch [1, 2] (Regexp.app (Regexp.chr 1) (Regexp.chr 2)) Regexp_ex2 = Expmatch.mapp [1] (Regexp.chr 1) (Expmatch.mchar 1) [2] (Regexp.chr 2) (Expmatch.mchar 2)

Using pattern matching, we can also show that certain strings do not match a regular expression:

Regexp_ex3 : Not (Expmatch [1, 2] (Regexp.chr 1)) Regexp_ex3 = Empty.absurd _

We can define auxiliary functions to help write regular expressions. The reg_exp_of_list function constructs a regular expression that matches exactly the list it receives as an argument:

Regexp_of_list <t> (xs: List t) : Regexp t Regexp_of_list t List.nil = Regexp.emptystr Regexp_of_list t (List.cons xs.h xs.t) = Regexp.app (Regexp.chr xs.h) (Regexp_of_list t xs.t) Regexp_ex4 : Expmatch [1, 2, 3] (Regexp_of_list [1, 2, 3]) Regexp_ex4 = (Expmatch.mapp [1] (Regexp.chr 1) (Expmatch.mchar 1) [2, 3] (Regexp_of_list [2, 3]) (Expmatch.mapp [2] (Regexp.chr 2) (Expmatch.mchar 2) [3] (Regexp_of_list [3]) (Expmatch.mapp [3] (Regexp.chr 3) (Expmatch.mchar 3) [] (Regexp_of_list []) (Expmatch.mempty) ) ) )

We can also prove general facts about Exp_match. For example, the following lemma shows that every string s that matches re also matches Star re.

Mstar1 <t> <s: List t> <re: Regexp t> (e: Expmatch s re) : Expmatch s (Regexp.star re) Mstar1 t s re e = let msz = Expmatch.mstarz re let mss = Expmatch.mstarapp s [] re e msz let lst = List_concat s let rwt = Equal.rewrite lst (x => (Expmatch t x (Regexp.star re))) mss rwt

(Note the use of appendNilRightNeutral*** to change the goal of the theorem to exactly the same form expected by MStarApp.)

Exp_match_ex1

The following lemmas show that the informal matching rules provided at the beginning of the chapter can be derived from the formal inductive definition.

Munion <t> <s: List t> <re1: Regexp t> <re2: Regexp t> (p: Pair (Expmatch s re1) (Expmatch s re2)) : Expmatch s (Regexp.union re1 re2) Munion t s re1 re2 (Pair.new fst snd) = Expmatch.munionl t s [] re1 re2

The next lemma is stated in terms of the fold function from the Poly chapter: If ss: List (List t) represents a sequence of strings s1, ..., sn, then fold (++) ss [] is the result of concatenating all of them.

Fold <x> <y> (f: x -> y -> y) (l: Data.List x) (b: y) : y Fold x y f Data.List.nil b = b Fold x y f (Data.List.cons xs.h xs.t) b = f xs.h (Fold f xs.t b) MStar_ <t> (ss : Data.List (Data.List t)) (re : Regexp t) (construct_match : (s : Data.List t) -> (i : In s ss) -> Expmatch s re) : Expmatch (Concats ss) (Regexp.star re) MStar_ ss re construct_match = ?

Reg_exp_of_list

Prove that reg_exp_of_list satisfies the following specification:

Reg_exp_of_list_spec <t> (s1: Data.List t) (s2: Data.List t) : Iff (Expmatch s1 (Regexp_of_list s2)) (Prop.Equal s1 s2) Reg_exp_of_list_spec s1 s2 = ?

Since the definition of Exp_match has a recursive structure, it is expected that proofs involving regular expressions often require induction based on evidence. For example, suppose we wanted to prove the following intuitive result: if a regular expression re matches some string s, then every element of s must occur somewhere in re. To state this theorem, we first define a function re_chars that lists all characters occurring in a regular expression:

Re_chars <t> (re: Regexp t) : Data.List t Re_chars t Regexp.emptyset = [] Re_chars t Regexp.emptystr = [] Re_chars t (Regexp.chr x) = [x] Re_chars t (Regexp.app r0 r1) = Data.List.concat (Re_chars r0) (Re_chars r1) Re_chars t (Regexp.union r0 r1) = Data.List.concat (Re_chars r0) (Re_chars r1) Re_chars t (Regexp.star r0) = Re_chars r0 Re_star <t> (re: Regexp t) : Prop.Equal (Re_chars (Regexp.star re)) (Re_chars re) Re_star e re = Prop.Equal.refl

We can then formulate our theorem as follows:

Destruct <a> (x: a) (lx: Data.List a) (ly: Data.List a) (i: In x (Data.List.concat a lx ly)) : (Data.Either (In x lx) (In x ly)) Destruct a x lx ly i = let Data.Pair.new fst snd = (In_app_iff x lx ly) let f = (fst :: (_) -> (_)) i f Construct <a> (x: a) (lx: Regexp a) (ly: Regexp a) (e: Data.Either (In x (Re_chars lx)) (In x (Re_chars ly))) : In x (Data.List.concat (Re_chars lx) (Re_chars ly)) Construct a x lx ly e = (Data.Pair.snd ( In_app_iff x (Re_chars lx) (Re_chars ly)) :: (_) -> (_)) e In_re_match <a> (x: a) (re: Regexp a) (s: Data.List a) (e: Expmatch s re) (i: In x s) : In x (Re_chars re) In_re_match a x (Regexp.emptyset) Data.List.nil e i = i In_re_match a x (Regexp.emptyset) (Data.List.cons head tail) e i = Data.Empty.absurd _ //todo In_re_match a x (Regexp.emptystr) (Data.List.nil) Expmatch.mempty i = i In_re_match a x (Regexp.chr u s) (Data.List.cons t re (Data.List.nil _)) (Expmatch.mchar c) i = let e0 = Prop.Equal.refl :: Prop.Equal c re let e1 = Prop.Equal.refl :: Prop.Equal s c let chn = Prop.Equal.chain e1 e0 let rrt = Prop.Equal.rewrite (Prop.Equal.mirror chn) (y => (Data.Either (Prop.Equal _ x y) _)) i rrt In_re_match a x (Regexp.app t z y) (Data.List.concat u l1 l2) e i = let e0 = Prop.Equal.refl :: Prop.Equal u a let l3 = l1 :: Data.List a let l4 = l2 :: Data.List a let e1 = Prop.Equal.refl :: Prop.Equal l1 l3 let e2 = Prop.Equal.refl :: Prop.Equal l2 l4 let e3 = Prop.Equal.refl :: Prop.Equal (Data.List.concat u l1 l2) (Data.List.concat a l3 l4) let i = i :: (In a x (Data.List.concat u l1 l2)) let rrt = Prop.Equal.rewrite e3 ((k: (Data.List a))=> (In a x k)) i let des = Destruct x l3 l4 rrt des In_re_match a x re Data.List.nil (Expmatch.mstarz z) i = Data.Empty.absurd _ //todo

Re_not_empty

Write a recursive function re_not_empty that tests whether a regular expression matches any string. Prove that your function is correct.

Re_not_empty <t> (re: Regexp t) : Data.Bool Re_not_empty t re = ? Re_not_empty_correct <t> <re: Regexp t> : Equivalence (Data.Sigma (Data.List t) (s => Expmatch s re)) (Prop.Equal (Re_not_empty re) Data.Bool.true) Re_not_empty_correct t re = ?

The remember Tactic

Rewriting the section, dependent pattern matching solves all of this.

A potentially confusing feature of the induction tactic is that it easily allows you to attempt setting up an induction on a term that is not sufficiently general. The effect of this is to lose information (just as destruct can do) and leave you unable to conclude the proof. Here's an example:

Star_app <a> (re: Regexp a) (s1: Data.List a) (s2: Data.List a) (exp0: Expmatch a s1 (Regexp.star re)) (exp1: Expmatch a s2 (Regexp.star re)) : (Expmatch (Data.List.concat s1 s2) (Regexp.star re)) Star_app a re (Data.List.nil _) s2 (Expmatch.mstarz e) exp1 = exp1 Star_app a re s1 s2 (Expmatch.mstarapp t r s r1 m ms) exp1 = let e0 = Prop.Equal.refl :: Prop.Equal r1 re let e1 = Prop.Equal.refl :: Prop.Equal (Data.List.concat r s) s1 let ind = Expmatch.mstarapp m (Star_app r1 s s2 ms exp1) let rrt = Prop.Equal.rewrite e0 (x => (Expmatch _ _ (Regexp.star _ x))) ind let aux = App_assoc r s s2 let rrt = Prop.Equal.rewrite aux (x => (Expmatch _ (x) (_))) rrt let rrt = Prop.Equal.rewrite e1 (x => (Expmatch t (Data.List.concat _ x s2) (Regexp.star t re))) rrt rrt

Exp_match_ex2

MStar2 <t> (re: Regexp t) (s: Data.List t) (ss: Data.List (Data.List t)) (exp: Expmatch s (Regexp.star re)) : (Data.Pair (Prop.Equal s (Fold (x => y => Data.List.concat x y ) ss [])) ((s2: Data.List t) -> (In s2 ss) -> (Expmatch s2 re)))

The lemma MStar'' below (along with its inverse, the exercise MStar' above), shows that our definition of Exp_match for Star is equivalent to the one informally given earlier.

pumping

One of the earliest truly interesting theorems in the theory of regular expressions is the so-called pumping lemma, which states, informally, that any sufficiently long string s matching a regular expression re can be "pumped" by repeating some middle section of s an arbitrary number of times to produce a new string also matching re.

To start, we need to define what it means to be "sufficiently long". Since we are working in a constructive logic, we actually need to be able to compute, for each regular expression re, the minimum length for strings s to guarantee "pumpability".

pumping_constant :

Next, it is useful to define an auxiliary function that repeats a string (appends itself) a certain number of times.

napp :

Now, the pumping lemma itself states that if s => re and if the length of s is at leas't the pumping constant of re, then s can be divided into three substrings s1 ++ s2 ++ s3 such that s2 can be repeated any' number of times, and the result, when combined with s1 and s3, will still match re. Since D is also guaranteed to be non-empty, this gives us a (constructive!) way of generating strings matching re that are as long as we desire.

pumping :

To speed up the proof (which you should fill in), the omega tactic, which is enabled by the Require below, is helpful in several places to automatically complete tedious low-level arguments involving equalities or inequalities over natural numbers. We will revisit the omega tactic in a later chapter, but feel free to try it now if you'd like. The first case of the induction gives an example of how it is used.

pumping m le = ?pumping_rhs