module StringLab.Data.List.Lyndon {o } (A : Trichotomous o ) where

Lyndon Words🔗

Let be a trichotomous order. A word is a Lyndon word if it is lexicographically smaller than all of its non-empty proper right factors.

Note

Note that we deviate slightly from other resources, and allow Lyndon words to be empty.

is-lyndon : (xs : List  A )  Type (o  )
is-lyndon xs =  vs   Nonempty vs   vs ⊏ʳ xs  xs  vs

Properties of Lyndon words🔗

We can immediately observe that Lyndon words are border-free.

lyndon-no-border
  :  (us xs : List  A )
    Nonempty us 
   is-lyndon xs
   us ⊏ˡ xs
   us ⊏ʳ xs
   

To see why, suppose that is a border of a Lyndon word By definition, must be a right factor of so However, is also a proper left factor fo so a contradiction!

lyndon-no-border us xs xs-lyn us⊏ˡxs us⊏ʳxs =
  ≺-asym (⊏ˡ-≺ us⊏ˡxs) (xs-lyn us us⊏ʳxs)

Lyndon words are minimal elements of their conjugacy classes.

lyndon-minimal-conj
  :  {xs : List  A }
   is-lyndon xs
   (us vs : List  A )
    Nonempty us    Nonempty vs 
   us ++ vs  xs  xs  vs ++ us
lyndon-minimal-conj xs-lyn (u  us) vs us+vs=xs =

The proof is quite elementary. If is a proper factorization of then is a proper right factor, and

  ≺-++-injl (xs-lyn vs (u , us , us+vs=xs))

It turns out that our previous result is both a neccesary and sufficient condition.

minimal-conj→is-lyndon
  : (xs : List  A )
   (∀ (us vs : List  A )   Nonempty us    Nonempty vs   us ++ vs  xs  xs  vs ++ us)
   is-lyndon xs

Unfortunately, this proof is nowhere near as elementary as the previous one. Suppose that is minimal amonst its conjugacy class, and that is a proper right factor of we need to show that

Let denote the left portion of the factorisation of is the minimal element of its conjugacy class, so If is not a border of then we immediately have Therefore, it suffices to show that is not a left factor of

minimal-conj→is-lyndon xs minimal-conj (v  vs) vs⊏xs@(u , us , us+vs=xs) =
  ≺-++-restrictr [] (u  us) ([ ¬vs=xs , ¬vs⊏xs ]  ⊑ˡ-strengthen) $
  ≼-≺-trans (inr (++-idr xs)) $
  xs≺vs+us
  where

    xs≺vs+us : xs  v  vs ++ u  us
    xs≺vs+us = minimal-conj (u  us) (v  vs) us+vs=xs

Clearly, so it suffices to show that is not a proper left factor of

    ¬vs=xs : ¬ (v  vs  xs)
    ¬vs=xs = ⊏ʳ-irrefl' vs⊏xs

Unfortunately, proving this is somewhat involved. Aiming for a contradiction, suppose that is a left factor of Our plan is to show that this implies that which contradicts our assumption that is the minimal element of its conjugacy class.

    ¬vs⊏xs : ¬ (v  vs ⊏ˡ xs)
    ¬vs⊏xs vs⊏xs@(w , ws , vs+ws=xs) =
      absurd (≺-asym xs≺vs+us vs+us≺xs)
      where

Let denote the right factor of corresponding to Note that the factorisation is a conjugate of and thus

        us+vs≺ws+vs : u  us ++ v  vs  w  ws ++ v  vs
        us+vs≺ws+vs =
          ≼-≺-trans (inr us+vs=xs) $
          minimal-conj (v  vs) (w  ws) vs+ws=xs

It now suffices to show that is not a left factor of if this where the case, then we’d get that from which in turn yields

        ¬ws⊑us-suffices : ¬ (w  ws ⊑ˡ u  us)  v  vs ++ u  us  v  vs ++ w  ws
        ¬ws⊑us-suffices ¬ws⊑us =
           Equiv.to (++-pres-≺l-≃ (v  vs) (v  vs) refl) $
           ≺-++-restrictr (v  vs) (v  vs) ¬ws⊑us us+vs≺ws+vs

This final step is ommited in most resources, but is a total slog. We will not dwell too deeply on the details: what follows is a brute force calculation.

        ¬ws⊑us : ¬ (w  ws ⊑ˡ u  us)
        ¬ws⊑us (ts , ws+ts=us) =
          ≺-irrefl $
          ≺-trans ws+xs+ws≺ws+ts+vs+ws $
          ≼-≺-trans (inr ws+ts+vs+ws=us+vs+ws) $
          ≺-≼-trans (≺-++-extendr (w  ws) (w  ws) ¬us+vs⊑ws+vs us+vs≺ws+vs)
          (inr ws+vs+ws=ws+xs)
          where
            ws+xs+ws≺ws+ts+vs+ws : w  ws ++ xs  w  ws ++ (ts ++ v  vs) ++ w  ws
            ws+xs+ws≺ws+ts+vs+ws =
              Equiv.to (++-pres-≺l-≃ (w  ws) (w  ws) refl) $
              minimal-conj (w  ws) (ts ++ v  vs) $
                w  ws ++ ts ++ v  vs   ≡˘⟨ ++-assoc (w  ws) ts (v  vs) ≡˘
                (w  ws ++ ts) ++ v  vs ≡⟨ ap (_++ v  vs) ws+ts=us 
                u  us ++ v  vs         ≡⟨ us+vs=xs 
                xs                       

            ws+ts+vs+ws=us+vs+ws : w  ws ++ (ts ++ v  vs) ++ w  ws  (u  us ++ v  vs) ++ w  ws
            ws+ts+vs+ws=us+vs+ws =
              w  ws ++ (ts ++ v  vs) ++ w  ws   ≡⟨ ap (w  ws ++_) (++-assoc ts (v  vs) (w  ws)) 
              w  ws ++ ts ++ v  vs ++ w  ws     ≡˘⟨ ++-assoc (w  ws) ts (v  vs ++ w  ws) ≡˘
               w  ws ++ ts  ++ v  vs ++ w  ws ≡⟨ ap! ws+ts=us 
              u  us ++ v  vs ++ w  ws           ≡˘⟨ ++-assoc (u  us) (v  vs) (w  ws) ≡˘
              (u  us ++ v  vs) ++ w  ws         

            ¬us+vs⊏ws+vs : ¬ (u  us ++ v  vs ⊏ˡ w  ws ++ v  vs)
            ¬us+vs⊏ws+vs us+vs⊏ws+vs =
              Nat.<-irrefl (ap length (us+vs=xs  sym vs+ws=xs)) $
              Nat.≤-trans (⊏ˡ-length-< us+vs⊏ws+vs) $
              Nat.≤-refl' (length-conj (w  ws) (v  vs))

            ¬us+vs⊑ws+vs : ¬ (u  us ++ v  vs ⊑ˡ w  ws ++ v  vs)
            ¬us+vs⊑ws+vs = [ ≺-irrefl' us+vs≺ws+vs , ¬us+vs⊏ws+vs ]  ⊑ˡ-strengthen

            ws+vs+ws=ws+xs : (w  ws ++ v  vs) ++ w  ws  w  ws ++ xs
            ws+vs+ws=ws+xs =
              (w  ws ++ v  vs) ++ w  ws ≡⟨ ++-assoc (w  ws) (v  vs) (w  ws) 
              w  ws ++ v  vs ++ w  ws   ≡⟨ ap (w  ws ++_) vs+ws=xs 
              w  ws ++ xs                 

If we put everything together, we get our contradictory proof that completing the proof.

        vs+us≺xs : v  vs ++ u  us  xs
        vs+us≺xs = ≺-≼-trans (¬ws⊑us-suffices ¬ws⊑us) (inr vs+ws=xs)

Next, a small technical lemma. Let be a proper right factor where is Lyndon. if then

lyndon-⊏ʳ-++-≺-projl
  :  {xs ys zs : List  A }
   is-lyndon zs
   xs ⊏ʳ ys ++ zs
   ys  xs
   ys ++ zs  xs

Aiming for a contradiction, suppose that Clearly so we can focus our attention on the case where

Either or is a proper left factor of and the corresponding right factor is smaller than The first case immediately contradicts our assumption, and the second case follows from our assumption that is lyndon.

lyndon-⊏ʳ-++-≺-projl {xs = xs} {ys = ys} {zs = zs} zs-lyn xs⊏ys+zs ys≺xs =
  ¬≽→≺ (ys ++ zs) xs λ where
    (inr xs=ys+zs) 
      absurd (⊏ʳ-⊑ˡ-asym xs⊏ys+zs (⊑ˡ-refl' (sym xs=ys+zs)))
    (inl xs≺ys+zs) 
      case ≺-split-++ xs ys zs xs≺ys+zs of λ where
        (inl xs≼ys) 
          absurd (≺→¬≽ ys xs ys≺xs xs≼ys)
        (inr (ys⊏xs , us≺zs)) 
          absurd $ᵢ ≺-asym us≺zs $
          zs-lyn (⊏ˡ-factor ys⊏xs) $
          ⊏ʳ-++-restrictr-⊑ˡ xs ys zs xs⊏ys+zs (⊏ˡ-weaken ys⊏xs)

From this, we can deduce that if with nonempty and Lyndon, then

lyndon-++-≺-projl
  :  {xs ys : List  A }
    Nonempty xs 
   is-lyndon ys  xs  ys
   xs ++ ys  ys
lyndon-++-≺-projl {xs = xs} {ys = ys} ys-lyn xs≺ys =
  lyndon-⊏ʳ-++-≺-projl ys-lyn (⊏ʳ-++-inr xs ys) xs≺ys

If is a proper right factor of with and is Lyndon, then

lyndon-⊏ʳ-++-≼-injl
  :  {xs ys zs : List  A }
   is-lyndon zs
   xs ⊏ʳ ys ++ zs
   xs  ys ++ zs
   xs  ys
lyndon-⊏ʳ-++-≼-injl {xs} {ys} {zs} zs-lyn xs⊏ys+zs xs≺ys+zs with ≺-tri xs ys
... | lt xs≺ys _ _ =
  inl xs≺ys
... | eq _ xs=ys _ = inr xs=ys
... | gt _ _ ys≺xs =
  absurd $ᵢ ≺-asym xs≺ys+zs (lyndon-⊏ʳ-++-≺-projl zs-lyn xs⊏ys+zs ys≺xs)

If all of and are Lyndon and is nonempty, then

lyndon-++-≺
  :  {xs ys : List  A }
    Nonempty ys 
   is-lyndon xs  is-lyndon ys  is-lyndon (xs ++ ys)
   xs  ys
lyndon-++-≺ {xs = []} {ys = y  ys} xs-lyn ys-lyn xs+ys-lyn =
  ≺-nil
lyndon-++-≺ {xs = x  xs} {ys = y  ys} xs-lyn ys-lyn xs+ys-lyn =
  ≺-trans (≺-++-injr refl) $
  xs+ys-lyn (y  ys) (⊏ʳ-++-inr (x  xs) (y  ys))

Note that all powers of nonempty Lyndon words greater than 2 are not Lyndon.

powers-not-lyndon
  :  {xs}
    Nonempty xs 
   (k : Nat)
   is-lyndon xs
   ¬ (is-lyndon (powers xs (2 + k) []))
powers-not-lyndon {xs = xs} zero xs-lyn xs^k-lyn =
  ≺-irrefl $
  lyndon-++-≺ xs-lyn xs-lyn $
  subst is-lyndon (ap (xs ++_) (++-idr xs)) xs^k-lyn
powers-not-lyndon {xs = xs} (suc k) xs-lyn xs^k-lyn =
  powers-not-lyndon k xs-lyn λ vs vs⊏x^k 
  ≺-trans (≺-++-injr refl) $
  ≼-≺-trans (inr (sym (powers-comm xs (2 + k)))) $
  (xs^k-lyn vs (⊏ʳ-trans vs⊏x^k (⊏ʳ-++-inr xs (powers xs (2 + k) []))))

If is lyndon, then the first element is less than or equal to every other index.

lyndon-head-≤
  :  {xs : List  A }
   is-lyndon xs
    i  head (xs ! i) xs  (xs ! i)

The proof is easy but cute. The key idea is that every index is associated to a a right factor of if is lyndon, then none of those can be larger than itself!

lyndon-head-≤ {xs = x  xs} xs-lyn fzero =
  inr refl
lyndon-head-≤ {xs = x  xs} xs-lyn (fsuc i) =
  ¬>→≤ x ((x  xs) ! fsuc i) λ xs[i]<x 
    ≺-asym
      (drop-head-≺ xs (x  xs) i xs[i]<x)
      (xs-lyn (drop (suc (Fin.to-nat i)) (x  xs))
         drop-to-nat-nonempty xs i 
        (drop-suc-⊏ʳ (Fin.to-nat i) (x  xs)))

Indexing can be a bit tedious to work with, so we also provide a more specialized forms of the previous lemma.

lyndon-head-++-≤
  :  (xs ys : List  A )
    _ : Nonempty xs   _ : Nonempty ys 
   is-lyndon (xs ++ ys)
   head⁺ xs  head⁺ ys
lyndon-head-++-≤ (x  xs) (y  ys) xs+ys-lyn =
  ¬>→≤ x y λ y<x 
    ≺-asym
      (xs+ys-lyn (y  ys) (⊏ʳ-++-inr (x  xs) (y  ys)))
      (≺-∷-head y<x)

lyndon-head-last-≤
  :  (xs : List  A )
    _ : Nonempty xs 
   is-lyndon xs
   head⁺ xs  last⁺ xs
lyndon-head-last-≤ (x  []) xs-lyn = inr refl
lyndon-head-last-≤ (x  x'  xs) xs-lyn =
  ¬>→≤ x (last xs x') λ last<head 
    ≺-asym
      (xs-lyn (last xs x'  []) (last-nonempty-⊏ʳ x (x'  xs) ))
      (≺-∷-head last<head)

Closure properties of Lyndon words🔗

The empty word is Lyndon.

empty-lyndon :  (xs : List  A )   _ : Empty xs   is-lyndon xs
empty-lyndon [] (v  vs) vs⊏[] = absurd (⊏ʳ-strict-bot (v  vs) vs⊏[])

Any atomic word is Lyndon.

atom-lyndon :  (x :  A )  is-lyndon (x  [])
atom-lyndon x (v  vs) vs⊏x =
  absurd (⊑ʳ-strict-bot (⊏ʳ-tailr vs⊏x))

If and are both Lyndon, and then is Lyndon.

++-≺-lyndon
  :  {xs ys : List  A }
   is-lyndon xs  is-lyndon ys  xs  ys
   is-lyndon (xs ++ ys)

If is empty, then and Lyndon by our assumption.

++-≺-lyndon {xs = []} {ys = ys} xs-lyn ys-lyn xs≺ys vs vs⊏xs+ys =
  ys-lyn vs vs⊏xs+ys

Let be a proper right factor of we ened to show that There are 3 cases to consider:

  1. is a proper right factor of this immediately follows from our assumption that is Lyndon.
  2. is equal this also follows from our assumption that is Lyndon.
  3. is a proper right factor of this follows from our assumption that is lyndon.
++-≺-lyndon {xs = x  xs} {ys = ys} xs-lyn ys-lyn xs≺ys vs vs⊏xs+ys =
  case ⊏ʳ-split-++ vs (x  xs) ys vs⊏xs+ys of λ where
    (inl vs⊏ys) 
      ≺-trans (lyndon-++-≺-projl ys-lyn xs≺ys) $
      ys-lyn vs vs⊏ys
    (inr (inl vs=ys)) 
      ≺-≼-trans (lyndon-++-≺-projl ys-lyn xs≺ys) $
      inr (sym vs=ys)
    (inr (inr ((w , ws , ws+vs=ys) , ws⊏xs))) 
      ≺-≼-trans (≺-++-extendr ys ys (⊏ʳ-⊑ˡ-asym ws⊏xs) (xs-lyn (w  ws) ws⊏xs)) $
      inr ws+vs=ys

We can generalize the previous theorem to powers of Lyndon words.

powers-≺-lyndon
  :  {xs ys : List  A }
   (k : Nat)
   is-lyndon xs  is-lyndon ys
   xs  ys
   is-lyndon (powers xs (suc k) ys)
powers-≺-lyndon {xs = xs} {ys = y  ys} zero xs-lyn ys-lyn xs≺ys =
  ++-≺-lyndon xs-lyn ys-lyn xs≺ys
powers-≺-lyndon {xs = xs} {ys = y  ys} (suc k) xs-lyn ys-lyn xs≺ys =
  ++-≺-lyndon xs-lyn (powers-≺-lyndon k xs-lyn ys-lyn xs≺ys)
    (≺-++-injr  powers-nonemptyr xs k (y  ys) nonempty  refl)

The following lemma is due to Duval: if is Lyndon, and is an atom that is strictly larger than then is Lyndon.

++-∷r-≺-lyndon
  :  {xs ys : List  A } {z}
    Nonempty ys 
   is-lyndon (xs ++ ys)
   ys  z  []
   is-lyndon (xs ∷r z)

If is empty, then this is trivially true.

++-∷r-≺-lyndon {xs = []} {ys = ys} {z = z} xs+ys-lyn ys≺z =
  atom-lyndon z

Now for the hard case. Let be a proper right factor of There are 3 cases to consider:

  1. is a proper right factor of
  2. is proper right factor of
++-∷r-≺-lyndon {xs = x  xs} {ys = y  ys} {z = z} xs+ys-lyn ys≺z@(≺-∷-head y<z) vs vs⊏xs∷z =
  case ⊏ʳ-split-++ vs (x  xs) (z  []) vs⊏xs∷z of λ where

The first case is trivially false, as there are no nonempty proper right factors of atoms.

    (inl vs⊏z) 
      absurd (nonempty-¬⊏ʳ-atom auto vs⊏z)

The second case is a bit tricker, but only just. First, note that as is Lyndon. Moreover, which finishes the proof.

    (inr (inl vs=z)) 
      let x<z = ≤-<-trans (lyndon-head-++-≤ (x  xs) (y  ys) xs+ys-lyn) y<z
      in ≺-≼-trans (≺-∷-head x<z) (inr (sym vs=z))

Now for the hard case. Let where is a proper right factor of we need to show that

First, observe that is Lyndon, so it suffices to show that by lyndon-⊏ʳ-++-≺-projl. Moreover, is Lyndon, so

\ Warning

This proof is different from the one presented by Duval, which contains has a mistake in this case. His proof attempts to use what ammmounts to ≺-++-extendr, which would require him to prove that is not a prefix of instead, he proves that cannot be a prefix of

    (inr (inr (z⊏vs@(w , ws , ws+z=vs) , ws⊏xs))) 
      lyndon-⊏ʳ-++-≺-projl {ys = x  xs} (atom-lyndon z) vs⊏xs∷z $
      ≺-trans (≺-++-injr refl) $
      ≺-trans (xs+ys-lyn (w  ws ++ y  ys) (⊏ʳ-++-extendr ws⊏xs refl)) $
      ≺-≼-trans (≺-++-extendl (w  ws) (w  ws) refl ys≺z) $
      (inr ws+z=vs)

Following the usual theme, we can generalize our previous lemma to powers.

powers-∷r-≺-lyndon
  :  {xs xs' : List  A } {x' :  A }
   (k : Nat)
   is-lyndon xs
   (xs'⊏xs : xs' ⊏ˡ xs)
   xs'⊏xs .fst < x'
   is-lyndon (powers xs (suc k) (xs' ∷r x'))

The proof is basically just cobbling together previous lemmas, so we shall not dwell on it for too long.

powers-∷r-≺-lyndon {xs = xs} {xs' = xs'} {x' = x'} k xs-lyn xs'⊏xs@(u , us , xs'+us=xs) u<x' =
  powers-≺-lyndon k xs-lyn xs'∷x'-lyn xs≺xs'∷x'
  where
    xs'∷x'-lyn : is-lyndon (xs' ∷r x')
    xs'∷x'-lyn =
      ++-∷r-≺-lyndon
        (subst is-lyndon (sym xs'+us=xs) xs-lyn)
        (≺-∷-head u<x')

    xs≺xs'∷x' : xs  xs' ∷r x'
    xs≺xs'∷x' =
      ≼-≺-trans (inr (sym xs'+us=xs)) $
      ≺-++-extendl xs' xs' refl (≺-∷-head u<x')

In a similar vein, if we have a sesquipower where equals the character after the end of the prefix then is not Lyndon.

powers-∷r-≡-not-lyndon
  :  {xs xs' : List  A } {x' :  A }
   (k : Nat)
   (xs'⊏xs : xs' ⊏ˡ xs)
   xs'⊏xs .fst  x'
   ¬ (is-lyndon (powers xs (suc k) (xs' ∷r x')))

The proof is conceptually simple: has a border, so it cannot be Lyndon. The formal proof does take a bit more work, but it’s mostly busywork.

powers-∷r-≡-not-lyndon {xs = xs} {xs' = xs'} {x' = x'} k xs'⊏xs@(u , us , xs'+us=xs) u=x' x^k-lyn =
  lyndon-no-border
    (xs' ∷r x')
    (powers xs (suc k) (xs' ∷r x'))
    x^k-lyn
    xs'∷x'⊏ˡxs^k+xs'∷x'
    xs'∷x'⊏ʳxs^k+xs'∷x'
  where
    xs'∷x'⊏ˡxs^k+xs'∷x' : xs' ∷r x' ⊏ˡ powers xs (suc k) (xs' ∷r x')
    xs'∷x'⊏ˡxs^k+xs'∷x' =
      ++-nonempty-⊏ˡ
        (xs' ∷r x')
        (us ++ powers xs k (xs' ∷r x'))
        (powers xs (suc k) (xs' ∷r x'))
        (++-nonemptyr us (powers xs k (xs' ∷r x')) (powers-nonemptyr xs k (xs' ∷r x') auto)) $
          xs' ∷r x' ++ us ++ powers xs k (xs' ∷r x')      ≡˘⟨ ++-assoc (xs' ∷r x') us (powers xs k (xs' ∷r x')) ≡˘
           xs' ∷r x' ++ us  ++ powers xs k (xs' ∷r x')  ≡⟨ ap! (++-assoc xs' (x'  []) us) 
          (xs' ++  x'   us) ++ powers xs k (xs' ∷r x') ≡⟨ ap! (sym u=x') 
           xs' ++ u  us  ++ powers xs k (xs' ∷r x')    ≡⟨ ap! xs'+us=xs 
          powers xs (suc k) (xs' ∷r x') 

    xs'∷x'⊏ʳxs^k+xs'∷x' : xs' ∷r x' ⊏ʳ powers xs (suc k) (xs' ∷r x')
    xs'∷x'⊏ʳxs^k+xs'∷x' =
      ⊏ʳ-powers-inr xs k (xs' ∷r x')  nonempty-⊏ˡ xs'⊏xs 

Likewise, if we have a sesquipower where is smaller than the character after the end of the prefix then is not Lyndon.

powers-∷r->-not-lyndon
  :  {xs xs' : List  A } {x' :  A }
   (k : Nat)
   (xs'⊏xs : xs' ⊏ˡ xs)
   x' < xs'⊏xs .fst
   ¬ (is-lyndon (powers xs (suc k) (xs' ∷r x')))

Aiming for a contradiction, suppose that is Lyndon. Moreover, let denote the right factor of corresponding to

Note that is a right factor of so Moreover, we have:

This lets us conclude that However, this immediately implies that which contradicts our hypothesis that is smaller than the character after

powers-∷r->-not-lyndon {xs = xs} {xs' = xs'} {x' = x'} k xs'⊏xs@(u , us , xs'+us=xs) x'<u xs^k-lyn =
  <-asym x'<u u<x'
  where
    instance
      Nonempty-xs^k+xs'∷x' : Nonempty (powers xs k (xs' ∷r x'))
      Nonempty-xs^k+xs'∷x' = powers-nonemptyr xs k (xs' ∷r x') auto

    xs'⊏xs^k+xs'∷x' : xs' ⊏ˡ powers xs (suc k) (xs' ∷r x')
    xs'⊏xs^k+xs'∷x' =
      ⊏ˡ-trans xs'⊏xs $
      ⊏ˡ-++-inl xs (powers xs k (xs' ∷r x'))

    xs^k+xs'∷xs'≺xs'∷x' : powers xs (suc k) (xs' ∷r x')  (xs' ∷r x')
    xs^k+xs'∷xs'≺xs'∷x' =
      xs^k-lyn (xs' ∷r x') $
      ⊏ʳ-powers-inr xs k (xs' ∷r x')  nonempty-⊏ˡ xs'⊏xs 

    u<x' : u < x'
    u<x' =
      ≺-atom-nonempty $
      ≺-++-restrict-⊏ˡ xs'⊏xs^k+xs'∷x' xs^k+xs'∷xs'≺xs'∷x'

Decidablity🔗

The following is a näive algorithm for checking if a word is Lyndon, where we check every single suffix in order to see if it is lexicographically larger than This is only required for our proofs, so performance is not too much of a concern: a better version would construct a suffix array.

is-lyndon?
  :  (xs : List  A )
   is-lyndon xs  Σ[ vs  List  A  ] (Nonempty vs × vs ⊏ʳ xs × vs  xs)
is-lyndon? [] = inl (empty-lyndon [])
is-lyndon? (x  xs) with ≺-on-suffixes xs (x  xs) (x , [] , refl)
... | inl xs-lyn = inl λ us us⊏xs  xs-lyn us (⊏ʳ-tailr us⊏xs)
... | inr (vs , vs-ne , vs⊏xs , vs≺xs) = inr (vs , vs-ne , ⊑ʳ-consr vs⊏xs , vs≺xs)

instance
  Dec-is-lyndon :  {xs}  Dec (is-lyndon xs)
  Dec-is-lyndon {xs = xs} with is-lyndon? xs
  ... | inl xs-lyn =
    yes xs-lyn
  ... | inr (vs , vs-ne , vs⊏xs , vs≺xs) =
    no λ xs-lyn  ≺-asym vs≺xs (xs-lyn vs  vs-ne  vs⊏xs)

The main reason we need to check if a word is lyndon is the following result: if a word is not lyndon, then there is a larger suffix.

not-lyndon-≼
  :  (xs : List  A )
   ¬ (is-lyndon xs)
   Σ[ vs  List  A  ] (Nonempty vs × vs ⊏ʳ xs × vs  xs)
not-lyndon-≼ [] ¬xs-lyn = absurd (¬xs-lyn (empty-lyndon []))
not-lyndon-≼ (x  xs) ¬xs-lyn with ≺-on-suffixes xs (x  xs) (x , [] , refl)
... | inl xs-lyn = absurd (¬xs-lyn  us us⊏xs  xs-lyn us (⊏ʳ-tailr us⊏xs)))
... | inr (vs , vs-ne , vs⊑x∷xs , vs≺x∷xs) = vs , vs-ne , ⊑ʳ-consr vs⊑x∷xs , vs≺x∷xs

Standard factorizations🔗

A pair of words for form a right standard pair if the are both nonempty, and is the largest right factor of that is Lyndon. We say that a right standard pair is Lyndon if is Lyndon.

record is-right-standard (xs ys : List  A )
  : Type (o  ) where
  constructor standard-factors
  field
    lfactor-nonempty : Nonempty xs
    rfactor-nonempty : Nonempty ys
    rfactor-lyndon  : is-lyndon ys
    rfactor-maximal :  us  us ⊏ʳ (xs ++ ys)  ys ⊏ʳ us  ¬ (is-lyndon us)

  rfactor-⊑ʳ :  us  us ⊏ʳ (xs ++ ys)  is-lyndon us  us ⊑ʳ ys
  rfactor-⊑ʳ us us⊏xs+ys us-lyn with ⊏ʳ-split-++ us xs ys us⊏xs+ys
  ... | inl us⊑ys = ⊏ʳ-weaken us⊑ys
  ... | inr (inl us=ys) = ⊑ʳ-refl' us=ys
  ... | inr (inr (ys⊏us , _)) = absurd (rfactor-maximal us us⊏xs+ys ys⊏us us-lyn)

open is-right-standard

Dually, a pair of words form a left standard pair if they are both nonempty, and is the largest left factor of that is Lyndon.

record is-left-standard (xs ys : List  A )
  : Type (o  ) where
  constructor standard-factors
  field
    lfactor-nonempty : Nonempty xs
    rfactor-nonempty : Nonempty ys
    lfactor-lyndon  : is-lyndon ys
    lfactor-maximal :  us  us ⊏ˡ (xs ++ ys)  ys ⊏ˡ us  ¬ (is-lyndon us)

open is-left-standard
\ Warning

There is some deep confusion about this in the literature. The original paper by Chen, Lyndon and Fox (Chen, Fox, and Lyndon 1958) uses right standard pairs (though they just refer to them as standard), and Melançon and others keep with this convention. Notably, this is the direction that (Pierre Duval 1983) uses.

The problems start with (Combinatorics on Words 1997), who mentions right standard pairs. However, the book later uses a different notion of Hall set (which is only used in an exercise, 5.4.3 to be precise!).

To make matters more confusing, some authors use left-standard factorisations; see (Perrin and Reutenauer 2018) and (Viennot 1978).

Both of these choices end up leading to the same list of factors, but yield radically different factorisation trees. This in turn makes the proofs wildly different. In particular, right factorisations are much easier to reason about: Lyndon words lex minimal right factors, so less juggling between left and right factors is required.

Unfortunately, Duval’s algorithm naturally uses the left factorisation, as it works left-to-right.

If is a Lyndon right standard pair, then is Lyndon.

right-standard-lfactor-lyndon
  :  {xs ys : List  A }
   is-lyndon (xs ++ ys)
   is-right-standard xs ys
   is-lyndon xs

If is an atom, then it is clearly Lyndon.

right-standard-lfactor-lyndon {xs = x  []} {ys = y  ys} xs+ys-lyn xs+ys-std =
  atom-lyndon x

On to the hard case. Suppose that has at least two elements, and let be a proper right factor of is a proper right factor of and is a proper right factor of so cannot be Lyndon, as is a right standard pair. This means that there must be some right factor of that is lex lesser that

right-standard-lfactor-lyndon {xs = x  x'  xs} {ys = y  ys} xs+ys-lyn xs+ys-std vs vs⊏xs
  using vs+ys⊏xs+ys⊏ʳ-++-extendr vs⊏xs refl
  using ys⊏vs+ys⊏ʳ-++-inr vs (y  ys)
  using ¬vs+ys-lynrfactor-maximal xs+ys-std (vs ++ y  ys) vs+ys⊏xs+ys ys⊏vs+ys
  with not-lyndon-≼ (vs ++ y  ys) ¬vs+ys-lyn
... | (ws , ws-ne , ws⊏vs+ys , ws≺vs+ys) =

From here, we can use the fact that both and are Lyndon to show that

  ≺-trans (≺-++-injr refl) $
  ≺-≼-trans (xs+ys-lyn ws  ws-ne  ws⊏xs+ys) $
  lyndon-⊏ʳ-++-≼-injl (xs+ys-std .rfactor-lyndon) ws⊏vs+ys ws≺vs+ys
  where
    ws⊏xs+ys : ws ⊏ʳ (x  x'  xs ++ y  ys)
    ws⊏xs+ys =
      ⊏ʳ-trans ws⊏vs+ys $
      ⊏ʳ-++-extendr vs⊏xs refl

As a nice corollary, we get that for every right-standard Lyndon pair.

right-standard-factors-≺
  :  {xs ys : List  A }
   is-lyndon (xs ++ ys)
   is-right-standard xs ys
   xs  ys
right-standard-factors-≺ {xs = xs} {ys = ys} xs+ys-lyn xs+ys-std =
  ≺-trans (≺-++-injr  xs+ys-std .rfactor-nonempty  refl) $
  xs+ys-lyn ys  xs+ys-std .rfactor-nonempty  $
  ⊏ʳ-++-inr xs ys  xs+ys-std .lfactor-nonempty 

If for an atom and is lyndon, then is a right standard pair.

∷-≺-right-standard
  :  {ys : List  A }
   (x :  A )
   is-lyndon ys
   x  []  ys
   is-right-standard (x  []) ys

The only interesting part here is showing that is maximal; this follows immediately from asymmetry.

∷-≺-right-standard {ys = y  ys} x ys-lyn x≺ys .lfactor-nonempty = nonempty
∷-≺-right-standard {ys = y  ys} x ys-lyn x≺ys .rfactor-nonempty = nonempty
∷-≺-right-standard {ys = y  ys} x ys-lyn x≺ys .rfactor-lyndon = ys-lyn
∷-≺-right-standard {ys = y  ys} x ys-lyn x≺ys .rfactor-maximal us us⊏x∷ys ys⊏us us-lyn =
  absurd (⊏ʳ-⊑ʳ-asym ys⊏us (⊏ʳ-tailr us⊏x∷ys))

Now for the hardest theorem of the bunch. If is a Lyndon right standard pair, is Lyndon, and then is a right standard pair.

std-++-lyndon-≼
  :  {xs ys zs : List  A }
    Nonempty zs 
   is-lyndon (xs ++ ys)
   is-right-standard xs ys
   is-lyndon zs
   zs  ys
   is-right-standard (xs ++ ys) zs
std-++-lyndon-≼ {xs} {ys} {zs} ys-lyn xs+ys-std zs-lyn zs≼ys =

The proof proceeds by well-founded induction over the proper right factors of aiming to show that is a right standard pair for every nonempty right factor of

  Wf-induction _⊏ʳ_ ⊏ʳ-wf
     z   _ : Nonempty z   z ⊑ʳ xs ++ ys  is-right-standard z zs)
    worker
    (xs ++ ys)
     ++-nonemptyr xs ys (xs+ys-std .rfactor-nonempty) 
    ⊑ʳ-refl
  where

On to the induction. Suppose that is a nonempty right factor of We assumed that is nonempty and Lyndon, so all that remains is to show that is the largest proper right Lyndon factor.

  worker
    :  (us : List  A )
     (∀ vs  vs ⊏ʳ us   Nonempty vs   vs ⊑ʳ xs ++ ys  is-right-standard vs zs)
      Nonempty us 
     us ⊑ʳ xs ++ ys
     is-right-standard us zs
  worker us rec us⊑xs+ys .lfactor-nonempty = auto
  worker us rec us⊑xs+ys .rfactor-nonempty = auto
  worker us rec us⊑xs+ys .rfactor-lyndon = zs-lyn

Aiming for a contradiction, suppose that was a proper right Lyndon factor of and that was a proper right factor of Let denote the corresponding left factor of There are 3 cases to consider:

  1. is a right factor of
  2. is a right factor of

All 3 of these will lead to contradictions!

  worker us rec us⊑xs+ys .rfactor-maximal vs vs⊏us+zs zs⊏vs@(w , ws , ws+zs=vs) vs-lyn =
    case ⊑ʳ-equidiv (⊏ʳ-weaken ws⊏xs+ys) (xs , refl) of λ where
      (lt ws⊏ys _ _)  absurd (¬ws⊏ys ws⊏ys)
      (eq _ ws=ys _)  absurd (¬ws=ys ws=ys)
      (gt _ _ ys⊏ws)  absurd (¬ys⊏ws ys⊏ws)
    where

      ws⊏xs+ys =
        ⊏ʳ-++-restrictl zs zs refl $
        ⊑ʳ-⊏ʳ-trans (⊑ʳ-refl' ws+zs=vs) $
        ⊏ʳ-⊑ʳ-trans vs⊏us+zs $
        ⊑ʳ-++-extendr us⊑xs+ys refl

Case 1 is relatively straightforward. is Lyndon and is a right factor of so we have However, as is Lyndon and is not a prefix of

      ¬ws⊏ys : ¬ (w  ws ⊏ʳ ys)
      ¬ws⊏ys ws⊏ys =
        ≺-asym (vs-lyn zs zs⊏vs) $
        ≼-≺-trans zs≼ys $
        subst₂ _≺_ (++-idr ys) ws+zs=vs $
        ≺-++-extendr [] zs (⊏ʳ-⊑ˡ-asym ws⊏ys) $
        rfactor-lyndon xs+ys-std (w  ws) ws⊏ys

Case 2 is similarly easy: we have

      ¬ws=ys : ¬ (w  ws  ys)
      ¬ws=ys ws=ys =
        ≺-asym (vs-lyn zs zs⊏vs) $
        ≼-≺-trans zs≼ys $
        ≺-≼-trans (≺-++-injr (sym ws=ys)) $
        inr ws+zs=vs

Now for the hard case. The first two steps are relatively straightforward: is the maximal proper right Lyndon factor of so it suffices to show that is Lyndon to get our contradiction. By our previous lemma, it suffices to show that is the left half of a right standard pair. We obtain this standard pair by invoking our inductive hypothesis on to show that is right standard, noting that is a proper right prefix of

Note

I have not been able to find any proofs that explain this step in detail; even the original paper by Chen, Lyndon, and Fox (Chen, Fox, and Lyndon 1958) handwaves this step away, and do not mention well-founded induction at all.

      ¬ys⊏ws : ¬ (ys ⊏ʳ w  ws)
      ¬ys⊏ws ys⊏ws =
        rfactor-maximal xs+ys-std (w  ws) ws⊏xs+ys ys⊏ws $
        right-standard-lfactor-lyndon (subst is-lyndon (sym ws+zs=vs) vs-lyn) $
        rec (w  ws) ws⊏us (⊑ʳ-trans (⊏ʳ-weaken ws⊏us) us⊑xs+ys)
        where
          ws⊏us : w  ws ⊏ʳ us
          ws⊏us = ⊏ʳ-++-restrictl zs zs refl (⊑ʳ-⊏ʳ-trans (⊑ʳ-refl' ws+zs=vs) vs⊏us+zs)

Note that right standard pairs are uniquely defined.

opaque
  right-standard-pair-rfactor-unique
    :  {ls rs ls' rs'}
     is-right-standard ls rs
     is-right-standard ls' rs'
     ls ++ rs  ls' ++ rs'
     rs  rs'
  right-standard-pair-rfactor-unique {ls} {rs} {ls'} {rs'} ls-rs-std ls'-rs'-std ls+rs=ls'+rs' =
    ⊑ʳ-antisym
      (rfactor-⊑ʳ ls'-rs'-std rs
        (++-nonempty-⊏ʳ ls rs (ls' ++ rs') (lfactor-nonempty ls-rs-std) ls+rs=ls'+rs')
        (rfactor-lyndon ls-rs-std ))
      (rfactor-⊑ʳ ls-rs-std rs'
        (++-nonempty-⊏ʳ ls' rs' (ls ++ rs) (lfactor-nonempty ls'-rs'-std) (sym ls+rs=ls'+rs'))
        (rfactor-lyndon ls'-rs'-std))

Lyndon factorisations🔗

Next, we inductively define a type that describes Lyndon factorisations.

data is-lyndon-factorisation (xs : List  A ) : List (List  A )  Type (o  ) where
  nil
    : Empty xs  is-lyndon-factorisation xs []
  factor
    :  {us vs : List  A } {vs* : List (List  A )}
      _ : Nonempty us 
     us ++ vs  xs
     is-lyndon us
     head [] vs*  us
     is-lyndon-factorisation vs vs*
     is-lyndon-factorisation xs (us  vs*)

If is Lyndon, and can be factorised into a sequence of then can be factorised as

lyn-factor-powers
  :  {xs ys ys*}
    Nonempty xs 
   (n : Nat)
   is-lyndon xs
   head [] ys*  xs
   is-lyndon-factorisation ys ys*
   is-lyndon-factorisation (powers xs n ys) (replicate xs n ys*)
lyn-factor-powers {xs} {ys} {ys*} zero xs-lyn xs≼ys* ys-fac =
  ys-fac
lyn-factor-powers {xs} {ys} {ys*} (suc zero) xs-lyn xs≼ys* ys-fac =
  factor refl xs-lyn xs≼ys* ys-fac
lyn-factor-powers {xs} {ys} {ys*} (suc (suc n)) xs-lyn xs≼ys* ys-fac =
 factor refl xs-lyn (inr refl) (lyn-factor-powers (suc n) xs-lyn xs≼ys* ys-fac)

References