module StringLab.Data.List.Lyndon {o ℓ} (A : Trichotomous o ℓ) where
Lyndon Words🔗
Let be a trichotomous order. A word is a Lyndon word if it is lexicographically smaller than all of its non-empty proper right factors.
Note that we deviate slightly from other resources, and allow Lyndon words to be empty.
is-lyndon : (xs : List ⌞ A ⌟) → Type (o ⊔ ℓ) is-lyndon xs = ∀ vs → ⦃ Nonempty vs ⦄ → vs ⊏ʳ xs → xs ≺ vs
Properties of Lyndon words🔗
We can immediately observe that Lyndon words are border-free.
lyndon-no-border : ∀ (us xs : List ⌞ A ⌟) → ⦃ Nonempty us ⦄ → is-lyndon xs → us ⊏ˡ xs → us ⊏ʳ xs → ⊥
To see why, suppose that is a border of a Lyndon word By definition, must be a right factor of so However, is also a proper left factor fo so a contradiction!
lyndon-no-border us xs xs-lyn us⊏ˡxs us⊏ʳxs = ≺-asym (⊏ˡ-≺ us⊏ˡxs) (xs-lyn us us⊏ʳxs)
Lyndon words are minimal elements of their conjugacy classes.
lyndon-minimal-conj : ∀ {xs : List ⌞ A ⌟} → is-lyndon xs → (us vs : List ⌞ A ⌟) → ⦃ Nonempty us ⦄ → ⦃ Nonempty vs ⦄ → us ++ vs ≡ xs → xs ≺ vs ++ us lyndon-minimal-conj xs-lyn (u ∷ us) vs us+vs=xs =
The proof is quite elementary. If is a proper factorization of then is a proper right factor, and
≺-++-injl (xs-lyn vs (u , us , us+vs=xs))
It turns out that our previous result is both a neccesary and sufficient condition.
minimal-conj→is-lyndon : (xs : List ⌞ A ⌟) → (∀ (us vs : List ⌞ A ⌟) → ⦃ Nonempty us ⦄ → ⦃ Nonempty vs ⦄ → us ++ vs ≡ xs → xs ≺ vs ++ us) → is-lyndon xs
Unfortunately, this proof is nowhere near as elementary as the previous one. Suppose that is minimal amonst its conjugacy class, and that is a proper right factor of we need to show that
Let denote the left portion of the factorisation of is the minimal element of its conjugacy class, so If is not a border of then we immediately have Therefore, it suffices to show that is not a left factor of
minimal-conj→is-lyndon xs minimal-conj (v ∷ vs) vs⊏xs@(u , us , us+vs=xs) = ≺-++-restrictr [] (u ∷ us) ([ ¬vs=xs , ¬vs⊏xs ] ∘ ⊑ˡ-strengthen) $ ≼-≺-trans (inr (++-idr xs)) $ xs≺vs+us where xs≺vs+us : xs ≺ v ∷ vs ++ u ∷ us xs≺vs+us = minimal-conj (u ∷ us) (v ∷ vs) us+vs=xs
Clearly, so it suffices to show that is not a proper left factor of
¬vs=xs : ¬ (v ∷ vs ≡ xs) ¬vs=xs = ⊏ʳ-irrefl' vs⊏xs
Unfortunately, proving this is somewhat involved. Aiming for a contradiction, suppose that is a left factor of Our plan is to show that this implies that which contradicts our assumption that is the minimal element of its conjugacy class.
¬vs⊏xs : ¬ (v ∷ vs ⊏ˡ xs) ¬vs⊏xs vs⊏xs@(w , ws , vs+ws=xs) = absurd (≺-asym xs≺vs+us vs+us≺xs) where
Let denote the right factor of corresponding to Note that the factorisation is a conjugate of and thus
us+vs≺ws+vs : u ∷ us ++ v ∷ vs ≺ w ∷ ws ++ v ∷ vs us+vs≺ws+vs = ≼-≺-trans (inr us+vs=xs) $ minimal-conj (v ∷ vs) (w ∷ ws) vs+ws=xs
It now suffices to show that is not a left factor of if this where the case, then we’d get that from which in turn yields
¬ws⊑us-suffices : ¬ (w ∷ ws ⊑ˡ u ∷ us) → v ∷ vs ++ u ∷ us ≺ v ∷ vs ++ w ∷ ws ¬ws⊑us-suffices ¬ws⊑us = Equiv.to (++-pres-≺l-≃ (v ∷ vs) (v ∷ vs) refl) $ ≺-++-restrictr (v ∷ vs) (v ∷ vs) ¬ws⊑us us+vs≺ws+vs
This final step is ommited in most resources, but is a total slog. We will not dwell too deeply on the details: what follows is a brute force calculation.
¬ws⊑us : ¬ (w ∷ ws ⊑ˡ u ∷ us) ¬ws⊑us (ts , ws+ts=us) = ≺-irrefl $ ≺-trans ws+xs+ws≺ws+ts+vs+ws $ ≼-≺-trans (inr ws+ts+vs+ws=us+vs+ws) $ ≺-≼-trans (≺-++-extendr (w ∷ ws) (w ∷ ws) ¬us+vs⊑ws+vs us+vs≺ws+vs) (inr ws+vs+ws=ws+xs) where ws+xs+ws≺ws+ts+vs+ws : w ∷ ws ++ xs ≺ w ∷ ws ++ (ts ++ v ∷ vs) ++ w ∷ ws ws+xs+ws≺ws+ts+vs+ws = Equiv.to (++-pres-≺l-≃ (w ∷ ws) (w ∷ ws) refl) $ minimal-conj (w ∷ ws) (ts ++ v ∷ vs) $ w ∷ ws ++ ts ++ v ∷ vs ≡˘⟨ ++-assoc (w ∷ ws) ts (v ∷ vs) ⟩≡˘ (w ∷ ws ++ ts) ++ v ∷ vs ≡⟨ ap (_++ v ∷ vs) ws+ts=us ⟩≡ u ∷ us ++ v ∷ vs ≡⟨ us+vs=xs ⟩≡ xs ∎ ws+ts+vs+ws=us+vs+ws : w ∷ ws ++ (ts ++ v ∷ vs) ++ w ∷ ws ≡ (u ∷ us ++ v ∷ vs) ++ w ∷ ws ws+ts+vs+ws=us+vs+ws = w ∷ ws ++ (ts ++ v ∷ vs) ++ w ∷ ws ≡⟨ ap (w ∷ ws ++_) (++-assoc ts (v ∷ vs) (w ∷ ws)) ⟩≡ w ∷ ws ++ ts ++ v ∷ vs ++ w ∷ ws ≡˘⟨ ++-assoc (w ∷ ws) ts (v ∷ vs ++ w ∷ ws) ⟩≡˘ ⌜ w ∷ ws ++ ts ⌝ ++ v ∷ vs ++ w ∷ ws ≡⟨ ap! ws+ts=us ⟩≡ u ∷ us ++ v ∷ vs ++ w ∷ ws ≡˘⟨ ++-assoc (u ∷ us) (v ∷ vs) (w ∷ ws) ⟩≡˘ (u ∷ us ++ v ∷ vs) ++ w ∷ ws ∎ ¬us+vs⊏ws+vs : ¬ (u ∷ us ++ v ∷ vs ⊏ˡ w ∷ ws ++ v ∷ vs) ¬us+vs⊏ws+vs us+vs⊏ws+vs = Nat.<-irrefl (ap length (us+vs=xs ∙ sym vs+ws=xs)) $ Nat.≤-trans (⊏ˡ-length-< us+vs⊏ws+vs) $ Nat.≤-refl' (length-conj (w ∷ ws) (v ∷ vs)) ¬us+vs⊑ws+vs : ¬ (u ∷ us ++ v ∷ vs ⊑ˡ w ∷ ws ++ v ∷ vs) ¬us+vs⊑ws+vs = [ ≺-irrefl' us+vs≺ws+vs , ¬us+vs⊏ws+vs ] ∘ ⊑ˡ-strengthen ws+vs+ws=ws+xs : (w ∷ ws ++ v ∷ vs) ++ w ∷ ws ≡ w ∷ ws ++ xs ws+vs+ws=ws+xs = (w ∷ ws ++ v ∷ vs) ++ w ∷ ws ≡⟨ ++-assoc (w ∷ ws) (v ∷ vs) (w ∷ ws) ⟩≡ w ∷ ws ++ v ∷ vs ++ w ∷ ws ≡⟨ ap (w ∷ ws ++_) vs+ws=xs ⟩≡ w ∷ ws ++ xs ∎
If we put everything together, we get our contradictory proof that completing the proof.
vs+us≺xs : v ∷ vs ++ u ∷ us ≺ xs vs+us≺xs = ≺-≼-trans (¬ws⊑us-suffices ¬ws⊑us) (inr vs+ws=xs)
Next, a small technical lemma. Let be a proper right factor where is Lyndon. if then
lyndon-⊏ʳ-++-≺-projl : ∀ {xs ys zs : List ⌞ A ⌟} → is-lyndon zs → xs ⊏ʳ ys ++ zs → ys ≺ xs → ys ++ zs ≺ xs
Aiming for a contradiction, suppose that Clearly so we can focus our attention on the case where
Either or is a proper left factor of and the corresponding right factor is smaller than The first case immediately contradicts our assumption, and the second case follows from our assumption that is lyndon.
lyndon-⊏ʳ-++-≺-projl {xs = xs} {ys = ys} {zs = zs} zs-lyn xs⊏ys+zs ys≺xs = ¬≽→≺ (ys ++ zs) xs λ where (inr xs=ys+zs) → absurd (⊏ʳ-⊑ˡ-asym xs⊏ys+zs (⊑ˡ-refl' (sym xs=ys+zs))) (inl xs≺ys+zs) → case ≺-split-++ xs ys zs xs≺ys+zs of λ where (inl xs≼ys) → absurd (≺→¬≽ ys xs ys≺xs xs≼ys) (inr (ys⊏xs , us≺zs)) → absurd $ᵢ ≺-asym us≺zs $ zs-lyn (⊏ˡ-factor ys⊏xs) $ ⊏ʳ-++-restrictr-⊑ˡ xs ys zs xs⊏ys+zs (⊏ˡ-weaken ys⊏xs)
From this, we can deduce that if with nonempty and Lyndon, then
lyndon-++-≺-projl : ∀ {xs ys : List ⌞ A ⌟} → ⦃ Nonempty xs ⦄ → is-lyndon ys → xs ≺ ys → xs ++ ys ≺ ys lyndon-++-≺-projl {xs = xs} {ys = ys} ys-lyn xs≺ys = lyndon-⊏ʳ-++-≺-projl ys-lyn (⊏ʳ-++-inr xs ys) xs≺ys
If is a proper right factor of with and is Lyndon, then
lyndon-⊏ʳ-++-≼-injl : ∀ {xs ys zs : List ⌞ A ⌟} → is-lyndon zs → xs ⊏ʳ ys ++ zs → xs ≺ ys ++ zs → xs ≼ ys lyndon-⊏ʳ-++-≼-injl {xs} {ys} {zs} zs-lyn xs⊏ys+zs xs≺ys+zs with ≺-tri xs ys ... | lt xs≺ys _ _ = inl xs≺ys ... | eq _ xs=ys _ = inr xs=ys ... | gt _ _ ys≺xs = absurd $ᵢ ≺-asym xs≺ys+zs (lyndon-⊏ʳ-++-≺-projl zs-lyn xs⊏ys+zs ys≺xs)
If all of and are Lyndon and is nonempty, then
lyndon-++-≺ : ∀ {xs ys : List ⌞ A ⌟} → ⦃ Nonempty ys ⦄ → is-lyndon xs → is-lyndon ys → is-lyndon (xs ++ ys) → xs ≺ ys lyndon-++-≺ {xs = []} {ys = y ∷ ys} xs-lyn ys-lyn xs+ys-lyn = ≺-nil lyndon-++-≺ {xs = x ∷ xs} {ys = y ∷ ys} xs-lyn ys-lyn xs+ys-lyn = ≺-trans (≺-++-injr refl) $ xs+ys-lyn (y ∷ ys) (⊏ʳ-++-inr (x ∷ xs) (y ∷ ys))
Note that all powers of nonempty Lyndon words greater than 2 are not Lyndon.
powers-not-lyndon : ∀ {xs} → ⦃ Nonempty xs ⦄ → (k : Nat) → is-lyndon xs → ¬ (is-lyndon (powers xs (2 + k) [])) powers-not-lyndon {xs = xs} zero xs-lyn xs^k-lyn = ≺-irrefl $ lyndon-++-≺ xs-lyn xs-lyn $ subst is-lyndon (ap (xs ++_) (++-idr xs)) xs^k-lyn powers-not-lyndon {xs = xs} (suc k) xs-lyn xs^k-lyn = powers-not-lyndon k xs-lyn λ vs vs⊏x^k → ≺-trans (≺-++-injr refl) $ ≼-≺-trans (inr (sym (powers-comm xs (2 + k)))) $ (xs^k-lyn vs (⊏ʳ-trans vs⊏x^k (⊏ʳ-++-inr xs (powers xs (2 + k) []))))
If is lyndon, then the first element is less than or equal to every other index.
lyndon-head-≤ : ∀ {xs : List ⌞ A ⌟} → is-lyndon xs → ∀ i → head (xs ! i) xs ≤ (xs ! i)
The proof is easy but cute. The key idea is that every index is associated to a a right factor of if is lyndon, then none of those can be larger than itself!
lyndon-head-≤ {xs = x ∷ xs} xs-lyn fzero = inr refl lyndon-head-≤ {xs = x ∷ xs} xs-lyn (fsuc i) = ¬>→≤ x ((x ∷ xs) ! fsuc i) λ xs[i]<x → ≺-asym (drop-head-≺ xs (x ∷ xs) i xs[i]<x) (xs-lyn (drop (suc (Fin.to-nat i)) (x ∷ xs)) ⦃ drop-to-nat-nonempty xs i ⦄ (drop-suc-⊏ʳ (Fin.to-nat i) (x ∷ xs)))
Indexing can be a bit tedious to work with, so we also provide a more specialized forms of the previous lemma.
lyndon-head-++-≤ : ∀ (xs ys : List ⌞ A ⌟) → ⦃ _ : Nonempty xs ⦄ ⦃ _ : Nonempty ys ⦄ → is-lyndon (xs ++ ys) → head⁺ xs ≤ head⁺ ys lyndon-head-++-≤ (x ∷ xs) (y ∷ ys) xs+ys-lyn = ¬>→≤ x y λ y<x → ≺-asym (xs+ys-lyn (y ∷ ys) (⊏ʳ-++-inr (x ∷ xs) (y ∷ ys))) (≺-∷-head y<x) lyndon-head-last-≤ : ∀ (xs : List ⌞ A ⌟) → ⦃ _ : Nonempty xs ⦄ → is-lyndon xs → head⁺ xs ≤ last⁺ xs lyndon-head-last-≤ (x ∷ []) xs-lyn = inr refl lyndon-head-last-≤ (x ∷ x' ∷ xs) xs-lyn = ¬>→≤ x (last xs x') λ last<head → ≺-asym (xs-lyn (last xs x' ∷ []) (last-nonempty-⊏ʳ x (x' ∷ xs) )) (≺-∷-head last<head)
Closure properties of Lyndon words🔗
The empty word is Lyndon.
empty-lyndon : ∀ (xs : List ⌞ A ⌟) → ⦃ _ : Empty xs ⦄ → is-lyndon xs empty-lyndon [] (v ∷ vs) vs⊏[] = absurd (⊏ʳ-strict-bot (v ∷ vs) vs⊏[])
Any atomic word is Lyndon.
atom-lyndon : ∀ (x : ⌞ A ⌟) → is-lyndon (x ∷ []) atom-lyndon x (v ∷ vs) vs⊏x = absurd (⊑ʳ-strict-bot (⊏ʳ-tailr vs⊏x))
If and are both Lyndon, and then is Lyndon.
++-≺-lyndon : ∀ {xs ys : List ⌞ A ⌟} → is-lyndon xs → is-lyndon ys → xs ≺ ys → is-lyndon (xs ++ ys)
If is empty, then and Lyndon by our assumption.
++-≺-lyndon {xs = []} {ys = ys} xs-lyn ys-lyn xs≺ys vs vs⊏xs+ys = ys-lyn vs vs⊏xs+ys
Let be a proper right factor of we ened to show that There are 3 cases to consider:
- is a proper right factor of this immediately follows from our assumption that is Lyndon.
- is equal this also follows from our assumption that is Lyndon.
- is a proper right factor of this follows from our assumption that is lyndon.
++-≺-lyndon {xs = x ∷ xs} {ys = ys} xs-lyn ys-lyn xs≺ys vs vs⊏xs+ys = case ⊏ʳ-split-++ vs (x ∷ xs) ys vs⊏xs+ys of λ where (inl vs⊏ys) → ≺-trans (lyndon-++-≺-projl ys-lyn xs≺ys) $ ys-lyn vs vs⊏ys (inr (inl vs=ys)) → ≺-≼-trans (lyndon-++-≺-projl ys-lyn xs≺ys) $ inr (sym vs=ys) (inr (inr ((w , ws , ws+vs=ys) , ws⊏xs))) → ≺-≼-trans (≺-++-extendr ys ys (⊏ʳ-⊑ˡ-asym ws⊏xs) (xs-lyn (w ∷ ws) ws⊏xs)) $ inr ws+vs=ys
We can generalize the previous theorem to powers of Lyndon words.
powers-≺-lyndon : ∀ {xs ys : List ⌞ A ⌟} → (k : Nat) → is-lyndon xs → is-lyndon ys → xs ≺ ys → is-lyndon (powers xs (suc k) ys) powers-≺-lyndon {xs = xs} {ys = y ∷ ys} zero xs-lyn ys-lyn xs≺ys = ++-≺-lyndon xs-lyn ys-lyn xs≺ys powers-≺-lyndon {xs = xs} {ys = y ∷ ys} (suc k) xs-lyn ys-lyn xs≺ys = ++-≺-lyndon xs-lyn (powers-≺-lyndon k xs-lyn ys-lyn xs≺ys) (≺-++-injr ⦃ powers-nonemptyr xs k (y ∷ ys) nonempty ⦄ refl)
The following lemma is due to Duval: if is Lyndon, and is an atom that is strictly larger than then is Lyndon.
++-∷r-≺-lyndon : ∀ {xs ys : List ⌞ A ⌟} {z} → ⦃ Nonempty ys ⦄ → is-lyndon (xs ++ ys) → ys ≺ z ∷ [] → is-lyndon (xs ∷r z)
If is empty, then this is trivially true.
++-∷r-≺-lyndon {xs = []} {ys = ys} {z = z} xs+ys-lyn ys≺z = atom-lyndon z
Now for the hard case. Let be a proper right factor of There are 3 cases to consider:
- is a proper right factor of
- is proper right factor of
++-∷r-≺-lyndon {xs = x ∷ xs} {ys = y ∷ ys} {z = z} xs+ys-lyn ys≺z@(≺-∷-head y<z) vs vs⊏xs∷z = case ⊏ʳ-split-++ vs (x ∷ xs) (z ∷ []) vs⊏xs∷z of λ where
The first case is trivially false, as there are no nonempty proper right factors of atoms.
(inl vs⊏z) → absurd (nonempty-¬⊏ʳ-atom auto vs⊏z)
The second case is a bit tricker, but only just. First, note that as is Lyndon. Moreover, which finishes the proof.
(inr (inl vs=z)) → let x<z = ≤-<-trans (lyndon-head-++-≤ (x ∷ xs) (y ∷ ys) xs+ys-lyn) y<z in ≺-≼-trans (≺-∷-head x<z) (inr (sym vs=z))
Now for the hard case. Let where is a proper right factor of we need to show that
First, observe that
is Lyndon, so it suffices to show that
by lyndon-⊏ʳ-++-≺-projl. Moreover,
is Lyndon, so
This proof is different from the one presented by Duval, which
contains has a mistake in this case. His proof attempts to use what
ammmounts to ≺-++-extendr, which would
require him to prove that
is not a prefix of
instead, he proves that
cannot be a prefix of
(inr (inr (z⊏vs@(w , ws , ws+z=vs) , ws⊏xs))) → lyndon-⊏ʳ-++-≺-projl {ys = x ∷ xs} (atom-lyndon z) vs⊏xs∷z $ ≺-trans (≺-++-injr refl) $ ≺-trans (xs+ys-lyn (w ∷ ws ++ y ∷ ys) (⊏ʳ-++-extendr ws⊏xs refl)) $ ≺-≼-trans (≺-++-extendl (w ∷ ws) (w ∷ ws) refl ys≺z) $ (inr ws+z=vs)
Following the usual theme, we can generalize our previous lemma to powers.
powers-∷r-≺-lyndon : ∀ {xs xs' : List ⌞ A ⌟} {x' : ⌞ A ⌟} → (k : Nat) → is-lyndon xs → (xs'⊏xs : xs' ⊏ˡ xs) → xs'⊏xs .fst < x' → is-lyndon (powers xs (suc k) (xs' ∷r x'))
The proof is basically just cobbling together previous lemmas, so we shall not dwell on it for too long.
powers-∷r-≺-lyndon {xs = xs} {xs' = xs'} {x' = x'} k xs-lyn xs'⊏xs@(u , us , xs'+us=xs) u<x' = powers-≺-lyndon k xs-lyn xs'∷x'-lyn xs≺xs'∷x' where xs'∷x'-lyn : is-lyndon (xs' ∷r x') xs'∷x'-lyn = ++-∷r-≺-lyndon (subst is-lyndon (sym xs'+us=xs) xs-lyn) (≺-∷-head u<x') xs≺xs'∷x' : xs ≺ xs' ∷r x' xs≺xs'∷x' = ≼-≺-trans (inr (sym xs'+us=xs)) $ ≺-++-extendl xs' xs' refl (≺-∷-head u<x')
In a similar vein, if we have a sesquipower where equals the character after the end of the prefix then is not Lyndon.
powers-∷r-≡-not-lyndon : ∀ {xs xs' : List ⌞ A ⌟} {x' : ⌞ A ⌟} → (k : Nat) → (xs'⊏xs : xs' ⊏ˡ xs) → xs'⊏xs .fst ≡ x' → ¬ (is-lyndon (powers xs (suc k) (xs' ∷r x')))
The proof is conceptually simple: has a border, so it cannot be Lyndon. The formal proof does take a bit more work, but it’s mostly busywork.
powers-∷r-≡-not-lyndon {xs = xs} {xs' = xs'} {x' = x'} k xs'⊏xs@(u , us , xs'+us=xs) u=x' x^k-lyn = lyndon-no-border (xs' ∷r x') (powers xs (suc k) (xs' ∷r x')) x^k-lyn xs'∷x'⊏ˡxs^k+xs'∷x' xs'∷x'⊏ʳxs^k+xs'∷x' where xs'∷x'⊏ˡxs^k+xs'∷x' : xs' ∷r x' ⊏ˡ powers xs (suc k) (xs' ∷r x') xs'∷x'⊏ˡxs^k+xs'∷x' = ++-nonempty-⊏ˡ (xs' ∷r x') (us ++ powers xs k (xs' ∷r x')) (powers xs (suc k) (xs' ∷r x')) (++-nonemptyr us (powers xs k (xs' ∷r x')) (powers-nonemptyr xs k (xs' ∷r x') auto)) $ xs' ∷r x' ++ us ++ powers xs k (xs' ∷r x') ≡˘⟨ ++-assoc (xs' ∷r x') us (powers xs k (xs' ∷r x')) ⟩≡˘ ⌜ xs' ∷r x' ++ us ⌝ ++ powers xs k (xs' ∷r x') ≡⟨ ap! (++-assoc xs' (x' ∷ []) us) ⟩≡ (xs' ++ ⌜ x' ⌝ ∷ us) ++ powers xs k (xs' ∷r x') ≡⟨ ap! (sym u=x') ⟩≡ ⌜ xs' ++ u ∷ us ⌝ ++ powers xs k (xs' ∷r x') ≡⟨ ap! xs'+us=xs ⟩≡ powers xs (suc k) (xs' ∷r x') ∎ xs'∷x'⊏ʳxs^k+xs'∷x' : xs' ∷r x' ⊏ʳ powers xs (suc k) (xs' ∷r x') xs'∷x'⊏ʳxs^k+xs'∷x' = ⊏ʳ-powers-inr xs k (xs' ∷r x') ⦃ nonempty-⊏ˡ xs'⊏xs ⦄
Likewise, if we have a sesquipower where is smaller than the character after the end of the prefix then is not Lyndon.
powers-∷r->-not-lyndon : ∀ {xs xs' : List ⌞ A ⌟} {x' : ⌞ A ⌟} → (k : Nat) → (xs'⊏xs : xs' ⊏ˡ xs) → x' < xs'⊏xs .fst → ¬ (is-lyndon (powers xs (suc k) (xs' ∷r x')))
Aiming for a contradiction, suppose that is Lyndon. Moreover, let denote the right factor of corresponding to
Note that is a right factor of so Moreover, we have:
This lets us conclude that However, this immediately implies that which contradicts our hypothesis that is smaller than the character after
powers-∷r->-not-lyndon {xs = xs} {xs' = xs'} {x' = x'} k xs'⊏xs@(u , us , xs'+us=xs) x'<u xs^k-lyn = <-asym x'<u u<x' where instance Nonempty-xs^k+xs'∷x' : Nonempty (powers xs k (xs' ∷r x')) Nonempty-xs^k+xs'∷x' = powers-nonemptyr xs k (xs' ∷r x') auto xs'⊏xs^k+xs'∷x' : xs' ⊏ˡ powers xs (suc k) (xs' ∷r x') xs'⊏xs^k+xs'∷x' = ⊏ˡ-trans xs'⊏xs $ ⊏ˡ-++-inl xs (powers xs k (xs' ∷r x')) xs^k+xs'∷xs'≺xs'∷x' : powers xs (suc k) (xs' ∷r x') ≺ (xs' ∷r x') xs^k+xs'∷xs'≺xs'∷x' = xs^k-lyn (xs' ∷r x') $ ⊏ʳ-powers-inr xs k (xs' ∷r x') ⦃ nonempty-⊏ˡ xs'⊏xs ⦄ u<x' : u < x' u<x' = ≺-atom-nonempty $ ≺-++-restrict-⊏ˡ xs'⊏xs^k+xs'∷x' xs^k+xs'∷xs'≺xs'∷x'
Decidablity🔗
The following is a näive algorithm for checking if a word is Lyndon, where we check every single suffix in order to see if it is lexicographically larger than This is only required for our proofs, so performance is not too much of a concern: a better version would construct a suffix array.
is-lyndon? : ∀ (xs : List ⌞ A ⌟) → is-lyndon xs ⊎ Σ[ vs ∈ List ⌞ A ⌟ ] (Nonempty vs × vs ⊏ʳ xs × vs ≺ xs) is-lyndon? [] = inl (empty-lyndon []) is-lyndon? (x ∷ xs) with ≺-on-suffixes xs (x ∷ xs) (x , [] , refl) ... | inl xs-lyn = inl λ us us⊏xs → xs-lyn us (⊏ʳ-tailr us⊏xs) ... | inr (vs , vs-ne , vs⊏xs , vs≺xs) = inr (vs , vs-ne , ⊑ʳ-consr vs⊏xs , vs≺xs) instance Dec-is-lyndon : ∀ {xs} → Dec (is-lyndon xs) Dec-is-lyndon {xs = xs} with is-lyndon? xs ... | inl xs-lyn = yes xs-lyn ... | inr (vs , vs-ne , vs⊏xs , vs≺xs) = no λ xs-lyn → ≺-asym vs≺xs (xs-lyn vs ⦃ vs-ne ⦄ vs⊏xs)
The main reason we need to check if a word is lyndon is the following result: if a word is not lyndon, then there is a larger suffix.
not-lyndon-≼ : ∀ (xs : List ⌞ A ⌟) → ¬ (is-lyndon xs) → Σ[ vs ∈ List ⌞ A ⌟ ] (Nonempty vs × vs ⊏ʳ xs × vs ≺ xs) not-lyndon-≼ [] ¬xs-lyn = absurd (¬xs-lyn (empty-lyndon [])) not-lyndon-≼ (x ∷ xs) ¬xs-lyn with ≺-on-suffixes xs (x ∷ xs) (x , [] , refl) ... | inl xs-lyn = absurd (¬xs-lyn (λ us us⊏xs → xs-lyn us (⊏ʳ-tailr us⊏xs))) ... | inr (vs , vs-ne , vs⊑x∷xs , vs≺x∷xs) = vs , vs-ne , ⊑ʳ-consr vs⊑x∷xs , vs≺x∷xs
Standard factorizations🔗
A pair of words for form a right standard pair if the are both nonempty, and is the largest right factor of that is Lyndon. We say that a right standard pair is Lyndon if is Lyndon.
record is-right-standard (xs ys : List ⌞ A ⌟) : Type (o ⊔ ℓ) where constructor standard-factors field lfactor-nonempty : Nonempty xs rfactor-nonempty : Nonempty ys rfactor-lyndon : is-lyndon ys rfactor-maximal : ∀ us → us ⊏ʳ (xs ++ ys) → ys ⊏ʳ us → ¬ (is-lyndon us) rfactor-⊑ʳ : ∀ us → us ⊏ʳ (xs ++ ys) → is-lyndon us → us ⊑ʳ ys rfactor-⊑ʳ us us⊏xs+ys us-lyn with ⊏ʳ-split-++ us xs ys us⊏xs+ys ... | inl us⊑ys = ⊏ʳ-weaken us⊑ys ... | inr (inl us=ys) = ⊑ʳ-refl' us=ys ... | inr (inr (ys⊏us , _)) = absurd (rfactor-maximal us us⊏xs+ys ys⊏us us-lyn) open is-right-standard
Dually, a pair of words form a left standard pair if they are both nonempty, and is the largest left factor of that is Lyndon.
record is-left-standard (xs ys : List ⌞ A ⌟) : Type (o ⊔ ℓ) where constructor standard-factors field lfactor-nonempty : Nonempty xs rfactor-nonempty : Nonempty ys lfactor-lyndon : is-lyndon ys lfactor-maximal : ∀ us → us ⊏ˡ (xs ++ ys) → ys ⊏ˡ us → ¬ (is-lyndon us) open is-left-standard
There is some deep confusion about this in the literature. The original paper by Chen, Lyndon and Fox (Chen, Fox, and Lyndon 1958) uses right standard pairs (though they just refer to them as standard), and Melançon and others keep with this convention. Notably, this is the direction that (Pierre Duval 1983) uses.
The problems start with (Combinatorics on Words 1997), who mentions right standard pairs. However, the book later uses a different notion of Hall set (which is only used in an exercise, 5.4.3 to be precise!).
To make matters more confusing, some authors use left-standard factorisations; see (Perrin and Reutenauer 2018) and (Viennot 1978).
Both of these choices end up leading to the same list of factors, but yield radically different factorisation trees. This in turn makes the proofs wildly different. In particular, right factorisations are much easier to reason about: Lyndon words lex minimal right factors, so less juggling between left and right factors is required.
Unfortunately, Duval’s algorithm naturally uses the left factorisation, as it works left-to-right.
If is a Lyndon right standard pair, then is Lyndon.
right-standard-lfactor-lyndon : ∀ {xs ys : List ⌞ A ⌟} → is-lyndon (xs ++ ys) → is-right-standard xs ys → is-lyndon xs
If is an atom, then it is clearly Lyndon.
right-standard-lfactor-lyndon {xs = x ∷ []} {ys = y ∷ ys} xs+ys-lyn xs+ys-std = atom-lyndon x
On to the hard case. Suppose that has at least two elements, and let be a proper right factor of is a proper right factor of and is a proper right factor of so cannot be Lyndon, as is a right standard pair. This means that there must be some right factor of that is lex lesser that
right-standard-lfactor-lyndon {xs = x ∷ x' ∷ xs} {ys = y ∷ ys} xs+ys-lyn xs+ys-std vs vs⊏xs using vs+ys⊏xs+ys ← ⊏ʳ-++-extendr vs⊏xs refl using ys⊏vs+ys ← ⊏ʳ-++-inr vs (y ∷ ys) using ¬vs+ys-lyn ← rfactor-maximal xs+ys-std (vs ++ y ∷ ys) vs+ys⊏xs+ys ys⊏vs+ys with not-lyndon-≼ (vs ++ y ∷ ys) ¬vs+ys-lyn ... | (ws , ws-ne , ws⊏vs+ys , ws≺vs+ys) =
From here, we can use the fact that both and are Lyndon to show that
≺-trans (≺-++-injr refl) $ ≺-≼-trans (xs+ys-lyn ws ⦃ ws-ne ⦄ ws⊏xs+ys) $ lyndon-⊏ʳ-++-≼-injl (xs+ys-std .rfactor-lyndon) ws⊏vs+ys ws≺vs+ys where ws⊏xs+ys : ws ⊏ʳ (x ∷ x' ∷ xs ++ y ∷ ys) ws⊏xs+ys = ⊏ʳ-trans ws⊏vs+ys $ ⊏ʳ-++-extendr vs⊏xs refl
As a nice corollary, we get that for every right-standard Lyndon pair.
right-standard-factors-≺ : ∀ {xs ys : List ⌞ A ⌟} → is-lyndon (xs ++ ys) → is-right-standard xs ys → xs ≺ ys right-standard-factors-≺ {xs = xs} {ys = ys} xs+ys-lyn xs+ys-std = ≺-trans (≺-++-injr ⦃ xs+ys-std .rfactor-nonempty ⦄ refl) $ xs+ys-lyn ys ⦃ xs+ys-std .rfactor-nonempty ⦄ $ ⊏ʳ-++-inr xs ys ⦃ xs+ys-std .lfactor-nonempty ⦄
If for an atom and is lyndon, then is a right standard pair.
∷-≺-right-standard : ∀ {ys : List ⌞ A ⌟} → (x : ⌞ A ⌟) → is-lyndon ys → x ∷ [] ≺ ys → is-right-standard (x ∷ []) ys
The only interesting part here is showing that is maximal; this follows immediately from asymmetry.
∷-≺-right-standard {ys = y ∷ ys} x ys-lyn x≺ys .lfactor-nonempty = nonempty ∷-≺-right-standard {ys = y ∷ ys} x ys-lyn x≺ys .rfactor-nonempty = nonempty ∷-≺-right-standard {ys = y ∷ ys} x ys-lyn x≺ys .rfactor-lyndon = ys-lyn ∷-≺-right-standard {ys = y ∷ ys} x ys-lyn x≺ys .rfactor-maximal us us⊏x∷ys ys⊏us us-lyn = absurd (⊏ʳ-⊑ʳ-asym ys⊏us (⊏ʳ-tailr us⊏x∷ys))
Now for the hardest theorem of the bunch. If is a Lyndon right standard pair, is Lyndon, and then is a right standard pair.
std-++-lyndon-≼ : ∀ {xs ys zs : List ⌞ A ⌟} → ⦃ Nonempty zs ⦄ → is-lyndon (xs ++ ys) → is-right-standard xs ys → is-lyndon zs → zs ≼ ys → is-right-standard (xs ++ ys) zs std-++-lyndon-≼ {xs} {ys} {zs} ys-lyn xs+ys-std zs-lyn zs≼ys =
The proof proceeds by well-founded induction over the proper right factors of aiming to show that is a right standard pair for every nonempty right factor of
Wf-induction _⊏ʳ_ ⊏ʳ-wf (λ z → ⦃ _ : Nonempty z ⦄ → z ⊑ʳ xs ++ ys → is-right-standard z zs) worker (xs ++ ys) ⦃ ++-nonemptyr xs ys (xs+ys-std .rfactor-nonempty) ⦄ ⊑ʳ-refl where
On to the induction. Suppose that is a nonempty right factor of We assumed that is nonempty and Lyndon, so all that remains is to show that is the largest proper right Lyndon factor.
worker : ∀ (us : List ⌞ A ⌟) → (∀ vs → vs ⊏ʳ us → ⦃ Nonempty vs ⦄ → vs ⊑ʳ xs ++ ys → is-right-standard vs zs) → ⦃ Nonempty us ⦄ → us ⊑ʳ xs ++ ys → is-right-standard us zs worker us rec us⊑xs+ys .lfactor-nonempty = auto worker us rec us⊑xs+ys .rfactor-nonempty = auto worker us rec us⊑xs+ys .rfactor-lyndon = zs-lyn
Aiming for a contradiction, suppose that was a proper right Lyndon factor of and that was a proper right factor of Let denote the corresponding left factor of There are 3 cases to consider:
- is a right factor of
- is a right factor of
All 3 of these will lead to contradictions!
worker us rec us⊑xs+ys .rfactor-maximal vs vs⊏us+zs zs⊏vs@(w , ws , ws+zs=vs) vs-lyn = case ⊑ʳ-equidiv (⊏ʳ-weaken ws⊏xs+ys) (xs , refl) of λ where (lt ws⊏ys _ _) → absurd (¬ws⊏ys ws⊏ys) (eq _ ws=ys _) → absurd (¬ws=ys ws=ys) (gt _ _ ys⊏ws) → absurd (¬ys⊏ws ys⊏ws) where ws⊏xs+ys = ⊏ʳ-++-restrictl zs zs refl $ ⊑ʳ-⊏ʳ-trans (⊑ʳ-refl' ws+zs=vs) $ ⊏ʳ-⊑ʳ-trans vs⊏us+zs $ ⊑ʳ-++-extendr us⊑xs+ys refl
Case 1 is relatively straightforward. is Lyndon and is a right factor of so we have However, as is Lyndon and is not a prefix of
¬ws⊏ys : ¬ (w ∷ ws ⊏ʳ ys) ¬ws⊏ys ws⊏ys = ≺-asym (vs-lyn zs zs⊏vs) $ ≼-≺-trans zs≼ys $ subst₂ _≺_ (++-idr ys) ws+zs=vs $ ≺-++-extendr [] zs (⊏ʳ-⊑ˡ-asym ws⊏ys) $ rfactor-lyndon xs+ys-std (w ∷ ws) ws⊏ys
Case 2 is similarly easy: we have
¬ws=ys : ¬ (w ∷ ws ≡ ys) ¬ws=ys ws=ys = ≺-asym (vs-lyn zs zs⊏vs) $ ≼-≺-trans zs≼ys $ ≺-≼-trans (≺-++-injr (sym ws=ys)) $ inr ws+zs=vs
Now for the hard case. The first two steps are relatively straightforward: is the maximal proper right Lyndon factor of so it suffices to show that is Lyndon to get our contradiction. By our previous lemma, it suffices to show that is the left half of a right standard pair. We obtain this standard pair by invoking our inductive hypothesis on to show that is right standard, noting that is a proper right prefix of
I have not been able to find any proofs that explain this step in detail; even the original paper by Chen, Lyndon, and Fox (Chen, Fox, and Lyndon 1958) handwaves this step away, and do not mention well-founded induction at all.
¬ys⊏ws : ¬ (ys ⊏ʳ w ∷ ws) ¬ys⊏ws ys⊏ws = rfactor-maximal xs+ys-std (w ∷ ws) ws⊏xs+ys ys⊏ws $ right-standard-lfactor-lyndon (subst is-lyndon (sym ws+zs=vs) vs-lyn) $ rec (w ∷ ws) ws⊏us (⊑ʳ-trans (⊏ʳ-weaken ws⊏us) us⊑xs+ys) where ws⊏us : w ∷ ws ⊏ʳ us ws⊏us = ⊏ʳ-++-restrictl zs zs refl (⊑ʳ-⊏ʳ-trans (⊑ʳ-refl' ws+zs=vs) vs⊏us+zs)
Note that right standard pairs are uniquely defined.
opaque right-standard-pair-rfactor-unique : ∀ {ls rs ls' rs'} → is-right-standard ls rs → is-right-standard ls' rs' → ls ++ rs ≡ ls' ++ rs' → rs ≡ rs' right-standard-pair-rfactor-unique {ls} {rs} {ls'} {rs'} ls-rs-std ls'-rs'-std ls+rs=ls'+rs' = ⊑ʳ-antisym (rfactor-⊑ʳ ls'-rs'-std rs (++-nonempty-⊏ʳ ls rs (ls' ++ rs') (lfactor-nonempty ls-rs-std) ls+rs=ls'+rs') (rfactor-lyndon ls-rs-std )) (rfactor-⊑ʳ ls-rs-std rs' (++-nonempty-⊏ʳ ls' rs' (ls ++ rs) (lfactor-nonempty ls'-rs'-std) (sym ls+rs=ls'+rs')) (rfactor-lyndon ls'-rs'-std))
Lyndon factorisations🔗
Next, we inductively define a type that describes Lyndon factorisations.
data is-lyndon-factorisation (xs : List ⌞ A ⌟) : List (List ⌞ A ⌟) → Type (o ⊔ ℓ) where nil : Empty xs → is-lyndon-factorisation xs [] factor : ∀ {us vs : List ⌞ A ⌟} {vs* : List (List ⌞ A ⌟)} → ⦃ _ : Nonempty us ⦄ → us ++ vs ≡ xs → is-lyndon us → head [] vs* ≼ us → is-lyndon-factorisation vs vs* → is-lyndon-factorisation xs (us ∷ vs*)
If is Lyndon, and can be factorised into a sequence of then can be factorised as
lyn-factor-powers : ∀ {xs ys ys*} → ⦃ Nonempty xs ⦄ → (n : Nat) → is-lyndon xs → head [] ys* ≼ xs → is-lyndon-factorisation ys ys* → is-lyndon-factorisation (powers xs n ys) (replicate xs n ys*) lyn-factor-powers {xs} {ys} {ys*} zero xs-lyn xs≼ys* ys-fac = ys-fac lyn-factor-powers {xs} {ys} {ys*} (suc zero) xs-lyn xs≼ys* ys-fac = factor refl xs-lyn xs≼ys* ys-fac lyn-factor-powers {xs} {ys} {ys*} (suc (suc n)) xs-lyn xs≼ys* ys-fac = factor refl xs-lyn (inr refl) (lyn-factor-powers (suc n) xs-lyn xs≼ys* ys-fac)
References
- Chen, K. T., R. H. Fox, and R. C. Lyndon. 1958. “Free Differential Calculus, IV. The Quotient Groups of the Lower Central Series.” Annals of Mathematics 68 (1): 81–95. https://doi.org/10.2307/1970044.
- Combinatorics on Words. 1997. 2nd ed. Cambridge Mathematical Library. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511566097.
- Perrin, Dominique, and Christophe Reutenauer. 2018. “Hall Sets, Lazard Sets and Comma-Free Codes.” Discrete Mathematics 341 (1): 232–43. https://doi.org/10.1016/j.disc.2017.08.034.
- Pierre Duval, Jean. 1983. “Factorizing Words over an Ordered Alphabet.” Journal of Algorithms 4 (4): 363–81. https://doi.org/10.1016/0196-6774(83)90017-2.
- Viennot, Gérard. 1978. Algèbres de Lie Libres Et Monoïdes Libres. Vol. 691. Lecture Notes in Mathematics. Berlin, Heidelberg: Springer. https://doi.org/10.1007/BFb0067950.