Lesson 12 of 15

LZ Complexity and Run-Length Encoding

Lempel-Ziv Complexity

LZ complexity (LZ78 variant) measures the number of distinct phrases in the LZ78 parsing of a string. It quantifies the structural complexity of a sequence.

LZ78 Parsing Algorithm

Parse the string left to right, building up a phrase one character at a time:

  1. Start with an empty phrase and an empty dictionary
  2. Add the next character to the current phrase
  3. If the phrase is not in the dictionary: add it, increment the count, reset phrase to empty
  4. If the phrase is in the dictionary: keep extending

After the loop, any leftover phrase that wasn't added counts as one more phrase.

def lz_complexity(s):
    if not s:
        return 0
    phrases = set()
    phrase = ""
    c = 0
    for ch in s:
        phrase += ch
        if phrase not in phrases:
            phrases.add(phrase)
            c += 1
            phrase = ""
    if phrase:
        c += 1
    return c

Trace Example: 'aababc'

Stepcharphrasein dict?action
0a'a'noadd 'a', c=1
1a'a'yesextend
2b'ab'noadd 'ab', c=2
3a'a'yesextend
4b'ab'yesextend
5c'abc'noadd 'abc', c=3

Result: 3 phrases

Run-Length Encoding

RLE compresses repeated characters by storing (character, count) pairs:

aaabbc[(a,3),(b,2),(c,1)]\texttt{aaabbc} \to [(\texttt{a}, 3), (\texttt{b}, 2), (\texttt{c}, 1)]

Your Task

Implement:

  • lz_complexity(s) — LZ78 phrase count
  • run_length_encode(s) — list of (char, count) tuples
Python runtime loading...
Loading...
Click "Run" to execute your code.