Lesson 12 of 15
LZ Complexity and Run-Length Encoding
Lempel-Ziv Complexity
LZ complexity (LZ78 variant) measures the number of distinct phrases in the LZ78 parsing of a string. It quantifies the structural complexity of a sequence.
LZ78 Parsing Algorithm
Parse the string left to right, building up a phrase one character at a time:
- Start with an empty phrase and an empty dictionary
- Add the next character to the current phrase
- If the phrase is not in the dictionary: add it, increment the count, reset phrase to empty
- If the phrase is in the dictionary: keep extending
After the loop, any leftover phrase that wasn't added counts as one more phrase.
def lz_complexity(s):
if not s:
return 0
phrases = set()
phrase = ""
c = 0
for ch in s:
phrase += ch
if phrase not in phrases:
phrases.add(phrase)
c += 1
phrase = ""
if phrase:
c += 1
return c
Trace Example: 'aababc'
| Step | char | phrase | in dict? | action |
|---|---|---|---|---|
| 0 | a | 'a' | no | add 'a', c=1 |
| 1 | a | 'a' | yes | extend |
| 2 | b | 'ab' | no | add 'ab', c=2 |
| 3 | a | 'a' | yes | extend |
| 4 | b | 'ab' | yes | extend |
| 5 | c | 'abc' | no | add 'abc', c=3 |
Result: 3 phrases
Run-Length Encoding
RLE compresses repeated characters by storing (character, count) pairs:
Your Task
Implement:
lz_complexity(s)— LZ78 phrase countrun_length_encode(s)— list of(char, count)tuples
Python runtime loading...
Loading...
Click "Run" to execute your code.