Lesson 10 of 15

CpG Islands

Promoter Landmarks

A CpG island is a stretch of DNA with unusually high GC content and an abundance of CpG dinucleotides (a cytosine immediately followed by a guanine on the same strand). About 70% of human gene promoters, the regulatory switches that turn genes on, overlap with CpG islands.
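One common way to put a number on "abundance of CpG dinucleotides" is the observed-to-expected CpG ratio: the CpG count, multiplied by the sequence length, divided by the product of the C and G counts. The sketch below is illustrative only; the function name cpg_observed_expected is hypothetical and is not part of this lesson's task.

```python
def cpg_observed_expected(seq):
    """Observed CpG count relative to the count expected from the C and G totals.

    Hypothetical helper for illustration; not part of the lesson's task.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both bases
    # "CG" cannot overlap with itself, so str.count gives the exact CpG count
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)


print(cpg_observed_expected("CGCGCGCG"))  # CpG-dense toy sequence
print(cpg_observed_expected("CCCCGGGG"))  # same GC content, only one CpG
```

The two toy sequences have identical GC content but very different ratios, which is why GC content and CpG abundance are treated as separate criteria. A widely used rule of thumb calls a region a CpG island when it is at least 200 bp long, has GC content above 50%, and has a ratio above 0.6.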

To find CpG islands, we use a sliding window: compute GC content in a small window, then slide it one position at a time across the sequence:

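def sliding_window_gc(seq, window_size):
    """Return the GC fraction of every window of length window_size in seq."""
    seq = seq.upper()  # count lowercase (soft-masked) bases too
    results = []
    # One window per valid start position; empty if seq is shorter than the window
    for i in range(len(seq) - window_size + 1):
        window = seq[i:i + window_size]
        gc = (window.count("G") + window.count("C")) / window_size
        results.append(round(gc, 2))  # round to two decimals for readability
    return results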

This produces a GC profile of the sequence. Peaks in the profile mark GC-rich regions, which are candidates for CpG islands and therefore for promoters; a GC peak alone does not confirm an island.
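To make the profile concrete, here is a small worked example on a made-up 12-base sequence. The function is repeated so the snippet runs on its own; the helper high_gc_windows and the 0.75 threshold are assumptions chosen for this toy case, not values from the lesson.

```python
def sliding_window_gc(seq, window_size):
    # Same logic as the lesson's function, repeated so this snippet is self-contained
    results = []
    for i in range(len(seq) - window_size + 1):
        window = seq[i:i + window_size]
        gc = (window.count("G") + window.count("C")) / window_size
        results.append(round(gc, 2))
    return results


def high_gc_windows(profile, threshold):
    """Hypothetical helper: start positions of windows at or above the threshold."""
    return [i for i, gc in enumerate(profile) if gc >= threshold]


seq = "ATATCGCGCGAT"  # toy sequence, not real genomic data
profile = sliding_window_gc(seq, 4)
print(profile)
# [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 0.75, 0.5]
print(high_gc_windows(profile, 0.75))
# [3, 4, 5, 6, 7]
```

A sequence of length 12 with a window of 4 gives 12 - 4 + 1 = 9 values, and the profile rises as the window moves onto the CG-rich middle. Real analyses typically use much longer windows, on the order of a few hundred bases.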

AlphaGenome's convolutional layers implicitly learn these kinds of local composition patterns across thousands of training examples, and its transformer layers connect these local patterns to long-range regulatory context.

Your Task

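Implement sliding_window_gc(seq, window_size) so that it returns a list with one GC fraction (between 0.0 and 1.0, rounded to two decimals) for each window position. A sequence of length n yields n - window_size + 1 values.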
