Lesson 9 of 15

Motif Finding

Transcription Factor Binding

Most of the genome does not code for proteins, and part of that non-coding DNA is regulatory. Regulatory regions contain short sequences called motifs that serve as landing pads for transcription factors (TFs): proteins that bind DNA and switch genes on or off.

A classic example: the TATA box (core motif: TATA) is found about 30 bases upstream of the transcription start site of many genes. It helps recruit the transcription machinery.

Finding where a motif occurs in a genome is a fundamental bioinformatics operation:

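def find_motif(seq, motif):
    # Collect the 0-based start index of every exact match, overlaps included.
    positions = []
    # The last valid start index is len(seq) - len(motif).
    for i in range(len(seq) - len(motif) + 1):
        # Compare the window starting at i with the motif.
        if seq[i:i+len(motif)] == motif:
            positions.append(i)
    return positions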

seq = "AATATAATATA"
print(find_motif(seq, "TATA"))  # [2, 7]
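
The loop above checks every start position by slicing. As an alternative sketch, the same positions can be collected with Python's built-in str.find, restarting the search one character past each hit so that overlapping matches are still reported. The name find_motif_with_str_find is made up for this illustration, and returning an empty list for an empty motif is an assumption, not something the lesson specifies.

```python
def find_motif_with_str_find(seq, motif):
    """Return 0-based start positions of motif in seq, overlaps included."""
    positions = []
    if not motif:
        # Assumption: an empty motif has no meaningful occurrences.
        return positions
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character later (not len(motif) later) to keep overlaps.
        start = seq.find(motif, start + 1)
    return positions


print(find_motif_with_str_find("AATATAATATA", "TATA"))  # [2, 7]
print(find_motif_with_str_find("ATATATA", "ATA"))       # [0, 2, 4]
```

Both versions should give the same answers. This one simply hands the inner comparison to the built-in string search.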

AlphaGenome predicts transcription factor binding across the whole genome — thousands of different TFs, each recognizing its own sequence motif. It can detect when a single-base mutation creates or destroys a binding site.
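
To make the idea of a mutation creating or destroying a binding site concrete, here is a toy sketch using exact string matching. It is not how AlphaGenome works. It only compares motif positions before and after a single-base substitution. The helper names apply_substitution and site_changes, and the short example sequences, are invented for this illustration.

```python
def find_motif(seq, motif):
    """Return 0-based start positions of every exact match of motif in seq."""
    positions = []
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            positions.append(i)
    return positions


def apply_substitution(seq, pos, new_base):
    """Return a copy of seq with the base at 0-based index pos replaced."""
    return seq[:pos] + new_base + seq[pos + 1:]


def site_changes(seq, pos, new_base, motif):
    """Return (lost, gained): motif positions removed or added by the change."""
    before = set(find_motif(seq, motif))
    after = set(find_motif(apply_substitution(seq, pos, new_base), motif))
    # A substitution keeps the length, so positions are directly comparable.
    return sorted(before - after), sorted(after - before)


print(site_changes("CCTATACC", 3, "G", "TATA"))  # ([2], [])  site destroyed
print(site_changes("CCTAGACC", 4, "T", "TATA"))  # ([], [2])  site created
```

Real transcription factors tolerate some variation in their motifs, so exact matching like this is only a first approximation.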

Your Task

Implement find_motif(seq, motif), which returns a list of all 0-based start positions where the motif occurs (overlapping matches included), and motif_count(seq, motif), which returns the number of occurrences.
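
One possible shape for motif_count, as a minimal sketch: reuse find_motif and take the length of its result. This assumes overlapping occurrences should each be counted, which matches how find_motif reports positions. The lesson does not state this explicitly.

```python
def find_motif(seq, motif):
    """Return 0-based start positions of every exact match of motif in seq."""
    positions = []
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            positions.append(i)
    return positions


def motif_count(seq, motif):
    """Count occurrences of motif in seq, overlapping matches included."""
    return len(find_motif(seq, motif))


print(motif_count("AATATAATATA", "TATA"))  # 2
print(motif_count("ATATATA", "ATA"))       # 3
print("ATATATA".count("ATA"))              # 2 (str.count skips overlaps)
```

The last line shows why the built-in str.count is not a drop-in replacement here: it counts only non-overlapping matches.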
