Motif Finding
Transcription Factor Binding
Most of the genome does not code for proteins. Part of this non-coding DNA is regulatory: it contains short sequences called motifs that serve as landing pads for transcription factors (TFs), proteins that bind DNA and switch genes on or off.
A classic example is the TATA box (simplified here to the motif TATA), found about 30 bases upstream of the transcription start site of many genes. It helps recruit the transcription machinery.
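To make the "about 30 bases before the start site" idea concrete, here is a minimal sketch that looks for a motif in a window upstream of a start site. The function name `upstream_motif_offsets`, the 25 to 35 base window, and the toy sequence are all assumptions made for illustration; they are not part of the lesson's required code or of any real genome.

```python
def upstream_motif_offsets(seq, tss, motif="TATA", near=25, far=35):
    """Return motif start offsets relative to tss (negative means upstream).

    Only start positions between tss - far and tss - near are checked.
    """
    first = max(0, tss - far)
    last = tss - near  # if this is negative, the range below is empty
    offsets = []
    for i in range(first, last + 1):
        if seq[i:i + len(motif)] == motif:
            offsets.append(i - tss)
    return offsets


# Toy sequence (made up): a TATA starting 30 bases before index 50.
toy_seq = "G" * 20 + "TATA" + "C" * 26 + "ACGTAC"
toy_tss = 50  # hypothetical transcription start site index

print(upstream_motif_offsets(toy_seq, toy_tss))  # [-30]
```

Reporting offsets relative to the start site, rather than absolute positions, makes it easy to compare many genes at once.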
Finding where a motif occurs in a genome is a fundamental bioinformatics operation:
def find_motif(seq, motif):
    positions = []
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i+len(motif)] == motif:
            positions.append(i)
    return positions

seq = "AATATAATATA"
print(find_motif(seq, "TATA"))  # [2, 7]
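One detail worth noticing: because the loop advances one base at a time, find_motif also reports occurrences that overlap each other. The sketch below repeats the function so it runs on its own and contrasts it with Python's built-in `str.count`, which counts only non-overlapping matches. The example sequence is made up for illustration.

```python
def find_motif(seq, motif):
    """Return every 0-based start index where motif occurs in seq."""
    positions = []
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            positions.append(i)
    return positions


seq = "ATATATA"

# Sliding one base at a time finds overlapping occurrences.
print(find_motif(seq, "ATA"))  # [0, 2, 4]

# str.count skips past each match, so overlapping ones are missed.
print(seq.count("ATA"))  # 2
```

For motif work the overlapping behaviour is usually what you want, since a protein could in principle bind at any of those start positions.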
AlphaGenome predicts transcription factor binding across the whole genome — thousands of different TFs, each recognizing its own sequence motif. It can detect when a single-base mutation creates or destroys a binding site.
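The effect of a single-base mutation on a binding site can be illustrated with the same exact-match search. This is only a toy sketch: exact string matching is not how AlphaGenome makes its predictions, and the helper `mutate` and the short sequences are hypothetical, introduced here for the example.

```python
def find_motif(seq, motif):
    """Return every 0-based start index where motif occurs in seq."""
    positions = []
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            positions.append(i)
    return positions


def mutate(seq, pos, base):
    """Hypothetical helper: return seq with the base at pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# A mutation that destroys a site: TATA becomes TCTA.
ref = "GGTATAGG"
print(find_motif(ref, "TATA"))                  # [2]
print(find_motif(mutate(ref, 3, "C"), "TATA"))  # []

# A mutation that creates a site: TACA becomes TATA.
ref2 = "GGTACAGG"
print(find_motif(ref2, "TATA"))                  # []
print(find_motif(mutate(ref2, 4, "T"), "TATA"))  # [2]
```

Comparing the match list before and after the change is the simplest way to see whether a variant gains or loses a site under this exact-match model.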
Your Task
Implement find_motif(seq, motif), which returns all 0-based start positions where the motif occurs (overlapping matches included), and motif_count(seq, motif), which returns the number of occurrences.