Lesson 5 of 15
Open Reading Frames
Finding Genes
An Open Reading Frame (ORF) is a stretch of DNA that could encode a protein. It starts at an ATG start codon and ends at the first in-frame stop codon (TAA, TAG, or TGA).
...AAATGCGATAACC...
     ↑     ↑
     start stop
ATG → CGA → TAA
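The phrase "in-frame" matters: a stop codon only counts if it lines up with the triplets read from the ATG. The sketch below illustrates this with a made-up sequence in which the letters TAG appear, but straddle two codons. The helper name codons_from and the example sequence are assumptions chosen for illustration, not part of the lesson's code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons_from(seq, start):
    """Split seq into complete triplets, beginning at index start (hypothetical helper)."""
    return [seq[j:j + 3] for j in range(start, len(seq) - 2, 3)]

seq = "ATGCTAGGCTGA"           # illustrative sequence, ATG at index 0
codons = codons_from(seq, 0)
print(codons)                  # ['ATG', 'CTA', 'GGC', 'TGA']

# The letters TAG occur at index 4, but they span the codons CTA and GGC,
# so they are out of frame and do not end the ORF.
print("TAG" in seq)            # True
print("TAG" in codons)         # False

# The first in-frame stop codon is the one that ends the ORF.
first_stop = next(c for c in codons if c in STOP_CODONS)
print(first_stop)              # TGA
```

Here the ORF is the whole sequence, because TGA is the first stop codon that falls on a triplet boundary.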
To find an ORF:
- Scan the sequence for ATG
- From that ATG, read codons in triplets
- Return the sequence from ATG up to and including the first in-frame stop codon; if that ATG has no in-frame stop codon, keep scanning for the next ATG
def find_orf(seq):
    # Try every position as a potential start codon
    for i in range(len(seq) - 2):
        if seq[i:i+3] == "ATG":
            # Read codons in frame with this ATG
            for j in range(i, len(seq) - 2, 3):
                codon = seq[j:j+3]
                if codon in ("TAA", "TAG", "TGA"):
                    return seq[i:j+3]
    # No ATG with an in-frame stop codon was found
    return ""

print(find_orf("AAATGCGATAA"))   # ATGCGATAA
print(find_orf("ATGTTTGCCTAG"))  # ATGTTTGCCTAG
print(find_orf("AAACCC"))        # (empty string: no ORF)
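A function that returns only the first ORF stops at the first ATG that has an in-frame stop codon. As a possible extension, the sketch below collects every such ATG along with its position. The name find_all_orfs and the (start_index, orf) return shape are assumptions made for this example, not something the lesson requires.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_all_orfs(seq):
    """Return (start_index, orf) for every ATG that has an in-frame stop codon."""
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Read codons in frame with this ATG, starting just after it
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                orfs.append((i, seq[i:j + 3]))
                break  # only the first in-frame stop ends this ORF
    return orfs

# The ATG at index 0 has no in-frame stop codon, so it is skipped;
# the ATG at index 5 is followed directly by TAA.
print(find_all_orfs("ATGCCATGTAAC"))  # [(5, 'ATGTAA')]
print(find_all_orfs("AAATGCGATAA"))   # [(2, 'ATGCGATAA')]
print(find_all_orfs("AAACCC"))        # []
```

When the list is non-empty, its first entry holds the same sequence that a first-ORF search would return.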
AlphaGenome predicts where genes start and end across hundreds of different cell types. Conceptually, it runs this kind of search at genomic scale, but with learned, cell-type-specific rules rather than a single fixed start/stop codon rule.
Your Task
Implement find_orf(seq) that returns the first ORF found, or an empty string if none exists.