Lesson 8 of 15
Splice Sites
Cutting and Pasting RNA
Most genes in complex organisms are split — the protein-coding regions (exons) are interrupted by non-coding sequences (introns). After transcription, introns are removed from the pre-mRNA in a process called splicing.
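As a rough illustration of what "removing introns" means at the string level, here is a minimal sketch. The `splice` helper, the toy sequence, and the half-open `(start, end)` coordinates are assumptions made for this example, not part of the lesson's required code.

```python
def splice(pre_mrna, introns):
    """Remove each (start, end) interval (end exclusive) and join the rest.

    Hypothetical helper for illustration; assumes the intervals do not overlap.
    """
    mature = []
    pos = 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[pos:start])  # keep the exon before this intron
        pos = end                           # skip over the intron
    mature.append(pre_mrna[pos:])           # keep the final exon
    return "".join(mature)

# Toy pre-mRNA: exon + intron (GU...AG) + exon
pre_mrna = "AUGGCC" + "GUAAGUCCAG" + "GAAUAA"
mature = splice(pre_mrna, [(6, 16)])
print(mature)  # AUGGCCGAAUAA
```

The two exons end up directly adjacent in the mature transcript, which is why a misplaced intron boundary can shift or disrupt the coding sequence.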
The cell uses short sequence signals to mark intron boundaries:
- Donor site (intron start): almost always begins with GT
- Acceptor site (intron end): almost always ends with AG
This is called the GT–AG rule. In the pre-mRNA itself the same signals read GU and AG, because RNA uses U where DNA has T.
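def find_splice_sites(dna):
    # Start index of every "GT" (candidate donor) and "AG" (candidate acceptor).
    donors = [i for i in range(len(dna) - 1) if dna[i:i+2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i+2] == "AG"]
    return donors, acceptors

def is_canonical_intron(seq):
    # True when the sequence starts with GT and ends with AG (the GT–AG rule).
    return seq[:2] == "GT" and seq[-2:] == "AG"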
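The two functions above can be combined to list every stretch of DNA that would qualify as a canonical intron. The sketch below is one possible way to do that; `candidate_introns` and its `min_len` parameter are hypothetical names introduced here, and the toy sequence is made up.

```python
def find_splice_sites(dna):
    donors = [i for i in range(len(dna) - 1) if dna[i:i+2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i+2] == "AG"]
    return donors, acceptors

def is_canonical_intron(seq):
    return seq[:2] == "GT" and seq[-2:] == "AG"

def candidate_introns(dna, min_len=4):
    """Pair each GT with each downstream AG (hypothetical helper).

    Returns (start, end, sequence) tuples, with end exclusive.
    """
    donors, acceptors = find_splice_sites(dna)
    introns = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # acceptor index marks the A, so include the G too
            if end - d >= min_len:
                introns.append((d, end, dna[d:end]))
    return introns

dna = "ATGGTAAGTCCAGGA"
introns = candidate_introns(dna)
for start, end, seq in introns:
    print(start, end, seq, is_canonical_intron(seq))
```

Even this 15-base toy sequence yields three candidates, which hints at why matching GT and AG alone cannot tell you which sites the cell really uses.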
Splicing errors are a major cause of genetic disease. AlphaGenome directly predicts splice site usage — the probability that each GT or AG in the genome is actually used for splicing — a capability with direct clinical applications.
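To see why a usage probability is needed at all, it helps to count how often GT turns up by chance. The sketch below assumes a uniformly random sequence, which real genomes are not, so treat the numbers as a rough intuition only. It does not use or imitate any AlphaGenome interface.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 16001
dna = "".join(random.choices("ACGT", k=n))

# Count every GT dinucleotide, the same test find_splice_sites applies.
gt_count = sum(1 for i in range(n - 1) if dna[i:i+2] == "GT")

# With four equally likely bases, any given dinucleotide is expected
# at 1 in 16 positions.
expected = (n - 1) / 16
print(gt_count, expected)
```

Under this assumption roughly one position in sixteen carries a GT, so a sequence of realistic gene length contains far more GT and AG dinucleotides than true splice sites.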
Your Task
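Implement find_splice_sites(dna), which returns two lists of start indices (GT positions, then AG positions), and is_canonical_intron(seq), which returns True when seq starts with GT and ends with AG.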