Lesson 8 of 15

Splice Sites

Cutting and Pasting RNA

Most genes in complex organisms such as humans are split: the sequences kept in the mature mRNA (exons), which carry the protein-coding sequence, are interrupted by non-coding sequences (introns). After transcription, the introns are removed from the pre-mRNA and the exons are joined together, a process called splicing.
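As a minimal sketch of what splicing does to a transcript, the function below assumes the intron coordinates are already known. In the cell they are found from the sequence signals described below. The name splice and the (start, end) coordinate convention (0-based, end exclusive) are hypothetical choices for illustration, and the sketch uses the DNA alphabet (T rather than U) to match the rest of the lesson.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns from a transcript and join the remaining exons.

    introns: hypothetical (start, end) pairs, 0-based, end exclusive,
    assumed not to overlap.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # keep the exon before this intron
        pos = end                          # jump past the intron
    exons.append(pre_mrna[pos:])           # keep the final exon
    return "".join(exons)


# Toy example: exons "AAA" and "CCC" around one intron "GTTTAG"
pre = "AAAGTTTAGCCC"
print(splice(pre, [(3, 9)]))  # AAACCC
```

Sorting the intron list first means the caller can pass coordinates in any order.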

The cell uses short sequence signals to mark intron boundaries:

  • Donor site (intron start): almost always begins with GT
  • Acceptor site (intron end): almost always ends with AG

This is called the GT–AG rule.

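def find_splice_sites(dna):
    # 0-based start index of every "GT" dinucleotide (candidate donor sites)
    donors    = [i for i in range(len(dna)-1) if dna[i:i+2] == "GT"]
    # 0-based start index of every "AG" dinucleotide (candidate acceptor sites)
    acceptors = [i for i in range(len(dna)-1) if dna[i:i+2] == "AG"]
    return donors, acceptors

def is_canonical_intron(seq):
    # True if the sequence obeys the GT-AG rule: starts with GT and ends with AG
    return seq[:2] == "GT" and seq[-2:] == "AG"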
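The two functions can be combined: pairing every donor with every downstream acceptor lists all spans that satisfy the GT-AG rule. This is a minimal sketch; candidate_introns and its min_len cutoff are hypothetical names for illustration, and the donor and acceptor scans are inlined so the block runs on its own.

```python
def candidate_introns(dna: str, min_len: int = 4) -> list[tuple[int, int]]:
    """List (start, end) spans, end exclusive, that obey the GT-AG rule.

    Each span starts at a GT (donor) and ends just after a downstream AG
    (acceptor). min_len is a hypothetical cutoff; 4 is the shortest
    string that can start with GT and end with AG without overlap.
    """
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    spans = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # include both bases of the acceptor AG
            if end - d >= min_len:
                spans.append((d, end))
    return spans


dna = "AAGTCCAGTTAGCC"
for start, end in candidate_introns(dna):
    print(start, end, dna[start:end])
# prints:
# 2 8 GTCCAG
# 2 12 GTCCAGTTAG
# 7 12 GTTAG
```

Even this 14-base toy sequence yields three candidates, and the count grows roughly as (number of donors) × (number of acceptors), which is why the GT-AG rule on its own cannot tell which introns are real.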

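Splicing errors are a major cause of genetic disease. The GT–AG rule is necessary but not sufficient: most GT and AG dinucleotides in the genome are never used as splice sites. AlphaGenome predicts splice site usage directly from DNA sequence, estimating how strongly each candidate site is actually used for splicing. This has direct clinical value, because it can help identify genetic variants that create or destroy splice sites.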
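Why is predicting usage harder than finding GT and AG? If all four bases were equally likely, any given dinucleotide would appear at about 1 in 16 positions, so a 100,000-base stretch would contain thousands of GT and AG pairs, vastly outnumbering real splice sites. The quick sketch below checks this on random sequence; the uniform-base model is a simplifying assumption, not a property of real genomes, and count_gt_ag is a hypothetical helper name.

```python
import random


def count_gt_ag(dna: str) -> tuple[int, int]:
    """Count GT and AG dinucleotides (candidate donor and acceptor sites)."""
    gt = sum(1 for i in range(len(dna) - 1) if dna[i:i + 2] == "GT")
    ag = sum(1 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG")
    return gt, ag


# Assumption for illustration: every base is drawn uniformly at random.
rng = random.Random(0)  # fixed seed so reruns give the same sequence
length = 100_000
dna = "".join(rng.choice("ACGT") for _ in range(length))

gt, ag = count_gt_ag(dna)
expected = (length - 1) / 16  # probability 1/16 per dinucleotide position
print(f"GT: {gt}, AG: {ag}, expected about {expected:.0f} each")
```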

Your Task

Implement find_splice_sites(dna) and is_canonical_intron(seq).
