Lesson 4 of 15

Codons

The Genetic Code

DNA is read in groups of three bases called codons. Each codon specifies one amino acid (or a start/stop signal). This is the genetic code.

There are 4³ = 64 possible codons but only 20 amino acids, so the code is redundant (biologists also say "degenerate"): most amino acids are encoded by more than one codon.
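
The count of 64 can be checked directly. The sketch below is only an illustration and is not part of the lesson's required code: the names `BASES`, `all_codons`, and `leucine_codons` are made up for this example. It enumerates every ordered triple of the four bases and then looks at leucine, an amino acid encoded by six different codons.

```python
from itertools import product

BASES = "ACGT"  # illustrative name, not part of the lesson's API

# Every ordered triple of bases: 4 * 4 * 4 = 64 codons.
all_codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(all_codons))   # 64
print(all_codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']

# Redundancy in action: leucine is encoded by six different codons.
leucine_codons = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]
print(all(c in all_codons for c in leucine_codons))  # True
```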

Two special kinds of codon:

  • ATG: the start codon. It marks where translation into protein begins and codes for methionine (M).
  • TAA, TAG, TGA: the stop codons. They signal the end of the protein.

def split_into_codons(seq):
    # Stopping at len(seq) - 2 drops a trailing fragment of 1 or 2 bases.
    return [seq[i:i+3] for i in range(0, len(seq) - 2, 3)]

def is_start_codon(codon):
    return codon == "ATG"

def is_stop_codon(codon):
    return codon in ("TAA", "TAG", "TGA")

seq = "ATGCGATAA"
codons = split_into_codons(seq)
print(codons)              # ['ATG', 'CGA', 'TAA']
print(is_start_codon(codons[0]))  # True
print(is_stop_codon(codons[-1]))  # True
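
The start and stop checks can be combined to pull out a simple open reading frame. The following is a minimal sketch under stated assumptions, not the lesson's reference solution: `first_open_reading_frame` and `STOP_CODONS` are hypothetical names, the input is assumed to be uppercase DNA, and the first ATG found is treated as the start, which real genes do not guarantee.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

def first_open_reading_frame(seq):
    """Hypothetical helper: codons from the first ATG through the first in-frame stop."""
    start = seq.find("ATG")
    if start == -1:
        return []            # no start codon at all
    codons = []
    # Read in steps of three from the start codon; skip any trailing fragment.
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i+3]
        codons.append(codon)
        if codon in STOP_CODONS:
            return codons    # complete frame: start ... stop
    return []                # ran off the end without a stop codon

print(first_open_reading_frame("CCATGCGATAAGG"))  # ['ATG', 'CGA', 'TAA']
print(first_open_reading_frame("ATGCGA"))         # []
```

Returning an empty list when no stop codon is found is one design choice among several; a caller that wants partial frames could return the codons collected so far instead.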

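AlphaGenome predicts transcription start sites, the positions in the genome where a gene begins to be read into RNA. This is a different signal from the start codon, which marks where translation into protein begins. Understanding codons is a first step toward understanding those predictions.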

Your Task

Implement split_into_codons(seq), is_start_codon(codon), and is_stop_codon(codon).
