Lesson 3 of 15

GC Content

Measuring Sequence Composition

GC content is the percentage of bases in a DNA sequence that are either guanine (G) or cytosine (C). It is a basic but informative measure of genomic composition.

GC% = (count(G) + count(C)) / total_length × 100
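As a worked example of the formula, the sketch below tallies each base with Python's standard `collections.Counter` and substitutes the counts term by term. The sequence `GGCATGCC` is an arbitrary illustration, not data from the lesson.

```python
from collections import Counter

seq = "GGCATGCC"          # arbitrary example sequence, 8 bases
counts = Counter(seq)     # tallies each base: G=3, C=3, A=1, T=1

gc = counts["G"] + counts["C"]        # count(G) + count(C) = 3 + 3 = 6
gc_percent = gc / len(seq) * 100      # 6 / 8 * 100

print(counts["G"], counts["C"], len(seq))  # 3 3 8
print(gc_percent)                          # 75.0
```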

GC content matters for several reasons:

  • Stability: G–C base pairs have three hydrogen bonds (vs. two for A–T), making GC-rich regions more thermally stable.
  • CpG islands: Stretches of DNA with unusually high GC content and many CpG dinucleotides (a C followed directly by a G on the same strand). They are often found at gene promoters, the regions that help switch genes on.
  • Genome variation: The human genome averages ~41% GC, but individual regions vary widely.
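CpG islands are usually identified with simple composition thresholds. One classic set of criteria (Gardiner-Garden and Frommer, 1987) calls a stretch of at least 200 bp a CpG island when its GC content exceeds 50% and its observed/expected CpG ratio exceeds 0.6, where observed/expected = count(CG) × length / (count(C) × count(G)). The sketch below applies those thresholds to a single sequence. The function names are hypothetical, and real annotation pipelines scan long sequences with sliding windows and sometimes use stricter cutoffs.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # no C or no G means no CpG is possible; also avoids dividing by zero
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=50.0, min_obs_exp=0.6):
    """Hypothetical helper: apply Gardiner-Garden and Frommer style thresholds to one sequence."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc_percent = (seq.count("G") + seq.count("C")) / len(seq) * 100
    return gc_percent > min_gc and cpg_obs_exp(seq) > min_obs_exp


print(looks_like_cpg_island("CG" * 100))             # True: 100% GC, obs/exp = 2.0
print(looks_like_cpg_island("G" * 100 + "C" * 100))  # False: GC-rich but contains no CpG
```

The observed/expected ratio compares the actual number of CpGs with the number expected if C and G were placed independently (count(C) × count(G) / length), which is why a GC-rich sequence with no CpG dinucleotides still fails the test.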
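
In Python, the GC% formula translates directly:

def gc_content(seq):
    """Return the GC percentage of a DNA sequence, rounded to 2 decimal places."""
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    seq = seq.upper()  # also count lowercase (soft-masked) bases
    gc = seq.count("G") + seq.count("C")
    return round(gc / len(seq) * 100, 2)

print(gc_content("ATCG"))    # 50.0  (2 out of 4)
print(gc_content("GCGCGC"))  # 100.0
print(gc_content("ATATAT"))  # 0.0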
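One design choice is worth noting: dividing by the full length means ambiguous bases such as N (common in assembly gaps) pull the percentage down, so "GCNNNNAT" scores 25.0. Some analyses instead divide only by the number of unambiguous A/C/G/T bases. The sketch below shows that alternative convention; `gc_content_acgt` is a hypothetical name, and it is not what the task below asks for.

```python
def gc_content_acgt(seq):
    """GC% over unambiguous bases only (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()  # treat soft-masked lowercase bases like uppercase ones
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return 0.0  # no unambiguous bases to measure
    return round(gc / acgt * 100, 2)


print(gc_content_acgt("GCNNNNAT"))  # 50.0 (2 of 4 unambiguous bases)
print(gc_content_acgt("gcgcNN"))    # 100.0
```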

AlphaGenome's architecture uses convolutional layers that scan the sequence for short local patterns; GC-rich regions and CpG islands are among the thousands of patterns such layers can learn to detect.
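A convolution can compute local GC content directly. The sketch below is a hand-built illustration, not AlphaGenome's code or learned weights: after one-hot encoding the sequence (one channel per base, in an assumed A, C, G, T order), a single filter of width k with weight 1/k on the C and G channels and 0 on A and T outputs the GC fraction of every length-k window. The function names and the window size of 4 are assumptions chosen for illustration; a trained network learns its filters from data instead.

```python
BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as a list of 4-element vectors, one per base."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(x, kernel):
    """'Valid' 1D convolution (no kernel flip, as in deep learning libraries)
    of a (length x channels) input with a (width x channels) kernel."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for offset in range(width):
            for ch in range(len(BASES)):
                total += x[start + offset][ch] * kernel[offset][ch]
        out.append(total)
    return out


def gc_filter(window):
    """Hand-set filter: weight 1/window on the C and G channels, 0 on A and T."""
    w = 1.0 / window
    return [[0.0, w, w, 0.0] for _ in range(window)]


seq = "ATATATGCGCGCATAT"
profile = conv1d(one_hot(seq), gc_filter(4))
print(profile)
# [0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```

Each output value is the GC fraction of one 4-base window (multiply by 100 to match gc_content), so the profile rises to 1.0 over the GCGC core and falls back to 0.0 in the AT-rich flanks.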

Your Task

Implement gc_content(seq) so that it returns the GC percentage of seq, rounded to 2 decimal places.
