Lesson 3 of 15
GC Content
Measuring Sequence Composition
GC content is the percentage of bases in a DNA sequence that are either guanine (G) or cytosine (C). It is a simple but informative measure of sequence composition.
GC% = (count(G) + count(C)) / total_length × 100
GC content matters for several reasons:
- Stability: G–C base pairs have three hydrogen bonds (vs. two for A–T), making GC-rich regions more thermally stable.
- CpG islands: Regions with unusually high GC content and many CpG dinucleotides (a C immediately followed by a G on the same strand). They often overlap gene promoters, the switches that turn genes on.
- Genome variation: The human genome averages ~41% GC, but individual regions vary widely.
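The last two points can be made concrete with a short sketch. The helper names `windowed_gc` and `cpg_obs_exp`, the window size, and the example sequences are illustrative assumptions and not part of this lesson's exercise. The observed/expected CpG ratio is one common way to quantify CpG enrichment: the number of CG dinucleotides divided by the number expected from the C and G counts alone.

```python
def windowed_gc(seq, window, step=1):
    """GC percentage of each window of length `window`, moving `step` bases at a time."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(round(gc / window * 100, 2))
    return values


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both C and G; report 0.0 here
    observed = seq.count("CG")
    return round(observed * len(seq) / (c * g), 2)


# Hypothetical sequence: an AT-rich stretch followed by a CpG-dense stretch.
seq = "ATATATATCGCGCGCG"
print(windowed_gc(seq, window=8, step=4))  # [0.0, 50.0, 100.0]
print(cpg_obs_exp("CGCGCGCG"))             # 2.0 (CG pairs enriched)
print(cpg_obs_exp("CCCCGGGG"))             # 0.5 (same GC%, fewer CG pairs)
```

The last two calls have identical GC content (100%) but different CpG ratios, which is why CpG density is usually measured separately from GC percentage.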
def gc_content(seq):
    if len(seq) == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return round(gc / len(seq) * 100, 2)

print(gc_content("ATCG"))    # 50.0 (2 out of 4)
print(gc_content("GCGCGC"))  # 100.0
print(gc_content("ATATAT"))  # 0.0
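Real sequence files often contain lowercase bases (commonly used for soft-masked repeats) and `N` for unknown bases. The function above counts only uppercase G and C, so lowercase input would score 0.0 and each `N` would dilute the percentage. Below is one possible variant, named `gc_content_tolerant` purely for illustration. Excluding `N` and other non-ACGT characters from the denominator is an assumption about the desired behavior, not a requirement of this lesson's task.

```python
def gc_content_tolerant(seq):
    """GC percentage over unambiguous bases only (hypothetical variant)."""
    seq = seq.upper()                       # accept lowercase input
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at                        # ignore N and any other symbols
    if called == 0:
        return 0.0                          # nothing countable; avoid division by zero
    return round(gc / called * 100, 2)

print(gc_content_tolerant("atcg"))      # 50.0
print(gc_content_tolerant("GCNNNNAT"))  # 50.0
print(gc_content_tolerant("NNNN"))      # 0.0
```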
AlphaGenome's architecture includes convolutional layers, which learn to respond to short local sequence patterns. GC-rich stretches and CpG islands are among the thousands of patterns such layers can pick up.
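To give a feel for how a convolutional filter can respond to GC-rich sequence, here is a toy sketch in plain Python. It is not AlphaGenome's code: the functions `one_hot` and `conv1d`, the channel order, and the hand-written filter weights are all assumptions made for illustration. In a trained network the filter weights are learned from data rather than written by hand.

```python
# One-hot channel order (an arbitrary choice for this sketch).
BASES = "ACGT"

def one_hot(seq):
    """Encode each base as a 4-element 0/1 vector in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def conv1d(encoded, kernel):
    """Slide `kernel` (width x 4 weights) along the sequence: no padding, stride 1."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        total = 0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        out.append(total)
    return out

# Width-4 filter with weight 1 on the C and G channels: output = G/C count per window.
gc_filter = [[0, 1, 1, 0]] * 4
# Width-2 filter that scores 2 only when a C is immediately followed by a G.
cpg_filter = [[0, 1, 0, 0], [0, 0, 1, 0]]

print(conv1d(one_hot("ATATGCGC"), gc_filter))  # [0, 1, 2, 3, 4]
print(conv1d(one_hot("ACGT"), cpg_filter))     # [0, 2, 0]
```

The first filter's output rises as the window moves into the GC-rich half of the sequence; the second peaks exactly at the CG dinucleotide.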
Your Task
Implement gc_content(seq) so that it returns the GC percentage of seq, rounded to 2 decimal places. Return 0.0 for an empty sequence.