Lesson 14 of 15

Regulatory Scoring

Predicting Regulatory Activity

Regulatory sequences control gene expression. We can build a scoring function that estimates a region's regulatory activity based on the sequence features we have learned:

  • TATA box (motif TATA): core promoter element, recruits the transcription machinery
  • GC box (represented here by the motif CACCC): Sp1 transcription factor binding site
  • CpG dinucleotides (CG): marks of active gene promoters
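
Each occurrence of a motif adds a fixed weight to the score: 10 per TATA box, 5 per GC box, and 2 per CpG dinucleotide.

def regulatory_score(seq):
    """Return the sum of weighted motif counts (str.count counts non-overlapping matches)."""
    score = 0
    score += seq.count("TATA")  * 10  # TATA box
    score += seq.count("CACCC") * 5   # GC box
    score += seq.count("CG")    * 2   # CpG dinucleotides
    return score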
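
To see how the weights combine, here is a small worked example. The sequences are made up for illustration and are not real promoters. Note that Python's str.count is case-sensitive and counts non-overlapping matches, so "TATATA" contains one TATA hit, not two.

```python
def regulatory_score(seq):
    score = 0
    score += seq.count("TATA")  * 10  # TATA box
    score += seq.count("CACCC") * 5   # GC box
    score += seq.count("CG")    * 2   # CpG dinucleotides
    return score

# Illustrative sequence (invented for this example):
# one TATA, one CACCC, two CG  ->  1*10 + 1*5 + 2*2 = 19
example = "GGTATAAACACCCGCGTT"
print(regulatory_score(example))   # 19

# Overlapping matches are not double-counted by str.count.
print(regulatory_score("TATATA"))  # 10

# Lowercase input scores 0 unless it is normalized first.
print(regulatory_score("tata"))          # 0
print(regulatory_score("tata".upper()))  # 10
```

If your input may contain lowercase bases, calling seq.upper() before scoring is one simple way to normalize it.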

This is a toy model, but the core idea of scoring a sequence from the features it contains is similar in spirit to what AlphaGenome does. The difference is that AlphaGenome learns its scoring weights from millions of experimental measurements across hundreds of cell types, capturing patterns far too subtle for humans to write by hand.
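
One way to picture the difference between hand-written and learned weights is to separate the features (motif counts) from the weights applied to them. The sketch below is only an illustration under that assumption: the names MOTIFS, featurize, and weighted_score are hypothetical, the example sequence is invented, and nothing here reflects how AlphaGenome is actually implemented.

```python
# Hypothetical helper names, introduced for illustration only.
MOTIFS = ("TATA", "CACCC", "CG")

def featurize(seq):
    """Return the motif counts for seq, in the order given by MOTIFS."""
    return [seq.count(motif) for motif in MOTIFS]

def weighted_score(seq, weights):
    """Dot product of motif counts and one weight per motif."""
    return sum(count * w for count, w in zip(featurize(seq), weights))

# The hand-picked weights from this lesson.
HAND_WEIGHTS = (10, 5, 2)

seq = "GGTATAAACACCCGCGTT"  # invented example sequence
print(featurize(seq))                      # [1, 1, 2]
print(weighted_score(seq, HAND_WEIGHTS))   # 19
```

With the weights passed in as an argument, swapping the hand-picked values for ones fitted to experimental data would change only the weights tuple, not the scoring code.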

Your Task

Implement regulatory_score(seq) using the scoring rules above.
