Lesson 14 of 15
Regulatory Scoring
Predicting Regulatory Activity
Regulatory sequences control gene expression. We can build a scoring function that estimates a region's regulatory activity based on the sequence features we have learned:
- TATA box (motif TATA): core promoter element that helps recruit the transcription machinery
- GC box (motif CACCC): Sp1 transcription factor binding site
- CpG dinucleotides (CG): enriched in CpG islands, which often overlap active gene promoters
def regulatory_score(seq):
    score = 0
    score += seq.count("TATA") * 10   # TATA box
    score += seq.count("CACCC") * 5   # GC box
    score += seq.count("CG") * 2      # CpG dinucleotides
    return score
This is a toy model, but the underlying idea, turning sequence features into an activity score, is the same one AlphaGenome builds on. The difference is that AlphaGenome learns its scoring weights from millions of experimental measurements across hundreds of cell types, capturing patterns far too subtle for humans to write by hand.
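To make the arithmetic concrete, here is a small worked example of the scoring rules. The two sequences are made up for illustration and are not real promoters; the function body simply repeats the rules listed above so the snippet runs on its own.

```python
def regulatory_score(seq):
    """Toy regulatory score: weighted counts of three sequence motifs."""
    score = 0
    score += seq.count("TATA") * 10   # TATA box
    score += seq.count("CACCC") * 5   # GC box
    score += seq.count("CG") * 2      # CpG dinucleotides
    return score


# Hypothetical example sequence (invented for this illustration).
example = "TATAAAGGCGCACCCGCG"

# One TATA (10) + one CACCC (5) + three CG (3 * 2 = 6) = 21
example_score = regulatory_score(example)
print(example_score)

# str.count counts non-overlapping matches, so "TATATA" contains
# only one counted "TATA" and scores 10, not 20.
overlap_score = regulatory_score("TATATA")
print(overlap_score)
```

Note that `str.count` only counts non-overlapping occurrences, so overlapping motifs such as the two `TATA` copies inside `TATATA` are counted once. For this toy model that is a reasonable simplification.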
Your Task
Implement regulatory_score(seq) using the scoring rules above.