Tumour Mutational Burden
Track: D - Clinical | Difficulty: Intermediate | Time estimate: 2-4 hours
What You'll Build
A ClawBio skill that takes a somatic VCF file, calculates tumour mutational burden (TMB, in mutations per megabase), estimates microsatellite instability (MSI) status, and reports on immunotherapy relevance. The output is a structured clinical summary of these key metrics.
Why This Matters
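TMB and MSI are biomarkers used to select patients for immune checkpoint inhibitor therapy. The FDA has granted tissue-agnostic approvals of pembrolizumab for MSI-high or mismatch-repair-deficient solid tumours and for TMB-high solid tumours (>= 10 mut/Mb, as determined by an FDA-approved test). Automating the calculation from a standard VCF lets any sequencing lab derive these biomarkers from the variant calls it already produces.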
Inputs and Outputs
- Input: a somatic VCF file (tumour-normal or tumour-only variant calls)
- Output: a Markdown report containing:
  - TMB score (mutations/Mb)
  - MSI estimate
  - mutation type breakdown (SNV/indel)
  - immunotherapy relevance statement
  - mutation spectrum bar chart (PNG)
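One possible layout for the Markdown report is sketched below. The table structure, the wording of the relevance statement and the `render_report` name are hypothetical choices, not a ClawBio template; the TMB tier and MSI call are assumed to be computed elsewhere, and the PNG chart is assumed to be written separately (for example with matplotlib) and is only linked from the report.

```python
def render_report(sample, tmb, tier, n_snv, n_indel, msi_fraction, msi_h,
                  chart_png="mutation_spectrum.png"):
    """Render a Markdown summary (hypothetical layout, not an official template)."""
    if tmb >= 10:
        relevance = ("TMB is at or above 10 mut/Mb, the threshold used in the FDA "
                     "TMB-high approval of pembrolizumab.")
    else:
        relevance = "TMB is below the 10 mut/Mb TMB-high threshold."
    if msi_h:
        relevance += (" The indel pattern suggests MSI-H, which is also a tissue-agnostic "
                      "pembrolizumab indication; this VCF-based estimate is not a validated MSI test.")
    msi_text = "possible MSI-H" if msi_h else "MSI-H not suggested"

    return "\n".join([
        f"# Tumour Mutational Burden Report: {sample}",
        "",
        "| Metric | Value |",
        "|---|---|",
        f"| TMB | {tmb:.1f} mut/Mb ({tier}) |",
        f"| SNVs | {n_snv} |",
        f"| Indels | {n_indel} |",
        f"| Indels in homopolymers | {msi_fraction:.0%} |",
        f"| MSI estimate | {msi_text} |",
        "",
        "## Immunotherapy relevance",
        "",
        relevance,
        "",
        f"![Mutation spectrum]({chart_png})",  # PNG produced by a separate plotting step
        "",
    ])


print(render_report("DEMO", 120.0, "high", 150, 30, 0.5, True))
```

Keeping the renderer a pure function of already-computed metrics makes it easy to test and to swap for a PDF generator later.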
Key APIs / Data Sources
- No external API required; all computation is local
- Reference: coding region size from Ensembl (approximately 35 Mb for the whole exome)
- Clinical threshold: the FDA TMB-high indication for pembrolizumab (>= 10 mut/Mb)
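TMB is only as reliable as its denominator: 35 Mb is a whole-exome approximation, and for a panel or custom capture the size should come from the BED file that was actually used. A minimal sketch, assuming a standard BED file (0-based, half-open intervals) and a hypothetical function name `target_size_mb`, that sums the target size while counting overlapping intervals only once:

```python
def target_size_mb(bed_lines):
    """Total target size in megabases, counting overlapping intervals only once.

    BED coordinates are 0-based and half-open, so (chrom, start, end) covers
    end - start bases. Header lines ("#", "track", "browser") are skipped.
    """
    by_chrom = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))

    total_bp = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:   # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:                  # gap: close the current block and start a new one
                total_bp += cur_end - cur_start
                cur_start, cur_end = start, end
        total_bp += cur_end - cur_start
    return total_bp / 1_000_000


bed = ["chr1\t0\t1000000", "chr1\t500000\t1200000", "chr2\t0\t300000"]
print(target_size_mb(bed))  # 1.5
```

Merging matters because capture BED files often contain overlapping probes; summing raw interval lengths would inflate the denominator and understate TMB.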
Getting Started
- Create your skill folder: `skills/tmb-calculator/`
- Parse the somatic VCF and filter variants by quality thresholds
- Count qualifying mutations and divide by the target region size (in Mb)
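- Classify SNVs into the six substitution classes (C>A, C>G, C>T, T>A, T>C, T>G) for the mutation spectrum, reverse-complementing variants with a purine (G or A) reference base so that, for example, G>T is counted as C>A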
- Estimate MSI by counting indels in homopolymer runs (5 or more identical reference bases)
Domain Decisions for SKILL.md
- Variant allele frequency (VAF) threshold: minimum 0.05 (5%) to count as somatic
- Minimum read depth: 30x at the variant position
- Count only coding region variants (filter by BED file or VCF annotation)
- TMB thresholds: low (< 6 mut/Mb), intermediate (6 to < 10 mut/Mb), high (>= 10 mut/Mb)
- MSI estimation: if more than 20% of qualifying indels fall in homopolymer runs, flag the sample as suspected MSI-H
- Include both SNVs and indels in the TMB count (per TMB Harmonization guidelines)
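The tier and MSI rules can be written as small pure functions. A minimal sketch, assuming left-aligned, normalised indels (as produced by, for example, `bcftools norm`) whose REF and ALT share a first anchor base, and a reference chromosome available as a plain 0-based string; all function names are hypothetical. Because left-alignment moves an indel to the 5' edge of a repeat, the homopolymer run can be measured by scanning the reference to the right of the anchor base.

```python
def tmb_tier(tmb):
    """Map TMB (mut/Mb) to this skill's tiers: low < 6, intermediate 6 to < 10, high >= 10."""
    if tmb >= 10:
        return "high"
    if tmb >= 6:
        return "intermediate"
    return "low"


def homopolymer_length(ref_seq, pos, ref, alt):
    """Length of the reference homopolymer run immediately 3' of an indel's anchor base.

    Assumes a left-aligned VCF indel: `pos` is 1-based, REF and ALT share their
    first (anchor) base, and `ref_seq` is the chromosome as a 0-based string.
    Returns 0 for non-indels or when the inserted/deleted bases are not one repeated base.
    """
    if len(ref) == len(alt) or ref[0] != alt[0]:
        return 0
    changed = alt[1:] if len(alt) > len(ref) else ref[1:]
    if len(set(changed)) != 1:
        return 0
    base = changed[0]
    run, i = 0, pos  # 0-based index pos is the first base after the anchor at pos - 1
    while i < len(ref_seq) and ref_seq[i] == base:
        run += 1
        i += 1
    return run


def msi_estimate(run_lengths, min_run=5, cutoff=0.20):
    """Return (fraction of indels in runs of >= min_run bases, whether MSI-H is suspected)."""
    if not run_lengths:
        return 0.0, False
    fraction = sum(r >= min_run for r in run_lengths) / len(run_lengths)
    return fraction, fraction > cutoff


ref_seq = "GATTTTTTCGA"                               # toy reference with a T6 run
runs = [homopolymer_length(ref_seq, 2, "AT", "A"),    # T deletion inside the T6 run
        homopolymer_length(ref_seq, 9, "C", "CG")]    # G insertion next to a single G
print(runs, msi_estimate(runs), tmb_tier(120.0))      # [6, 1] (0.5, True) high
```

This is a heuristic on indel context only; it does not replace a dedicated MSI caller that compares microsatellite length distributions between tumour and normal reads.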
Demo Data
Create a synthetic somatic VCF with 200 variants distributed across chromosomes 1-22: 150 coding SNVs (a mix of C>T transitions and other substitutions), 30 coding indels, and 20 variants that should be filtered out (low VAF or low depth). With a 1.5 Mb target region, the 180 qualifying variants give a TMB of 180 / 1.5 = 120 mut/Mb (TMB-high).
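A deterministic generator for this demo might look like the following sketch. Chromosomes, positions, VAFs and depths are drawn from a seeded random generator, REF bases are not taken from a real genome, and the FORMAT layout (`GT:AF:DP` with a single `TUMOR` column) and the name `make_demo_vcf` are assumptions chosen to suit a simple parser. The last lines re-apply the VAF and depth filters as a self-check.

```python
import random


def make_demo_vcf(seed=42):
    """Return synthetic somatic VCF text: 150 SNVs, 30 indels, 20 failing VAF/depth."""
    rng = random.Random(seed)
    bases = "ACGT"
    records = []

    def add(ref, alt, vaf, depth):
        chrom = rng.randint(1, 22)
        # Hypothetical ~68.2 kb window per chromosome (x 22 is roughly 1.5 Mb)
        pos = rng.randint(1_000_000, 1_068_181)
        records.append((chrom, pos, ref, alt, vaf, depth))

    def good_vaf():
        return round(rng.uniform(0.10, 0.60), 3)

    for i in range(150):   # SNVs: every other one is C>T, the rest random substitutions
        if i % 2 == 0:
            ref, alt = "C", "T"
        else:
            ref = rng.choice(bases)
            alt = rng.choice([b for b in bases if b != ref])
        add(ref, alt, good_vaf(), rng.randint(40, 200))
    for i in range(30):    # indels: alternate insertions and deletions of 1-3 bases
        anchor = rng.choice(bases)
        extra = rng.choice(bases) * rng.randint(1, 3)
        ref, alt = (anchor, anchor + extra) if i % 2 == 0 else (anchor + extra, anchor)
        add(ref, alt, good_vaf(), rng.randint(40, 200))
    for i in range(20):    # variants the filters must remove
        ref = rng.choice(bases)
        alt = rng.choice([b for b in bases if b != ref])
        if i % 2 == 0:
            add(ref, alt, 0.02, rng.randint(40, 200))   # VAF below 0.05
        else:
            add(ref, alt, good_vaf(), 12)               # depth below 30

    records.sort(key=lambda r: (r[0], r[1]))
    header = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        '##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fraction">',
        '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR",
    ]
    body = [f"chr{c}\t{p}\t.\t{r}\t{a}\t.\tPASS\t.\tGT:AF:DP\t0/1:{v}:{d}"
            for c, p, r, a, v, d in records]
    return "\n".join(header + body) + "\n"


TARGET_MB = 1.5
vcf = make_demo_vcf()
rows = [line.split("\t") for line in vcf.splitlines() if not line.startswith("#")]
kept = [r for r in rows
        if float(r[9].split(":")[1]) >= 0.05 and int(r[9].split(":")[2]) >= 30]
print(len(rows), len(kept), len(kept) / TARGET_MB)  # 200 180 120.0
```

Because REF bases are random, the homopolymer-based MSI estimate is not meaningful on this file unless a matching synthetic reference sequence is generated alongside it.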
Stretch Goals
- Add trinucleotide context analysis for mutational signature detection
- Compare observed spectrum against COSMIC SBS signatures
- Support panel-based TMB calculation with user-defined BED file
- Generate a PDF clinical report suitable for tumour board review
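For the first two stretch goals, each SNV can be labelled with its trinucleotide context in the 96-channel SBS scheme, and the resulting profile compared with a reference signature by cosine similarity. A minimal sketch, assuming the reference chromosome is available as a plain 0-based string and that COSMIC SBS signatures have already been loaded into `{channel: probability}` dicts (loading them is not shown); all function names are hypothetical.

```python
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# The 96 SBS channels in "5'[REF>ALT]3'" notation, e.g. "A[C>T]G"
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS for five in "ACGT" for three in "ACGT"]


def context_label(ref_seq, pos, ref, alt):
    """96-channel label for an SNV at 1-based `pos` on a 0-based reference string."""
    if pos < 2 or pos >= len(ref_seq):
        raise ValueError("SNV needs one flanking base on each side")
    tri = ref_seq[pos - 2:pos + 1]          # 5' base, mutated base, 3' base
    if tri[1] != ref:
        raise ValueError(f"REF {ref} does not match reference base {tri[1]}")
    if ref in ("G", "A"):                   # report on the pyrimidine strand
        tri = tri.translate(_COMPLEMENT)[::-1]
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{tri[0]}[{ref}>{alt}]{tri[2]}"


def spectrum_96(labels):
    """Count labels into a {channel: count} profile covering all 96 channels."""
    profile = dict.fromkeys(CHANNELS, 0)
    for label in labels:
        profile[label] += 1
    return profile


def cosine_similarity(a, b):
    """Cosine similarity between two {channel: value} profiles, matched by channel name."""
    dot = sum(value * b.get(channel, 0.0) for channel, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(context_label("TGA", 2, "G", "T"))  # T[C>A]A
```

Keying profiles by channel label rather than by list position avoids silent errors if a signature file orders its 96 rows differently.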