Multi-Database Aggregator¶
Track: A - AI Engineers Difficulty: Advanced Time estimate: 2-4 hours
What You'll Build¶
A ClawBio skill that takes an rsID or gene name and queries three major databases in parallel: ClinVar (clinical significance), gnomAD (population frequencies), and Open Targets (drug associations and disease links). It merges the results into a single unified report.
Why This Matters¶
Researchers and clinicians currently visit multiple websites and manually cross-reference data. A federated query skill reduces a 30-minute manual process to a single command, and catches connections that manual searching often misses.
Inputs and Outputs¶
Input: An rsID (e.g. "rs3798220") or gene name (e.g. "LPA") Output: Unified markdown report combining: ClinVar classification and review status, gnomAD population frequencies, Open Targets associated diseases and drugs
Key APIs / Data Sources¶
- ClinVar E-utilities - esearch + esummary for variant records
- gnomAD GraphQL API - population allele frequencies
- Open Targets Platform API - gene-disease and gene-drug associations
Getting Started¶
- Create your skill folder:
skills/multi-db-aggregator/ - Build three query functions, one per database, each returning a standardised dict
- Use
concurrent.futures.ThreadPoolExecutorto run all three queries in parallel - Merge results into a single structured report with clear section headers
- Handle partial failures gracefully: if one API is down, still return results from the others
Domain Decisions for SKILL.md¶
- For rsID input, query ClinVar and gnomAD directly; map to gene for Open Targets
- For gene input, fetch top 5 ClinVar variants by review stars, gnomAD constraint metrics, and Open Targets top 10 associations
- Always include data freshness: show the date of each database query
- Confidence tier: label each finding as Reviewed, Preliminary, or Computational based on source evidence level
Demo Data¶
Use rs3798220 (LPA gene, associated with heart disease risk and lipoprotein(a) levels). This variant has ClinVar entries, clear population frequency differences in gnomAD, and strong Open Targets disease associations, making it an ideal demonstration of cross-database value.
Stretch Goals¶
- Add a fourth source: PharmGKB for pharmacogenomic annotations
- Cache results locally to avoid repeated API calls during development
- Generate a confidence-weighted summary score for clinical actionability
- Support batch input: a list of rsIDs producing a combined report