PubMed Research Summariser¶
Track: A - AI Engineers Difficulty: Beginner Time estimate: 2-4 hours
What You'll Build¶
A ClawBio skill that takes a gene name or disease term, queries the PubMed E-utilities API, and returns a structured markdown summary of the top recent papers. The summary includes titles, authors, journals, publication dates, and key findings extracted from abstracts.
Why This Matters¶
Researchers spend hours manually searching PubMed and reading abstracts to stay current. An automated summariser turns a keyword into an actionable briefing in seconds, helping clinicians and researchers make faster evidence-based decisions.
Inputs and Outputs¶
Input: A gene name (e.g. "BRCA1") or disease term (e.g. "type 2 diabetes") Output: Structured markdown with paper titles, authors, journal, date, and a one-sentence abstract highlight for each of the top 10 results
Key APIs / Data Sources¶
- PubMed E-utilities - esearch.fcgi for IDs, efetch.fcgi for records
- Use retmode=xml and parse with Python's xml.etree.ElementTree
Getting Started¶
- Fork the ClawBio repo and create a new skill folder:
skills/pubmed-summariser/ - Use
esearch.fcgi?db=pubmed&term=BRCA1&retmax=10&sort=dateto get PMIDs - Use
efetch.fcgi?db=pubmed&id=PMID1,PMID2&retmode=xmlto fetch full records - Parse the XML to extract title, authors, journal, date, and abstract text
- Format results as a markdown table or structured list
Domain Decisions for SKILL.md¶
- Sort by publication date (most recent first) rather than relevance
- Limit to 10 results by default, configurable up to 50
- Extract the first sentence of each abstract as the "key finding"
- Filter to English-language articles only using
AND english[la]
Demo Data¶
Search for "BRCA1" or "type 2 diabetes" as your test queries. PubMed is freely accessible with no API key required (though an API key raises the rate limit from 3 to 10 requests per second). Register for a free key at https://www.ncbi.nlm.nih.gov/account/.
Stretch Goals¶
- Add MeSH term expansion to improve recall
- Include citation counts using the elink API
- Generate a word cloud from combined abstracts
- Support date range filtering (e.g. last 6 months only)