ClawBio Workshop

Agentic Genomics

Hands-on with AI for Variant Interpretation and GWAS

Dr Manuel Corpas · University of Westminster · 22 April 2026

Agentic engineering: the new frontier

Biology has become a data-saturated science. The tools to analyse it have not kept up.

A single human genome comprises roughly three billion base pairs. Clinical interpretation requires alignment, variant calling, functional annotation, literature cross-referencing, and synthesis into actionable reports. Each step depends on specialised software with its own dependencies, version histories, and configuration requirements.

Workflow managers (Nextflow, Snakemake, Galaxy) emerged because manual orchestration is unsustainable. Yet even with these frameworks, a biologist who wants to analyse their own sequencing data must either learn to code, hire someone who can, or rely on graphical interfaces that may not support the analysis they need.

The models are good enough. The harness is not.

Modern LLMs can write, debug, and execute code. They can plan multi-step operations, adapt based on intermediate results, and coordinate dozens of tools.

But a general-purpose LLM generating bioinformatics code from scratch is unreliable. Output varies between sessions. It lacks the specificity that domain experts build into workflows over years. It hallucinates gene-drug associations and invents variant classifications.

The problem is not the model. The problem is the harness: what constrains the model, what tools it can call, what guardrails prevent silent errors. Agentic engineering is the discipline of building that harness.

What is agentic AI?

The first wave of LLMs in the life sciences was information retrieval: summarising papers, answering questions about pathways, extracting structured data from text. Useful but incremental.

The second wave is qualitatively different. When connected to file systems, databases, and command-line tools, LLMs become autonomous agents that plan multi-step operations, execute them, and adapt based on intermediate results.

The shift: from AI that tells you things to AI that does things. The researcher's role moves from constructing the analysis to evaluating it.

What is an agent?

Agentic genomics is the use of autonomous AI agents, powered by large language models and operating within domain-constrained skill libraries, to discover, plan, execute, and iteratively refine multi-step genomic analyses, where the agent exercises runtime decision-making over tool selection, parameterisation, error handling, and output evaluation.

Corpas, Fatumo, Guio. "Agentic Genomics: From Pipeline Automation to Autonomous Validation."

Four necessary conditions: autonomy (runtime decisions), domain constraint (skill libraries, not ad hoc code), iterative refinement (error diagnosis and self-repair), and natural language mediation (no programming required).

How does it work?

Traditional workflow

Researcher writes code
Configures tools manually
Runs pipeline
Interprets results

Bottleneck: code production

Agentic workflow

Researcher describes intent in natural language
Agent discovers and executes skills from a library
Agent handles errors and adapts
Researcher validates results

Bottleneck: validation and judgement
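
The agentic steps above (discover a skill, execute it, handle errors, hand back a result for validation) can be sketched in a few lines of Python. This is a toy illustration only, not ClawBio's implementation: the `SKILLS` registry, the `skill` decorator, `run_with_repair`, and the `count_variants` skill are all hypothetical names invented here, and a real agent would use an LLM rather than a fixed rule to choose skills and diagnose failures.

```python
from typing import Callable, Dict

# Hypothetical skill library: skill name -> callable (not ClawBio's real registry).
SKILLS: Dict[str, Callable[[dict], dict]] = {}


def skill(name: str):
    """Register a function in the library under a fixed name."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        SKILLS[name] = fn
        return fn
    return decorator


@skill("count_variants")
def count_variants(params: dict) -> dict:
    # Toy skill: raises KeyError unless a 'variants' list is supplied.
    return {"n_variants": len(params["variants"])}


def run_with_repair(skill_name: str, params: dict, defaults: dict,
                    max_attempts: int = 3) -> dict:
    """Look up a skill, run it, and retry after filling in a missing parameter."""
    if skill_name not in SKILLS:
        raise ValueError(f"no skill named {skill_name!r} in the library")
    params = dict(params)  # work on a copy; never mutate the caller's request
    for attempt in range(1, max_attempts + 1):
        try:
            return {"result": SKILLS[skill_name](params), "attempts": attempt}
        except KeyError as err:
            missing = err.args[0]
            if missing not in defaults:
                raise  # cannot repair: surface the error to the researcher
            params[missing] = defaults[missing]
    raise RuntimeError("exhausted repair attempts")
```

Calling `run_with_repair("count_variants", {}, {"variants": ["var1", "var2"]})` fails once on the missing parameter, repairs the request from `defaults`, and succeeds on the second attempt. The attempt count is returned alongside the result so the researcher can see that a repair happened instead of having it hidden.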

Why is it useful?

Metric	Human-directed	Agent-mediated
Setup time	2-4 hours	5-15 minutes
Monitoring required	Continuous	Minimal (agent handles errors)
Reproducibility	Variable	High (skill specification fixed)
Error recovery	Manual debugging	Automated diagnosis and repair
Prerequisite expertise	Bioinformatics training	Domain knowledge for validation
Primary failure mode	Config errors, version conflicts	Silent plausible-looking errors

Comparative analysis across exome and scRNA-seq workflows (Corpas, Fatumo, Guio).

How does this apply to genomics?

What is ClawBio?

An open-source toolkit of AI agent skills for genomic analysis.

A skill is a self-contained, versioned unit of bioinformatics functionality: a SKILL.md contract that encapsulates code, configuration, data references, I/O specifications, and test suites. The AI reads the contract; it never overrides it.
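
To make the idea of a contract concrete, here is a minimal Python sketch of how the parts listed above (code, configuration, data references, I/O specification, tests) might be represented and enforced. The actual layout of a SKILL.md file is not shown on this page, so every field name below, the `SkillContract` class, and the `check_inputs` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class SkillContract:
    # All field names are hypothetical, chosen to mirror the prose above.
    name: str
    version: str
    entrypoint: str                 # the code the skill runs
    inputs: Dict[str, str]          # I/O spec: parameter name -> expected type name
    outputs: List[str]
    config: Dict[str, str] = field(default_factory=dict)
    data_references: List[str] = field(default_factory=list)
    tests: List[str] = field(default_factory=list)


def check_inputs(contract: SkillContract, supplied: dict) -> List[str]:
    """Return a list of problems; an empty list means the request fits the contract."""
    problems = []
    for name, type_name in contract.inputs.items():
        if name not in supplied:
            problems.append(f"missing input: {name}")
        elif type(supplied[name]).__name__ != type_name:
            got = type(supplied[name]).__name__
            problems.append(f"{name}: expected {type_name}, got {got}")
    for name in supplied:
        if name not in contract.inputs:
            problems.append(f"unexpected input: {name}")
    return problems
```

In this sketch the agent may only propose a request; `check_inputs` decides whether it is acceptable, which is one way to read "the AI reads the contract; it never overrides it".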

40+ bioinformatics skills

737+ GitHub stars

754 registered today

0 cost

How can it help me do better research?

Variant interpretation

VEP, ClinVar, gnomAD, ACMG classification, pharmacogenomics via CPIC. One command, structured report.

GWAS + PRS

Query 9 databases in parallel, compute polygenic risk scores, fine-map loci with SuSiE. All from summary statistics.
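
For readers new to polygenic risk scores, the core calculation is a weighted sum: each variant's effect size from the summary statistics is multiplied by the number of effect alleles the individual carries, and the products are added up. The Python sketch below shows only that additive step. It is not ClawBio's PRS skill, which may apply additional processing; the function name and the variant IDs `v1` to `v3` are placeholders.

```python
from typing import Dict, Tuple


def polygenic_risk_score(weights: Dict[str, float],
                         dosages: Dict[str, float]) -> Tuple[float, float]:
    """Additive PRS over variants present in both inputs.

    weights: variant ID -> effect size (e.g. beta from GWAS summary statistics)
    dosages: variant ID -> effect-allele count for one individual (0, 1, or 2)
    Returns (score, coverage), where coverage is the fraction of weighted
    variants that were actually found in the individual's genotype data.
    """
    shared = sorted(weights.keys() & dosages.keys())
    score = sum(weights[v] * dosages[v] for v in shared)
    coverage = len(shared) / len(weights) if weights else 0.0
    return score, coverage


# Placeholder data: three weighted variants, two of them genotyped.
example_weights = {"v1": 0.5, "v2": -0.25, "v3": 1.0}
example_dosages = {"v1": 2, "v2": 1}
score, coverage = polygenic_risk_score(example_weights, example_dosages)
```

Returning coverage alongside the score is a deliberate choice: a score computed from only a fraction of the intended variants looks like any other number, so the missing fraction needs to be reported explicitly.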

Equity audits

HEIM equity scorer measures how well a dataset represents diverse populations. Flags ancestry bias in analyses.

Today you will use two of these. Session 1: annotate a real genome and discover clinical findings. Session 2: run a GWAS lookup, compute PRS, and fine-map a locus. All in Google Colab, for free.

What are the limitations?

Agentic genomics lowers the barrier to generating analyses. It does not lower the barrier to evaluating them.

Silent plausible errors

The primary failure mode: results that look correct but are not. One skill returned "all normal" pharmacogenomics for an empty input file.
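
The empty-input case shows why "nothing found" and "nothing to look at" must be kept apart. A minimal Python sketch of such a guard follows, assuming VCF-style text input where header lines start with `#`. The function names are hypothetical and this is not the fix applied in ClawBio.

```python
from typing import List


def count_vcf_records(vcf_text: str) -> int:
    """Count data lines: non-blank lines that do not start with '#'."""
    return sum(
        1 for line in vcf_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


def summarise_findings(vcf_text: str, actionable: List[str]) -> str:
    """Refuse to report a result when the input holds no variant records."""
    n_records = count_vcf_records(vcf_text)
    if n_records == 0:
        # An empty or header-only file is an input error, not a normal result.
        raise ValueError("no variant records in input; cannot report a result")
    if not actionable:
        return f"No actionable findings among {n_records} variant record(s)"
    return f"{len(actionable)} actionable finding(s) among {n_records} variant record(s)"
```

Stating the record count in every summary gives the person validating the report a quick sanity check against the size of the file they supplied.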

Hallucination

Agents can cite non-existent gene-disease associations, fabricate references, or generate variant annotations that conflate unrelated loci.

Equity bias

Agents default to European-ancestry resources. 86% of GWAS data is European. Agentic genomics can automate existing biases at scale.

Validation gap

Domain expertise built over years cannot be shortcut by AI. A novice user will not catch the errors an experienced analyst would.

The democratisation is real, but partial. It expands the capacity to produce; it does not expand the capacity to judge.

Democratising genomics

The infrastructure barrier is gone.

No HPC

Everything runs in Google Colab on a free tier.

No data agreements

Summary statistics are publicly released. No application, no waiting.

No bioinformatics team

ClawBio wraps the full pipeline. One command per analysis.

No cost

Google Colab is free. ClawBio is MIT-licensed. All databases are public.

A researcher in Lima, Kampala, or Dhaka can run the same analyses as one at the Broad Institute. Today.

Built with the Global South in mind

This workshop builds on research developed with Prof Segun Fatumo (Queen Mary University of London / PHURI) around removing the barriers that keep genomics concentrated in wealthy institutions:

HPC infrastructure

Replaced by Google Colab. Zero setup, zero cost, runs anywhere with a browser.

Data access

Public summary statistics cover most common analyses. No institutional gatekeeping.

Training

Open workshops, open materials, open code. Researchers anywhere can run publication-quality analyses.

Hands-on

How do I get started?

Open Google Colab now

Today's schedule

Session	What you'll do	Time
Introduction	What is agentic genomics, how ClawBio works, what we will cover today	10 min
Session 1: Variant Interpretation	Annotate a real human genome (the Corpasome). Discover Factor V Leiden, CFTR carrier status, warfarin sensitivity, APOE risk, and haemochromatosis. VEP, ClinVar, gnomAD, CPIC.	30 min
Session 2: GWAS	Query variants across 9 federated databases. Compute polygenic risk scores. Fine-map a GWAS locus with SuSiE. Explore cross-ancestry differences.	30 min
Q&A	Discussion and questions	20 min

Requirements: A Google account and a web browser. Nothing to install. No API keys. No payment.

Getting started

Open the tutorial materials in a new tab:

Session 1: Variant Interpretation

docs.clawbio.ai/tutorials/variant-interpretation-workshop

Session 2: GWAS

docs.clawbio.ai/tutorials/gwas-workshop

Click the Colab link in the tutorial page to open the notebook. Then click "Copy to Drive" and follow along.

Take-home messages

AI makes the mechanical steps faster, but domain expertise in molecular biology, genetics, and clinical context remains essential.
Pharmacogenomics is saving lives today. Drug-gene interactions like warfarin/CYP2C9/VKORC1 are implemented in hospitals now.
"We don't know yet" is a valid answer. Over half of all variants are VUS. Honest uncertainty is a strength, not a weakness.
Cross-ancestry analysis is not optional. Always check whether findings transfer across populations.
Infrastructure is no longer a barrier. Google Colab + ClawBio = publication-quality analysis, for free, anywhere.

Join the community

WhatsApp

Join the group for questions, updates, and future workshops.

WhatsApp Group

Discord

Chat with RoboTerri, share results, and collaborate with other users.

Discord Server

GitHub

Star the repo, open issues, contribute skills. All MIT-licensed.

ClawBio/ClawBio

Resources

Resource	Link
ClawBio GitHub	github.com/ClawBio/ClawBio
Documentation	docs.clawbio.ai
Variant Interpretation tutorial	docs.clawbio.ai/tutorials/variant-interpretation-workshop
GWAS tutorial	docs.clawbio.ai/tutorials/gwas-workshop
Corpasome (Zenodo)	doi:10.5281/zenodo.19297389
WhatsApp group	WhatsApp
Discord	Discord
