For the Sidra Mendelian Team

What if the diagnosis
is in the blood?

Genomics England added one blood draw to 5,412 rare-disease patients and solved another 6.8%.

Sidra and QBB can do it better. The Gulf-ancestry control reference is already in the freezer.

Manuel Corpas · Sidra Medicine · 6 May 2026

Source: medRxiv 2026.03.19.26348811 · Genomics England 100,000 Genomes Project

The Source Paper

5,412 individuals. Blood RNA-seq. +15 confirmed candidate diagnoses from RNA outliers alone.

Genomics England, March 2026. Rare-disease participants from the 100,000 Genomes Project who already had WGS, PAXgene blood RNA-seq, OUTRIDER + FRASER2 via the DROP pipeline.

5,412

individuals sequenced

Whole-blood RNA-seq, paired with WGS in the Genomics England 100,000 Genomes Project.

20%

at least one outlier event

In a disorder-relevant gene. Triaged down to clinical-grade candidates.

6.8%

cohort confirmed diagnostic rate

Highest in dysmorphic / congenital (16.5%), skeletal (13.1%), neurodevelopmental (11.4%).

Why this matters

A meaningful fraction of WGS-negative rare-disease patients (6.8% in this cohort) can be solved by adding a single blood draw. The pipeline is open source. The bottleneck is having the right control reference and clinical workflow.

The Diagnostic Principle

For every gene, ask: does the patient look like the controls?

Quantify expression for every gene in every sample (counts → CPM → log2)
For each gene, learn what "normal" looks like across a control reference panel (median, dispersion)
Score each patient gene as outlier (modified z-score, or autoencoder residual in OUTRIDER)
Filter to dosage-sensitive disease genes (ClinGen haploinsufficient, PanelApp)
Rank, hand to MDT, correlate with phenotype and WGS variants
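
The first three steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the skill's actual code and not OUTRIDER: the function names `log2_cpm` and `modified_z`, the pseudocount of 1, and the toy Poisson cohort are all hypothetical. The modified z-score uses the median and the median absolute deviation (MAD) of the controls, scaled by 0.6745 so that it is comparable to an ordinary z-score under a normal distribution.

```python
import numpy as np

def log2_cpm(counts, pseudocount=1.0):
    """Raw counts (samples x genes) -> log2 counts-per-million."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)   # total reads per sample
    cpm = counts / lib_size * 1e6
    return np.log2(cpm + pseudocount)              # pseudocount is an assumption

def modified_z(case_expr, control_expr):
    """Modified z-score of each case gene against the control panel."""
    med = np.median(control_expr, axis=0)                    # per-gene "normal"
    mad = np.median(np.abs(control_expr - med), axis=0)      # per-gene dispersion
    mad = np.where(mad == 0, np.nan, mad)                    # undefined if controls do not vary
    return 0.6745 * (case_expr - med) / mad

# Toy cohort (made-up data): 100 controls, 1 case, 50 genes.
rng = np.random.default_rng(0)
controls = rng.poisson(200, size=(100, 50))
case = rng.poisson(200, size=(1, 50))
case[0, 0] = 20                                    # inject a collapse in gene 0

z = modified_z(log2_cpm(case), log2_cpm(controls))
print("most extreme gene index:", int(np.nanargmin(z[0])))
```

Genes whose controls show no spread come back as NaN rather than as spurious outliers, which keeps them out of any downstream threshold comparison.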

The ClawBio skill

Implements the diagnostic principle in ~250 lines of Python. Production-grade swap to OUTRIDER + FRASER2 (DROP pipeline) is a configuration change, not a rewrite. Same I/O contract.

Live Demo

One command. End-to-end in seconds.

100 synthetic Gulf-ancestry control samples + 2 cases with injected outliers (FBN1 collapse in CASE-001, NF1 elevation in CASE-002), 200 genes screened.

$ python clawbio.py run rdoutlier --demo

# Generating synthetic cohort: 100 controls + 2 cases · 200 genes
# Library-size normalising, computing per-gene robust statistics
# Scoring case samples against control reference panel
# Filtering against ClinGen haploinsufficient panel (50 genes)

{
  "cases": 2,
  "controls": 100,
  "genes_screened": 200,
  "z_threshold": 3.0,
  "outlier_events": 5,
  "panel_outlier_events": 3,
  "cases_with_panel_hit": 2
}

Report: rdoutlier_report/report.md [OK]
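
One way a summary of this shape could be derived from a matrix of z-scores is sketched below. The `summarise` function and its counting rules are assumptions made for illustration, not the skill's documented behaviour. The toy matrix uses the FBN1 and NF1 z-scores reported in this deck; every other value, and the gene names GENE_A and GENE_B, are invented.

```python
import json
import numpy as np

def summarise(z, genes, panel, n_controls, z_threshold=3.0):
    """Count outlier events in a cases x genes matrix of z-scores."""
    z = np.asarray(z, dtype=float)
    hits = np.abs(z) >= z_threshold                  # outlier in either direction
    on_panel = np.array([g in panel for g in genes])
    panel_hits = hits & on_panel                     # broadcasts across cases
    return {
        "cases": int(z.shape[0]),
        "controls": int(n_controls),
        "genes_screened": int(z.shape[1]),
        "z_threshold": float(z_threshold),
        "outlier_events": int(hits.sum()),
        "panel_outlier_events": int(panel_hits.sum()),
        "cases_with_panel_hit": int(panel_hits.any(axis=1).sum()),
    }

# Toy input: 2 cases x 5 genes. Only -18.6 and 9.0 come from the deck.
genes = ["FBN1", "NF1", "SMC1A", "GENE_A", "GENE_B"]
panel = {"FBN1", "NF1", "SMC1A"}
z = np.array([
    [-18.6, 0.4, 3.2, 3.5, -0.2],   # CASE-001
    [0.1, 9.0, -0.7, 0.3, -3.1],    # CASE-002
])
summary = summarise(z, genes, panel, n_controls=100)
print(json.dumps(summary, indent=2))
```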

Result

Both injected outliers detected. Background gene noise filtered out.

How to read: rows are top outlier genes, columns are case samples. Red = case sample over-expressed vs control panel. Blue = case sample under-expressed. Magnitude = modified z-score against the 100-control reference.

CASE-001 · FBN1 down (z = -18.6)
Consistent with Marfan syndrome. LoF / haploinsufficiency. Reflex to WGS variant review in FBN1.

CASE-002 · NF1 up (z = +9.0)
Atypical direction. Plausible loss-of-regulation, allele-specific expression, or dominant-negative. Requires VUS workup.

SMC1A weak · 2 background
Off-panel and weak panel signals correctly de-prioritised.

Demo → Production

Same skill. Swap the components.

The skill's I/O contract does not change between today's demo and tomorrow's clinical-grade pipeline. Each cell below is a configuration choice, not a rewrite.

Component	Demo (today)	Sidra production
Reference cohort	100 synthetic samples	QBB n ≈ 12K PAXgene blood, ancestry-matched
Cases	2 synthetic	Sidra paediatric WGS-negative referrals
Aligner + quantifier	none (counts in directly)	STAR + featureCounts on existing SLURM
Outlier algorithm	per-gene robust z-score	OUTRIDER autoencoder + FRASER2 splicing (DROP)
Disease panel	50 ClinGen haploinsufficient	ClinGen + Genomics England PanelApp
Reproducibility	bundle auto-emitted	Same bundle, attached to MDT report
Return-of-result loop	report.md	Sidra MDT reflex from WGS-negative referrals
Governance	local-only, no network	Compatible with Sidra secure environment
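
To illustrate what "configuration choice, not a rewrite" could look like in practice, here is a hedged sketch in which the rows of the table become fields of one config object. The `RunConfig` class, its field names, and the string values are hypothetical and are not taken from the ClawBio code base.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RunConfig:
    # Hypothetical fields mirroring rows of the table above.
    reference_cohort: str
    cases: str
    quantifier: str
    outlier_algorithm: str
    disease_panel: str

DEMO = RunConfig(
    reference_cohort="synthetic-100",
    cases="synthetic-2",
    quantifier="none",
    outlier_algorithm="robust-z",
    disease_panel="clingen-hi-50",
)

PRODUCTION = RunConfig(
    reference_cohort="qbb-paxgene-blood",
    cases="sidra-wgs-negative",
    quantifier="star+featurecounts",
    outlier_algorithm="drop-outrider-fraser2",
    disease_panel="clingen+panelapp",
)

def changed_fields(a, b):
    """Return {field: (old, new)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}

for name, (old, new) in changed_fields(DEMO, PRODUCTION).items():
    print(f"{name}: {old} -> {new}")
```

Because the object is frozen and every component is a field, the demo and production runs differ only in data, and the diff between them can be attached to the reproducibility bundle.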

Honest Limits

What we will not pretend.

IRB: Existing QBB consents likely do not cover use as controls for Sidra clinical cases. A new protocol or umbrella amendment is required. This, not the compute, is the rate-limiting step.

Tissue ceiling: Blood expresses ~70% of disease genes. Conditions with brain-, muscle-, or liver-restricted expression are missed. The pilot does not claim to replace tissue biopsies.

Algorithm gap: Today's demo uses robust z-scoring (the diagnostic principle, runnable in seconds). Clinical-grade calls require the full DROP pipeline with confounder correction. Both are open source. The swap is configuration, not research.

Off-target outliers: ~0.5 false-positive outliers per case at |z| ≥ 3 with 200 genes screened. The disease-panel filter removes most; MDT review handles the rest. Same constraint as the GE NGRL paper.
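
The ~0.5 figure is consistent with a simple back-of-envelope calculation, sketched below. It assumes that z-scores of unaffected genes are approximately standard normal and independent across genes, and that outliers in both directions are counted; real expression data will deviate from both assumptions. The function name is hypothetical.

```python
import math

def expected_false_positives(n_genes, z_threshold):
    """Expected number of null genes with |z| >= threshold, per case."""
    # Two-sided tail of a standard normal: P(|Z| >= t) = erfc(t / sqrt(2))
    p_tail = math.erfc(z_threshold / math.sqrt(2))
    return n_genes * p_tail

all_genes = expected_false_positives(200, 3.0)   # every gene screened in the demo
panel_only = expected_false_positives(50, 3.0)   # after restricting to a 50-gene panel

print(round(all_genes, 2))   # 0.54
print(round(panel_only, 2))  # 0.13
```

Under the same assumptions, restricting attention to a 50-gene panel cuts the expected number of chance outliers per case roughly fourfold, which is the effect the disease-panel filter relies on.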
