For the Sidra Mendelian Team

What if the diagnosis is in the blood?

Genomics England added one blood draw to 5,412 rare-disease patients and solved another 6.8%.

Sidra and QBB can do it better. The Gulf-ancestry control reference is already in the freezer.

Manuel Corpas · Sidra Medicine · 6 May 2026

Source: medRxiv 2026.03.19.26348811 · Genomics England 100,000 Genomes Project

The Source Paper

5,412 individuals. Blood RNA-seq. +15 confirmed candidate diagnoses from RNA outliers alone.

Genomics England, March 2026. Rare-disease participants from the 100,000 Genomes Project with existing WGS, PAXgene blood RNA-seq, OUTRIDER + FRASER2 via the DROP pipeline.

5,412 individuals sequenced
Whole-blood RNA-seq, paired with WGS in the Genomics England 100,000 Genomes Project.

20% with at least one outlier event
In a disorder-relevant gene. Triaged down to clinical-grade candidates.

6.8% cohort confirmed diagnostic rate
Highest in dysmorphic / congenital (16.5%), skeletal (13.1%), neurodevelopmental (11.4%).

Why this matters
A meaningful chunk of WGS-negative rare-disease patients can be solved by adding a single blood draw. The pipeline is open source. The bottleneck is having the right control reference and clinical workflow.
The Diagnostic Principle

For every gene, ask: does the patient look like the controls?

The ClawBio skill
Implements the diagnostic principle in ~250 lines of Python. Swapping in production-grade OUTRIDER + FRASER2 (DROP pipeline) is a configuration change, not a rewrite: the I/O contract stays the same.
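
A minimal sketch of the principle, not the ClawBio source. It assumes library-size normalisation to log2 counts per million and the standard modified z-score (median and MAD, scaled by 0.6745). The names `log_cpm` and `modified_z`, the pseudocount, and the toy cohort are illustrative assumptions.

```python
import math
import random
from statistics import median


def log_cpm(sample, pseudocount=1.0):
    """Library-size normalise one sample (gene -> raw count) to log2 CPM."""
    library_size = sum(sample.values())
    return {
        gene: math.log2(count / library_size * 1e6 + pseudocount)
        for gene, count in sample.items()
    }


def modified_z(case, controls, min_mad=1e-6):
    """Modified z-score per gene for one case against a list of control samples."""
    case_expr = log_cpm(case)
    control_expr = [log_cpm(sample) for sample in controls]
    scores = {}
    for gene, value in case_expr.items():
        reference = [sample[gene] for sample in control_expr]
        med = median(reference)
        # Median absolute deviation, floored so a flat gene cannot divide by zero.
        mad = max(median(abs(x - med) for x in reference), min_mad)
        scores[gene] = 0.6745 * (value - med) / mad
    return scores


# Toy cohort (assumed values): 100 controls, 50 genes, one simulated collapse.
random.seed(0)
genes = ["FBN1"] + [f"GENE{i}" for i in range(1, 50)]
controls = [{g: random.randint(900, 1100) for g in genes} for _ in range(100)]
case = {g: 1000 for g in genes}
case["FBN1"] = 300  # simulated loss of expression

scores = modified_z(case, controls)
flagged = [g for g, z in scores.items() if abs(z) >= 3.0]
print(flagged)
```

Median and MAD are used instead of mean and standard deviation so that a few unusual controls do not distort the reference.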
Live Demo

One command. End-to-end in seconds.

100 synthetic Gulf-ancestry control samples + 2 cases with injected outliers (FBN1 collapse in CASE-001, NF1 elevation in CASE-002), 200 genes screened.

$ python clawbio.py run rdoutlier --demo

# Generating synthetic cohort: 100 controls + 2 cases · 200 genes
# Library-size normalising, computing per-gene robust statistics
# Scoring case samples against control reference panel
# Filtering against ClinGen haploinsufficient panel (50 genes)

{
  "cases": 2,
  "controls": 100,
  "genes_screened": 200,
  "z_threshold": 3.0,
  "outlier_events": 5,
  "panel_outlier_events": 3,
  "cases_with_panel_hit": 2
}

Report: rdoutlier_report/report.md [OK]
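
The summary fields above can be derived from a per-case table of z-scores, as this hedged sketch shows. `summarise`, `z_scores` and `panel` are hypothetical names, not the skill's API. Only the FBN1 and NF1 z-scores come from the demo; the other values, the placeholder gene names, SMC1A's panel membership and its case assignment are assumptions chosen for illustration.

```python
import json


def summarise(z_scores, panel, n_controls, genes_screened, z_threshold=3.0):
    """Count outlier events overall and within a disease-gene panel."""
    events = [
        (case, gene, z)
        for case, per_gene in z_scores.items()
        for gene, z in per_gene.items()
        if abs(z) >= z_threshold
    ]
    panel_events = [event for event in events if event[1] in panel]
    return {
        "cases": len(z_scores),
        "controls": n_controls,
        "genes_screened": genes_screened,
        "z_threshold": z_threshold,
        "outlier_events": len(events),
        "panel_outlier_events": len(panel_events),
        "cases_with_panel_hit": len({event[0] for event in panel_events}),
    }


# Toy z-scores: only FBN1 and NF1 match the demo, the rest are assumed.
z_scores = {
    "CASE-001": {"FBN1": -18.6, "SMC1A": 3.2, "GENE_X": 3.4, "GENE_Y": 0.4},
    "CASE-002": {"NF1": 9.0, "GENE_Z": -3.1, "GENE_Y": -1.2},
}
panel = {"FBN1", "NF1", "SMC1A"}  # stand-in for the 50-gene panel

summary = summarise(z_scores, panel, n_controls=100, genes_screened=200)
print(json.dumps(summary, indent=2))
```

Counting events before and after the panel filter keeps the background rate visible rather than silently discarding it.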
Result

Both injected outliers detected. Background gene noise filtered out.

Figure: per-case z-score heatmap.
How to read: rows are top outlier genes, columns are case samples. Red = case sample over-expressed vs control panel. Blue = case sample under-expressed. Magnitude = modified z-score against the 100-control reference.

CASE-001 · FBN1 down (z = -18.6)
Consistent with Marfan syndrome. LoF / haploinsufficiency. Reflex to WGS variant review in FBN1.

CASE-002 · NF1 up (z = +9.0)
Atypical direction. Plausible loss-of-regulation, allele-specific expression, or dominant-negative. Requires VUS workup.

SMC1A weak · 2 background
Off-panel and weak panel signals correctly de-prioritised.

Demo → Production

Same skill. Swap the components.

The skill's I/O contract does not change between today's demo and tomorrow's clinical-grade pipeline. Each cell below is a configuration choice, not a rewrite.

Component | Demo (today) | Sidra production
Reference cohort | 100 synthetic samples | QBB n ≈ 12K PAXgene blood, ancestry-matched
Cases | 2 synthetic | Sidra paediatric WGS-negative referrals
Aligner + quantifier | none (counts in directly) | STAR + featureCounts on existing SLURM
Outlier algorithm | per-gene robust z-score | OUTRIDER autoencoder + FRASER2 splicing (DROP)
Disease panel | 50 ClinGen haploinsufficient | ClinGen + Genomics England PanelApp
Reproducibility | bundle auto-emitted | Same bundle, attached to MDT report
Return-of-result loop | report.md | Sidra MDT reflex from WGS-negative referrals
Governance | local-only, no network | Compatible with Sidra secure environment
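
One way to picture "configuration, not a rewrite" is a single settings object whose values change while its shape does not. This is a hypothetical sketch: `OutlierRunConfig` and its field names are invented for illustration and are not the skill's actual configuration schema. The values are taken from the comparison above.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class OutlierRunConfig:
    """Hypothetical settings object; field names are illustrative only."""
    reference_cohort: str
    cases: str
    quantifier: str
    outlier_algorithm: str
    disease_panel: str


DEMO = OutlierRunConfig(
    reference_cohort="100 synthetic samples",
    cases="2 synthetic",
    quantifier="none (counts in directly)",
    outlier_algorithm="per-gene robust z-score",
    disease_panel="50 ClinGen haploinsufficient",
)

# Production reuses the same type; only the values differ.
PRODUCTION = replace(
    DEMO,
    reference_cohort="QBB PAXgene blood, ancestry-matched",
    cases="Sidra paediatric WGS-negative referrals",
    quantifier="STAR + featureCounts",
    outlier_algorithm="OUTRIDER + FRASER2 (DROP)",
    disease_panel="ClinGen + Genomics England PanelApp",
)


def changed_fields(a, b):
    """Names of the settings that differ between two configurations."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


print(changed_fields(DEMO, PRODUCTION))
```

Because both configurations share one type, downstream code that consumes the settings would not need to know which mode it is running in.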
Honest Limits

What we will not pretend.

IRB: Existing QBB consents likely do not cover use as controls for Sidra clinical cases. New protocol or umbrella amendment required. This is the rate-limiting step, not the compute.

Tissue ceiling: Blood expresses ~70% of disease genes. Conditions with brain-, muscle-, or liver-restricted expression are missed. The pilot does not claim to replace tissue biopsies.

Algorithm gap: Today's demo uses robust z-scoring (the diagnostic principle, runnable in seconds). Clinical-grade calls require the full DROP pipeline with confounder correction. Both are open source. The swap is configuration, not research.

Off-target outliers: ~0.5 false-positive outliers per case at |z| ≥ 3 with 200 genes screened. Disease-panel filter removes most; MDT review handles the rest. Same constraint as the GE NGRL paper.
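
The ~0.5 figure can be sanity-checked with a back-of-envelope calculation. The sketch assumes that z-scores for unaffected genes are approximately standard normal and independent across genes, and that outliers are flagged in both directions. Real expression data only approximates this, so treat the result as an order-of-magnitude check. `expected_false_positives` is an illustrative name.

```python
import math


def expected_false_positives(n_genes, z_threshold):
    """Expected number of genes exceeding |z| >= threshold by chance alone."""
    # Two-sided tail probability of a standard normal: P(|Z| >= t) = erfc(t / sqrt(2)).
    tail_probability = math.erfc(z_threshold / math.sqrt(2))
    return n_genes * tail_probability


expected = expected_false_positives(n_genes=200, z_threshold=3.0)
print(round(expected, 2))  # 0.54
```

Under the same assumptions the expected count scales linearly with the number of genes screened, so a transcriptome-wide screen would produce many more chance outliers unless the threshold is raised or a panel filter is applied.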

ClawBio is a research and educational tool. Not a medical device. Does not provide clinical diagnoses.