Pipeline for Real-time Analysis and eXtraction of clInical documentation from Speech
An end-to-end AI system that listens to British GP consultations and automatically generates structured clinical notes — reducing documentation time while maintaining clinical accuracy.
British GPs spend up to 11 minutes per consultation on clinical documentation. This administrative burden reduces patient face-time, contributes to burnout, and is a leading cause of GP workforce attrition in the NHS.
PRAXIS processes GP consultation audio through four intelligent stages, producing structured clinical notes ready for GP review.
Fine-tuned Whisper and MedASR models transcribe British English GP audio with medical vocabulary awareness.
Pyannote neural diarisation identifies who said what — separating Doctor and Patient speech with timestamps.
MedCAT extracts symptoms, medications, and conditions with SNOMED CT linking. medspaCy detects negations.
QLoRA fine-tuned Med42-8B generates structured clinical notes with UK English, SNOMED codes, and safety-netting.
SOAP, SBAR, Clinical Summary, Referral Letter, Discharge Summary, and EMIS-style consultation records.
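The four stages above can be chained end-to-end. In this sketch every function is an illustrative stub, not the project's real API; each stage merely stands in for the component named in its comment.

```python
# High-level sketch of the four-stage PRAXIS pipeline. All names,
# signatures, and return shapes here are illustrative assumptions.
def transcribe(audio_path):
    # Stage 1: ASR (fine-tuned Whisper / MedASR in the real system).
    return "how can I help? I have a cough."

def diarise(audio_path, transcript):
    # Stage 2: speaker attribution (pyannote neural diarisation).
    return [("Doctor", "how can I help?"), ("Patient", "I have a cough.")]

def extract_entities(turns):
    # Stage 3: clinical NER with SNOMED CT linking (MedCAT + medspaCy).
    return [{"term": "cough", "snomed": "49727002", "negated": False}]

def draft_note(turns, entities, note_format):
    # Stage 4: note generation (QLoRA fine-tuned Med42-8B via Ollama).
    return {"format": note_format, "turns": turns, "entities": entities}

def generate_note(audio_path, note_format="SOAP"):
    transcript = transcribe(audio_path)
    turns = diarise(audio_path, transcript)
    entities = extract_entities(turns)
    return draft_note(turns, entities, note_format)
```

The output of `generate_note("consultation.wav")` would then go to the GP for review before entering the record.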
Automatic clinical terminology coding validated against 73 reference SNOMED CT codes commonly used in UK primary care.
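A minimal sketch of that validation step, assuming a simple set-membership check. The reference set and function name are illustrative (the project's set has 73 codes); the three example codes are genuine SNOMED CT concepts.

```python
# Illustrative reference set; the real project uses 73 codes common
# in UK primary care.
REFERENCE_CODES = {
    "386661006",  # Fever (finding)
    "49727002",   # Cough (finding)
    "25064002",   # Headache (finding)
}

def validate_codes(extracted):
    """Partition extracted codes into recognised and unrecognised sets."""
    extracted = set(extracted)
    return extracted & REFERENCE_CODES, extracted - REFERENCE_CODES

valid, unknown = validate_codes(["386661006", "12345678"])
# valid -> {"386661006"}, unknown -> {"12345678"}
```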
Every generated note includes safety-netting advice — red flag symptoms, when to return, emergency contacts.
Enforced British medical terminology — paracetamol not acetaminophen, A&E not ER, physiotherapy not physical therapy.
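One way such enforcement could work is a post-processing substitution pass. This sketch uses only the three term pairs cited above; the mapping and function name are assumptions, not the project's implementation.

```python
import re

# Illustrative US-to-UK term mapping (assumption; the real list
# would be much larger).
US_TO_UK = {
    "acetaminophen": "paracetamol",
    "ER": "A&E",
    "physical therapy": "physiotherapy",
}

def britishise(text):
    for us, uk in US_TO_UK.items():
        # Match whole terms; keep abbreviations like "ER" case-sensitive
        # so words containing "er" are untouched.
        flags = 0 if us.isupper() else re.IGNORECASE
        text = re.sub(rf"\b{re.escape(us)}\b", uk, text, flags=flags)
    return text

britishise("Take acetaminophen and attend the ER")
# -> "Take paracetamol and attend the A&E"
```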
Runs entirely on-device with no cloud dependency. All models run locally via Ollama — compliant with NHS data governance.
Post-generation validation checks that diagnoses and medications in the SOAP note are grounded in the original transcript.
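A minimal illustration of such a grounding check, assuming simple case-insensitive substring matching; the function name and matching strategy are assumptions, not the project's method.

```python
# Flag note terms (e.g. medications) that never appear in the source
# transcript -- a simple form of post-generation grounding validation.
def ungrounded_terms(note_terms, transcript):
    haystack = transcript.lower()
    return [t for t in note_terms if t.lower() not in haystack]

ungrounded_terms(["amoxicillin", "ibuprofen"],
                 "GP: I'll prescribe amoxicillin 500 mg three times daily.")
# -> ["ibuprofen"]  (never mentioned in the transcript, so flagged)
```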
Evaluated on the PriMock57 dataset — 57 British GP mock consultations (17.27 hours of audio).
| Model | WER | Time/Clip |
|---|---|---|
| Base Whisper Small (best) | 50.1% | 0.71s |
| Base Whisper Medium | 51.3% | 1.56s |
| LoRA Whisper v2 | 57.4% | 0.54s |
| Base MedASR | 93.1% | 0.45s |
| Fine-tuned MedASR | 82.4% | 0.46s |
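The WER figures above are computed with jiwer in the project. As a reference for the metric itself, here is a stdlib-only sketch: word error rate is the word-level edit distance divided by the number of reference words.

```python
# Stdlib-only WER sketch (the project uses jiwer for its reported figures).
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))       # DP row: distances for 0 ref words
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i            # prev holds the diagonal cell
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution / match
    return d[len(hyp)] / len(ref)

wer("the cat sat on the mat", "the cat sat mat")  # two deletions -> 1/3
```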
| Model | Quality Score | SNOMED Coverage |
|---|---|---|
| PRAXIS-SOAP-v1 (best) | 0.98 | 100% |
| Med42-8B (Baseline) | 0.97 | 100% |
| Mistral-7B (Zero-shot) | 0.98 | 100% |
| Meditron3-8B | 0.77 | 12% |
First published evaluation of Google MedASR on British GP audio. 93.1% WER reveals critical accent/domain mismatch — the model was trained on US radiology dictation.
Naive LoRA fine-tuning of Whisper on a small dataset caused catastrophic hallucination (514% WER); freezing the encoder fixed it.
Whisper Small (244M params) outperforms Whisper Medium (769M) on British GP audio — challenging the larger-is-better assumption.
OpenAI Whisper, Google MedASR, LoRA/PEFT fine-tuning
Pyannote Audio 3.1 neural speaker segmentation
MedCAT, medspaCy, SNOMED CT ontology
Med42-8B, Meditron3-8B, Mistral-7B via Ollama
QLoRA (4-bit NF4), HuggingFace Transformers, PEFT
PriMock57: 57 British GP consultations, 17.27 hours
jiwer (WER/CER), automated quality scoring, blinded GP scoring
Python 3.12, FastAPI, Streamlit, Apple M4 Pro (fully local)
MSc AI for Business Intelligence
University of Leicester, 2025-2026. Research focus: clinical NLP, speech recognition, and medical LLM fine-tuning for NHS primary care.
Dissertation Supervisor
University of Leicester, School of Computing and Mathematical Sciences. Supervising the technical direction and academic rigour of the PRAXIS project.
Industry Partner
Healthcare analytics company providing real-world clinical requirements and deployment context for the Patient Health Data Management System (PHDMS).