AI in the Histopathology Workflow: Detecting Hepatocellular Ballooning to Support Scoring of Steatotic Liver Disease

At-a-glance (what this page covers)

  • Why ballooning matters and the challenges of manual scoring
  • What Perspectum’s AI means for clinical trials
  • Evidence from three clinical trial cohorts
  • Pathologist validation of our ballooning model

Why Ballooning Matters

Hepatocyte ballooning is a key pathological feature used to diagnose metabolic dysfunction–associated steatohepatitis (MASH). In practice, manual ballooning scoring is inconsistent: readers disagree on which individual cells are ballooned and on the overall severity of ballooning in a sample, creating noise that can mask treatment effects in clinical trials. Because scoring depends on identifying a few ballooned cells among many thousands, missing just one or two can change the scoring category.

Perspectum Ballooning AI

Perspectum has developed a model that detects classic and non-classic ballooned hepatocytes on H&E-stained biopsies and shows their locations to the pathologist. Findings are also summarized into slide-level features, such as the density of large detections, the median detection size, and the highest detection count in a region of interest. These AI metrics are designed to provide objective, quantifiable measures of ballooning. Most importantly for a clinical trial primary endpoint, the model assists pathologists, improving alignment and efficiency rather than replacing expert judgement.
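The page does not specify the shape or size of the "region of interest" used for the highest-detection-count feature. As a minimal sketch only, assuming detection centroids in micrometres and a hypothetical grid of 1000 µm square tiles (the function name `max_count_in_roi` and the tile size are illustrative assumptions, not Perspectum's implementation), the feature could be computed like this:

```python
from collections import Counter


def max_count_in_roi(centroids_um, roi_size_um=1000.0):
    """Return the largest number of detections falling in any one square ROI.

    Hypothetical sketch: the slide is tiled into non-overlapping squares of
    side roi_size_um (an assumed value). Each detection centroid (x, y), in
    micrometres, is assigned to exactly one tile, and the busiest tile's
    count is returned.
    """
    if not centroids_um:
        return 0
    # Floor division maps each centroid to an integer tile index.
    tiles = Counter(
        (int(x // roi_size_um), int(y // roi_size_um)) for x, y in centroids_um
    )
    return max(tiles.values())


points = [(100, 200), (900, 950), (1500, 300), (1600, 400), (1700, 500)]
print(max_count_in_roi(points))  # three detections share the tile at index (1, 0)
```

Non-overlapping tiles keep this to a single pass over the detections; a sliding window would also catch clusters that straddle tile boundaries, at a higher computational cost.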

Where It Fits In Your Trial Workflow

Perspectum’s AI tools are integrated into our workflow, optimising operations, strengthening your primary endpoints, and generating new insight into drug effects.

This ballooning model gives the central reading team new capabilities, improving screening rates and increasing alignment on this vital yet challenging feature of MASH.

Study Data: Building an Evidence Base Across Cohorts

The AI model for hepatocellular ballooning detection was trained and evaluated using a robust dataset encompassing 598 haematoxylin and eosin (H&E)-stained whole-slide images (WSIs) from three independent clinical trial cohorts representing a mix of liver diseases. This multi-cohort design ensured that the model was exposed to a wide range of tissue morphology, disease severity, and slide preparation variability, reflecting the diversity encountered in real-world trial settings.

Composition of the Dataset

Out of the total 598 WSIs:

  • 63 slides were manually annotated for hepatocellular ballooning and used for model training.
  • A human-in-the-loop relabelling process was applied after the first WSI to refine annotation accuracy and capture non-classic ballooning features that are often missed in initial labelling rounds.
  • These annotated images came from two of the three trials, allowing the model to learn morphological distinctions between normal and ballooned hepatocytes.
  • The third trial was kept separate as an external validation cohort, ensuring unbiased assessment of model generalizability across new data sources.
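Holding out an entire trial, rather than a random subset of slides, ensures that no cohort contributes slides to both training and external validation. A minimal sketch of such a cohort-level split, assuming each slide carries a cohort label and an annotation flag (the `Slide` record, its fields, and the cohort names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    slide_id: str
    cohort: str       # clinical trial the WSI came from (hypothetical label)
    annotated: bool   # True if manually annotated for ballooning


def split_by_cohort(slides, holdout_cohort):
    """Hold one whole cohort out for external validation.

    Training uses only annotated slides from the remaining cohorts; every
    slide from the held-out cohort goes to the external set, so no trial
    contributes to both.
    """
    train = [s for s in slides if s.cohort != holdout_cohort and s.annotated]
    external = [s for s in slides if s.cohort == holdout_cohort]
    return train, external


slides = [
    Slide("a1", "trial_A", True),
    Slide("a2", "trial_A", False),
    Slide("b1", "trial_B", True),
    Slide("c1", "trial_C", True),
]
train, external = split_by_cohort(slides, "trial_C")
print([s.slide_id for s in train], [s.slide_id for s in external])
```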

Analytical Framework: From Pixel-Level Detections to Quantitative Biomarkers

The AI system processes entire WSIs to detect hepatocytes that display morphological hallmarks of ballooning, including cellular swelling, rarefied cytoplasm, and disruption of cell borders. These detections are then summarized into slide-level quantitative metrics that capture the extent and severity of ballooning.

Derived Whole-Slide Features

For each biopsy, the following metrics were computed:

  • Detection Density: Number of identified ballooned hepatocytes per mm² of tissue, stratified by detection size thresholds (with particular focus on cells exceeding 1200 μm²), indicating the extent of disease.
  • Median Detection Size: The median area of all AI-detected ballooned hepatocytes, providing a measure of typical ballooning severity within a slide.
  • Largest Detection Size: The area of the largest detection, reflecting the presence of extreme pathology, which may indicate advanced disease.
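To illustrate how these definitions translate into numbers, here is a minimal sketch. It assumes the model outputs one area in µm² per detection and that the tissue area in mm² is measured separately. `slide_features` and its input format are hypothetical; only the 1200 µm² threshold comes from the text above.

```python
from statistics import median


def slide_features(detection_areas_um2, tissue_area_mm2, large_threshold_um2=1200.0):
    """Summarise per-cell detections into slide-level ballooning features.

    detection_areas_um2: area of each AI detection in µm² (assumed input format).
    tissue_area_mm2: tissue area on the slide in mm² (assumed measured separately).
    """
    if tissue_area_mm2 <= 0:
        raise ValueError("tissue area must be positive")
    # Detections strictly exceeding the size threshold count as "large".
    large = [a for a in detection_areas_um2 if a > large_threshold_um2]
    return {
        # Detections per mm² of tissue: all sizes, and large detections only.
        "density_all_per_mm2": len(detection_areas_um2) / tissue_area_mm2,
        "density_large_per_mm2": len(large) / tissue_area_mm2,
        # Typical and extreme detection size; None when nothing was detected.
        "median_size_um2": median(detection_areas_um2) if detection_areas_um2 else None,
        "largest_size_um2": max(detection_areas_um2, default=None),
    }


print(slide_features([800, 1000, 1300, 2500], tissue_area_mm2=2.0))
```

With four detections on 2 mm² of tissue, this gives an overall density of 2.0 per mm², a large-detection density of 1.0 per mm², a median size of 1150.0 µm², and a largest detection of 2500 µm².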

These metrics convert pathologist-level assessments into reproducible, continuous variables that can be directly integrated into clinical trial endpoints or digital biomarker pipelines.

Statistical Validation: Bridging AI Outputs and Human Scoring

To evaluate how well AI-derived metrics align with traditional manual scoring, the study employed Kendall’s Tau correlation, a non-parametric statistic that measures ordinal association between ranked variables.
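Kendall's Tau compares how often pairs of slides are ranked in the same order by two measures. Manual ballooning scores are ordinal with many ties, so the sketch below assumes the tau-b variant, which corrects for ties. The page does not state which variant was used. This is a pure-Python sketch intended to make the pair counting explicit:

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two paired sequences (handles ties).

    Counts concordant (P) and discordant (Q) pairs, plus pairs tied only in x (T)
    or only in y (U), and returns (P - Q) / sqrt((P + Q + T) * (P + Q + U)).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: counts in neither term
        if dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_x) * (concordant + discordant + tied_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical example: manual ballooning scores vs AI detection density.
manual = [0, 0, 1, 2]
density = [0.5, 1.0, 3.0, 2.5]
print(kendall_tau_b(manual, density))
```

In practice, `scipy.stats.kendalltau` computes tau-b by default and also returns a p-value. The loop above is O(n²) and is meant only to show how the statistic is formed.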

Across all three clinical trials:

  • Detection density showed Tau correlations between 0.38 and 0.60, with p-values < 0.0001.
  • Median detection size and largest detection size followed similar trends (Tau 0.32–0.55).

These results indicate that AI-derived features correlate with human judgement while offering objective quantification that may reduce inter-reader variability.

Pathologist Review: Evaluating Clinical Interpretability

Recognizing that AI in histopathology must work as a decision-support tool, not a replacement, Perspectum assessed how pathologists perceived and agreed upon AI-generated detections.
A panel of six experienced liver pathologists reviewed 216 sampled detections drawn from 54 WSIs, classifying each detection as classic hepatocyte ballooning (HB), non-classic HB, or false positive.

The findings underscored both the promise and challenges of ballooning assessment:

  • 83% of AI detections were labelled as ballooned cells by at least one pathologist.
  • Agreement among individual readers varied widely: 43–74% pairwise agreement, consistent with historic variability in ballooning scoring.
  • For each reviewer, between 13% and 67% of detections were accepted as hepatocyte ballooning, and among these, 11–71% were identified as classic balloons.
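Summary statistics like these can be computed from a reader-by-detection label table. The following is a minimal sketch, assuming each reader assigns one of three labels to every detection and that pairwise agreement means an exact label match. The page does not define agreement precisely, and the label strings and `review_summary` function are hypothetical.

```python
from itertools import combinations

# Labels counted as hepatocyte ballooning (assumed encoding).
BALLOON_LABELS = {"classic", "non-classic"}


def review_summary(labels_by_reader):
    """Summarise a panel review of AI detections.

    labels_by_reader maps a reader id to a list of labels, one per detection,
    each "classic", "non-classic" or "false_positive". All lists must cover
    the same detections in the same order.
    """
    readers = sorted(labels_by_reader)
    n = len(labels_by_reader[readers[0]])

    # Share of detections called ballooned by at least one reader.
    any_balloon = sum(
        any(labels_by_reader[r][k] in BALLOON_LABELS for r in readers)
        for k in range(n)
    ) / n

    acceptance, classic_share = {}, {}
    for r in readers:
        accepted = [lab for lab in labels_by_reader[r] if lab in BALLOON_LABELS]
        # Per-reader share of detections accepted as ballooning.
        acceptance[r] = len(accepted) / n
        # Among accepted detections, share labelled classic (None if none accepted).
        classic_share[r] = accepted.count("classic") / len(accepted) if accepted else None

    # Pairwise agreement: share of detections on which two readers give the same label.
    pairwise = {
        (a, b): sum(x == y for x, y in zip(labels_by_reader[a], labels_by_reader[b])) / n
        for a, b in combinations(readers, 2)
    }
    return {"any_balloon": any_balloon, "acceptance": acceptance,
            "classic_share": classic_share, "pairwise": pairwise}
```

If agreement were instead defined on a binary "ballooned vs not" call, the pairwise comparison would map each label through `BALLOON_LABELS` before comparing labels.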

This variation highlights the biological and perceptual complexity of ballooning but also validates that AI can detect features that fall within the broad spectrum of what experts recognize as hepatocyte ballooning.

Interpreting the Results: A New Assistive Paradigm

The consistent, statistically significant correlations between AI metrics and manual scoring demonstrate that computational models can capture meaningful histological signals associated with disease progression.
By transforming subjective visual assessments into measurable data, this approach has the potential to:

  • Enhance reader alignment and trial reproducibility.
  • Enable sensitive detection of treatment effects in MASH and related diseases.

Collaborators

Work conducted with leading authors and partner institutions, including:

D. Allende — Cleveland Clinic Foundation, Cleveland, Ohio, USA

P. Bedossa — LiverPat, Paris, France

M. Yeh — University of Washington, Seattle, Washington, USA

K. Fleming — Green Templeton College, University of Oxford, UK

E. Fryer — Department of Cellular Pathology, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK

T. Kendall — Institute for Regeneration and Repair, University of Edinburgh, UK

R. Goldin — Section for Pathology, Imperial College London, UK

J. Breen, D. Windell, P. Wakefield, R. Kainth, P. Aljabar, C. Langford, K. Fleming, E. Fryer, T. Kendall, and R. Goldin — Perspectum Ltd, Oxford, UK

References

Clinical trial registries for the cohorts used in this work: NCT03551522, UMIN000026145, ISRCTN39463479.