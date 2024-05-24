



In a recent study published in Nature, researchers developed and evaluated a whole-slide pathology foundation model, the Providence Gigapixel Pathology Model (Prov-GigaPath), using large-scale real-world data and a novel vision transformer architecture to achieve state-of-the-art performance on digital pathology tasks.

Research: A whole-slide foundation model for digital pathology from real-world data. Image credit: Color4260 / Shutterstock

Background

Computational pathology could revolutionize cancer diagnosis through applications such as subtyping, staging, and prognostic prediction. However, current methods require extensive annotated data, which is costly and time-consuming to obtain. Self-supervised learning could reduce this need by pre-training models on unlabeled data. Remaining challenges include the limited availability and variable quality of data, the difficulty of capturing both local and global patterns, and limited access to pre-trained models. Foundation models offer strong generalization, which is valuable in biomedical fields where unlabeled data are abundant. Further research is needed to improve the generalization and clinical applicability of these models across diverse datasets and real-world settings.

About the Research

The researchers applied a preprocessing pipeline to 171,189 whole-slide images (WSIs) of hematoxylin and eosin (H&E)-stained and immunohistochemistry slides. For tissue segmentation, Otsu image thresholding was used to filter out background regions. WSIs were resized to 0.5 μm per pixel and cropped into 256×256-pixel tiles, and tiles with less than 10% tissue coverage were discarded. The Prov-GigaPath tile encoder, a Vision Transformer (ViT), was pre-trained on 1,384,860,229 tiles using self-distillation with no labels version 2 (DINOv2). The slide encoder used the Long Sequence Network (LongNet) architecture. Pre-training, which incorporated grid discretization, dilated attention, and a masked autoencoder, was completed in 2 days using 16 nodes with 4×80 GB A100 GPUs.
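The tissue-filtering step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes an 8-bit grayscale thumbnail in which tissue is darker than the bright glass background, skips the resampling to 0.5 μm per pixel, and drops partial tiles at the right and bottom edges. The function names `otsu_threshold` and `tissue_tiles` are hypothetical.

```python
import numpy as np

TILE = 256          # tile edge in pixels, as described in the text
MIN_TISSUE = 0.10   # tiles with less than 10% tissue are discarded


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit level t that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the class with values <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0      # undefined where one class is empty
    return int(np.argmax(sigma_b))


def tissue_tiles(gray: np.ndarray, tile: int = TILE, min_tissue: float = MIN_TISSUE):
    """Return (row, col) grid positions of tiles whose tissue fraction is >= min_tissue."""
    t = otsu_threshold(gray)
    tissue = gray <= t    # assumption: tissue is darker than the background
    rows, cols = gray.shape[0] // tile, gray.shape[1] // tile   # partial edge tiles dropped
    kept = []
    for r in range(rows):
        for c in range(cols):
            patch = tissue[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            if patch.mean() >= min_tissue:
                kept.append((r, c))
    return kept
```

Otsu's method picks the gray level that best separates the histogram into two classes, so no threshold has to be tuned by hand; some pipelines apply it to a saturation channel rather than grayscale.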

Prov-GigaPath was compared with the Hierarchical Image Pyramid Transformer (HIPT), CTransPath (a contrastive learning-based pathology model), and REMEDIS (Robust and Efficient Medical Imaging with Self-supervision). HIPT was pre-trained on The Cancer Genome Atlas (TCGA) slides using a hierarchical image pyramid transformer architecture, CTransPath combines a convolutional neural network (CNN) with a Swin Transformer, and REMEDIS uses a ResNet backbone trained with the Simple Framework for Contrastive Learning of Visual Representations (SimCLR). Prov-GigaPath and these models were fine-tuned on various downstream tasks, with Attention-Based Multiple Instance Learning (ABMIL) used to obtain slide-level embeddings.
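ABMIL pools a variable number of tile embeddings into a single slide-level vector using attention weights. The NumPy sketch below follows the non-gated formulation of Ilse et al. (2018); the parameter names `V` and `w` and the random values in the usage lines are illustrative assumptions, and in practice these parameters are learned jointly with a classifier head rather than drawn at random.

```python
import numpy as np


def abmil_pool(h: np.ndarray, V: np.ndarray, w: np.ndarray):
    """Non-gated attention-based MIL pooling.

    h: (K, D) tile embeddings from one slide
    V: (L, D) projection matrix and w: (L,) scoring vector (learned in practice)
    Returns the (D,) slide embedding and the (K,) attention weights.
    """
    scores = np.tanh(h @ V.T) @ w        # (K,) one unnormalized score per tile
    scores = scores - scores.max()       # shift for a numerically stable softmax
    a = np.exp(scores)
    a = a / a.sum()                      # attention weights are positive and sum to 1
    return a @ h, a                      # attention-weighted average of tile embeddings


rng = np.random.default_rng(0)
tiles = rng.normal(size=(5, 8))          # 5 hypothetical tiles with 8-dim embeddings
V = rng.normal(size=(4, 8))
w = rng.normal(size=4)
slide_emb, attn = abmil_pool(tiles, V, w)
```

Because the weights form a softmax over tiles, the same pooling works for slides with any number of tiles, and the weights themselves indicate which tiles drove a prediction.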

For mutation prediction, the researchers used Providence Pathology (Prov-Path) data to construct tasks such as pan-cancer biomarker and gene mutation prediction, and evaluated them with 10-fold cross-validation using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Cancer subtyping was evaluated across nine cancer types, with the model fine-tuned for 20 epochs.
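The two evaluation metrics and a fold split can be sketched as below. This is an illustrative NumPy version under stated assumptions, not the study's evaluation code: AUROC is computed as the probability that a random positive outscores a random negative, AUPRC is estimated as average precision, and the stratified splitting in the hypothetical `stratified_folds` helper is an assumption (the text only says 10-fold cross-validation).

```python
import numpy as np


def auroc(y: np.ndarray, s: np.ndarray) -> float:
    """AUROC as P(random positive scores above random negative); ties count 1/2."""
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def average_precision(y: np.ndarray, s: np.ndarray) -> float:
    """AUPRC estimated as average precision: mean precision at the rank of each positive."""
    order = np.argsort(-s, kind="stable")    # highest score first; tied scores are not merged
    hits = y[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision[hits == 1].mean())


def stratified_folds(y: np.ndarray, k: int = 10, seed: int = 0):
    """Split indices into k folds that each keep roughly the overall class balance."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == label))
        for i, part in enumerate(np.array_split(idx, k)):
            folds[i].extend(part.tolist())
    return [np.array(f) for f in folds]
```

In a full cross-validation loop, each fold would be scored by a model trained on the other nine folds, and the per-fold AUROC and AUPRC values would then be averaged. AUPRC is more sensitive than AUROC to performance on the positive class, which matters for rare mutations.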

For vision-language alignment, the researchers created 17,383 pathology WSI-report pairs and processed them with the open-source Contrastive Language-Image Pre-training (OpenCLIP) codebase. Reports were cleaned using Generative Pre-trained Transformer (GPT)-3.5, and text embeddings were computed using OpenAI's text-embedding-ada-002 model. For zero-shot prediction, the MI-Zero settings and prompt templates were used to compare Prov-GigaPath with Multiple Instance Learning Zero-shot Transfer (MI-Zero), Biomedical Contrastive Language-Image Pre-training (BiomedCLIP), and Pathology-specific Language-Image Pre-training (PLIP) on subtyping and mutation status prediction.
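A CLIP-style setup typically combines a symmetric contrastive loss over matched image-text pairs with zero-shot classification by embedding similarity. The NumPy sketch below illustrates both ideas; it is not the OpenCLIP or MI-Zero implementation. The fixed temperature of 0.07 (learnable in CLIP), the use of one embedding per slide, and averaging several prompt-template embeddings per class are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def clip_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss; row i of img is assumed to match row i of txt."""
    logits = l2_normalize(img) @ l2_normalize(txt).T / temperature   # (B, B) cosine / T
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each slide picks its report
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each report picks its slide
    return (loss_i2t + loss_t2i) / 2


def zero_shot_predict(img, class_prompt_embs):
    """Assign each slide to the class whose averaged prompt embedding is most similar.

    class_prompt_embs: list with one (P_c, D) array of prompt-template embeddings per class.
    """
    class_embs = l2_normalize(
        np.stack([l2_normalize(p).mean(axis=0) for p in class_prompt_embs])
    )
    return np.argmax(l2_normalize(img) @ class_embs.T, axis=1)
```

Training pulls each slide toward its own report and away from the other reports in the batch. At test time no labeled examples are needed: class names are turned into text prompts, embedded, and compared with the slide embedding.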

a, Flow chart showing the model architecture of Prov-GigaPath. Prov-GigaPath first serializes each input WSI into a sequence of 256 × 256 image tiles in row-major order and converts each image tile into a visual embedding using an image tile-level encoder. Prov-GigaPath then applies a slide-level encoder based on the LongNet architecture to generate contextualized embeddings that can be used as the basis for various downstream applications. b, Image tile-level pre-training with DINOv2. c, Slide-level pre-training with LongNet using a masked autoencoder. [CLS], classification token.
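The row-major serialization in panel a, and the dilated attention pattern that lets LongNet handle such long tile sequences, can be sketched as below. This is a simplified illustration, not the Prov-GigaPath or LongNet code: it shows a single head and a single (segment length, dilation) branch without learned query/key/value projections, and the function names are hypothetical.

```python
import numpy as np


def serialize_row_major(coords, embs):
    """Reorder tile embeddings row by row (top to bottom, then left to right)."""
    coords = np.asarray(coords)
    order = np.lexsort((coords[:, 1], coords[:, 0]))   # last key (row) is the primary key
    return embs[order], coords[order]


def dilated_groups(n, segment, dilation, offset=0):
    """Token index groups for one (segment length, dilation) branch of dilated attention."""
    groups = []
    for start in range(0, n, segment):
        idx = np.arange(start + offset, min(start + segment, n), dilation)
        if idx.size:
            groups.append(idx)
    return groups


def dilated_attention(x, segment, dilation, offset=0):
    """Single-head dilated self-attention without learned projections.

    Attention runs only inside each group; tokens outside every group stay zero
    (in LongNet, other heads with different offsets cover those positions).
    """
    out = np.zeros_like(x)
    for idx in dilated_groups(len(x), segment, dilation, offset):
        q = k = v = x[idx]
        s = q @ k.T / np.sqrt(x.shape[1])
        s = s - s.max(axis=1, keepdims=True)
        p = np.exp(s)
        p = p / p.sum(axis=1, keepdims=True)
        out[idx] = p @ v
    return out
```

One branch scores about (n / segment) × (segment / dilation)² query-key pairs instead of n², so cost grows linearly with sequence length for a fixed segment and dilation. LongNet mixes several branches with increasing segment lengths and dilations, so nearby tiles interact densely and distant tiles sparsely.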

Research Results

This study demonstrates that Prov-GigaPath outperforms existing methods on a range of digital pathology tasks. Prov-GigaPath was pre-trained on Prov-Path, a large-scale dataset obtained from the Providence Healthcare System. The dataset contains 1,384,860,229 image tiles extracted from 171,189 pathology slides of approximately 30,000 patients. The model employs the GigaPath architecture, which leverages the LongNet method for ultra-large-scale context modeling of gigapixel WSIs.
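A back-of-the-envelope calculation from the dataset figures above shows why long-context modeling matters. This is an average only (individual slides vary), and the quadratic term assumes standard full self-attention, which scores every query-key pair:

```latex
\bar{n} = \frac{1{,}384{,}860{,}229\ \text{tiles}}{171{,}189\ \text{slides}} \approx 8{,}090\ \text{tiles per slide},
\qquad
\bar{n}^{2} \approx 6.5 \times 10^{7}\ \text{query-key pairs per attention head and layer}
```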

Prov-GigaPath showed significant improvements in mutation prediction and cancer subtyping tasks. For example, in the lung adenocarcinoma (LUAD)-specific five-gene mutation prediction task using TCGA data, Prov-GigaPath outperformed competing models with higher AUROC and AUPRC scores. Similar results were seen in the pan-cancer 18-biomarker prediction and pan-cancer tumor mutation burden (TMB) prediction tasks, demonstrating the robustness and generalizability of the model across various datasets.

In addition to mutation prediction, Prov-GigaPath also excelled in cancer subtype classification, outperforming state-of-the-art models for nine major cancer types. These performance gains highlight the effectiveness of combining local tile embeddings with global slide-level context using LongNet.

The researchers also explored vision-language processing with Prov-GigaPath by aligning pathology images with their associated text reports. The model achieved the best zero-shot classification results on the non-small cell lung cancer (NSCLC) and colorectal adenocarcinoma (COADREAD) subtyping tasks compared with three state-of-the-art pathology vision-language models. This suggests the benefit of slide-level alignment enabled by LongNet and of training on real-world clinical data rather than other sources such as Twitter (now X).

Conclusion

This study highlights the potential of Prov-GigaPath to enhance clinical diagnosis and decision support in digital pathology. Prov-Path, which contains 1,384,860,229 image tiles from 171,189 pathology slides of approximately 30,000 patients, is significantly larger than TCGA, and GigaPath uses LongNet for ultra-large-scale context modeling of gigapixel WSIs. Prov-GigaPath demonstrated state-of-the-art performance on mutation prediction, cancer subtype classification, and vision-language tasks on both Providence and TCGA datasets. Its scalability and adaptability suggest that it could be applied to broader biomedical domains for efficient self-supervised learning from high-resolution images.

