Eliminating batch effects in large-scale liquid chromatography/mass spectrometry experiments

A paper published in Nature Communications proposes batch-effect removal neural networks (BERNN), a family of models that remove batch effects in large-scale liquid chromatography-mass spectrometry (LC-MS) experiments with the goal of maximizing sample classification performance between conditions.

The paper explains that while LC-MS is a powerful method for profiling complex biological samples, batch effects typically arise because confounding factors are ubiquitous. These factors can be biological (e.g., age or gender) or non-biological (e.g., batch effects). Non-biological factors are virtually unavoidable in large-scale studies because of limited instrument availability and sample-collection schedules. Ideally, batch effects would be removed from the final biological quantitative values, but it can be difficult to eliminate them completely without degrading the quality of the biological signal. Because these effects can significantly affect the interpretability of results, correcting for them is critical to the reproducibility of omics studies. However, current methods are not optimal: they struggle to remove batch effects without also compressing the true biological variation under study.
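To make the idea of a batch effect concrete, here is a minimal Python sketch of the simplest kind of correction, per-batch mean centering, applied to a hypothetical toy dataset. This is a classic baseline, not one of the BERNN models, and the function name `center_per_batch` and the data are illustrative assumptions. When every batch contains both conditions (a balanced design), subtracting each batch's mean removes the batch offset while leaving the difference between conditions intact.

```python
import numpy as np


def center_per_batch(X, batches):
    """Subtract each batch's feature-wise mean from the samples in that batch."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    Xc = X.copy()
    for b in np.unique(batches):
        mask = batches == b
        Xc[mask] -= X[mask].mean(axis=0)
    return Xc


# Hypothetical balanced design: 2 batches x 2 conditions, one feature.
# Cases (condition 1) add +2 to the signal; batch 1 adds an offset of +5.
batches = np.array([0, 0, 0, 0, 1, 1, 1, 1])
conditions = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X = (2.0 * conditions + 5.0 * batches).reshape(-1, 1)

Xc = center_per_batch(X, batches)
effect = Xc[conditions == 1].mean() - Xc[conditions == 0].mean()
print(effect)  # 2.0: the case/control difference survives centering
```

Real LC-MS batch effects are rarely a single additive offset per batch, which is one reason more flexible, learned corrections such as those in BERNN are of interest.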

The authors of this multi-partner paper, representing laboratories in the United States, Canada, the Netherlands, France, and the United Kingdom, present several approaches to combating batch effects. Unlike most existing solutions, which offer a single method, their approach acknowledges that not every problem calls for the same solution and instead proposes multiple candidate models. The aim is to make it easy for researchers to try several methods at once and choose the one best suited to their dataset or scientific question.

Among this family of models, the authors present the first use of a variational autoencoder (VAE), a domain adversarial neural network (DANN), and a domain inverse triplet loss (invTriplet) for batch correction in LC-MS. Furthermore, unlike other batch-correction methods, they do not recommend using the autoencoder's corrected output for downstream biomarker discovery (e.g., differential analysis). Instead, they show how SHapley Additive exPlanations (SHAP) (2) can be used for biomarker discovery. SHAP is a game-theoretic approach that uses classic Shapley values and their extensions to explain the output of a machine learning model, connecting optimal credit allocation with local explanations.
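The Shapley value underlying SHAP gives each feature its average marginal contribution to a prediction, taken over all subsets of the other features. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature model, using a single all-zero baseline to stand in for "missing" features. It illustrates the formula only; it is not the SHAP library's API or the authors' pipeline, and practical tools rely on approximations because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(x):
    # Hypothetical "trained model": feature 2 is ignored on purpose.
    return 3 * x[0] + 2 * x[1] + x[0] * x[1]


x = [1.0, 1.0, 1.0]         # sample to explain
baseline = [0.0, 0.0, 0.0]  # assumed reference values for absent features


def value(subset):
    # Features in the coalition take the sample's values; the rest use the baseline.
    z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(z)


phi = shapley_values(value, 3)
print([round(p, 6) for p in phi])  # [3.5, 2.5, 0.0]
```

The contributions sum to f(x) - f(baseline) = 6 (the efficiency property), the interaction term is split evenly between features 0 and 1, and the ignored feature receives zero credit. Ranking features by such attributions is the general idea behind using SHAP to shortlist candidate biomarkers.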

A comparison of batch-effect correction methods across five diverse datasets (Alzheimer's disease, adenocarcinoma, aged mice, benchmark, and mixed tissue) shows that the BERNN models consistently deliver the strongest sample classification performance. However, the model that yields the greatest improvement in classification does not always perform best at removing batch effects.
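Classification accuracy and batch-effect removal are separate criteria, so they need separate measurements. One simple measure of residual batch effect for a single feature, assumed here for illustration, is the fraction of variance explained by the batch labels (eta squared); the paper's own batch-effect metrics may differ. The sketch computes this fraction for batch and for condition on a hypothetical toy feature, before and after per-batch mean centering.

```python
import numpy as np


def variance_explained(x, labels):
    """Eta squared: between-group sum of squares divided by total sum of squares."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    grand = x.mean()
    total = ((x - grand) ** 2).sum()
    between = sum(
        (labels == g).sum() * (x[labels == g].mean() - grand) ** 2
        for g in np.unique(labels)
    )
    return between / total


# Hypothetical balanced toy feature: cases add +2, batch 1 adds +5.
batches = np.array([0, 0, 0, 0, 1, 1, 1, 1])
conditions = np.array([0, 0, 1, 1, 0, 0, 1, 1])
x = 2.0 * conditions + 5.0 * batches

# Per-batch mean centering as a stand-in for "some correction method".
corrected = x - np.array([x[batches == b].mean() for b in batches])

for name, values in [("raw", x), ("corrected", corrected)]:
    print(name,
          round(variance_explained(values, batches), 3),
          round(variance_explained(values, conditions), 3))
```

This prints `raw 0.862 0.138` and `corrected 0.0 1.0`: the share of variance tied to batch drops to zero while the share tied to condition rises. A method can score well on a metric like this and still not give the best classifier, which is why the two criteria are reported separately.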

The paper also shows that overcorrecting for batch effects removes some of the essential biological variability. These findings highlight the importance of balancing batch-effect removal against the preservation of valuable biological variation in large-scale LC-MS experiments.
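Overcorrection is easiest to see when biology and batch are confounded. In the hypothetical design below, every control sample was run in batch 0 and every case in batch 1, so per-batch mean centering (a simple baseline, not a BERNN model) cannot tell the true case effect apart from the batch offset and removes both. The data and the `center_per_batch` helper are illustrative assumptions.

```python
import numpy as np


def center_per_batch(X, batches):
    """Subtract each batch's feature-wise mean from the samples in that batch."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    Xc = X.copy()
    for b in np.unique(batches):
        mask = batches == b
        Xc[mask] -= X[mask].mean(axis=0)
    return Xc


# Hypothetical confounded design: all controls ran in batch 0, all cases in batch 1.
batches = np.array([0, 0, 0, 0, 1, 1, 1, 1])
conditions = batches.copy()
# True case effect is +2, batch offset is +5, plus a little within-group spread.
spread = np.array([-0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5])
X = (2.0 * conditions + 5.0 * batches + spread).reshape(-1, 1)


def case_minus_control(M):
    return M[conditions == 1].mean() - M[conditions == 0].mean()


raw_effect = case_minus_control(X)  # 7.0: true effect (2) plus batch offset (5)
corrected_effect = case_minus_control(center_per_batch(X, batches))  # 0.0: both removed
print(raw_effect, corrected_effect)
```

Because the case effect here is perfectly confounded with batch, the data alone cannot separate the two; a correction aggressive enough to erase the batch offset also erases the biology.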

The authors see their contribution to researchers facing batch effects as threefold. First, they demonstrate the effectiveness of models that, to their knowledge, had never before been applied to batch-effect correction in LC-MS experiments. Second, they show that different problems call for different models, so several should be tried. Finally, they show that, to obtain the best classification on a given dataset, removing some of the batch effect can improve results, but removing too much of it can degrade classification performance.


