A computational pipeline for early identification of new SARS-CoV-2 variants
In a recent study posted to the medRxiv* preprint server, researchers presented a computational pipeline for the early identification of emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of interest (VOIs). The pipeline analyzes SARS-CoV-2 genomic data and assigns risk scores based on functional and epidemiological parameters.
Background
As SARS-CoV-2 variants with enhanced immune evasion, transmissibility, and replication continue to emerge, viral genome evolution must be monitored. Early detection of SARS-CoV-2 VOIs may enable prioritization of variants for evaluation, risk assessment, and optimization of public health measures against SARS-CoV-2.
About the study
In the current study, researchers developed a computational heuristic framework to rapidly detect new SARS-CoV-2 VOIs and prioritize them for wet-lab experiments.
Genomic data for each variant were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), GenBank, and Bacterial and Viral Bioinformatics Resource Center (BV-BRC) databases. The sequences were processed to identify high-priority VOIs for wet-lab experiments. Variants were prioritized by their epidemiological dynamics and inferred functional characteristics, expressed as sequence prevalence scores, functional impact scores, and composite scores.
The framework ranked variant constellations (or covariants) to determine which mutation combinations to evaluate, and the detected Omicron variants were used to validate the computational approach. Genomes were pairwise aligned to the reference (Wuhan-Hu-1 strain) genome, and variant constellations, mainly of the SARS-CoV-2 spike (S) protein, were extracted. Collection dates and regions were used to calculate spatiotemporal epidemiological dynamics, namely monthly variant growth and prevalence.
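The extraction step can be illustrated with a minimal Python sketch. It assumes the reference and query S protein sequences have already been pairwise aligned into two equal-length gapped strings. The function name `call_mutations` and the choice to skip insertions and ambiguous residues are illustrative assumptions, not details taken from the study. Deletions are written in the same style the article uses, for example L24-.

```python
def call_mutations(ref_aln: str, qry_aln: str) -> list[str]:
    """List substitutions and deletions in a query relative to a reference.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Positions are numbered along the ungapped reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0
    for ref_res, qry_res in zip(ref_aln, qry_aln):
        if ref_res == "-":
            # Insertion relative to the reference: skipped in this sketch.
            continue
        ref_pos += 1
        if qry_res == "X":
            # Ambiguous residue: treated as missing data (an assumption).
            continue
        if qry_res != ref_res:
            # Substitutions look like 'N501Y', deletions like 'L24-'.
            mutations.append(f"{ref_res}{ref_pos}{qry_res}")
    return mutations


def constellation(mutations: list[str]) -> frozenset[str]:
    """Treat the unordered set of mutations as one variant constellation."""
    return frozenset(mutations)
```

Using a `frozenset` makes two genomes with the same mutations compare equal regardless of order, so constellations can serve as dictionary keys when counting sequences per country and month.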
The sequence prevalence score (SPS) was calculated from the most recent three months of GISAID data, covering the period of Omicron dominance that began in November 2021. A score of 1 was assigned to each country-month combination in which the sequence prevalence exceeded 5% or the growth rate increased more than 5-fold over the previous month. These scores were summed across all country-month combinations to obtain the final sequence prevalence score.
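The scoring rule above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the input layout (a dictionary keyed by country and month), the function name, and the reading of "growth" as the month-over-month ratio of prevalence are all hypothetical choices, and the study may define growth differently.

```python
def sequence_prevalence_score(
    prevalence: dict[tuple[str, str], float],
    months: list[str],
    prevalence_cutoff: float = 0.05,
    fold_cutoff: float = 5.0,
) -> int:
    """Count country-month combinations that pass either threshold.

    prevalence maps (country, month) to the fraction of sequences carrying
    the variant; months lists the scored months in chronological order.
    """
    countries = {country for country, _ in prevalence}
    score = 0
    for country in countries:
        for i, month in enumerate(months):
            current = prevalence.get((country, month), 0.0)
            previous = prevalence.get((country, months[i - 1]), 0.0) if i > 0 else 0.0

            high_prevalence = current > prevalence_cutoff
            # Assumption: growth is the ratio of this month's prevalence to
            # last month's; it is undefined when last month's value is zero.
            fast_growth = previous > 0 and current / previous > fold_cutoff

            if high_prevalence or fast_growth:
                score += 1
    return score


# Made-up example values, for illustration only.
example = {
    ("ZA", "2021-11"): 0.01, ("ZA", "2021-12"): 0.30, ("ZA", "2022-01"): 0.60,
    ("UK", "2021-11"): 0.001, ("UK", "2021-12"): 0.006, ("UK", "2022-01"): 0.02,
}
example_sps = sequence_prevalence_score(example, ["2021-11", "2021-12", "2022-01"])
```

In the made-up example, the first country scores in two months through the 5% prevalence threshold, and the second scores in one month through the growth threshold alone.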
A functional impact score (FIS) was derived by summing sequence features of concern (SFoC) scores based on the positional overlap of mutations with SARS-CoV-2 S regions. The SFoC score reflected the effect of a variant on replication, immune evasion, binding to the angiotensin-converting enzyme 2 (ACE2) receptor or monoclonal antibodies, and neutralization by antibodies elicited through vaccination or previous infection. A composite score (CS) was calculated by summing the sequence prevalence score (SPS) and the functional impact score. Emerging lineage scores were calculated from GISAID data from December 2021 to January 2022 by summing the scores of lineages with growth rates greater than 15.
Results
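A minimal Python sketch of the functional impact and composite scores is given below. The region boundaries are the S sites named in this article, but the table name `SFOC_REGIONS`, the weight of 1 per region, and the rule that a region counts once when any mutation falls inside it are illustrative assumptions rather than the study's actual SFoC table or weighting.

```python
import re

# Hypothetical SFoC table: (label, first site, last site, score).
# Boundaries come from the article; the equal weights are an assumption.
SFOC_REGIONS = [
    ("NTD 14-20", 14, 20, 1),
    ("NTD 140-158", 140, 158, 1),
    ("NTD 245-264", 245, 264, 1),
    ("RBD site 501", 501, 501, 1),
    ("Site 614", 614, 614, 1),
    ("Furin cleavage 671-692", 671, 692, 1),
]


def functional_impact_score(mutations, sfoc_regions=SFOC_REGIONS) -> int:
    """Sum the scores of all SFoC regions hit by at least one mutation."""
    # Pull the site number out of labels such as 'N501Y' or 'L24-'.
    sites = {int(re.search(r"\d+", m).group()) for m in mutations}
    score = 0
    for _label, first, last, weight in sfoc_regions:
        if any(first <= site <= last for site in sites):
            score += weight
    return score


def composite_score(sps: int, fis: int) -> int:
    """Composite score as the plain sum of SPS and FIS."""
    return sps + fis


fis = functional_impact_score(["N501Y", "D614G", "L24-"])
cs = composite_score(3, fis)
```

Because the composite score is a plain sum, a variant with a low prevalence score can still rank highly if its mutations overlap many functionally important regions, which matches the pattern the article reports for Omicron.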
The team identified 75 regions in the SARS-CoV-2 S receptor-binding domain (RBD) that had a significant impact on the binding of 4 or fewer antibodies and 36 regions that had a significant impact on the binding of vaccine-induced or convalescent serum antibodies. Twelve sites with one or more mutations above a threshold (>0.1) were identified as exhibiting enhanced ACE2 affinity. Among them, site 501 was the location of multiple conformational changes in the SARS-CoV-2 S RBD binding interaction.
Sites critical for adaptive immune responses and SARS-CoV-2 tropism included the SARS-CoV-2 S N-terminal domain (NTD) sites 14-20, 140-158, and 245-264, site 614, and S sites 671-692, which span the furin cleavage site. Omicron's epidemiological data showed a low SPS but a significantly higher FIS, resulting in higher CS values. The CS could also quantify small differences between covariants within a single clade. BA.1 was the dominant Omicron lineage in December 2021 and had the highest emerging lineage score.
By January 2022, Omicron lineages such as BA.1, BA.1.1, and BA.2 had evolved with multiple covariants. The BA.2 variant constellation largely matched that of Omicron BA.1 but included multiple unique mutation sites. The BA.1 covariant carrying the R346K mutation showed a higher functional impact score than Omicron BA.1. In contrast, many covariants had a sequence prevalence score of 0, indicating no significant threat from changes in growth.
Prior to January 2022, the N440K, G446S, L24-, R346K, A701V, and L452R mutations emerged sporadically. Mutational dynamics plots showed that the G446S and R346K mutations became less prevalent while L24- concomitantly became more prevalent. This finding points to a fitness advantage of L24- containing variants and may help distinguish between BA.2 and BA.1.
Conclusion
Overall, the study findings highlighted a new computational spatiotemporal framework for detecting SARS-CoV-2 variants at an early stage based on sequence prevalence, mutation prevalence, and the mutational impact on SARS-CoV-2 functions such as binding to the ACE2 receptor. Developing the framework posed some challenges, including ambiguous variations in sequence data during the emergence of the Delta and Omicron variants, accurate data quantification for computation, and the analysis of large and continuously growing datasets.
*Important notice
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related actions, or treated as established information.
