Large-scale genome sequencing shows how SARS-CoV-2 mutated

Since the onset of the pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), scientists around the globe have been sequencing the viral RNA genome.
The viral genome has changed repeatedly, with an early variant being followed by another strain that became predominant first in some countries and now worldwide. A new study published on the preprint server bioRxiv* describes the results of large-scale viral genome sequencing, which can help trace the transmission of the virus from person to person and from country to country.
The original sequencing results showed that the viral genome was very similar to two other SARS-like coronaviruses originally derived from bats in Yunnan, China, namely RaTG13 and RmYN02. Tracing the descent of SARS-CoV-2 confirmed its close identity with these viruses and showed it to be distinct from the earlier coronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). Some scientists believe that the current virus may be derived from one of the bat variants via an intermediate host.
Much of the current study focuses on over 4,000 full-length sequences of the viral genome. Most came from the EpiFlu database of the Global Initiative on Sharing All Influenza Data (GISAID), and 11 came from a Chinese database. The sequences were uploaded during the 14 weeks following the onset of the Wuhan outbreak. The researchers examined their mutations to characterize the genotypes.
They also analyzed another set of approximately 261,000 genomes collected worldwide during the 12 months since the pandemic began. This set comprises all the genomes in the database.

The major genotypes identified in this study. (A) Unsupervised mutation clustering of all samples, including mutations that were called concurrently in at least 5 samples. Eleven distinctive major mutation profiles were identified based on tree-branch clustering and were named mainly after the geographic location where a particular genotype was first or primarily reported. A two-letter ISO country code indicates the country associated with each mutation profile (shown in the color bar below). The color bar above shows the genotype uniformity within each clustering tree branch. (B) Family tree of the major genotypes. Combining mutation clustering with the available epidemiological information characterized 11 distinctive major genotypes, and the family tree shows the relationships between them. The genotypes of the Diamond Princess and the Grand Princess, which are derived from the M type and the SEA type respectively, are indicated by dashed arrows.
Superspreaders introduced different genotypes
The researchers were able to identify different genotypes based on how common particular mutations were. This helped them track the superspreaders, who shaped the pandemic to a large extent. These individuals transmitted a particular genotype carrying certain very common mutations. A single introduction of such a genotype led to an outbreak of infection, and the virus evolved further as it spread.
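The idea of separating genotypes by how common their mutations are can be illustrated with a minimal sketch. It is not the study's method: the article describes unsupervised clustering, whereas the code below simply counts how many genomes carry each mutation, keeps the recurrent ones, and groups samples that share exactly the same recurrent-mutation profile. The sample IDs, mutation names, and the function names `recurrent_mutations` and `group_by_profile` are all hypothetical.

```python
from collections import Counter


def recurrent_mutations(samples, min_samples=5):
    """Return {mutation: count} for mutations seen in at least `min_samples` genomes."""
    counts = Counter()
    for mutations in samples.values():
        counts.update(set(mutations))  # count each mutation once per genome
    return {m: n for m, n in counts.items() if n >= min_samples}


def group_by_profile(samples, kept):
    """Group sample IDs by their profile of recurrent mutations.

    Simplification: samples are grouped only when their profiles match
    exactly; the study used unsupervised clustering instead.
    """
    groups = {}
    for sample_id, mutations in samples.items():
        profile = frozenset(m for m in mutations if m in kept)
        groups.setdefault(profile, []).append(sample_id)
    return groups


# Hypothetical toy data: sample ID -> mutations relative to a reference genome.
samples = {
    "s1": ["A100G", "C200T"],
    "s2": ["A100G", "C200T", "G999A"],
    "s3": ["A100G"],
    "s4": ["T300C"],
}

# The toy set is tiny, so the threshold is lowered from 5 to 2 here.
kept = recurrent_mutations(samples, min_samples=2)
groups = group_by_profile(samples, kept)

for profile, members in groups.items():
    print(sorted(profile), members)
```

Rare mutations such as the hypothetical `G999A` are dropped before grouping, so `s1` and `s2` end up in the same group even though `s2` carries one extra, sporadic change.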
Among these superspreader genotypes, the M type accounted for more than 80% of the sequences studied. From estimates of the expected substitution rate in the genome, the researchers conclude that this is not the result of many identical superspreader genomes but a true founder effect.
Six progeny genotypes
They discovered six progeny genotypes that were derived directly from the ancestral strain through characteristic mutations. The most common of these was the WE1 type, which is defined by four mutations. Three of the four defining mutations of the WE1 strain were found in three early samples collected in January 2020. WE1 accounts for 70% of genomes from Western Europe (the UK, Iceland, Belgium, France and the Netherlands), possibly linked to cross-border travel, and for about 35% of cases in the United States.
The SEA type, which is the most common in the United States, has also been isolated in three other countries, Australia, Canada and Iceland, indicating that these cases were imported from the United States. The SEA type is also known as the Washington outbreak clade. The other four progeny genotypes were regionally limited.
The researchers examined mutations in the strains circulating in four regions and concluded that the M type had spread from Wuhan to other parts of China before Wuhan was locked down. An examination of 34 sequences from early Wuhan cases revealed two clusters. Thirty belonged to the M type, though with some diversity, and the remaining four formed a separate co-circulating cluster. Even at this early stage, there were thus 18 different genotypes among the 34 sequences.
In the United States, the epidemic strains belonged to non-M types, probably originating from 12 cases imported from Hubei province. These were the earliest cases reported in the United States, and each showed a different genotype.
Half of the cases in the United States were of the SEA type, and about 35% were WE1. That is, the United States experienced a first wave of case imports from China and a second wave from Europe. This is consistent with a recent COVID-19 study in Washington. Among the 32 patients on two cruise ships, the Grand Princess and the Diamond Princess, there were 25 different genotypes. This indicates that the virus mutates rapidly and widely during human-to-human transmission.
Strain of Origin algorithm is more accurate
The researchers developed a Strain of Origin (SOO) algorithm that matches each genome to its genotype by mutational profile. This approach showed 90% agreement with mutation clustering. According to the researchers, SOO is a more accurate approach to genotyping because it considers only the specific mutations that define a particular genotype and gives little weight to the remaining random mutations.
Using the same approach, they found that three of the top four GISAID clades were descendants of WE1. They estimated that one in three nucleotides of the viral RNA had mutated over the 12 months of the pandemic.
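The principle behind Strain of Origin genotyping, looking only at the mutations that define a genotype and ignoring everything else, can be sketched as follows. This is an illustration under stated assumptions, not the researchers' implementation: the function `strain_of_origin`, the tie-breaking rule (the genotype with the most defining mutations wins), and the placeholder mutation names `mutA` to `mutF` are all hypothetical. The only detail taken from the article is that the M type is defined by two co-mutations and WE1 by four further ones.

```python
def strain_of_origin(genome_mutations, signatures, ancestral="ancestral"):
    """Assign a genome to the genotype whose defining mutations it fully carries.

    Only the defining (signature) mutations are checked; all other mutations
    in the genome are ignored. If several genotypes match, the one with the
    largest signature, i.e. the most derived one, is returned. This
    tie-breaking rule is an assumption made for this sketch.
    """
    observed = set(genome_mutations)
    best, best_size = ancestral, 0
    for genotype, signature in signatures.items():
        if signature <= observed and len(signature) > best_size:
            best, best_size = genotype, len(signature)
    return best


# Placeholder signatures: the real defining mutations are not listed in the article.
signatures = {
    "M": frozenset({"mutA", "mutB"}),
    "WE1": frozenset({"mutA", "mutB", "mutC", "mutD", "mutE", "mutF"}),
}

# "rnd1" and "rnd2" stand for sporadic mutations that do not define any genotype.
print(strain_of_origin(["mutA", "mutB", "rnd1"], signatures))
print(strain_of_origin(["mutA", "mutB", "mutC", "mutD", "mutE", "mutF", "rnd2"], signatures))
print(strain_of_origin(["rnd1"], signatures))
```

Because sporadic mutations never enter the comparison, two genomes of the same genotype are assigned identically even when they differ at other positions, which is the property the researchers credit for the method's accuracy.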
Pandemic story
They analyzed the top 100 mutations and built a lineage-based family tree. The story begins with a hypothetical first case, presumed to be a patient carrying the ancestral SARS-CoV-2 genotype and assumed to have occurred on November 17, 2019. This case led to further infections. By January 1, 2020, when the Huanan (South China) Seafood Market was closed, 19 M-type genome samples had been recorded.
However, the M type had already been circulating in the market for several weeks, and by then most genomes belonged to the M type. Wuhan was locked down on January 23, 2020 as the outbreak spread throughout the city, by which time 80% of viral genomes were of the M type. However, the Chinese New Year had already prompted large-scale travel to and from Wuhan, leading to COVID-19 outbreaks across China and around the world.
By April 7, 2020, more than 80% of cases worldwide were of the M type, and by September 70% belonged to WE1, spread across three clades: GR, G and GH. The share of the M type continued to rise, reaching about 98% of cases by December 25, with nearly 90% caused by the WE1 strain.
Importance of research
The researchers conclude that the pandemic exploded around the world from a single superspreader event, following the first few weeks in which the M type spread unrecognized and uncontrolled. The M type first acquired two co-mutations, then another four defining mutations that led to the emergence of the WE1 strain, and finally three more that led to the WE1.1 strain. The rate of viral evolution, about 27 substitutions per year, is not unusual, but the mechanism is still unknown.
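To put the figure of about 27 substitutions per year in context, it can be converted into a per-site rate and an average interval between substitutions. The short calculation below is a back-of-the-envelope sketch. The genome length of 29,903 nucleotides is an assumption not stated in the article (it is the length of the Wuhan-Hu-1 reference genome), and the constant names are chosen here for illustration.

```python
# Assumption: length of the SARS-CoV-2 reference genome (Wuhan-Hu-1), in nucleotides.
GENOME_LENGTH_NT = 29_903
# Figure quoted in the article.
SUBSTITUTIONS_PER_YEAR = 27

# Average substitutions per nucleotide site per year along a lineage.
per_site_rate = SUBSTITUTIONS_PER_YEAR / GENOME_LENGTH_NT

# Average number of days between successive substitutions along a lineage.
days_per_substitution = 365 / SUBSTITUTIONS_PER_YEAR

print(f"{per_site_rate:.2e} substitutions per site per year")
print(f"about one substitution every {days_per_substitution:.1f} days")
```

Under these assumptions the rate comes out slightly below one substitution per thousand sites per year, or roughly one new substitution every two weeks along a single lineage.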
Two new mutants of interest, one carrying the D614G point mutation and the other the N501Y mutation in the receptor-binding domain of the spike protein, are both considered more contagious than the ancestral strains. The former was first recorded in Western Europe in February 2020 and now accounts for about 90% of strains, while the latter was first detected in New York City on April 21, 2020 and accounts for only 0.02% of cases.
The researchers caution that the study cannot distinguish between neutral and adaptive mutations. However, they say that the genotype of the virus acts as a unique identifier, helping to trace its transmission pattern backwards and reveal its pattern of spread going forwards. They point out that their algorithms may help match viral genomes to genotypes. New genotypes can also be incorporated into the algorithms as they emerge, further improving their performance.
“This study not only provides an unprecedented window into the global transmission trajectory of SARS-CoV-2 in the early stages, but also reveals the subsequent expansion patterns of the pandemic.”
Large-scale genomic sequencing is therefore very helpful for tracking such patterns in outbreaks caused by new pathogens and for developing rapid measures to contain them in the worst-affected areas.
*Important notice
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and therefore should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
