Date: 2023-05-17

In a recent study published in Nature Microbiology, researchers used shotgun sequencing to extract human reads from deoxyribonucleic acid (DNA) in stool samples from 343 Japanese individuals, who comprised the primary dataset of the study.

They used this gut metagenomic data to reconstruct personal information. Some study participants also provided whole-genome sequencing (WGS) data for ultra-deep metagenomic shotgun sequencing analyses.

Study: Reconstruction of the personal information from human genome reads in gut metagenome sequencing data. Image credit: KaterynaKon/Shutterstock.com

Background

Our knowledge of the human microbiome, the microorganisms that inhabit the human body, has expanded significantly over the past decade thanks to rapid advances in technologies such as metagenomic shotgun sequencing.

This technique also sequences the non-bacterial components of microbiome samples, including host DNA. For example, fecal samples contain less than 10% host DNA, and these human reads are usually removed to protect donor privacy.

Human germline genotypes in metagenomic data matter because they can enable re-identification of individuals. Researchers and donors should be aware that this information is highly sensitive, and sharing it with the community should be considered carefully.

Aside from the ethical concerns about sharing this data, there is a need to understand what kind of personal information (such as sex or ancestry) could be recovered if human reads are not removed from metagenomic data before deposition.

In addition, human reads in gut metagenomic data may be an excellent resource for stool-based forensics, robust variant calling, and estimation of disease risk (e.g., type 2 diabetes) based on polygenic risk scores.

This data may be useful in quantitatively and accurately reconstructing genotypic information and thus may complement human WGS data.

About the study

In the present study, the researchers used human reads from the gut metagenomic data of the primary dataset to reconstruct personal information, including genetic sex and ancestry. To predict the genetic sex and ancestry of these 343 individuals, they used sex chromosome read-depth and modified likelihood score-based methods, respectively.

In addition, the researchers developed methods to re-identify individuals from genotype datasets. Furthermore, they combined two harmonized genotype-calling approaches, direct calling of rare variants and two-step imputation of common variants, to reconstruct genotypes.

The study's primary dataset included 343 Japanese participants, while the validation dataset for the genetic sex prediction analysis included 113 Japanese individuals.

A multi-ancestry dataset, which helped researchers validate their ancestry prediction analysis, consisted of 73 individuals of different nationalities, including samples from individuals in New Delhi, India.

The three datasets comprised 196 and 147, 65 and 48, and 25 and 48 male and female participants, respectively. Similarly, the age ranges for these three datasets were 20–88, 20–81, and 20–61 years, respectively.

Results and conclusions

Given that human reads in the gut metagenomic data were obtained consistently from all chromosomes, the read depth of the X chromosome in females was nearly twice that in males, whereas Y-chromosome reads were expected essentially only in males.

So, in a logistic regression analysis, the researchers applied a Y:X chromosome read-depth ratio threshold of 0.43 to the validation dataset, which correctly predicted the genetic sex of 97.3% of the samples.
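The thresholding step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and the example depths are hypothetical, and classifying samples above the cutoff as male is an assumption based on the Y chromosome being present only in males. Only the 0.43 cutoff comes from the article.

```python
# Cutoff reported in the article; everything else here is an illustrative assumption.
Y_TO_X_THRESHOLD = 0.43


def y_to_x_ratio(x_depth: float, y_depth: float) -> float:
    """Return the Y:X read-depth ratio for one sample."""
    if x_depth <= 0:
        raise ValueError("X-chromosome read depth must be positive")
    return y_depth / x_depth


def predict_genetic_sex(x_depth: float, y_depth: float,
                        threshold: float = Y_TO_X_THRESHOLD) -> str:
    """Classify a sample as 'male' when its Y:X ratio exceeds the threshold."""
    return "male" if y_to_x_ratio(x_depth, y_depth) > threshold else "female"


# Hypothetical per-sample depths (mean human read depth on chrX and chrY).
samples = {"sample_1": (0.020, 0.018), "sample_2": (0.040, 0.001)}
for name, (x_depth, y_depth) in samples.items():
    print(name, predict_genetic_sex(x_depth, y_depth))
```

Because the ratio divides one depth by another, it does not depend on how many human reads a sample contains overall, which is helpful when host DNA makes up only a small fraction of the stool sample.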

In human microbiome and genetic studies, predicting genetic sex from gut metagenomic data could help identify and remove mislabeled samples.

The analysis also correctly predicted the ancestry of 98.3% of individuals, using the 1,000 Genomes Project (1KG) data as a reference.

However, the likelihood score-based method often misclassified South Asian (SAS) samples as admixed American (AMR) or European (EUR), especially when the number of human reads was low. This is not surprising, as the genetic diversity of SAS populations is complex.
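The general idea behind a likelihood score for ancestry can be sketched as follows. This is a simplified illustration and not the modified score used in the study: the function names, the site identifiers, and the allele frequencies are all made up, and the model simply treats each read as an independent draw from a population's allele frequencies.

```python
import math


def log_likelihood(observations, allele_freqs, eps=1e-3):
    """Sum the log-probabilities of the observed alleles under one population.

    observations: list of (site_id, is_alt) pairs, one per read covering a site.
    allele_freqs: dict mapping site_id to that population's alternate-allele frequency.
    eps: clipping bound that keeps frequencies away from 0 and 1 so log() stays finite.
    """
    total = 0.0
    for site, is_alt in observations:
        p = min(max(allele_freqs[site], eps), 1 - eps)
        total += math.log(p if is_alt else 1 - p)
    return total


def predict_ancestry(observations, freqs_by_population):
    """Return the best-scoring population and the full score table."""
    scores = {pop: log_likelihood(observations, freqs)
              for pop, freqs in freqs_by_population.items()}
    return max(scores, key=scores.get), scores


# Toy reference panel with invented frequencies (not 1KG values).
reference = {
    "EAS": {"s1": 0.9, "s2": 0.1, "s3": 0.8},
    "EUR": {"s1": 0.2, "s2": 0.7, "s3": 0.3},
}
reads = [("s1", True), ("s2", False), ("s3", True)]
best, scores = predict_ancestry(reads, reference)
print(best, scores)
```

With only a handful of reads the scores of similar populations sit close together, which is one intuitive reason why low human read counts make misclassification more likely.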

The likelihood score-based method also made efficient use of data from genomic regions with low coverage and successfully re-identified 93.3% of individuals, demonstrating the quantitative power of gut metagenomic data for re-identification.
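Re-identification against a genotype dataset can be illustrated in the same spirit. The sketch below is an assumption-laden toy, not the study's method: the names, the genotypes, and the 1% error rate are hypothetical, and each read is scored by how well it agrees with a candidate's genotype dosage (0, 1, or 2 copies of the alternate allele).

```python
import math


def match_score(read_alleles, genotype, error=0.01):
    """Log-likelihood of the observed read alleles given one person's genotypes.

    read_alleles: list of (site_id, is_alt) pairs from the metagenomic human reads.
    genotype: dict mapping site_id to alternate-allele dosage (0, 1, or 2).
    error: assumed sequencing error rate, also used to keep log() finite.
    """
    score = 0.0
    for site, is_alt in read_alleles:
        p_alt = genotype[site] / 2           # chance a random read carries the alt allele
        p_alt = min(max(p_alt, error), 1 - error)
        score += math.log(p_alt if is_alt else 1 - p_alt)
    return score


def best_match(read_alleles, genotypes_by_person):
    """Return the candidate whose genotypes best explain the reads."""
    return max(genotypes_by_person,
               key=lambda person: match_score(read_alleles, genotypes_by_person[person]))


# Invented genotype dataset with two candidates.
genotypes = {
    "donor_A": {"s1": 2, "s2": 0, "s3": 1},
    "donor_B": {"s1": 0, "s2": 2, "s3": 1},
}
reads = [("s1", True), ("s2", False), ("s3", True)]
print(best_match(reads, genotypes))
```

Every covered site contributes to the score independently, so even sites seen by a single read add evidence, which fits the article's point about using low-coverage regions.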

Despite ethical concerns, the re-identification method used in this study may be useful for quality control of multi-omics datasets containing gut metagenomic and human germline genotype data.

Furthermore, the authors successfully reconstructed common variants across the genome using two-step imputation. Historically, researchers have used stool samples as a source of germline genomes for wild and domestic animals, but not for humans.

Therefore, further development of suitable methodologies could help to efficiently utilize the human genome within gut metagenomic data and benefit animal studies.

Nonetheless, this study remarkably demonstrated that the optimized method could be useful in reconstructing personal information from human gut metagenomic data reads.

Furthermore, the results of this study may serve as a guiding resource for devising best practices for using the already accumulated human gut metagenomic data.