24 February 2021



The new database collects individual-level COVID-19 data, such as travel history and when a person first feels sick. Credit: RMB / Chugoku Shimbun / Getty

A vast international database released today will help epidemiologists answer burning questions about the coronavirus SARS-CoV-2, such as how quickly new variants spread among people, whether vaccines protect against them, and how long immunity to COVID-19 lasts.

Unlike the global COVID-19 dashboard maintained by Johns Hopkins University in Baltimore, Maryland, and other popular trackers that list overall COVID-19 infections and deaths, the new repository, from a data-science initiative called Global.health, collects an unprecedented amount of anonymized information on individual cases in one place. The database contains up to 40 relevant variables for each individual, including the date when they first had COVID-19 symptoms, the date of a positive test, and their travel history.

Such individual-level data give epidemiologists the clues they need to determine how the disease is spreading, says Caitlin Rivers, an epidemiologist at Johns Hopkins who is part of the project. By the time the significance of an outbreak is understood, she says, it is often too late. The data can close that loop and speed up the process.

Researchers hope that the database will help monitor coronavirus variants and vaccines in the coming months and provide templates for tracking real-time data in future epidemics.

The repository was created by 21 researchers from seven academic institutions in the United States and Europe, with technical and financial support from Google and the Rockefeller Foundation. So far, the team has collected information from 24 million cases in approximately 150 countries.

At launch, the Global.health website provided data visualizations such as this map showing the partial distribution of COVID-19 infections in the repository. Users can click the Variants button to see where these specific viruses are reported. Dark blue indicates that there are many cases in the repository, and light blue indicates that there are few cases. Credit: Global.health

Rivers adds that such a database would have been useful early in the SARS-CoV-2 outbreak. Epidemiologists might have been able to confirm that the coronavirus was spreading frequently from person to person in China before the World Health Organization confirmed it on 23 January last year.

Some scientists say the arrival of a comprehensive, international and publicly available repository will fuel research in several areas. Robert Garry, a virologist at Tulane University in New Orleans, Louisiana, says this is really good and needs to be done. Not having anything like it has been very difficult, he adds.

Collective effort

With each outbreak, epidemiologists gather information from newspaper articles and health agencies and organize it into homemade spreadsheets. Details such as a person’s symptoms, age and how they might have been infected can help researchers identify the cause of the disease, its contagiousness and its mortality rate.

By mid-January 2020, epidemiologists had done this for SARS-CoV-2, but had not reached agreement on what they had found. Epidemiologist Sam Scarpino, who directs the Emergent Epidemics Lab at Northeastern University in Boston, Massachusetts, tweeted that the evidence did not confirm sustained human-to-human transmission. He remembers Rivers replying to him in a direct message, saying: hey, I think you’re wrong.

The data were still ambiguous. However, another epidemiologist, Moritz Kraemer at the University of Oxford, UK, created his own Google Sheets spreadsheet and shared it with the community. Scarpino analyzed the numbers and confirmed that Rivers was correct.

Soon, dozens of epidemiologists were adding information on cases around the world to the spreadsheet. At the same time, they and others were analyzing it. For example, Adam Kucharski, an epidemiologist at the London School of Hygiene & Tropical Medicine, and his colleagues used the data to estimate that about 10 times more people in Wuhan, China, had COVID-19 symptoms in January than had been confirmed by health authorities, based in part on the number of people who had been confirmed to be infected abroad [1].

After it exceeded about 100,000 cases, the original spreadsheet became overloaded. In April, the team received support from engineers and product developers at Google.org, the charitable arm of the Silicon Valley company Google. Together, they created algorithms to merge information added from around the world into a single cloud-based repository, computer code that automatically uploads daily coronavirus data from about 60 governments in a standardized format, and code that removes duplicate entries.
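The three steps described here (standardizing government feeds, removing duplicates and merging into one repository) can be illustrated with a minimal Python sketch. This is not the Global.health code, which the article does not describe in detail. The field names, the `standardize` and `merge_into` functions, and the choice to treat two records with the same country and case identifier as duplicates are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical standard field names; the real schema has up to 40 variables.
STANDARD_FIELDS = ("case_id", "country", "date_confirmed", "travel_history")


def standardize(raw, field_map, date_format):
    """Rename one source's columns to the standard names and normalize the date.

    `raw` is a dict of strings (for example, one CSV row), `field_map` maps the
    source's column names to standard names, and `date_format` is the
    strptime pattern that source uses for dates.
    """
    record = {field: None for field in STANDARD_FIELDS}
    for source_name, standard_name in field_map.items():
        value = raw.get(source_name)
        if value not in (None, ""):
            record[standard_name] = value.strip()
    if record["date_confirmed"] is not None:
        parsed = datetime.strptime(record["date_confirmed"], date_format)
        record["date_confirmed"] = parsed.date().isoformat()
    return record


def merge_into(repository, records):
    """Add records to the repository dict, keyed by (country, case_id).

    Assumption: a record whose key is already present is a duplicate. It adds
    no new row, but fills in any fields the stored record is still missing.
    """
    for record in records:
        key = (record["country"], record["case_id"])
        stored = repository.get(key)
        if stored is None:
            repository[key] = dict(record)
            continue
        for field, value in record.items():
            if stored.get(field) is None and value is not None:
                stored[field] = value
    return repository
```

In this sketch, two feeds that report the same case with different column names and date formats end up as a single row, and the second feed only contributes the variables the first one lacked. A real pipeline would need a more careful duplicate check, because different sources rarely share case identifiers.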

Privacy priority

Anyone can register to access up to 8 gigabytes of anonymized data in the latest version of the Global.health database. Scarpino says that half of the 24 million cases collected have data for 12 variables, and about 10% have more. Currently, data visualization on the website is limited to maps that display the data the team has collected. Scarpino says that infographics have not been the focus because the team prioritized standardizing data collection and navigating privacy issues, so that people around the world could add to the database. The project’s architects consulted legal and ethics experts on how to securely process and share anonymized data about individuals. Such data are often tightly guarded by government agencies, universities and hospitals.

Julien Riou, an epidemiologist at the University of Bern, Switzerland, looks forward to exploring the database. So far, he has done much of his COVID-19 research using data from a Swiss cohort, but he says a deep international data set can provide better answers to basic questions, such as the true infection rates in countries around the world. More data means we can get closer to the truth, he says. Other researchers agree, adding that information about a person’s vaccination status and whether they are infected with a variant of the coronavirus may help answer pressing scientific questions about immunity in the coming months.

Kucharski welcomes the project’s funding. Many of these databases are crowdsourced, but relying solely on volunteers is often not sustainable, he says.

Scarpino hopes that the COVID-19 database will eventually expand into an adaptable platform for investigating other diseases, especially the next epidemic. However, he says, doing this will need lasting support from companies, nonprofits or other organizations. One earlier effort tracked health data in Syria and now operates in 12 countries after being sold to a data company. This can’t be a flash in the pan, he says.

