



Extraction of Google Trends and COVID-19 data

All data processing and analysis were performed using R 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria). Unless otherwise stated, p-values below 0.05 were considered significant. COVID-19 data and Google Trends (GT) data were analyzed separately in nine regions: Japan (JP) and eight English-speaking countries, namely Australia (AU), Canada (CA), United Kingdom (GB), Ireland (IE), India (IN), Singapore (SG), United States (US), and South Africa (ZA).

Three-year (October 1, 2017 to October 25, 2020) GT time-series data across all categories were queried for keywords of symptoms that may be related to COVID-19, using the R package gtrendsR [17]. Individual queries were run for each keyword in each of the nine regions. The search keywords are listed in Table 1: 54 English keywords were used for the eight English-speaking countries, and the corresponding 60 Japanese keywords (listed in Additional File 1) were used for Japan. The data obtained are weekly relative search volumes for each keyword, normalized so that the maximum value during the included period is 100%. Relative search volumes below 1% were treated as 0%.
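The normalization described above can be illustrated with a minimal Python sketch. Google Trends returns values that are already scaled, so this only shows the arithmetic; the function name `normalize_rsv` and the sample values are hypothetical and are not part of the study's R code.

```python
def normalize_rsv(raw_volumes):
    """Scale a series so its maximum is 100, then set values below 1 to 0."""
    peak = max(raw_volumes)
    if peak <= 0:
        # A series with no search activity stays at zero.
        return [0.0 for _ in raw_volumes]
    scaled = [100.0 * v / peak for v in raw_volumes]
    # Relative search volumes below 1% are treated as 0%.
    return [s if s >= 1.0 else 0.0 for s in scaled]


# Made-up weekly values for one keyword in one region.
weekly_raw = [0.2, 5.0, 12.5, 50.0, 25.0]
print(normalize_rsv(weekly_raw))  # [0.0, 10.0, 25.0, 100.0, 50.0]
```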

Table 1 English and Japanese keywords included in the Google Trends search

COVID-19 data on the daily cumulative number of positive cases since January 22, 2020 were downloaded from the web database (https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases, accessed October 30, 2020) provided by the United Nations Office for the Coordination of Humanitarian Affairs. Since we did not include the number of positive cases from mainland China, we considered the number of COVID-19 cases before January 22, 2020 to be zero (including throughout 2017-2019). The daily COVID-19 case data were converted to a weekly series aligned with the GT weekly trend data described above.
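One way to carry out this daily-to-weekly conversion is sketched below in Python. It assumes that the downloaded counts are cumulative, that daily records are contiguous, and that each GT week is a 7-day window identified by its first day; the function name and the counts are hypothetical, and the study itself did this step in R.

```python
from datetime import date, timedelta


def weekly_new_cases(cumulative, week_starts):
    """Sum new cases per 7-day week from a {date: cumulative count} mapping."""
    first_day = min(cumulative)
    last_day = max(cumulative)

    def cumulative_on(day):
        # Before the first record the count is taken to be zero.
        if day < first_day:
            return 0
        # After the last record, carry the final count forward.
        if day > last_day:
            return cumulative[last_day]
        return cumulative[day]  # assumes no missing days in between

    weekly = []
    for start in week_starts:
        end = start + timedelta(days=6)
        before = start - timedelta(days=1)
        weekly.append(cumulative_on(end) - cumulative_on(before))
    return weekly


# Made-up cumulative counts starting on January 22, 2020.
first = date(2020, 1, 22)
counts = [1, 1, 2, 2, 5, 5, 5, 6, 7, 7, 8]
cumulative = {first + timedelta(days=i): c for i, c in enumerate(counts)}
week_starts = [date(2020, 1, 12), date(2020, 1, 19), date(2020, 1, 26)]
print(weekly_new_cases(cumulative, week_starts))  # [0, 2, 6]
```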

Preprocessing and analysis

The weekly keyword trend data were further processed as shown in Figure 1. Figure 1A (top row) shows the original three-year GT timeline for "chest pain" in the United States. The series was decomposed using the R package stats, removing seasonality (one-year period) and the general trend from the original series, and the remaining random series (Figure 1A, bottom row) was used as the keyword trend data [11]. The resulting series was then evaluated with the Augmented Dickey-Fuller (ADF) test, using the R package tseries [18], to determine whether it was stationary (Figure 1B). If the series was not considered stationary, it was differenced further so that the differenced series was stationary (as confirmed again by the ADF test).
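The decomposition and differencing steps can be sketched as follows. This is a plain-Python illustration of a classical additive decomposition (centered moving-average trend, per-phase seasonal means, remainder), assuming that this is the kind of decomposition applied; it is not the authors' R code, the function names are hypothetical, and the ADF test itself is not implemented here. The demo uses a period of 4 for brevity, whereas weekly data with one-year seasonality would use a period of about 52.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal, and remainder components."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended values at each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    phase_means = [sum(b) / len(b) for b in buckets]
    centre = sum(phase_means) / period
    seasonal = [phase_means[t % period] - centre for t in range(n)]
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


def difference(series):
    """First difference, applied when a series is not yet stationary."""
    return [b - a for a, b in zip(series, series[1:])]


# Made-up series: linear trend plus a repeating pattern of period 4.
pattern = [1.0, -1.0, 2.0, -2.0]
demo = [0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, remainder = decompose_additive(demo, 4)
print(remainder)
```

The remainder is undefined (`None`) for the first and last half period, where the centered moving average cannot be computed.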

Figure 1

Overview of the preprocessing flow. The series was processed to remove seasonality (one-year period) and the general trend from the original series, and the remaining random series (A, bottom row) was used as the keyword trend data. Next, the obtained series was evaluated by the ADF test to examine its stationarity (B). A VAR model (C) was then used to analyze the temporal relationship between the processed series of each single keyword and the weekly COVID-19 positive data. Finally, we evaluated whether the keyword trend Granger-caused the COVID-19 positive trend (D).

Next, we analyzed the temporal relationship between the processed series of each keyword and the weekly COVID-19 positive data using a VAR model [11, 12] (Figure 1C), fitted with the R package vars [19]. The weekly COVID-19 positive series was not stationary on its own, so its differenced series was used in the VAR analysis. The appropriate lag was selected from lag orders 1 to 8 based on Akaike's Information Criterion, one of the most frequently used methods [20]. Based on the period over which the effective reproduction number of COVID-19 rises and falls [21, 22], a lag of 2 months or more for predicting COVID-19 positives from keyword trends may be too long in practice, so the lag range was set to 1 to 8 weeks. The following equations (A and B) show an example of the VAR model (lag order = 1) used in this study.

A)

\( Y_{1,t} = c_{1} + \left( \phi_{11} Y_{1,t-1} + \phi_{12} Y_{2,t-1} \right) + \varepsilon_{1,t} \)

B)

\( Y_{2,t} = c_{2} + \left( \phi_{21} Y_{1,t-1} + \phi_{22} Y_{2,t-1} \right) + \varepsilon_{2,t} \)

where \( Y_{1} \) is the weekly COVID-19 positive series for each country and \( Y_{2} \) is the weekly Google Trends relative search volume of one keyword of interest in the same country. A VAR model is therefore obtained for every keyword in each country.
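To make the lag-1 model concrete, the sketch below estimates the coefficients of a single VAR(1) equation by ordinary least squares in plain Python. It is an illustration under simplifying assumptions (lag order fixed at 1, no AIC-based lag selection, no standard errors) and does not reproduce the vars package used in the study; the helper names and the synthetic data are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_var1_equation(target, y1, y2):
    """OLS fit of target_t = c + a * y1_{t-1} + b * y2_{t-1}; returns (c, a, b)."""
    rows = [[1.0, y1[t - 1], y2[t - 1]] for t in range(1, len(target))]
    obs = target[1:]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * o for r, o in zip(rows, obs)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))


# Synthetic, noise-free data: y2 is an arbitrary sequence and y1 follows
# equation A exactly with c1 = 2.0, phi11 = 0.5, phi12 = 0.3.
y2 = [float((7 * t * t + 3 * t) % 11) for t in range(40)]
y1 = [1.0]
for t in range(1, 40):
    y1.append(2.0 + 0.5 * y1[t - 1] + 0.3 * y2[t - 1])

c1, phi11, phi12 = fit_var1_equation(y1, y1, y2)  # equation A
c2, phi21, phi22 = fit_var1_equation(y2, y1, y2)  # equation B
print(c1, phi11, phi12)
```

In this notation, a nonzero \( \phi_{12} \) is what links past keyword search volume to the current COVID-19 series.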

The resulting VAR model was then used to assess whether the keyword trend Granger-caused the COVID-19 positive trend [11, 12] (Fig. 1D). Granger causality here means that changes in the keyword trend could predict future changes in the COVID-19 positive trend. This causality is purely statistical and does not require a true causal mechanism between the two trends. One p-value was obtained for the Granger causality of each keyword on the COVID-19 trend, and the Granger causality analysis was performed for all keywords. Multiple testing was adjusted using the Benjamini-Hochberg (BH) method [23] within each country. The BH method controls the false discovery rate (FDR): it carries a lower risk of false positives than unadjusted p-values and is more powerful than the stricter Bonferroni method.
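The BH adjustment applied to each country's set of keyword p-values can be sketched as follows. This is a minimal Python version of the standard step-up procedure, with a hypothetical function name and made-up p-values; the study itself performed the adjustment in R.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up Granger causality p-values for four keywords in one country.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

In this example only the first keyword remains below 0.05 after adjustment, although three raw p-values were below that threshold.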

In addition, for reference, we calculated the Pearson correlation and Spearman's rank correlation between the raw GT keyword trends and the weekly COVID-19 positive trends, as in previous GT-based COVID-19 studies. The correlation p-values were likewise adjusted by the BH method.
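The two correlation coefficients can be computed as in the following Python sketch, in which Spearman's coefficient is the Pearson correlation of average ranks. The function names and data are hypothetical, and the p-values for the correlations are not computed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up weekly values: search volume and positive cases.
search_volume = [1, 2, 3, 4, 5]
positive_cases = [1, 4, 9, 16, 25]
print(pearson(search_volume, positive_cases))
print(spearman(search_volume, positive_cases))
```

Because the made-up relationship is monotonic but not linear, the rank correlation is 1 while the Pearson correlation is slightly lower.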

Incorporating media coverage trends

Next, we evaluated media coverage of the GT keywords found to have a statistically significant temporal relationship with the weekly COVID-19 positive trend. Due to the lack of available data, we could only analyze media coverage trends for these keywords in Japan. We searched Nikkei Telecom (http://telecom.nikkei.co.jp), a large-scale Japanese database covering newspapers, TV news, Internet news, and general magazines published in Japan, and counted the number of articles published each week whose title, abstract, or body text contained the identified Japanese keywords. Professional journals were excluded from the reviewed publications because they may have less exposure to the general public. The weekly time series of the number of articles containing each keyword was used as the Japanese media coverage trend. We then reassessed whether the identified GT keyword trends Granger-caused the weekly COVID-19 positives even after adjusting for the concurrent media coverage trend of each keyword. This partial Granger causality analysis was performed using the R package FIAR [24].

Ethics

This study was approved by the Institutional Review Board of the Graduate School of Medicine, the University of Tokyo (ID: 11628-(3)). Because the data were publicly available, informed consent was not required. This study was conducted in accordance with the ethical standards set out in the 1964 Declaration of Helsinki.

