Profiling Covid-19 Patients on Severity Level: An Integrated Statistical Approach
Study population
The COVID-BioB study, a single-center prospective observational cohort, was conducted at the IRCCS San Raffaele Scientific Institute in Milan, Italy.
A full description of patient management and clinical protocols was previously published7.
All patients aged 18 years or older who were admitted to the IRCCS San Raffaele Scientific Institute with confirmed SARS-CoV-2 infection were consecutively enrolled in the COVID-BioB study. Diagnosis of SARS-CoV-2 infection was based on a positive real-time reverse-transcriptase polymerase chain reaction (RT-PCR) from nasopharyngeal and/or throat swabs, or on strong clinical and radiological suspicion of Covid-19 pneumonia7. Patients admitted to our hospital between 2 March and 25 April 2020 who had at least one plasma creatinine measurement during SARS-CoV-2 infection were included in this study, for a total of 392 consecutive patients in the current analysis.
This study was approved by the IRCCS San Raffaele Hospital Ethics Committee (protocol number 34/int/2020) and registered with ClinicalTrials.gov (NCT04318366).
Written informed consent was obtained prior to data collection from patients who were able to sign upon admission; otherwise, consent was obtained as soon as they were able to sign. This study is reported in compliance with the STROBE statement8.
All methods were performed in accordance with relevant guidelines and regulations.
Data collection and definition
Data were collected from chart review and entered into a dedicated COVID-BioB Research Electronic Case Record Form (eCRF). Demographic characteristics, laboratory data, and medications were extracted from electronic medical records. Prior to analysis, data were cross-checked against charts and verified for accuracy by data managers and clinicians. The clinical observation start date (baseline) was defined as the date of admission to the emergency department7.
Baseline serum creatinine was defined as the most recent creatinine value available in the previous 6 months (21%) in stable clinical status, or the last value available at previous hospital discharge. For subjects who died, the lowest creatinine value during hospitalization for SARS-CoV-2 infection was chosen. Acute kidney injury (AKI) was defined as a 50% increase in serum creatinine from baseline, according to KDIGO criteria9. Patients were defined as having hypertension (HYP) if their history reported hypertension or if they were chronically treated with at least one antihypertensive drug.
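The two definitions above can be written as simple rules. The following is a minimal sketch, not the study's code: the function and parameter names are hypothetical, and "a 50% increase" is assumed to mean a peak value of at least 1.5 times baseline.

```python
def has_aki(baseline_creatinine: float, peak_creatinine: float) -> bool:
    """Flag AKI when creatinine rises by at least 50% over baseline.

    Assumption: the rule is peak >= 1.5 * baseline, with both values
    expressed in the same unit.
    """
    if baseline_creatinine <= 0:
        raise ValueError("baseline creatinine must be positive")
    return peak_creatinine >= 1.5 * baseline_creatinine


def has_hypertension(history_of_hypertension: bool,
                     n_chronic_antihypertensive_drugs: int) -> bool:
    """Flag HYP from reported history or chronic antihypertensive therapy."""
    return history_of_hypertension or n_chronic_antihypertensive_drugs >= 1
```

For example, `has_aki(1.0, 2.0)` flags a doubling of creatinine, while `has_hypertension(False, 0)` returns `False`.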
PaO2/FiO2 was used as an index of severity of dyspnea and was recoded into five classes as defined by the SOFA score6,10,11 (S1 Table).
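A recoding of this kind might look like the sketch below. The cut-offs (400, 300, 200 and 100 mmHg) are an assumption taken from the standard respiratory component of the SOFA score, and the sketch ignores the ventilatory-support requirement that the standard score attaches to the two highest classes. The study's own class definitions are those in S1 Table, which is not reproduced here.

```python
def pao2_fio2_class(ratio_mmhg: float) -> int:
    """Map a PaO2/FiO2 ratio (mmHg) to a class from 0 (best) to 4 (worst).

    Assumption: standard SOFA respiratory cut-offs, without the
    respiratory-support condition.
    """
    if ratio_mmhg < 0:
        raise ValueError("PaO2/FiO2 cannot be negative")
    if ratio_mmhg >= 400:
        return 0
    if ratio_mmhg >= 300:
        return 1
    if ratio_mmhg >= 200:
        return 2
    if ratio_mmhg >= 100:
        return 3
    return 4
```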
Statistical methods
We applied standard and advanced statistical methods to identify factors associated with increased or decreased risk of in-hospital mortality and with severity of dyspnea expressed as the SOFA score. In addition to the standard Cox regression model, survival tree (ST) analysis was implemented to identify, within a data-driven approach, risk factors associated with Covid-19 outcome. This procedure allows us to profile patients with different risks of in-hospital mortality and to disentangle the role of highly dependent covariates in risk stratification. The iterative ST algorithm selects the best predictors with optimal thresholds, aiming to identify homogeneous subgroups of patients with similar survival outcomes12. To limit overfitting, a constraint was imposed during tree construction by fixing the minimum number of observations in any terminal node at 20. Following ST analysis, the Kaplan-Meier method was used to estimate overall survival for each risk profile, and the log-rank test was applied to compare survival between the patient groups defined by risk profile12.
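The Kaplan-Meier estimate and the two-group log-rank statistic used to compare risk profiles can be sketched in a few lines. This is an illustrative pure-Python version with hypothetical function names, not the R code used in the study. Events are coded 1 for death and 0 for censoring.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all subjects whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def logrank_chi2(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        n = sum(1 for s, _, _ in pooled if s >= t)
        d_a = sum(e for s, e, g in pooled if s == t and g == 0)
        d = sum(e for s, e, _ in pooled if s == t)
        # Observed minus expected deaths in group A at this event time.
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    return obs_minus_exp ** 2 / variance
```

A statistic above 3.84 corresponds to P < 0.05 for a chi-square distribution with one degree of freedom.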
Variables used in the Cox regression model and ST analysis were demographic characteristics and admission values [i.e., age, sex, BMI, C-reactive protein (CRP) and creatinine], comorbidities [i.e., coronary artery disease (CAD), diabetes, chronic obstructive pulmonary disease (COPD), malignancy (NPL), mean arterial blood pressure (MBP), AKI, HYP], reduction of antihypertensive therapy after hospitalization, and severity of dyspnea (expressed as PaO2/FiO2 and SOFA score).
Using ST analysis, we also applied this procedure to identify a cut-off value for the SOFA score, obtaining a binary version of the score that was used in the subsequent analyses.
To investigate the impact of comorbidities on the severity of dyspnea, we fitted a multivariable logistic regression model using the same covariates as before (i.e., age, sex, BMI, CRP, creatinine, CAD, diabetes, COPD, NPL, AKI, MBP, HYP and reduction of antihypertensive therapy after hospitalization). A stepwise variable selection procedure was applied to identify a smaller set of relevant predictors.
As a further data-driven approach, classification and regression tree (CART) methodology, which is based on recursive partitioning, was used to identify profiles of patients with different risks of respiratory distress from the same covariates entered into the logistic regression model. The minimum number of observations in each terminal node was set to 20 to ensure that the nodes contained enough observations for further analysis.
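The effect of the terminal-node constraint can be seen in a single split search. The sketch below is an illustration under stated assumptions, not the rpart implementation: it looks for the best threshold on one numeric covariate for a binary outcome, uses Gini impurity as the splitting criterion, and rejects any split that would leave fewer than `min_leaf` observations on either side. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float], labels: Sequence[int],
               min_leaf: int = 20) -> Optional[Tuple[float, float]]:
    """Return (weighted impurity, threshold) of the best admissible split.

    Returns None when no split leaves at least min_leaf observations
    in both child nodes.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical covariate values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or impurity < best[0]:
            best = (impurity, threshold)
    return best
```

A full tree applies this search recursively to each child node and across all covariates, stopping when no admissible split remains.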
Finally, we also used a Bayesian network (BN) approach to investigate the dependency structure among all variables included in the CART analysis. A BN is a probabilistic graphical model in which the nodes are variables and the arcs represent the relationships between them. Nodes that are not connected represent variables that are conditionally independent of each other. The arcs are directed in such a way that the graph contains no directed cycles. A BN is therefore a directed acyclic graph (DAG), and the parameters of the model represent the conditional probability distribution of each node for each combination of values of its parent nodes13.
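The DAG structure implies that the joint distribution factorizes into one conditional distribution per node. In the notation below, which is introduced here only for illustration, X_1, ..., X_n are the variables in the network and Pa(X_i) denotes the parents of X_i in the graph.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

A node without parents contributes its marginal distribution P(X_i).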
The purpose of using a BN in this work is to learn the dependence structure directly from the data, while excluding infeasible arc directions between variables. The network structure was estimated from the data by a hill-climbing algorithm using the Akaike Information Criterion (AIC) as the score function.
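As a sketch of what the score-based search optimizes, the AIC of a candidate graph G given data D can be written in a "higher is better" form, where the symbols are introduced here for illustration: \hat{\theta}_G are the maximum-likelihood parameters of G and d_G is its number of free parameters. The classical definition, -2 log L + 2d, differs only by a factor of -2.

```latex
\mathrm{AIC}(G; D) \;=\; \log L\bigl(\hat{\theta}_G; D\bigr) \;-\; d_G
```

Hill climbing starts from an initial graph and repeatedly applies the single arc addition, deletion or reversal that most improves the score, skipping moves that would create a cycle or a disallowed arc, until no move improves it.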
This approach is essential for uncovering the complex interrelationships between variables and for gaining better insight into the mechanisms involved in Covid-19 disease progression. Whereas the CART analysis reports the best predictors and best splits and allows patients to be classified by outcome, the BN approach highlights how the variables are related within a multivariate structure. The BN results are reported here to consolidate the preceding analyses. Furthermore, the BN allows better interpretation of the results obtained from the multivariable logistic and CART models, allows in-depth investigation of the role of several covariates on the outcome, and makes it possible to evaluate how conditioning on one or more variables affects, and propagates to, the other variables in the network.
Setting the value of one or more variables in the network updates the conditional probability distributions of the remaining variables accordingly. This update is known as evidence propagation. Based on the estimated network, different scenarios can be explored by inserting new evidence on one or more variables and propagating it throughout the network. Various diagnostic checks were performed to explore the impact of the evidence on the distribution of the target variable using 'what-if' sensitivity scenarios14.
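Evidence propagation can be illustrated on a toy network. The sketch below uses a hypothetical three-node chain A -> B -> C with made-up probabilities that have no connection to the study data, and answers queries by brute-force enumeration of the joint distribution rather than by the junction-tree algorithms that dedicated software typically uses.

```python
from itertools import product
from typing import Dict, Optional

# Hypothetical conditional probability tables (values are made up).
P_A = {1: 0.3, 0: 0.7}
P_B_GIVEN_A = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.2, 0: 0.8}}  # [a][b]
P_C_GIVEN_B = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.1, 0: 0.9}}  # [b][c]


def joint(a: int, b: int, c: int) -> float:
    """Joint probability from the chain factorization P(A) P(B|A) P(C|B)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


def query(target: str, evidence: Optional[Dict[str, int]] = None) -> float:
    """Return P(target = 1 | evidence) by summing over all configurations."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for a, b, c in product((0, 1), repeat=3):
        state = {"A": a, "B": b, "C": c}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # configuration is inconsistent with the evidence
        p = joint(a, b, c)
        denominator += p
        if state[target] == 1:
            numerator += p
    return numerator / denominator


prior = query("C")                 # no evidence
what_if = query("C", {"A": 1})     # 'what-if' scenario: A is observed as 1
diagnostic = query("A", {"C": 1})  # evidence also propagates against the arcs
```

Comparing `prior` with `what_if` shows how fixing one variable shifts the distribution of the target, which is the logic behind the sensitivity scenarios described above.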
Risk was reported as hazard ratio (HR) or odds ratio (OR) with 95% confidence intervals (CIs; Wald method). P values < 0.05 were considered significant. All analyses were performed using the R statistical software (version 4.0.4; https://cran.r-project.org/index.html). We implemented the ST and CART analyses using the R package rpart, which applies the LeBlanc-Crowley splitting rule for the survival trees. The R packages bnlearn15 and gRain16 were used to learn the network and to perform the inference required to compute conditional probabilities.
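A Wald interval for an HR or OR is obtained by building the interval on the log scale and exponentiating. The sketch below assumes a fitted coefficient `beta` (log hazard ratio or log odds ratio) and its standard error `se` are already available; the function names are hypothetical.

```python
import math
from typing import Tuple

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_ratio_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Return (ratio, lower, upper) for a 95% Wald interval of exp(beta)."""
    return (math.exp(beta),
            math.exp(beta - Z_975 * se),
            math.exp(beta + Z_975 * se))


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided P value of the Wald test of beta = 0."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))
```

With this construction, the 95% CI excludes 1 exactly when the two-sided Wald P value is below 0.05.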
