Health

Predicting three-month fasting blood glucose and glycated hemoglobin changes in patients with type 2 diabetes mellitus based on multiple machine learning algorithms

Study design and data source

The data for this study were obtained from the Public Health Service System and the Medical Record Homepage Management System of the Health Information Center of Sichuan Province, China, and included the personal basic information form, the health check-up form, and the follow-up service record form. The overall dataset covered patients who received anti-diabetic drugs or had an International Classification of Diseases, Tenth Revision (ICD-10) code¹⁵ for type 2 diabetes between January 2015 and December 2020.

Diagnosis and treatment data for a total of 375,723 patients with type 2 diabetes mellitus (T2DM) were collected. The data available for constructing the fasting blood glucose (FBG) prediction model and the glycated hemoglobin (HbA1c) prediction model were screened according to the following criteria: a patient's data were eligible if the patient had two or more registered records within 3 ± 1 months; if several records fell within 3 ± 1 months, the record closest to 3 months from baseline was used; and if a patient had several groups of records over time that each met the first criterion, one group was selected at random for inclusion.

Determination of glycemic control status: to facilitate the development of a predictive model, this study converted the continuous measurements of FBG and HbA1c into discrete categorical outcomes. According to the relevant guidelines¹⁶, the FBG target range was defined as [4.4–7.0] mmol/L. A value of 1 was assigned to patients whose FBG fell within this well-regulated range (FBG: 4.4–7.0), and a value of 0 to those outside it. Similarly, the HbA1c threshold was set at 7%. A value of 1 was assigned to patients who achieved controlled HbA1c levels (HbA1c < 7), and a value of 0 to those who did not (HbA1c ≥ 7).
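The selection and labeling rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's code: the function names are hypothetical, a month is approximated as 30 days, and the FBG bounds are treated as inclusive because the source writes the range in square brackets.

```python
from datetime import date

def pick_follow_up(baseline, visits, target_days=90, tolerance_days=30):
    """Return the visit closest to 3 months after baseline, or None.

    Assumption: "3 +/- 1 months" is approximated as 90 +/- 30 days.
    """
    in_window = [
        v for v in visits
        if abs((v - baseline).days - target_days) <= tolerance_days
    ]
    if not in_window:
        return None
    return min(in_window, key=lambda v: abs((v - baseline).days - target_days))

def fbg_label(fbg):
    """1 if FBG lies within 4.4-7.0 (bounds assumed inclusive), else 0."""
    return 1 if 4.4 <= fbg <= 7.0 else 0

def hba1c_label(hba1c):
    """1 if HbA1c is below 7 (%), else 0."""
    return 1 if hba1c < 7.0 else 0

# Hypothetical patient: one visit too early, one inside the window, one too late.
baseline = date(2019, 1, 1)
visits = [date(2019, 2, 15), date(2019, 4, 3), date(2019, 6, 1)]
chosen = pick_follow_up(baseline, visits)
print(chosen, fbg_label(6.5), hba1c_label(7.0))
```

Only the visit on 2019-04-03 falls inside the window here, so it is the one selected; an HbA1c of exactly 7 is labeled 0, matching the "HbA1c ≥ 7" rule.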

The data included patients' basic information, drug use, test indicators, lifestyle and diet, and the actual follow-up after treatment. Each patient's records were linked by a unique ID, so that no research operation could be traced back to an individual patient, and sensitive personal information (such as name, phone number, address, work unit, and responsible doctor) was deleted. All files were encrypted during transmission and use, and documents were password-protected. This study passed ethical review; the approval document is shown in Supplementary Fig. 1.

A total of 511 variables were included and renamed X1–X511 for statistical convenience (the variables are detailed in Supplemental Table 1). Data analysis was performed on the renamed variables, and the original variable names were restored after the model evaluation process was complete.

Table 1 Baseline characteristics of participants.

Data cleaning

We deleted variables that reached the 90% threshold for missing values or for a single category, as well as variables with a coefficient of variation below 0.1. Such variables contribute little to model building and are not meaningful to analyze. Missing data were handled in two ways: no imputation and modified random forest imputation. After imputation, if the positive and negative sample sizes differed greatly, the data were balanced by sampling. Outliers were replaced with the maximum or minimum value of the normal range.
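As a rough illustration of these filters, the sketch below applies the three deletion rules and the outlier clipping to toy columns. The function names, the example values, and the clipping range are hypothetical; whether the study treated the 90% cutoff as inclusive is not stated, so the inclusive comparison here is an assumption.

```python
from collections import Counter
from statistics import mean, pstdev

def missing_ratio(values):
    return sum(v is None for v in values) / len(values)

def single_category_ratio(values):
    observed = [v for v in values if v is not None]
    if not observed:
        return 1.0
    return Counter(observed).most_common(1)[0][1] / len(observed)

def coefficient_of_variation(values):
    observed = [v for v in values if v is not None]
    m = mean(observed)
    # CV is undefined for a zero mean; treat it as "large" so the column is kept.
    return float("inf") if m == 0 else pstdev(observed) / abs(m)

def keep_variable(values, max_missing=0.9, max_single=0.9, min_cv=0.1):
    """Apply the three deletion rules (thresholds assumed inclusive)."""
    if missing_ratio(values) >= max_missing:
        return False
    if single_category_ratio(values) >= max_single:
        return False
    return coefficient_of_variation(values) >= min_cv

def clip_to_range(values, low, high):
    """Replace outliers with the nearest bound of the normal range."""
    return [v if v is None else min(max(v, low), high) for v in values]

# Hypothetical columns, not study data.
columns = {
    "X1": [5.1, 6.3, None, 7.8, 9.0],           # passes all three rules
    "X2": [None] * 9 + [1.0],                   # 90% missing
    "X3": [100.0, 101.0, 99.0, 100.0, 100.0],   # coefficient of variation < 0.1
}
kept = [name for name, values in columns.items() if keep_variable(values)]
clipped = clip_to_range([3.0, 15.0, 6.0], low=3.9, high=11.1)
print(kept, clipped)
```

The coefficient-of-variation rule only makes sense for numeric columns, so a real pipeline would apply it to those columns alone.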

The “no imputation” method deleted the columns with missing values and then the rows with missing values, in turn, leaving a dataset without missing values. In “modified random forest imputation”, columns that had already been imputed were fed back into the model one after another, so that as the available data accumulated, the imputed values became more accurate and the missing values could be predicted more precisely.

After imputation, the data were divided into training and test sets for machine learning: the training set accounted for 80% of the total sample size and the test set for 20%.
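An 80/20 split of this kind can be sketched with the standard library alone. The function name and the fixed seed are illustrative assumptions; the source does not say whether the split was stratified, so this sketch uses a plain random shuffle.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))
```

Splitting indices rather than rows keeps the sketch independent of how the data are stored.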

Feature screening

The data were screened using three methods: no screening, Lasso screening, and Boruta screening. Feature screening is an important part of model building: it helps to exclude irrelevant variables, bias, and unnecessary noise, bringing the final analysis results closer to reality. Lasso is a useful atheoretical approach both for developing predictive models and for selecting key indicators from an often substantially larger pool of available indicators. It enters all candidate variables at the same time, reduces the bias caused by unimportant variables, and retains only the most important variables from a potentially large initial pool¹⁷. Boruta screening is also a popular method at present¹⁸. It uses the random forest algorithm to evaluate feature variables: it shuffles the values of the feature variables and calculates the importance of each feature¹⁹.
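To show how Lasso ends up selecting variables, the sketch below fits a linear Lasso by coordinate descent with soft-thresholding on a tiny, centered toy dataset and keeps the features whose coefficients are non-zero. It is a teaching sketch, not the study's implementation: the study does not state which Lasso implementation or penalty it used, and the function names, the data, and the penalty value `alpha=0.3` are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 (no intercept; data assumed centered)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            # Correlation of feature j with the residual, adding back its own contribution.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w

# Toy data: three centered, mutually orthogonal features; y depends mainly on feature 0.
X = [
    [1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
    [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1],
]
y = [2.0 * row[0] + 0.1 * row[2] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights, selected)
```

The weak dependence on feature 2 falls below the penalty and is shrunk to exactly zero, which is the property that makes Lasso usable as a screening step.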

Model training

Sixteen machine learning algorithms were used for model training, and each was fitted separately to the data produced by each feature screening method. The algorithms were: Logistic Regression, Decision Tree, Random Forest, Extra Tree, Stochastic Gradient Descent (SGD), Gaussian Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), Passive Aggressive, AdaBoost, Bagging, Gradient Boosting, XGBoost, and Ensemble Learning (the algorithms are introduced and compared in the references²⁰,²¹). XGBoost, described by Tianqi Chen and Carlos Guestrin in 2016, achieves a stronger learning effect by integrating multiple weak learners²², offers good flexibility and scalability, and has shown strong advantages over general machine learning algorithms. Each of these algorithms has its own strengths. The ensemble learning model takes the best-performing trained models, as judged by the evaluation indicators, and outputs a prediction according to the voting principle. The evaluation indicators of the prediction model were Area Under the Curve (AUC), Accuracy, Precision, Recall, and F1 Score. Based on the machine learning results, the 5 models with the best prediction performance were selected, and their receiver operating characteristic (ROC) curves and precision-recall (P-R) curves were drawn.
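The five evaluation indicators and the voting step can be illustrated with a short, dependency-free sketch. The function names, labels, predictions, and scores below are made up for illustration and are not study results; hard majority voting is an assumption, since the source only says the ensemble outputs "according to the voting principle".

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = controlled)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def majority_vote(predictions_per_model):
    """Hard voting: predict 1 when more than half of the models predict 1."""
    n_models = len(predictions_per_model)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*predictions_per_model)]

# Hypothetical labels, predictions from three models, and scores from the first model.
y_true = [1, 0, 1, 1, 0, 0]
model_preds = [
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6]

metrics = classification_metrics(y_true, model_preds[0])
ensemble = majority_vote(model_preds)
print(metrics, auc(y_true, scores), ensemble)
```

In this toy case each model makes at least one error, but the errors fall on different samples, so the vote recovers every label; that is the usual motivation for combining models.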

Model verification

Ten-fold cross-validation and bootstrap sampling were used to verify the impact of different preprocessing algorithms and different machine learning algorithms on the performance of the FBG and HbA1c prediction models²³. The model with the largest AUC was then rebuilt on 10 subsets (random draws of 10%–100% of the total sample size) to assess the effect of sample size on predictive power. Each subset was split 4:1 into a training set and a test set, and the AUC calculated on the test set was used for the sample size check. By changing the randomly sampled data, 10 independent replicates were generated for each model.
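The two resampling schemes described here can be sketched as index generators. The names and seeds are illustrative assumptions, and model fitting and AUC calculation are left to the caller, since the source does not describe those details.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def sample_size_subsets(n_samples, seed=0):
    """Yield random subsets holding 10%, 20%, ..., 100% of the rows."""
    rng = random.Random(seed)
    for tenth in range(1, 11):
        yield rng.sample(range(n_samples), n_samples * tenth // 10)

splits = list(k_fold_indices(105))
subset_sizes = [len(subset) for subset in sample_size_subsets(200)]
print(len(splits), subset_sizes)
```

Each subset produced by `sample_size_subsets` would then be split 4:1 and scored on its test part, repeating with different seeds to obtain independent replicates.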

A process framework of the data flow is shown in Fig. 1. Data flowed through each node according to a predetermined schedule.

Statistical analysis

Continuous variables were expressed as mean ± standard deviation, and count variables were expressed as frequencies. Differences between quantitative data were tested using the t-test and the rank test. Hypothesis testing was used to investigate the influence of different data processing methods and algorithms on model prediction performance: single-factor hypothesis tests were performed on the results from bootstrap sampling and from the validation set. The analysis covered three dimensions (data imputation method, feature screening method, and machine learning model) and reported, for each of the five evaluation indicators (AUC, Accuracy, Precision, Recall, and F1 Score), the mean ± standard deviation, the 95% confidence interval, and the p-value.
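One common way to obtain a mean ± standard deviation and a 95% confidence interval for an indicator is a percentile bootstrap, sketched below. The source does not say how its intervals were computed, so the percentile method, the function name, and the AUC values (invented for illustration, not study results) are all assumptions.

```python
import random
from statistics import mean, stdev

def bootstrap_summary(values, n_resamples=1000, seed=0):
    """Return the mean, sample SD and a percentile-bootstrap 95% CI of the mean."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = boot_means[int(0.025 * n_resamples)]
    upper = boot_means[int(0.975 * n_resamples) - 1]
    return mean(values), stdev(values), (lower, upper)

# Hypothetical AUC values from 10 replicates of one model.
auc_values = [0.80, 0.82, 0.78, 0.81, 0.79, 0.83, 0.77, 0.80, 0.82, 0.78]
avg, sd, (ci_low, ci_high) = bootstrap_summary(auc_values)
print(f"{avg:.3f} ± {sd:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```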

Excel 2016 was used for summarizing data, and all statistical analyses were performed using Python 3.8.

The transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement for prediction model development is in Supplemental Table 10.

Sources

1/ https://www.nature.com/articles/s41598-023-43240-5
