AI-based differential diagnosis of dementia etiologies on multimodal data
Study population
We collected demographics, personal and family history, laboratory results, findings from the physical/neurological exams, medications, neuropsychological tests, and functional assessments as well as multisequence magnetic resonance imaging (MRI) scans from 9 distinct cohorts, totaling 51,269 participants. All participants or their designated informants provided written informed consent. All protocols received approval from the respective institutional ethical review boards of each cohort. There were 19,849 participants with NC, 9,357 participants with MCI and 22,063 participants with dementia. We further identified 10 primary and contributing causes of dementia: 17,346 participants with AD; 2,003 participants with dementia with LBD and PD (LBD); 2,032 participants with vascular brain injury or VD including stroke (VD); 114 participants with Prion disease including Creutzfeldt-Jakob disease (PRD); 3,076 participants with frontotemporal lobar degeneration (FTD) and its variants, which include corticobasal degeneration (CBD) and progressive supranuclear palsy (PSP), and with or without amyotrophic lateral sclerosis (FTD); 138 participants with normal pressure hydrocephalus (NPH); 808 participants with dementia due to infections, metabolic disorders, substance abuse (including alcohol, medications), delirium and systemic disease, a category termed systemic and external factors (SEF); 2,700 participants with psychiatric diseases, including schizophrenia, depression, bipolar disorder, anxiety and posttraumatic stress disorder (PSY); 265 participants with dementia due to traumatic brain injury (TBI); and 1,234 participants with dementia due to other causes, which include neoplasms, multiple systems atrophy, essential tremor, Huntington’s disease, Down syndrome and seizures (ODE).
The cohorts include the National Alzheimer’s Coordinating Center (NACC) dataset (n = 45,349)41, the ADNI dataset (n = 2,404)48, the FTD neuroimaging initiative (NIFD) dataset (n = 253)46, the Parkinson’s Progression Marker Initiative (PPMI) dataset (n = 198)45, the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) dataset (n = 661)43, the Open Access Series of Imaging Studies-3 (OASIS) dataset (n = 491)42, the 4 Repeat Tauopathy Neuroimaging Initiative (4RTNI) dataset (n = 80)44 and two in-house datasets maintained by the Lewy Body Dementia Center for Excellence at Stanford University (LBDSU) (n = 182)47 and the FHS (n = 1,651)49. Since its inception in 1948, FHS has been dedicated to identifying factors contributing to cardiovascular disease, monitoring multiple generations from Framingham, Massachusetts. Over time, the study has pinpointed major cardiovascular disease risk factors and explored their effects while also investigating risk factors for conditions like dementia and analyzing the relationship between physical traits and genetics. Additional details on the study population are presented in Tables 1 and S1.
Inclusion and exclusion criteria
Individuals from each cohort were eligible for study inclusion if they were diagnosed with NC, MCI or dementia. We used the NACC dataset41, which is based on the Uniform Data Set (UDS) 3.0 dictionary68, as the baseline for our study. To ensure data consistency, we organized the data from the other cohorts according to the UDS dictionary. For individuals from the NACC cohort who had multiple clinical visits, we initially prioritized the visits at which the person received the diagnostic label of dementia. We then selected the visit with the most data features available prioritizing the availability of neuroimaging information. If multiple visits met all the above criteria, we chose the most recent visit among them. This approach maximized the sample sizes of dementia cases and ensured that each individual had the latest record included in the study while maximizing the utilization of available neuroimaging and non-imaging data. We included participants from the 4RTNI dataset44 with FTD-related disorders like PSP or CBS. For other cohorts (NIFD46, PPMI45, LBDSU47, AIBL43, ADNI48 and OASIS42), participants were included if they had at least one MRI scan within 6 months of an officially documented diagnosis. From the FHS49, we used data from the Original Cohort (Gen 1) enrolled in 1948 and the Offspring Cohort (Gen 2) enrolled in 1971. For these participants, we selected available data including demographics, history, clinical exam scores, neuropsychological test scores and MRI within 6 months of the date of diagnosis. We did not exclude cases based on the absence of features (including imaging) or diagnostic labels. Instead, we used our innovative model training approach to address missing features or labels (see below).
Data processing and training strategy
Various non-imaging features (n = 391) corresponding to subject demographics, medical history, laboratory results, medications, neuropsychological tests and functional assessments were included in our study. We combined data from 4RTNI, AIBL, LBDSU, NACC, NIFD, OASIS and PPMI to train the model. We used a portion of the NACC dataset for internal testing, whereas the ADNI and FHS cohorts served for external validation (Tables 1 and S1–S5). We used a series of steps such as standardizing the data across all cohorts and formatting the features into numerical or categorical variables before using them for model training. We used stratified sampling at the person-level to create the training, validation and testing splits. As we pooled the data from multiple cohorts, we encountered challenges related to missing features and labels. To address these issues and enhance the robustness of our model against data unavailability, we incorporated several strategies such as random feature masking and masking of missing labels (see below).
MRI processing
Our investigation harnessed the potential of multisequence magnetic resonance imaging (MRI) volumetric scans sourced from diverse cohorts (Table S6). Most of these scans encompassed T1-weighted (T1w), T2-weighted (T2w), diffusion-weighted imaging (DWI), susceptibility-weighted imaging (SWI) and fluid-attenuated inversion recovery (FLAIR) sequences. The collected imaging data were stored in the NIFTI file format, categorized by participant and the date of their visit. The MRI scans underwent a series of pre-processing steps involving skull stripping, linear registration to the MNI space and intensity normalization. Skull stripping was performed using SynthStrip69, a computational tool designed for extracting brain voxels from various image types. Then, the MRI scans were registered using FSL’s ‘flirt’ tool for linear registration of whole brain images70, based on the MNI152 atlas71. Before linear registration to the MNI space, we used the ‘fslorient2std’ function within FSL to standardize the orientation across all scans to match the MNI template’s axis order. As a result, the registered scans followed the dimensions of the MNI152 template, which are 182 × 218 × 182. Finally, all MRI scans underwent intensity normalization to the range [0,1] to increase the homogeneity of the data. To ensure the purity of the dataset, we excluded calibration, localizer and 2D scans from the downloaded data before initiating model training. Consequently, as our DWI sequences were acquired in 2D, they were not considered for model training.
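The final pre-processing step can be illustrated with a minimal sketch. The text states only that intensities were normalized to [0,1]; the min-max rescaling used here is an assumption, and the function name `normalize_intensity` and the flat-list representation of a volume are hypothetical choices for illustration.

```python
def normalize_intensity(voxels):
    """Rescale a flat sequence of voxel intensities to the range [0, 1].

    Min-max rescaling is assumed here; the source only states the target range.
    """
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        # A constant volume carries no contrast; map every voxel to 0.
        return [0.0 for _ in voxels]
    scale = hi - lo
    return [(v - lo) / scale for v in voxels]


print(normalize_intensity([10, 20, 30]))
```

In practice the same rescaling would be applied per scan after skull stripping and registration, so that background voxels removed by skull stripping map to the lower end of the range.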
Backbone architecture
Our modeling framework harnesses the power of the transformer architecture to interpret and process a vast array of diagnostic parameters, including person-level demographics, medical history, neuroimaging, functional assessments and neuropsychological test scores. Each of these distinct features is initially transformed into a fixed-length vector using a modality-specific strategy, forming the initial layer of input for the transformer model. Following this, the transformer acts to aggregate these vector inputs, decoding them into a series of predictions. A distinguishing strength of this framework lies in its integration of the transformer’s masking mechanism72,73, strategically deployed to emulate missing features. This capability enhances the model’s robustness and predictive power, allowing it to adeptly handle real-world scenarios characterized by incomplete data.
Multimodal data embeddings
Transformers use a uniform representation for all input tokens, typically in the form of fixed-length vectors. However, the inherent complexity of medical data, with its variety of modalities, poses a challenge to this requirement. Therefore, medical data needs to be adapted into a unified embedding that our transformer model can process. The data we accessed fall into three primary categories: numerical data, categorical data and imaging data. Each category requires a specific method of embedding. Numerical data typically encompass those data types where values are defined in an ordinal manner that holds distinct real-world implications. For instance, chronological age fits into this category, as it serves as an indicator of the aging process. To project numerical data into the input space of the transformer, we used a single linear layer to ensure appropriate preservation of the structure inherent to the original data space. Categorical data encompass those inputs that can be divided into distinct categories yet lack any implicit order or priority. An example of this is gender, which can be categorized as ‘male’ or ‘female’. We used a lookup table to translate categorical inputs into corresponding embeddings. It is noteworthy that this approach is equivalent to a linear transformation of the one-hot encoded input but is more computationally efficient, particularly when dealing with a large number of categories. Imaging data, which includes MRI scans in medical applications, can be seen as a special case of numerical data. However, due to their high dimensionality and complexity, it is difficult to compress raw imaging data into a lower-dimensional vector using a linear transformation while still retaining essential information. We leveraged the advanced capabilities of modern deep learning architectures to extract meaningful imaging embeddings (see below).
Once these embeddings were generated, they were treated as numerical data, undergoing linear projection into vectors of suitable length, thus enabling their integration with other inputs to the transformer.
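The two non-imaging embedding strategies described above can be sketched in a few lines. This is an illustrative toy, not the study's implementation: the function names, the tiny embedding size and the weight values are all hypothetical. It shows a scalar projected by a single linear layer, and a categorical lookup that returns the same vector as multiplying a one-hot encoding by the embedding table.

```python
def linear_embed(x, weight, bias):
    """Project a scalar feature x into len(weight) dimensions: x * w + b."""
    return [x * w + b for w, b in zip(weight, bias)]


def lookup_embed(category_index, table):
    """Return the embedding row for a category (one row per category)."""
    return table[category_index]


def one_hot_linear(category_index, table):
    """Same result as lookup_embed, computed as one_hot @ table."""
    n, d = len(table), len(table[0])
    one_hot = [1.0 if i == category_index else 0.0 for i in range(n)]
    return [sum(one_hot[i] * table[i][j] for i in range(n)) for j in range(d)]


# Hypothetical 2-category table with 3-dimensional embeddings.
table = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.7]]
print(lookup_embed(1, table) == one_hot_linear(1, table))
```

The lookup avoids building the one-hot vector and the multiplication by zeros, which is where the efficiency gain for large category counts comes from.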
Imaging feature extraction
We harnessed the Swin UNETR (Extended Data Fig. 6)74,75, a three-dimensional (3D) transformer-based architecture, to extract embeddings from a multitude of brain MRI scans, encompassing various sequences including T1w, T2w, SWI and FLAIR imaging sequences. The Swin UNETR model consists of a Swin Transformer encoder, designed to operate on 3D patches, seamlessly connected to a convolutional neural network-based decoder through multi-resolution skip connections. Commencing with an input volume \(X\in {{\mathbb{R}}}^{H\times W\times D}\), the encoder segmented X into a sequence of 3D tokens with dimensions \(\frac{H}{{H}^{{\prime} }}\times \frac{W}{{W}^{{\prime} }}\times \frac{D}{{D}^{{\prime} }}\), and projected them into a C-dimensional space via an embedding layer. It employed a patch size of 2 × 2 × 2 with a feature dimension of 2 × 2 × 2 × 1 and an embedding space dimension of C = 48. The Swin UNETR encoder was subsequently interconnected with a convolutional neural network-based decoder at various resolutions through skip connections, collectively forming a ‘U-shaped’ network. This decoder amalgamated the encoder’s outputs at different resolutions, conducted upsampling via deconvolutions, ultimately generating a reconstruction of the initial input volume. The pre-trained weights were the product of self-supervised pre-training of the Swin UNETR encoder, primarily conducted on 3D volumes encompassing the chest, abdomen and head/neck74,75.
The process of obtaining imaging embeddings began with several transformations applied to the MRI scans. These transformations included resampling the scans to standardized pixel dimensions, foreground cropping, and spatial resizing, resulting in the creation of subvolumes with dimensions of 128 × 128 × 128. Subsequently, these subvolumes were input into the Swin UNETR model, which in turn extracted encoder outputs sized at 768 × 4 × 4 × 4. These extracted embeddings underwent downsampling via a learnable embedding module, consisting of four convolutional blocks, to align with the input token size of the downstream transformer. As a result, the MRI scans were effectively embedded into one-dimensional vectors, each of size 256. These vectors were then combined with non-imaging features and directed into the downstream transformer for further processing. The entire process used a dataset comprising 8,155 MRI volumes, which were allocated for model training, validation and testing (Table S6).
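The reported encoder output size of 768 × 4 × 4 × 4 follows from the input size, patch size and embedding dimension given above, under the assumption (standard for Swin UNETR, but not stated in this section) that the encoder has four stages, each halving the spatial resolution and doubling the channel count. The helper below is a hypothetical sketch of that arithmetic only.

```python
def encoder_output_shape(input_size=128, patch_size=2, embed_dim=48, num_stages=4):
    """Shape (channels, D, H, W) of the deepest encoder feature map.

    num_stages=4 is an assumption about the encoder depth; the other
    defaults are the values reported in the text.
    """
    spatial = input_size // patch_size  # patch embedding: 128 -> 64
    channels = embed_dim                # C = 48 after the embedding layer
    for _ in range(num_stages):
        spatial //= 2                   # each stage halves the resolution
        channels *= 2                   # and doubles the feature dimension
    return (channels, spatial, spatial, spatial)


print(encoder_output_shape())
```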
Random feature masking
To enhance the robustness of the backbone transformer in handling data incompleteness, we leveraged the masking mechanism72,73 to emulate arbitrary missing features during training. The masking mechanism, when paired with the attention mechanism, effectively halts the information flow from a given set of input tokens, ensuring that certain features are concealed during prediction. A practical challenge arises when considering the potential combinations of input features, which increase exponentially. With hundreds of features in play, capturing every potential combination is intractable. Inspired by the definition of Shapley values, we deployed an efficient strategy for feature dropout. Given a sample with a feature set \(S\), \(S\) is randomly permuted as \(\sigma\); simultaneously, an integer \(i\) is selected independently from the range \(\left[1,| S| \right]\). The features \({\sigma }_{i+1},{\sigma }_{i+2},\ldots ,{\sigma }_{| S| }\) are then masked out from the backbone transformer. It is noteworthy that the dropout process was applied afresh across different training batches or epochs to ensure that the model gets exposed to a diverse array of missing information even within a single sample.
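The sampling step can be sketched as follows. This is a minimal illustration of the permute-and-truncate rule described above, not the study's code; the function name `sample_feature_mask` and the example feature names are hypothetical, and a uniform draw of the cut point is assumed.

```python
import random


def sample_feature_mask(features, rng=random):
    """Split a sample's available features into (kept, masked).

    The feature set is randomly permuted, a cut point i is drawn from
    [1, |S|] (uniformly, by assumption), and everything after position i
    in the permutation is masked.
    """
    sigma = list(features)
    if not sigma:
        return [], []
    rng.shuffle(sigma)               # random permutation of S
    i = rng.randint(1, len(sigma))   # inclusive on both ends
    return sigma[:i], sigma[i:]


# Redrawing per batch or epoch exposes the model to different subsets
# of the same sample's features.
rng = random.Random(0)
kept, masked = sample_feature_mask(["age", "mmse", "faq", "moca"], rng)
print(kept, masked)
```

Because the cut point is at least 1, at least one feature is always kept, and because it can equal |S|, the unmasked sample is also a possible draw.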
Handling missing labels
The backbone transformer was trained by amalgamating data from multiple different cohorts, each focused on distinct etiologies, which introduced the challenge of missing labels in the dataset. While most conventional approaches involve discarding records with incomplete output labels during training, we chose a more inclusive strategy to maximize the utility of the available data. Our approach framed the task as a multilabel classification problem, introducing thirteen separate binary heads, one for each target label. With this design, for every training sample, we generated a binary mask indicating the absence of each label. We then masked the loss associated with samples lacking specific labels before backpropagation. This method ensured optimal utilization of the dataset, irrespective of label availability. The primary advantage of this approach lies in its adaptability. By implementing this label-masking strategy, our model can be evaluated against datasets with varying degrees of label availability, granting us the flexibility to address a wide spectrum of real-world scenarios.
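A minimal sketch of the label-masking idea is given below. For brevity it uses plain binary cross-entropy rather than the focal loss defined in the next section, and it averages over the available labels only; both choices, along with the name `masked_bce` and the use of `None` to mark a missing label, are assumptions made for illustration.

```python
import math


def masked_bce(probs, labels, tiny=1e-7):
    """Mean binary cross-entropy over the heads whose label is available.

    labels[i] is 0, 1, or None when the cohort does not provide label i.
    Heads with a missing label contribute nothing to the loss, so no
    gradient would flow back from them.
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # masked: label absent for this sample
        p = min(max(p, tiny), 1.0 - tiny)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# The predictions on the two unlabeled heads do not affect the loss.
print(masked_bce([0.9, 0.2, 0.5], [1, None, None]))
```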
Loss function
Our backbone model was trained by minimizing the loss function (\({{{\mathcal{L}}}}\)) composed of two loss terms: ‘focal loss (FL)’76 (\({{{{\mathcal{L}}}}}_{{{{\rm{FL}}}}}\)) and ‘ranking loss (RL)’ (\({{{{\mathcal{L}}}}}_{{{{\rm{RL}}}}}\)), along with the standard L2 regularization term. FL is a variant of standard cross-entropy loss that addresses the issue of class imbalance; it assigns low weight to easy (well-classified) instances and employs a balance parameter. This loss function was used for each of the diagnostic categories (a total of 13; Glossary 1). Therefore, our \({{{{\mathcal{L}}}}}_{{{{\rm{FL}}}}}\) term was:
$${{{{\mathcal{L}}}}}_{{{{\rm{FL}}}}}=\frac{1}{N}\sum\limits_{k=1}^{N}\sum\limits_{i=1}^{13}-{y}_{k,i}{\alpha }_{i}{(1-{p}_{k,i})}^{\gamma }\log ({p}_{k,i})-(1-{y}_{k,i})(1-{\alpha }_{i}){({p}_{k,i})}^{\gamma }\log (1-{p}_{k,i}),$$
where \(N\) was the batch size (that is, \(N=128\)), and other parameters and variables were as defined. The focusing parameter \(\gamma\) was set to 2, which had been reported to work well in most of the experiments in the original paper76. Moreover, \({\alpha }_{i}\in [0,1]\) was the balancing parameter that influenced the weights of positive and negative instances. It was set as the square of the complement of the fraction of samples labeled as 1, varying for each \(i\) due to the differing level of class imbalance across diagnostic categories (Table 1). The FL term did not take inter-class relationships into account. To address these relationships in our overall loss function, we also incorporated the RL term that induced loss if the sigmoid outputs for diagnostic categories labeled as 0 were not lower than those labeled as 1 by a predefined margin of \(\epsilon\), for any training sample \(k\). We defined the RL term for any pair of diagnostic categories \(i\) and \(j\), as follows:
$${{{{\mathcal{L}}}}}_{{{{\rm{RL}}}}}^{(i,\,j)}({{{{\bf{p}}}}}_{k},{{{{\bf{y}}}}}_{k})=\max (0,(\,{p}_{k,i}-{p}_{k,\,j})(\,{y}_{k,\,j}-{y}_{k,i})+\epsilon ),$$
Overall, the RL term was:
$${{{{\mathcal{L}}}}}_{{{{\rm{RL}}}}}=\frac{1}{N}\sum\limits_{k=1}^{N}\sum\limits_{i=1}^{13}\sum\limits_{j=i+1}^{13}{{{{\mathcal{L}}}}}_{{{{\rm{RL}}}}}^{(i,\,j)}({{{{\bf{p}}}}}_{k},{{{{\bf{y}}}}}_{k}).$$
Combining all terms, our overall loss function (\({{{\mathcal{L}}}}\)) was:
$${{{\mathcal{L}}}}={{{{\mathcal{L}}}}}_{{{{\rm{FL}}}}}+\lambda {{{{\mathcal{L}}}}}_{{{{\rm{RL}}}}}+\beta \parallel {{{\bf{w}}}}{\parallel }^{2},$$
where λ and β were the weights that controlled the importance of \({{{{\mathcal{L}}}}}_{{{{\rm{RL}}}}}\) and the L2 regularization terms, respectively. The training was done using the mini-batch strategy with the AdamW optimizer77, an improved version of the Adam optimizer78, with a learning rate of 0.001 for a total of 256 epochs. Additionally, we utilized a cosine learning rate scheduler with warm restarts79, initiating the first restart after 64 epochs and extending the restart period by a factor of 2 for each subsequent restart. The values of ϵ, λ, and β were determined to be ϵ = 0.25, λ = 0.005, and β = 0.0005, respectively, based on an evaluation of the overall model performance on the validation set. During training, the model performance was evaluated on the validation set at the end of each epoch, and the model with the highest performance was selected. To demonstrate the effectiveness of the focal loss in compensating for the high class imbalance, the performance of our baseline model was compared against that of a model trained without the focal loss term across all the 13 diagnostic categories (Table S16).
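The loss terms defined above can be written out directly. The sketch below is a plain-Python transcription of the three equations for illustration, not the training code; the function names are hypothetical, the default hyperparameters are the reported values (γ = 2, ϵ = 0.25, λ = 0.005, β = 0.0005), and the clamping constant `tiny` is an added numerical safeguard.

```python
import math


def alpha_from_prevalence(positive_fraction):
    """Balancing weight: square of the complement of the positive fraction."""
    return (1.0 - positive_fraction) ** 2


def focal_loss(p, y, alpha, gamma=2.0, tiny=1e-7):
    """Focal loss for one sample, summed over its diagnostic categories."""
    total = 0.0
    for p_i, y_i, a_i in zip(p, y, alpha):
        p_i = min(max(p_i, tiny), 1.0 - tiny)  # guard against log(0)
        total += -y_i * a_i * (1.0 - p_i) ** gamma * math.log(p_i)
        total += -(1 - y_i) * (1.0 - a_i) * p_i ** gamma * math.log(1.0 - p_i)
    return total


def ranking_loss(p, y, eps=0.25):
    """Pairwise margin loss for one sample, summed over pairs i < j.

    As in the formula, a pair with equal labels contributes the constant
    eps, which does not depend on the predictions.
    """
    total = 0.0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            total += max(0.0, (p[i] - p[j]) * (y[j] - y[i]) + eps)
    return total


def total_loss(batch_p, batch_y, alpha, weights=(), lam=0.005, beta=0.0005):
    """L = L_FL + lambda * L_RL + beta * ||w||^2, averaged over the batch."""
    n = len(batch_p)
    fl = sum(focal_loss(p, y, alpha) for p, y in zip(batch_p, batch_y)) / n
    rl = sum(ranking_loss(p, y) for p, y in zip(batch_p, batch_y)) / n
    l2 = sum(w * w for w in weights)
    return fl + lam * rl + beta * l2


# A positive category scored 0.5 and a negative one scored 0.4 violate
# the 0.25 margin, so the pair is penalized.
print(ranking_loss([0.5, 0.4], [1, 0]))
```

Note that the rarer a category is, the closer its α is to 1, so positive instances of rare etiologies are weighted more heavily than their negatives.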
Interpretability analysis
The primary goal of interpretability analysis is to demystify ML models by providing clear insights into how various features influence predictions. Central to this field lies the Shapley value51, originally a game theory concept, now repurposed to evaluate feature significance in ML models. In this context, each instance is considered a unique ‘game’, where features act as players contributing to the outcome. The model’s output is analogous to the game’s payoff, with the Shapley value quantifying each feature’s contribution towards this outcome. However, calculating Shapley values for all possible feature combinations is often computationally infeasible due to the sheer number of features. To overcome this, we applied permutation sampling to approximate Shapley values80, which simplifies computations while maintaining accuracy in estimating feature contributions. We performed Shapley analysis on the NC, MCI and dementia predictions within the NACC test set. We first identified cases for which the model yielded logit values greater than 0. We then selected a subset of 500 cases with the most features available per diagnostic group. Features were subsequently ranked based on their mean Shapley values. To account for data missingness, features that were absent for a case were assigned a zero Shapley value, ensuring their influence was accurately represented. The resulting distribution of Shapley values across features provided insight into their relative importance, with higher values indicating more influence.
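Permutation sampling can be sketched as follows. This is a generic Monte Carlo estimator written for illustration, not the study's implementation: `value_fn` is a hypothetical stand-in for the model output evaluated with only a given subset of features unmasked, and the toy additive payoff is chosen because its exact Shapley values are known.

```python
import random


def shapley_permutation(value_fn, features, num_permutations=200, rng=None):
    """Estimate Shapley values by averaging marginal contributions.

    For each random ordering, features are revealed one at a time and
    each feature is credited with the change in value_fn it causes.
    """
    rng = rng or random.Random(0)
    phi = {f: 0.0 for f in features}
    for _ in range(num_permutations):
        order = list(features)
        rng.shuffle(order)
        present = set()
        prev = value_fn(frozenset(present))
        for f in order:
            present.add(f)
            cur = value_fn(frozenset(present))
            phi[f] += cur - prev
            prev = cur
    return {f: total / num_permutations for f, total in phi.items()}


# Toy additive payoff: each feature's Shapley value equals its own weight.
weights = {"age": 2, "mmse": 5, "faq": 1}
value = lambda subset: sum(weights[f] for f in subset)
print(shapley_permutation(value, list(weights), num_permutations=50))
```

Each permutation costs one model evaluation per feature, so the total cost grows linearly with the number of sampled permutations rather than exponentially with the number of features.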
Traditional ML models
To assess our model’s ability to classify NC, MCI and dementia cases, we compared its performance with that of the CatBoost model, a tree-based classification framework39,50. Given the variable availability of features across the test cohorts (Tables S2, S4 and S5), we divided the data into two feature subsets. This stratification enabled a comparison with CatBoost, offering insights into our model’s performance using a range of parameters. The first feature subset consisted of variables common across all cohorts, including demographics, MMSE and Boston Naming Test scores. The second subset expanded on this by incorporating additional neuropsychological measures found in the NACC and ADNI cohorts, such as trail making tests A and B, logical memory IIA delayed recall, MoCA scores, and digit span forward and backward tests. We trained separate CatBoost models for each feature set but applied our model to both subsets without retraining, allowing for a consistent evaluation across different feature configurations.
Biomarker validation
The predicted probabilities of the model for various etiologies were cross-validated with established gold-standard biomarkers pertinent to each respective etiology. Both the NACC and ADNI test cohorts were used in AD biomarker analyses, whereas only NACC testing data were used for FTD and LBD analyses due to biomarker availability. In the NACC dataset, binary UDS variables were used to define positivity for amyloid β (Aβ), tau and fluorodeoxyglucose F18 (FDG) PET biomarkers for AD due to varying PET processing methods across centers. Binary UDS variables were also used to define FDG and MRI evidence for FTD, and DaTscan as evidence for LBD. In ADNI, the University of California, Berkeley (UCB) Aβ PET processing pipeline yields Freesurfer-defined cortical summary and reference regions, as well as centiloids (CL). A cutoff value of 20 CL was chosen to define positivity81. For tau, the UCB processing pipeline yields standardized uptake value ratios (SUVr) in Freesurfer-defined regions. A meta-temporal region of interest was constructed following established standards82. A Gaussian mixture model with two components identified 1.74 SUVr as the optimal threshold to separate the two distributions, where values greater than 1.74 indicated tau PET positivity. Finally, the UCB FDG PET processing pipeline yields a meta-region of interest, on which a Gaussian mixture model with two components identified 1.21 SUVr as the best threshold, with values smaller than 1.21 indicating positivity for neurodegeneration. Information regarding the PET processing protocols can be found in the summaries of UCB amyloid, tau, and FDG PET methods available on the LONI Image Data Archive website83.
Neuropathologic validation
The model’s predictive capacity for various dementia etiologies was substantiated through alignment with neuropathological evaluations sourced from the NACC, FHS and ADNI cohorts (Table S12). We included participants who conformed to the study’s inclusion criteria, had a diagnosis close to 3 years before death, and for whom neuropathological data were available. Standardization of data was conducted in accordance with the Neuropathology Data Form Version 10 protocols from the National Institute on Aging84. We pinpointed neuropathological indicators that influence the pathological signature of some dementia etiologies, such as arteriolosclerosis, the presence of neurofibrillary tangles and amyloid plaques, and CAA. These indicators were chosen to reflect the complex pathological terrain that defines each form of dementia. To examine the Thal phase for amyloid plaques (A score), subjects were categorized into two groups: one encompassing Phase 0, indicative of no amyloid plaque presence, and a composite group merging Phases 1–5, reflecting varying degrees of amyloid pathology. The model’s predictive performance was then compared across these groupings. For the Braak stage of neurofibrillary degeneration (B score), we consolidated stages I–VI into a single group, representing the presence of AD-type neurofibrillary pathology, whereas stage 0 was designated for cases devoid of AD-type neurofibrillary degeneration. With respect to the density of neocortical neuritic plaques, assessed by the CERAD score (C score), individuals without neuritic plaques constituted one group, whereas those with any manifestation of neuritic plaques (sparse, moderate or frequent (C1–C3)) were aggregated into a separate group for comparative analysis of the model’s predictive outcomes.
To evaluate model alignment with the severity of CAA, subjects were classified into two groups, one representing the absence of CAA and another encapsulating all stages of CAA severity, ranging from mild to severe. We also evaluated the presence of arteriolosclerosis, underscoring the role of vascular pathology in the progression of AD by decreasing cerebral blood flow and impairing Aβ clearance. Furthermore, to evaluate the model’s concordance with non-AD pathologies, we analyzed the association of the model-generated probabilities of VD with the presence of old microinfarcts and arteriolosclerosis, and of the model-generated probabilities of FTD with the presence of TDP-43 pathology.
AI-augmented clinician assessments
We aimed to ascertain if our model could bolster the diagnostic prowess of clinicians specializing in dementia care and diagnosis. To this end, a group of 12 neurologists and 7 neuroradiologists were invited to participate in diagnostic tasks on a subset of NACC cases (see ‘Data processing and training strategy’). Neurologists were presented with 100 cases, which included 15 cases each of NC and MCI, and 7 cases for each of the dementia etiologies. The data encompassed person-level demographics, medical history, social history, neuropsychological tests, functional assessments, and multisequence MRI scans where possible (that is, T1w, T2w, FLAIR, DWI and SWI sequences). They were asked to provide their diagnostic impressions, as well as a confidence score ranging from 0 to 100 for the diagnosis of each of the 13 labels. These confidence scores quantitatively reflect the clinician’s certainty in their diagnosis, with higher scores indicating greater certainty. This scoring system facilitated a quantitative comparison between the clinicians’ diagnostic certainty and the predictive probabilities generated by our model. Similarly, neuroradiologists were provided with the same multisequence MRI scans, along with information on age, gender, race, and education status from 70 clinically diagnosed dementia cases. They were also tasked with providing diagnostic impressions, as well as confidence scores concerning the origin of dementia (Glossary 1). To evaluate the potential enhancement of clinical judgments by our model, we calculated AI-augmented confidence scores by averaging the clinicians’ confidence scores with our model’s predicted probabilities. We then assessed the diagnostic accuracy of the clinicians’ original and AI-augmented confidence scores using AUROC and AUPR metrics. The specifics of the case samples and questionnaires provided to the neurologists and neuroradiologists are detailed below.
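The augmentation and scoring steps can be sketched as below. The text states that clinician confidence scores were averaged with the model's probabilities; rescaling the 0 to 100 confidence score to [0, 1] before averaging is an assumption, as are the function names. The AUROC is computed from its pairwise-ranking definition so that the example needs no external library.

```python
def augment(clinician_score, model_prob):
    """Average a clinician confidence score (0-100) with a model probability.

    The clinician score is first rescaled to [0, 1]; this alignment of
    scales is an assumption, not a detail given in the text.
    """
    return (clinician_score / 100.0 + model_prob) / 2.0


def auroc(scores, labels):
    """Probability that a random positive case outranks a random negative one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ratings for one diagnostic label across four cases.
clinician = [80, 60, 30, 10]
model = [0.9, 0.2, 0.7, 0.1]
labels = [1, 0, 1, 0]
augmented = [augment(c, m) for c, m in zip(clinician, model)]
print(auroc([c / 100.0 for c in clinician], labels), auroc(augmented, labels))
```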
Neurologist approach to the ratings
Neurologist 1
The clinical data were reviewed initially, taking note of potential contributors such as extreme age or education (for example, age > 90 years, education less than 9 grades), primary language and language of cognitive testing. Pertinent factors like a history of transient ischemic attack or stroke, PD diagnosis and/or PD medication usage, known genetic mutations, closed head injury, alcohol or substance use disorders, chronic psychiatric symptoms/disorders and APOE genotype were assessed. Next, the current level of functional abilities was evaluated from the provided initial description (for example, independent living, requiring assistance with some or all activities) and FAQ responses. FAQ scores of 9 or higher typically indicated limitations with instrumental activities of daily living, supporting a dementia diagnosis. FAQ scores ranging from 4 to 8 would align with MCI if cognitive test scores indicated cognitive decline. Subsequently, cognitive test scores were reviewed, with focus on age, education, and gender-adjusted Z scores. For those with NC, no Z scores deviated by 1 standard deviation below the mean (that is, no score of −1.0 or worse). Persons with MCI would exhibit at least one Z score of −1.5 or worse (for example, −1.75) or two scores of −1.0 in the same cognitive domain. Persons with dementia would typically present with two or more scores at −2.0 or worse. Interpretation for patients with very low education or non-native language cognitive testing was approached cautiously. Following this, brain MRIs (T1w images) were reviewed for signs of atrophy, the pattern of atrophy, and cerebrovascular disease. When available, DWI was used to identify a diffusion restriction pattern commonly seen in prion diseases. Functional abilities and cognitive test scores were used to classify persons as normal, MCI, or dementia. For persons between categories, a continuum scale was employed. 
For instance, a score of 80 for MCI and 20 for dementia would indicate an 80% likelihood of classification as MCI and a 20% likelihood of classification as dementia. For individuals with MCI or dementia, the most likely diagnostic category or categories were selected. In cases of mixed dementia or unclear causation, multiple diagnostic categories were chosen, with their scores summing to 100. Each category’s score reflected the estimated contribution and, for mixed dementias, the extent of their contribution. For example, a score of 70 for AD, 20 for LBD and 10 for VD would signify an estimated 70% contribution from AD, 20% from LBD and 10% from cerebrovascular disease.
Neurologist 2
The evaluation of case reports began with a comprehensive analysis of demographics, available medical history, APOE4 status, structured family history and an assessment of the patient’s level of functional independence. Subsequently, a thorough examination of corresponding clinical scales and neuropsychological test results was conducted. Careful observations were made regarding the subject’s educational background, the presence of visual or hearing impairments, and whether the tests were conducted in the subject’s native language. Following this, the synthesis of clinical data allowed for the prediction of the presence of MCI, dementia, or cognitive states falling below the MCI threshold, often referred to as ‘normal’ cognition. These predictions were quantified, with the most probable diagnosis assigned a rating exceeding 50%, whereas the others received lower ratings, reflecting the confidence in the diagnosis. Subsequently, the MRI sequences were examined alongside the case report to identify factors contributing to the patient’s clinical condition. Specifically, findings such as medial temporal atrophy and parietal atrophy were prominently associated with AD, whereas the presence of FLAIR hyperintensity and focal encephalomalacia without an alternative cause was considered indicative of vascular burden and/or dementia, especially when accompanied by deep and/or brainstem microhemorrhages. Brainstem atrophy was frequently observed in cases suggestive of potential stroke or Lewy body conditions, and the use of DWI sequences allowed for the potential identification of conditions like prion disease and epilepsy-related disorders. In assessing the clinical significance of these contributors, the most plausible factors were rated highest, whereas other contributors received lower but still considerable ratings, typically exceeding 50%.
However, distinguishing psychiatric features stemming from a neurodegenerative process from those arising as independent comorbid issues occasionally posed a challenge. Importantly, observed vascular burden in imaging, even when it didn’t independently warrant a dementia diagnosis, was consistently acknowledged under the vascular category, often rated highly due to the confidence in its clinical significance.
Neurologist 3
In the approach to differential diagnosis for dementia, a detailed case overview encompassed a wide spectrum of clinical information including demographics, vitals and comprehensive personal and medical histories, alongside results from systematic physical, neurological, psychiatric and neurocognitive evaluations. Cognitive function was assessed using clinician impressions from neuropsychiatric evaluations and standardized testing with MMSE or MoCA, facilitating the distinction among NC, MCI and dementia. Functional assessments provided insights into the impact of neurological disorders on daily living activities. Specific scales and questionnaires, such as the Hachinski Ischemic Score, evaluations for PSP and CBS, the Unified Parkinson’s Disease Rating Scale and the Neuropsychiatric Inventory Questionnaire, were instrumental in identifying localized or generalized neurological deficits, signs and symptoms of PD and related conditions, and characteristic features of LBD, such as visual hallucinations. The presence of typical symptoms for disorders like NPH also contributed to fine-tuning the differential diagnosis. The Geriatric Depression Scale was used to discern if primary psychiatric disorders might mimic dementia presentations. An extensive review of neurocognitive testing data aided in differentiating AD from other cognitive disorders. Detailed MRI analyses, revealing anomalies such as cortical atrophy, ischemic changes and ventriculomegaly, further refined the diagnostic process.
Neurologist 4
The patient’s cognitive status, ranging from NC to MCI or dementia, was primarily determined based on neuropsychiatric test results and the functional assessment scale. Special consideration was given to patients with Parkinson’s syndrome, as their movement disorders could impact functional assessment scores. When neuropsychiatric testing clearly indicated dementia, diagnosis was straightforward. However, cases on the borderline between MCI and AD required a closer examination, where functional assessment scores, medical history, and physical examination findings were collectively considered, factoring in the influence of motor disorders on the assessment. This process involved adjusting the probability estimate based on clinical judgment. Regarding etiological diagnosis, a comprehensive evaluation was carried out, taking into account both medical history and imaging data. Cases presenting with Parkinson’s symptoms led to differential diagnoses that included PD dementia, dementia with Lewy bodies, CBD, PSP and others. In instances where imaging revealed markers of cerebral small vessel disease, the possibility of VD was explored. Notably, when prominent mental symptoms were coupled with atrophy on one side of the frontal and temporal lobes, consideration was given to frontotemporal degeneration. Infectious, metabolic, traumatic, and hereditary causes were also taken into account, guided by the relevant medical history. The adjustment of probability in these cases was guided by personal judgment.
Neurologist 5
The assessment combined insights from clinical and medication history, specific neurological examinations and neuropsychological test scores. Initially, attention was given to basic demographic data, such as age and the subject’s living situation. Subsequently, a comprehensive evaluation of medical and social history was conducted, considering potential dementia risk factors and relevant habits. The presence or absence of APOE alleles was noted. Medication history was scrutinized, particularly medications associated with vascular comorbidities like antihypertensives and anticoagulants, indicative of vascular disease risk. The presence of antidepressants was acknowledged, considering potential psychiatric conditions linked to cognitive decline. During the review of neurological examinations, focus was placed on gaze, tremor, parkinsonism and gait assessment. Neuropsychological examination scores were analyzed, first taking note of the number of abnormal tests. MoCA scores were used when available, alongside other tests like WMS. Language assessment, often relying on Animals and Digit span backwards, played a crucial role. Z scores and absolute scores were considered for test abnormality determination. Cognitive decline characterized by language and memory loss pointed to AD. The presence of hallucinations and parkinsonism suggested LBD, or if PD was advanced, it pointed to PD dementia. Executive dysfunction and disinhibition were signs of FTD. Hydrocephalus-associated urinary symptoms and specific findings hinted at NPH. MCI was identified through mildly abnormal tests and preserved daily activities. MRIs were considered, yet clinical synopsis took precedence when imaging findings did not align with the clinical scenario. In offering a final diagnosis, a single label was assigned when the diagnosis was confident, whereas multiple labels were used if overlapping symptoms or psychiatric comorbidities/alcoholism could obscure the presentation.
In such scenarios, several labels were assigned with varying confidence levels. For instance, in equivocal cases of dementia and MCI, ratings were employed to determine the likelihood of each diagnosis. If both MCI and dementia were considered, dropdowns for each dementia subtype were used to indicate the more probable dementia type. When distinguishing between dementia and psychiatric conditions or acute encephalopathy proved challenging, all relevant options were marked alongside dementia.
Neurologist 6
In assessing clinical cases for dementia, the process began with a comprehensive review of key demographic and historical data, encompassing details like age, gender, educational background, family history, and existing medical comorbidities, to provide context for interpreting the cognitive presentation. The clinical records were systematically examined, with a specific focus on the critical domains relevant to diagnosing dementia syndromes. Key tools for initial assessment, such as the MMSE and the MoCA scores, provided an initial screening of the severity and pattern of cognitive impairment. Very low scores indicated advanced dementia, whereas higher scores within the mild impairment range prompted a more detailed review of neuropsychological test data. This battery of neurocognitive tests revealed the specific profile of cognitive deficits within domains such as memory, language, executive function, and visuospatial abilities, each of which hinted at potential etiologies. A fundamental component of the diagnostic process involved evaluating for any concurrent neurological signs, which entailed a meticulous examination of physical findings, with a particular focus on motor exam results, including assessments for rigidity, tremors, and gait disorders often associated with Parkinsonian disorders. Additionally, the Hachinski Ischemic Scale score was considered for insights into potential vascular contributions. Furthermore, it was imperative to observe the individual’s functional status and any neuropsychiatric symptoms, as they bore diagnostic and prognostic significance. The clinician had to ascertain whether the deficits impeded daily activities. Behavioral manifestations such as depression, hallucinations, delusions and agitation could provide critical distinctions between various dementia types. Once these key components were systematically reviewed, the clinician synthesized the data to formulate a comprehensive differential diagnosis. 
Cognitive testing profiles, behavioral presentation, family history, age of onset, and the presence of neurological signs were all weighed and considered in a holistic manner. Common differentials in dementia assessment included AD, vascular cognitive impairment, dementia with Lewy bodies, PD dementia and FTD. Lastly, the MRI results were scrutinized for any uncommon findings that could either support or contradict the differential diagnosis. This involved assessing major structural abnormalities or alterations, such as hydrocephalus or severe atrophy, which could provide further backing for the final diagnosis.
Neurologist 7
The interpretation method followed a structured approach. Initially, cognitive impairment severity (NC, MCI or dementia) was determined by assessing the Functional Assessment Scale Score, independence level and neuropsychiatric testing. This assessment incorporated past medical history to exclude other potential causes of functional limitations. Etiology assessment comprised several considerations. VD was diagnosed when factors such as stroke history, cerebrovascular disease risk factors, focal neurological deficits, Hachinski Ischemic Score, and specific MRI findings indicating infarctions, white matter hyperintensities, and perivascular spaces were present. Parkinsonism, as evaluated by the Unified Parkinson’s Disease Rating Scale, prompted investigation for LBD, NPH, VD, FTD and variants. LBD was considered for cases with visual hallucinations, Parkinsonism, cognitive impairment, and unremarkable MRI findings, whereas NPH diagnosis hinged on ventricular dilation and radiological features. FTD identification relied on executive function deficits, abnormal behavior, language impairment, and MRI-documented frontal/temporal lobe atrophy. Mental illness was contemplated for individuals with relevant medical history and substantial neuropsychiatric inventory and GDS symptoms. Prion disease recognition was based on distinctive MRI patterns. Infectious, metabolic, substance abuse-related, delirium-related and psychiatric conditions were considered through medical history, coupled with the absence of specific MRI abnormalities. Lastly, multiple system atrophy was diagnosed in cases displaying Parkinson’s symptoms, defecation issues, ataxia and cerebellar atrophy on MRI, whereas TBI diagnosis was associated with head trauma history, cognitive decline, localized lesions, and secondary atrophy.
Neurologist 8
The evaluation process began with a comprehensive assessment of patient demographics, medical/family history, and risk factors. Cardiovascular and cerebrovascular risk factors were scrutinized due to their potential contribution to VD and vascular parkinsonism. Special attention was given to assessing activities of daily living (ADLs), which served as a crucial factor in distinguishing dementia from MCI. APOE status played a pivotal role in gauging the likelihood of AD. The presence of APOE4 heightened the risk of AD, particularly in early onset cases, whereas APOE2 could potentially serve as a protective factor. Psychiatric history was examined to identify behavioral changes and assess whether conditions like depression or anxiety contributed to cognitive symptoms. The GDS helped differentiate between pseudodementia/depression and other psychiatric illnesses affecting cognitive function. This information was crucial in pinpointing specific cognitive disorders (for example, PD dementia, behavioral variant FTD, impulse control disorders in the context of dopamine agonists). A meticulous examination of clinical findings focused on gait, tremor, and bradykinesia. The presence of rest tremor, bradykinesia, or rigidity prompted consideration of PD, or other forms of parkinsonism such as dementia with Lewy bodies (DLB), PSP or FTD. Comprehensive neuropsychological battery results were analyzed to discern patterns of cognitive impairment, differentiating between executive function deficits and memory impairments. Deviations in tasks such as Trails suggested executive dysfunction, potentially indicating subcortical dementia like DLB, PDD, VD or vascular parkinsonism. Poor performance on WAIS-R or WAIS-III indicated memory impairment, typically associated with cortical dementias like AD. Imaging studies were instrumental in the evaluation. Patterns like diffuse or parietal atrophy suggested AD, whereas frontal-temporal atrophy indicated FTD.
The presence of widespread white matter disease (WMD) burden aligned with VD or vascular parkinsonism. Specific assessments included the evaluation of the swallow tail sign, associated with PD, and midbrain atrophy, assessed through sagittal images using the midbrain-to-pons ratio (midbrain area/pontine area). Regarding the rating system, no cases received a perfect score of 100, as most presented with mixed pathologies, combining features such as amyloid beta AD changes and alpha-synuclein aggregates with parkinsonism or alpha-synuclein alongside evidence of tauopathy in PD-PSP variants. Ratings between 50% and 80% indicated varying degrees of likelihood for a specific pathology, with ratings above 80% signifying a stronger likelihood of the disease or pathology being present.
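The midbrain-to-pons ratio mentioned above is defined in the text simply as midbrain area divided by pontine area on a sagittal image. A minimal sketch of that arithmetic follows; the function name, the choice of square millimetres as the unit and the example areas are all hypothetical, and no diagnostic cutoff is implied because the text does not state one.

```python
def midbrain_to_pons_ratio(midbrain_area_mm2, pons_area_mm2):
    """Return midbrain area divided by pontine area.

    Both areas are assumed to be measured on the same sagittal slice and
    in the same unit (square millimetres here, as an illustrative choice).
    """
    if midbrain_area_mm2 < 0:
        raise ValueError("midbrain area cannot be negative")
    if pons_area_mm2 <= 0:
        raise ValueError("pontine area must be positive")
    return midbrain_area_mm2 / pons_area_mm2


# Made-up areas, used only to show the calculation.
ratio = midbrain_to_pons_ratio(100.0, 500.0)
```

A smaller ratio corresponds to a midbrain that is small relative to the pons, which is the pattern the rater was looking for when assessing midbrain atrophy.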
Neurologist 9
The assessment began with a thorough review of the individual’s medical history, with a focus on identifying major diagnoses that could impact cognition. This included conditions like TBI, psychiatric disorders, stroke-related issues, and APOE status. Subsequently, the individual’s medication history was analyzed, considering potential biases introduced by medications commonly used for AD or PD, which might have implied a higher likelihood of these conditions. Functional status assessment followed, encompassing ADLs and instrumental activities of daily living (iADL), providing insights into the individual’s everyday capabilities. A comprehensive physical examination was conducted, emphasizing the identification of notable abnormalities that could offer insights into cognitive status. Psychiatric and cognitive testing scales were administered, and the results were carefully analyzed for consistency and coherence. These results were also cross-referenced with the person’s reported functional status. In cases of discrepancy, consideration was given to underlying mood or psychiatric disorders that may have influenced information accuracy. Chronology of symptoms, often absent from person-level histories, was evaluated with a particular focus on the Neuropsychiatric Inventory Questionnaire, which inquired about symptoms experienced within the last 30 days. During the review of imaging studies, the gathered information was taken into account. Attention was paid to imaging findings that may have indicated AD or vascular disease. Unusual symptoms in the person-level history, such as new motor problems or agitation, prompted consideration of rare conditions like FTD, Huntington’s disease, or Creutzfeldt-Jakob disease. Subsequently, a detailed review of the imaging data was conducted to identify specific features that could be indicative of these particular disorders. 
Lastly, the interpretation of cognitive testing scale results was influenced by the individual’s functional status. This guided the determination of whether the person exhibited signs of dementia or MCI or fell within the spectrum of normal cognitive function. The aim was to construct a comprehensive assessment of the individual’s cognitive state, accounting for these factors.
Neurologist 10
The determination of cognitive status, including NC, MCI or dementia, relied primarily on neuropsychiatric test outcomes and the functional assessment scale. Notably, when individuals exhibited Parkinsonism, functional abilities were often influenced by motor impairments, making neuropsychiatric test results more influential than the Functional Activities Questionnaire (FAQ). Given the absence of distinct cutoff points for these categories, adjustments to the probability assessment were made based on individual judgment. Regarding the etiological diagnosis, a comprehensive evaluation incorporated all available clinical information and imaging data. For instance, cases presenting with Parkinsonism prompted a focused differential diagnosis that considered conditions like DLB, characterized by symptoms such as parkinsonism, dementia and hallucinations. Others included PD dementia (PDD), typically occurring after a prolonged history of PD, vascular injuries with attention to severe small vessel disease, especially within the basal ganglia, and NPH, identified by enlarged brain ventricles. Conditions such as CBD and PSP, though less common, required the presence of more typical symptoms like apraxia in CBD or abnormal vertical eye movement in PSP for diagnosis. For individuals diagnosed with MCI or dementia but without Parkinsonism, the differential diagnosis primarily encompassed AD, FTD and vascular injuries. FTD, for example, might exhibit pronounced non-memory impairments, along with psychiatric and behavioral symptoms, and asymmetrical brain atrophy in frontal and/or temporal lobes. Additionally, vascular injuries played a substantial role in cognitive impairment and sometimes coexisted with AD pathology. In these instances, probability assessments were adjusted based on clinical judgment. For the remaining etiologies, establishing a diagnosis necessitated a detailed clinical history.
Neurologist 11
The evaluation process began with an assessment of the provided case profiles, encompassing baseline information like age, education, language, and required assistance. Supplementary data, including genetic test results such as APOE4 status, medication records, and relevant details, were also considered. Subsequently, various cognitive and physical examinations, along with associated indices, were reviewed to detect neurocognitive dysfunction. From these comprehensive case profiles, preliminary hypotheses were formulated to guide the diagnostic process, ultimately leading to specific diagnoses or a set of potential options. A meticulous evaluation of imaging studies for each case followed, examining different sequences and views for signs of cerebral atrophy or structural changes, including WMD. These imaging findings were correlated with case profile hypotheses to generate a list of probable diagnoses. Probability ratings were assigned to these diagnoses, reflecting the likelihood of their presence. The rating process initially involved determining whether cases met criteria for NC, MCI or dementia. In ambiguous cases distinguishing between dementia and MCI, probability ratings were provided for both, especially when the differentiation between MCI and mild dementia was uncertain based on testing outcomes. Subsequently, probable contributing factors to the diagnoses were identified by selecting the types of dementia most likely present. Many cases presented with multiple potential contributing causes, often including VD alongside AD. Quantifying the likelihood of each diagnosis involved assigning scores of 70 or higher to those with a high probability, regardless of an individual factor’s relatively low contribution to their dementia. Higher scores indicated a greater likelihood of that diagnosis being the primary cause.
Causes with similar probability scores did not reflect an equal degree of causality to the individual’s condition but merely reflected an equal probability of occurrence. Scores ranging from 20 to 30 suggested the presence of dementia, though with a minor role in the clinical presentation. Scores below 10 indicated a very low probability, implying little to no significance.
Neurologist 12
While reviewing clinical data in conjunction with MRI scans, a notable absence was observed regarding information on symptom onset and progression. This critical aspect of history-taking has the potential to offer valuable insights into the diagnosis, as the pace of progression varies among different forms of dementia. For diagnostic purposes, reliance was placed on MMSE scores, employing a cutoff of 24 to diagnose dementia. Functional capacity assessments assisted in distinguishing between MCI and dementia. Psychiatric questionnaires proved useful in orienting toward specific diagnoses, such as Parkinson’s dementia, DLB or infectious causes. The evaluation of depression’s role in cognition was challenging, but the Geriatric Depression Scale provided some guidance. In cases of uncertainty, the MRI findings played a pivotal role. For instance, clear frontotemporal atrophy with behavioral disturbances and language involvement suggested FTD, whereas temporal lobe atrophy leaned more toward AD. In cases of DLB or Parkinson’s dementia, clinical presentation bore more weight when MRI results were unremarkable. Moderate to severe white matter abnormalities pointed to VD. In most cases, a shortlist of potential diagnoses was compiled before reviewing the MRI. However, there were instances where MRI results were conclusive and prompted a change in the diagnosis. For example, one case indicated possible Creutzfeldt-Jakob disease due to hallucinations and corresponding MRI findings. In another, an MRI revealed encephalomalacia with ventricular enlargement following a head injury. A young case with a cavum septum pellucidum was attributed to chronic traumatic encephalopathy. Lastly, global atrophy in an individual with a history of alcohol abuse and seizures pointed to alcoholic dementia. 
Providing a percentage of certainty for each diagnosis proved beneficial, as many cases presented mixed pathology, especially in Parkinson’s dementia, where vascular disease often contributed to the clinical picture.
Neuroradiologist approach to the ratings
Neuroradiologist 1
The evaluation of MRI scans began with a global perspective to exclude multiple infarcts and identify notable brain atrophy patterns. The presence and severity of white matter lesions, chronic infarcts and microhemorrhages were recorded. Subsequent assessment focused primarily on volume loss, particularly emphasizing hemispheric asymmetry. The initial evaluation determined whether dominant frontal and anterior temporal or parietal and medial temporal volume loss was evident. A more detailed sub-analysis of each region was conducted, focusing on grading severity and documenting regional and focal volume loss in real time. The lobar volume loss evaluation was done systematically, starting with the frontal lobes, including attention to asymmetry when present. Sub-analyses of specific regions within the frontal lobes were conducted, such as the anterior insula, cingulate gyrus, precentral gyrus, and caudate nucleus. Evaluation of temporal lobe volume loss was also carried out, distinguishing mesial and non-mesial temporal lobe atrophy. Sub-analyses of hippocampal, amygdala and parahippocampal atrophy were included, with special attention to anterior, lateral, and posterior temporal lobe atrophy, including fusiform, middle, and inferior temporal gyrus volume. The assessment for atrophy was extended to the parietal and occipital lobes, documenting brainstem and cerebellar atrophy. When appraising ventricular size, a comparison was made relative to sulcal size. Findings favoring an AD pattern included the presence of predominant parietal and medial temporal lobe atrophy, or less frontal lobe involvement than parietal and temporal lobes. Deviations from the AD pattern, such as predominant frontal, anterior temporal, or occipital involvement, enlarged ventricles, or multiple infarcts, supported non-AD dementia patterns, including those indicative of LBD, VD, prion disease, FTD and its variants, NPH, TBI, psychiatric diagnoses and/or other conditions.
A rating scale from 0 to 100 was used to assess the likelihood of various diagnostic considerations. A rating of 0 was selected when no evidence supported a particular diagnosis, whereas a rating of 100 indicated the imaging strongly suggested that entity. Ratings of 50 were assigned when imaging findings were equally likely to represent or not represent the entity in question.
Neuroradiologist 2
The approach to rating the cases followed a systematic checklist, starting with an assessment of the entire brain, then moving through various lobes: frontal, temporal, parietal, occipital and the brainstem. Within this framework, the aim was to determine the possible causes of dementia based on imaging findings. Initially, features indicative of NPH were sought. These features typically stood out from other conditions and included disproportionate ventricular enlargement, an acute callosal angle at the posterior commissure level, sulcal crowding near the vertex, and Sylvian fissure enlargement. Next, the focus shifted to assessing the overall burden of WMD, characterized by T2 FLAIR hyperintensities. Examination was carried out in regions with encephalomalacia or gliosis, which might signify prior infarcts, helping establish a potential vascular component to dementia, either as the sole cause or a contributing factor alongside other processes. Further examination was directed toward atrophy patterns, aiming to identify specific neurodegenerative processes. Disproportionate atrophy in the medial, basal, and lateral temporal lobes and the medial parietal lobes suggested AD. Relative preservation of medial temporal lobe structures hinted at dementia with Lewy bodies or PD dementia, although the absence of clinical history posed challenges for this diagnosis, as clinical features and typical MRI findings of medial temporal lobe preservation are valuable in a clinical setting. For FTD and its variants, the search was for frontal and/or temporal atrophy, predominantly left posterior perisylvian or parietal atrophy, anterior temporal atrophy, predominant left posterior fronto-insular atrophy, midbrain atrophy relative to the pons (‘hummingbird’ sign), concavity of the dorsolateral midbrain, thinning of the tectal plate, or T2 hyperintense rim along the putamen with patchy or confluent T2 FLAIR hyperintensity in the rolandic subcortical white matter.
In the quest for Prion disease indicators, examination included cortical/gyriform diffusion hyperintensity, often accompanied by thalamic and basal ganglia diffusion hyperintensity. Also explored were signs of encephalomalacia and gliosis typical of prior TBI.
Neuroradiologist 3
During case reviews, emphasis was placed on patient age and MRI findings as essential factors guiding the diagnostic process. Age served as a key determinant, informing the assessment of volume loss, particularly relevant in cases of AD and frontotemporal lobar degeneration (FTD). Each MRI sequence contributed uniquely to diagnostic considerations: T1w images held importance in gauging volume loss, discerning distinctive patterns within the hippocampus, temporal lobes, and parietal lobes for AD, and focusing on volume loss within the frontal and temporal lobes for FTD. In the assessment for NPH, attention was drawn to ventriculomegaly and its proportionality to volume loss. T1w images were also instrumental in identifying cerebellar atrophy, indicative of conditions like alcoholism or phenytoin use for seizures. Diffusion-weighted images played a critical role in detecting signs of Creutzfeldt-Jakob disease, characterized by hyperintensity in regions such as the insula, cingulate gyrus, frontal gyri, medial thalami, and possibly the basal ganglia. This sequence was also valuable for identifying infarcts. T2/FLAIR and other T2w images were essential for assessing small vessel disease burden, aiding in the evaluation of VD. They were also instrumental in detecting potential evidence of infectious, inflammatory, metabolic, or drug-related hyperintensity. The susceptibility-weighted images were used to assess for microhemorrhages, which could be associated with AD or Lewy body disease. Psychiatric diseases were typically exempt from numerical ratings as their diagnosis could not usually be ascertained through imaging. Ratings spanned from 70 to 90 in cases where a single diagnosis was highly confident. In scenarios where multiple potential diagnoses were considered, ratings ranged from 40 to 70 for each disease state, reflecting the estimated likelihood of each condition.
Neuroradiologist 4
Each case was approached by first reviewing the demographic information; however, as the project progressed, the demographic data became less informative, and by the midpoint of the project, demographics were reviewed only as a later step. The images were assessed using the SLICER software. The T2w and FLAIR sequences were carefully evaluated to gauge the extent of small vessel disease and infarcts, serving as indicators of potential vascular causes of cognitive impairment. These sequences also proved valuable for the exclusion of infectious, inflammatory, or toxic causes. The DWI sequence was employed to identify acute infarcts and to investigate neurodegenerative conditions such as Creutzfeldt-Jakob disease or fatal familial insomnia. Susceptibility-weighted images were analyzed to identify microhemorrhages, assess their extent and location, and rule out other potential causes of cognitive decline. However, the most pivotal sequences were the volumetric sequences acquired in all three anatomical planes. They were instrumental in assessing global or lobar-specific volume loss. Specific regions of interest included the hippocampal volume assessed through coronal sequences to rule out AD, the precuneus evaluated via sagittal sequences, and the parietal lobes examined in axial sequences. If frontal lobe volume loss was evident, then the temporal lobes were assessed for signs of FTD. Cerebellar volume loss or infratentorial volume loss led to considerations of alcohol abuse or phenytoin use, or cerebellar ataxias, whereas brainstem involvement indicated potential multiple system atrophy. Disproportionate ventricular dilatation raised suspicions of NPH. The rating scale used was comprehensive, and in cases where complete information was lacking, the diagnosis was assigned to the best of the rater’s ability. A diagnosis was rated as 100 when highly confident, and as 50 when uncertainty existed.
Additionally, some cases were assigned a probability score between 50 and 100 when confident in excluding other potential causes, based on the imaging data.
Neuroradiologist 5
The approach to MR exams began with an evaluation of axial T2/FLAIR images, if available. If multiple regions of gliosis were observed alongside areas of encephalomalacia, resulting from prior infarctions in multiple vascular territories, consideration was given to the possibility of multi-infarct dementia. Moreover, when encephalomalacia and gliosis predominantly affected the temporal lobes, cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy became a potential inclusion in the diagnostic considerations. Following the FLAIR sequence, assessment of diffusion-weighted images, if accessible, primarily served to rule out more acute conditions like Creutzfeldt-Jakob disease, herpes encephalitis, or other forms of encephalitis. Subsequently, T1w images were reviewed, preferably in 3D format, to examine ventricle and sulci dimensions. The presence of ventriculomegaly and sulcal crowding at the vertex prompted consideration of NPH as a potential diagnosis. Additionally, gyri were evaluated to identify areas exhibiting volume loss. T2w images were especially helpful in this regard, as they enhanced the visibility of CSF and accentuated regions of atrophy. Once the order of diagnostic differentials was established, a diagnostic rating was assigned. In this rating system, a score of 100 indicated absolute certainty, an exceedingly rare occurrence in radiology. Conversely, a score of less than 20 signified extreme unlikelihood, 25 denoted unlikeliness, 50 implied the possibility of the diagnosis, whereas a range of 50 to 75 indicated a probable diagnosis. Finally, a score exceeding 75 suggested a high likelihood of the diagnosis being accurate.
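The verbal anchors of this rater's scale can be written down as a small lookup, shown here as a sketch rather than as anything the rater actually used. The function name `describe_rating` is hypothetical. The text leaves some values without a label (between 20 and 50, other than 25) and lists 50 both as "possible" and as the lower end of the "probable" range; the sketch assumes that exactly 50 means "possible", that values above 50 up to and including 75 mean "probable", and it reports unlabeled values as unspecified rather than guessing.

```python
def describe_rating(score):
    """Map a 0-100 rating to the verbal anchors described in the text.

    Boundary handling at 50 and 75 is an assumption of this sketch; values
    the text does not label are reported as unspecified.
    """
    if not 0 <= score <= 100:
        raise ValueError("rating must lie between 0 and 100")
    if score == 100:
        return "absolute certainty"
    if score > 75:
        return "high likelihood"
    if score > 50:
        return "probable"
    if score == 50:
        return "possible"
    if score == 25:
        return "unlikely"
    if score < 20:
        return "extremely unlikely"
    return "not specified in the text"


# Label a few illustrative ratings.
labels = {s: describe_rating(s) for s in (10, 25, 30, 50, 60, 80, 100)}
```

Keeping the unlabeled ranges explicit avoids attributing an interpretation to the rater that the description does not support.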
Neuroradiologist 6
The review process began with an examination of the provided individual-level demographics for each case. Subsequently, all images provided for each case underwent analysis using the SLICER software. The T2/FLAIR sequence was the basis for assessing small vessel changes, subacute to chronic infarcts, encephalomalacia from TBI, and any areas displaying signal abnormalities indicative of potential alternative causes, such as neurodegenerative, infectious-inflammatory, or toxic-metabolic etiologies. The T2/FLAIR sequence was also employed to investigate seizure-related changes. T2w images played a key role in evaluating ventricular size, examining the posterior fossa for small infarcts, and observing major intracranial arterial flow voids. Diffusion-weighted images were used to identify acute infarcts and regions with reduced diffusivity, potentially linked to other neurodegenerative, infectious-inflammatory, toxic-metabolic conditions, or seizure-related changes. Susceptibility-weighted images were utilized to detect areas featuring parenchymal microhemorrhage or calcification. Lastly, high-resolution T1w images were employed to analyze regional volume loss patterns suggestive of specific neurodegenerative processes. The evaluation process included the completion of the online ADRD radiologist task survey. During the assessment of sections regarding regionally predominant atrophy, the high-resolution T1w images were revisited to ensure response accuracy. In the final section, person-level demographics and imaging findings were synthesized to arrive at the best-guess probability for each diagnosis. The rating scale corresponded to the likelihood of the best-guess diagnosis. For instance, if there was high confidence that a case represented a particular diagnosis, it was assigned a score of 100, with a score of 0 given to all other diagnoses. In cases of diagnostic uncertainty, where the estimated probability was 50%, a score of 50 was assigned.
Neuroradiologist 7
Brain volume loss was assessed based on age-appropriate norms, with T1 and T2/FLAIR sequences aiding in the evaluation of volume loss within each lobe. These sequences were particularly useful for assessing CSF presence near the convexity. Brainstem volume loss was primarily evaluated through mid-sagittal and axial images, which allowed for the examination of the pontine belly and cerebral peduncle size, respectively. Coronal images provided insights into hippocampal volume, determined by the prominence of the temporal horns of the lateral ventricle. Sagittal images were used to assess cerebellar volume loss. FLAIR sequences played a crucial role in detecting encephalomalacia, gliosis, infarcts and white matter changes. Distinct patterns were observed in various dementia types, such as parieto-temporal volume loss favoring AD. Extensive white matter changes with or without microhemorrhages in individuals over 60 years pointed to VD. White matter changes in younger individuals raised consideration of alternative causes like infections or metabolic factors. Alcohol use often correlated with cerebellar volume loss. Traumatic brain injury was suspected in cases with FLAIR signal changes and peripheral volume loss in the anterior temporal and inferior frontal lobes, with or without susceptibility, along with corpus callosum and brainstem findings, suggestive of diffuse axonal injury. Frontal and temporal lobe volume loss indicated FTD. The ‘hummingbird’ sign on sagittal images led to consideration of PSP, particularly when combined with brainstem volume loss. Asymmetric ventricular prominence relative to cortical volume loss hinted at NPH, with the corpus callosal angle measured on coronal images to confirm the diagnosis. Although no specific findings were linked to psychiatric disorders, the presence of a cavum septum pellucidum was weakly correlated. 
Multiple findings in a case, such as global volume loss, extensive white matter changes and microhemorrhages, leaned toward VD over AD due to the subjective nature of volume loss assessment. A higher rating was assigned to the diagnosis with more MRI findings supporting it, though no case received a perfect score of 100, with ratings exceeding 80 indicating a dominant diagnosis.
Statistical analysis
We used one-way analysis of variance and the two-sided χ2 test for continuous and categorical variables, respectively, to assess the overall differences in the population characteristics between the diagnostic groups across the study cohorts. We used the two-sample two-sided Kolmogorov-Smirnov (KS) test for goodness of fit to compare model-predicted AD probabilities, P(AD), between MCI cases with an etiological diagnosis of AD and MCI cases without one. We applied the Kruskal-Wallis H-test for independent samples and subsequently conducted post-hoc Dunn’s testing with Bonferroni correction to evaluate the relationship between CDR scores and the model-predicted probabilities. To assess whether the model’s predicted probabilities for AD, FTD and LBD were higher for their respective biomarker-positive cases than for biomarker-negative ones, a one-sided Mann-Whitney U test was conducted. ADNI’s Aβ groups did not significantly deviate from normality and were therefore compared using the one-sided independent samples t-test. We applied the one-sided Mann-Whitney U test between neuropathologic scores and the model-predicted probabilities. To compare model predictions with expert-driven assessments, we used the Brunner-Munzel test to identify statistically significant increases in the mean disease probability scores between the levels of scoring categories. The Brunner-Munzel test was also used to compare the expert and model confidence scores for the true negative and true positive cases for each etiology. To evaluate the interrater reliability of label-specific confidence scores, we performed pairwise Pearson correlation analyses between clinicians’ scores and those generated by the model85. We calculated the average correlation coefficient across pairs and determined its 95% confidence interval. 
In addition, we estimated the mean Pearson correlation coefficient between the confidence score of neurologists and the model’s score for each diagnostic label using a bootstrapping approach. Pairwise statistical comparisons of AI-augmented clinician diagnostic performance (AUROC and AUPR) and clinician-only diagnostic performance were performed with the one-sided Wilcoxon signed-rank test. In all analyses, we opted for non-parametric tests when the Shapiro-Wilk test indicated significant deviations from normality. All statistical analyses were conducted at a significance level of 0.05.
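As an illustration only, the normality-gated choice between a parametric and a non-parametric one-sided test, and a bootstrapped mean Pearson correlation, might be sketched with scipy as follows. The function names, the number of bootstrap resamples, the choice to resample cases, and the percentile interval are assumptions made for this example and are not taken from the study's code.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the text


def one_sided_group_test(positive, negative, alpha=ALPHA):
    """Test whether scores in `positive` tend to exceed those in `negative`.

    Hypothetical helper: uses the independent-samples t-test when neither
    group deviates significantly from normality (Shapiro-Wilk), otherwise
    the Mann-Whitney U test. Both tests are one-sided ("greater").
    """
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    looks_normal = all(
        stats.shapiro(group).pvalue >= alpha for group in (positive, negative)
    )
    if looks_normal:
        result = stats.ttest_ind(positive, negative, alternative="greater")
        name = "t-test"
    else:
        result = stats.mannwhitneyu(positive, negative, alternative="greater")
        name = "Mann-Whitney U"
    return name, float(result.pvalue)


def bootstrap_mean_pearson(clinician_scores, model_scores, n_boot=1000, seed=0):
    """Bootstrap the mean clinician-vs-model Pearson r for one diagnostic label.

    clinician_scores: array of shape (n_clinicians, n_cases)
    model_scores:     array of shape (n_cases,)
    Cases are resampled with replacement (an assumption of this sketch).
    Returns the bootstrap mean and a 95% percentile interval.
    """
    rng = np.random.default_rng(seed)
    clinician_scores = np.asarray(clinician_scores, dtype=float)
    model_scores = np.asarray(model_scores, dtype=float)
    n_cases = model_scores.shape[0]
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_cases, size=n_cases)
        # np.corrcoef returns the Pearson correlation matrix; [0, 1] is r.
        rs = [np.corrcoef(row[idx], model_scores[idx])[0, 1]
              for row in clinician_scores]
        means[b] = np.mean(rs)
    low, high = np.percentile(means, [2.5, 97.5])
    return float(means.mean()), (float(low), float(high))
```

In use, `one_sided_group_test` would be called with the model-predicted probabilities of biomarker-positive and biomarker-negative cases, and `bootstrap_mean_pearson` once per diagnostic label.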
Performance metrics
We generated ROC and PR curves from predictions on both the NACC test data and other datasets. From each ROC and PR curve, we further derived the area under the curve values (AUC and AUPR, respectively). Further, we computed micro-, macro- and weighted-average AUC and AUPR values. Of note, the microaverage approach consolidates true positives, true negatives, false positives, and false negatives from all classes into a unified curve, providing a global performance metric. In contrast, the macroaverage calculates individual ROC/PR curves for each class before computing their unweighted mean, disregarding potential class imbalances. The weighted average, while similar in approach to macroaveraging, assigns a weight to each class’s ROC/PR curve proportionate to its representation in the dataset, thereby acknowledging class prevalence. We also evaluated the model’s accuracy, sensitivity, specificity and Matthews correlation coefficient, with the latter being a balanced measure of quality for classes of varying sizes in a binary classifier. Performance metrics were initially calculated for the entire testing cohort, followed by a stratified analysis based on age, gender and race subgroups.
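The three averaging schemes described above can be sketched with scikit-learn, which the study lists among its libraries. This is a minimal example under stated assumptions: the function name `summarize_multilabel`, the 0.5 decision threshold used for the Matthews correlation coefficient, and the label-indicator input layout are hypothetical and not taken from the study's code.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    matthews_corrcoef,
    roc_auc_score,
)


def summarize_multilabel(y_true, y_score, threshold=0.5):
    """Micro-, macro- and weighted-average AUC/AUPR plus per-label MCC.

    y_true:  binary indicator array of shape (n_cases, n_labels)
    y_score: predicted probabilities of the same shape
    Every label needs at least one positive and one negative case.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    out = {}
    for avg in ("micro", "macro", "weighted"):
        # micro: pool all (case, label) decisions into a single curve
        # macro: unweighted mean of the per-label areas
        # weighted: per-label areas weighted by the number of positives
        out[f"auc_{avg}"] = float(roc_auc_score(y_true, y_score, average=avg))
        out[f"aupr_{avg}"] = float(
            average_precision_score(y_true, y_score, average=avg)
        )
    # MCC is defined on hard predictions, so scores are thresholded first
    # (the threshold value is an assumption of this sketch).
    y_pred = (y_score >= threshold).astype(int)
    out["mcc_per_label"] = [
        float(matthews_corrcoef(y_true[:, k], y_pred[:, k]))
        for k in range(y_true.shape[1])
    ]
    return out
```

Calling the same function on subsets of cases selected by age, gender or race would give a stratified analysis of the kind described, provided each subset still contains both classes for every label.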
Computational hardware and software
All MRI and non-imaging data were processed on a workstation equipped with an Intel i9 14-core 3.3 GHz processor and 4 NVIDIA RTX 2080Ti GPUs. Our software development utilized Python (version 3.11.7) and the models were developed using PyTorch (version 2.1.0). We used several other Python libraries to support data analysis, including pandas (version 1.5.3), scipy (version 1.10.1), tensorboardX (version 2.6.2), torchvision (version 0.15), and scikit-learn (version 1.2.2). Training the model on a single Quadro RTX8000 GPU on a shared computing cluster had an average runtime of 7 minutes per epoch, whereas the inference task took less than a minute per instance. All clinicians reviewed MRIs using 3D Slicer (version 4.10.2) and logged their findings in REDCap (version 11.1.3). Figures were prepared using Canva and Adobe Illustrator.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Source: https://www.nature.com/articles/s41591-024-03118-z