A machine-learning approach for nonalcoholic steatohepatitis susceptibility estimation

Ghadiri, Fatemeh; Husseini, Abbas Ali; Öztaş, Oğuzhan
Date issued: 2022
Record dates: 2023-11-04
ISSN: 0254-8860, 0975-0711
https://hdl.handle.net/11363/6222

Background Nonalcoholic steatohepatitis (NASH), a severe form of nonalcoholic fatty liver disease, can lead to advanced liver damage and has become an increasingly prominent health problem worldwide. Predictive models for early identification of high-risk individuals could help target preventive and interventional measures. Traditional epidemiological models, which are based on statistical analysis, have limited predictive power. In the current study, a novel machine-learning approach was developed for individual NASH susceptibility prediction using candidate single nucleotide polymorphisms (SNPs).

Methods A total of 245 NASH patients and 120 healthy individuals were included in the study. Six input features were used to develop models for identifying at-risk individuals: the genotypes of two SNPs in the cytochrome P450 family 2 subfamily E member 1 (CYP2E1) gene (rs6413432, rs3813867), two SNPs in the glucokinase regulator (GCKR) gene (rs780094, rs1260326), the rs738409 SNP in the patatin-like phospholipase domain-containing 3 (PNPLA3) gene, and gender. To predict an individual’s susceptibility to NASH, nine different machine-learning models were constructed. These models paired three classification algorithms, k-nearest neighbor (KNN), multi-layer perceptron (MLP), and random forest (RF), with either all features or the features chosen by one of two feature selection methods: chi-square and support vector machine recursive feature elimination (SVM-RFE). All nine models were trained on 80% of the data from both the NASH patients and the healthy controls, and then tested on the remaining 20% of both groups. The models’ performance was compared in terms of accuracy, precision, sensitivity, and F measure.
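The 80/20 split within each group and the KNN classifier described in the Methods can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the cohort is synthetic, the 0/1/2 genotype coding, the allele frequencies, the choice of k = 5, the squared Euclidean distance, and all function names are hypothetical and do not come from the study.

```python
import random
from collections import Counter

# Feature order assumed for this sketch: five SNP genotypes plus gender.
FEATURES = ["rs6413432", "rs3813867", "rs780094", "rs1260326", "rs738409", "gender"]


def make_synthetic_cohort(n_cases=245, n_controls=120, seed=0):
    """Build a purely synthetic cohort with the same group sizes as the study.

    Genotypes are coded 0/1/2 (assumed coding), gender as 0/1.
    The allele frequencies below are invented for illustration only.
    """
    rng = random.Random(seed)

    def sample(allele_freq):
        genotypes = [sum(rng.random() < allele_freq for _ in range(2)) for _ in range(5)]
        return genotypes + [rng.randint(0, 1)]

    cases = [(sample(0.45), 1) for _ in range(n_cases)]        # label 1 = NASH
    controls = [(sample(0.25), 0) for _ in range(n_controls)]  # label 0 = healthy
    return cases, controls


def stratified_split(cases, controls, train_fraction=0.8, seed=0):
    """Split each group separately so both keep the same train/test ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for group in (cases, controls):
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train += shuffled[:cut]
        test += shuffled[cut:]
    return train, test


def knn_predict(train, x, k=5):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


cases, controls = make_synthetic_cohort()
train, test = stratified_split(cases, controls)
predictions = [knn_predict(train, features) for features, _ in test]
```

Splitting each group separately keeps the case-to-control ratio identical in the training and test sets, which matters here because NASH patients outnumber controls roughly two to one.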
Results Among all nine machine-learning models, the KNN classifier with all features as input showed the highest performance, with an F measure of 86% and an accuracy of 79%.

Conclusions Machine learning based on genomic variation may be applicable for estimating an individual’s susceptibility to developing NASH among high-risk groups with a high degree of accuracy, precision, and sensitivity.

Language: en
Access: info:eu-repo/semantics/openAccess
License: Attribution-NonCommercial-NoDerivs 3.0 United States
Keywords: Algorithm; Artificial intelligence; Disease susceptibility; Fatty liver; Gene; Machine learning; Neural network model; Nonalcoholic fatty liver disease; Nonalcoholic steatohepatitis; Single nucleotide polymorphism; Support vector machine
Type: Article
Volume 41, Issue 5, Pages 475-482
DOI: 10.1007/s12664-022-01263-2
PMID: 36367682
Scopus: 2-s2.0-85138396293 (Q3)
WOS:000881891700001
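The four reported metrics can all be derived from the counts in a binary confusion matrix. The sketch below shows one way to compute them, with NASH as the positive class; the function name and the example counts are hypothetical and chosen only for illustration, not taken from the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity, and F measure from confusion counts.

    tp/fn: NASH patients predicted as NASH / as healthy.
    fp/tn: healthy controls predicted as NASH / as healthy.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Hypothetical counts for a test set of 49 cases and 24 controls (illustration only).
metrics = classification_metrics(tp=40, fp=8, fn=9, tn=16)
```

Because the F measure ignores true negatives, it can be higher than accuracy when the positive class dominates the test set, as it does when patients outnumber controls.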