Analysis for Predicting Diabetes Using Data Mining
Abstract
Data mining is essential for extracting valuable patterns and insights from large datasets, drawing on artificial intelligence and advanced data analysis techniques across many domains. Diabetes, a metabolic disorder characterized by elevated blood glucose levels, poses significant health risks, including cardiovascular and renal complications if left untreated. Data mining plays an important role in exploring and predicting diabetes by identifying high-risk populations, which enables early intervention strategies such as lifestyle changes and timely initiation of treatment.
By analyzing comprehensive datasets that cover diabetes-related factors such as body weight, blood pressure, blood glucose levels, and genetic predisposition, data mining builds predictive models to assess risk and guide targeted interventions. In a study of 768 cases (268 positive and 500 negative), the classifiers performed as follows: Logistic Regression reached 70% accuracy, 57% recall, and an F1 score of 0.63; Naive Bayes (GaussianNB) reached 68% accuracy, 54% recall, and an F1 score of 0.61; Decision Tree Classifier reached 66% accuracy, 62% recall, and an F1 score of 0.64; Random Forest reached 70% accuracy, 59% recall, and an F1 score of 0.64; and XGBClassifier reached 66% accuracy, 58% recall, and an F1 score of 0.62.
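The reported figures can be put in context with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that the recall and F1 values describe the positive (diabetic) class, and that F1 is the usual harmonic mean of precision and recall. Under those assumptions it back-calculates the precision each model would imply, and it computes the accuracy of always predicting the majority class over all 768 cases. The evaluation split is not given, so that baseline is only a rough yardstick, and the inputs are rounded, so the outputs are approximate. The function names are hypothetical.

```python
# Illustrative sketch, not the study's code.
# ASSUMPTION: reported recall and F1 refer to the positive (diabetic) class.

def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy obtained by always predicting the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


def implied_precision(recall, f1):
    """Solve F1 = 2 * P * R / (P + R) for the precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("recall and F1 are mutually inconsistent")
    return f1 * recall / denominator


# (accuracy, recall, F1) as reported in the abstract, rounded values.
REPORTED = {
    "Logistic Regression": (0.70, 0.57, 0.63),
    "Naive Bayes (GaussianNB)": (0.68, 0.54, 0.61),
    "Decision Tree Classifier": (0.66, 0.62, 0.64),
    "Random Forest": (0.70, 0.59, 0.64),
    "XGBClassifier": (0.66, 0.58, 0.62),
}

baseline = majority_baseline_accuracy(268, 500)
print(f"Majority-class baseline accuracy: {baseline:.3f}")

for name, (accuracy, recall, f1) in REPORTED.items():
    precision = implied_precision(recall, f1)
    print(f"{name}: accuracy={accuracy:.2f}, recall={recall:.2f}, "
          f"F1={f1:.2f}, implied precision={precision:.2f}")
```

Comparing each model's accuracy with the majority-class baseline shows why accuracy alone says little on an imbalanced dataset, and why recall and F1 are reported alongside it.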
This analysis highlights the trade-off between precision and recall, particularly when classifying high-risk diabetes cases. High precision reduces false positives but can lower recall, so true positive cases may be missed. Conversely, emphasizing recall can increase false positives. Balancing these metrics is essential for effective diabetes prediction and tailored healthcare strategies. This abstract summarizes the important role of data mining in diabetes research, emphasizing its impact on predictive modeling and healthcare decision-making.
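The precision-recall trade-off described above can be seen by sweeping the decision threshold of a probabilistic classifier. The following is a minimal sketch on made-up data: the labels and scores are hypothetical and are not taken from the study, and the helper name `precision_recall_at` is an assumption introduced for illustration.

```python
# Illustrative sketch with HYPOTHETICAL labels and scores (not study data).

def precision_recall_at(labels, scores, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# 1 = diabetic, 0 = non-diabetic; scores are predicted probabilities.
LABELS = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
SCORES = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.50, 0.40, 0.35, 0.30, 0.20, 0.10]

for threshold in (0.30, 0.50, 0.75):
    precision, recall = precision_recall_at(LABELS, SCORES, threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data a low threshold catches every positive case at the cost of more false positives, while a high threshold yields no false positives but misses half of the positive cases. In a screening setting, where a missed case is usually costlier than a follow-up test, a lower threshold may be the more reasonable choice.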