Evaluation of Bank Credit Card Telemarketing Using Genetic Algorithms for Feature Selection and Naive Bayes

Authors

  • Ekka Pujo Ariesanto Akhmad

DOI:

https://doi.org/10.30649/japk.v10i1.21

Keywords:

telemarketing, CRISP data mining, genetic algorithm for feature selection, high-dimensional data, naive bayes

Abstract

The bank's marketing department has collected customer data from campaigns that promote credit cards by telephone (telemarketing). The bank's evaluation of this credit card telemarketing is still not sufficiently effective or efficient. One appropriate way to evaluate the telemarketing reports is to apply data mining techniques, with the aim of identifying the trends and patterns of customers who are likely to subscribe to the credit card offered by the bank. The research method follows the Cross Industry Standard Process for Data Mining (CRISP-DM), using Genetic Algorithms for Feature Selection (GAFS) and Naive Bayes (NB). The bank credit card telemarketing dataset has 15 attributes: 14 ordinary attributes and 1 special attribute. Because the dataset is high-dimensional, the GAFS method is applied. GAFS yields 7 optimal attributes: 6 ordinary attributes (pekerjaan [job], balance, rumah [house], pinjaman [loan], durasi [duration], and poutcome) and 1 special attribute, target. NB alone achieves an accuracy of 86.71%. Combining GAFS with NB raises the accuracy to 90.27%, a gain of 3.56 percentage points, for predicting which bank customers take up a credit card.
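The combination of genetic-algorithm feature selection with a Naive Bayes classifier can be illustrated with a minimal sketch. This is not the paper's implementation: the synthetic data, the function names (make_data, nb_accuracy, ga_select), and all GA settings (population size, generations, mutation rate) are assumptions chosen only for illustration. Each candidate solution is a bit mask over the features, and its fitness is the holdout accuracy of a categorical Naive Bayes model trained on the selected features.

```python
import math
import random
from collections import Counter, defaultdict

def make_data(n_rows, n_features, rng):
    """Synthetic stand-in data (assumption): only features 0 and 1 carry signal."""
    rows = []
    for _ in range(n_rows):
        x = [rng.randint(0, 2) for _ in range(n_features)]
        y = 1 if x[0] + x[1] >= 3 else 0
        if rng.random() < 0.1:  # flip about 10% of labels as noise
            y = 1 - y
        rows.append((x, y))
    return rows

def nb_accuracy(train, test, mask):
    """Fit categorical Naive Bayes on the masked features; return test accuracy."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset is treated as worthless
    class_counts = Counter(y for _, y in train)
    value_counts = defaultdict(int)  # (class, column, value) -> count
    for x, y in train:
        for c in cols:
            value_counts[(y, c, x[c])] += 1
    correct = 0
    for x, y in test:
        scores = {}
        for cls, n_cls in class_counts.items():
            score = math.log(n_cls / len(train))  # log prior
            for c in cols:
                # Laplace smoothing; each feature takes 3 possible values
                score += math.log((value_counts[(cls, c, x[c])] + 1) / (n_cls + 3))
            scores[cls] = score
        if max(scores, key=scores.get) == y:
            correct += 1
    return correct / len(test)

def ga_select(train, test, n_features, rng, pop_size=12, generations=15, p_mut=0.1):
    """Search feature masks with elitism, tournament selection,
    one-point crossover, and bit-flip mutation."""
    cache = {}
    def fitness(mask):
        if mask not in cache:
            cache[mask] = nb_accuracy(train, test, mask)
        return cache[mask]
    pop = [tuple([1] * n_features)]  # seed with the "use all features" mask
    while len(pop) < pop_size:
        pop.append(tuple(rng.randint(0, 1) for _ in range(n_features)))
    for _ in range(generations):
        nxt = sorted(pop, key=fitness, reverse=True)[:2]  # keep the two best
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament of size 3
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
data = make_data(600, 8, rng)
train, test = data[:400], data[400:]
baseline = nb_accuracy(train, test, (1,) * 8)
best_mask, best_acc = ga_select(train, test, 8, rng)
print("all features:", round(baseline, 3))
print("GA subset   :", best_mask, round(best_acc, 3))
```

Because the all-features mask is in the initial population and the best individuals are carried over each generation, the selected subset never scores below the all-features baseline on the holdout split. Scoring on the same split that guided the search is optimistic, so a real evaluation would use a separate test set or cross-validation.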

Published

01-09-2019

How to Cite

Akhmad, E. P. A. (2019). Evaluation of Bank Credit Card Telemarketing Using Genetic Algorithms for Feature Selection and Naive Bayes. JURNAL APLIKASI PELAYARAN DAN KEPELABUHANAN, 10(1), 12–22. https://doi.org/10.30649/japk.v10i1.21