ANALYSING SENSITIVE DATA SECURITY USING TOKENIZATION-BASED DATA MASKING IN THE EXTRACT-TRANSFORM-LOAD (ETL) PROCESS

Authors

  • Wayan Dana, Universitas Multi Data Palembang
  • Abdul Rahman, Universitas Multi Data Palembang, South Sumatra, Indonesia

DOI:

https://doi.org/10.47709/cnahpc.v8i2.7949

Keywords:

Data Protection, ETL, Transformation, Data Masking, Tokenization

Abstract

The number of internet users in Indonesia grew by 8.7% in 2025, reaching 212 million, or 74.6% of the total population. This growth has created significant opportunities for the digital economy but has also raised concerns about data security. Inadequate management of digital data puts public privacy at risk, which prompted the Indonesian government to enact Law Number 27 of 2022 on Personal Data Protection (UU PDP) as a legal framework for data protection. Effective data security measures are needed at every stage of data handling, including during the Extract-Transform-Load (ETL) process. This study aims to develop a tokenization-based data masking technique for the transformation stage of the ETL process to improve the security of sensitive data. The ETL process extracts data from diverse sources, transforms it into the required format, and loads it into a database; it is particularly vulnerable to sensitive data exposure during the transformation stage, especially in unprotected staging environments. Tokenization replaces original data with tokens that hold no intrinsic value, keeping the data confidential throughout the transformation and staging phases. The findings indicate that tokenization effectively protects sensitive data before it is loaded into the database while also minimizing the need for duplicate tables or additional storage for masked data. In practical terms, this research supports the implementation of the UU PDP, strengthens data security in ETL systems, and helps foster a secure digital ecosystem in Indonesia.
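To make the idea of tokenization in the transformation stage concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple in-memory vault; the names `TokenVault`, `transform`, the `tok_` prefix, and the sample field names are all hypothetical and chosen only for illustration. A production system would keep the vault in a separate, access-controlled store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Reuse an existing token so the same input always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # The token is random, so it carries no information about the value.
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:  # guard against a rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        # Only code with access to the vault can recover the original value.
        return self._value_by_token[token]


def transform(rows, sensitive_fields, vault):
    """Transform step: replace sensitive fields with tokens before staging."""
    return [
        {
            key: vault.tokenize(value) if key in sensitive_fields else value
            for key, value in row.items()
        }
        for row in rows
    ]


# Assumed sample records standing in for the output of the extract step.
extracted = [
    {"id": 1, "name": "Example Person", "national_id": "0000000000000001"},
    {"id": 2, "name": "Another Person", "national_id": "0000000000000002"},
]

vault = TokenVault()
staged = transform(extracted, {"name", "national_id"}, vault)
```

In this sketch the staged rows contain only tokens for the sensitive fields, so an exposed staging area reveals no original values, while non-sensitive fields such as `id` pass through unchanged.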

Published

2026-05-01

How to Cite

Dana, W., & Rahman, A. (2026). ANALYSING SENSITIVE DATA SECURITY USING TOKENIZATION-BASED DATA MASKING IN THE EXTRACT-TRANSFORM-LOAD (ETL) PROCESS. Journal of Computer Networks, Architecture and High Performance Computing, 8(2), 255–263. https://doi.org/10.47709/cnahpc.v8i2.7949
