The Effectiveness Of Automatic Speech Recognition In The Capcut Application For Developing Inclusive Learning Media

Authors

  • Netaniel Giovanni, Pusdiklat Kepemimpinan dan Manajerial, Kementerian Keuangan Republik Indonesia, https://orcid.org/0000-0002-8914-1867
  • Wendi Nurhayat, Pusdiklat Kepemimpinan dan Manajerial, Kementerian Keuangan Republik Indonesia

DOI:

https://doi.org/10.17977/um038v8i22025p157

Keywords:

Speech-to-Text, Automatic Speech Recognition, Word Error Rate, Learning Media Development

Abstract

Deaf individuals face challenges in accessing audio information, necessitating technology that can generate accurate and easily understandable automatic captions. However, few studies have systematically evaluated the accuracy of Automatic Speech Recognition (ASR) technology in the Indonesian language context. This study analyzes the effectiveness of the ASR feature in the CapCut application in generating automatic captions for educational media. Using a quantitative evaluative approach, nine instructional videos were selected from one e-learning platform, and CapCut's automatic transcriptions were compared with manual transcriptions. The data were normalized and then analyzed using the Jiwer software and OpenAI Whisper to calculate the Word Error Rate (WER). The results show an average WER of 3.08%, categorized as excellent; however, content analysis revealed the need for minor manual editing of technical terms and sentence structure to enhance readability. CapCut's ASR feature could therefore be a strategic solution for developing efficient and inclusive educational media, and it could also serve as a reference for building automatic captioning systems.
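The WER comparison described in the abstract can be illustrated with a minimal sketch. WER is the word-level edit distance (substitutions + deletions + insertions) between a manual reference transcript and the automatic transcript, divided by the number of words in the reference. The study reports using Jiwer and OpenAI Whisper for this step; the dependency-free version below is not the authors' pipeline. The normalization rule (lowercasing and removing punctuation), the function names, and the example sentences are all assumptions made for illustration and are not taken from the study's data.

```python
import re


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, replace punctuation with spaces, split into words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the automatic caption drops the final word.
reference = "Selamat datang di modul pembelajaran ini."
hypothesis = "selamat datang di modul pembelajaran"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")
```

In this made-up example the reference has six words and the hypothesis omits one, so the WER is 1/6, or about 16.67%. An average such as the one reported in the abstract would be obtained by computing this value per video and averaging across the nine videos.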

Abstract (translated from Indonesian)

Deaf people face challenges in accessing audio information, so technology is needed that can produce accurate and easily understood automatic captions. However, studies that systematically evaluate the accuracy of Automatic Speech Recognition (ASR) technology in the Indonesian language context remain limited. This study aims to analyze the effectiveness of the ASR feature in the CapCut application in producing automatic captions for learning media. Using a quantitative evaluative approach, nine instructional videos were selected from one e-learning platform and analyzed by comparing CapCut's automatic transcriptions with manual transcriptions. The data were normalized before being analyzed using the Jiwer software and OpenAI Whisper to calculate the Word Error Rate (WER). The results show an average WER of 3.08%, which falls in the excellent category; however, content analysis revealed the need for minor manual editing of technical terms and sentence structure to improve readability. CapCut's ASR therefore has the potential to be a strategic solution for developing efficient and inclusive learning media and to serve as a reference for developing automatic captioning systems.

Author Biographies

Netaniel Giovanni, Pusdiklat Kepemimpinan dan Manajerial, Kementerian Keuangan Republik Indonesia

The author is a Learning Technology Developer (Pengembang Teknologi Pembelajaran Ahli Pertama) at the Ministry of Finance of the Republic of Indonesia.

Wendi Nurhayat, Pusdiklat Kepemimpinan dan Manajerial, Kementerian Keuangan Republik Indonesia

The author is a Learning Technology Developer (Pengembang Teknologi Pembelajaran Ahli Muda) at the Ministry of Finance of the Republic of Indonesia.

Published

2025-05-30

How to Cite

Giovanni, N., & Nurhayat, W. (2025). The Effectiveness Of Automatic Speech Recognition In The Capcut Application For Developing Inclusive Learning Media. JKTP: Jurnal Kajian Teknologi Pendidikan, 8(2), 157–169. https://doi.org/10.17977/um038v8i22025p157

Section

Articles