The Effectiveness of Automatic Speech Recognition in the CapCut Application for Developing Inclusive Learning Media
DOI:
https://doi.org/10.17977/um038v8i22025p157
Keywords:
Speech-to-Text, Automatic Speech Recognition, Word Error Rate, Learning Media Development
Abstract
Deaf individuals face barriers to accessing audio information, so they need technology that can generate accurate, easily understood automatic captions. However, few studies have systematically evaluated the accuracy of Automatic Speech Recognition (ASR) technology for the Indonesian language. This study analyzes the effectiveness of the ASR feature in the CapCut application in generating automatic captions for educational media. Using a quantitative evaluative approach, nine instructional videos were selected from one e-learning platform, and CapCut's automatic transcriptions were compared with manual transcriptions. The data were normalized before being analyzed with the Jiwer software and OpenAI Whisper to calculate the Word Error Rate (WER). The results show an average WER of 3.08%, categorized as excellent; however, content analysis revealed the need for minor manual editing of technical terms and sentence structure to improve readability. CapCut's ASR feature therefore has the potential to be a strategic solution for developing efficient and inclusive educational media, and to serve as a reference for building automatic captioning systems.
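As a rough illustration of the metric reported above, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the automatic transcript into the reference transcript, and N is the number of words in the reference. The sketch below is a pure-Python stand-in for that calculation, not the authors' Jiwer and Whisper pipeline; the normalization rules (lowercasing and stripping ASCII punctuation), the function names, and the sample sentences are all assumptions made for illustration.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    Assumed normalization for illustration only; the study's exact
    normalization steps are not described in the abstract.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty after normalization")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical manual (reference) and automatic (hypothesis) transcripts.
manual = "Takarir otomatis membantu peserta didik tuli memahami video pembelajaran."
automatic = "takarir otomatis membantu peserta didik tuli memahami vidio pembelajaran"

print(f"WER: {word_error_rate(manual, automatic):.2%}")
```

Here the single misrecognized word out of nine reference words counts as one substitution. A per-video WER computed this way could then be averaged across videos to obtain a figure comparable in kind to the mean reported in the abstract.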
License
Copyright (c) 2025 Netaniel Giovanni, Wendi Nurhayat

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Jurnal Kajian Teknologi Pendidikan allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles and to use them for any other lawful purpose. The journal allows the author(s) to hold the copyright and to retain publishing rights without restrictions.
- Authors are allowed to archive their submitted articles in an open access repository.
- Authors are allowed to archive the final published article in an open access repository with an acknowledgment of its initial publication in this journal.










