In Issue 1/2023

Introduction

Artificial intelligence has taken huge leaps forward during the past year. As a result, the media has been full of reports on how it might affect different professions and how it has answered the questions put to it. I joined the crowd and asked the popular AI application ChatGPT whether people should use artificial intelligence when they transcribe spoken interaction. Hoping to get a thorough answer, I added: “Give reasons. Use references.” In this article, I show and analyse ChatGPT’s answer, drawing from recent experiences in the field.

ChatGPT’s answer to the request “Should people use artificial intelligence when they transcribe spoken interaction? Give reasons. Use references.”

Yes, people should consider using artificial intelligence (AI) when they transcribe spoken interaction, but with caution and a full awareness of the potential benefits and drawbacks. AI offers several advantages, including speed, cost-effectiveness, and scalability. AI-powered transcription systems can transcribe a large amount of spoken content quickly, which can be especially useful in situations such as live events, news reporting or content creation for social media platforms.

Moreover, AI can perform the transcription work without taking breaks and its performance can be improved through continuous learning and training so that the accuracy of the transcription can be guaranteed. However, the accuracy of the AI transcription still needs to be verified through an additional edit of the transcript by a human.

Despite these advantages, AI transcription has some limitations, including its accuracy and understanding of context, particularly in cases involving complex language, such as dialects and slang. These systems can be more error-prone than human transcribers, so it is important to understand this aspect before deciding on the technology.

One important consideration is ensuring the privacy and security of the data being transcribed, especially when sensitive or confidential information is involved. Furthermore, training a transcription model requires a large corpus of diverse, high-quality transcriptions to ensure its accuracy and avoid biases.

Therefore, while AI transcription technology can save time, reduce costs, and improve productivity, it should be used with care and with an understanding of its limitations, given the fact that it has not yet reached full maturity regarding transcription accuracy.

References:

Eckert, W., & Mauch, M. (2019). The Challenges Facing Automatic Music Transcription. IEEE Signal Processing Magazine, 36(1), 20–29. https://doi.org/10.1109/MSP.2018.2873858

Rudin, C. (2019). Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x

Wieling, M., Keshet, J., Nguyen, D. N., Chrupała, G., Veldhoen, S., & Bouma, G. (2020). A Review of Evaluation Techniques for Automatic Speech Recognition. Computer Speech and Language, 63, 101052. https://doi.org/10.1016/j.csl.2019.101052

Cautious Optimism

At first glance, ChatGPT’s answer is very thorough and balanced. This is very impressive, considering that the five-paragraph text appeared automatically on my screen about one minute after my question. ChatGPT begins by stating: “Yes, people should consider using artificial intelligence (AI) when they transcribe spoken interaction, but with caution and a full awareness of the potential benefits and drawbacks”. It is clearly a recommendation, but formulated with “should consider using” instead of “should use”. It also suggests using “caution” in the process to avoid “potential drawbacks”. All of these add reservations to the recommendation. ChatGPT reinforces them at the end of the answer, stating that, despite AI’s potential to save time, lower costs and increase productivity in transcription, the user should be aware of its limitations because “it has not yet reached full maturity regarding transcription accuracy”.

Pros and Cons

As advantages of AI in professional transcription, ChatGPT lists speed, cost-effectiveness and scalability. It states that AI can transcribe a considerable amount of spoken material quickly and without taking breaks. Because of this, it deems AI especially well suited for “situations such as live events, news reporting or content creation for social media platforms”. Speed, cost-effectiveness and scalability are indeed often mentioned as the benefits of AI solutions in the field, especially when there is a lot of speech to be transcribed in a short time. What ChatGPT does not mention, however, is that the costs of the licenses for commercial programs vary considerably. This means that the cost-effectiveness really depends on the program, the deadline, and the size and quality of the transcription task. Of the examples given by ChatGPT, creating content for social media is somewhat vague. However, it might refer to frequent posting of long videos that need subtitles, which is often mentioned as an area where an automatic transcription program could be a useful tool.

After the benefits, ChatGPT points out that there are some disadvantages, such as problems with accuracy and understanding context, particularly when the transcribed material includes “complex language, such as dialects and slang”. It mentions that using AI entails problems in ensuring privacy and security if the data is sensitive or confidential. It also highlights that AI systems “can be more error-prone than human transcribers” and suggests that the accuracy of a transcript be verified by a human. ChatGPT offers the reassurance that AI’s performance may improve through “continuous learning and training” – even though it admits that successful training “requires a large corpus of diverse, high-quality transcriptions to ensure its accuracy and avoid biases”.

Like the advantages, the disadvantages mentioned by ChatGPT also connect well with recent discussions within the field of professional transcription. Privacy and security issues have proved both serious and challenging to solve (Chen et al., 2022). This is not necessarily a problem when the content is public, as with parliamentary reporting or the subtitling of live events, but with other types of data, such as patient conversations or banking customer service calls, it might be a critical issue which should be addressed carefully.

The accuracy problems that ChatGPT mentions are also frequently reported by transcription professionals. Even though different sorts of software for automatic speech recognition (ASR) have been around for a long time, they are still greatly affected by, for example, bad sound quality, unclear pronunciation and – as is mentioned in the answer – non-standard speech such as dialects and slang. What is not mentioned by ChatGPT is that the aims and criteria of transcription professions vary considerably and that many of them require a lot of careful and specialised human editing after the initial draft by AI. These include work on live events, news reporting and social media platforms, where there have been complaints about low-quality subtitles produced by unsupervised AI. The answer above also omits the fact that the quality of automatic transcription can vary between languages. In addition, transcripts in some fields, such as conversation analysis, include special details and conventions that haven’t been successfully addressed with AI solutions (see Harjunpää & Kaikkonen, 2020).

Regardless of the disadvantages mentioned by ChatGPT, the quality of automatic transcription products has increased considerably during the last few years. Even many parliaments, which are known for being demanding about the quality of their official reports, have started using ASR in their transcription processes (e.g. Kawahara, Ueno & Morikawa, 2020; Varisto & Kuronen, 2022). Others have been fairly optimistic after first trials (e.g. Lombard, 2022; Kerr, 2022). However, some parliaments, such as the Tweede Kamer in the Netherlands, have delayed or withdrawn from using ASR after problematic test results (e.g. Schelhaas, 2021).

Mysterious References

ChatGPT supports its answer with three references, as requested. They are not cited in the text, so it’s not clear what parts of the answer they refer to. Furthermore, when I tried to check the references, I could find only the second one online. The other two remain inaccessible and, based on my search, it’s unclear to me whether they even exist. The DOI identifier given in the first reference leads nowhere, and the one given in the third leads to a completely different article. This raises some questions about whether all the information in the answer is reliable.

The Human Touch

In my opinion, the automatic answer above by ChatGPT about the usefulness of artificial intelligence when transcribing spoken interaction is surprisingly good. Even though its content is very general, and sometimes vague, and excludes several relevant perspectives on the topic, it nevertheless mentions many of the key benefits and disadvantages of utilising AI in professional transcription. Importantly, it also recognises and emphasises the need for human supervision and intervention in the process.

The need for human supervision and intervention was also very much present in making requests to ChatGPT for this article. The quality of answers given to me by the AI essentially depended on the questions I asked. My first questions about professional reporting and transcription were met with answers about journalism and education. When I asked only about the benefits of AI in transcription, the answer was exclusively positive, with no mention of the disadvantages. In fact, one of the reported benefits was “improved accuracy”, although the answer cited in this article mentions accuracy as a particular challenge for AI solutions. In addition, the two inaccessible references given in the answer above raise some concern about the reliability of the answers. ChatGPT has proven with its answer that it can certainly give valuable information about the use of AI in transcription. However, the user also needs prior knowledge of the topic to supervise and evaluate the results.

Eero Voutilainen is Tiro’s editor-in-chief.

References

Chen, Y., J. Zhang, X. Yuan, S. Zhang, K. Chen, X. Wang & S. Guo (2022). SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems. – ACM Transactions on Privacy and Security, 25:3. URL: https://dl.acm.org/doi/10.1145/3510582 

Harjunpää, K. & S. Kaikkonen (2020). Conversation analytic transcription – Uncovering resources of social interaction. – Tiro 2/2020. URL: https://tiro.intersteno.org/2020/11/conversation-analytic-transcription-uncovering-resources-of-social-interaction/

Kawahara, T., S. Ueno & M. Morikawa (2020). Transcription system using automatic speech recognition in the Japanese Parliament. – Tiro 1/2020. URL: https://tiro.intersteno.org/2020/05/transcription-system-using-automatic-speech-recognition-in-the-japanese-parliament/

Kerr, D. (2022). Automated speech recognition: Ears First! Embracing technological change at the legislative assembly of British Columbia. – Tiro 2/2022. URL: https://tiro.intersteno.org/2022/12/automated-speech-recognition-ears-first-embracing-technological-change-at-the-legislative-assembly-of-british-columbia/

Lombard, C. (2022). Experimenting with automatic speech recognition in the Houses of the Oireachtas (Irish Parliament). – Tiro 1/2022. URL: https://tiro.intersteno.org/2022/07/experimenting-with-automatic-speech-recognition-in-the-houses-of-the-oireachtas-irish-parliament/

Schelhaas, D. (2021). Cloud-based ASR solutions: Some notes for professional reporters. – Tiro 2/2021. URL: https://tiro.intersteno.org/2021/12/cloud-based-asr-solutions-some-notes-for-professional-reporters/

Varisto, N. & R. Kuronen (2022). A good servant but a bad master. Introducing ASR at the Parliament of Finland. – Tiro 2/2022. URL: https://tiro.intersteno.org/2022/12/a-good-servant-but-a-bad-master-introducing-asr-at-the-parliament-of-finland/
