An Overview of the Institutional Dynamics of Speech Transcription

Posted June 10, 2026

In Issue 1/2026

Book review: Deamer, F., Fraser, H., Haworth, K., Komter, M., Loakes, D., Richardson, E., eds. (2024). Capturing talk: The institutional practices surrounding the transcription of spoken language. Frontiers Media SA. doi: 10.3389/978-2-8325-4933-9

Synopsis of this intriguing title

In Capturing talk: The institutional practices surrounding the transcription of spoken language, 16 articles come together around a common goal: recognising transcription as a complex, high-stakes practice that deserves rigorous research and professional standards. To that end, the authors emphasise that transcriptions are representations of speech rather than copies of it. They also show how the expertise of professional transcribers, working conditions, context and institutional constraints strongly influence how reliable those transcriptions are.

The volume opens with an editorial that introduces the research topic and defines transcription as "entextualisation", that is, the process of representing spoken language as written text. Deamer et al. point out that transcription is not a mechanical process. It also raises the question of power imbalance in institutional settings, where the transcriber has absolute control over the speech-to-text process and the actual speaker has none. Furthermore, context is key to interpreting a transcript, which introduces subjectivity into how the meaning of the speech is understood.

Is institutional transcription of sufficient quality?

Turning to covert recordings used in court, David Gilbert and Georgina Heydon's article "Translated Transcripts from Covert Recordings Used for Evidence in Court: Issues of Reliability" analyses Vietnamese-to-English translated transcripts used in Australian drug trials. The authors identify translation errors, including mistranslations, omissions and distortions, that went undetected in court. This points to systemic deficiencies in training, certification and judicial procedures.

In "Specifying Challenges in Transcribing Covert Recordings: Implications for Forensic Transcription", Robbie Love and David Wright examine how several trained transcribers produce highly divergent transcripts of poor-quality covert recordings, reflected in low agreement rates between them. Alex Holder et al. also observe discrepancies between official transcripts and the interactions actually recorded in governmental organisations in their article "Doing the Organization's Work—Transcription for All Practical Governmental Purposes".

Rebecca Milne et al. compare audio-recorded witness interviews in England and Wales with the written statements produced by police officers in their article "From Verbal Account to Written Evidence: Do Written Statements Generated by Officers Accurately Represent What Witnesses Say?". As in the previous studies, the authors identify omissions, additions and distortions, which raises concerns about whether written witness statements accurately represent spoken accounts. Similarly, in "The Benefits of a Jeffersonian Transcript", Song Hee Park and Alexa Hepburn compare a standard orthographic transcript with a detailed Jeffersonian transcript to show how the interpreted meaning can change when pauses, overlaps and intonation are erased.

In "The Influence of Police Reporting Styles on the Processing of Crime Related Information", Anita Eerland and Tessa van Charldorp show how the style of police reports affects readers' perceptions of their clarity and accuracy, which can in turn shape legal judgements. Turning to technology, in "Does Automatic Speech Recognition (ASR) Have a Role in the Transcription of Indistinct Covert Recordings for Forensic Purposes?", Debbie Loakes reflects on the limited reliability of ASR systems in forensic contexts.

In light of these findings, "A Framework for Deciding How to Create and Evaluate Transcripts for Forensic and Other Purposes" is a particularly pertinent article, in which Helen Fraser proposes a systematic model for evaluating the reliability of a transcript and determining whether it is suitable for its intended purpose. Martha Komter then offers a comparative perspective in "Dutch Institutional and Academic Transcripts of Police Interrogations", highlighting how institutional goals influence transcription choices.

Based on a national survey of translators and interpreters working on forensic cases, Miranda Lai's article "Transcribing and translating forensic speech evidence containing foreign languages—An Australian perspective" identifies weaknesses in current practices and calls for guidelines, training and quality assurance. The volume also includes a Corrigendum to this article.

Lauren Harrington tests ASR systems on police interviews in her article "Incorporating automatic speech recognition methods into the transcription of police-suspect interviews: factors affecting automatic performance". Her results suggest that ASR may be useful for producing a first draft, but because its performance remains imperfect, human review is needed to ensure accuracy. Eero Voutilainen, in turn, examines how spoken interaction is rendered in writing in his article "Written representation of spoken interaction in the official parliamentary transcripts of the Finnish Parliament". His analysis highlights the complex editorial processes used to balance readability, accuracy and institutional standards.

"The complexity of situated text design: a negotiation between standardization and spoken language in a manufacturing company" is an article by Anna-Lena Carlsson and Natalia Svensson Harari that focuses on instruction manuals created in a Swedish manufacturing company. The authors explain how workers negotiate between spoken practices and standardised written forms.

Kate Haworth et al., authors of the article “‘For the Record’: applying linguistics to improve evidential consistency in police investigative interview records”, report findings from ongoing research into official transcripts of police interviews in England and Wales. The paper argues that accuracy, neutrality and consistency should be foundational principles in transcript production.

Finally, in "Automatic speech recognition and the transcription of indistinct forensic audio: how do the new generation of systems fare?", Debbie Loakes assesses newer ASR systems, such as OpenAI's Whisper model. Although performance has improved, accuracy on indistinct forensic audio remains limited, which reinforces the need for human oversight.

A plea for good practice

Spanning legal, parliamentary and technological contexts, this volume demonstrates that transcription is never objective: institutional power, methodological choices, available resources, technology and human interpretation inevitably shape the final product. As noted above, every article, despite coming from a different perspective and context, recognises transcription as a complex practice that requires rigorous research, transparency and professional standards.

Although the publication could benefit from additional disciplinary perspectives, the consensus among its authors suggests that the recurring challenges in transcription are systemic. This collection therefore not only exposes the need for improvement in current transcription practice, which tends to rely heavily on technology for ideological and economic reasons, but also underscores the need for clearer guidelines and common solutions.

Dr. Luz Belenguer Cortés is a linguist, live subtitler and audio describer at the Valencian TV station À Punt (Spain), and a part-time lecturer at Universitat Jaume I.
