User-centred Speech-to-text Interpreting: How to Enhance the Readability of Live Texts

Posted June 10, 2026

In Issue 1/2026

Introduction

Speech-to-text interpreters provide live text that makes real-life communicative events accessible for various demographic groups. Live text fills in the blanks and reduces hearing stress as well as the cognitive load that comes with compensating for hearing loss. It also supports second or third language comprehension. However, having to read text at the speed of the spoken word also creates cognitive load. Consequently, speech-to-text interpreters need to aim not only for speed and completeness but also for readability. This article discusses strategies for producing readable live text.

Hearing impairment and speech-to-text interpreting

A study by the German Association of the Hard of Hearing (DSB 2018) finds that 21.3% of Germans over 14, well over 15 million people, are hard of hearing, with 8.8% of them severely hearing impaired or D/deaf. These figures are in line with a report by the European Federation of the Hard of Hearing (EFHOH 2024), according to which about 10% of the European population under the age of 65 and 20% including those 65 and over self-report hearing loss. According to the Federal German Agency for Accessibility (Bundesfachstelle Barrierefreiheit, undated), around 200,000 people in Germany who are D/deaf or severely hard of hearing are sign language users. That leaves roughly five in six people with very limited or no spoken language understanding who are spoken language users. They employ a tactical mix of residual hearing, lip reading, context awareness, anticipation, collocations and often message guessing in order to understand their communication partners, while using spoken language for their own communication.
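The "five in six" estimate can be retraced with a short calculation. The sketch below is one possible reconstruction, not necessarily the authors' own derivation. It assumes a base of exactly 15 million hard of hearing people (the text says "well over 15 million", so this is a lower bound), and all variable names are chosen for illustration only.

```python
# Figures taken from the paragraph above; the 15 million base is a
# lower-bound assumption ("well over 15 million" in the text).
hard_of_hearing = 15_000_000   # hard of hearing Germans over 14 (DSB 2018)
severe_share = 0.088           # share severely hearing impaired or D/deaf
sign_language_users = 200_000  # Bundesfachstelle Barrierefreiheit (undated)

# People who are severely hearing impaired or D/deaf
severely_affected = hard_of_hearing * severe_share

# Those among them who do not use sign language, i.e. spoken language users
spoken_language_users = severely_affected - sign_language_users
spoken_share = spoken_language_users / severely_affected

print(f"severely affected: {severely_affected:,.0f}")
print(f"spoken language users: {spoken_share:.1%}")  # prints "spoken language users: 84.8%"
```

With these inputs the share comes out at just under 85%, which is close to five in six (about 83%). A larger base than 15 million would push the share slightly higher.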

These so-called “hearing tactics” vary in effectiveness depending on a wide range of factors, including the degree and kind of hearing loss, age of onset, acoustic environments, familiarity with a given topic and many more. Moreover, employing such hearing tactics will always result in hearing stress, causing fatigue and headaches and generally compromising content comprehension as well as, eventually, communication and participation. For spoken language-oriented persons with hearing impairments, hearing comprehension will always create a mental load that preys on cognitive capacities usually directed towards content comprehension, creating barriers in every aspect of life, from education through occupation to recreation.

Through real-time rendition of speech as written text, speech-to-text interpreting (STTI) provides an effective communication aid, very similar to sign language interpreting for D/deaf people. Speech-to-text interpreting can be performed intralingually or interlingually. It facilitates real-time communication for persons with hearing impairment, limited listening comprehension of the spoken language of a communicative event or other kinds of impairments on the one hand, and for the general public on the other (Stuckless 1994; Wagner 2005; Gerzymisch-Arbogast 2005, 2008, 2013; Zethsen 2009; Platter 2015; Stinson 2015; Pöchhacker 2018, 2023; Eugeni 2020; Eugeni and Gambier 2023; Eichmeyer-Hell 2026). The product of this type of interpreting act is called “live text”. The output is presented on a screen on an individual device as auto-scrolling text that moves along with the spoken source text.

Readability

However, decoding live text in turn ties up cognitive and visual capacities that are needed at the same time for hearing comprehension. Consequently, in order to maximise the effect of their efforts, speech-to-text interpreters need to produce a target text that keeps the additional cognitive load introduced into the communicative setting as low as possible.

Strategies for producing comfortable-to-read text range from adhering to the orthographic and grammatical standards of the target language to using legible fonts and bright, high-contrast displays. In fact, Eugeni’s 2008 study reveals that the comprehensibility of a live text improves with the degree of its adaptation. This is why STTI offers a broad range of ways in which a live text can be produced: near verbatim, “smoothened” (the most frequently applied form in STTI), syntactically reformulated, lexically reformulated, or interpreted into easy or plain language (Eichmeyer-Hell, forthcoming).

However, real-time production of written text at speed is always error-prone, with errors ranging from typing or recognition errors to misunderstandings and omissions. STTInterpreters (an abbreviation for “speech-to-text interpreters”, coined by Ursula Stachl-Peier, University of Graz, at DigiECOS 2020) are in a unique position among interpreters in that they can edit and correct their own as well as their co-interpreter’s text output on the fly, a technique known as “self- and co-editing”. Yet visible manipulations of text already on display may reduce readability and add unwanted cognitive load.

Based on research and workshops conducted by the German Association of Professional Speech-to-Text Interpreters (BSD), we can now propose a framework for targeted, readability-oriented self- and co-editing tactics in speech-to-text interpreting.

Users’ opinions on live editing

During two weekend seminars organised by the German Association of the Hard of Hearing (DSB) in the fall of 2021, two interpreter members of the board of the German Association of STTInterpreters recorded several segments of the live text, each some minutes long, and later played them back to the participants in a “thinking out loud” session, a method commonly used in software usability testing. The participants universally welcomed edits and corrections as a means of increasing the readability and reliability of the live text. However, they criticised two aspects that impede readability and effectiveness. First, changes further up in the text “jiggle” the moving text at the place of production, causing readers to lose focus, especially when they are looking back and forth between the live text and other visual information. Second, edits further up in the text are hard to trace: it remains unclear what has been changed into what, which defeats the purpose of the edit.

In order to mitigate these issues, to produce a more readable text and to reduce hearing stress in users of STTI, we suggest the following tactics:

When using dictation software (user-dependent speech-recognition), use strict chunking at about three to five words per chunk or along punctuation marks, combined with co-edits at the front of the running text – i.e. within the last chunk.
When using the keyboard-input method, do not self-correct simple typing errors, as the meaning will be clear to the users anyway.
Frequently insert line breaks, with visual spaces between paragraphs. In speech-to-text interpreting, paragraphs do not necessarily structure a text on the semantic level but provide a visual guide for the reader who follows a scrolling live text. Shorter paragraphs make it easier for the co-interpreter to make corrections and additions on the content level without creating “jiggle” – e.g. “you, too, Arthur” for “you two are there”.
Co-correct minor errors that affect readability more than comprehension in the least intrusive way possible, which usually means selecting individual characters and typing over them, or selecting individual words and typing or dictating over them if the character count of the new word is higher or lower than the old word by one or two.
Clearly mark unreliable passages (with misspellings or possibly misheard words) during production using “(?)” after the word or phrase in question, and in co-editing using “(!)” after the corrected version.
Clearly mark omissions during production using “(…)”, and in co-editing by enclosing additions in parentheses with the marker “ADD”.
In co-editing, insert corrections on the semantic level after the incorrect text, enclosed in parentheses with the marker “CORR”. CORR is not used for corrections of grammar or spelling.
In self-editing, correct minor and very minor errors only when the speech pauses.
In self-editing, mark corrections on the semantic level by using all caps.
Waive minor corrections while focusing on reliability and speed whenever the reader’s reading competency warrants this approach.
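Several of the tactics above are mechanical enough to be sketched in code. The following Python sketch is illustrative only and does not describe any actual STTI software. All function names are hypothetical, the exact marker layout (for example "(CORR …)" with the marker inside the parentheses) is an assumption about how the markers are typed, and the all-caps rule is read here as capitalising the corrected text itself.

```python
def chunk_words(text, max_words=5):
    """Split dictated text into chunks of at most max_words words,
    closing a chunk early whenever a word ends in a punctuation mark."""
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if len(current) >= max_words or word[-1] in ".,;:!?":
            chunks.append(" ".join(current))
            current = []
    if current:  # words left over after the last full chunk
        chunks.append(" ".join(current))
    return chunks


OMISSION = "(…)"  # typed during production where content was left out


def mark_uncertain(phrase):
    """Production: flag a possibly misheard or misspelled passage."""
    return f"{phrase} (?)"


def mark_checked(corrected):
    """Co-editing: flag a passage that has been corrected."""
    return f"{corrected} (!)"


def co_add(addition):
    """Co-editing: supply omitted content (marker layout is an assumption)."""
    return f"(ADD {addition})"


def co_correct(wrong, right):
    """Co-editing: leave the wrong text in place and append the semantic
    correction after it (marker layout is an assumption)."""
    return f"{wrong} (CORR {right})"


def self_correct(right):
    """Self-editing: mark a semantic correction by using all caps."""
    return right.upper()


def can_overtype(old_word, new_word, tolerance=2):
    """Minor co-corrections: typing over a word is unobtrusive only if the
    character count changes by no more than one or two characters."""
    return abs(len(old_word) - len(new_word)) <= tolerance


print(chunk_words("Good morning everyone, today we will talk about live text."))
print(co_correct("you two are there", "you, too, Arthur"))
```

The chunk size defaults to five words, the upper end of the suggested range of three to five. Keeping the incorrect text visible in `co_correct` follows the tactic of inserting the correction after the error rather than replacing it, so that nothing shifts further up in the text.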

The appropriate use of these tactics hinges on a solid grasp of the common error types in STTInterpreting as outlined in the WIRA model. The WIRA model (Eichmeyer-Hell, forthcoming, first presented at the annual BDÜ Convention, 2019) focuses on the communication-facilitating qualities of the live text, the completeness of the ideas rendered and the absence of errors in the rendition. Errors at best affect readability and in the worst cases distort the content. They are categorised by severity, from very minor (e.g. punctuation missing from the end of an intervention in turn-taking, or missing or transposed letters) to critical (e.g. the omission of the word “no”, which reverses the idea of a meaning unit). Whether and how an edit or correction has to be made depends on the severity of the error, that is, on its position on the WIRA scale.

Conclusion

Self- and co-editing is an invaluable tool for producing reliable and readable live text in speech-to-text interpreting – if used discriminatingly, unobtrusively and transparently – to meet users’ requirements and expectations. Further research is needed on the merits of highlighting edits and corrections using (ADD) and (CORR) to determine whether the increase in transparency outweighs the nuisance of a “jiggly” live text.

Daniela Eichmeyer-Hell is a practi-searcher in the field of language services holding academic qualifications in business and management and languages. She is currently pursuing a PhD in transcultural communication at the University of Vienna. She is a state-certified speech-to-text interpreter and board member of the Austrian Association for Speech-to-Text Interpreting and chair of the Bavarian Association of STTInterpreters. Her research focuses on multimodal communication and accessibility, with an emphasis on speech-to-text interpreting and simultaneous interpreting into easy and plain language.

Anja Rau is a state-certified speech-to-text interpreter and member of the board of the Professional Association of German Speech-to-Text Interpreters (BSD). She holds a PhD in English Literature with a thesis on new narrative genres in the new media.

References

Bundesfachstelle Barrierefreiheit (undated). Gebärdensprache. URL: https://www.bundesfachstelle-barrierefreiheit.de/DE/Fachwissen/Information-und-Kommunikation/Gebaerdensprache (28.04.2026)

DSB – Deutscher Schwerhörigenbund e.V. (German Association of the Hard of Hearing) (2018). Statistiken. URL: https://schwerhoerigen-netz.de/oeffentlichkeitsarbeit/statistiken/ (28.04.2026)

EFHOH – European Federation of the Hard of Hearing (2024). Getting the numbers right on Hearing Loss, Hearing Care and Hearing Aid Use in Europe – Report 2024 (28.04.2026)

Eichmeyer-Hell, D. (2026). Speech-to-text Interpreting – Intersection of Interpreting Technology, and Accessibility, G. Corpas Pastor and C. M. Hidalgo-Ternero (Eds.). Perspectives on Technology and Interpreting: Advances in Automation and Artificial Intelligence. London: Routledge.

Eichmeyer-Hell, D. (forthcoming). Schriftdolmetschen – Realisierungsformen im qualitätsorientierten Vergleich. PhD Thesis. University of Vienna.

Eugeni, C. (2008). Respeaking the TV for the Deaf: For a Real Special Needs-Oriented Subtitling. Studies in English Language and Literature (21). 37-47.

Eugeni, C. (2020). What’s in a name? TIRO – The Journal of Professional Reporting and Transcription 1. URL: https://tiro.intersteno.org/2020/05/whats-in-a-name/ (22.04.2026)

Eugeni, C. & Y. Gambier (2023). La Traduction Intralinguistique: les défis de la diamésie. Timișoara: Editura Politehnica.

Gerzymisch-Arbogast, H. (2005). Multidimensionale Translation. Ein Blick in die Zukunft. F. Mayer (Ed.) 20 Jahre Transforum. Hildesheim: Olms, 23-30.

Gerzymisch-Arbogast, H. (2008). Fundamentals in LSP Translation. Gerzymisch-Arbogast, H. (Ed.) LSP Translation Scenarios. Selected contributions to the EU Marie Curie Conference Vienna 2007. MuTra (2). Norderstedt: Books on Demand GmbH. 7-64.

Gerzymisch-Arbogast, H. (2013). Gutachten zu der Bezeichnung ‘Schriftdolmetschen’. URL: https://bsd-ev.org/wp-content/uploads/2019/09/gutachten_gerzymisch.pdf (22.04.2026)

Platter, J. (2015). Translation im Spannungsbereich von Mündlichkeit und Schriftlichkeit, Schriftdolmetschen in Österreich. Eine textbasierte Analyse. Unpublished dissertation. Universität Wien.

Pöchhacker, F. (2018). Moving Boundaries in Interpreting. Dam, H., Brøgger, M. & K. Zethsen (Eds.) Moving Boundaries in Translation. London: Routledge. 45-63.

Pöchhacker, F. (2023). Re-interpreting interpreting. Translation Studies, DOI: 10.1080/14781700.2023.2207567

Stinson, M. S. (2015). Speech-to-text interpreting. Pöchhacker, F. (Ed.). Routledge Encyclopedia of Interpreting Studies. London: Routledge. 399 f.

Stuckless, R. (1994). Developments in real-time speech-to-text communication for people with impaired hearing. Ross, M. (Ed.). Communication access for persons with hearing loss. Baltimore, MD: York Press, 197-226.

Wagner, S. (2005). Intralingual speech-to-text-conversion in real-time: Challenges and Opportunities. URL: http://www.euroconferences.info/proceedings/2005_Proceedings/2005_Wagner_Susanne.pdf (22.04.2026)

Zethsen, K. K. (2009). Intralingual Translation: An Attempt at Description. Meta, 54(4), 795–812. URL: https://doi.org/10.7202/038904ar (22.04.2026)
