
Introduction

Intralingual live subtitles can be created through different modes, one of which is the technique of respeaking. As Eugeni (2008) defines it, respeaking is “[…] a technique thanks to which the respeaker listens to the source text and re-speaks it. The vocal input is processed by a speech recognition software which transcribes it, thus producing real-time subtitles.”

This paper reports on a pilot study of verbatim and sensatim live subtitles in parliamentary sessions. In particular, the focus is on a qualitative analysis conducted throughout one session of the Rome City Council, where live subtitles are usually created via two main modes: 1) some speakers are subtitled directly through automatic speech recognition (ASR), with a live editor correcting possible mistakes and introducing punctuation as needed; and 2) other speakers are subtitled by a respeaker with the support of a live editor, who monitors and corrects any mistakes and also introduces punctuation marks. For mode 1, only one editor is required to check the automatic subtitles (no respeaker is involved), while for mode 2 two people are involved in the process (one respeaker and one live editor). In the case of the Rome City Council, the technology used allows respeakers to avoid dictating punctuation, which is introduced manually by the live editor through a dedicated editing interface. When the speaker articulates and speaks clearly, the respeaker’s intervention is no longer needed, and the live editor directly corrects mistakes and adds punctuation to the text produced by the ASR software.

Automatic (Verbatim) vs. Respoken (Edited) Subtitles

Each Rome City Council session is streamed online, with the screen showing the council member or president who is speaking, with intralingual subtitles underneath. All sessions are also made accessible to deaf and hard-of-hearing (DHoH) people through simultaneous interpreting into Italian Sign Language (ISL).

As mentioned above, the delivered speeches are subtitled either via respeaking and live editing or via ASR and live editing. For this reason, it is impossible to compare how the same intervention is subtitled through both respeaking and ASR. This analysis therefore seeks only to explore the positive and negative aspects of the two modes and cannot provide a direct comparison, since the subtitled turns differ in speech rate, topics covered and speaker characteristics.

Sensatim (edited) subtitles tend to be perceived by DHoH people as a form of censorship (Romero-Fresco, 2009) compared with verbatim (word-for-word) subtitles. The reason is that some information can be lost through the reformulation carried out by the respeaker, meaning that the audience does not receive the full message that others are able to access. While this is true to a certain extent, it should also be taken into consideration that, in a live situation such as a session of a legislative body, orality is all the more pervasive and there are many orality features that can – or rather, should – be omitted in order to create a leaner and more readable subtitle.

The orality features in these situations include mumbling and self-corrections, hesitations, false starts, parenthetical elements and highly subordinated sentences, that is, complex syntax. On the one hand, the formal context ensures that speech turns and speakers rarely overlap, which eases the respeakers’ complicated task. On the other, in some cases the speeches are almost impromptu and are filled with features of orality that do not aid the brevity and speed required for subtitles. Such features of orality cause problems in the ASR process, since the machine is unable to filter them out or even recognise them, resulting in typos or other errors that can make the subtitle difficult to interpret when read. The respeaker’s output, by contrast, is more likely to have all such features typical of spoken language removed, leading to more accessible subtitles. While the quality of verbatim subtitles compared with sensatim subtitles is up for debate, the hypothesis arising from this observation is that the respeaker’s intervention is needed when the machine is no longer able to produce an acceptably accurate and readable verbatim transcript due to an insufficient input (Lambourne, Hewitt, Lyon, & Warren, 2004).

Examples and Analysis

Some examples are shown below, created either via respeaking and live editing or via ASR and live editing. Each example shows the source text (ST), which is the speakers’ oral production, and the transmitted subtitles (SUB), both followed by English translations. Omissions are marked with bracketed ellipses (…) and modified or misspelt words are underlined.

Example 1, below, was subtitled through ASR. All the words spoken are reproduced, with none altered or misspelt, except for the filler word “diciamo” (“let’s say”), which is left out thanks to the revision of the editor, who deemed it unnecessary.

Example 1

ST: “Grazie presidente, volevo chiederle di rispettare un minuto di silenzio per Maria Coscia. Se possibile volevo, diciamo… esprimere un pensiero da parte dell’aula. Grazie.”

‘Thank you, Chair, I wanted to ask for a minute’s silence for Maria Coscia. If it is possible, I wanted – let’s say – to express a thought from the assembly, thank you.’

SUB: “Grazie presidente. Volevo chiederle di rispettare un minuto di silenzio per Maria Coscia. Se possibile volevo (…) esprimere un pensiero da parte dell’aula, grazie.”

‘Thank you, Chair, I wanted to ask for a minute’s silence for Maria Coscia. If it is possible, I wanted to express a thought from the assembly, thank you.’

Example 2, below, was subtitled through respeaking.

Example 2

ST: “E ricordiamo anche, non chiedendo il minuto di silenzio perché è una ricorrenza, comunque… che oggi ricorre anche l’anniversario della strage di Nassiriya e quindi idealmente siamo anche vicini ai militari feriti in questi giorni in un attentato sempre in Iraq.”

‘Let me also remind you, without asking for the minute’s silence since it is an anniversary, anyway… that today is also the anniversary of the Nasiriyah bombing, so ideally we are also emotionally close to the soldiers wounded in the last few days in a terrorist attack in Iraq as well.’

SUB: “(…) Oggi ricorre anche l’anniversario della strage di Nassiriya, quindi (…) siamo anche vicini ai militari feriti in questi giorni. L’attentato è avvenuto in Iraq.”

‘Today is also the anniversary of the Nasiriyah bombing, so we are also emotionally close to the soldiers wounded in the last few days. The terrorist attack was in Iraq.’

In an effort to maximise readability, the respeaker turned most of the subordinate clauses into main clauses and the live editor made considerable use of punctuation. Although reformulation is needed for readability reasons, the last part of these respoken subtitles conveys a misleading idea. The speaker is talking about a terrorist attack that happened around the time of the City Council session, but the final sentence, “The terrorist attack was in Iraq”, is ambiguous and could refer either to the Nasiriyah attack or to the more recent one to which the speaker is actually referring.

To illustrate how punctuation contributes to meaning in subtitles, Example 3, below, shows a broadcast subtitle created via ASR, where there is no punctuation and the subtitle is simply presented as a flow of words.

Example 3

SUB: “Se non erro se mi supportano gli uffici ok avevamo cinque ordini del giorno 29 emendamenti dovevano trattare ancora l’emendamento e l’emendamento numero 3 quindi abbiamo già esaurito gli ordini del giorno è far sì che si chiama regolamento Consigliere De Priamo so già che mi vuole dire.”

‘If I am not mistaken with the support from the offices ok we have five agendas 29 amendments we had to deal with amendment and the amendment number 3 thus we have already completed the agendas it is in order to that is called regulation Council member De Priamo I already know what you want to tell me.’

We can see in this example that the lack of punctuation makes it considerably more difficult to read and understand the subtitles.

Conclusions

Although larger-scale research is needed to achieve more comprehensive results, at this preliminary stage and on the basis of the examples given, we can conclude that some clear differences exist between the two modes of intralingual live subtitling analysed here. Generally speaking, live subtitles produced through ASR and live editing are more verbatim, unless the live editor decides to omit something. Syntactically, sentences remain as complex as the original and are therefore sometimes difficult to read. By contrast, live subtitles produced through respeaking are normally reformulated, including syntactically, which in some cases improves their readability by omitting features of orality and repetitions. It is nevertheless important to bear in mind the pros and cons of each mode and to balance the upsides and downsides according to the context in which they are used.

Alice Pagano, PhD, is a postdoctoral researcher and adjunct lecturer in Spanish Language and Translation at the University of Trieste, Italy. She has also worked as an interpreter, translator and post-editor.

References

Eugeni, C. (2008). “La sottotitolazione in diretta TV. Analisi strategica del respeakeraggio verbatim di BBC News.” PhD thesis, Università degli Studi di Napoli Federico II.

Eugeni, C. (2019). “Technology in court reporting – Capitalising on human-computer interaction.” In International Justice Congress Proceedings, Uluslararası Adalet Kongresi (UAK), 2–4 May 2019, 873–881.

Lambourne, A., Hewitt, J., Lyon, C., & Warren, S. (2004). “Speech-Based Real-Time Subtitling Services.” In International Journal of Speech Technology, 7(4), 269–279.

Romero-Fresco, P. (2009). “More haste less speed: Edited versus verbatim respoken subtitles.” In Vigo International Journal of Applied Linguistics, 6. Retrieved from http://vialjournal.webs.uvigo.es/pdf/Vial-2009-Article6.pdf

Romero-Fresco, P. (2018). “Respeaking: Subtitling through Speech Recognition.” In The Routledge Handbook of Audiovisual Translation (pp. 96–114). Routledge.
