Transcribe This! – Reflecting on the Importance of Communicating Non-verbal Elements

Posted June 10, 2026

In Issue 1/2026

Multimodality (Kress & van Leeuwen, 2006) is a concept I have discussed often in my columns. The idea behind it is that diamesic translation professions, such as court and parliamentary reporting, intralingual subtitling, minute taking or any other form of transcription, do not simply turn spoken language into written language. They render in writing not only the words that make up a speech (i.e., the verbal component), but everything that comes with them. This includes the paraverbal component, such as acoustic elements like intonation and visual elements like body language; and the non-verbal component, such as acoustic elements like applause and visual elements like a speaker’s dress code. This is because the meaning of an utterance can be understood only in the wider context in which it is uttered. For example, if we exclaim “Help!” in a dangerous situation, we do not mean the same as when we exclaim “Help!” while performing “The Three Little Pigs” for our children, taking part in a Beatles-themed karaoke night or complaining to a colleague that the boss asks too much of them.

In this issue, I present the idea of the universal report (Eugeni, 2026) and, in particular, how it deals with multimodality. One of the main features of the universal report is that every single word in the report is aligned with the exact moment in a video recording at which it was uttered. This helps us understand how multimodality works, as it shows how important para- and non-verbal communication (see above) are for fully understanding what speakers mean by the words they utter in a specific context. Consider the example of a judge who must pass sentence in a case based only on a written account of the words uttered in a testimony. How can the judge be sure of understanding what a witness means if they cannot see the witness’s face or body language, or hear the pauses and hesitations in their speech and the emotion that emerges from their utterances? If a witness speaks at a steady pace, it might mean that they are sure about something; if, on the contrary, they speak in a hesitant, rising tone, it might indicate that they are not.

A perhaps extreme case of multimodality in reporting is the haka led by a New Zealand MP on 14 November 2024 during a vote on a controversial bill seeking to redefine the country’s founding agreement between the indigenous Māori and the British Crown (https://apnews.com/video/new-zealand-80b9f44eddab4bff9f0e2b39f6017dbc). The parliamentary report says:

“[Members perform the haka ‘Ka Mate’ on the floor of the House]” (Hansard Office, 2024: 7431).

This is a smart choice, because readers’ familiarity with the context does most of the work: New Zealanders know what the haka is, what it means and what it implies when performed in front of opposing MPs.

After the debate is suspended, the Speaker of the House comments on the event as follows:

“At the end of the Principles of the Treaty of Waitangi Bill, Hana-Rawhiti Maipi-Clarke misused the voting procedure to stage a protest. Other members joined in.” (Hansard Office, 2024: 7432).

Thanks to this statement, the same New Zealanders reading the report get the bigger picture, including the name of the member who staged the protest and the intention behind it. A reader unfamiliar with haka culture, however, would need to be told about the function of the dance in its original context (battle), the fierce expressions of the performers, the sounds they produce, their colours and movements, the significance of the haka today for the identity of the Māori people, and any other visual and acoustic elements relevant to the meaning of its use in that context.

Now, let us imagine that the same haka were performed in a TikTok video by a Korean hip-hop artist, in an anthropology museum in front of primary school pupils or at the opening of a football match. What would it mean in those contexts? Could it still be transcribed as the haka “Ka Mate”? And what if we thought about it not from the perspective of a parliamentary reporter, but from that of a social media influencer, a museum guide, a TV commentator, an audio describer for blind cinema audiences, a Financial Times journalist or simply a New Zealander telling a friend about that same performance in Parliament? Before trying to answer these questions, it is worth bearing in mind that access to the multimodal context can greatly help us grasp the real meaning of what is said.

Carlo Eugeni is Tiro’s scientific adviser.

References

Eugeni, C. (2026). “The Universal Report: Integrating Accessibility, Technology and Human Expertise”. Tiro 1/2026. URL: [TO BE ADDED LATER]

Hansard Office (2024). Parliamentary Debates, Thursday, 14 November 2024 (for inclusion in Volume 779). Wellington: New Zealand Parliament. Retrieved from https://hansard.parliament.nz/api/resources/daily/related/2024-11-14/2024-11-14-daily.pdf?dname=Parliamentary%20Debates%20(Hansard)%20for%20Thursday,%2014%20November%202024%20[PDF%20859KB].pdf

Kress, G., & van Leeuwen, T. (2006 [1996]). Reading Images: The Grammar of Visual Design (2nd ed.). London: Routledge.
