In Issue 2/2022

From multimodality to embodied cognition: interpreting (through) gestures

This article offers a brief overview of the results of an empirical research project, conducted in collaboration with the Moscow State Linguistic University’s Centre for Socio-Cognitive Discourse Studies (SCoDis), which investigated the gestural behaviour of simultaneous interpreters under increased cognitive load. Such an analysis may prove relevant not only to interpreting studies but also to professional reporting techniques such as respeaking, an activity in which simultaneous interpreting is intrinsically embedded.

The literature on simultaneous interpreting (SI) has long described it as an extremely demanding activity. Cognitive psychology captures this through the notion of “cognitive load” (or even “cognitive pressure”), that is, the total amount of mental effort a task requires, which can be influenced by many factors, particularly task load and task design.

Indeed, SI is a communicative activity in its own right. It requires interpreters to understand the speaker’s inner world and match it with their own in real time, reproducing meaning through a linguistic channel so as to convey the intended message to a target audience whose different culture inevitably entails different conceptualisations of the world. This mismatch, described as “the absurd but inevitable incongruence inherent in interpretation” (Poyatos 1997), cannot fail to significantly increase cognitive load.

While coping with these highly challenging requirements, simultaneous interpreters have frequently (if not usually) been shown to gesture as they perceive and produce language. They do so even though they are seated in sound-proof booths or, more recently during the pandemic, connected via remote simultaneous interpreting platforms, and can therefore reach their audience only “monomodally”, through their own voices.

According to the concept of “economy of intermediate representation” (Setton 1999), which builds on Johnson-Laird and Byrne’s mental model theory of cognition (1983), interpreters co-ordinate input and output through a mental modelling process that makes both language comprehension and production faster and more accurate. This allows them to process any portion of a speech by creating “mental segmentations”, which can reduce the listening, production, memory and co-ordination efforts involved (Gile 2008).

It follows that SI should be regarded as a multimodal process, matching the interpreters’ input and output “with the activated mental representations that are conceptually shared by two working languages” (Pavlenko 2009). It is therefore worth considering how the function of gestures fits into the context of multimodality in SI. The mind’s ability to create perceptual experience via mental models underlies the principle of “embodied cognition” and, by extension, “embodied language processing”, both of which describe the human sensorimotor system as fundamentally integrated with cognitive and linguistic processing.

In this respect, many studies suggest that gestures, both co-speech and “co-thought” gestures (Schwartz & Black 1996, Hegarty et al. 2005), may serve a speaker-oriented function. People have been shown to gesture even in the absence of a conversational partner since, as noted by Kita (2000) and demonstrated by Bavelas et al. (1992), not all gestures are intended to be seen. The gestures produced by congenitally blind individuals, together with research showing that people with aphasia or a stutter generally comprehend and produce speech better when encouraged to gesticulate, reinforce the self-oriented facilitative role of gestures in cognitive and linguistic processing.

In the specific case of interpreters confined to the narrow interior of sound-proof booths, the gestural rendering of lexical meaning can only be explained by acknowledging that giving physical form to a concept is an extremely useful strategy for reducing cognitive effort, applied “outside the realm of conscious awareness” (Streeck 2006).

From theoretical framework to empirical evidence

To gather empirical evidence for this view, 19 participants, both professional interpreters and undergraduate students, were asked to record themselves simultaneously interpreting a TED Talk from English into Italian. The recordings were analysed to assess the types and functions of the gestures that the subjects performed in co-occurrence with perceived disfluencies in their interpreting output. In the ELAN annotation software, tiers were created to annotate speech difficulties and their categories, as well as gesture typologies and the hand(s) used to perform each gesture. The Jamovi statistical software was then used to look for correlations between the interpreters’ gestural activity and co-occurring linguistic signals of increased cognitive pressure.
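To illustrate what “co-occurrence” between two annotation tiers means in practice, here is a minimal sketch. It is not the procedure used in this study, which relied on ELAN and Jamovi. It assumes, hypothetically, that each tier has been exported as a list of (start_ms, end_ms, label) tuples, and the tier contents below are invented for illustration, not data from the study. The function names are likewise hypothetical. For each gesture typology, it counts how many gestures overlap in time with at least one annotated disfluency.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation of an exported annotation tier:
# each annotation is (start_ms, end_ms, label).
Annotation = Tuple[int, int, str]


def overlaps(a: Annotation, b: Annotation) -> bool:
    """Return True if two annotations share any stretch of time.

    Annotations that merely touch (one ends exactly where the other
    starts) are treated as non-overlapping.
    """
    return a[0] < b[1] and b[0] < a[1]


def gestures_cooccurring_with_disfluencies(
    gestures: List[Annotation], disfluencies: List[Annotation]
) -> Counter:
    """Count gesture labels whose time span overlaps at least one disfluency."""
    counts: Counter = Counter()
    for gesture in gestures:
        if any(overlaps(gesture, d) for d in disfluencies):
            counts[gesture[2]] += 1
    return counts


# Invented toy tiers (NOT data from the study).
gesture_tier = [
    (1000, 1800, "beat"),
    (2500, 3200, "adaptor"),
    (5000, 5600, "pragmatic"),
    (7000, 7400, "beat"),
]
disfluency_tier = [
    (1500, 2000, "filled pause"),
    (3100, 3500, "repair"),
    (6800, 7100, "lexical search"),
]

print(gestures_cooccurring_with_disfluencies(gesture_tier, disfluency_tier))
```

With these toy tiers, both beats and the adaptor overlap a disfluency, while the pragmatic gesture does not. Whether gestures that only touch a disfluency boundary, or that occur shortly before or after it, should count as co-occurring is a methodological choice that this sketch leaves open.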

Our data appear to run counter to the claim that preventing gestures has no particular effect on speech production (Holler et al. 2012, Kisa et al. 2021), to research showing that people gesture more when their listeners can see them (Alibali et al. 2001, Mol et al. 2011), and to the argument that “when speech stops, gesture stops” (Graziano & Gullberg 2018). Our experimental investigation suggests a very tight link between speech and gestures, with gestures accounting for so-called “cross-modal priming” (Krauss et al. 2001): by “manipulating” one’s own conceptualisation and perception of the world, gestures make it possible to represent features of the “source concept” in motoric or kinesic form (ibid.).

Overall, our findings indicate that gestures helped interpreters reduce stress, retrieve lexical items, and engage in analytic and spatio-motoric thinking, providing invaluable support not only during fluent interpretation but, above all, during speech disruptions.

A comparison between our results and those of similar studies conducted by the SCoDis centre revealed interesting similarities in interpreters’ gestural behaviour patterns. On this basis we may infer, although further research is needed, not only that gestures help simultaneous interpreters cope with increased cognitive effort but also that this self-oriented facilitative function is cross-cultural and cross-linguistic. In other words, given that kinaesthetic perceptual processes are an integral part of human cognitive functioning, accompanying speech with hand gestures appears to be a crucial strategy for reducing cognitive load, irrespective of interpreters’ professional expertise, culture of origin or language combination. If confirmed by future research, this could lay the groundwork for new multimodal training strategies, with clear benefits for the education of both interpreters and interpreter trainers.

Future implications

As far as respeaking is concerned, our results may help improve respeakers’ SI performance as well, but a more interesting implication might concern the enhancement of automatic speech recognition software through hand and articulatory gesture detection and control technologies, combining verbal and non-verbal stimuli to compensate for what is not captured in speech. Moreover, since gestures not only convey meaning but constitute meaning in their own right, the impossibility of rendering them in written form may lead to serious omissions in verbatim transcription, for which respeaking is currently the main real-time reporting technique.

To make up for this shortcoming, cutting-edge tools could produce accurate multimodal transcriptions of congresses and conferences and could also support forensic interrogation techniques. Such implementations, however, would require the creation of a “vocabulary of gestures”. This is far from a futuristic, utopian vision: although speech-accompanying gestures are highly idiosyncratic and therefore lack an unambiguous codification, numerous studies have already moved towards a “lexicography of gestures” aimed at attributing linguistic meaning to the “semantic richness and subtlety of symbolism” (Poggi 2002) inherent in human gestural behaviour.

Although differences in gestural patterns between individuals and cultures are undeniable, our study showed that, under different instances of cognitive load, Italian and Russian interpreters resorted to the same main gesture typologies, which suggests at least partial cognitive-kinaesthetic commonalities among individuals. Specifically, adaptors, beats and pragmatic gestures emerged as the main gestural macro-typologies used by interpreters in both groups during speech disfluencies and/or their resolution. In addition, recent technological advances have shown that AI can be trained through interactive design to mimic human gestures as part of multimodal communication, integrating them with speech as happens naturally between people.

Concluding remarks

The discussion so far points to the possibility, already proposed by Galvão (2009) albeit for slightly different purposes, of creating small multimedia corpora of speeches and their simultaneous interpretations, aligned through gestural annotations. The ultimate aim would be to foster new multimodal strategies that recognise the invaluable contribution of gestures to intrapersonal and interlinguistic communication, potentially revolutionising both interpreting studies and respeaking.

Vittoria Ghirardi is a translator and interpreter with a master’s degree in Conference Interpreting from the Scuola Superiore per Mediatori Linguistici in Pisa, Italy.

References

Alibali, M.W., Heath, D.C. & Myers, H.J. 2001: Effects of visibility between speaker and listener on gesture production: Some gestures are meant to be seen – Journal of Memory and Language, Amsterdam: Elsevier, Vol. 44, 2, pp. 169-188.

Bavelas, J.B., Chovil, N., Lawrie, D.A. & Wade, A. 1992: Interactive gestures – Discourse Processes, 15, pp. 469-489.

Galvão, E.Z. 2009: “Speech and Gesture in the Booth – A Descriptive Approach to Multimodality in Simultaneous Interpreting”, in De Crom, D. (ed.), Selected Papers of the CETRA Research Seminar in Translation Studies 2008, pp. 1-20.

Gile, D. 2008: Local Cognitive Load in Simultaneous Interpreting and its Implications for Empirical Research – FORUM, Revue Internationale d’interprétation et de traduction, Vol. 6, 2, pp. 59-77.

Graziano, M. & Gullberg, M. 2018: When Speech Stops, Gesture Stops: Evidence From Developmental and Crosslinguistic Comparisons – Frontiers in Psychology, Vol. 9, 879, pp. 1-15.

Hegarty, M., Mayer, S., Kriz, S. & Keehner, M. 2005: The role of gestures in mental animation – Spatial Cognition and Computation, Vol. 5, 4, pp. 333-356.

Holler, J., Turner, K. & Varcianna, T. 2012: It’s on the tip of my fingers: Co-speech gestures during lexical retrieval in different social contexts – Psychology Press, Taylor & Francis Group, Language and Cognitive Processes, Cognitive Neuroscience of Language, pp. 1-10.

Kisa, Y.D., Goldin-Meadow, S. & Casasanto, D. 2021: Do gestures really facilitate speech production? – American Psychological Association, Journal of Experimental Psychology: General, Vol. 151, 6, pp. 1252-1271.

Kita, S. 2000: “How representational gestures help speaking”, in McNeill, D. (ed.), Language and gesture, Cambridge: Cambridge University Press, pp. 162-185.

Krauss, R.M., Chen, Y. & Gottesman, R.F. 2001: “Lexical Gestures and Lexical Access: A Process Model”, in McNeill, D. (ed.), Language and gesture, New York: Cambridge University Press, pp. 261-283.

Mol, L., Krahmer, E., Maes, A. & Swerts, M. 2011: Seeing and being seen: the effects on gesture production – International Communication Association, Journal of Computer-Mediated Communication, Vol. 17, 1, pp. 77-100.

Pavlenko, A. 2009: “Conceptual representation in the bilingual lexicon and second language vocabulary learning”, in Pavlenko, A. (ed.), The bilingual mental lexicon. Interdisciplinary approaches, London: Cromwell Press, pp. 125-160.

Poggi, I. 2002: The case of the Italian gestionary – Gesture, Vol. 2, John Benjamins Publishing Company, pp. 71-98.

Poyatos, F. 1997: “The reality of multichannel communication”, in Poyatos, F. (ed.), Nonverbal Communication and Translation, Amsterdam/Philadelphia: John Benjamins, pp. 249-282.

Schwartz, D.L. & Black, J.B. 1996: Shuttling between depictive models and abstract rules: Induction and fallback – Cognitive Science, 20, pp. 457-497.

Setton, R. 1999: Simultaneous Interpretation. A Cognitive-pragmatic Analysis, Amsterdam/Philadelphia: John Benjamins Publishing Company.

Streeck, J. 2006: Gestures: Pragmatic Aspects – Encyclopaedia of Language & Linguistics, 2, Elsevier, pp. 71-76.
