In the 20th and 21st centuries, the world has undergone scientific and technological transformations that have steadily deepened the interaction between humans and computers. This has led to the notion of a fourth industrial revolution, in which all disciplines potentially fuse. This hybridisation has affected every sector of daily life, profoundly reshaping many activities, which can no longer be accomplished without hardware or software tools.
Initially described as the “implementation of interactive computing systems for human use” (Hewett et al., 1992), this interdependence between humans and technology, or “human-computer interaction” (HCI), has moved beyond “human use” in many fields, replacing it wholly or in large part. In diamesic translation – that is, transcribing speech into written text for reporting, subtitling and other similar purposes – HCI has increasingly automated all fast-writing activities, to such an extent that these activities are now impossible without technology, though not yet without humans.
Depending on the technical solution adopted to carry out these activities (pen shorthand, keyboarding, stenotyping, respeaking or other solutions based on automatic speech recognition, or ASR), the products of diamesic translation (subtitles, reports, transcripts or notes) can be grouped into four main categories:
- Fully human: humans capture the speech themselves, and text-processing software can then be used to copy the handwritten notes into digital text.
- Computer-aided: human live subtitlers, reporters or note takers play the most important role, and technology assists them with fast-writing software programs and text-processing interfaces.
- Human-aided: technology transcribes the speech and humans post-edit it, either by correcting transcription mistakes only (as in the case of court reporting) or by turning the spoken text into a readable written text by omitting orality features (such as fillers, hesitations, false starts, self-reformulations and self-corrections) and adjusting grammar and style to those of a written text, as in the case of parliamentary reports and live subtitles in many countries.
- Fully automated: as in the case of automatic subtitles offered by web-based platforms such as YouTube or Teams, ASR transcribes speech and adds punctuation automatically. No human is involved in the process.
Further layers of technology can be added to the above depending on the profession: in note taking, experimental technology automatically converts handwritten notes into digital text; in live subtitling, live editors correct the text produced by the live subtitler through a text-processing interface; in court and parliamentary reporting, automatic transcription and pre-recorded subtitling, the text can be synchronised with the speech either manually, with professional subtitling software, or automatically. In all these cases, machine translation can be used to produce multilingual versions of the same text, as at the European Parliament, which has recently launched an experiment aimed at producing live transcripts of its sessions automatically translated into all official languages (Romero-Fresco, 2023).
Major advances have also recently been made in artificial intelligence. Many professional writers and diamesic translators feel threatened by it, because programs such as ChatGPT seem to produce impeccable professional documents, such as summaries, reports, transcripts, translations, essays and commentaries. The results to date have been both surprisingly positive and surprisingly negative, as Eero Voutilainen shows in this issue of Tiro.
In light of all that has happened so far in the fields of HCI and diamesic translation, one can safely conclude that humans should not be scared of technology, as the demand for human work has kept increasing. There are three main reasons for this apparently contradictory conclusion. First, transcribed speeches still represent a minor proportion of all speeches produced. Second, fully automated diamesic translation is applied mostly to speeches that are unlikely to be transcribed by a professional, such as YouTube videos, videoconference meetings and media output transcribed for indexing purposes. Third, technology creates new jobs, such as the post-editor of automatically produced transcript drafts.
All in all, humans are still in control, and machines cannot think for themselves.
Carlo Eugeni is Tiro’s Scientific Advisor.
Hewett, T., R. Baecker, S. K. Card & T. Carey (1992). ACM SIGCHI Curricula for Human-Computer Interaction. New York: The Association for Computing Machinery.
Romero-Fresco, P. (2023). “Interpreting for access – The long road to recognition”. In Zwischenberger, C., K. Reithofer & S. Rennert (eds.) Introducing New Hypertexts on Interpreting (Studies) – A tribute to Franz Pöchhacker. Amsterdam: John Benjamins Publishing Company, pp.236-253. DOI: https://doi.org/10.1075/btl.160.12rom
Voutilainen, E. (2023). “Artificial Intelligence Suggests Using Itself for Professional Transcription”. Tiro 1/2023.