In Issue 2/2023

Introduction

Like many parliamentary reporting offices worldwide, the Official Journal Division (OJD) of the Portuguese Parliament has been seeking technological solutions—specifically, exploring automatic speech recognition (ASR) software—to support and optimise its work processes. Facing increased demands from extended plenary sessions and a growing number of inquiry committees, and addressing staff injuries from intensive typing, ASR technology presented an opportunity to enhance efficiency and alleviate the physical strain on reporters.

The Official Journal Division’s workflow

The Official Journal of the Portuguese Parliament, the Diário da Assembleia da República, provides a comprehensive account of each plenary session and inquiry committee meeting. The journal documents debates in direct speech, adopting a format similar to a dramatic dialogue, complete with speaker identification and MP party affiliations. To do so, OJD reporters take handwritten notes in the sitting room to record significant interruptions or non-verbal events relevant to the debate, such as applause and laughter. Subsequently, they refer to digital audio recordings of the meetings, divided into 15-minute segments corresponding to their note-taking periods, which they can control with a foot pedal.

Until the recent implementation of an ASR system, which will be described later, the main task of a reporter involved transcribing spoken words into written text using Microsoft Word. Although reporters no longer need to type the entire text, they still have to note interruptions and non-verbal events at the appropriate points in the transcription. They also make minor edits, such as removing false starts, hesitations, self-corrections, repetitions, and slips of the tongue, and they adjust word order or punctuation to clarify the text. The resulting transcription is an edited verbatim report, prioritising the meaning of what was said over a strictly ipsis verbis record (Eugeni & Gambier, 2023).

Where necessary, reporters also insert, correct, or complete procedural information related to the discussion agenda, legislative initiatives, vote results, and so on, meeting institutional requirements and alternating between mimetic (“showing”) and diegetic (“telling”) reporting strategies (Voutilainen, 2023). Senior reporters, serving as editorial managers, oversee the reporters’ work, proofreading it thoroughly for consistency and compliance with the journal’s guidelines.

Exploring and evaluating ASR technologies

In 2019, and particularly in 2020, in response to the demanding workload from extended plenary sessions held after the Covid-19 confinement periods, some staff members began exploring various free ASR technologies. Acknowledging advancements in this technology, which had shown unsatisfactory results in tests conducted in 2013, the OJD and, subsequently, the Secretary-General of the Portuguese Parliament considered implementing an ASR system a priority. In July 2022, an official task group comprising members from different parliamentary services was established to evaluate specific ASR solutions.

This task group began by evaluating software from a Portuguese company, as well as a second product based on a major international brand’s speech-to-text technology. While awaiting their proofs of concept (PoC), the OJD held online and in-person meetings with parliamentary reporting offices in Brazil, Finland, Spain, and the European Parliament, among others, to share best practices and address common ASR implementation challenges. The OJD also met with the NLX-Group, a university research group on natural language and speech at the Department of Informatics at the University of Lisbon, and an OJD member attended the Intersteno Maastricht conference in 2022. All of this contributed to the exchange of knowledge on the subject.

By the end of 2022, only the PoC from the major international brand had been delivered. The PoC revealed significant shortcomings, including a word error rate (WER) substantially higher than the average 10% benchmark recommended by our contacts in other parliaments, an inability to integrate with the foot pedal and MS Word, and a lack of machine learning capabilities to adapt and improve from our own corrections. In the meantime, most reporters had begun using freely available versions of ASR systems, such as Microsoft Azure Speech-to-Text and Speechmatics.
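For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the ASR output into a reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard calculation in Python; the function name and the simple lowercase, whitespace-based tokenisation are assumptions made for illustration, not a description of how any of the parliaments mentioned measured their results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Tokenisation here is a plain lowercase whitespace split (an assumption);
    real evaluations usually also normalise punctuation and numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a transcript that drops one word from a six-word reference sentence has a WER of 1/6, or about 16.7%, so a 10% benchmark corresponds to roughly one word error in every ten reference words.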

The quick transition to a new ASR

In early 2023, we began experimenting with OpenAI’s Whisper, an open-source speech recognition model built on the same Transformer architecture as large language models (LLMs) and trained on 680,000 hours of multilingual audio recordings paired with text transcriptions collected from the web. At first, we ran it on a cloud-based platform. Since plenary meetings are open to the public and we usually do not transcribe committee or in camera meetings, we had no access security concerns. The solution suited us well, and since March 2023 the ASR software has been running smoothly on our own servers. A colleague from our IT directorate, who is also a member of the task group, developed a Python program that uses Whisper to automatically transcribe the audio recordings of the meetings, recognises speaker changes, and performs several other text replacements and formatting operations. The transcriptions are delivered as Word files, in a folder structured so that every reporter has access to them a few minutes after the recording is made available.
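To give an idea of what such a program can look like, here is a minimal sketch, not the Division's actual code. It assumes the open-source `openai-whisper` and `python-docx` packages are installed; the function names, the `"large"` model choice, the output layout, and the pause-based paragraph splitting (used here as a crude stand-in for speaker-change detection) are all assumptions made for illustration.

```python
from pathlib import Path


def group_segments(segments, max_gap=1.5):
    """Join Whisper segments into paragraphs.

    A new paragraph starts whenever the silence between two consecutive
    segments exceeds max_gap seconds. This pause heuristic is an assumption
    and only approximates a change of speaker.
    """
    paragraphs, current, last_end = [], [], None
    for seg in segments:
        if last_end is not None and current and seg["start"] - last_end > max_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(seg["text"].strip())
        last_end = seg["end"]
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def transcribe_to_docx(audio_path, out_dir, model_name="large"):
    """Transcribe one recording with Whisper and save it as a Word file."""
    # Imported here so that group_segments can be used without these packages.
    import whisper             # pip install openai-whisper
    from docx import Document  # pip install python-docx

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path), language="pt")

    doc = Document()
    for paragraph in group_segments(result["segments"]):
        doc.add_paragraph(paragraph)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / (Path(audio_path).stem + ".docx")
    doc.save(str(out_file))
    return out_file
```

A scheduled job could call `transcribe_to_docx` on each new 15-minute recording and write the result into a folder per reporter; how recordings are named and assigned is specific to each office and is deliberately left out of the sketch.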

Given the simplicity of this set-up, the experience gained with other ASR systems, and the ability to edit the text in Word, these Whisper transcriptions were adopted by all our staff practically overnight. In fact, they proved to be a significant aid to the office while working on transcribing meetings of two additional inquiry committees, in addition to plenary meetings, during the first semester of 2023.

Future developments

This simple yet highly flexible system demonstrated promising results from the start. A series of tests we conducted in August yielded a WER ranging from 1.7%, for pre-written and well-articulated speech, to 11.2%, for poorer-quality speech or more heated debates. However, we are determined to keep improving it. Although the latest tests used a relatively small sample, the insights gained from them have led us to work on post-editing scripts that will help correct the most recurrent transcription errors (mostly regarding proper names, places, and domain-related terms) and even insert some procedural expressions at the right moment in the text.
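A post-editing script of this kind can be as simple as a table of known misrecognitions applied to the raw transcript. The sketch below shows one possible approach in Python using only the standard library; the entries in `REPLACEMENTS` are hypothetical examples, and the real table would have to come from an office's own error analysis.

```python
import re

# Hypothetical examples of recurring errors and their corrections.
REPLACEMENTS = {
    "sr presidente": "Sr. Presidente",
    "assembleia da república": "Assembleia da República",
    "inter steno": "Intersteno",
}


def build_pattern(replacements):
    """Compile one case-insensitive regex matching any known error."""
    # Longest phrases first, so a long phrase wins over a shorter one
    # that starts at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def post_edit(text, replacements=REPLACEMENTS):
    """Replace every known misrecognition in text with its correction."""
    pattern = build_pattern(replacements)
    lookup = {key.lower(): value for key, value in replacements.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

Keeping the corrections in a plain dictionary means reporters can extend the list without touching the matching logic. Inserting procedural expressions at the right moment would need more context than a word list provides and is not covered by this sketch.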

Although this will probably take longer, we are also considering developing a large acoustic model (a sort of LLM that detects voice patterns in order to identify speakers), based on our own audio records. This would enhance the quality and the promptness of the transcription by enabling effective speaker diarisation—the process of automatically distinguishing who is speaking at any given time in the text. The current program already separates the speeches of different speakers, but the names of the speakers still need to be manually entered by the reporters.

The biggest challenge, however, will be to integrate this homemade system with the rest of our existing software and to streamline it, or at least extend it, to other parliamentary services. We have already concluded that it could help the Committee Support Division in drawing up minutes, despite the particularities of the Standing Committees’ meetings. In addition, our Parliamentary Channel Support Centre is working on a very similar system for live subtitling and translation, as well as for indexing and cataloguing video archives.

While we have no control over this particular LLM and its developments, all the recent advances in artificial intelligence lead us to think that not only are open-source LLMs here to stay but they will keep improving and it will become easier and easier to create accurate LLMs based on smaller datasets. In fact, one conceivable step further would be to develop our own LLM based on our parliamentary records, which could help not only in improving the quality of our transcription but also in developing a system to support law writing and proofreading.

Conclusions

As we were warned by colleagues in other parliaments, and as we experienced while testing other ASR solutions, reporters are now presented with a wall of text that they no longer have to type from scratch. Moreover, unlike the other ASR systems tested, Whisper leaves out of the transcript many false starts, planning expressions and self-corrections that reporters would not type anyway. The software even corrects some of the speakers’ minor mistakes or omissions. Reporters still have to edit the text to insert procedural information and make the written speech clearer. For the time being, there are no considerable gains in terms of time, but the nature of a great part of the reporter’s work has changed from typing to editing, which makes staff management more flexible, as more reporters can take on the role of editors.

At the same time, the first draft of the report tends to align more closely with what was said, regardless of sentence clarity. This has heightened our awareness of our editing practices and of the need to make them more transparent, as the report becomes more readily accessible and easier to compare with online audio and video recordings.

While open-source ASR technologies have not drastically shortened report production times, because of the editing and the addition of institutional information still required, they have made staff management more adaptable and eased the physical burden on reporters. As they will undoubtedly keep evolving, we are committed to continuing to explore them while upholding the quality and timeliness of our work, and we are eager to share experiences and best practices with the international reporting community.

Paulo Granja works as a parliamentary reporter at the Official Journal Division of the Parliament of Portugal.

References
