Parliamentary reporting was introduced in Finland roughly 160 years ago, but the basic workflow in the Records Office of the Parliament of Finland has not changed much since the early days: we still use a two-phased system with typists and reporters to compile the plenary reports. What has changed is that it is increasingly hard to recruit typists, and yet, due to new accessibility legislation introduced in late 2020, the office is facing pressure to produce even more text than before, for example alternative texts or captions for public committee hearings. To meet these demands, we simply need more tools for inputting large amounts of text.
The Parliament of Finland acquired an automatic speech recognition (ASR) system in December 2021 and has been using it since February 2022. The acquisition required a lot of studying and documenting beforehand, and adjusting to new ways of drafting records afterwards. But what was the actual process like?
Assessing our needs
The Records Office has been watching the development of ASR technology over the last decade. We have not really been interested in respeaking, for which we would have needed new equipment, more workspaces etc. Since we knew that speaker-independent recognition had taken great strides forward in recent years, we felt the time was right to consider an ASR system. We also felt that speech recognition would be part of our future, whether we liked it or not, and that taking an active part would give us a chance to influence the development.
A proof of concept (PoC) was launched at the office in late 2019 to test how ASR would work for us. The entire office, consisting of 25 people, both typists and reporters, took part in the PoC, trying different approaches and ways of working with ASR. All in all, the speech recognition worked very well, with more than 90% accuracy for most plenary speeches. It also proved invaluable in producing the alternative texts and captions; we did not even have typists available for that, since most of the typists work part-time and only during plenary sessions.
A special ASR team consisting of reporters, typists and ICT and procurement specialists in Parliament was then created to consider the technical and practical requirements of an ASR system and how to evaluate different systems.
The hard work of evaluating
After the PoC, the actual process of acquisition started with a request for tender (RFT) in September 2021. According to European Union regulations, an RFT is necessary when a project will exceed a certain cost. We then established several requirements for usability and quality to evaluate the tenders, but our biggest focus was on the quality of the actual text produced by the ASR. We chose a word error rate (WER) method, tailored to our own specific needs, for the evaluation; for the definition of WER, see Wikipedia s.v. Word error rate.
The evaluation of all five tenders was a time-consuming but necessary process. We chose two hours of speech from plenary sessions, consisting of more than 13,000 words in Finnish, and made a reference text of it. The output of each system was then placed side by side with the reference text in a spreadsheet file that automatically counted the errors.
The quality in general was high, with an accuracy of between 91% and 97%. Please note that we did not include notoriously difficult speakers in the test material, since we have found it easier to type them manually. This probably increased the accuracy.
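As an illustration of the kind of calculation involved, here is a minimal sketch of the standard WER as defined in the Wikipedia article cited above: the number of word substitutions, deletions and insertions needed to turn the ASR output into the reference text, divided by the number of words in the reference. This is not the tailored method or the spreadsheet used in our evaluation. The function name and the example sentences are made up for the illustration, and treating accuracy as 1 minus WER is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Illustrative only; words are compared exactly, with no normalisation
    of case or punctuation (a real evaluation would need to decide on that).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the ASR output
            insertion = curr[j - 1] + 1  # extra word in the ASR output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one missing word.
reference = "the committee approved the budget proposal today"
hypothesis = "the committee approve the budget today"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}, accuracy (assumed 1 - WER): {1 - wer:.3f}")
```

With a reference text of more than 13,000 words, an accuracy of 91% to 97% on this measure would correspond to very roughly 400 to 1,200 word errors per system.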
After thoroughly evaluating the tenders, the winner was declared in December 2021. The system was introduced when Parliament reconvened after its winter break in early February 2022.
Since a few of the MPs speak Finland Swedish, we required all tenders to include an offer for speech recognition in Finland Swedish. However, the quality did not quite meet our standards, so we have not used this option, at least not yet.
When planning a public procurement, a proof of concept is strongly recommended. It is important to describe requirements and evaluation points precisely because it makes it easier to compare the tenders. The evaluation should be done thoroughly because it provides you with support, for instance if one of the tenderers wants to question the decision.
Introduction and new ways of drafting the report
The foundation of our ASR system is a politics language model consisting of parliamentary reports and other material from our website. This means that the ASR has functioned quite well, although we haven’t done any major development work yet. While the ASR may struggle with dialects and bad articulation, we don’t have very heated debates in Finland, which helps us a lot since the audio files do not include much noise or many interruptions.
Using an ASR system requires a totally new attitude towards the draft. The ASR listens and calculates odds, but it lacks a human sense of context. You have to be very critical towards it because the draft may seem correct at a glance although, on closer inspection, it is not. You have to focus on details and even on individual characters but still keep an eye on the big picture. We still have the same editing guidelines as before, but because the ASR serves you the draft on a silver platter, you have to be cautious and not trust the artificial intelligence: your eyes can fool your ears when you read a draft while listening to the audio. Humans always have to control the process, not machines.
The new way of working is widely accepted by our typists as well as reporters, although in the beginning there were some suspicions. It still isn’t a perfect application, but we have noticed that the pros beat the cons: correcting errors made by the ASR is, in most cases, less work than writing everything from scratch.
The ASR has made it possible to fulfil our new duties, such as creating alternative texts and captions. It eases the physical load and gives us much-needed flexibility. Although we still have a two-phased working process in plenary sessions, typists don’t have to type all the time and reporters can use the ASR to make drafts themselves. Nevertheless, we still need our typists: there will always be speakers who are difficult for the ASR because of dialect, articulation etc. In such cases, typing from scratch is still faster than correcting errors made by the ASR.
Niklas Varisto and Riikka Kuronen are parliamentary reporters in the Records Office of the Parliament of Finland.
Wikipedia s.v. Word error rate. URL: https://en.wikipedia.org/wiki/Word_error_rate