Why, Where, When
Like so many of our colleagues in parliamentary reporting sections across the world, the Debates Office in the Irish Parliament (the Houses of the Oireachtas) has been facing increasing pressure from extended plenary sittings of both Houses and an ever-expanding committee system. In February 2022 alone, our output was 3 million words. The result has been slippage in publication deadlines, which causes frustration among Members of Parliament and the public, and among the staff of the Debates Office, as we pride ourselves on the quality and speed of the service we provide to our customers.
Realising the extent of the challenge we faced, in 2019 we turned our attention to whether technology could assist us with this problem. Automatic Speech Recognition (ASR) has been around in various forms for years, and some of our reporters use Dragon software to respeak contributions. However, such applications have only a limited capacity to solve the problem. What we were looking for was an ASR solution that would deliver first-draft speech-to-text output from live and recorded feeds directly from the Chambers to our Debates Office PCs.
In early 2019 we were granted funding by the Oireachtas ICT department to undertake a four-month pilot ASR project with the Italian company Cedat 85. The pilot was a laboratory experiment in which we selected a base set of representative samples of previous debates from both Houses and committees, comprising 32 hours of debate. The streamed feed of these sessions was taken from the Oireachtas website and run through Cedat's Digital for Democracy ASR system, and the output was then pulled back into PCs in Dublin for editing and analysis.
A key principle we wanted to study was how we could improve the accuracy of the language model throughout the pilot by providing weekly feedback on the output, which Cedat engineers would use to adjust the model. To categorise this feedback, errors and corrections to the output were logged under three headings (see the sketch after the list):
- linguistic – e.g. “five thousand” to be recorded as “5,000”; “25 percent” to be recorded as “25%”
- software – screen freezing on certain commands
- acoustic – e.g. several speakers interrupting each other
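For illustration, here is a minimal sketch of how such feedback entries might be logged for transmission to the engineers. The field names and example values are hypothetical, not the actual format agreed with Cedat:

```python
# Hypothetical sketch of a weekly feedback log; the fields and categories
# below are illustrative only, not the format actually used in the pilot.
import csv

feedback = [
    {"category": "linguistic", "observed": "five thousand", "expected": "5,000"},
    {"category": "software", "observed": "screen froze on save command", "expected": ""},
    {"category": "acoustic", "observed": "overlapping speakers, words dropped", "expected": ""},
]

# Write the week's entries to a CSV file for review on the conference call.
with open("weekly_feedback.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["category", "observed", "expected"])
    writer.writeheader()
    writer.writerows(feedback)
```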
Each week during the pilot we held a conference call with Cedat to provide feedback on the output we had received, and Cedat engineers would use this information to adjust the language model. Over the course of the four-month pilot we used three language models: the original and two new versions based on the feedback provided to Cedat. To assess whether these new models were improving the accuracy of the output, we applied each new model to the same selection of parliamentary sessions, so we were comparing like with like. Using Sclite, the word error rate (WER) evaluation tool from the US National Institute of Standards and Technology, we measured the following WER for each model:
- Model One: 16.3%
- Model Two: 13.3%
- Model Three: 13.0%
The word error rate is a measure of the accuracy of the output: the proportion of words that are substituted, deleted or inserted relative to a reference transcript, so a lower WER means a more accurate output. What these figures showed was that the number of errors produced by the software was falling with each new language model and, therefore, that the overall accuracy was improving. This was a significant result for us as it proved that, with ongoing feedback and adjustment, we could continue to improve the accuracy of the output over time.
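For readers unfamiliar with the metric, the sketch below shows the standard edit-distance definition of WER in Python. It is purely illustrative and is not Sclite's implementation:

```python
# Minimal sketch of word error rate (WER), for illustration only.
# WER = (S + D + I) / N, where S, D and I are word substitutions,
# deletions and insertions, and N is the number of words in the
# reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution in a five-word reference gives 1/5 = 20% WER.
print(wer("the house met at noon", "the horse met at noon"))  # 0.2
```

Sclite performs a more sophisticated alignment and scoring, but the principle is the same: fewer edits against the reference transcript means a lower WER.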
Some results from the pilot
Perhaps not surprisingly, the pilot confirmed a number of ideas we had about this kind of technology. It works very well with a good speaker and when the Chamber is quiet. In such circumstances there are very significant benefits, primarily in improving the speed of production of the report and in removing the need to input text manually. However, at those times which inevitably arise, with poor speakers or arguments, much more manual intervention is required of the reporter. We also came to the conclusion that ASR will push the report in the direction of being more verbatim. This view was based on the opinion that if the software produced an accurate and grammatically correct record of what was said, reporters and editors would be less likely to alter it to apply house style or make the minor adjustments that might have been made if the text had been input manually.
Challenges for ASR in the parliamentary environment
From our experience of working on the pilot, and as professionals working in parliamentary reporting, it appears that the parliamentary setting poses particularly difficult challenges for ASR technology because there are so many variables over which neither the technology providers nor the reporting sections have control. Consider the physical environment of chambers, some of them built long ago with no acoustic considerations. Every parliament has "good" speakers and "bad" speakers. Members interrupt each other, they shout and they speak off mic. They drift between languages (Irish and English in our case). They have regional dialects and accents. Witnesses who are unused to public speaking are asked to appear before committees. Then there is all of the unspoken content that must be added to the debate, such as the text of amendments, motions and vote results. In addition, there is the question of the point at which we tag the debate with XML to identify speakers, types of business being conducted, Stages of Bills and so on. So this is an ask that will, I think, push the technology to the limit.
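To give a flavour of what that tagging involves, here is a minimal sketch that builds a tagged fragment of a debate. The element and attribute names are invented for illustration and are not the actual Oireachtas schema:

```python
# Hypothetical sketch of XML tagging for a debate fragment; the element and
# attribute names below are illustrative, not the real Oireachtas markup.
import xml.etree.ElementTree as ET

debate = ET.Element("debate", house="Dáil", date="2022-02-15")
business = ET.SubElement(debate, "business", type="committee-stage",
                         bill="Hypothetical Bill 2022")
speech = ET.SubElement(business, "speech", speaker="Deputy A. N. Other")
para = ET.SubElement(speech, "p")
para.text = "I move amendment No. 1."

ET.indent(debate)  # pretty-print (Python 3.9+)
print(ET.tostring(debate, encoding="unicode"))
```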
Where we are right now
Right now the Debates Office, in conjunction with our ICT department, is preparing a request for tender seeking expressions of interest from the market. We should have been at this point two years ago, but the Covid-19 pandemic interrupted all of our plans as priorities shifted almost overnight. We expect that our ask will be in two parts. Initially, we will be looking for ASR output that complements our existing editing and publication software. However, in the next three to five years we will be upgrading our entire systems as part of the wider digital transformation agenda in the Oireachtas. It is likely that at that stage we will seek to incorporate ASR at a much more embedded level, producing not only speech-to-text for the Debates Office but also a host of other services: real-time closed captioning for broadcasting, speaker recognition, translation, and more creative uses of the very granular XML output to present a whole new suite of data sets based on cognitive analysis of content.
There is an old Irish joke where a traveller asks a local person for directions to a particular place. The man replies, “If I was going there I wouldn’t start from here.” I sometimes feel that this could be applied to trying to introduce ASR technology to a parliamentary setting. We are starting from a difficult place. We are operating in sometimes very old physical surroundings and with established parliamentary procedures which will not change to suit the new technology. But that is the exciting challenge that confronts us. If we can make ASR work in these parliamentary settings it will truly be a momentous achievement offering an entirely new set of outcomes and surprises.
Carl Lombard is Deputy Editor of Debates, Houses of the Oireachtas, Dublin.