How Amazon Taught Alexa to Speak in an Irish Brogue

For Alexa to speak like a Dubliner, Amazon researchers had to crack a problem that’s vexed data scientists for years: voice disentanglement.

The team contended with various linguistic challenges of Irish English. The Irish tend to drop the “h” in “th,” for example, pronouncing the letters as a hard “t” or a “d,” making “bath” sound like “bat,” or even “bad.” Irish English is also rhotic, meaning the “r” is pronounced after a vowel rather than dropped. That means the “r” in “party” will be more distinct than what you might hear out of a Londoner’s mouth. Alexa had to learn these speech features and master them.

Irish English, said Mr. Cotescu, who is Romanian and was the lead researcher on the Irish Alexa team, “is a hard one.”

The speech models that power Alexa’s verbal skills have been growing more advanced in recent years. In 2020, Amazon researchers taught Alexa to speak fluent Spanish from an English-speaking model.

Mr. Cotescu and the team saw accents as the next frontier of Alexa’s speech capabilities. They designed Irish Alexa to rely more on A.I. than on actors to build up its speech model. As a result, Irish Alexa was trained on a relatively small corpus — about 24 hours of recordings by voice actors who recited 2,000 utterances in Irish-accented English.

At the outset, when Amazon’s researchers fed the Irish recordings to the still-learning Irish Alexa, some weird things happened.

Letters and syllables occasionally dropped out of the response. “S’s” sometimes stuck together. A word or two, sometimes crucial ones, were inexplicably mumbled and incomprehensible. In at least one case, Alexa’s female voice dropped a few octaves, sounding more masculine. Worse, the masculine voice sounded distinctly British, the kind of goof that might raise eyebrows in some Irish homes.

“They are big black boxes,” Mr. Tinchev, a Bulgarian national who is Amazon’s lead scientist on the project, said of the speech models. “You have to have a lot of experimentation to tune them.”