Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
ICASSP 2026 Paper Submission
Note: Syllables shown in blue in the text are those modified by the American → British rules; the corresponding changed segments in the American IPA and British IPA rows (the Kokoro inputs) are highlighted in yellow. Phoneme transcriptions use the American and British IPA sets defined in the Kokoro TTS model.
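The note refers to an American → British rule set without listing it. Comparing the American and British IPA rows on this page suggests four rewrites: post-vocalic ɹ becomes ə, the symbol T (which appears in the American rows where the British rows have t) becomes t, the diphthong O becomes Q, and ɑ becomes ɒ. The sketch below encodes that reading. The rule list, the vowel inventory, and the function name `american_to_british` are assumptions reconstructed from these five examples; this is not the paper's implementation.

```python
import re

# ASSUMPTION: vowel inventory for Kokoro-style IPA strings, i.e. plain vowels
# plus the capital letters used on this page for diphthongs (A, I, O, Q, W, Y).
VOWELS = "aeiouæɐɑɒɔəɛɜɪʊʌᵊᵻAIOQWY"
STRESS = "ˈˌ"

# ɹ is treated as post-vocalic when it follows a vowel and is not followed by
# a vowel (stress marks between ɹ and the next vowel are skipped).
POST_VOCALIC_R = re.compile(f"(?<=[{VOWELS}])ɹ(?![{STRESS}]*[{VOWELS}])")

# ASSUMPTION: one-to-one symbol swaps inferred from the examples on this page.
SYMBOL_MAP = str.maketrans({"T": "t", "O": "Q", "ɑ": "ɒ"})


def american_to_british(ipa: str) -> str:
    """Apply the inferred American -> British rewrites to one IPA string."""
    ipa = POST_VOCALIC_R.sub("ə", ipa)  # non-rhotic: post-vocalic ɹ -> ə
    return ipa.translate(SYMBOL_MAP)    # T -> t, O -> Q, ɑ -> ɒ


if __name__ == "__main__":
    print(american_to_british("ɐ bˈɑTəl ʌv wˈɔTəɹ?"))
```

Applied to each American IPA row on this page, the function reproduces the corresponding British IPA row; intervocalic ɹ, as in fˈAvəɹəts, is left unchanged.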
Text: First meeting was next week.
American IPA (Kokoro): fˈɜɹst mˈiTɪŋ wʌz nˈɛkst wˈik.
British IPA (Kokoro): fˈɜəst mˈitɪŋ wʌz nˈɛkst wˈik.

| Condition | American Probability | British Probability | American Similarity | British Similarity |
|---|---|---|---|---|
| American Speaker Emb | 0.969 | 0.022 | 0.729 | -0.073 |
| American Speaker Emb + Rules | 0.974 | 0.015 | 0.765 | -0.156 |
| British Speaker Emb | 0.150 | 0.643 | 0.252 | 0.757 |
| British Speaker Emb + Rules | 0.114 | 0.731 | 0.207 | 0.817 |

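The page does not state how the Similarity columns are computed. Their range (about -0.28 to 0.95) is consistent with cosine similarity between an utterance-level accent embedding and a per-accent reference embedding. A minimal sketch under that assumption follows; building the reference as the element-wise mean of reference embeddings, and the helper names `centroid` and `accent_similarities`, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """HYPOTHETICAL accent reference: element-wise mean of reference embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def accent_similarities(utterance: Vector,
                        references: Dict[str, Sequence[Vector]]) -> Dict[str, float]:
    """Cosine similarity of one utterance embedding to each accent's centroid."""
    return {accent: cosine_similarity(utterance, centroid(vecs))
            for accent, vecs in references.items()}


if __name__ == "__main__":
    # Toy 2-D embeddings for illustration only; not data from the paper.
    refs = {"American": [[1.0, 0.0], [0.8, 0.2]],
            "British": [[0.0, 1.0], [0.2, 0.8]]}
    print(accent_similarities([1.0, 0.0], refs))
```

Cosine similarity ignores embedding magnitude, which is why negative values such as -0.278 can appear when an utterance embedding points away from one accent's reference.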
Text: In the end, however, it all came together.
American IPA (Kokoro): ɪn ði ˈɛnd, hWˈɛvəɹ, ɪt ˈɔl kˈAm təɡˈɛðəɹ.
British IPA (Kokoro): ɪn ði ˈɛnd, hWˈɛvəə, ɪt ˈɔl kˈAm təɡˈɛðəə.

| Condition | American Probability | British Probability | American Similarity | British Similarity |
|---|---|---|---|---|
| American Speaker Emb | 0.975 | 0.022 | 0.620 | -0.011 |
| American Speaker Emb + Rules | 0.815 | 0.138 | 0.617 | 0.220 |
| British Speaker Emb | 0.016 | 0.980 | 0.056 | 0.666 |
| British Speaker Emb + Rules | 0.001 | 0.988 | -0.222 | 0.944 |

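The yellow highlighting of changed segments can be approximated with Python's standard `difflib.SequenceMatcher`, which aligns the American and British IPA strings and reports the spans that do not match. This is an illustrative sketch, not the tool used to prepare this page; `changed_segments` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_segments(american: str, british: str) -> List[Tuple[str, str]]:
    """Return (American, British) substring pairs that differ between two IPA strings."""
    # autojunk=False keeps frequent symbols such as ə from being treated as junk.
    matcher = SequenceMatcher(a=american, b=british, autojunk=False)
    return [
        (american[i1:i2], british[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep replace, delete, and insert spans
    ]


if __name__ == "__main__":
    print(changed_segments("ɹətˈIəɹmᵊnt? hˌu nˈOz?", "ɹətˈIəəmᵊnt? hˌu nˈQz?"))
```

Because the rewrites on this page are symbol-for-symbol substitutions, every reported span here is a `replace` of equal length; a rule that inserted or deleted symbols would show up as an `insert` or `delete` span instead.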
Text: Certainly, in terms of league position, we must be favorites.
American IPA (Kokoro): sˈɜɹtnli, ɪn tˈɜɹmz ʌv lˈiɡ pəzˈɪʃən, wi mˈʌst bi fˈAvəɹəts.
British IPA (Kokoro): sˈɜətnli, ɪn tˈɜəmz ʌv lˈiɡ pəzˈɪʃən, wi mˈʌst bi fˈAvəɹəts.

| Condition | American Probability | British Probability | American Similarity | British Similarity |
|---|---|---|---|---|
| American Speaker Emb | 0.976 | 0.006 | 0.935 | -0.278 |
| American Speaker Emb + Rules | 0.971 | 0.007 | 0.946 | -0.270 |
| British Speaker Emb | 0.091 | 0.132 | 0.429 | 0.568 |
| British Speaker Emb + Rules | 0.125 | 0.199 | 0.442 | 0.613 |

Text: Retirement? Who knows?
American IPA (Kokoro): ɹətˈIəɹmᵊnt? hˌu nˈOz?
British IPA (Kokoro): ɹətˈIəəmᵊnt? hˌu nˈQz?

| Condition | American Probability | British Probability | American Similarity | British Similarity |
|---|---|---|---|---|
| American Speaker Emb | 0.959 | 0.035 | 0.662 | -0.008 |
| American Speaker Emb + Rules | 0.973 | 0.020 | 0.730 | -0.082 |
| British Speaker Emb | 0.741 | 0.232 | 0.576 | 0.166 |
| British Speaker Emb + Rules | 0.068 | 0.876 | 0.196 | 0.759 |

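One way to read each table is to subtract the row without rules from the row with rules for the same speaker embedding. The sketch below does this for the "Retirement? Who knows?" table, using the values reported there; the helper `rule_effect` and the column tuple are hypothetical names introduced for illustration.

```python
from typing import Dict, Tuple

COLUMNS = ("American Probability", "British Probability",
           "American Similarity", "British Similarity")

# Values copied from the "Retirement? Who knows?" table above.
TABLE: Dict[str, Tuple[float, float, float, float]] = {
    "American Speaker Emb": (0.959, 0.035, 0.662, -0.008),
    "American Speaker Emb + Rules": (0.973, 0.020, 0.730, -0.082),
    "British Speaker Emb": (0.741, 0.232, 0.576, 0.166),
    "British Speaker Emb + Rules": (0.068, 0.876, 0.196, 0.759),
}


def rule_effect(table: Dict[str, Tuple[float, ...]], speaker: str) -> Dict[str, float]:
    """Change in each metric when the rules are added, for one speaker embedding."""
    base = table[f"{speaker} Speaker Emb"]
    with_rules = table[f"{speaker} Speaker Emb + Rules"]
    # Round to the table's three-decimal precision to hide float noise.
    return {col: round(r - b, 3) for col, b, r in zip(COLUMNS, base, with_rules)}


if __name__ == "__main__":
    for speaker in ("American", "British"):
        print(speaker, rule_effect(TABLE, speaker))
```

For the British speaker embedding, adding the rules raises the British probability by 0.644 (0.232 to 0.876) and lowers the American probability by 0.673 (0.741 to 0.068), while the American speaker embedding changes by less than 0.02 in either probability.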
Text: A bottle of water.
American IPA (Kokoro): ɐ bˈɑTəl ʌv wˈɔTəɹ?
British IPA (Kokoro): ɐ bˈɒtəl ʌv wˈɔtəə?