Portfolio

Accent conversion

Learners of a second language practice their pronunciation by listening to and imitating utterances from native speakers. Recent research has shown that choosing a well-matched native speaker to imitate can have a positive impact on pronunciation training. Towards this goal, we are developing speech-modification techniques that can generate utterances with the vocal properties of the learner and the accent of a native speaker. This is accomplished by altering both prosodic and segmental characteristics of speech.
Our articulatory-based accent conversion is a two-step process. In the first step, we build an articulatory synthesizer for the non-native learner. In the second step, we drive the synthesizer with articulatory gestures recorded from a native speaker.

The utterances below illustrate the performance of the approach, where

  •   L1EMA: Native speaker articulatory synthesis
  •   L2EMA: Foreign speaker articulatory synthesis
  •   AC: Articulatory accent conversion
  •   L1N: Native (L1) utterance after pitch and VTL normalization to match the L2 range
Audio samples: L1EMA · L2EMA · AC · L1N


We have also developed an accent conversion method that relies exclusively on acoustic information. The technique is based on the standard voice conversion model but uses a different pairing of source and target frames. Unlike conventional voice conversion, where the source-target mapping is trained on time-aligned spectral vectors from parallel utterances, in our approach the mapping is trained on frame pairs selected by their acoustic similarity following vocal tract length normalization.

(a) Conventional approach to voice conversion; source and target utterances are paired based on their ordering in a forced-aligned parallel corpus.  (b) Our approach to accent conversion: source and target utterances are paired based on their acoustic similarity following vocal-tract-length normalization (VTLN). MCD: Mel Cepstral Distortion.
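As a sketch of this pairing criterion, the snippet below computes Mel Cepstral Distortion between MFCC frames and pairs each source frame with its acoustically closest target frame, replacing the forced-alignment pairing of conventional voice conversion. The VTLN warping step is omitted and the frame values in the test are toy numbers, so this is an illustration of the pairing rule, not the published pipeline.

```python
import math

def mcd(c1, c2):
    """Mel Cepstral Distortion (dB) between two MFCC vectors,
    excluding the 0th (energy) coefficient per the usual convention:
    MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * sq)

def pair_by_similarity(source_frames, target_frames):
    """For each source frame, return the index of the target frame with
    the smallest MCD. In the published method both frame sets would
    first be VTLN-warped before this matching step."""
    return [min(range(len(target_frames)),
                key=lambda j: mcd(s, target_frames[j]))
            for s in source_frames]
```

The resulting source-target frame pairs would then be used to train the spectral mapping in place of the time-aligned pairs of conventional voice conversion.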

The utterances below illustrate the performance of our acoustics-based approach, where

  •   L1: Native speaker – STRAIGHT resynthesis from extracted MFCCs sampled at 200 Hz
  •   L2: Foreign speaker – STRAIGHT resynthesis from extracted MFCCs sampled at 200 Hz
  •   AC: Accent conversion using the proposed method
  •   VC: Conventional voice conversion
Audio samples: L1 · L2 · AC · VC

Audio samples from our other publications

Relevant publications

S. Aryal, R. Gutierrez-Osuna

Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents (Inproceeding)

Proc. Interspeech, 2016.

(BibTeX)

S. Aryal, R. Gutierrez-Osuna

Data driven articulatory synthesis with deep neural networks (Article)

Computer Speech and Language, 36, Page(s): 260-273, 2016.

(Links | BibTeX)

C. Liberatore, S. Aryal, Z. Wang, S. Polsley, R. Gutierrez-Osuna

SABR: Sparse, Anchor-Based Representation of the Speech Signal (Inproceeding)

Proc. Interspeech, Page(s): 608-612, 2015.

(Abstract | Links | BibTeX)

S. Aryal, R. Gutierrez-Osuna

Articulatory-based conversion of foreign accents with deep neural networks (Inproceeding)

Proc. Interspeech, Page(s): 3385-3389, 2015.

(BibTeX)

C. Liberatore, R. Gutierrez-Osuna

Joint Optimization of Anatomical and Gestural Parameters in a Physical Vocal Tract Model (Inproceeding)

Proc. ICASSP, 2015.

(Links | BibTeX)

S. Aryal, R. Gutierrez-Osuna

Reduction of non-native accents through statistical parametric articulatory synthesis (Article)

Journal of the Acoustical Society of America, 137, 1, Page(s): 433-446, 2015.

(Links | BibTeX)

S. Aryal, R. Gutierrez-Osuna

Accent conversion through cross-speaker articulatory synthesis (Inproceeding)

Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Page(s): 7744-7748, 2014.

(Links | BibTeX)

S. Aryal, R. Gutierrez-Osuna

Can voice conversion be used to reduce non-native accents? (Inproceeding)

Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Page(s): 7929-7933, 2014.

(Links | BibTeX)

D. Felps, S. Aryal, R. Gutierrez-Osuna

Normalization of articulatory data through Procrustes transformations and analysis-by-synthesis (Inproceeding)

Proc. 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Page(s): 3051-3055, 2014.

(Links | BibTeX)

S. Aryal, D. Felps, R. Gutierrez-Osuna

Foreign Accent Conversion through Voice Morphing (Inproceeding)

Interspeech, Page(s): 3077-3081, 2013.

(Links | BibTeX)

D. Felps, C. Geng, R. Gutierrez-Osuna

Foreign accent conversion through concatenative synthesis in the articulatory domain (Article)

IEEE Transactions on Audio, Speech and Language Processing, 2012.

(Links | BibTeX)

R. Gutierrez-Osuna, D. Felps

Foreign Accent Conversion through Voice Morphing (Techreport)

2010.

(Abstract | Links | BibTeX)

D. Felps, R. Gutierrez-Osuna

Developing objective measures of foreign-accent conversion (Article)

IEEE Transactions on Audio, Speech and Language Processing, 18, 5, Page(s): 1030-1040, 2010.

(Abstract | Links | BibTeX)

D. Felps, C. Geng, M. Berger, K. Richmond, R. Gutierrez-Osuna

Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database (Conference)

Interspeech, 2010.

(Abstract | Links | BibTeX)

D. Felps, H. Bortfeld, R. Gutierrez-Osuna

Foreign accent conversion in computer assisted pronunciation training (Article)

Speech Communication, 51, 10, Page(s): 920-932, 2009.

(Abstract | Links | BibTeX)

D. Felps, H. Bortfeld, R. Gutierrez-Osuna

Prosodic and segmental factors in foreign-accent conversion (Techreport)

2008.

(Abstract | Links | BibTeX)