OddVoices (alpha)


OddVoices is a project to create free and open source singing synthesizers for American English. This is a Web frontend for OddVoices, whose C++ source has been compiled to WebAssembly, so everything happens in your browser and nothing is sent to a server. Please note that this is experimental alpha software and has many bugs.

See oddvoices/oddvoices on GitLab for the core DSP code and command-line version of OddVoices, and oddvoices/oddvoices-web for the source code of this Web application.

To use the application, enter some English text into the box, upload a monophonic MIDI file, and select which voice you want to use. You may also leave the text blank, and the app will look for MIDI lyric events. Click "Sing," wait a few seconds, and play the audio file with the controls. To save as a WAV file, use the three dots to the right (Chrome) or right click and press Save Audio As... (Firefox).

There are no limits on the length of the text or the MIDI file, but long inputs may run into browser memory limitations. If that happens, or if you need some form of batch processing, consider using the native command-line version.

Phonetic entry

OddVoices uses the CMU Pronouncing Dictionary to pronounce most words. OddVoices does not identify parts of speech, so heteronyms like "lead" and "read" are not handled intelligently. For OOV (out-of-vocabulary) words, OddVoices will guess the pronunciation by converting individual letters and pairs of letters to phonemes.
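The lookup-then-guess behavior described above can be sketched roughly as follows. This is an illustration only, not the actual OddVoices code: the tiny dictionary, the rule tables PAIR_RULES and LETTER_RULES, and the function names are all hypothetical, and the real letter-to-phoneme rules are certainly different.

```python
# Toy stand-in for the CMU Pronouncing Dictionary (ARPABET-style phonemes).
TOY_DICTIONARY = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical fallback rules. Letter pairs are tried before single letters.
PAIR_RULES = {"sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"]}
LETTER_RULES = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}


def guess_pronunciation(word):
    """Guess phonemes for an out-of-vocabulary word, scanning left to right."""
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in PAIR_RULES:
            phonemes.extend(PAIR_RULES[pair])
            i += 2
        else:
            # Characters with no rule (digits, apostrophes) are skipped.
            phonemes.extend(LETTER_RULES.get(word[i], []))
            i += 1
    return phonemes


def pronounce(word):
    """Use the dictionary when possible, otherwise fall back to guessing."""
    key = word.lower()
    if key in TOY_DICTIONARY:
        return list(TOY_DICTIONARY[key])
    return guess_pronunciation(key)
```

Note that nothing here looks at the surrounding words, which is why a heteronym such as "lead" always gets a single pronunciation.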

To supply custom pronunciations to override the defaults, X-SAMPA notation is supported. Surround the X-SAMPA pronunciation with forward slashes (like this: /hEloU/ ) and make sure no additional punctuation immediately precedes or follows the slashes. The table of phonemes is:


There are some peculiarities worth noting here. First is the cot-caught merger that equates /ɑ/ and /ɔ/ along with other low back vowels. This admittedly reflects a bias towards the American West Coast and towards a younger demographic of singers. The exception to this merger is that /ɔr/ (horde) and /ɑr/ (hard) are distinct. If you enter /O/ or /A/ they will sound the same in OddVoices, but /Or/ and /Ar/ are different.
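One way to picture the merger rule is as a pass over the X-SAMPA symbols that folds /O/ into /A/ unless an /r/ follows. The sketch below is a guess at the logic, not the OddVoices implementation: the function name is made up, the input is assumed to be a list of single-symbol strings, and the choice of /A/ as the surviving vowel is an assumption.

```python
def merge_low_back_vowels(symbols):
    """Fold /O/ into /A/ except directly before /r/ (assumed rule)."""
    merged = []
    for index, symbol in enumerate(symbols):
        followed_by_r = index + 1 < len(symbols) and symbols[index + 1] == "r"
        if symbol == "O" and not followed_by_r:
            merged.append("A")
        else:
            # /A/, /Or/, /Ar/ and all other symbols pass through unchanged.
            merged.append(symbol)
    return merged
```

Under this sketch, `["k", "O", "t"]` and `["k", "A", "t"]` come out identical, while `["h", "O", "r", "d"]` and `["h", "A", "r", "d"]` stay distinct.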

Second is the unification of /ə/ and /ʌ/. When sung, the English schwa is difficult to pin down and really represents a multitude of vowels. In varieties of North American English, /ə/ and /ʌ/ are closely linked and differ primarily by stress, so /ʌ/ is the best candidate for absorbing /ə/. Similarly, OddVoices doesn't distinguish /ɚ/ and /ɝ/. The CMU Pronouncing Dictionary unfortunately uses schwas a lot, so OddVoices enunciates a lot of words weirdly, like "im-uh-tate" for "imitate."

Finally, X-SAMPA's /{/ causes bracket matching issues in some text editors, so /{}/ and /&/ are provided as alternatives. The latter is borrowed from the so-called Conlang X-SAMPA or CXS.
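As a rough illustration of the slash syntax and the /{/ aliases, the following sketch splits input text on whitespace, treats any token wrapped in forward slashes as X-SAMPA, and rewrites /{}/ and /&/ to /{/. The function names and the tuple output are hypothetical, and the real parser may tokenize differently. In this sketch, a token with punctuation attached outside the slashes is simply treated as an ordinary word, which is an assumption about why stray punctuation should be avoided.

```python
def parse_token(token):
    """Classify one whitespace-separated token as phonetic or plain text."""
    if len(token) > 2 and token.startswith("/") and token.endswith("/"):
        xsampa = token[1:-1]
        # Handle the two-character alias first so no stray "}" is left behind.
        xsampa = xsampa.replace("{}", "{").replace("&", "{")
        return ("xsampa", xsampa)
    return ("word", token)


def parse_lyrics(text):
    """Split text on whitespace and classify every token."""
    return [parse_token(token) for token in text.split()]
```

For example, `parse_lyrics("say /hEloU/ now")` yields a plain word, an X-SAMPA entry, and another plain word, and `/k{t/`, `/k{}t/`, and `/k&t/` all normalize to the same entry.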


OddVoices is copyright © 2021-2022 Nathan Ho and is available under the Apache License. Its voice files are in the Public Domain.

Midifile is copyright © 1999-2018 Craig Stuart Sapp and is available under the BSD 2-Clause License.

The CMU Pronouncing Dictionary is copyright © 1993-2015 Carnegie Mellon University and is available under the BSD 2-Clause License.