Vocaloid Technology

     One of my first posts was "So How Does This Work, Anyway?" In it I described how the technology behind Vocaloid works. When I look at it now, though, I quite honestly cringe. My writing has definitely improved since then. So, like I did with the Introduction, I'm remaking it, so newer visitors to my blog can enjoy a better-written and (hopefully) less confusing explanation.
     There are many different ways to synthesize any particular sound, but so far the most successful and realistic method has been concatenative synthesis. That's my favorite word, by the way. Not only does it roll pleasantly off the tongue, but "concatenate" also means "to link together; unite in a series or chain." So, concatenative synthesis takes small recorded samples, then links them smoothly together in a chain. How do we do that for the human voice, then?
     Phonemes are the essential building blocks of any language. Say I record a real person (a "voice donor") singing all 44 basic phonemes of the English language (again, this process can be done for any language, but I'm just using English as an example), plus diphones, which are the transitions from one phoneme into the next, and allophones, which are variations of the same phoneme. Then I can plug those recordings into an algorithm that can concatenate any two samples together smoothly and change the pitch of those samples. After cleaning things up and fixing bugs, the synthesizer is ready.
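     To make "concatenate any two samples together smoothly" a little more concrete, here is a minimal sketch of one common way to join two recordings: a linear crossfade, where the end of one sample fades out while the start of the next fades in. This is only an illustration, not Vocaloid's actual algorithm (which is proprietary and far more sophisticated). The function names and the overlap length are my own assumptions, samples are plain lists of numbers, and pitch changing is left out entirely.

```python
def crossfade_concat(a, b, overlap):
    """Join sample lists a and b, blending the last `overlap` samples
    of a with the first `overlap` samples of b (linear crossfade)."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)  # nothing to blend, just butt-join

    blended = []
    for i in range(overlap):
        # Weight for b rises steadily from near 0 to near 1.
        w = (i + 1) / (overlap + 1)
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])

    return list(a[:len(a) - overlap]) + blended + list(b[overlap:])


def concatenate(samples, overlap):
    """Chain any number of samples together, crossfading at each joint."""
    out = list(samples[0])
    for nxt in samples[1:]:
        out = crossfade_concat(out, nxt, overlap)
    return out
```

     Each joint shortens the result by the overlap length, since those samples are shared between the two recordings. Without the blend, the jump from one recording to the next usually shows up as an audible click.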
     That's how Vocaloid, and to a certain extent programs like it, are made. After buying a Vocaloid product, the user operates the voice by entering notes into a piano-roll-style interface, then typing words into those notes. If the built-in dictionary doesn't recognize a word and can't automatically convert it into the correct phonemes, the user types the individual phonetic symbols instead. The rest is fine-tuning, which has been shortened to simply "tuning" in the Vocaloid community: editing dynamics, expression, and other parameters. It is not an easy job, and some people are better at it than others, but with a high-quality Vocaloid voice, experienced users have been reliably able to fool people into thinking their songs are sung by humans. The software is still not perfect, of course, but it's better than it was, and it's improving all the time.
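     The word-to-phoneme step described above can be sketched in a few lines. Everything here is hypothetical: the Note fields, the resolve_phonemes function, and the two-word dictionary are made up for illustration, and the phoneme symbols are ARPABET-style placeholders rather than the symbols Vocaloid really uses. The idea is just that a note carries a lyric, the dictionary is tried first, and anything typed in by hand overrides it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy stand-in for a built-in pronunciation dictionary (assumed contents).
TOY_DICTIONARY = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


@dataclass
class Note:
    pitch: int            # MIDI note number, e.g. 60 = middle C
    start_beat: float     # where the note begins on the piano roll
    length_beats: float   # how long the note lasts
    lyric: str            # the word typed into the note
    phonemes: Optional[List[str]] = None  # manual entry, if any


def resolve_phonemes(note, dictionary=TOY_DICTIONARY):
    """Return the phonemes to sing for a note.

    Manually entered phonemes win; otherwise look the lyric up in the
    dictionary; otherwise ask the user to enter the symbols by hand.
    """
    if note.phonemes is not None:
        return note.phonemes
    word = note.lyric.lower()
    if word in dictionary:
        return dictionary[word]
    raise KeyError(f"'{note.lyric}' is not in the dictionary; enter phonemes manually")
```

     For example, a note with the lyric "Hello" resolves through the dictionary, while a made-up word only works once its phonemes are typed in on the note itself.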
