El Molino (teatro): differences between versions

From Wikipedia, the free encyclopedia.
===NeoSpeech and Nuance voices===


Concatenation with a larger amount of recorded data (about 500 MB), along with other undisclosed methods, is apparently used by [[NeoSpeech]]'s [[SAPI 5]] voices "Lily" and "Wang".<ref>[http://www.nextup.com/neospeech.html NeoSpeech SAPI5 TTS Voices for TextAloud, Kurzweil K3000, Jaws and Other Voice Programs]</ref> In most cases these voices can reliably synthesize awkward phrases, provided the phrases are properly added to the dictionary,<ref>[http://www.nextup.com/phpBB2/viewtopic.php?p=8252#8252 NextUp.com :Text To Speech Software Forums: View topic - Cannot make Neospeech Lily read pinyin (phonetic text)]</ref> and they do not suffer from the severe inflexibility and forced joins of simpler concatenation-based synthesis.
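
Neither vendor documents its method, so the following is only a generic illustration of what a "forced join" means. It is a minimal Python sketch that contrasts a hard join between two recorded units with a short linear crossfade. It assumes each unit is a plain list of floating-point samples. The function names and the toy data are hypothetical, and this is not the NeoSpeech or Nuance technique.

```python
def hard_join(a, b):
    """Naive concatenation: the waveform can jump abruptly at the boundary."""
    return a + b


def crossfade_join(a, b, overlap):
    """Join two units, fading `a` out and `b` in linearly over `overlap` samples.

    The result is len(a) + len(b) - overlap samples long.
    """
    overlap = max(0, min(overlap, len(a), len(b)))
    if overlap == 0:
        return a + b
    start = len(a) - overlap          # first sample of `a` inside the overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)   # weight of `b`: rises towards 1
        mixed.append((1 - w) * a[start + i] + w * b[i])
    return a[:start] + mixed + b[overlap:]


# Toy "units": the end of one recording at +1.0 and the start of the next at -1.0.
unit_a = [1.0] * 8
unit_b = [-1.0] * 8
print(hard_join(unit_a, unit_b)[6:10])         # [1.0, 1.0, -1.0, -1.0]
print(crossfade_join(unit_a, unit_b, 3)[4:9])  # [1.0, 0.5, 0.0, -0.5, -1.0]
```

A crossfade only smooths the transition in the waveform itself. Mismatches in pitch or timbre between the two units remain audible, which is one reason larger unit inventories are used.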


The Nuance (formerly ScanSoft) RealSpeak MeiLing voice (available from [http://www.nextup.com/TextAloud/SpeechEngine/voices.html#Nuance NextUp], although it will not install without a purchased copy of [http://www.nextup.com/ScanInstall.html TextAloud]) has similar properties, but its download size is much smaller (42.7 MB). Because of bugs in the program, it is very difficult to get MeiLing to speak reliably from pinyin or [[zhuyin]] input.<ref>[http://www.nextup.com/phpBB2/viewtopic.php?p=9636#9636 NextUp.com :Text To Speech Software Forums: View topic - How to customize pronunciation in ScanSoft MeiLing?]</ref>

Revision as of 12:15, 15 September 2009


Of these voices, Lily appears to be the most reliable for synthesizing awkward or unusual phrases from pronunciation input. However, even Lily is not perfect. A few phrases are synthesized incorrectly when entered as pinyin but correctly when entered as Chinese characters. For example, "yong4chu5lai5" is incorrectly read as the more common "yong4chu1lai5", although the characters 用出来 are read correctly. Similarly, in "zhuan3lai2zhuan3qu4" the first "zhuan" is incorrectly read as "zhuai", although the characters 转来转去 are read correctly. This is reminiscent of some commercial English speech synthesizers that produce lower-quality speech when fed pronunciation data than when fed the original text. That behaviour suggests the pronunciation data they accept is not the internal format they use.[2] Nevertheless, entering characters alone is not always desirable, because it is often necessary to specify a different pronunciation.
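
The tone-numbered pinyin in these examples writes each syllable followed by a digit from 1 to 5, with 5 marking the neutral tone. The following minimal Python sketch splits such input into syllable/tone pairs and compares what was intended with what a voice actually produced. The function names are hypothetical. The parser checks only the letters-plus-digit shape, not whether each syllable exists in Mandarin.

```python
import re

# One syllable: a run of letters (v or ü for ü) followed by a tone digit 1-5.
SYLLABLE = re.compile(r"([a-zü]+)([1-5])")


def parse_numbered_pinyin(text):
    """Split tone-numbered pinyin such as 'yong4chu5lai5' into (syllable, tone) pairs.

    Tone 5 marks the neutral tone. Only the letters-plus-digit shape is checked,
    not whether each syllable exists in Mandarin.
    """
    text = text.lower().replace(" ", "")
    pairs, pos = [], 0
    while pos < len(text):
        m = SYLLABLE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse pinyin at {text[pos:]!r}")
        pairs.append((m.group(1), int(m.group(2))))
        pos = m.end()
    return pairs


def differences(intended, heard):
    """List (index, intended, heard) for every syllable that was read differently."""
    a, b = parse_numbered_pinyin(intended), parse_numbered_pinyin(heard)
    if len(a) != len(b):
        raise ValueError("syllable counts differ")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differences("yong4chu5lai5", "yong4chu1lai5"))
# [(1, ('chu', 5), ('chu', 1))]
print(differences("zhuan3lai2zhuan3qu4", "zhuai3lai2zhuan3qu4"))
# [(0, ('zhuan', 3), ('zhuai', 3))]
```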

These voices can also fail in ways that cannot be explained by the input format. For example, NeoSpeech Lily and Nuance MeiLing both make the following mistakes, which could indicate that they share some unpublished techniques despite the significant difference in data size:

- In 首都 (shou3du1), the "du1" is too low in pitch.
- In 邮编 (you2bian1), the "bian1" is too low in pitch.
- In 天真 (tian1zhen1), the two syllables are spoken with a drop of a musical third, like a doorbell, whereas they should be at the same pitch.
- In 糖尿病 (tang2 niao4 bing4), the "n" is very unclear.

This is true whether the input is characters or (in Lily's case) pinyin. The first three mistakes do not occur when the word is part of a longer phrase, only when it is spoken in isolation, which is often the case in a language-learning scenario.[3]
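
Chao tone letters can make the pitch errors above concrete. This is the usual five-level notation, in which the four citation tones of Mandarin are 55, 35, 214 and 51. The following minimal Python sketch shows that two tone-1 syllables such as 天真 should show no step in level, and measures in semitones how large a drop of a third is. The function names are hypothetical. The frequencies are made-up values, not measurements of Lily or MeiLing. Tone sandhi and the neutral tone are ignored.

```python
import math

# Citation-form Mandarin tones in Chao tone letters (1 = lowest pitch, 5 = highest).
# Tone sandhi and the neutral tone are deliberately ignored in this sketch.
CHAO_LETTERS = {1: "55", 2: "35", 3: "214", 4: "51"}


def expected_step(tone_a, tone_b):
    """Chao-scale change from the end of one syllable to the start of the next."""
    return int(CHAO_LETTERS[tone_b][0]) - int(CHAO_LETTERS[tone_a][-1])


def semitones(f_from_hz, f_to_hz):
    """Interval between two fundamental frequencies in semitones (negative = drop)."""
    return 12 * math.log2(f_to_hz / f_from_hz)


# 天真 tian1zhen1: two tone-1 syllables, so no change in level is expected.
print(expected_step(1, 1))             # 0
# Hypothetical F0 values for a "doorbell" rendering: 250 Hz, then 200 Hz.
print(round(semitones(250, 200), 2))   # -3.86, roughly a major third (4 semitones) down
```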

Sometimes, pinyin phrases that Lily synthesizes incorrectly can be corrected by breaking long words into separate words, but this does not help with the examples above.

There does not appear to be any method of sending feedback to the developers about these bugs.

eSpeak

The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has started experimenting with Chinese synthesis.

Ekho

Ekho is another open-source Chinese TTS system, which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and Korean. Some of the Mandarin syllables have been pitch-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials".
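
Syllable concatenation of this kind can be sketched as a lookup in a table of recordings keyed by tone-numbered syllable. The following is a minimal Python sketch of that idea. The bank format, the `gap` parameter and the function name are hypothetical and are not Ekho's actual data layout or API. The short number lists stand in for real audio samples.

```python
from typing import Dict, List

Samples = List[float]


def synthesize(syllables: List[str], bank: Dict[str, Samples], gap: int = 0) -> Samples:
    """Concatenate one stored recording per syllable, optionally separated by silence.

    `bank` maps tone-numbered syllables such as "tian1" to sample lists; a
    syllable with no recording raises KeyError instead of being skipped.
    """
    out: Samples = []
    for i, syllable in enumerate(syllables):
        if syllable not in bank:
            raise KeyError(f"no recording for syllable {syllable!r}")
        if i > 0 and gap:
            out.extend([0.0] * gap)   # `gap` samples of silence between syllables
        out.extend(bank[syllable])
    return out


toy_bank = {"tian1": [0.1, 0.2], "zhen1": [0.3]}
print(synthesize(["tian1", "zhen1"], toy_bank, gap=2))  # [0.1, 0.2, 0.0, 0.0, 0.3]
```

Because each syllable is played exactly as recorded, any pitch adjustment has to happen when the bank is prepared. The pitch normalisation described above is done at that stage.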

Online Demos and Bell Labs

There is an online interactive demonstration of the NeoSpeech voices,[4] but it does not allow the Chinese pronunciation to be customized by entering pinyin. iFlyTek also has an online demonstration,[5] but it is frequently non-functional, with no replies from the contact email address, and in practice it does not appear to accept CSSML pronunciation overrides. (Update: there is now a more reliable demo at iflylanguage.com [server in the USA] and ecl.iflytek.com [server in China]. It allows CSSML pronunciation correction through a visualization mode called "Advanced Reading Mode Settings". Note, however, that the JavaScript interface is slightly confusing for blind users because the form has no submit button: after typing text in the box, you have to click the link that says "Woman's voice" or "Man's voice".)

Bell Labs have an online Mandarin text-to-speech demo[6] dated 1997, but it is now non-functional (the server to which queries are submitted no longer exists in the DNS) and the contact email address is no longer valid. However, their approach was described in the monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0792380276). Chilin Shih, the former employee who was responsible for the project and who now works at the University of Illinois, has some notes about her methods on her website.[7]

Non-Windows systems

The above-mentioned Chinese speech synthesis systems (apart from the online demos) are available only for Windows. However, the spaced-interval repetition language-practice program Gradint includes code and instructions for using KeyTIP and SpeechPlus data on other operating systems, by reading the data directly or using the WINE emulator.

There are some reports[8] that SAPI 5-based speech synthesizers can be run on recent versions of the WINE emulator.

Mac OS had Chinese speech synthesizers available up to version 9. They were removed in Mac OS X, but according to Apple's website a replacement is scheduled for version 10.5.

Notable approaches not yet taken

As of 2007, there appear to have been no projects to synthesize Chinese by simulating the human vocal tract, as GNU Speech is doing for English. Chinese is also notably missing from the extensively multilingual MBROLA project.

Notes

See also
