High-quality (commercial) English text-to-speech software?

Solution 1

The top text-to-speech voices I have heard so far, by brand (best first):

  • Acapela Voices http://www.acapela-group.com/text-to-speech-interactive-demo.html
  • Cepstral http://cepstral.com/demos/
  • AT&T Natural Voices http://www2.research.att.com/~ttsweb/tts/demo.php
  • Nuance RealSpeak Voices http://www.nuance.com/vocalizer5/flash/index.html
  • Microsoft's versions
  • L&H and True voice, at the bottom, unless they have improved lately.

CereProc (not sure where to place it in the ranking): http://www.cereproc.com/support/live_demo

I find that the "UK"-type English voices can sound "better" to my American ears than the "American" voices. Either the accent hides more of the problems, or I do not know enough about UK inflections and nuances to be as critical of them.

All of these will run on Windows, using SAPI 4 and 5.
Voices are not all that is needed for a perfect result, though: a good program with dictionaries, pronunciation editing, and the usual per-word tuning of the voice is going to be necessary if you want to get closer to sounding like a real human.
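As a rough illustration of the dictionary and pronunciation-editing step, here is a minimal Python sketch that swaps words for phonetic respellings before the text is handed to an engine. The `LEXICON` entries and the `apply_lexicon` helper are hypothetical; commercial engines usually ship their own user-dictionary formats (and SSML has a `<sub alias="...">` element for the same idea), so treat this only as an engine-independent pre-processing stand-in.

```python
import re

# Hypothetical user lexicon: written form -> respelling the engine reads more naturally.
LEXICON = {
    "SQL": "sequel",
    "SAPI": "sappy",
    "Nguyen": "win",
}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches of lexicon keys with their respellings."""
    if not lexicon:
        return text
    # Try longer keys first so multi-word entries win over their sub-words.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: lexicon[m.group(0)], text)


if __name__ == "__main__":
    print(apply_lexicon("Query the SQL table through SAPI.", LEXICON))
    # -> Query the sequel table through sappy.
```

Matching whole words only means `SQL` is rewritten but `SQLite` is left alone; sorting the keys longest-first lets multi-word entries take priority if you add them.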

This page http://www.nextup.com/TextAloud/SpeechEngine/voices.html has a lot of voice samples you can listen to. It is a good compilation of the different voices.

The best of the best voices, reportedly (I have not heard them yet), do not work with the system alone; they work only through a separate program made for that voice. The program and the voice are both needed and work together. I have not found such a product yet.

Solution 2

I'm not an expert on speech synthesizers, but I imagine the best solution probably depends on a variety of factors. For instance:

  • Are you looking for a hardware or software solution?
  • Is there a limit on the memory footprint or resource intensiveness? Are there bandwidth considerations?
  • Do you need custom integration?
  • How do you define quality? Is naturalness more important, or intelligibility, or consistency? For example, concatenative synthesis generally produces the most natural, human-sounding voices, since it is built from short recorded snippets of actual human voices. However, it also produces telltale glitches where the different recordings are spliced together, which fully synthetic voices don't have.
  • What type of voice are you looking for? Most speech synthesis programs seem to have much more realistic male voices than female ones. Also, to my American ears, voices with foreign (e.g. Austrian or British) accents tend to sound more natural than plain American voices.
  • Similarly, some speech engines produce natural sounding speech across a range of configurations, whereas others may have a lower overall quality but can produce extremely realistic speech in a specific configuration.
  • Another application-specific consideration is the variety of input text you expect to receive. Domain-specific speech synthesis programs can be the most realistic, since they are built from actual prerecordings of entire words or phrases. However, this only works when the input text comes from a narrow domain that is easy to cover (e.g. a system that reads movie times or bus schedules). If the input domain is small enough, it may be best to just hire a voice actor to record all the phrases and sentences that are required.
  • Do you want to clone the voice of a specific individual for this application? CereProc is one company that specializes in this type of voice synthesis, and they've achieved some pretty incredible results that really capture the personality of the target individual.
  • While all the previous considerations are primarily about the output voice, text parsing is also a major component of speech synthesis: many speech synthesizers have a hard time with different types of punctuation and numeral representations (fractions, percentages, money, exponents, etc.). So you should also consider how your chosen speech engine handles tricky tokenizations.
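The domain-specific approach from the list above can be sketched as a lookup from the pieces of an announcement to prerecorded clips. This is a minimal Python illustration assuming a bus-schedule domain; the clip file names, the `DESTINATIONS` table and `announcement_clips` are all hypothetical. It only plans the playlist; actually joining the audio is left to whatever playback or audio library you use, and those joins are exactly where splicing glitches can show up.

```python
# Hypothetical clip inventory for a bus-announcement domain: every piece was
# recorded once by a voice actor and stored as a separate audio file.
DESTINATIONS = {
    "downtown": "dest_downtown.wav",
    "the airport": "dest_airport.wav",
}


def announcement_clips(destination: str, hour: int, minute: int) -> list[str]:
    """Return the ordered clips for "The next bus to X departs at H:MM am/pm"."""
    if destination not in DESTINATIONS:
        # Outside the recorded domain, a clip-based system simply cannot say it.
        raise KeyError(f"no recording for destination {destination!r}")
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("invalid time")
    hour12 = hour % 12 or 12  # 0 -> 12, 13 -> 1, ...
    clips = [
        "phrase_next_bus_to.wav",
        DESTINATIONS[destination],
        "phrase_departs_at.wav",
        f"hour_{hour12:02d}.wav",  # 12 recordings: "one" .. "twelve"
        # 60 minute recordings: "o'clock", then "oh-one" .. "fifty-nine"
        "oclock.wav" if minute == 0 else f"minute_{minute:02d}.wav",
        "am.wav" if hour < 12 else "pm.wav",
    ]
    return clips


if __name__ == "__main__":
    print(announcement_clips("the airport", 14, 5))
```

Anything outside the recorded inventory, such as a new destination, raises an error instead of being spoken, which is the trade-off of this approach.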
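To make the tokenization point concrete, here is a minimal Python sketch of the kind of text normalization an engine has to do before synthesis: expanding dollar amounts, percentages and integers into words. The function names are hypothetical, and the rules cover US English only, up to 999,999; real engines' front ends are far more elaborate.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 in US English."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("out of range for this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def _spell(digits: str) -> str:
    """Spell a digit string (commas allowed); leave it as-is if out of range."""
    n = int(digits.replace(",", ""))
    return number_to_words(n) if n < 1_000_000 else digits


def _money(m: re.Match) -> str:
    dollars = int(m.group(1).replace(",", ""))
    words = _spell(m.group(1)) + (" dollar" if dollars == 1 else " dollars")
    cents = int(m.group(2)) if m.group(2) else 0
    if cents:
        words += " and " + number_to_words(cents) + (" cent" if cents == 1 else " cents")
    return words


def normalize(text: str) -> str:
    """Expand dollar amounts, percentages and integers into words, in that order."""
    # $1,234.56 -> "... dollars and ... cents" (cents only if exactly two digits)
    text = re.sub(r"\$(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?\b", _money, text)
    # 45% -> "forty-five percent"
    text = re.sub(r"\b(\d+)%", lambda m: _spell(m.group(1)) + " percent", text)
    # Any remaining integer, with or without thousands separators
    text = re.sub(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b", lambda m: _spell(m.group(0)), text)
    return text


if __name__ == "__main__":
    print(normalize("Tickets cost $3.50, a 45% discount on 1,200 seats."))
    # -> Tickets cost three dollars and fifty cents, a forty-five percent discount on one thousand two hundred seats.
```

Even this small example shows the difficulty: inputs such as `12.5%` or `$3.5` are not handled correctly by these rules, which is the kind of text worth including when you test candidate engines.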

If you have that kind of money to spend, I'd look at a few of the top brands, such as Acapela, Cepstral, AT&T, CereProc, and RealSpeak, let them know your exact project requirements, and have them pitch to you, or at least demo each of them in front of the major stakeholders for this project using some actual input text that the final application will need to process.

Author: bodacydo

Updated on September 18, 2022

Comments

  • bodacydo
    bodacydo almost 2 years

    I'm working on a software project and I am researching text-to-speech products to use. Does anyone know what the current state-of-the-art text-to-speech systems are? Ideally, the speech should be indistinguishable from a native American or English speaker. I'm looking for products with an SDK or API that I can easily hook into.

    Just to clarify and reiterate my question: I'm not looking for things like Microsoft's free text-to-speech synthesis program; I'm looking for a high-quality professional product.

    • bodacydo
      bodacydo over 12 years
      @Psycogeek I made a mistake. It's "text-to-speech". I'm correcting it now. (Done now - corrected the mistake.)
    • bodacydo
      bodacydo over 12 years
      I'm sorry, @iglvzx and @random, but why did you close the question? It's a valid software question.
    • Christoph Rüegg
      Christoph Rüegg over 12 years
      Shopping questions are off topic across the SE network
    • bodacydo
      bodacydo over 12 years
      @random - I'm sorry, it was not meant to be a shopping question. I only mentioned the budget I was allocated for the solution, and that I wasn't looking for $35 Windows API wrapper shareware but for a very serious product. Can I please edit the question so that you can make it available again?
    • bodacydo
      bodacydo over 12 years
      @random - Thanks for letting me edit the question. I've now removed the pricing and restructured it so that pricing is not included. Can you please unlock my question now?
    • Christoph Rüegg
      Christoph Rüegg over 12 years
      It's still a shopping question, and it is also too localised to the current state of the market, which also makes it a comparison question. Plus it would not be constructive, inviting a cavalcade of possible products instead of asking how to make a current solution work for your needs.
  • bodacydo
    bodacydo over 12 years
    Thanks a lot, I didn't think about these aspects. I've now got in touch with all the companies and I'm setting up conference calls tomorrow.