More Voices For Microsoft Speech
Talk to Your Computer and Have It Answer Back with the Microsoft Speech API
January 1.
Mike Rozak

Mike Rozak is a development lead in the Personal Systems Division of Microsoft, tasked with incorporating speech into the operating system APIs. Mike has been working with engine vendors to design and implement a speech recognition and text-to-speech API.

Click to open or copy the CLOCK project files.

In the beginning, humans communicated with their computers using soldering irons and voltmeters. Needless to say, this grew tiresome quickly. So someone had the bright idea of using toggle switches and light bulbs. But that wasn't so hot either, so soon scientists figured out a way to feed their computers instructions on little cards with holes in them; the computers spat their own cards out the other end. Still pretty awkward. Things started really cooking when keyboards and monitors came along. Now people were communicating in a strange dialect with words like mv and grep. The Enter key meant "do it." And if you wanted to read the results in the bathroom, you could print them on a big old clunky line printer.

These days, the march to make computers communicate in ways that come naturally to humans continues. In the quest for a perfectly transparent user interface, speech is perhaps the final frontier, short of a direct brain link. Admit it, since you were a kid you wanted to talk to a computer the way Mr. Spock talks to his computer aboard the Enterprise.

"Computer, what time is it?"
"Ten fifteen."
"Shoot, I'm late for my pon farr. Hey, print off the latest science officer data for me, wouldja?"
"Sure thing, Spock."
"You want that by animal or mineral?"

If that sounds far-fetched, keep reading. In this article, I'll bring you up to speed on what's happening with computer speech, and I'll show you how to write a simple talking clock program that speaks the current time of day whenever you ask, "What time is it?" Really.

Wherefore Speech?

I know what you're thinking. Why would I want to talk to my computer? Why would I want my computer talking to me? You imagine a cacophony of computers and people gabbing away in their cubicles. You think how silly you'd feel sitting in your home office, talking to a beige box on your desk. Well, it's true that keyboards and mice are in little danger of becoming obsolete any time soon, but there are nevertheless many situations where speech is useful.

Have you ever played a computer game where a character asks you a question? A cartoon-style text balloon pops out of the character's mouth and you answer by clicking a button. Wouldn't it be more natural if the character really spoke? And for you to answer back in English? Or French, if that's your preference.

Or how about this. Your screen is littered with toolbars, and you can't remember whether it's Ctrl+Alt+F8 to double underline, or Alt+Shift+F4, or Alt+Control+Shift+Mumble+Whatever. Why not just select some text and say, "Double underline this"?
You wouldn't have to shout; you could say it softly.

Or maybe you'd like to call your bank and transfer some money from savings to checking. Instead of playing twenty questions with the synthetic phone operator as you maneuver through seven levels of prerecorded menus, why not just say, "Transfer one hundred dollars from savings to checking"? Of course, in that case, you'd be talking to the bank's computer, not yours, but you might want to call your own PC to ask, "Do I have any email?" or "Look up Mary Smith's number in my address book." No fussing with buttons while you're driving in traffic; no need for a laptop or modem. Just dial up and talk.

Or, if you're one of the millions of people who suffer from repetitive motion injuries like carpal tunnel syndrome, why not give your fingers a break once in a while? Don't type, dictate. There are other hands-off situations where people need to use their computers while doing something else, like operating a piece of machinery. Or maybe you just want your computer to read words or numbers back to you as you type them, to help catch typing errors. These are just a few areas where computer speech is really useful.

How Do They Do That?

You don't need to understand the intricacies of speech technology to use it in your apps, but I suspect many of you are curious, so I figured I'd give you a very short overview of how it works. There are two basic technologies, speech recognition (SR) and speech synthesis, depending on who is doing the talking: you or the computer. Speech synthesis is commonly called text-to-speech or TTS, since the speech is usually synthesized from text data. Figure 1 shows the architecture of a typical text-to-speech engine.

Figure 1: Text-to-Speech Engine

The process begins when the application hands the engine a string of text such as "The man walked down 56th St."
The text analysis module converts numbers into words, identifies punctuation such as commas, periods, and semicolons, converts abbreviations to words, and even figures out how to pronounce acronyms. Some acronyms are spelled out (MSJ), whereas others are pronounced as a word (FEMA). The sample sentence would get converted to something like:

<begin Statement> The man walked down fifty sixth street. <end Statement>

Text analysis is quite complex because written language can be so ambiguous. A human has no trouble pronouncing "St. John St." as "Saint John Street," but a computer, in typically mechanical fashion, might come up with "Street John Street" unless a clever programmer gives it some help.

Once the text is converted to words, the engine figures out which words should be emphasized by making them louder or longer, or by giving them a higher pitch. Other words may be de-emphasized. Without word emphasis, or prosody, the result is a monotone voice that sounds robotic. After adding prosody, the sample sentence might end up like this:

<begin Statement> <de-emphasize> the <emphasize> man walked ... <end Statement>

Next, the text-to-speech engine determines how the words are pronounced, either by looking them up in a pronunciation dictionary or by running an algorithm that guesses the pronunciation. Some text strings have ambiguous pronunciations, such as "read." The engine must use context to disambiguate them. The result of this analysis is the original sentence expressed as phonemes:

Th uh / M A N / w au l k t / D OU N / f ih f t ee / S IH K S TH / s t r ee t

Next, the phonemes are parsed and their pronunciations retrieved from a phoneme-to-sound database that numerically describes what the individual phonemes sound like. If speech were simple, this table would have only forty-four entries, one for each of the forty-four English phonemes (or however many the language in use has).
In practice, each phoneme is modified slightly by its neighbors, so the table is often far larger than that. Depending on the implementation, the table might store either a short wave recording or parameters that describe the mouth and tongue shape. Either way, the sound database values are finally smoothed together using signal processing techniques, and the digital audio signal is sent to an output device such as a PC sound card, then out the speakers to your ears.

That's text-to-speech. Speech recognition is the flip side. Figure 2 shows a generic speech recognition engine.
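Before moving on to recognition, it may help to see the front half of the text-to-speech pipeline in miniature. The sketch below is only an illustration, not how any real engine or the Speech API works: the function names (`normalize`, `to_phonemes`), the tiny lookup tables, and the "St." rule are all assumptions made up for this example. It expands small numbers and ordinals into words, guesses whether "St." means "Saint" or "Street" from the word that follows, and then looks each word up in a toy pronunciation dictionary, falling back to spelling the word out when it is missing.

```python
import re

# Hypothetical lookup tables, just large enough for the sample sentences.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}

# Toy pronunciation dictionary using the phoneme spellings from the article.
PRONUNCIATIONS = {"the": "th uh", "man": "m a n", "walked": "w au l k t",
                  "down": "d ou n", "fifty": "f ih f t ee",
                  "sixth": "s ih k s th", "street": "s t r ee t"}


def cardinal(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal(n):
    """Spell out an ordinal from 0 to 99, e.g. 56 -> 'fifty sixth'."""
    words = cardinal(n).split()
    last = words[-1]
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return " ".join(words[:-1] + [last])


def normalize(text):
    """Turn raw text into lowercase words (assumed, simplified rules)."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        ordinal_match = re.fullmatch(r"(\d{1,2})(st|nd|rd|th)", tok)
        if tok == "St.":
            # Assumed heuristic: "St." before a capitalized word is "Saint".
            out.append("Saint" if nxt[:1].isupper() else "Street")
        elif ordinal_match:
            out.append(ordinal(int(ordinal_match.group(1))))
        elif tok.isdigit() and int(tok) < 100:
            out.append(cardinal(int(tok)))
        else:
            out.append(tok.strip(".,;"))
    return " ".join(out).lower()


def to_phonemes(words):
    """Look up each word; unknown words are crudely spelled letter by letter."""
    return [PRONUNCIATIONS.get(w, " ".join(w)) for w in words.split()]


if __name__ == "__main__":
    words = normalize("The man walked down 56th St.")
    print(words)
    print(" / ".join(to_phonemes(words)))
    print(normalize("St. John St."))
```

The "St." heuristic is deliberately naive: it would misfire when "St." ends one sentence and the next begins with a capital letter, which is exactly the kind of ambiguity that makes real text analysis hard. The letter-by-letter fallback is likewise only a placeholder for the pronunciation-guessing algorithm a real engine would use, and prosody is left out entirely.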