Abstract:
The present invention discloses a system and a method for creating a reduced script, which is read by a voice talent to create a concatenative text-to-speech (TTS) voice. The method can automatically process pre-recorded audio to derive speech assets for a concatenative TTS voice. The pre-recorded audio can include sets of recorded phrases used by a speech user interface (SUI). A set of unfulfilled speech assets needed for full phonetic coverage of the concatenative TTS voice can be determined. A reduced script can be constructed that includes a set of phrases which, when read by a voice talent, result in a reduced corpus. When the reduced corpus is automatically processed, a reduced set of speech assets results. The reduced set includes each of the unfulfilled speech assets. When this reduced corpus is combined with the existing speech assets, the result is a voice with a complete set of speech assets.
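The coverage step can be pictured as a set-cover problem. The sketch below is illustrative only: it assumes speech assets are diphones (adjacent phone pairs), that both the pre-recorded SUI phrases and the candidate script phrases are already available as phone sequences, and it uses a plain greedy set-cover heuristic. The function names are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]  # assumption: a speech asset is modeled as a phone pair


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs in a phone sequence."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def unfulfilled_assets(required: Set[Diphone], recorded: List[List[str]]) -> Set[Diphone]:
    """Required diphones that the pre-recorded phrases do not already cover."""
    covered: Set[Diphone] = set()
    for phones in recorded:
        covered |= diphones(phones)
    return required - covered


def build_reduced_script(missing: Set[Diphone], candidates: Dict[str, List[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the phrase covering the most still-missing diphones."""
    remaining = set(missing)
    script: List[str] = []
    while remaining:
        best_text, best_gain = None, set()
        for text, phones in candidates.items():
            gain = diphones(phones) & remaining
            if len(gain) > len(best_gain):
                best_text, best_gain = text, gain
        if best_text is None:
            break  # no candidate covers what is left; script stays partial
        script.append(best_text)
        remaining -= best_gain
    return script


required = {("h", "e"), ("e", "l"), ("l", "o"), ("o", "w"), ("w", "a")}
missing = unfulfilled_assets(required, [["h", "e", "l"]])
script = build_reduced_script(
    missing, {"low": ["l", "o", "w"], "wa": ["w", "a"], "lowa": ["l", "o", "w", "a"]}
)
```

Here the pre-recorded audio already covers two of the five diphones. The single phrase "lowa" covers the other three, so it becomes the entire reduced script. Greedy selection does not guarantee the smallest possible script, but it is a common, simple approximation.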
Abstract:
Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.
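The abstract does not say what a "noise model" contains. The following minimal sketch assumes the simplest possible model: the mean background-noise power in dB, computed from samples normalized to full scale 1.0. `NoiseModel` and `SpeechEngine` are hypothetical stand-ins, not a real speech-engine API.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NoiseModel:
    mean_power_db: float  # assumed model: average noise power in dB re full scale


def noise_model_from_samples(samples: List[float]) -> NoiseModel:
    """Build a noise model from background noise sampled through the microphone."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_power = sum(s * s for s in samples) / len(samples)
    return NoiseModel(10 * math.log10(max(mean_power, 1e-12)))  # floor avoids log(0)


def build_models(recordings: Dict[str, List[float]]) -> Dict[str, NoiseModel]:
    """Generate one noise model per operating environment."""
    return {env: noise_model_from_samples(s) for env, s in recordings.items()}


class SpeechEngine:
    """Hypothetical engine wrapper; a real engine would use the model for noise compensation."""

    def __init__(self) -> None:
        self.noise_model: Optional[NoiseModel] = None

    def configure(self, model: NoiseModel) -> None:
        self.noise_model = model


models = build_models({"car": [0.1, -0.1, 0.1, -0.1], "office": [0.01, -0.01]})
engine = SpeechEngine()
engine.configure(models["car"])  # the device is currently operating in the car
```

Selecting the model by the current environment keeps noise sampling out of the recognition path. Sampling can happen ahead of time, and switching environments becomes a simple lookup.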
Abstract:
A method for enhancing a media file to enable speech recognition of spoken navigation commands is provided. The method can include receiving a plurality of textual items based on the subject matter of the media file and generating a grammar for each textual item, thereby producing a plurality of grammars for use by a speech recognition engine. The method can further include associating a time stamp with each grammar, where each time stamp indicates the location in the media file of the textual item corresponding to that grammar. The method can further include associating the plurality of grammars with the media file, such that speech recognized by the speech recognition engine is associated with the corresponding location in the media file.
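The grammar-to-time-stamp association can be illustrated with a deliberately simplified sketch. In this version a "grammar" is just a normalized phrase that must match the recognized text exactly, with an optional "go to" prefix. A real system would compile proper speech-recognition grammars instead. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimedGrammar:
    phrase: str         # simplified grammar: the normalized phrase it accepts
    timestamp_s: float  # location in the media file of the matching textual item


def normalize(text: str) -> str:
    return " ".join(text.lower().split())


def build_grammars(items: List[Tuple[str, float]]) -> List[TimedGrammar]:
    """One grammar per textual item, each tagged with its time stamp."""
    return [TimedGrammar(normalize(text), ts) for text, ts in items]


def seek_position(utterance: str, grammars: List[TimedGrammar]) -> Optional[float]:
    """Map recognized speech to a media location, or None if no grammar matches."""
    spoken = normalize(utterance)
    if spoken.startswith("go to "):  # assumed optional command prefix
        spoken = spoken[len("go to "):]
    for g in grammars:
        if g.phrase == spoken:
            return g.timestamp_s
    return None


grammars = build_grammars([("Introduction", 0.0), ("Gradient Descent", 312.5)])
position = seek_position("Go to gradient descent", grammars)
```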
Abstract:
The present disclosure relates to prompting for a spoken response that provides input for multiple elements. A single spoken utterance containing content for multiple elements can be received, where each element is mapped to a data field. The utterance can be converted from speech to text to derive a value for each of the elements. An utterance-level confidence score can be determined, which may fall below an associated certainty threshold. Element-level confidence scores can then be ascertained for each of the derived elements. A first set of the elements can have element-level confidence scores above an associated certainty threshold, and a second set can have scores below it. Values can be stored in the data fields mapped to the first set, and a prompt for input for the second set can be played.
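The two-level confidence check can be sketched as follows. The sketch makes three assumptions: a score equal to a threshold counts as "above" it; the speech-to-text step has already produced `Element` values with confidences; and the re-prompt wording is a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    field: str         # data field the element is mapped to
    value: str         # value derived by speech-to-text
    confidence: float  # element-level confidence score


def handle_utterance(utterance_confidence: float, elements: List[Element],
                     utterance_threshold: float, element_threshold: float,
                     store: Dict[str, str]) -> List[str]:
    """Store confident values; return the fields that still need a prompt."""
    if utterance_confidence >= utterance_threshold:
        # Whole utterance is trusted: accept every element at once.
        for e in elements:
            store[e.field] = e.value
        return []
    missing: List[str] = []
    for e in elements:
        if e.confidence >= element_threshold:  # first set: keep the value
            store[e.field] = e.value
        else:                                   # second set: prompt again
            missing.append(e.field)
    return missing


def reprompt(fields: List[str]) -> str:
    return "Please repeat your " + " and ".join(fields) + "."


store: Dict[str, str] = {}
missing = handle_utterance(
    0.6, [Element("city", "Boston", 0.9), Element("date", "May 3rd", 0.4)], 0.7, 0.7, store
)
```

Only the low-confidence "date" element is re-prompted. The user does not have to repeat the city.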
Abstract:
The present invention discloses a text-to-speech system that provides output variability. The system can include a finite state grammar, a variability engine, and a text-to-speech engine. The finite state grammar can contain a phrase rule consisting of one or more phrase elements. The phrase rule can deterministically generate a variable text phrase based upon at least one random number. The phrase rule can include a definition for each of the phrase elements, and each definition can be associated with at least one defined text string. The variability engine can construct a random text phrase in response to receiving an action command, using the finite state grammar to create the text phrase. The variability engine can also rely on user-specified weights to adjust the output probabilities. The text-to-speech engine can convert the text phrase generated by the variability engine into speech output.
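Below is a minimal sketch of the variability engine. It assumes a phrase rule can be flattened into an ordered list of phrase elements, each with weighted alternative strings, and that "deterministic given a random number" means seeding a private random generator. The `RULES` table and the "greet" action are made up for illustration, and the final text-to-speech conversion is not shown.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical phrase-rule format: ordered phrase elements, each a list of
# (text, weight) alternatives; an empty string makes an element optional.
PhraseRule = List[List[Tuple[str, float]]]

RULES: Dict[str, PhraseRule] = {
    "greet": [
        [("Hello", 3.0), ("Hi", 1.0)],
        [("", 1.0), ("there", 1.0)],
        [("how can I help?", 1.0)],
    ],
}


def generate_phrase(rule: PhraseRule, seed: int) -> str:
    """Expand a phrase rule; the same seed always yields the same text."""
    rng = random.Random(seed)
    parts: List[str] = []
    for alternatives in rule:
        texts = [text for text, _ in alternatives]
        weights = [weight for _, weight in alternatives]  # user-specified weights
        parts.append(rng.choices(texts, weights=weights, k=1)[0])
    return " ".join(p for p in parts if p)


def respond(action: str, seed: int) -> str:
    """Look up the phrase rule for an action command and build a varied phrase."""
    return generate_phrase(RULES[action], seed)
```

With these weights, "Hello" is chosen about three times as often as "Hi". A weight of zero removes an alternative entirely.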
Abstract:
A method for delivering a message to a recipient in an environment with ambient noise includes the steps of: recording the ambient noise in the environment at a certain time interval; analyzing the recorded ambient noise to obtain its average power P_noise or RMS amplitude A_noise; providing a predetermined desired signal-to-noise ratio SNR_desired; calculating the average signal power P_signal or RMS amplitude A_signal of the message to be delivered based on P_noise or A_noise and SNR_desired; and adjusting the volume of the message to be delivered according to P_signal or A_signal. Alternatively, the actual signal-to-noise ratio SNR_actual is computed, and the message is repeated if SNR_actual falls below a minimum SNR_min. Systems for delivering a message to a recipient in an environment with ambient noise, and computer-readable media having computer-executable instructions for carrying out the methods, are also provided.
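The volume calculation follows from the standard decibel definition of SNR: SNR_dB = 10·log10(P_signal / P_noise) = 20·log10(A_signal / A_noise). Solving for the message level gives P_signal = P_noise·10^(SNR_desired/10), or equivalently A_signal = A_noise·10^(SNR_desired/20). The sketch below works with RMS amplitudes. It assumes that "adjusting the volume" means applying a linear gain to the message samples, and all function names are illustrative.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def target_rms(noise_rms: float, snr_desired_db: float) -> float:
    """A_signal = A_noise * 10^(SNR_desired / 20)."""
    return noise_rms * 10 ** (snr_desired_db / 20)


def message_gain(message: Sequence[float], noise: Sequence[float], snr_desired_db: float) -> float:
    """Linear gain that brings the message RMS up (or down) to the target level."""
    return target_rms(rms(noise), snr_desired_db) / rms(message)


def actual_snr_db(message_rms: float, noise_rms: float) -> float:
    return 20 * math.log10(message_rms / noise_rms)


def should_repeat(message_rms: float, noise_rms: float, snr_min_db: float) -> bool:
    """Repeat the message when SNR_actual falls below SNR_min."""
    return actual_snr_db(message_rms, noise_rms) < snr_min_db


noise = [0.01, -0.01] * 4   # RMS 0.01
message = [0.1, -0.1]       # RMS 0.1, i.e. currently 20 dB above the noise
gain = message_gain(message, noise, 26.0)
```

Raising the SNR from 20 dB to 26 dB requires an amplitude gain of 10^(6/20), roughly 2.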
Abstract:
A method for searching Web pages begins with identifying query criteria entered into a search provider. A set of Web pages that satisfies the query criteria is determined. Then a page ranking is ascertained for each Web page in the set, and the Web pages are presented in order of page ranking. The page ranking is based upon at least one relevancy factor, which includes a browsing-time factor. The browsing-time factor can be calculated from browsing behavior exhibited by users who provided similar query criteria. The set of users from which the browsing-time factor is calculated can include the current user, a set of users sharing characteristics with the current user, and/or a general set of users. Browsing behavior can include time spent at a Web page, where the browsed Web page is one that was previously presented as a search result for the similar query criteria.
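The following sketch shows one possible way to fold a browsing-time factor into the ranking. Several pieces are assumptions rather than details from the abstract:

* "Similar query criteria" is approximated by word-set Jaccard overlap of at least 0.5.
* The factor is the mean dwell time per URL, scaled by the largest mean.
* The factor is blended linearly with a pre-computed base relevancy score.

The `visits` list can be pre-filtered to the current user, to similar users, or to all users.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Visit:
    query: str      # query that produced the search result
    url: str        # result page the user browsed
    seconds: float  # time spent on that page


def similar(q1: str, q2: str, threshold: float = 0.5) -> bool:
    """Assumed similarity test: Jaccard overlap of lower-cased word sets."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return bool(a | b) and len(a & b) / len(a | b) >= threshold


def browsing_time_factor(query: str, visits: List[Visit]) -> Dict[str, float]:
    """Mean dwell time per URL over similar queries, scaled to [0, 1]."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for v in visits:
        if similar(query, v.query):
            totals[v.url] += v.seconds
            counts[v.url] += 1
    means = {u: totals[u] / counts[u] for u in totals}
    top = max(means.values(), default=0.0)
    return {u: m / top for u, m in means.items()} if top > 0 else {}


def rank(query: str, base_scores: Dict[str, float], visits: List[Visit],
         weight: float = 0.5) -> List[str]:
    """Order result URLs by a blend of base relevancy and browsing time."""
    factor = browsing_time_factor(query, visits)
    score = {u: (1 - weight) * s + weight * factor.get(u, 0.0) for u, s in base_scores.items()}
    return sorted(score, key=lambda u: score[u], reverse=True)


visits = [
    Visit("jazz history", "a.example", 5),
    Visit("jazz history", "b.example", 120),
    Visit("cooking", "a.example", 300),  # dissimilar query, ignored
]
ranking = rank("history of jazz", {"a.example": 0.6, "b.example": 0.5}, visits)
```

Page b ranks higher despite its lower base score, because users with similar queries stayed there much longer.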