Abstract:
A spoken language system (100) includes a recognition component (120) that generates (220) a recognized sequence of words from a sequence of received spoken words, and assigns (225) a confidence score to each word in the recognized sequence of words. A presentation component (140) of the spoken language system adjusts (240) nominal acoustical properties of words in a presentation (142) of the recognized sequence of words, the adjustment performed according to the confidence score of each word. The adjustments include adjustments to acoustical features and acoustical contexts of words and groups of words in the presented sequence of words. The presentation component presents (245) the adjusted sequence of words.
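As an illustration only, the confidence-driven adjustment can be sketched as a mapping from per-word confidence scores to presentation parameters. The threshold, the parameter names (`rate`, `pause_before_ms`), and their values are assumptions made for this sketch; the abstract does not specify which acoustical properties are changed or by how much.

```python
def adjust_presentation(recognized, low_confidence=0.6):
    """Build a presentation plan from (word, confidence) pairs.

    Hypothetical policy: words below the threshold are spoken more slowly
    (an acoustical feature) and preceded by a short pause (an acoustical
    context), so the listener can notice uncertain words.
    """
    plan = []
    for word, confidence in recognized:
        uncertain = confidence < low_confidence
        plan.append({
            "word": word,
            "rate": 0.8 if uncertain else 1.0,           # relative speaking rate
            "pause_before_ms": 150 if uncertain else 0,  # silence inserted before the word
        })
    return plan


# Example recognizer output with made-up confidence scores.
recognized = [("call", 0.95), ("Jon", 0.40), ("now", 0.90)]
plan = adjust_presentation(recognized)
```

A speech synthesizer would then consume `plan` to render the adjusted sequence; that step is outside this sketch.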
Abstract:
A method and apparatus for textual searching of a database is provided herein. During operation, a user inputs a letter into a search engine. The search engine scores words based on the letter and displays the highest-scored words as results. When another letter is received, the process is repeated. In situations where titles are returned to the user, additional steps of associating the words with a title and scoring the title take place. The highest-scored titles are provided to the user as the displayed results.
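The letter-by-letter loop can be sketched as follows. The scoring rule (fraction of a word covered by the typed prefix), the rule that a title takes the score of its best-matching word, and the sample titles are all assumptions for illustration; the abstract does not define how words or titles are scored.

```python
def score_words(prefix, words):
    """Score words that start with the typed letters (hypothetical rule)."""
    prefix = prefix.lower()
    scores = {}
    for word in words:
        if word.lower().startswith(prefix):
            # The more of the word the prefix covers, the higher the score.
            scores[word] = len(prefix) / len(word)
    return scores


def score_titles(prefix, titles):
    """Associate words with their title and score the title by its best word."""
    title_scores = {}
    for title in titles:
        word_scores = score_words(prefix, title.split())
        if word_scores:
            title_scores[title] = max(word_scores.values())
    return title_scores


def top_results(scores, k=3):
    """Return the k highest-scored items, breaking ties alphabetically."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


TITLES = ["Blue Moon", "Moonlight Sonata", "Monday Morning", "Blue Monday"]

typed = ""
history = []
for letter in "moo":  # each received letter triggers a new scoring pass
    typed += letter
    history.append(top_results(score_titles(typed, TITLES)))
```

After the third letter, only titles containing a word that begins with "moo" remain in the displayed results.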
Abstract:
A method and apparatus for ordering results from a query is provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example, a set of search strings may be created from the n-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings is sent to at least one search engine, and search results are obtained. The search results are then reordered based on a semantic similarity between the search results and the word lattice.
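A minimal sketch of the two stages, under stated simplifications: the word lattice is reduced to a single best path of (word, confidence) pairs, an n-gram's confidence is taken as the product of its word confidences, and "semantic similarity" is approximated by word overlap (Jaccard similarity). None of these choices comes from the abstract, and the sample results are invented.

```python
def ngram_search_strings(hypothesis, max_n=2, keep=4):
    """Create unigram/bigram search strings, ordered and truncated by confidence."""
    candidates = []
    for n in range(1, max_n + 1):
        for i in range(len(hypothesis) - n + 1):
            window = hypothesis[i:i + n]
            text = " ".join(word for word, _ in window)
            confidence = 1.0
            for _, word_confidence in window:
                confidence *= word_confidence  # assumed combination rule
            candidates.append((text, confidence))
    candidates.sort(key=lambda pair: -pair[1])  # highest confidence first
    return [text for text, _ in candidates[:keep]]


def rerank(results, hypothesis):
    """Reorder search results by word overlap with the recognized words."""
    query_words = {word.lower() for word, _ in hypothesis}

    def overlap(result):
        words = set(result.lower().split())
        return len(words & query_words) / len(words | query_words)

    return sorted(results, key=overlap, reverse=True)


hypothesis = [("pizza", 0.9), ("near", 0.5), ("chicago", 0.8)]
queries = ngram_search_strings(hypothesis)

# Stand-in for results returned by a search engine.
results = ["Chicago weather", "Best pizza near Chicago", "Pizza recipes"]
ordered = rerank(results, hypothesis)
```

A real lattice holds competing word hypotheses with their own scores, so an implementation would enumerate n-grams over lattice arcs rather than over one path.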
Abstract:
A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one item of the content with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search utilizing uniterms of the audio tag, scored against the phoneme latent lattice model generated by the voice query, to identify matching terms within the audio tags and the corresponding stored content. The retrieved content associated with the identified audio tags having uniterms that score within the phoneme lattice model is output in an order corresponding to the order in which the uniterms are structured within the voice query.
Abstract:
A method and apparatus for enabling multimodal tags in a communication device is disclosed. The method comprises receiving a first training signal and receiving a second training signal in conjunction with the first training signal. A multimodal tag is created to represent a combination of the first training signal and the second training signal and a function is associated with the created multimodal tag.
Abstract:
A portable electronic communication device, designed for voice and data communication, is utilized as a peripheral input device for transmitting character inputs, entered on the first device's touch input mechanism, to a second electronic device. The first device has a mode switching utility that switches the first device between a first standard communication mode and a second peripheral input device mode. When the first device is in the second peripheral input device mode, the first device operates as a peripheral input device for the second device. A character input recognition utility executes on the first device to provide the functions of: detecting an input on the touch screen input mechanism; generating an electronic representation of the input; establishing a communication link between the second communication transmitter and an identified second device; and forwarding the electronic representation of the character input to the communication transmitter for transmission to the identified second device.
Abstract:
A voice toolkit (100) and a method (700) for managing pronunciation dictionaries are provided. The toolkit can include a user-interface (110) for entering a text and a corresponding spoken utterance, a text-to-speech system (120) for synthesizing a pronunciation from the text, a talking speech recognizer (132) for generating pronunciations of the spoken utterance, and a voice processor (130) for validating at least one pronunciation. A developer can type the text of a word into the toolkit and listen to the pronunciation to determine whether the pronunciation is acceptable. If the pronunciation is incorrect, the developer can speak the word to provide a spoken utterance having a correct pronunciation.
Abstract:
A communication device includes: (1) a wireless adapter at which a wireless headset is communicatively connected to the communication device and at which is received a first acoustic input that includes a speech input and a first ambient noise input; (2) a microphone that receives a second acoustic input, which includes a second ambient noise input; and (3) a dual-channel adaptive noise canceller that utilizes the second ambient noise input to filter the first ambient noise input out of the first acoustic input to generate an acoustic output that primarily comprises the speech input.
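One common way to build a dual-channel adaptive noise canceller is a normalized least-mean-squares (NLMS) filter, sketched below. The abstract does not name an adaptation algorithm, so NLMS, the filter length, the step size, and the simulated signals are all assumptions. The headset channel plays the role of the first acoustic input (speech plus ambient noise) and the device microphone supplies the noise reference.

```python
import math
import random


def nlms_cancel(primary, reference, taps=4, mu=0.1, eps=1e-8):
    """Subtract an adaptive estimate of the noise in `primary` using `reference`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # error signal doubles as the cleaned output
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        output.append(e)
    return output


# Simulated signals (illustrative only).
rng = random.Random(0)
n = 2000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
ambient = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # device microphone
headset = [
    speech[i] + 0.8 * ambient[i] + (0.3 * ambient[i - 1] if i > 0 else 0.0)
    for i in range(n)
]  # headset picks up speech plus a filtered copy of the ambient noise

cleaned = nlms_cancel(headset, ambient)


def mean_square_error(signal, target, start):
    """Mean squared difference between two signals from index `start` onward."""
    diffs = [(a - b) ** 2 for a, b in zip(signal[start:], target[start:])]
    return sum(diffs) / len(diffs)


noisy_error = mean_square_error(headset, speech, n // 2)
cleaned_error = mean_square_error(cleaned, speech, n // 2)
```

Because the speech is uncorrelated with the reference noise, the filter converges toward the noise path and the error signal approximates the speech. In practice the device microphone may also pick up some speech, which this sketch ignores.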
Abstract:
A method, apparatus, and electronic device for voice navigation are disclosed. A voice input mechanism (310) may receive a verbal input from a user to a voice user interface program invisible to the user. A processor (104) may identify in a graphical user interface (GUI) a set of GUI items. The processor (104) may convert the set of GUI items to a set of voice searchable indices (400). The processor (104) may correlate a matching GUI item of the set of GUI items to a phonemic representation of the verbal input.
Abstract:
A method and apparatus for speaker independent real-time affect detection includes generating (205) a sequence of audio frames from a segment of speech, generating (210) a sequence of feature sets by generating a feature set for each frame, and applying (215) the sequence of feature sets to a sequential classifier to determine a most likely affect expressed in the segment of speech.
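The three steps (framing, per-frame feature sets, classification over the sequence) can be sketched as below. The features (log energy and zero-crossing rate), the affect labels, and the Gaussian parameters are hypothetical. The classifier here simply accumulates per-frame Gaussian log-likelihoods over the sequence, which is a single-state stand-in for a sequential model such as a hidden Markov model, not the classifier the abstract describes.

```python
import math


def make_frames(samples, size=160, hop=80):
    """Split a speech segment into overlapping frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


def feature_set(frame):
    """Compute (log energy, zero-crossing rate) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return (math.log(energy + 1e-10), crossings / (len(frame) - 1))


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def classify(feature_sequence, models):
    """Return the affect whose model gives the highest total log-likelihood."""
    totals = {}
    for affect, params in models.items():
        totals[affect] = sum(
            log_gaussian(x, mean, var)
            for feats in feature_sequence
            for x, (mean, var) in zip(feats, params)
        )
    return max(totals, key=totals.get)


# Hypothetical per-affect models: (mean, variance) for each feature.
MODELS = {
    "calm": [(-4.0, 1.0), (0.05, 0.01)],
    "excited": [(-1.0, 1.0), (0.30, 0.01)],
}

# Synthetic stand-ins for speech: loud and high-pitched vs. quiet and low-pitched.
loud = [0.8 * math.sin(2 * math.pi * 0.2 * i) for i in range(800)]
quiet = [0.1 * math.sin(2 * math.pi * 0.02 * i) for i in range(800)]

loud_label = classify([feature_set(f) for f in make_frames(loud)], MODELS)
quiet_label = classify([feature_set(f) for f in make_frames(quiet)], MODELS)
```

Replacing `classify` with a model that has state transitions would let the decision depend on how the features evolve over time rather than only on their per-frame values.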