摘要:
A method and system of providing media content is disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted rate audio content are also disclosed.
摘要:
A method of providing modified media content is disclosed that includes providing media content to a destination device via a network, where the media content comprises video data and audio data have a first viewing rate. The method further includes receiving data indicating a selection of a second viewing rate via the network and modifying the media content to produce modified media content having approximately the second viewing rate. The modified media content includes modified video data and modified audio data synchronized at approximately the second viewing rate.
摘要:
A method and system of providing media content is disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted rate audio content are also disclosed.
摘要:
A method of providing modified media content is disclosed that includes providing media content to a destination device via a network, where the media content comprises video data and audio data have a first viewing rate. The method further includes receiving data indicating a selection of a second viewing rate via the network and modifying the media content to produce modified media content having approximately the second viewing rate. The modified media content includes modified video data and modified audio data synchronized at approximately the second viewing rate.
摘要:
Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score and the results of the an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.
摘要:
Disclosed are systems and methods for recognizing speech in a spoken dialogue system. The method includes (1) receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant, (2) generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result; (3) distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech, (4) calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post vocalic consonant in the input speech and the first score, (5) determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score, and (6) refining the results of the an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.
摘要:
Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition systems (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
摘要:
Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition systems (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
摘要:
Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
摘要:
Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.