Abstract:
An example method is provided and includes receiving a media file that includes video data and audio data; determining an initial scene sequence in the media file; determining an initial speaker sequence in the media file; and updating a selected one of the initial scene sequence and the initial speaker sequence in order to generate an updated scene sequence and an updated speaker sequence, respectively. The initial scene sequence is updated based on the initial speaker sequence, and the initial speaker sequence is updated based on the initial scene sequence.
Abstract:
A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
Abstract:
In one embodiment, an audio stream is partitioned into a plurality of segments such that the plurality of segments are clustered into one or more clusters, each of the one or more clusters identifying a subset of the plurality of segments in the audio stream and corresponding to one of a first set of one or more speaker models, each speaker model in the first set of speaker models representing one of a first set of hypothetical speakers. The speaker models in the first set of speaker models are compared with a second set of one or more speaker models, where each speaker model in the second set of speaker models represents one of a second set of hypothetical speakers. Labels associated with one or more speaker models in the second set of speaker models are propagated to one or more speaker models in the first set of speaker models according to a result of the comparing step.
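The label-propagation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how a speaker model is represented or how two models are compared, so the sketch treats each model as a plain feature vector, compares models by Euclidean distance, and uses a hypothetical `max_distance` threshold to decide whether a label is propagated. The function and variable names are likewise invented for the example.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def propagate_labels(first_models, second_models, max_distance):
    """Copy labels from labeled models to the closest unlabeled models.

    first_models:  dict mapping cluster id -> feature vector (assumed form
                   of a model for a hypothetical speaker in the stream).
    second_models: dict mapping label -> feature vector (assumed form of a
                   previously labeled speaker model).
    max_distance:  hypothetical threshold; no label is propagated when the
                   nearest labeled model is farther away than this.
    Returns a dict mapping cluster id -> label, or None when nothing matched.
    """
    result = {}
    for cluster_id, vec in first_models.items():
        best_label, best_dist = None, float("inf")
        for label, ref in second_models.items():
            dist = euclidean(vec, ref)
            if dist < best_dist:
                best_label, best_dist = label, dist
        result[cluster_id] = best_label if best_dist <= max_distance else None
    return result


# Toy data: three clusters from the stream, two labeled reference models.
first = {"cluster_0": [0.1, 0.9], "cluster_1": [0.8, 0.2], "cluster_2": [5.0, 5.0]}
second = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
labels = propagate_labels(first, second, max_distance=0.5)
```

In this toy run, `cluster_0` and `cluster_1` pick up the labels of their nearest reference models, while `cluster_2` stays unlabeled because no reference model lies within the threshold.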
Abstract:
Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
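As a rough illustration of limiting backlight changes to scene changes, the sketch below holds one backlight level for every frame of a scene. The abstract does not state how the per-scene level is chosen; taking the maximum per-frame requirement within the scene is an assumption made here, and `required_levels`, `scene_starts`, and the function name are hypothetical.

```python
def backlight_per_scene(required_levels, scene_starts):
    """Return a per-frame backlight schedule that changes only at scene starts.

    required_levels: per-frame backlight level (0.0 to 1.0) that each frame
                     would need on its own (assumed to be precomputed).
    scene_starts:    strictly increasing frame indices at which a new scene
                     begins; the first entry must be 0.
    """
    if not required_levels:
        return []
    if not scene_starts or scene_starts[0] != 0:
        raise ValueError("scene_starts must begin with frame 0")
    boundaries = list(scene_starts) + [len(required_levels)]
    if any(end <= start for start, end in zip(boundaries, boundaries[1:])):
        raise ValueError("scene_starts must be strictly increasing and in range")

    schedule = []
    for start, end in zip(boundaries, boundaries[1:]):
        # Assumption: use the brightest requirement in the scene so that
        # no frame in the scene is shown darker than it needs.
        level = max(required_levels[start:end])
        schedule.extend([level] * (end - start))
    return schedule


# Seven frames, with scene changes detected at frames 3 and 6.
frames = [0.4, 0.5, 0.45, 0.9, 0.8, 0.85, 0.3]
schedule = backlight_per_scene(frames, [0, 3, 6])
```

Here the schedule stays at 0.5 for the first scene, 0.9 for the second, and 0.3 for the last, so the brightness never shifts in the middle of a scene. The `scene_starts` list could equally come from scene-change information downloaded with the video or computed on the client.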
Abstract:
Disclosed is a method for drastically reducing the average error rate for signals under mismatched conditions. The method takes a signal (e.g., speech signal) and a set of stored representations (e.g., stored representations of keywords) and performs at least one transformation that results in the signal more closely emulating the stored representations. This is accomplished by using one of three techniques. First, one may transform the signal so that the signal may be better approximated by (e.g., is closer to) one of the stored representations. Second, one may transform the set of stored representations so that one of the stored representations better approximates the signal. Third, one may transform both the signal and the set of stored representations.
Abstract:
A computerized pronunciation system is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The system includes a word list including at least one word; transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform; a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including: sets of initial pronunciations of the word, a scoring module configured to score pronunciations and to generate phone probabilities, and a set of alternate pronunciations of the word, wherein the set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
Abstract:
In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of the device's battery power. To display a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with the transform determined from pixel luminance statistics. Aside from, or in addition to, being used for such minimizing, a transform can also be used for image enhancement, so that a displayed image better meets a visual perception quality. In either case, the transform is preferably constrained to enforce one or several display attributes. In a network setting, the technique can be implemented in a distributed fashion, so that subtasks of the technique are performed by different, interconnected processors such as server, client, and proxy processors.
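One way to picture a transform determined from pixel luminance statistics is the sketch below. It is not taken from the abstract: it assumes a simplified model in which perceived brightness is roughly the backlight level times the pixel value, picks the backlight from a hypothetical luminance percentile, and applies a linear gain with clipping to compensate. The function name and the `percentile` parameter are invented for illustration.

```python
def dim_backlight(luma, percentile=0.95):
    """Choose a reduced backlight level and compensate the pixel values.

    luma:       list of 8-bit pixel luminance values (0 to 255).
    percentile: hypothetical statistic; pixels brighter than this
                percentile are allowed to clip at 255.
    Returns (backlight, transformed), where backlight is in (0.0, 1.0].
    """
    if not luma:
        raise ValueError("luma must not be empty")
    ordered = sorted(luma)
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    peak = max(1, ordered[idx])          # avoid division by zero
    backlight = peak / 255.0             # dimmer backlight ...
    gain = 255.0 / peak                  # ... offset by brighter pixels
    # Constraint assumed for this sketch: values never exceed the 8-bit range.
    transformed = [min(255, round(v * gain)) for v in luma]
    return backlight, transformed


# Mostly dark image with one bright outlier.
pixels = [10, 20, 30, 40, 50, 60, 70, 80, 90, 250]
backlight, transformed = dim_backlight(pixels, percentile=0.8)
```

With these toy values the backlight drops to about 35% of full brightness, the darker pixels are scaled up to compensate, and only the single outlier is clipped. A real transform would likely bound how much clipping or contrast loss is tolerated, which is what the constraint on display attributes suggests.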
Abstract:
In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of the device's battery power. To display a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with the transform determined from pixel luminance statistics. Aside from, or in addition to, such minimizing, a transform can also be used for image enhancement, so that a displayed image better meets a visual perception quality. In either case, the transform is preferably constrained to enforce one or several display attributes.