Abstract:
The invention provides a method and apparatus for automatically generating a summary or key phrase for a song. The song, or a portion thereof, is digitized and converted into a sequence of feature vectors, such as mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed to decipher the song's structure. Sections that correspond to different structural elements are marked with corresponding labels. Once the song is labeled, various heuristics are applied to select a key phrase that serves as the song's summary. For example, the system may identify the label that appears most frequently within the song, and then select the longest duration of that label as the summary.
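The example heuristic at the end of this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes the song has already been reduced to one structural label per fixed-length frame, that "most frequent" is counted per frame, and that ties go to the label seen first. The function name `select_key_phrase` and the parameter `frame_seconds` are hypothetical.

```python
from collections import Counter
from itertools import groupby


def select_key_phrase(labels, frame_seconds=1.0):
    """Return (label, start_sec, end_sec) for the longest run of the most frequent label.

    `labels` holds one structural label per frame (an assumption of this sketch).
    """
    if not labels:
        raise ValueError("labels must be non-empty")

    # Most frequent label over all frames; ties resolve to the first one encountered.
    top_label, _ = Counter(labels).most_common(1)[0]

    best_start, best_length = 0, 0
    position = 0
    # groupby yields consecutive runs of identical labels.
    for label, run in groupby(labels):
        length = len(list(run))
        if label == top_label and length > best_length:
            best_start, best_length = position, length
        position += length

    start_sec = best_start * frame_seconds
    end_sec = (best_start + best_length) * frame_seconds
    return top_label, start_sec, end_sec
```

For a label sequence such as `list("AABBBBAACBBB")` with half-second frames, the label `B` occurs most often and its longest run covers frames 2 through 5, so the sketch returns the span from 1.0 s to 3.0 s.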
Abstract:
A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
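One way to picture the reduction step is a lookup against a table of joint probabilities. The sketch below is an assumption-laden illustration rather than the method claimed: it supposes a precomputed table `joint_prob[i][j]` giving the joint probability that Gaussian `i` of the first stream and Gaussian `j` of the second stream are active together, and it keeps only the second-stream Gaussians whose joint probability with some active first-stream Gaussian reaches a cutoff. The names `cooccurring_gaussians`, `joint_prob`, and `threshold` are hypothetical.

```python
def cooccurring_gaussians(active_first, joint_prob, threshold=0.1):
    """Return the sorted second-stream Gaussian indices worth evaluating.

    active_first: indices of Gaussians already found active in the first stream.
    joint_prob:   table where joint_prob[i][j] is the (assumed precomputed)
                  joint probability of first-stream Gaussian i and
                  second-stream Gaussian j being active together.
    threshold:    hypothetical cutoff below which a pair is ignored.
    """
    selected = set()
    for i in active_first:
        for j, p in enumerate(joint_prob[i]):
            if p >= threshold:
                selected.add(j)
    return sorted(selected)


# Toy table: 3 first-stream Gaussians by 3 second-stream Gaussians.
table = [
    [0.50, 0.05, 0.00],
    [0.00, 0.20, 0.30],
    [0.90, 0.00, 0.00],
]
shortlist = cooccurring_gaussians([1], table)
```

Here only second-stream Gaussians 1 and 2 would be evaluated when first-stream Gaussian 1 is active, which is the sense in which the number of Gaussians computed is reduced.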
Abstract:
A method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.