摘要:
The present disclosure relates to a method, apparatus, and system for speaker verification. The method includes: acquiring an audio recording; extracting speech signals from the audio recording; extracting features of the extracted speech signals; and determining whether the extracted speech signals represent speech by a predetermined speaker based on the extracted features and a speaker model trained with reference voice data of the predetermined speaker.
摘要:
In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
摘要:
Disclosed herein are methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models, wherein the first pass-blind diarization is on a per-frame basis and the second pass-blind diarization is on a per-word basis, and methods of creating acoustic signatures for a common speaker based only on the statistical models of the speakers in each audio session.
摘要:
In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
摘要:
A content distribution facilitation system is described comprising configured servers and a network interface configured to interface with a plurality of terminals in a client server relationship and optionally with a cloud-based storage system. A request from a first source for content comprising content criteria is received, the content criteria comprising content subject matter. At least a portion of the content request content criteria is transmitted to a selected content contributor. If recorded content is received from the first content contributor, the first source is provided with access to the received recorded content. The recorded content may be transmitted via one or more networks to one or more destination devices. Optionally, a voice analysis and/or facial recognition engine are utilized to determine if the recorded content is from the first content contributor.
摘要:
A method of determining an emotion of an utterance. The method can include receiving the utterance at a processor-based device comprising an audio engine. The method also can include extracting emotion-related acoustic features from the utterance. The method additionally can include comparing the emotion-related acoustic features to a plurality of emotion models that are representative of emotions. The method further can include selecting a model from the plurality of emotion models based on the comparing the emotion-related acoustic features to the plurality of emotion models. The method additionally can include outputting the emotion of the utterance, wherein the emotion corresponds to the selected model. Other embodiments are provided.
摘要:
In some embodiments, a method of creating an automatic language characteristic recognition system. The method can include receiving a plurality of audio recordings. The method also can include segmenting each of the plurality of audio recordings to create a plurality of audio segments for each audio recording. The method additionally can include clustering each audio segment of the plurality of audio segments according to audio characteristics of each audio segment to form a plurality of audio segment clusters. Other embodiments are provided.
摘要:
In one embodiment, the system and method for expressive language development; a method for detecting autism in a natural language environment using a microphone, sound recorder, and a computer programmed with software for the specialized purpose of processing recordings captured by the microphone and sound recorder combination; and the computer programmed to execute a method that includes segmenting an audio signal captured by the microphone and sound recorder combination using the computer programmed for the specialized purpose into a plurality recording segments. The method further includes determining which of the plurality of recording segments correspond to a key child. The method also includes extracting acoustic parameters of the key child recordings and comparing the acoustic parameters of the key child recordings to known acoustic parameters for children. The method returns a determination of a likelihood of autism.
摘要:
In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
摘要:
Provided is a method for context independent gender recognition utilizing phoneme transition probability. The method for the context independent gender recognition includes detecting a voice section from a received voice signal, generating feature vectors within the detected voice section, performing a hidden Markov model on the feature vectors by using a search network that is set according to a phoneme rule to recognize a phoneme and obtain scores of first and second likelihoods, and comparing final scores of the first and second likelihoods obtained while the phoneme recognition is performed up to the last section of the voice section to finally decide gender with respect to the voice signal.