摘要:
A system that incorporates teachings of the present disclosure may initiate a communication session with a member device of a social network and may activate a speech capture element based on a pattern of prior speech messages. A speech message may be detected at the speech capture element and, in turn, transmitted to the member device.
摘要:
The present disclosure provides a method for recognition. The method includes inputting a melody and obtaining pitch tracking information of the melody; obtaining beat information of the melody; determining a clarity value according to the pitch tracking information; implementing a first comparison process first to filter a first set of candidate songs from a database and then implementing a second comparison process to filter a second set of candidate songs from the first set of candidate songs if the clarity value is larger than a predetermined threshold; and determining at least one final candidate song from the second set of candidate songs.
摘要:
Methods, apparatuses, and media for filtering a data stream are provided. The data stream is partitioned into a plurality of data stream segments. An acoustic parameter of each of the data stream segments is measured, and it is determined whether the acoustic parameter of each of the data stream segments satisfies a predetermined condition. Extraneous segments of the data stream segments are identified in which the predetermined condition is satisfied, and it is determined whether the extraneous segments have a predetermined relationship in the data stream. The extraneous segments are deleted from the data stream to produce a filtered data stream in response to the extraneous segments having the predetermined relationship.
摘要:
A method and system for endpoint automatic detection of audio record is provided. The method comprises the following steps: acquiring a audio record text and affirming the text endpoint acoustic model for the audio record text; starting acquiring the audio record data of each frame in turn from the audio record start frame in the audio record data; affirming the characteristics acoustic model of the decoding optimal path for the acquired current frame of the audio record data; comparing the characteristics acoustic model of the decoding optimal path acquired from the current frame of the audio record data with the endpoint acoustic model to determine if they are the same; if yes, updating a mute duration threshold with a second time threshold, wherein the second time threshold is less than a first time threshold. This method can improve the recognizing efficiency of the audio record endpoint.
摘要:
A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.
摘要:
An interesting section identifying device for identifying an interesting section of a video file based on an audio signal included in the video file, the interesting section being a section in which a user is estimated to express interest, includes an interesting section candidate extracting unit that extracts an interesting section candidate from the video file, the interesting section candidate being a candidate for the interesting section, a detailed structure determining unit that determines whether the interesting section candidate includes a specific detailed structure, and an interesting section identifying unit that identifies the interesting section by analyzing a specific section when the detailed structure determining unit determines that the interesting section candidate includes the detailed structure, the specific section including the detailed structure and being shorter than the interesting section candidate.
摘要:
A system for determining the status of an answered telephone during the course of an outbound telephone call includes an automated telephone calling device for placing a telephone call to a location having a telephone number at which a target person is listed, upon the telephone call being answered, initiating a prerecorded greeting which asks for the target person and receiving a spoken response from an answering person and a speech recognition device for performing a speech recognition analysis on the spoken response to determine a status of the spoken response. If the speech recognition device determines that the answering person is the target person, the speech recognition device initiates a speech recognition application with the target person.
摘要:
An echo reduction method stores a received audio information stream. A sound detection flag is activated following detection of locally generated sound. Output based on the received audio information stream is muted in response to the activating the sound detection flag. Rendering status of the received audio information stream is saved, in response to the activating the sound detection flag, to reduce loss of audio information. At least a portion of the stored received audio information stream is rendered following inactivation of the sound detection flag.
摘要:
An audio analysis apparatus includes the following components. A main body includes a discrimination unit and a transmission unit. A strap is used for hanging the main body from a user's neck. A first audio acquisition device is provided to the strap or the main body. A second audio acquisition device is provided to the strap at a position where a distance between the second audio acquisition device and the user's mouth is smaller than the distance between the first audio acquisition device and the user's in a state where the strap is worn around the user's neck. The discrimination unit discriminates whether an acquired sound is an uttered voice of the user or of another person by comparing audio signals of the sound acquired by the first and second audio acquisition devices. The transmission unit transmits information including the discrimination result to an external apparatus.
摘要:
A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.