Abstract:
Provided are an apparatus and a method for recognizing continuous speech using search space restriction based on phoneme recognition. The search space is first reduced by using the phoneme recognition result to restrict the connection words to which a transition is allowed at a boundary between words. It is then reduced further by rapidly calculating, using a phoneme code, a degree of similarity between each candidate connection word and the phoneme recognition result, and allowing transitions only to connection words whose degree of similarity is equal to or higher than a predetermined reference value. This can improve the speed and performance of the speech recognition process in various speech recognition services.
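The second pruning step can be illustrated with a minimal sketch. The abstract does not define the phoneme code or the similarity measure, so everything here is an assumption made for illustration: the phoneme inventory, the bitmask encoding in `phoneme_code`, the Jaccard overlap in `similarity`, and the toy lexicon are all hypothetical.

```python
from typing import Dict, List

# Hypothetical phoneme inventory; a real recognizer would use its own set.
PHONEMES = ["a", "e", "i", "o", "u", "k", "s", "t", "n", "m"]
BIT = {p: 1 << i for i, p in enumerate(PHONEMES)}


def phoneme_code(phonemes: List[str]) -> int:
    """Pack the set of phonemes into an integer bitmask (assumed encoding)."""
    code = 0
    for p in phonemes:
        code |= BIT[p]
    return code


def similarity(code_a: int, code_b: int) -> float:
    """Jaccard overlap of two bitmasks; a stand-in for the unspecified measure."""
    union = code_a | code_b
    if union == 0:
        return 0.0
    return bin(code_a & code_b).count("1") / bin(union).count("1")


def restrict_connection_words(
    recognized: List[str], lexicon: Dict[str, List[str]], threshold: float
) -> List[str]:
    """Keep only connection words whose similarity reaches the reference value."""
    target = phoneme_code(recognized)
    return [
        word
        for word, phonemes in lexicon.items()
        if similarity(target, phoneme_code(phonemes)) >= threshold
    ]


# Toy lexicon of candidate connection words at a word boundary.
lexicon = {
    "taken": ["t", "a", "k", "e", "n"],
    "token": ["t", "o", "k", "e", "n"],
    "sum": ["s", "u", "m"],
}
survivors = restrict_connection_words(["t", "a", "k", "e", "n"], lexicon, 0.6)
print(survivors)
```

Comparing bitmasks costs a few integer operations per word, which is one plausible way a code-based comparison could be "rapid"; only the surviving words would then be passed to the full search.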
Abstract:
A method for recognizing an environmental sound in a client device in cooperation with a server is disclosed. The client device includes a client database having a plurality of sound models of environmental sounds and a plurality of labels, each of which identifies at least one sound model. The client device receives an input environmental sound and generates an input sound model based on the input environmental sound. At the client device, a similarity value is determined between the input sound model and each of the sound models to identify one or more sound models from the client database that are similar to the input sound model. A label is selected from labels associated with the identified sound models, and the selected label is associated with the input environmental sound based on a confidence level of the selected label.
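The client-side matching described above can be sketched as follows. This is only an illustration under stated assumptions: sound models are represented as plain feature vectors, similarity is cosine similarity, and the confidence level is the selected label's share of the total similarity. The names `label_input`, `sim_threshold`, and `min_confidence` are hypothetical and do not come from the abstract.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed model form)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def label_input(input_model, client_db, sim_threshold=0.8, min_confidence=0.6):
    """Return a label for the input sound, or None if confidence is too low."""
    votes = defaultdict(float)
    for model, label in client_db:
        s = cosine(input_model, model)
        if s >= sim_threshold:  # keep only sufficiently similar sound models
            votes[label] += s
    if not votes:
        return None
    best = max(votes, key=votes.get)
    # Assumed confidence: the winning label's share of all similarity mass.
    confidence = votes[best] / sum(votes.values())
    return best if confidence >= min_confidence else None


# Toy client database of (sound model, label) pairs.
client_db = [
    ([1.0, 0.0, 0.0], "traffic"),
    ([0.9, 0.1, 0.0], "traffic"),
    ([0.0, 1.0, 0.0], "cafe"),
]
result = label_input([1.0, 0.05, 0.0], client_db)
print(result)
```

A `None` result marks the case where the client cannot label the sound confidently on its own, which is one place where a fallback to the server could be attached.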
Abstract:
A method for generating an anti-model of a sound class is disclosed. A plurality of candidate sound data is provided for generating the anti-model. A plurality of similarity values between the plurality of candidate sound data and a reference sound model of a sound class is determined. An anti-model of the sound class is generated based on at least one candidate sound data having the similarity value within a similarity threshold range.
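A minimal sketch of the selection step, under assumptions the abstract does not state: sound data and the reference model are feature vectors, similarity is `1 / (1 + Euclidean distance)`, and the anti-model is the mean of the selected candidates. The function names and the range bounds `low` and `high` are hypothetical.

```python
import math


def similarity(a, b):
    """Map Euclidean distance into (0, 1]; an assumed similarity measure."""
    return 1.0 / (1.0 + math.dist(a, b))


def build_anti_model(candidates, reference, low, high):
    """Average the candidates whose similarity lies within [low, high]."""
    selected = [c for c in candidates if low <= similarity(c, reference) <= high]
    if not selected:
        return None
    dim = len(reference)
    return [sum(c[i] for c in selected) / len(selected) for i in range(dim)]


reference = [0.0, 0.0]
candidates = [
    [0.1, 0.0],  # similarity about 0.91: too close to the class, excluded
    [1.0, 0.0],  # similarity 0.5: inside the range
    [0.0, 1.0],  # similarity 0.5: inside the range
    [9.0, 9.0],  # similarity about 0.07: too far away, excluded
]
anti = build_anti_model(candidates, reference, 0.3, 0.6)
print(anti)
```

In this toy setup the range keeps candidates that are near the class without belonging to it, so the resulting anti-model sits among the sounds most likely to be confused with the class.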
Abstract:
A method for detecting a text region in an image is disclosed. The method includes detecting a candidate text region from an input image. A set of oriented gradient images is generated from the candidate text region, and one or more detection window images of the candidate text region are captured. For each detection window image, a sum of oriented gradients is calculated for a region in one of the oriented gradient images. Each detection window image is then classified as containing text or not by comparing the associated sum of oriented gradients with a threshold. Based on the classifications of the detection window images, it is determined whether the candidate text region is a true text region.
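The pipeline can be sketched in a few lines of pure Python. This is a toy illustration, not the disclosed method: the central-difference gradients, the four unsigned orientation bins, the "greater than threshold means text" rule, and the majority vote over windows (`min_fraction`) are all assumptions, and every function name is hypothetical.

```python
import math


def oriented_gradient_images(img, bins=4):
    """Split gradient magnitude into one image per unsigned orientation bin."""
    h, w = len(img), len(img[0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(bins)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            b = min(int(angle / math.pi * bins), bins - 1)
            out[b][y][x] = mag
    return out


def region_sum(channel, top, left, height, width):
    """Sum of oriented gradients over a rectangular region."""
    return sum(
        channel[y][x]
        for y in range(top, top + height)
        for x in range(left, left + width)
    )


def is_text_region(img, windows, bin_index=0, threshold=1.0, min_fraction=0.5):
    """Classify each window by its gradient sum, then vote over the windows."""
    channels = oriented_gradient_images(img)
    votes = [
        region_sum(channels[bin_index], *window) > threshold for window in windows
    ]
    return sum(votes) / len(votes) >= min_fraction


# Toy 5x6 images: one with a vertical step edge, one completely flat.
edge_img = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
flat_img = [[0] * 6 for _ in range(5)]
windows = [(0, 0, 5, 6)]  # (top, left, height, width)
print(is_text_region(edge_img, windows), is_text_region(flat_img, windows))
```

A single step edge is of course not text; the example only shows how a window with strong gradients in one orientation passes the threshold while a flat window does not.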
Abstract:
A method for grouping a plurality of client devices is disclosed. The method includes receiving, at a server, sound descriptors from the plurality of client devices. Each sound descriptor is extracted from an environmental sound by a client device and transmitted to the server, which determines a similarity of the sound descriptors received from the client devices. The server groups the plurality of client devices into at least one similar context group based on the similarity of the sound descriptors.
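The server-side grouping can be sketched as follows. The abstract does not say how similarity is measured or how groups are formed, so this example assumes vector-valued descriptors, cosine similarity, and single-link grouping with a union-find structure; the device names and the `threshold` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sound descriptors (assumed vector form)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_clients(descriptors, threshold=0.9):
    """Merge clients whose descriptors are similar; return sorted groups."""
    ids = list(descriptors)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if cosine(descriptors[a], descriptors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Toy descriptors received by the server from three client devices.
descriptors = {
    "phone_a": [0.9, 0.1, 0.0],
    "phone_b": [0.8, 0.2, 0.0],
    "phone_c": [0.0, 0.1, 0.9],
}
groups = group_clients(descriptors)
print(groups)
```

Single-link grouping places two devices in the same group if a chain of pairwise-similar devices connects them; a stricter rule (for example, requiring every pair in a group to be similar) would be an equally valid reading of the abstract.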
Abstract:
A processor is configured to transition in and out of a low-power state at a first rate and to operate in a first mode or a second mode. In a particular method, the processor while coupled to a coder/decoder (CODEC) retrieves audio feature data from a buffer after transitioning out of the low-power state. The CODEC is configured to operate at a second rate in the first mode and at a third rate in the second mode, the second rate and the third rate each greater than the first rate. The audio feature data indicates features of audio data received during the low-power state of the processor. A ratio of CODEC activity to processor activity in the second mode is less than the ratio in the first mode.