摘要:
A method and apparatus for improving the performance of voice recognition in a mobile device are provided. The method of recognizing a voice includes: monitoring the usage pattern of a user of a device for inputting a voice; selecting predetermined words from among words stored in the device based on the result of monitoring, and storing the selected words; and recognizing a voice based on an acoustic model and predetermined words. In this way, a voice can be recognized by using prediction of whom the user mainly makes a call to. Also, by automatically modeling the device usage pattern of the user and applying the pattern to vocabulary for voice recognition based on probabilities, the performance of voice recognition, as actually felt by the user, can be enhanced.
摘要:
A virtual screen sound source is spatially synchronized with a visual object displayed on a display. A plurality of loudspeaker sets, which each include at least three of a plurality of loudspeakers installed at the periphery of a display, are selected, individual sound sources corresponding to the respective selected loudspeaker sets are generated, and a multi-sound source is generated by overlapping the generated individual sound sources and output through loudspeakers included in the loudspeaker sets.
摘要:
A method and an apparatus for selecting a vocabulary closest to an input speech from among lexicons stored in memory, wherein a centroid lexicon representing lexicons belonging to a predetermined lexicon group is generated. Two lexicons, having a longest distance therebetween in the lexicon group, are selected using the centroid lexicon from the lexicon group, and a node indicating the lexicon group branches based on the two selected lexicons. A node having low group similarity is selected from among current terminal nodes, including branch nodes, and the above procedure is repeatedly performed on a lexicon group indicated by the selected node.
摘要:
Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.
摘要:
A speech synthesis method, in which speech units are concatenated using a DB, wherein the speech units to be concatenated are determined and divided into a left speech unit and a right speech unit. The length of an interpolation region of each of the left and right speech units is variably determined. An extension is attached to a right boundary of the left speech unit and an extension to a left boundary of the right speech unit. The locations of pitch marks included in the extension of each of the left and right speech units are aligned so that the pitch marks can fit in the predetermined interpolation region. The left and right speech units are superimposed after fading out the left speech unit and fading in the right speech unit. Accordingly, a determination of whether extra-segmental data exists or not is made, and smoothing concatenation is performed using either an interpolation of existing data or an interpolation of extrapolated data depending on the result of the determination.
摘要:
A flexible printed circuit board includes a first substrate portion having at least one first terminal, a second substrate portion in communication with the first substrate portion and having at least one circuit device, a connection substrate portion in communication with the second substrate portion, the connection substrate portion extending away from the second substrate portion in a same direction as the first substrate portion, and a third substrate portion in communication with the connection substrate portion, the third substrate portion having at least one second terminal.
摘要:
There are provided a system and method for providing information using a spoken dialogue interface. The system includes a speech recognizer for transforming voice signals into sentences; a sentence analyzer for analyzing the sentences by their structural elements; a dialogue manager for extracting information on speech acts or intentions from the structural elements, and generating information on system's speech acts or intentions for a response to the extracted information on speech acts or intentions; a sentence generator for generating sentences based on the information on the system's speech acts or intentions for the response; a speech synthesizer for synthesizing the generated sentences into voices; an information extractor for extracting information required for the response from the Internet in real time; and a user modeling means for analyzing and classifying users' tendencies. Information demanded by a user can be detected in real time and provided through a voice interface with versatile and familiar dialogues based on the user's tendencies.
摘要:
Disclosed herein is a method and apparatus to recognize speech by measuring the confidence levels of respective frames. The method includes the operations of obtaining frequency features of a received speech signal for the respective frames having a predetermined length, calculating a keyword model-based likelihood and a filler model-based likelihood for each of the frame, calculating a confidence score based on the two types of likelihoods, and deciding whether the received speech signal corresponds to a keyword or a non-keyword based on the confidence scores. Also, the method includes the operation of transforming the confidence scores by applying transform functions of clusters, which include the confidence scores or are close to the confidence scores, to the confidence scores.