Abstract:
A method for animating an image is useful for animating avatars using real-time speech data. According to one aspect, the method includes identifying an upper facial part and a lower facial part of the image (step 705); animating the lower facial part based on speech data that are classified according to a reduced vowel set (step 710); tilting both the upper facial part and the lower facial part using a coordinate transformation model (step 715); and rotating both the upper facial part and the lower facial part using an image warping model (step 720).
Abstract:
An electronic device (200) for speech dialog includes functions that receive (205, 105) an utterance that includes an instantiated variable (215), perform voice recognition (210, 115, 120) of the instantiated variable to determine a most likely set of acoustic states (220) and a corresponding sequence of phonemes with stress information (215), and determine prosodic characteristics (272, 274, 276, 130) for a synthesized value of the instantiated variable (236) from the sequence of phonemes with stress information and a set of stored prosody models. The electronic device generates (335, 140) a synthesized value of the instantiated variable using the most likely set of acoustic states and the prosodic characteristics of the instantiated variable.
Abstract:
According to one aspect of the invention, there is provided a method (20) and an electronic device (1) for determining the orientation of, and recognizing, handwritten characters scribed on a touchscreen (5). The method (20) includes receiving (22) the handwritten character and then normalizing (23) the character to provide a scaled character that fits within a defined boundary. The scaled character comprises at least one line. An identifying step (24) identifies each line of the scaled character as a vector, and a rotating step (26) then rotates the scaled character from an initial orientation to a final orientation through a plurality of discrete orientations. A calculating step (27) calculates, for each of the discrete orientations, magnitudes of co-ordinate components of each vector, and a summing step (28) sums, for each of the discrete orientations, the co-ordinate components to provide a summed co-ordinate component for the scaled character at the corresponding discrete orientation. An assessing step (31) then assesses each of the summed co-ordinate components to determine a suitable orientation of the scaled character.
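The rotate, calculate, and sum steps can be illustrated with a minimal Python sketch. It assumes each line of the scaled character is already available as a 2D vector, and it sweeps a fixed angular step. The abstract does not state how the assessing step (31) chooses an orientation, so the rule in `pick_orientation` (prefer the orientation with the largest summed vertical component) is purely a hypothetical placeholder, as are all function names.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float]


def summed_components(vectors: List[Vector], angle_deg: float) -> Tuple[float, float]:
    """Rotate every vector by angle_deg and sum the magnitudes of the
    x and y components over all vectors."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    sum_x = 0.0
    sum_y = 0.0
    for vx, vy in vectors:
        rx = vx * c - vy * s  # rotated x component
        ry = vx * s + vy * c  # rotated y component
        sum_x += abs(rx)
        sum_y += abs(ry)
    return sum_x, sum_y


def sweep_orientations(vectors: List[Vector], step_deg: int = 15) -> Dict[int, Tuple[float, float]]:
    """Summed components at each discrete orientation from 0 up to 360 degrees."""
    return {a: summed_components(vectors, a) for a in range(0, 360, step_deg)}


def pick_orientation(vectors: List[Vector], step_deg: int = 15) -> int:
    """Hypothetical assessment rule (not from the abstract): choose the
    orientation whose summed vertical component is largest."""
    table = sweep_orientations(vectors, step_deg)
    return max(table, key=lambda a: table[a][1])
```

A real assessing step would likely compare the summed components against expectations for the target character set; the sketch only shows where such a rule would plug in.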
Abstract:
A technique used in a speech encoder (107) reduces non-speech activity in a low bit rate digital voice message. Speech model parameters that include quantized speech spectral parameter vectors are generated in a sequence of frames. A determination is made as to which frames of the sequence are voiced frames and which are unvoiced frames. A consecutive sequence of unvoiced frames is identified (2330) as an unvoiced burst when the length, NUV, of the consecutive sequence exceeds a predetermined length, Ns. A non-speech activity portion of the unvoiced burst is identified (2335-2365) and removed.
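The burst-identification rule (a run of unvoiced frames whose length NUV exceeds Ns) can be sketched as a simple run-length scan. The sketch assumes the voiced/unvoiced decision per frame is already available as a list of booleans; the function name is hypothetical, and the later steps that identify and remove the non-speech portion of a burst are not described in the abstract and are not modeled here.

```python
from typing import List, Tuple


def find_unvoiced_bursts(voiced: List[bool], n_s: int) -> List[Tuple[int, int]]:
    """Return (start_frame, length) for every run of consecutive unvoiced
    frames whose length exceeds n_s."""
    bursts: List[Tuple[int, int]] = []
    run_start = None  # index where the current unvoiced run began
    for i, is_voiced in enumerate(voiced):
        if not is_voiced:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length > n_s:
                bursts.append((run_start, length))
            run_start = None
    # Handle a run that extends to the end of the message.
    if run_start is not None:
        length = len(voiced) - run_start
        if length > n_s:
            bursts.append((run_start, length))
    return bursts


frames = [True, False, False, True, False, False, False, False, True]
print(find_unvoiced_bursts(frames, n_s=3))  # prints [(4, 4)]
```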
Abstract:
An interactive method for composing an alphanumeric message by a caller using a telephone keypad includes storing (215) a lexical database (135) from which unigram probabilities, forward conditional probabilities, and backward conditional probabilities for a plurality of words can be recovered; storing a received sequence of key codes (405) representing a sequence in which keys on a telephone style keypad are keyed; generating a word trellis including candidate words (415) derived from the sequence and the lexical database; determining a most likely phrase (420) from the candidate words, the unigram probabilities, forward conditional probabilities, and backward conditional probabilities; generating a most likely message (425) from the most likely phrase and presenting the most likely message to the caller; and confirming that the most likely message is the alphanumeric message (430).
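As a rough illustration of the trellis search, the sketch below maps each key-code group to candidate words and picks the best path with a Viterbi pass. It is a simplification under stated assumptions: word boundaries are assumed to be known (one key-code string per word), only unigram and forward conditional probabilities are used (the backward conditional probabilities are omitted), and the tiny lexicon, its probability values, and the `FLOOR` back-off constant are invented for the example.

```python
import math
from typing import Dict, List, Tuple

# Standard letter layout of a telephone-style keypad.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Hypothetical lexical database (illustrative values only).
UNIGRAM: Dict[str, float] = {"good": 0.40, "home": 0.35, "gone": 0.25,
                             "day": 0.70, "fax": 0.30}
FORWARD: Dict[Tuple[str, str], float] = {("good", "day"): 0.6,
                                         ("home", "fax"): 0.5}
FLOOR = 0.01  # assumed probability for word pairs missing from FORWARD


def key_code(word: str) -> str:
    """Key sequence that spells the given lowercase word."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def candidates(code: str) -> List[str]:
    """All lexicon words whose key sequence equals code."""
    return [w for w in UNIGRAM if key_code(w) == code]


def most_likely_phrase(codes: List[str]) -> List[str]:
    """Viterbi search over the word trellis, one column per key-code group."""
    columns = [candidates(c) for c in codes]
    if not columns or any(not col for col in columns):
        raise ValueError("no candidate word for at least one key sequence")
    # best[word] = (log-probability of best path ending in word, that path)
    best = {w: (math.log(UNIGRAM[w]), [w]) for w in columns[0]}
    for col in columns[1:]:
        nxt = {}
        for w in col:
            score, path = max(
                ((s + math.log(FORWARD.get((p, w), FLOOR)), prev_path)
                 for p, (s, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[w] = (score, path + [w])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


print(most_likely_phrase(["4663", "329"]))  # prints ['good', 'day']
```

In the described method the result would then be presented to the caller for confirmation; that interaction is outside the scope of this sketch.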
Abstract:
An apparatus and method provide low bit rate speech transmission by processing a voice message to generate speech parameters, which are arranged into a two dimensional parameter matrix (502) including a sequence of parameter frames. The two dimensional parameter matrix (502) is transformed using a predetermined two dimensional matrix transformation function (414) to obtain a two dimensional transform matrix (506). Distance values representing distances between templates of a set of predetermined templates and the two dimensional transform matrix (506) are then derived. The derived distance values are identified by indexes identifying the templates of the set of predetermined templates. The derived distance values are compared, and the index corresponding to the template having the shortest distance is selected and then transmitted.
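The transform, distance, and index-selection steps resemble a vector quantization lookup and can be sketched as follows. The abstract does not name the transformation function (414) or the distance measure, so the unnormalized two dimensional DCT-II and the squared Euclidean distance used here are assumptions, as are all function names.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_1d(x: List[float]) -> List[float]:
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * (k + 0.5) * u / n) for k in range(n))
            for u in range(n)]


def dct_2d(m: Matrix) -> Matrix:
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in m]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def squared_distance(a: Matrix, b: Matrix) -> float:
    """Squared Euclidean distance between two equally sized matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def nearest_template_index(param_matrix: Matrix, templates: List[Matrix]) -> int:
    """Transform the parameter matrix and return the index of the template
    with the shortest distance; only this index would be transmitted."""
    transformed = dct_2d(param_matrix)
    distances = [squared_distance(transformed, t) for t in templates]
    return min(range(len(templates)), key=distances.__getitem__)
```

Because only the winning index is sent, the bit cost per matrix is roughly the base-2 logarithm of the number of templates, which is what makes this kind of scheme attractive at low bit rates.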
Abstract:
An MBE synthesizer (116) generates a segment of speech from compressed speech data received by a receiver (2004). The compressed speech data includes one or more indexes (2240, 2242) and pitch data (2248). The MBE synthesizer (116) includes: an excitation generator (2222) that uses a transform function to generate transformed excitation components responsive to the pitch data (2248); a memory (3006) for storing a table of predetermined spectral vectors (2205) and associated predetermined voicing vectors (2203); a harmonic amplitude estimator (2209) that is responsive to the one or more predetermined spectral vectors identified by the received indexes (2240, 2242) and that generates harmonic amplitude control signals, the harmonic amplitude estimator (2209) including a peak detector (2503), a peak enhancer (2505), a valley detector (2507), and a valley enhancer (2509); and a multi-band voicing controller (2214), responsive to the predetermined voicing vectors associated with the one or more identified predetermined spectral vectors, for controlling a selection of the excitation components.
Abstract:
An apparatus codes excitation parameters for very low bit rate voice messaging using a method that processes a voice message to generate speech parameters. The speech parameters are separated (316) to produce a first group of energy parameters and a second group of pitch and voicing parameters. Subsequently, the first group of energy parameters is encoded and compressed using a non-uniform root-mean-square scalar process (318) to create a first plurality of encoded data. Additionally, the second group of pitch and voicing parameters is compressed, encoded, and combined into a single parameter using a three slope vector encoding process (320) that creates a second plurality of encoded data. Finally, the first and second pluralities of encoded data are multiplexed (322) to create a multiplexed signal for transmission, the multiplexed signal representing the voice message.
Abstract:
A portable device comprises a data storage for storing avatar data defining a user avatar. The user avatar is formed by a plurality of visual objects. The portable device further comprises a camera for capturing an image. A visual characteristic processor is arranged to determine a first visual characteristic from the image and an avatar processor is arranged to set an object visual characteristic of an object of the plurality of visual objects in response to the first visual characteristic. The invention may allow improved customization of user avatars. For example, a color of an element of a user avatar may be adapted to a color of a real-life object simply by a user taking a picture thereof.
Abstract:
A digital signal processor for processing data, including voice messaging data that may have both voiced and unvoiced speech components, uses computer routines stored in a memory used by the digital signal processor. The routines provide for: controlling at least a portion of a selective call receiver; receiving and decoding data received at the selective call receiver; comparing addresses received at the selective call receiver with addresses stored in a memory location coupled to the digital signal processor; controlling voicing, including both voiced and unvoiced speech components; and generating a pitch wave using an inverse discrete Fourier transform and resampling the pitch wave to provide a time domain voiced speech component.
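The last routine (pitch wave by inverse discrete Fourier transform, followed by resampling) can be sketched as below. This is an illustration under assumptions not found in the abstract: harmonic amplitudes are given directly, all harmonics have zero phase, the inverse DFT is computed naively, and resampling uses circular linear interpolation. The function names are hypothetical.

```python
import cmath
import math
from typing import List


def pitch_wave(harmonic_amps: List[float], size: int) -> List[float]:
    """Build one pitch period of `size` samples from harmonic amplitudes
    using a direct inverse DFT (zero phase is assumed for every harmonic)."""
    spectrum = [0j] * size
    for k, amp in enumerate(harmonic_amps, start=1):
        if 2 * k >= size:  # stay strictly below the Nyquist bin
            break
        # Conjugate-symmetric bins give a real signal amp * cos(2*pi*k*n/size).
        spectrum[k] = amp * size / 2
        spectrum[size - k] = amp * size / 2
    wave = []
    for n in range(size):
        acc = sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size)
                  for k in range(size))
        wave.append((acc / size).real)
    return wave


def resample_period(wave: List[float], new_len: int) -> List[float]:
    """Resample one period to new_len samples with circular linear
    interpolation, so the period length can match the decoded pitch."""
    n = len(wave)
    out = []
    for i in range(new_len):
        pos = i * n / new_len
        lo = int(pos)
        frac = pos - lo
        out.append((1 - frac) * wave[lo % n] + frac * wave[(lo + 1) % n])
    return out
```

Generating the period at a fixed transform size and then resampling keeps the transform length constant regardless of the pitch, which is one plausible reason for splitting the work into these two steps.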