Abstract:
An information transmission device which analyzes a diction of a speaker and provides an utterance in accordance with the diction of the speaker, and which has a microphone detecting a sound signal of the speaker, a feature extraction unit extracting at least one feature value of the diction of the speaker based on the sound signal detected by the microphone, a voice synthesis unit synthesizing a voice signal to be uttered so that the voice signal has the same feature value as the diction of the speaker, based on the feature value extracted by the feature extraction unit, and a voice output unit performing an utterance based on the voice signal synthesized by the voice synthesis unit.
Abstract:
A voice recognition system (10) for improving the robustness of voice recognition for a voice input in which a deteriorated feature amount cannot be completely identified. The system comprises at least two sound detecting means (16a, 16b) for detecting a sound signal, a sound source localizing unit (21) for determining the direction of a sound source based on the sound signal, a sound source separating unit (23) for separating the sound of that source from the sound signal based on the sound source direction, a mask producing unit (25) for producing a mask value according to the reliability of the separation results, a feature extracting unit (27) for extracting the feature amount of the sound signal, and a voice recognizing unit (29) for applying the mask to the feature amount to recognize a voice from the sound signal.
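The masking step can be illustrated with a minimal sketch. The abstract does not say how the mask is computed or how it enters the recognizer, so everything here is an assumption made for illustration: a binary mask obtained by thresholding a per-channel reliability score, and a template matcher that ignores masked channels. The function names, the threshold, and the toy templates are hypothetical.

```python
def make_mask(reliability, threshold=0.5):
    """Hypothetical mask rule: 1.0 for channels whose separation
    reliability meets the threshold, 0.0 otherwise."""
    return [1.0 if r >= threshold else 0.0 for r in reliability]


def masked_distance(feature, template, mask):
    """Mean squared difference over the channels the mask keeps."""
    kept = sum(mask)
    if kept == 0:
        return float("inf")  # nothing reliable to compare
    total = sum(m * (f - t) ** 2 for f, t, m in zip(feature, template, mask))
    return total / kept


def recognize(feature, mask, templates):
    """Return the label of the template closest to the masked feature."""
    return min(templates, key=lambda w: masked_distance(feature, templates[w], mask))


# Toy data: the second channel is corrupted by poor separation.
templates = {"yes": [1.0, 2.0, 3.0], "no": [4.0, 9.0, 0.0]}
feature = [1.0, 9.0, 3.0]
mask = make_mask([0.9, 0.1, 0.8])

print(recognize(feature, mask, templates))             # prints "yes"
print(recognize(feature, [1.0, 1.0, 1.0], templates))  # prints "no"
```

In this toy case the corrupted channel pulls the unmasked match toward the wrong template, while the masked match ignores it.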
Abstract:
A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f−m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).
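The first model can be sketched for a single frequency bin as follows. The abstract states only that X(ω,f) combines the known signals S(ω,f−m+1) for m = 1 to M. The linear weighting with per-tap coefficients, the plain subtraction E = Y − X, and the treatment of frames before the start as silence are assumptions made for this sketch, and the second model is not covered. The names `combined_signal`, `extract_unknown`, and `coeffs` are hypothetical.

```python
def combined_signal(s, coeffs, f):
    """X(f) = sum over m = 1..M of coeffs[m-1] * S(f-m+1) for one frequency bin.
    Frames before index 0 are treated as silence (an assumption)."""
    total = 0j
    for m, a in enumerate(coeffs, start=1):
        idx = f - m + 1
        if idx >= 0:
            total += a * s[idx]
    return total


def extract_unknown(y, s, coeffs):
    """Assumed extraction rule: E(f) = Y(f) - X(f) for every frame f."""
    return [y[f] - combined_signal(s, coeffs, f) for f in range(len(y))]


# Toy bin: the known signal leaks into the next frame with weight 0.5 (M = 2).
s = [1 + 0j, 2 + 0j, 0j, 0j]
coeffs = [1.0, 0.5]
y = [1 + 0j, 2.5 + 0j, 4 + 1j, 0j]
print(extract_unknown(y, s, coeffs))
```

Because the reverberant tail is modeled across M frames, the window length of each frame does not need to change, which is the property the abstract highlights.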
Abstract:
An acoustic data processor according to the present invention is used for processing acoustic data including signal sounds to reduce noises generated by a mechanical apparatus. The acoustic data processor includes a motion status obtaining section for obtaining motion status of the mechanical apparatus, an acoustic data obtaining section for obtaining acoustic data corresponding to the obtained motion status, and a database for storing various motion statuses of the mechanical apparatus in a unit time and corresponding acoustic data as templates. The acoustic data processor further includes a database searching section for searching the database to retrieve the template having the motion status closest to the obtained motion status; and a template subtraction section for subtracting the acoustic data of the template having the motion status closest to the obtained motion status from the obtained acoustic data to reduce noises generated by the mechanical apparatus.
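The search-and-subtract steps can be sketched as below. The abstract does not define "closest" or the form of the acoustic data, so this sketch assumes Euclidean distance between motion-status vectors, magnitude spectra as acoustic data, and flooring of the result at zero. The database layout and function names are hypothetical.

```python
import math


def closest_template(database, status):
    """Return the (motion_status, noise_spectrum) entry whose motion status
    is nearest to the observed one (Euclidean distance is an assumption)."""
    return min(database, key=lambda entry: math.dist(entry[0], status))


def subtract_template(database, status, spectrum):
    """Subtract the closest template's noise spectrum from the observed
    spectrum, flooring each bin at zero (the floor is an assumption)."""
    _, noise = closest_template(database, status)
    return [max(x - n, 0.0) for x, n in zip(spectrum, noise)]


# Toy database: motion status = (joint velocity, joint load).
database = [
    ((0.0, 0.0), [1.0, 1.0, 1.0]),
    ((1.0, 0.5), [3.0, 0.5, 2.0]),
]
print(subtract_template(database, (0.9, 0.4), [4.0, 0.25, 5.0]))  # prints [1.0, 0.0, 3.0]
```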
Abstract:
A robot that recognizes speech of a person while performing predetermined motions or gestures includes: a drive unit executing the motions or gestures; a determination unit determining which of the motions or gestures is being executed; a speech recognition unit having at least two recognition algorithms including a multi-condition training algorithm; and a switch unit selecting one of the recognition algorithms depending on the determined motion or gesture.
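The switch unit can be pictured as a lookup from the determined motion to a recognizer. This is only a sketch: the motion names, the two placeholder recognizers, and the choice of the multi-condition recognizer as the fallback are all hypothetical, since the abstract names no concrete motions or selection rule.

```python
def recognize_clean(audio):
    """Placeholder for a recognizer trained on quiet conditions (hypothetical)."""
    return ("clean-model", audio)


def recognize_multi_condition(audio):
    """Placeholder for a recognizer built by multi-condition training."""
    return ("multi-condition-model", audio)


# Hypothetical mapping from the determined motion to a recognition algorithm.
RECOGNIZER_FOR_MOTION = {
    "idle": recognize_clean,
    "nodding": recognize_multi_condition,
    "walking": recognize_multi_condition,
}


def select_recognizer(motion):
    """Switch unit: pick the recognizer for the motion currently executed,
    falling back to the multi-condition recognizer for unknown motions."""
    return RECOGNIZER_FOR_MOTION.get(motion, recognize_multi_condition)


print(select_recognizer("idle")("hello"))    # prints ('clean-model', 'hello')
print(select_recognizer("waving")("hello"))  # prints ('multi-condition-model', 'hello')
```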
Abstract:
A trajectory planning system obtains a trajectory for controlling a state of an object toward a goal state. The system includes a search tree generating section which registers a state of the object as a root of a search tree in a state space, and registers, as a branch of the search tree in the state space, a next state of the object after a lapse of a predetermined time interval, the next state being obtained through dynamical relationships during the time interval. The system further includes a known-state registration tree storing section which stores a known-state registration tree, and a known-state registration tree generating section which determines a cell to which the next state belongs among a plurality of cells previously prepared by segmenting the state space, determines whether or not a state which belongs to the cell has already been registered as a branch of the known-state registration tree, discards the next state when a state which belongs to the cell has been registered, and registers the next state as a branch of the known-state registration tree when a state which belongs to the cell has not been registered. The system further includes a trajectory generating section which selects a state whose distance to the goal state is minimum among states registered as branches of the known-state registration tree and obtains a trajectory using a sequence of states in a backward direction from the state toward the root of the known-state registration tree.
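A compact sketch of the cell-based pruning follows. It makes several simplifying assumptions that are not in the abstract: the object is a point mass with state (position, velocity), the controls are three fixed accelerations, cells are a uniform grid, distance is Euclidean, and the search tree and the known-state registration tree are merged into one parent map. All names and parameter values are hypothetical.

```python
import math


def step(state, accel, dt):
    """Hypothetical dynamics: point mass under constant acceleration for dt."""
    x, v = state
    return (x + v * dt + 0.5 * accel * dt * dt, v + accel * dt)


def cell_of(state, cell_size):
    """Index of the grid cell that contains the state."""
    return tuple(math.floor(c / cell_size) for c in state)


def plan(start, goal, controls=(-1.0, 0.0, 1.0), dt=0.5, cell_size=0.25, depth=8):
    parent = {start: None}                  # registered state -> its parent state
    occupied = {cell_of(start, cell_size)}  # cells that already hold a state
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for state in frontier:
            for accel in controls:
                nxt = step(state, accel, dt)
                cell = cell_of(nxt, cell_size)
                if cell in occupied:
                    continue                # discard: cell already has a state
                occupied.add(cell)
                parent[nxt] = state         # register as a new branch
                next_frontier.append(nxt)
        frontier = next_frontier
    # Select the registered state nearest the goal, then walk back to the root.
    node = min(parent, key=lambda s: math.dist(s, goal))
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


for state in plan((0.0, 0.0), (2.0, 0.0)):
    print(state)
```

Keeping at most one state per cell bounds the tree size by the number of cells, which is the motivation for discarding states that land in an occupied cell.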
Abstract:
In an artificial intelligence system for image recognition, a global image of an object is input from a camera or other optical pick-up device, and is processed in a global image processing means, which performs analytical processing on the global image by extracting global characteristics of the input image and evaluating consistency of the extracted global characteristics. Simultaneously, the image data is processed in a local image processing means which undertakes analytical processing on a plurality of local images defining local portions of the image to be recognized. The local image processing means is constructed by plural modules, each further defined by sub-modules, which conduct respective analyses corresponding to local images having characteristics useful in recognizing the global image, wherein each local processor extracts characteristics of an input local image and evaluates consistency of the extracted characteristic with the object to be recognized. Importantly, the global image processing means receives inputs from the local modules, and deactivates functions of local modules which are inconsistent with the global characteristics, while activating and promoting functions of local modules which are consistent with the global characteristics. Through top-down control from the global image processor, as well as inter-module signals between respective local processing modules, since inconsistent processes are quickly discovered,
Abstract:
An artificial visual apparatus and method for image recognition having a simple adaptive scaling mechanism enables the definition of scale invariant visual icons in a processing area corresponding to the anterior inferotemporal cortex (AIT) in a one-step, value-based decision making process. Icon related activity states resulting from sensory filtering to a fourth stage KL filter corresponding to the V4 area are recognized independent of the scale and position of the item to be recognized within the maximum visual field. The AIT processing area controls the window of attention in the V4 area and confines further processing onto this selected spotlight. The invention presents a biologically plausible method for scale invariant mapping from the V4 stage filter to the AIT processor. Filtering based on principal component analysis (PCA), or Karhunen-Loeve (KL) filtering, yields image data of the item of interest in the V4 stage filter, such data then being supplied to the AIT processor by a scale-invariant mapping process which controls the number of inputs to the KL filters to achieve constant resolution independent of the scale of the item of interest in the maximum visual field. Thus, the problem of scale-invariant mapping is reduced to a simple adaptive thresholding by feedforward inhibition at the AIT processor.
Abstract:
A learning system according to the present invention includes an event list database for storing a plurality of event lists, each of the event lists being a set including a series of state-action pairs which reaches a state-action pair immediately before earning a reward, an event list managing section for classifying state-action pairs into the plurality of event lists for storing, and a learning control section for updating expectation of reward of a state-action pair which is an element of each of the event lists.
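One way to picture the event lists is sketched below. The abstract does not give the update rule, so this sketch assumes that each event list is keyed by the state-action pair immediately before the reward, and that expectations are moved toward a discounted reward with a fixed learning rate. The class name, `alpha`, and `gamma` are hypothetical.

```python
from collections import defaultdict


class EventListLearner:
    """Sketch of event-list learning under the assumptions stated above."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha  # learning rate (assumed)
        self.gamma = gamma  # discount per step back from the reward (assumed)
        self.event_lists = defaultdict(set)     # final pair -> pairs leading to it
        self.expectation = defaultdict(float)   # (final pair, pair) -> expected reward

    def record_episode(self, pairs, reward):
        """Classify the pairs of one rewarded episode into the event list of
        its final pair and update each element's expected reward."""
        final = pairs[-1]
        for steps_back, pair in enumerate(reversed(pairs)):
            self.event_lists[final].add(pair)
            target = reward * self.gamma ** steps_back
            key = (final, pair)
            self.expectation[key] += self.alpha * (target - self.expectation[key])


learner = EventListLearner(alpha=0.5, gamma=0.5)
learner.record_episode([("s0", "a0"), ("s1", "a1")], reward=1.0)
print(dict(learner.expectation))
```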