Abstract:
A signal processing apparatus includes a separation processing unit that generates observed signals in the time-frequency domain by performing the short-time Fourier transform on mixed signals of the outputs from a plurality of sound sources, acquired by a plurality of sensors, and generates sound source separation results corresponding to the sound sources by a linear filtering process on the observed signals. The separation processing unit has a linear filtering process section that performs the linear filtering process on the observed signals so as to generate separated signals corresponding to the respective sound sources, an all-null spatial filtering section that applies an all-null spatial filter to generate spatially filtered signals in which the sounds acquired from the null directions are removed, and a frequency filtering section that performs a filtering process taking the separated signals and the spatially filtered signals as inputs.
Abstract:
An apparatus, method and program for performing a speech recognition process utilizing contextual information that comprises an estimation of the intention of an utterance of a user. The recognition process includes calculating a pre-score based on observed contextual information according to intention models, which correspond to a plurality of types of intention information, and combining the pre-scores with acoustic and linguistic scores to obtain an improved recognition or comprehension of the intent of a user utterance.
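The score combination described in this abstract can be sketched as follows. This is a minimal illustration only: the abstract does not state how the scores are combined, so the log-domain weighted sum, the weights, and the names `total_score` and `pick_intention` are all assumptions.

```python
import math

def total_score(acoustic_logp, linguistic_logp, pre_score, weights=(1.0, 1.0, 1.0)):
    """Combine three scores in the log domain (assumed combination rule).

    acoustic_logp, linguistic_logp: log-likelihood style scores.
    pre_score: context-based prior probability of the intention, in (0, 1].
    """
    w_acoustic, w_linguistic, w_pre = weights
    return (w_acoustic * acoustic_logp
            + w_linguistic * linguistic_logp
            + w_pre * math.log(pre_score))

def pick_intention(candidates):
    """Return the intention label whose combined score is highest."""
    best = max(
        candidates,
        key=lambda c: total_score(c["acoustic"], c["linguistic"], c["pre_score"]),
    )
    return best["intention"]

# Hypothetical candidates: the acoustic score slightly favours "play_music",
# but the context-based pre-score favours "set_alarm".
candidates = [
    {"intention": "play_music", "acoustic": -10.0, "linguistic": -5.0, "pre_score": 0.2},
    {"intention": "set_alarm", "acoustic": -10.5, "linguistic": -5.0, "pre_score": 0.8},
]
chosen = pick_intention(candidates)
```

Here the pre-score outweighs a small acoustic disadvantage, which is the effect the abstract attributes to using contextual information.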
Abstract:
A signal processing device includes a signal transform unit which generates observation signals in the time-frequency domain, and an audio source separation unit which generates an audio source separation result. The audio source separation unit includes a first-stage separation section which calculates separation matrices for separating mixtures included in a first frequency bin data set by a learning process in which Independent Component Analysis is applied to the first frequency bin data set, and acquires a first separation result for the first frequency bin data set; a second-stage separation section which acquires a second separation result for a second frequency bin data set by executing a learning process for calculating separation matrices for separating mixtures, using a score function in which an envelope is held fixed; and a synthesis section which generates the final separation results by integrating the first and the second separation results.
Abstract:
The permutation problem can be solved with high accuracy, without utilizing knowledge about the original signals or information concerning the positions of microphones and the like, when each of a plurality of signals mixed in an audio signal is separated using independent component analysis. A short-time Fourier transformation section generates spectrograms of observation signals from observation signals in the time domain. A signal separation section separates the spectrograms of the observation signals into spectrograms of respective signals, to generate spectrograms of separate signals. A permutation problem solution section calculates a scale corresponding to the degree of permutation, e.g., a Kullback-Leibler information amount calculated by use of a multidimensional probability density function, or a multidimensional kurtosis, from substantially the whole of the spectrograms of the separate signals. Based on the scale, the signals at each frequency bin of the spectrograms of the separate signals are exchanged between channels, to solve the permutation problem.
Abstract:
A plurality of letters or characters inferred from the results of letter/character recognition of an image photographed by a CCD camera (20), a plurality of kana readings inferred from those letters or characters, and the pronunciations corresponding to the kana readings are generated in a pronunciation information generating unit (150). The readings obtained are matched against the pronunciation from the user acquired by a microphone (23) to specify one kana reading and its pronunciation (reading) from among the generated candidates.
Abstract:
A system and method for automatically implementing a finite state automaton for speech recognition includes a finite state automaton generator that analyzes one or more input text sequences and automatically creates a node table and a link table to define the finite state automaton. The node table includes N-tuples from the input text sequences. Each N-tuple includes a current word and a corresponding history of one or more prior words from the input text sequences. The node table also includes unique node identifiers that each correspond to a different respective one of the current words. The link table includes specific links between successive words from the input text sequences. The links identified in the link table are defined by utilizing start node identifiers and end node identifiers from the unique node identifiers of the node table.
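The node table and link table described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions: the function name `build_fsa`, the `history_len` parameter, and the choice to assign one identifier per distinct N-tuple are hypothetical and not taken from the source.

```python
def build_fsa(sequences, history_len=1):
    """Build a node table and a link table from tokenized text sequences.

    node_table maps an N-tuple (history words..., current word) to a node id.
    link_table lists (start node id, end node id) pairs for successive words.
    """
    node_table = {}
    link_table = []
    seen_links = set()
    for words in sequences:
        prev_id = None
        for i, word in enumerate(words):
            # History is the preceding words, truncated at the sequence start.
            history = tuple(words[max(0, i - history_len):i])
            key = history + (word,)
            # Reuse the id if this N-tuple was seen before, else assign a new one.
            node_id = node_table.setdefault(key, len(node_table))
            if prev_id is not None and (prev_id, node_id) not in seen_links:
                seen_links.add((prev_id, node_id))
                link_table.append((prev_id, node_id))
            prev_id = node_id
    return node_table, link_table

# Two hypothetical input text sequences sharing a first word.
nodes, links = build_fsa([["call", "home"], ["call", "work"]])
```

With these inputs the shared word "call" yields a single start node, and the two links branch from it to the "home" and "work" nodes.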
Abstract:
A signal processing apparatus includes: a learning processing unit that finds a separating matrix for separating mixed signals in which outputs from a plurality of sound sources are mixed, by a learning process that applies ICA (Independent Component Analysis) to observed signals including the mixed signals; a separation processing unit that applies the separating matrix to the observed signals to separate the mixed signals and generate separated signals corresponding to each of the sound sources; and a sound source direction estimating unit that computes a sound source direction of each of the generated separated signals. The sound source direction estimating unit calculates cross-covariance matrices between the observed signals and the separated signals in corresponding time segments in time-frequency domain, computes phase differences between elements of the cross-covariance matrices, and computes a sound source direction corresponding to each of the separated signals by applying the computed phase differences.
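The direction estimate described in this abstract can be sketched for a single frequency bin and a two-microphone pair. The abstract does not give the formula, so the far-field relation sin(theta) = c * phase / (2 * pi * f * d), the sign convention (microphone 2 receives the wavefront later for positive angles), the speed of sound, and all function names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def cross_covariance(observed, separated):
    """Average of observed[t] * conj(separated[t]) over one time segment."""
    total = sum(x * y.conjugate() for x, y in zip(observed, separated))
    return total / len(separated)

def direction_from_phase(obs_mic1, obs_mic2, separated, freq_hz, spacing_m):
    """Estimate the arrival angle in degrees from one frequency bin."""
    r1 = cross_covariance(obs_mic1, separated)
    r2 = cross_covariance(obs_mic2, separated)
    # Phase difference between the two cross-covariance elements.
    phase_diff = cmath.phase(r1 * r2.conjugate())
    sin_theta = SPEED_OF_SOUND * phase_diff / (2.0 * math.pi * freq_hz * spacing_m)
    # Clamp so that noise cannot push the value outside the asin domain.
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.degrees(math.asin(sin_theta))
```

The sketch assumes the bin frequency is low enough, for the given spacing, that the phase difference stays within plus or minus pi; above that limit the estimate is ambiguous because of spatial aliasing.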
Abstract:
There is provided a sound signal processing device in which an observation signal analysis unit receives multi-channel sound signals acquired by a sound signal input unit and estimates a sound direction and a sound segment of a target sound to be extracted, and a sound source extraction unit receives the sound direction and the sound segment of the target sound and extracts a sound signal of the target sound. By applying the short-time Fourier transform to the incoming multi-channel sound signals, this device generates an observation signal in the time-frequency domain and detects the sound direction and the sound segment of the target sound. Further, based on the sound direction and the sound segment of the target sound, this device generates a reference signal corresponding to a time envelope indicating changes of the target sound's volume in the time direction, and extracts the signal of the target sound utilizing the reference signal.
Abstract:
A natural language processing apparatus includes an input section for inputting natural language, a representation converting section for converting the representation of the natural language, a display section for displaying, for confirmation, the sentence converted at the representation converting section, a machine translation section for carrying out machine translation of the confirmed sentence, and a control section for controlling these respective sections, thereby providing natural language processing in which the user's confirmation operations are reduced.
Abstract:
A signal processing apparatus includes a source separation module for producing respective separation signals corresponding to a plurality of sound sources by applying an ICA (Independent Component Analysis) to observation signals produced based on mixture signals from the sound sources, which are taken by source separation microphones, to thereby execute a separation process of the mixture signals, and a signal projection-back module for receiving observation signals of projection-back target microphones and the separation signals produced by the source separation module, and for producing projection-back signals as respective separation signals corresponding to the sound sources, which are taken by the projection-back target microphones. The signal projection-back module produces the projection-back signals by receiving the observation signals of the projection-back target microphones which differ from the source separation microphones.
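One way to picture the projection-back step for a single frequency bin is a least-squares fit of each separated signal onto the observation of a target microphone. This is a simplified sketch, not the method of the source: it assumes the separated signals are mutually uncorrelated, so that each coefficient can be fitted independently, and the function name `project_back` is hypothetical.

```python
def project_back(target_obs, separated_signals):
    """Scale each separated signal to match how it appears at a target microphone.

    target_obs: complex samples of one frequency bin at the target microphone.
    separated_signals: list of complex sample lists, one per sound source.
    Returns one projected-back signal per source.
    """
    projected = []
    for y in separated_signals:
        power = sum(abs(v) ** 2 for v in y)
        # Least-squares coefficient, valid when the sources are uncorrelated.
        coeff = sum(x * v.conjugate() for x, v in zip(target_obs, y)) / power
        projected.append([coeff * v for v in y])
    return projected

# Hypothetical bin data: two orthogonal sources mixed at a target microphone
# with gains 2j and 0.5.
y1 = [complex(1), complex(-1), complex(1), complex(-1)]
y2 = [complex(1), complex(1), complex(-1), complex(-1)]
x_target = [2j * a + 0.5 * b for a, b in zip(y1, y2)]
projected = project_back(x_target, [y1, y2])
```

Because the fit uses only the target microphone's observation and the separated signals, the target microphone does not need to be one of the microphones used for the separation itself, which matches the configuration the abstract describes.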