Abstract:
A neuron device network is provided with a speech input layer, a context layer, a hidden layer, a speech output layer and a hypothesis layer. A phoneme to be learned is spectrally analyzed by an FFT unit, and the vector row at a time t is input to the speech input layer. The vector state of the hidden layer at time t-1 is input to the context layer, the vector row at time t+1 is input to the speech output layer as an instructor signal, and a code row hypothesizing the phoneme is input to the hypothesis layer. The time series relation of the vector rows and the phoneme are thus hypothetically learned. Alternatively, a spectrum, a cepstrum or a speech vector row based on outputs from the hidden layer of an auto-associative neural network is input to the speech input layer, and the code row is output from the hypothesis layer, taking into account the time series relation. To recognize speech, a CPU reads the stored output values of the hidden layer and the connection weights between the hidden layer and the hypothesis layer from a memory of the neuron device network, and calculates the output value of each neuron device of the hypothesis layer from those output values and connection weights. The corresponding phoneme is determined by collating the output values of the respective neuron devices of the hypothesis layer with the code rows in an instructor signal table.
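
The recognition step described above can be illustrated with a minimal sketch. It is not the patented implementation: the sigmoid activation, the bias terms, the nearest-code-row collation rule (squared Euclidean distance), the function names, and every numeric value and table entry below are assumptions chosen for illustration only.

```python
import math


def sigmoid(x):
    """Logistic activation (an assumption; the abstract names no activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def hypothesis_outputs(hidden, weights, biases):
    """Compute the output value of each hypothesis-layer neuron.

    hidden  : stored output values of the hidden layer
    weights : weights[j][i] connects hidden neuron i to hypothesis neuron j
    biases  : one bias per hypothesis neuron (hypothetical; not in the abstract)
    """
    return [
        sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(weights, biases)
    ]


def collate(outputs, table):
    """Return the phoneme whose code row lies closest to the outputs.

    Closeness is squared Euclidean distance, which is an assumed rule.
    """
    def distance(code_row):
        return sum((o - c) ** 2 for o, c in zip(outputs, code_row))

    return min(table, key=lambda phoneme: distance(table[phoneme]))


# Hypothetical instructor signal table: phoneme -> code row.
INSTRUCTOR_TABLE = {
    "a": [1, 0, 0],
    "i": [0, 1, 0],
    "u": [0, 0, 1],
}

# Made-up values standing in for what the CPU would read from memory.
hidden = [0.9, 0.1]
weights = [[4.0, -4.0], [-4.0, 4.0], [-4.0, -4.0]]
biases = [0.0, 0.0, 0.0]

outputs = hypothesis_outputs(hidden, weights, biases)
phoneme = collate(outputs, INSTRUCTOR_TABLE)
print(phoneme)
```

With these illustrative values the first hypothesis neuron is the most active, so the outputs collate to the code row assigned to "a". A real table would hold one code row per phoneme learned by the network.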