Abstract:
Provided are a first acoustic information acquisition unit configured to acquire first acoustic information obtained when a wearable device worn by a user receives a sound wave emitted from a first sound source; a second acoustic information acquisition unit configured to acquire second acoustic information obtained when the wearable device receives a sound wave emitted from a second sound source different from the first sound source; and a third acoustic information acquisition unit configured to acquire, based on the first acoustic information and the second acoustic information, third acoustic information used for biometric matching of the user.
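To make the data flow concrete, here is a minimal sketch of how two recordings captured by the wearable device could be combined into the third acoustic information used for matching. The log-spectral features and cosine-similarity matching are illustrative assumptions; the abstract does not specify either.

```python
# Minimal sketch of deriving "third acoustic information" from two
# recordings captured by a wearable device. The feature and matching
# choices are illustrative assumptions, not the patent's method.
import numpy as np

def acoustic_feature(signal: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Log-magnitude spectrum as a simple acoustic feature."""
    spectrum = np.fft.rfft(signal, n=n_fft)
    return np.log(np.abs(spectrum) + 1e-8)

def third_acoustic_info(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Combine features from the two source recordings into one vector."""
    return np.concatenate([acoustic_feature(first), acoustic_feature(second)])

def match_score(probe: np.ndarray, template: np.ndarray) -> float:
    """Cosine similarity between probe and enrolled template."""
    return float(probe @ template /
                 (np.linalg.norm(probe) * np.linalg.norm(template)))

# Toy usage with synthetic signals standing in for the received sound waves.
rng = np.random.default_rng(0)
first_rec, second_rec = rng.standard_normal(4096), rng.standard_normal(4096)
enrolled = third_acoustic_info(first_rec, second_rec)
print(match_score(third_acoustic_info(first_rec, second_rec), enrolled))  # 1.0
```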
Abstract:
A speaker recognition system includes a non-transitory computer readable medium configured to store instructions. The speaker recognition system further includes a processor connected to the non-transitory computer readable medium. The processor is configured to execute the instructions for extracting acoustic features from each frame of a plurality of frames in input speech data. The processor is configured to execute the instructions for calculating a saliency value for each frame of the plurality of frames using a first neural network (NN) based on the extracted acoustic features, wherein the first NN is a trained NN using speaker posteriors. The processor is configured to execute the instructions for extracting a speaker feature using the saliency value for each frame of the plurality of frames.
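A toy sketch of the saliency-weighted pooling this describes, assuming a one-hidden-layer network in place of the first NN (untrained here, whereas the abstract's NN is trained using speaker posteriors) and a weighted average as the speaker-feature extraction step:

```python
# Sketch of saliency-weighted speaker-feature extraction. The tiny MLP
# standing in for the "first NN" is untrained here; dimensions and names
# are assumptions for illustration.
import numpy as np

def saliency(frames: np.ndarray, w1, b1, w2, b2) -> np.ndarray:
    """Per-frame scalar saliency from a one-hidden-layer network."""
    h = np.tanh(frames @ w1 + b1)                 # (T, hidden)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))      # sigmoid -> (T, 1)
    return s[:, 0]

def speaker_feature(frames: np.ndarray, sal: np.ndarray) -> np.ndarray:
    """Saliency-weighted average of frame-level acoustic features."""
    w = sal / (sal.sum() + 1e-8)
    return (w[:, None] * frames).sum(axis=0)

rng = np.random.default_rng(0)
T, D, H = 200, 40, 16                             # frames, feature dim, hidden dim
acoustic = rng.standard_normal((T, D))            # e.g. MFCCs per frame
w1, b1 = rng.standard_normal((D, H)), np.zeros(H)
w2, b2 = rng.standard_normal((H, 1)), np.zeros(1)
emb = speaker_feature(acoustic, saliency(acoustic, w1, b1, w2, b2))
print(emb.shape)  # (40,)
```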
Abstract:
The voice registration device 1X mainly includes a noise reproduction means 220X, a voice data acquisition means 200X, and a voice registration means 210X. The noise reproduction means 220X is configured to reproduce noise data during the time period in which a user performs voice input. The voice data acquisition means 200X is configured to acquire voice data based on the voice input. The voice registration means 210X is configured to register the voice data, or data generated based on the voice data, as data to be used for verification relating to the user's voice.
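A sketch of the three means with mocked audio I/O, since the abstract leaves the playback and capture hardware unspecified; the class and method names are assumptions, not the patent's API:

```python
# Illustrative sketch of the three means. Playback and capture are mocked
# with in-memory arrays; a real device would drive the speaker and the
# microphone concurrently.
import numpy as np

class NoiseReproduction:            # corresponds to noise reproduction means 220X
    def start(self, duration_s: float, sr: int = 16000) -> np.ndarray:
        """Reproduce masking noise for the duration of the voice input."""
        return np.random.default_rng(0).uniform(-0.1, 0.1, int(duration_s * sr))

class VoiceDataAcquisition:         # corresponds to voice data acquisition means 200X
    def record(self, duration_s: float, sr: int = 16000) -> np.ndarray:
        """Stand-in for capturing the user's voice input from a microphone."""
        return np.zeros(int(duration_s * sr))  # placeholder waveform

class VoiceRegistration:            # corresponds to voice registration means 210X
    def __init__(self):
        self.store = {}
    def register(self, user_id: str, voice: np.ndarray) -> None:
        """Register the voice data (or data derived from it) for verification."""
        self.store[user_id] = voice

noise, mic, registry = NoiseReproduction(), VoiceDataAcquisition(), VoiceRegistration()
_ = noise.start(3.0)                # noise plays while the user speaks
registry.register("user-1", mic.record(3.0))
```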
Abstract:
An authentication device is provided with: a plurality of attribute-dependent score calculation units, each calculating, for input data, an attribute-dependent score that depends on a prescribed attribute; an attribute-independent score calculation unit for calculating, for the input data, an attribute-independent score that does not depend on the attribute; an attribute estimation unit for performing attribute estimation on the input data; and a score integration unit for determining a score weight for each of the attribute-dependent scores and for the attribute-independent score using the result of the attribute estimation, and for calculating an output score using the attribute-dependent scores, the attribute-independent score, and the determined score weights.
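A minimal sketch of the score integration step, assuming the attribute estimation yields a posterior distribution over attributes and that the score weights are derived from it by a simple residual-mass rule (the rule itself is an assumption):

```python
# Sketch of score integration: attribute estimation yields a posterior over
# attributes; per-attribute score weights follow the posterior, and the
# attribute-independent score takes up the residual mass. The weighting
# rule and alpha value are illustrative assumptions.
import numpy as np

def integrate(dep_scores: np.ndarray, indep_score: float,
              attr_posterior: np.ndarray, alpha: float = 0.7) -> float:
    """Weighted sum of attribute-dependent and attribute-independent scores."""
    dep_weights = alpha * attr_posterior          # one weight per attribute
    indep_weight = 1.0 - dep_weights.sum()        # residual for the generic score
    return float(dep_weights @ dep_scores + indep_weight * indep_score)

dep = np.array([0.82, 0.40, 0.15])       # e.g. scores from per-attribute models
posterior = np.array([0.9, 0.08, 0.02])  # attribute estimation result
print(integrate(dep, indep_score=0.6, attr_posterior=posterior))
```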
Abstract:
A feature vector having high class identification capability is generated. A signal processing system provided with: a first generation unit for generating a first feature vector on the basis of one of time-series voice data, meteorological data, sensor data, and text data, or on the basis of a feature quantity of one of these; a weight calculation unit for calculating a weight for the first feature vector; a statistical amount calculation unit for calculating a weighted average vector and a weighted high-order statistical vector of second or higher order using the first feature vector and the weight; and a second generation unit for generating a second feature vector using the weighted high-order statistical vector.
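This reads like attentive statistics pooling; a minimal sketch, assuming softmax weights and a weighted standard deviation as the second-order statistical vector (both assumptions where the abstract is generic):

```python
# Sketch of the statistical amount calculation: a weighted average vector
# and a weighted second-order statistic (here, a weighted standard
# deviation) pooled from the first feature vectors. The softmax weight
# model is an illustrative assumption.
import numpy as np

def weighted_stats(feats: np.ndarray, scores: np.ndarray):
    """Weighted mean and weighted std over time from per-frame scores."""
    w = np.exp(scores - scores.max())
    w /= w.sum()                                   # softmax weights
    mean = (w[:, None] * feats).sum(axis=0)        # weighted average vector
    var = (w[:, None] * (feats - mean) ** 2).sum(axis=0)
    return mean, np.sqrt(var + 1e-8)               # second-order statistic

rng = np.random.default_rng(0)
first_vecs = rng.standard_normal((120, 64))        # first feature vectors
mean, std = weighted_stats(first_vecs, rng.standard_normal(120))
second_vec = np.concatenate([mean, std])           # second feature vector
print(second_vec.shape)  # (128,)
```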
Abstract:
This conversation analysis device comprises: a change detection unit that detects, for each of a plurality of conversation participants, each of a plurality of prescribed change patterns for emotional states on the basis of data corresponding to voices in a target conversation; an identification unit that identifies, from among the prescribed change patterns detected by the change detection unit, a beginning combination and an ending combination, which are prescribed combinations of the change patterns that satisfy prescribed positional conditions between the participants; and an interval determination unit that determines a specific emotional interval, representing a specific emotion of the participants in the target conversation, by determining its start time and end time on the basis of the time positions in the target conversation of the beginning combination and the ending combination identified by the identification unit.
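A heavily simplified sketch, reducing the change patterns to labeled "rise"/"fall" events per participant and the positional condition to a fixed time window (both assumptions), of how the beginning and ending combinations could determine an interval:

```python
# Simplified sketch of interval determination. A "beginning combination" is
# modeled as a rise event in two different participants within a window,
# an "ending combination" as the analogous fall pair. The window length
# and pattern labels are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    participant: str
    time: float
    pattern: str          # e.g. "rise" or "fall" of an emotional state

def find_pair(events, pattern, window):
    """First pair of matching events from different participants within window."""
    hits = [e for e in events if e.pattern == pattern]
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if a.participant != b.participant and b.time - a.time <= window:
                return a, b
    return None

def specific_emotion_interval(events, window=5.0):
    begin = find_pair(events, "rise", window)
    end = find_pair(events, "fall", window)
    if begin and end:
        return begin[0].time, end[1].time   # start and end of the interval
    return None

events = [ChangeEvent("A", 10.0, "rise"), ChangeEvent("B", 12.5, "rise"),
          ChangeEvent("A", 40.0, "fall"), ChangeEvent("B", 43.0, "fall")]
print(specific_emotion_interval(events))    # (10.0, 43.0)
```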
Abstract:
A voice detection apparatus includes: a beginning determination unit that determines a beginning of a voice segment containing voice that appears in a voice signal; an end determination unit that determines an end of the voice segment by determining whether or not the length of a non-voice segment appearing after the determined beginning is greater than or equal to a threshold; and a setting unit that sets the threshold on the basis of a property of a provisional voice segment starting from the beginning.
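A sketch of this endpointing logic, assuming per-frame voiced/non-voiced flags as input and using the provisional segment's length as the property that sets the threshold (the abstract does not specify which property is used):

```python
# Sketch of end determination: after the beginning of a voice segment is
# found, the segment ends once the running non-voice length reaches a
# threshold, and that threshold is set from a property of the provisional
# segment -- here its length so far, an assumed choice of property.
import numpy as np

def endpoint(voiced: np.ndarray, frame_s: float = 0.01,
             base_thr_s: float = 0.3, max_thr_s: float = 0.8):
    """Return (begin_idx, end_idx) of the first voice segment, or None."""
    begin = next((i for i, v in enumerate(voiced) if v), None)  # beginning
    if begin is None:
        return None
    silence = 0
    for i in range(begin, len(voiced)):
        silence = 0 if voiced[i] else silence + 1
        # Longer provisional segments tolerate longer pauses before ending.
        provisional_s = (i - begin) * frame_s
        thr = min(base_thr_s + 0.1 * provisional_s, max_thr_s)
        if silence * frame_s >= thr:
            return begin, i - silence
    return begin, len(voiced) - 1

flags = np.array([0]*10 + [1]*50 + [0]*20 + [1]*30 + [0]*100, dtype=bool)
print(endpoint(flags))  # (10, 109): the short pause does not end the segment
```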
Abstract:
A biometric authentication device is provided with: a replay unit for reproducing a sound; an ear authentication unit for acquiring a reverberation sound of the sound in an ear of a user to be authenticated, extracting an ear acoustic feature from the reverberation sound, and calculating an ear authentication score by comparing the extracted ear acoustic feature with an ear acoustic feature stored in advance; a voice authentication unit for extracting a speaker feature from an input voice of the user and calculating a voice authentication score by comparing the extracted speaker feature with a speaker feature stored in advance; and an authentication integration unit for outputting an authentication integration result calculated on the basis of the ear authentication score and the voice authentication score. After the sound is output into the ear, a recording unit receives the voice input of the user.
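A minimal sketch of the authentication integration unit as a weighted score sum compared against a decision threshold; the fusion rule, weight, and threshold values are illustrative assumptions:

```python
# Sketch of the authentication integration unit: the ear acoustic score and
# the voice (speaker) score are fused by a weighted sum and compared with a
# decision threshold. All constants are assumptions.
def integrate_authentication(ear_score: float, voice_score: float,
                             ear_weight: float = 0.5,
                             threshold: float = 0.6) -> bool:
    """Accept the user if the fused score clears the threshold."""
    fused = ear_weight * ear_score + (1.0 - ear_weight) * voice_score
    return fused >= threshold

# Example: strong ear match plus moderate voice match -> accepted.
print(integrate_authentication(ear_score=0.9, voice_score=0.5))  # True
```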
Abstract:
The information processing device is provided in a feature extraction block of a neural network. The information processing device acquires a group of local feature quantities constituting one unit of information and computes a weight corresponding to the degree of importance of each local feature quantity. Next, the information processing device computes a weighted statistic over the whole local feature quantity group using the computed weights, then transforms the local feature quantity group using the computed weighted statistic and outputs the result.
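A sketch of such a block, assuming softmax importance weights, a weighted mean as the weighted statistic, and centering on that mean as the transformation (each an illustrative assumption):

```python
# Sketch of the feature-extraction block: per-element importance weights
# yield a weighted statistic over the whole local feature group, which is
# then used to transform each local feature. Centering on the weighted
# mean is one possible transformation, chosen here as an assumption.
import numpy as np

def transform_block(local_feats: np.ndarray, w_proj: np.ndarray) -> np.ndarray:
    """local_feats: (N, D) group of local features forming one unit."""
    scores = local_feats @ w_proj                        # importance per feature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax weights
    stat = (weights[:, None] * local_feats).sum(axis=0)  # weighted statistic
    return local_feats - stat                            # transform using it

rng = np.random.default_rng(0)
group = rng.standard_normal((32, 64))                    # one unit of information
out = transform_block(group, rng.standard_normal(64))
print(out.shape)  # (32, 64)
```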