SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD AND PROGRAM

    Publication No.: US20230238002A1

    Publication date: 2023-07-27

    Application No.: US17999403

    Filing date: 2021-05-28

    Inventor: MASATO HIRANO

    Abstract: The aim is to improve, for example, the accuracy of voice recognition.
    A signal processing device includes: a single speech detection unit that detects whether one channel of an input voice signal is speech of a single speaker; a cluster information updating unit that updates cluster information based on a voice feature quantity when the input voice signal is speech of a single speaker; a voice segment detection unit that detects a speech segment of a target speaker based on the cluster information; and a voice extraction unit that extracts only the voice signal of the target speaker from a mixed voice signal containing the voice of the target speaker.
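
    The flow from cluster updating to segment detection described in the abstract above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not the patented method: the names `ClusterInfo` and `detect_segments`, the use of cosine similarity with a 0.8 threshold, the running-mean centroid update, and the toy 2-D feature vectors are all hypothetical. The single speech detection unit is replaced by a precomputed boolean flag per frame, and the voice extraction unit is not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ClusterInfo:
    """Hypothetical cluster store: one running-mean centroid per speaker."""

    def __init__(self, join_threshold=0.8):
        self.join_threshold = join_threshold  # assumed similarity needed to join a cluster
        self.centroids = []
        self.counts = []

    def update(self, feature):
        """Merge the feature into the most similar cluster, or start a new one."""
        best, best_sim = None, self.join_threshold
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(feature, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            self.centroids.append(list(feature))
            self.counts.append(1)
            return len(self.centroids) - 1
        n = self.counts[best]
        self.centroids[best] = [
            (c * n + f) / (n + 1) for c, f in zip(self.centroids[best], feature)
        ]
        self.counts[best] = n + 1
        return best


def detect_segments(features, centroid, threshold=0.8):
    """Return (start, end) frame ranges, end exclusive, that resemble the centroid."""
    segments, start = [], None
    for i, feature in enumerate(features):
        active = cosine(feature, centroid) >= threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(features)))
    return segments


# Toy frames: (feature vector, flag standing in for the single speech detector).
frames = [
    ([1.0, 0.1], True),
    ([0.9, 0.0], True),
    ([0.1, 1.0], True),
    ([0.7, 0.7], False),  # overlapping speech: not used to update clusters
    ([1.0, 0.05], True),
]

info = ClusterInfo()
for feature, is_single in frames:
    if is_single:
        info.update(feature)

# Assume the first cluster belongs to the target speaker.
segments = detect_segments([f for f, _ in frames], info.centroids[0])
```

    Updating clusters only on frames flagged as single-speaker keeps overlapping speech from contaminating the centroids, which is the role the abstract assigns to the single speech detection unit.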

    Channel-compensated low-level features for speaker recognition

    Publication No.: US11657823B2

    Publication date: 2023-05-23

    Application No.: US17107496

    Filing date: 2020-11-30

    Abstract: A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed-forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.
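
    The training loop outlined in the abstract above (degrade the signal, compute features from the degraded copy, compare them with handcrafted features of the clean signal, update weights until a threshold loss is met) can be sketched in plain Python. This is a toy under stated assumptions, not the patented system: a single three-tap 1-D convolution stands in for the CNN, a moving average stands in for the handcrafted features, the channel simulator is just attenuation plus Gaussian noise, and the function names, learning rate, and threshold are all hypothetical.

```python
import math
import random


def degrade(signal, rng, gain=0.5, noise_std=0.02):
    """Hypothetical channel simulator: attenuate, then add Gaussian noise."""
    return [gain * s + rng.gauss(0.0, noise_std) for s in signal]


def handcrafted(signal, width=3):
    """Stand-in handcrafted feature: moving average over `width` samples."""
    return [sum(signal[i:i + width]) / width for i in range(len(signal) - width + 1)]


def conv1d(signal, kernel):
    """Valid-mode 1-D convolution; the only 'layer' in this toy model."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def mse(pred, target):
    """Mean squared error between two equal-length feature sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train(clean, steps=500, lr=0.5, threshold=2e-3, seed=0):
    """Gradient descent on the kernel until the loss meets the threshold."""
    rng = random.Random(seed)
    kernel = [0.0, 0.0, 0.0]
    target = handcrafted(clean, width=len(kernel))  # features of the clean signal
    history = []
    for _ in range(steps):
        noisy = degrade(clean, rng)      # a fresh degradation every step
        pred = conv1d(noisy, kernel)     # features from the degraded signal
        loss = mse(pred, target)
        history.append(loss)
        if loss <= threshold:            # assumed stopping rule
            break
        n = len(pred)
        for j in range(len(kernel)):
            # d(MSE)/d(kernel[j]); pred was computed before any update this step
            grad = sum(2 * (pred[i] - target[i]) * noisy[i + j] for i in range(n)) / n
            kernel[j] -= lr * grad
    return kernel, history


clean = [math.sin(2 * math.pi * i / 16) for i in range(64)]
kernel, history = train(clean)
```

    Because the target features come from the clean signal while the input is degraded, the learned kernel has to undo the simulated channel's attenuation as well as reproduce the feature, which is the sense in which the output is channel-compensated.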

    Channel-Compensated Low-Level Features For Speaker Recognition

    Publication No.: US20180082692A1

    Publication date: 2018-03-22

    Application No.: US15709024

    Filing date: 2017-09-19

    Abstract: A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed-forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.