NEURAL NETWORKS FOR SPEAKER VERIFICATION
    Published invention application

    Publication No.: EP4084000A3

    Publication date: 2022-11-09

    Application No.: EP22179382.1

    Filing date: 2016-07-27

    Applicant: Google LLC

    Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
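
    The training-sample structure described in the abstract above can be sketched as follows. This is only an illustrative sketch, not the method claimed in the patent: the names `TrainingSample` and `make_sample`, the use of plain float lists as "data that characterizes an utterance", and the sampling strategy are all assumptions introduced here.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for "data that characterizes an utterance".
Features = List[float]


@dataclass
class TrainingSample:
    first_utterance: Features
    second_utterances: List[Features]
    matching: bool  # True = matching speakers sample, False = non-matching


def make_sample(
    utterances_by_speaker: Dict[str, List[Features]],
    speaker: str,
    other_speaker: str,
    n_second: int,
    matching: bool,
    rng: random.Random,
) -> TrainingSample:
    """Build one labeled sample (assumed sampling strategy).

    The second utterances always come from `speaker`. The first utterance
    comes from `speaker` for a matching sample and from `other_speaker`
    for a non-matching sample.
    """
    pool = utterances_by_speaker[speaker]
    if matching:
        # Draw distinct utterances so the first is not reused as a second.
        chosen = rng.sample(pool, n_second + 1)
        first, seconds = chosen[0], chosen[1:]
    else:
        first = rng.choice(utterances_by_speaker[other_speaker])
        seconds = rng.sample(pool, n_second)
    return TrainingSample(first, seconds, matching)


# Toy data: three utterances for speaker "a", two for speaker "b".
data = {
    "a": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "b": [[0.9, 0.8], [0.7, 0.6]],
}
rng = random.Random(0)
positive = make_sample(data, "a", "b", n_second=2, matching=True, rng=rng)
negative = make_sample(data, "a", "b", n_second=2, matching=False, rng=rng)
```

    In this sketch a network would be trained to predict the `matching` label from the first utterance and the second utterances; how the network consumes the sample is not specified in the abstract and is left out here.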

    METHOD AND SYSTEM FOR TELECONFERENCE ACTUAL PARTICIPANT RECOGNITION

    Publication No.: EP4040435A1

    Publication date: 2022-08-10

    Application No.: EP21305164.2

    Filing date: 2021-02-05

    Applicant: ALE International

    IPC classification: G10L17/18

    Abstract: A method and system for detecting and recognizing an actual participant (607) as an active speaker in a teleconference (606). An encoder is trained before the teleconference takes place, and a database (602) of reference vectors representing the voices of candidate participants is created. While the teleconference takes place, the reference vectors are compared with vectors representing the voice streams of actual participants. The encoder may, for example, be a Convolutional Neural Network (CNN).
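
    The comparison step described in the abstract above can be sketched as follows. The abstract only says that vectors are compared; cosine similarity, the threshold value, and the names `cosine_similarity` and `recognize` are assumptions made for illustration, and the encoder that produces the vectors is not modeled.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(
    stream_vector: List[float],
    reference_db: Dict[str, List[float]],
    threshold: float = 0.8,  # assumed value, not taken from the patent
) -> Optional[str]:
    """Return the candidate whose reference vector best matches the stream.

    Returns None when no candidate reaches the similarity threshold.
    """
    best_name, best_score = None, threshold
    for name, reference in reference_db.items():
        score = cosine_similarity(stream_vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy reference database built before the teleconference.
reference_db = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
close_to_alice = recognize([0.9, 0.1], reference_db)
ambiguous = recognize([1.0, 1.0], reference_db)
```

    Here `close_to_alice` resolves to `"alice"`, while the vector lying halfway between the two references falls below the assumed threshold and yields `None`.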

    SIGNAL EXTRACTION SYSTEM, SIGNAL EXTRACTION LEARNING METHOD, AND SIGNAL EXTRACTION LEARNING PROGRAM

    Publication No.: EP3979240A1

    Publication date: 2022-04-06

    Application No.: EP19930251.4

    Filing date: 2019-05-28

    Applicant: NEC Corporation

    Abstract: A neural network input unit 81 inputs a neural network that combines a first network and a second network. The first network has a layer for inputting an anchor signal belonging to a predetermined class and a mixed signal including a target signal belonging to the class, and a layer for outputting, as an estimation result, a reconstruction mask indicating the time-frequency domain in which the target signal is present in the mixed signal. The second network has a layer for inputting the target signal extracted by applying the reconstruction mask to the mixed signal, and a layer for outputting a result obtained by classifying the input target signal into a predetermined class. A reconstruction mask estimation unit 82 applies the anchor signal and the mixed signal to the first network to estimate the reconstruction mask of the class to which the anchor signal belongs. A signal classification unit 83 applies the estimated reconstruction mask to the mixed signal to extract the target signal, and applies the extracted target signal to the second network to classify the target signal into the class.
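
    The extraction step performed by the signal classification unit can be sketched as follows. This assumes that the mixed signal is represented as a time-frequency magnitude grid and that applying the mask means element-wise multiplication; neither detail is stated in the abstract. The function name `apply_mask` is hypothetical, and the two networks themselves are not modeled.

```python
from typing import List

Grid = List[List[float]]  # rows = time frames, columns = frequency bins (assumed layout)


def apply_mask(mixed: Grid, mask: Grid) -> Grid:
    """Multiply each time-frequency cell of the mixed signal by its mask value.

    Mask values near 1 keep a cell (target present); values near 0 suppress it.
    """
    if len(mixed) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(mixed, mask)
    ):
        raise ValueError("mixed signal and mask must have the same shape")
    return [
        [value * weight for value, weight in zip(row, mask_row)]
        for row, mask_row in zip(mixed, mask)
    ]


# Toy 2x2 example: the mask would come from the first network in the abstract.
mixed_signal = [[1.0, 2.0], [3.0, 4.0]]
estimated_mask = [[1.0, 0.0], [0.5, 1.0]]
target_estimate = apply_mask(mixed_signal, estimated_mask)
```

    Under these assumptions, `target_estimate` is what would be passed to the second network for classification.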