MULTI-TASK LEARNING FOR PERSONALIZED KEYWORD SPOTTING

    Publication number: US20230298592A1

    Publication date: 2023-09-21

    Application number: US18153932

    Filing date: 2023-01-12

    CPC classification number: G10L17/14 G10L17/18 G10L17/04 G10L17/24

    Abstract: Systems and techniques are provided for processing audio data. For example, the systems and techniques can be used for personalized keyword spotting through multi-task learning (PK-MTL). A process can include obtaining an audio sample, generating a representation of a keyword based on the audio sample, and generating a representation of a speaker based on the audio sample. The speaker can be associated with the keyword. A first similarity score can be determined based on a reference representation and one or more of the representation of the keyword and a representation of the speaker. The reference representation can be associated with one or more of the keyword and the speaker. A keyword spotting (KWS) output can be generated based on analyzing the first similarity score against at least a first threshold, wherein the KWS output accepts or rejects the audio sample as including a target keyword.
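
The accept/reject step described in the abstract can be illustrated with a minimal sketch. It is not the method claimed in the patent: the abstract does not specify the similarity measure, how the keyword and speaker representations are combined, or the threshold value. The sketch assumes cosine similarity, combines the two representations by simple concatenation, and uses a hypothetical threshold of 0.7. The function names `cosine_similarity` and `kws_decision` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def kws_decision(keyword_emb, speaker_emb, reference_emb, threshold=0.7):
    """Accept or reject an audio sample from its two representations.

    Assumption: the keyword and speaker representations are concatenated
    and compared with a reference representation of the same total length.
    The threshold of 0.7 is a placeholder, not a value from the source.
    """
    combined = list(keyword_emb) + list(speaker_emb)
    score = cosine_similarity(combined, reference_emb)
    accepted = score >= threshold
    return accepted, score


# Toy vectors standing in for encoder outputs.
accepted, score = kws_decision([1.0, 0.0], [0.0, 1.0], [1.0, 0.0, 0.0, 1.0])
rejected, low_score = kws_decision([1.0, 0.0], [0.0, 1.0], [0.0, 1.0, 1.0, 0.0])
```

In this toy case the first sample matches the reference in both the keyword and speaker parts and is accepted, while the second is orthogonal to the reference and is rejected.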

    DUMMY PROTOTYPICAL NETWORKS FOR FEW-SHOT OPEN-SET KEYWORD SPOTTING

    Publication number: US20230298572A1

    Publication date: 2023-09-21

    Application number: US18062976

    Filing date: 2022-12-07

    CPC classification number: G10L15/16 G06F18/22 G10L2015/088

    Abstract: Systems and techniques are provided for processing audio data. For example, a dummy prototypical network may be used to perform few-shot open-set keyword spotting (FSOS-KWS). A process can include determining one or more prototype representations based on a plurality of support samples associated with one or more classes. Each prototype representation may be associated with one of the class(es). A dummy prototype representation can be determined in a same learned metric space as the prototype representations. One or more distance metrics can be determined for each query sample of one or more query samples. The distance metrics may be based on the prototype representations and the dummy prototype representation. Each query sample can be classified based on the distance metrics. Each query sample may be classified into one of the class(es) associated with the prototype representations or into an open-set class associated with the dummy prototype representation.
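
The classification flow in the abstract can be sketched as follows, under stated assumptions. Each class prototype is taken as the mean of its support embeddings, which is the usual prototypical-network choice but is not confirmed by the abstract. The distance metric is assumed to be Euclidean. The dummy prototype is passed in as a fixed vector, since the abstract does not say how it is determined. All names (`build_prototypes`, `classify`, `OPEN_SET`) are hypothetical.

```python
import math

OPEN_SET = "open-set"  # label used here for the dummy prototype's class


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def euclidean(a, b):
    """Euclidean distance (assumed metric for the learned space)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, prototypes, dummy_prototype):
    """Return the label of the nearest prototype, dummy included.

    A query closest to the dummy prototype is assigned to the open-set
    class instead of any known keyword class.
    """
    candidates = dict(prototypes)
    candidates[OPEN_SET] = dummy_prototype
    return min(candidates, key=lambda label: euclidean(query, candidates[label]))


# Toy 2-D embeddings standing in for encoder outputs.
support = {
    "yes": [[1.0, 0.0], [0.8, 0.2]],
    "no": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
dummy = [-1.0, -1.0]  # placeholder; how the dummy is obtained is not given

known = classify([1.0, 0.0], prototypes, dummy)
unknown = classify([-0.9, -0.8], prototypes, dummy)
```

Here the first query falls nearest the "yes" prototype, while the second lies nearest the dummy prototype and is therefore assigned to the open-set class.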
