-
Publication No.: US20250061917A1
Publication Date: 2025-02-20
Application No.: US18235372
Filing Date: 2023-08-18
Applicant: Google LLC
Inventor: Josh Belanich, Taesik Gong, Krishna Somandepalli, Brian Eoff, Brendan Wesley Jou, Arsha Nagrani
Abstract: The technology relates to enhancing speech emotion recognition models with methods that enable the use of unlabeled data by inferring weak emotion labels. The weak labels are inferred by pre-trained large language models and used for weakly-supervised learning. For inferring weak labels constrained to a taxonomy, a textual entailment approach selects the emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. The system may employ a method that generates, by one or more processors, a text transcript for a snippet of input speech, and then applies the text transcript to a pre-trained language model. The system can generate, using the pre-trained language model according to an engineered prompt and a predetermined taxonomy, a textual entailment from the text transcript. Based on this, the system may generate, by the one or more processors using the textual entailment, a predicted emotion corresponding to the input speech.
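The label-selection step in this abstract (score each emotion in the taxonomy by entailment against the transcript, then keep the highest-scoring one) can be sketched roughly as follows. This is an illustrative sketch only, not the claimed implementation. The taxonomy, the prompt template, the function names, and the keyword-based `toy_entailment_score` are all hypothetical; in practice the scorer would be a pre-trained language model, and the transcript would come from an automatic speech recognition system.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical taxonomy and prompt template (assumptions, not from the source).
TAXONOMY = ("joy", "anger", "sadness", "neutral")
PROMPT = "The speaker of this sentence feels {label}."

EntailmentScorer = Callable[[str, str], float]


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entailment_score: EntailmentScorer,
    prompt: str = PROMPT,
) -> Tuple[str, Dict[str, float]]:
    """Return the taxonomy label whose hypothesis is most entailed by the transcript."""
    scores = {
        label: entailment_score(transcript, prompt.format(label=label))
        for label in taxonomy
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-in for a language-model entailment scorer (assumption for the demo).
_CUES = {
    "joy": {"great", "love", "happy"},
    "anger": {"furious", "hate", "outrageous"},
    "sadness": {"miss", "lost", "cry"},
}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Score by cue-word overlap; labels without cues get a small constant."""
    label = hypothesis.rstrip(".").split()[-1]
    cues = _CUES.get(label)
    if not cues:
        return 0.1
    words = set(premise.lower().replace(",", " ").replace(".", " ").split())
    return len(words & cues) / len(cues)


if __name__ == "__main__":
    label, scores = infer_weak_label(
        "I love this, it is great.", TAXONOMY, toy_entailment_score
    )
    print(label, scores)
```

Passing the scorer in as a callable keeps the selection logic independent of whichever language model supplies the entailment scores.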
-
Publication No.: US20230297852A1
Publication Date: 2023-09-21
Application No.: US18007379
Filing Date: 2021-07-29
Applicant: Google LLC
Inventor: Li Zhang, Andrew Gerald Howard, Brendan Wesley Jou, Yukun Zhu, Mingda Zhang, Andrey Zhmoginov
IPC: G06N5/022
CPC classification number: G06N5/022
Abstract: Example implementations of the present disclosure combine efficient model design and dynamic inference. With a standalone lightweight model, unnecessary computation on easy examples is avoided, and the information extracted by the lightweight model also guides the synthesis of a specialist network from the basis models. Extensive experiments on ImageNet show that a proposed example BasisNet is particularly effective for image classification, and a BasisNet-MV3 achieves 80.3% top-1 accuracy with 290M MAdds without early termination.
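The two ideas in this abstract, skipping further computation on easy examples and synthesizing a specialist from basis models, can be sketched as below. This is a simplified illustration under stated assumptions, not the disclosed architecture. It assumes the lightweight model emits class probabilities plus one combination coefficient per basis model, that the specialist is a coefficient-weighted sum of basis weights, and that each model is a single linear layer. The function names and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # weights of a linear layer: [num_classes][num_features]


def synthesize_specialist(bases: Sequence[Matrix], coeffs: Sequence[float]) -> Matrix:
    """Combine basis weight matrices into one specialist via a weighted sum."""
    if len(bases) != len(coeffs):
        raise ValueError("one coefficient per basis model is required")
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(a * b[r][c] for a, b in zip(coeffs, bases)) for c in range(cols)]
        for r in range(rows)
    ]


def classify(
    features: Sequence[float],
    light_probs: Sequence[float],
    coeffs: Sequence[float],
    bases: Sequence[Matrix],
    threshold: float = 0.9,
) -> Tuple[int, str]:
    """Return (predicted class, which model produced the prediction)."""
    best = max(range(len(light_probs)), key=lambda k: light_probs[k])
    # Early termination: a confident lightweight prediction skips the specialist.
    if light_probs[best] >= threshold:
        return best, "lightweight"
    weights = synthesize_specialist(bases, coeffs)
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(logits)), key=lambda k: logits[k]), "specialist"


if __name__ == "__main__":
    bases = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
    # Easy example: the lightweight model is confident, so it answers directly.
    print(classify([1.0, 0.0], [0.95, 0.05], [0.25, 0.75], bases))
    # Hard example: the specialist is synthesized from the basis models.
    print(classify([1.0, 0.0], [0.6, 0.4], [0.25, 0.75], bases))
```

Setting `threshold` above 1.0 disables the early exit, which corresponds to the "without early termination" setting reported in the abstract.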
-