Patent search ap:("Lemon Inc." OR "Beijing Zitiao Network Technology Co., Ltd.") AND inv:"Zongyu Yin"

1.

Invention application
MULTI-MODAL ENCODER PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM (status: in force)

Publication No.: US20250078814A1

Publication date: 2025-03-06

Application No.: US18819280

Filing date: 2024-08-29

Applicant: Lemon Inc.; Beijing Zitiao Network Technology Co., Ltd.

Inventor: Dong Guo, Zihao He, Weituo Hao, Xuchen Song, Zongyu Yin, Jingsong Gao, Wei Tsung Lu, Junyu Dai

IPC: G10L15/06, G06F40/126, G10L25/30

Abstract: The present disclosure provides a multi-modal encoder processing method and apparatus, a computer device, and a storage medium. The method includes: acquiring a pair of mask samples to be processed, the pair including a text sample and an audio sample associated with each other, at least one of which is masked; generating, based on a multi-modal encoder, a text encoding feature of the text sample and an audio encoding feature of the audio sample, where a linear spectrum feature of the audio sample is fused into the text encoding feature and a linear word feature of the text sample is fused into the audio encoding feature; and predicting the masked information from the text encoding feature and the audio encoding feature, and correcting the multi-modal encoder based on the accuracy of the predicted mask information.
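The steps named in the abstract (masking a text/audio pair, fusing a linear feature of one modality into the other, and scoring the mask prediction) can be illustrated with a minimal pure-Python sketch. This is one possible reading, not the claimed implementation: the `[MASK]` token, the zeroing of masked audio frames, the additive fusion, and every function name below are assumptions introduced here for illustration, and the text sample is assumed to be non-empty.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the patent


def mask_pair(text_tokens, audio_frames, mask_prob=0.15, seed=0):
    """Mask a text/audio pair; guarantees at least one masked position overall."""
    rng = random.Random(seed)
    masked_text = list(text_tokens)
    masked_audio = [list(frame) for frame in audio_frames]
    text_targets, audio_targets = {}, {}
    for i, tok in enumerate(text_tokens):
        if rng.random() < mask_prob:
            text_targets[i], masked_text[i] = tok, MASK
    for j, frame in enumerate(audio_frames):
        if rng.random() < mask_prob:
            # Assumption: a masked audio frame is replaced by zeros.
            audio_targets[j], masked_audio[j] = list(frame), [0.0] * len(frame)
    if not text_targets and not audio_targets:
        # Nothing was masked by chance, so force one masked text token.
        i = rng.randrange(len(text_tokens))
        text_targets[i], masked_text[i] = text_tokens[i], MASK
    return masked_text, masked_audio, text_targets, audio_targets


def linear(weights, vec):
    """Plain matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse(own_feature, other_feature, weights):
    """Assumed fusion: add a linear projection of the other modality's feature."""
    projected = linear(weights, other_feature)
    return [a + b for a, b in zip(own_feature, projected)]


def mask_accuracy(predicted, targets):
    """Fraction of masked text positions whose token was predicted correctly."""
    if not targets:
        return 0.0
    hits = sum(1 for pos, tok in targets.items() if predicted.get(pos) == tok)
    return hits / len(targets)
```

In a training loop, the value returned by `mask_accuracy` (or a differentiable loss standing in for it) would drive the correction of the encoder parameters; that update step is omitted here because the abstract does not specify it.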
