1.
Publication Number: US20250133337A1
Publication Date: 2025-04-24
Application Number: US18911197
Application Date: 2024-10-09
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Zehong Zheng, Dongyan Huang, Xianjie Yang, Wan Ding
IPC: H04R1/40
Abstract: A sound source localization method includes: obtaining a first audio frame and at least two second audio frames, wherein the first audio frame and the at least two second audio frames are synchronously sampled, the first audio frame is obtained by processing sound signals collected by a first microphone, and the at least two second audio frames are obtained by processing sound signals collected by second microphones; calculating a time delay estimation between the first audio frame and each of the at least two second audio frames; and determining a sound source orientation corresponding to the first audio frame and the at least two second audio frames through a preset time delay-orientation lookup table according to the time delay estimation between the first audio frame and each of the at least two second audio frames.
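The abstract's two computational steps (per-pair time delay estimation, then a lookup-table mapping from delays to an orientation) can be illustrated with a minimal Python sketch. GCC-PHAT is assumed as the delay estimator and a nearest-neighbour search over a hypothetical delay_orientation_table as the lookup; neither choice is stated in the abstract.

```python
import numpy as np

def gcc_phat_delay(first_frame, second_frame, fs):
    """Estimate the time delay (in seconds) between two synchronously
    sampled audio frames via GCC-PHAT cross-correlation (an assumed
    estimator; the abstract does not name one)."""
    n = len(first_frame) + len(second_frame)
    X = np.fft.rfft(first_frame, n=n)
    Y = np.fft.rfft(second_frame, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                      # PHAT weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def localize(first_frame, second_frames, fs, delay_orientation_table):
    """Map the vector of delays (first microphone vs. each second
    microphone) to an orientation by nearest-neighbour match against a
    precomputed table of (delay_vector, azimuth_degrees) entries."""
    delays = np.array([gcc_phat_delay(first_frame, f, fs)
                       for f in second_frames])
    best = min(delay_orientation_table,
               key=lambda entry: np.linalg.norm(np.asarray(entry[0]) - delays))
    return best[1]
```

The table itself would be precomputed from the microphone array geometry, one delay vector per candidate orientation, which is what the "preset time delay-orientation lookup table" in the abstract suggests.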
-
2.
Publication Number: US20240428493A1
Publication Date: 2024-12-26
Application Number: US18736552
Application Date: 2024-06-07
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding, Dongyan Huang, Xianjie Yang, Zehong Zheng, Penghui Li
Abstract: A method for synthesizing a talking head video includes: obtaining speech data to be synthesized and observation data, wherein the observation data is data obtained through observation other than the speech data; performing feature extraction on the speech data to obtain speech features corresponding to the speech data, and performing feature extraction on the observation data to obtain non-speech features corresponding to the observation data; performing temporal modeling on the speech features and first non-speech features to obtain low-dimensional representations, wherein the first non-speech features are non-speech features that are sensitive to temporal changes; and performing video synthesis based on the low-dimensional representations and second non-speech features, wherein the second non-speech features are non-speech features insensitive to temporal changes.
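The abstract splits the features into a temporally modeled path (speech features plus time-sensitive non-speech features) and a conditioning path (time-insensitive non-speech features) that enters only at synthesis. Below is a minimal PyTorch sketch of that data flow, not the patented model: the GRU, the MLP decoder, the feature dimensions, and the 64x64 output frames are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TalkingHeadSynthesizer(nn.Module):
    """Illustrative pipeline: speech features and temporally sensitive
    non-speech features pass through a temporal model; the resulting
    low-dimensional representation is fused with temporally insensitive
    non-speech features to decode video frames."""

    def __init__(self, speech_dim, dyn_dim, static_dim, latent_dim=64):
        super().__init__()
        # temporal modeling over concatenated speech + time-sensitive features
        self.temporal = nn.GRU(speech_dim + dyn_dim, latent_dim, batch_first=True)
        # frame decoder conditioned on time-insensitive features
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + static_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 64 * 64 * 3),   # toy 64x64 RGB frames
        )

    def forward(self, speech_feats, dyn_feats, static_feats):
        # speech_feats: (B, T, speech_dim), dyn_feats: (B, T, dyn_dim)
        # static_feats: (B, static_dim), e.g. identity/appearance features
        low_dim, _ = self.temporal(torch.cat([speech_feats, dyn_feats], dim=-1))
        static = static_feats.unsqueeze(1).expand(-1, low_dim.size(1), -1)
        frames = self.decoder(torch.cat([low_dim, static], dim=-1))
        return frames.view(*frames.shape[:2], 3, 64, 64)
```

The point of the split mirrors the abstract: only features whose value changes frame to frame go through temporal modeling, while slowly varying or constant observations are injected once per frame at decoding time.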
-