-
1.
Publication No.: US10818311B2
Publication Date: 2020-10-27
Application No.: US16632373
Filing Date: 2018-11-14
Inventor: Jiaming Xu, Jing Shi, Bo Xu
IPC: G10L21/00, G10L21/0272, G06F17/16, G06N3/04, G10L25/30
Abstract: An auditory selection method based on a memory and attention model, including: step S1, encoding an original speech signal into a time-frequency matrix; step S2, encoding and transforming the time-frequency matrix to convert it into a speech vector; step S3, using a long-term memory unit to store a speaker and the speech vector corresponding to the speaker; and step S4, obtaining the speech vector corresponding to a target speaker and separating a target speech from the original speech signal through an attention selection model. A storage device stores a plurality of programs configured to be loaded by a processor to execute the auditory selection method based on the memory and attention model. A processing unit includes the processor and the storage device.
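The steps above map naturally onto an encoder, a speaker-keyed memory, and an attentive mask. Below is a minimal PyTorch sketch of that flow, not the patented implementation: the layer sizes, the dict-based memory, and the sigmoid attention mask are all illustrative assumptions.

```python
# Minimal sketch of the described pipeline; shapes and mechanisms are assumptions.
import torch
import torch.nn as nn

class AuditorySelector(nn.Module):
    def __init__(self, freq_bins=129, hidden=128):
        super().__init__()
        # Step S2: encode the time-frequency matrix into speech vectors
        self.encoder = nn.LSTM(freq_bins, hidden, batch_first=True)
        # Step S3: long-term memory mapping speaker id -> speech vector
        self.memory = {}

    def remember(self, speaker_id, tf_matrix):
        _, (h, _) = self.encoder(tf_matrix)          # h: (1, B, hidden)
        self.memory[speaker_id] = h[-1].mean(0)      # store one speaker vector

    def forward(self, tf_matrix, speaker_id):
        # Step S4: attention between mixture frames and the target speaker vector
        frames, _ = self.encoder(tf_matrix)          # (B, T, hidden)
        query = self.memory[speaker_id]              # (hidden,)
        scores = torch.einsum("bth,h->bt", frames, query)
        mask = torch.sigmoid(scores).unsqueeze(-1)   # soft time-frequency mask
        return tf_matrix * mask                      # masked spectrogram

# Step S1 (STFT) would be performed beforehand, e.g. with torch.stft.
mix = torch.randn(1, 100, 129)                       # (batch, frames, freq_bins)
model = AuditorySelector()
model.remember("spk0", mix)
target = model(mix, "spk0")
```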
-
2.
Publication No.: US11487950B2
Publication Date: 2022-11-01
Application No.: US16641256
Filing Date: 2019-04-19
Inventor: Jiaming Xu, Yiqun Yao, Bo Xu
Abstract: The method of the present disclosure includes: obtaining an image to be processed and a question text corresponding to the image; using an optimized dialogue model to encode the image into an image vector and encode the question text into a question vector; generating a state vector based on the image vector and the question vector; and decoding the state vector to obtain and output an answer text. A discriminator is introduced in the optimization process of the dialogue model. The dialogue model and the discriminator are alternately optimized until the value of a hybrid loss function of the dialogue model and the value of a loss function of the discriminator no longer decrease or fall below a preset value, thereby completing the optimization process.
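The alternating optimization described here follows the familiar adversarial pattern: update the dialogue model on a hybrid (supervised plus adversarial) loss, then update the discriminator. A schematic PyTorch sketch follows; the toy architectures, the MSE supervised term, and the loss weight are assumptions, not the disclosed design.

```python
# Schematic of the alternating optimization loop; architectures are placeholders.
import torch
import torch.nn as nn

dialogue_model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 512))
discriminator = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(dialogue_model.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def step(state_vec, real_answer_vec, adv_weight=0.5):
    # 1) Update the dialogue model on a hybrid loss:
    #    supervised reconstruction + adversarial term from the discriminator.
    fake = dialogue_model(state_vec)
    sup_loss = nn.functional.mse_loss(fake, real_answer_vec)
    adv_loss = bce(discriminator(fake), torch.ones(fake.size(0), 1))
    hybrid = sup_loss + adv_weight * adv_loss
    opt_g.zero_grad(); hybrid.backward(); opt_g.step()

    # 2) Update the discriminator to tell real answers from generated ones.
    d_loss = bce(discriminator(real_answer_vec), torch.ones(real_answer_vec.size(0), 1)) \
           + bce(discriminator(fake.detach()), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    return hybrid.item(), d_loss.item()

# Alternate the two updates until neither loss keeps decreasing
# (or both fall below a preset threshold), as the abstract describes.
s, a = torch.randn(4, 512), torch.randn(4, 512)
g_loss, d_loss = step(s, a)
```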
-
3.
Publication No.: US11978470B2
Publication Date: 2024-05-07
Application No.: US17980473
Filing Date: 2022-11-03
Inventor: Jiaming Xu, Jian Cui, Bo Xu
IPC: G10L21/0272, G10L17/02, G10L17/04, G10L17/06, G10L21/028, H04S1/00
CPC classification number: G10L21/028, G10L17/02, G10L17/04, G10L17/06, H04S1/007
Abstract: Disclosed are a target speaker separation system, an electronic device and a storage medium. The system includes: first, performing joint unified modeling on a plurality of cues based on a masked pre-training strategy, to boost the inference capability of the model for missing cues and enhance the representation accuracy of disturbed cues; and second, constructing a hierarchical cue modulation module. A spatial cue is introduced into a primary cue modulation module for directional enhancement of a speaker's speech; in an intermediate cue modulation module, the speech of the speaker is enhanced on the basis of the temporal coherence between a dynamic cue and an auditory signal component; a steady-state cue is introduced into an advanced cue modulation module for selective filtering; and finally, the supervised learning capability on simulated data and the unsupervised learning effect on real mixed data are fully utilized.
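The hierarchical cue modulation can be pictured as three conditioning stages applied in sequence to the mixture representation. The sketch below uses FiLM-style scale-and-shift conditioning as a stand-in; the cue dimensions and the modulation mechanism itself are assumptions, and the masked pre-training stage is omitted.

```python
# Minimal sketch of hierarchical cue modulation (FiLM-style conditioning);
# cue dimensions and the modulation mechanism are assumptions.
import torch
import torch.nn as nn

class CueModulation(nn.Module):
    """Modulate an audio representation with one cue (scale and shift)."""
    def __init__(self, feat_dim, cue_dim):
        super().__init__()
        self.scale = nn.Linear(cue_dim, feat_dim)
        self.shift = nn.Linear(cue_dim, feat_dim)

    def forward(self, feats, cue):                        # feats: (B, T, F)
        return feats * self.scale(cue).unsqueeze(1) + self.shift(cue).unsqueeze(1)

class HierarchicalSeparator(nn.Module):
    def __init__(self, feat_dim=256, cue_dim=64):
        super().__init__()
        self.primary = CueModulation(feat_dim, cue_dim)       # spatial cue
        self.intermediate = CueModulation(feat_dim, cue_dim)  # dynamic cue
        self.advanced = CueModulation(feat_dim, cue_dim)      # steady-state cue

    def forward(self, feats, spatial, dynamic, steady):
        x = self.primary(feats, spatial)        # directional enhancement
        x = self.intermediate(x, dynamic)       # temporal-coherence enhancement
        return self.advanced(x, steady)         # selective filtering

feats = torch.randn(2, 100, 256)
cues = [torch.randn(2, 64) for _ in range(3)]
out = HierarchicalSeparator()(feats, *cues)
```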
-
4.
Publication No.: US20230335148A1
Publication Date: 2023-10-19
Application No.: US18026960
Filing Date: 2021-08-24
Inventor: Henghui Lu, Lei Qin, Peng Zhang, Jiaming Xu, Bo Xu
IPC: G10L21/0208, G06V20/40, G10L21/055
CPC classification number: G10L21/0208, G06V20/46, G10L21/055
Abstract: A speech separation method is provided, relating to the field of speech processing. The method includes: obtaining, while a user is speaking, audio information including the user's speech and video information including the user's face; encoding the audio information to obtain a mixed acoustic feature; extracting a visual semantic feature of the user from the video information; inputting the mixed acoustic feature and the visual semantic feature into a preset visual speech separation network to obtain an acoustic feature of the user; and decoding the acoustic feature of the user to obtain a speech signal of the user. An electronic device, a chip, and a computer-readable storage medium are also provided.
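The flow is: encode audio, encode visual semantics, fuse, predict a mask, decode. A compact PyTorch sketch of that audio-visual fusion is given below; the linear encoders, the GRU fusion, and the mask-based decoding are illustrative assumptions rather than the disclosed network.

```python
# Schematic of the audio-visual separation flow; encoders and fusion are assumptions.
import torch
import torch.nn as nn

class AVSeparator(nn.Module):
    def __init__(self, audio_dim=257, visual_dim=512, hidden=256):
        super().__init__()
        self.audio_enc = nn.Linear(audio_dim, hidden)    # mixed acoustic feature
        self.visual_enc = nn.Linear(visual_dim, hidden)  # visual semantic feature
        self.fusion = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, audio_dim)      # user's acoustic feature

    def forward(self, mix_spec, face_feats):             # (B,T,257), (B,T,512)
        a = self.audio_enc(mix_spec)
        v = self.visual_enc(face_feats)
        h, _ = self.fusion(torch.cat([a, v], dim=-1))
        mask = torch.sigmoid(self.decoder(h))            # per time-frequency mask
        return mix_spec * mask                           # target speech spectrum

mix = torch.randn(1, 50, 257)
face = torch.randn(1, 50, 512)
est = AVSeparator()(mix, face)
```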
-
5.
Publication No.: US10923136B2
Publication Date: 2021-02-16
Application No.: US16645447
Filing Date: 2019-04-19
Inventor: Jiaming Xu, Yating Huang, Bo Xu
IPC: G10L25/30, G10L21/0208
Abstract: A speech extraction method based on supervised-learning auditory attention includes: converting an original overlapping speech signal into a two-dimensional time-frequency representation by a short-time Fourier transform to obtain a first overlapping speech signal; performing a first sparsification on the first overlapping speech signal by mapping the intensity of each time-frequency unit to one of D preset intensity levels, and performing a second sparsification based on the information of the D preset intensity levels to obtain a second overlapping speech signal; converting the second overlapping speech signal into a pulse signal by a time coding method; extracting a target pulse from the pulse signal by a trained target pulse extraction network; and converting the target pulse into a time-frequency representation and obtaining the target speech by an inverse short-time Fourier transform.
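The two sparsification steps amount to quantizing time-frequency intensities into D levels and then keeping only the strongest levels. A NumPy sketch of just that stage follows; the value of D, the kept-level threshold, and the omitted time-coding and pulse-extraction steps are assumptions for illustration.

```python
# NumPy sketch of the two sparsification steps (intensity-level mapping);
# D and the kept-level threshold are assumptions.
import numpy as np

def sparsify(spec_mag, D=8, keep_levels=4):
    # First sparsification: map each time-frequency unit to one of D levels.
    lo, hi = spec_mag.min(), spec_mag.max()
    levels = np.floor((spec_mag - lo) / (hi - lo + 1e-8) * D).clip(0, D - 1)
    # Second sparsification: keep only the strongest levels, zero the rest.
    sparse = np.where(levels >= D - keep_levels, spec_mag, 0.0)
    return levels.astype(int), sparse

mag = np.abs(np.random.randn(129, 100))   # stand-in for |STFT| of the mixture
levels, sparse = sparsify(mag)
# Time coding would then convert `sparse` into pulse (spike) trains, from which
# the trained network extracts the target pulses before the inverse STFT.
```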