-
1.
Publication No.: US10818311B2
Publication Date: 2020-10-27
Application No.: US16632373
Filing Date: 2018-11-14
Inventor: Jiaming Xu, Jing Shi, Bo Xu
IPC: G10L21/00, G10L21/0272, G06F17/16, G06N3/04, G10L25/30
Abstract: An auditory selection method based on a memory and attention model, including: step S1, encoding an original speech signal into a time-frequency matrix; step S2, encoding and transforming the time-frequency matrix to convert it into a speech vector; step S3, using a long-term memory unit to store a speaker and the speech vector corresponding to the speaker; and step S4, obtaining the speech vector corresponding to a target speaker and separating a target speech from the original speech signal through an attention selection model. A storage device stores a plurality of programs configured to be loaded by a processor to execute the auditory selection method based on the memory and attention model. A processing unit includes the processor and the storage device.
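The steps above map naturally onto an encoder, a speaker-keyed memory, and an attentive mask. Below is a minimal PyTorch sketch of that flow, not the patented implementation: the layer sizes, the dict-based memory, and the sigmoid attention mask are all illustrative assumptions.

```python
# Minimal sketch of the described pipeline; shapes and mechanisms are assumptions.
import torch
import torch.nn as nn

class AuditorySelector(nn.Module):
    def __init__(self, freq_bins=129, hidden=128):
        super().__init__()
        # Step S2: encode the time-frequency matrix into speech vectors
        self.encoder = nn.LSTM(freq_bins, hidden, batch_first=True)
        # Step S3: long-term memory mapping speaker id -> speech vector
        self.memory = {}

    def remember(self, speaker_id, tf_matrix):
        _, (h, _) = self.encoder(tf_matrix)          # h: (1, B, hidden)
        self.memory[speaker_id] = h[-1].mean(0)      # store one speaker vector

    def forward(self, tf_matrix, speaker_id):
        # Step S4: attention between mixture frames and the target speaker vector
        frames, _ = self.encoder(tf_matrix)          # (B, T, hidden)
        query = self.memory[speaker_id]              # (hidden,)
        scores = torch.einsum("bth,h->bt", frames, query)
        mask = torch.sigmoid(scores).unsqueeze(-1)   # soft time-frequency mask
        return tf_matrix * mask                      # masked spectrogram

# Step S1 (STFT) would be performed beforehand, e.g. with torch.stft.
mix = torch.randn(1, 100, 129)                       # (batch, frames, freq_bins)
model = AuditorySelector()
model.remember("spk0", mix)
target = model(mix, "spk0")
```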
-
2.
Publication No.: US11487950B2
Publication Date: 2022-11-01
Application No.: US16641256
Filing Date: 2019-04-19
Inventor: Jiaming Xu, Yiqun Yao, Bo Xu
Abstract: The method of the present disclosure includes: obtaining an image to be processed and a question text corresponding to the image; using an optimized dialogue model to encode the image into an image vector and encode the question text into a question vector; generating a state vector based on the image vector and the question vector; and decoding the state vector to obtain and output an answer text. A discriminator is introduced in the optimization process of the dialogue model. The dialogue model and the discriminator are alternately optimized until the value of a hybrid loss function of the dialogue model and the value of a loss function of the discriminator no longer decrease or fall below a preset value, thereby completing the optimization process.
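The alternating optimization described here follows the familiar adversarial pattern: update the dialogue model on a hybrid (supervised plus adversarial) loss, then update the discriminator. A schematic PyTorch sketch follows; the toy architectures, the MSE supervised term, and the loss weight are assumptions, not the disclosed design.

```python
# Schematic of the alternating optimization loop; architectures are placeholders.
import torch
import torch.nn as nn

dialogue_model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 512))
discriminator = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(dialogue_model.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def step(state_vec, real_answer_vec, adv_weight=0.5):
    # 1) Update the dialogue model on a hybrid loss:
    #    supervised reconstruction + adversarial term from the discriminator.
    fake = dialogue_model(state_vec)
    sup_loss = nn.functional.mse_loss(fake, real_answer_vec)
    adv_loss = bce(discriminator(fake), torch.ones(fake.size(0), 1))
    hybrid = sup_loss + adv_weight * adv_loss
    opt_g.zero_grad(); hybrid.backward(); opt_g.step()

    # 2) Update the discriminator to tell real answers from generated ones.
    d_loss = bce(discriminator(real_answer_vec), torch.ones(real_answer_vec.size(0), 1)) \
           + bce(discriminator(fake.detach()), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    return hybrid.item(), d_loss.item()

# Alternate the two updates until neither loss keeps decreasing
# (or both fall below a preset threshold), as the abstract describes.
s, a = torch.randn(4, 512), torch.randn(4, 512)
g_loss, d_loss = step(s, a)
```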
-
3.
Publication No.: US11978470B2
Publication Date: 2024-05-07
Application No.: US17980473
Filing Date: 2022-11-03
Inventor: Jiaming Xu, Jian Cui, Bo Xu
IPC: G10L21/0272, G10L17/02, G10L17/04, G10L17/06, G10L21/028, H04S1/00
CPC classification number: G10L21/028, G10L17/02, G10L17/04, G10L17/06, H04S1/007
Abstract: Disclosed are a target speaker separation system, an electronic device and a storage medium. The system includes: first, performing joint unified modeling on a plurality of cues based on a masked pre-training strategy, to boost the inference capability of the model for missing cues and enhance the representation accuracy of disturbed cues; and second, constructing a hierarchical cue modulation module. A spatial cue is introduced into a primary cue modulation module for directional enhancement of a speaker's speech; in an intermediate cue modulation module, the speech of the speaker is enhanced on the basis of the temporal coherence between a dynamic cue and an auditory signal component; a steady-state cue is introduced into an advanced cue modulation module for selective filtering; and finally, the supervised learning capability on simulated data and the unsupervised learning effect on real mixed data are fully utilized.
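The hierarchical cue modulation can be pictured as three conditioning stages applied in sequence to the mixture representation. The sketch below uses FiLM-style scale-and-shift conditioning as a stand-in; the cue dimensions and the modulation mechanism itself are assumptions, and the masked pre-training stage is omitted.

```python
# Minimal sketch of hierarchical cue modulation (FiLM-style conditioning);
# cue dimensions and the modulation mechanism are assumptions.
import torch
import torch.nn as nn

class CueModulation(nn.Module):
    """Modulate an audio representation with one cue (scale and shift)."""
    def __init__(self, feat_dim, cue_dim):
        super().__init__()
        self.scale = nn.Linear(cue_dim, feat_dim)
        self.shift = nn.Linear(cue_dim, feat_dim)

    def forward(self, feats, cue):                        # feats: (B, T, F)
        return feats * self.scale(cue).unsqueeze(1) + self.shift(cue).unsqueeze(1)

class HierarchicalSeparator(nn.Module):
    def __init__(self, feat_dim=256, cue_dim=64):
        super().__init__()
        self.primary = CueModulation(feat_dim, cue_dim)       # spatial cue
        self.intermediate = CueModulation(feat_dim, cue_dim)  # dynamic cue
        self.advanced = CueModulation(feat_dim, cue_dim)      # steady-state cue

    def forward(self, feats, spatial, dynamic, steady):
        x = self.primary(feats, spatial)        # directional enhancement
        x = self.intermediate(x, dynamic)       # temporal-coherence enhancement
        return self.advanced(x, steady)         # selective filtering

feats = torch.randn(2, 100, 256)
cues = [torch.randn(2, 64) for _ in range(3)]
out = HierarchicalSeparator()(feats, *cues)
```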
-
4.
Publication No.: US20230335148A1
Publication Date: 2023-10-19
Application No.: US18026960
Filing Date: 2021-08-24
Inventor: Henghui Lu, Lei Qin, Peng Zhang, Jiaming Xu, Bo Xu
IPC: G10L21/0208, G06V20/40, G10L21/055
CPC classification number: G10L21/0208, G06V20/46, G10L21/055
Abstract: A speech separation method is provided, relating to the field of speech processing. The method includes: obtaining, while a user is speaking, audio information including the user's speech and video information including the user's face; encoding the audio information to obtain a mixed acoustic feature; extracting a visual semantic feature of the user from the video information; inputting the mixed acoustic feature and the visual semantic feature into a preset visual speech separation network to obtain an acoustic feature of the user; and decoding the acoustic feature of the user to obtain a speech signal of the user. An electronic device, a chip, and a computer-readable storage medium are also provided.
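The flow is: encode audio, encode visual semantics, fuse, predict a mask, decode. A compact PyTorch sketch of that audio-visual fusion is given below; the linear encoders, the GRU fusion, and the mask-based decoding are illustrative assumptions rather than the disclosed network.

```python
# Schematic of the audio-visual separation flow; encoders and fusion are assumptions.
import torch
import torch.nn as nn

class AVSeparator(nn.Module):
    def __init__(self, audio_dim=257, visual_dim=512, hidden=256):
        super().__init__()
        self.audio_enc = nn.Linear(audio_dim, hidden)    # mixed acoustic feature
        self.visual_enc = nn.Linear(visual_dim, hidden)  # visual semantic feature
        self.fusion = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, audio_dim)      # user's acoustic feature

    def forward(self, mix_spec, face_feats):             # (B,T,257), (B,T,512)
        a = self.audio_enc(mix_spec)
        v = self.visual_enc(face_feats)
        h, _ = self.fusion(torch.cat([a, v], dim=-1))
        mask = torch.sigmoid(self.decoder(h))            # per time-frequency mask
        return mix_spec * mask                           # target speech spectrum

mix = torch.randn(1, 50, 257)
face = torch.randn(1, 50, 512)
est = AVSeparator()(mix, face)
```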
-
5.
Publication No.: US10923136B2
Publication Date: 2021-02-16
Application No.: US16645447
Filing Date: 2019-04-19
Inventor: Jiaming Xu, Yating Huang, Bo Xu
IPC: G10L25/30, G10L21/0208
Abstract: A speech extraction method based on supervised-learning auditory attention includes: converting an original overlapping speech signal into a two-dimensional time-frequency representation by a short-time Fourier transform to obtain a first overlapping speech signal; performing a first sparsification on the first overlapping speech signal by mapping the intensity of each time-frequency unit to one of D preset intensity levels, and performing a second sparsification based on the information of the D preset intensity levels to obtain a second overlapping speech signal; converting the second overlapping speech signal into a pulse signal by a time coding method; extracting a target pulse from the pulse signal by a trained target pulse extraction network; and converting the target pulse into a time-frequency representation and obtaining the target speech by an inverse short-time Fourier transform.
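The two sparsification steps amount to quantizing time-frequency intensities into D levels and then keeping only the strongest levels. A NumPy sketch of just that stage follows; the value of D, the kept-level threshold, and the omitted time-coding and pulse-extraction steps are assumptions for illustration.

```python
# NumPy sketch of the two sparsification steps (intensity-level mapping);
# D and the kept-level threshold are assumptions.
import numpy as np

def sparsify(spec_mag, D=8, keep_levels=4):
    # First sparsification: map each time-frequency unit to one of D levels.
    lo, hi = spec_mag.min(), spec_mag.max()
    levels = np.floor((spec_mag - lo) / (hi - lo + 1e-8) * D).clip(0, D - 1)
    # Second sparsification: keep only the strongest levels, zero the rest.
    sparse = np.where(levels >= D - keep_levels, spec_mag, 0.0)
    return levels.astype(int), sparse

mag = np.abs(np.random.randn(129, 100))   # stand-in for |STFT| of the mixture
levels, sparse = sparsify(mag)
# Time coding would then convert `sparse` into pulse (spike) trains, from which
# the trained network extracts the target pulses before the inverse STFT.
```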