Audio-visual speech separation
    21.
    Granted invention patent

    Publication No.: US11456005B2

    Publication date: 2022-09-27

    Application No.: US16761707

    Filing date: 2018-11-21

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
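The last two steps of the abstract above (a spectrogram mask per speaker, then an isolated speech spectrogram per speaker) can be illustrated with a minimal sketch. It assumes the isolated spectrogram is the element-wise product of the mixture spectrogram and the speaker's mask, which the abstract does not state. The function names and all numeric values are hypothetical; no trained model is involved.

```python
# Hypothetical illustration only: the mask values are made up, and element-wise
# multiplication is an assumed way to turn a mask into an isolated spectrogram.

def apply_mask(mixture, mask):
    """Multiply a spectrogram by a same-shaped mask, element by element."""
    if len(mixture) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(mixture, mask)
    ):
        raise ValueError("mixture and mask must have the same shape")
    return [
        [value * weight for value, weight in zip(row, mask_row)]
        for row, mask_row in zip(mixture, mask)
    ]


def separate(mixture, masks_by_speaker):
    """Return one isolated spectrogram per speaker."""
    return {
        speaker: apply_mask(mixture, mask)
        for speaker, mask in masks_by_speaker.items()
    }


# A toy 2x2 "spectrogram" (rows: frequency bins, columns: time frames).
mixture = [[4.0, 2.0], [1.0, 8.0]]
masks = {
    "speaker_a": [[1.0, 0.5], [0.0, 0.25]],
    "speaker_b": [[0.0, 0.5], [1.0, 0.75]],
}
isolated = separate(mixture, masks)
print(isolated["speaker_a"])
print(isolated["speaker_b"])
```

In this toy example the two masks sum to one in every cell, so the isolated spectrograms add back up to the mixture. That is a property of the chosen values, not something the abstract requires.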

    Method and device for audio signal processing, and storage medium

    Publication No.: US11289110B2

    Publication date: 2022-03-29

    Application No.: US16703907

    Filing date: 2019-12-05

    Abstract: A method and device for audio signal processing is provided. The method includes steps of: obtaining an inputted audio signal; parsing the audio signal to obtain at least one audio feature; determining at least one vibration feature corresponding to the at least one audio feature; and generating a vibration signal corresponding to the audio signal according to the at least one vibration feature. The inputted audio signal is automatically converted into a vibration signal by the vibration feature corresponding to the audio feature of the inputted audio signal, which can avoid errors caused by manual operation and make the vibration signal possess high versatility.
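The four steps in the abstract above (obtain audio, extract an audio feature, map it to a vibration feature, generate a vibration signal) can be sketched as follows. The abstract does not name the features, so this sketch assumes per-frame RMS loudness as the audio feature and a 0 to 100 intensity level as the vibration feature. All function names and the sample data are hypothetical.

```python
import math


def frame_rms(samples, frame_size):
    """Assumed audio feature: root-mean-square loudness of each frame."""
    frames = [
        samples[start:start + frame_size]
        for start in range(0, len(samples), frame_size)
    ]
    return [math.sqrt(sum(s * s for s in frame) / len(frame)) for frame in frames]


def rms_to_intensity(rms, max_level=100):
    """Assumed vibration feature: loudness clamped to [0, 1], scaled to a level."""
    clamped = min(max(rms, 0.0), 1.0)
    return round(clamped * max_level)


def audio_to_vibration(samples, frame_size):
    """Vibration signal as a list of per-frame intensity levels."""
    return [rms_to_intensity(rms) for rms in frame_rms(samples, frame_size)]


# Samples are assumed to be normalized to the range [-1.0, 1.0].
samples = [0.0, 0.0, 1.0, -1.0, 0.5, -0.5]
pattern = audio_to_vibration(samples, frame_size=2)
print(pattern)
```

Other feature pairs (for example pitch mapped to vibration frequency) would fit the abstract's wording equally well; loudness is used here only because it is simple to compute.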

    Artificial intelligence based virtual agent trainer

    Publication No.: US11270081B2

    Publication date: 2022-03-08

    Application No.: US16864790

    Filing date: 2020-05-01

    Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.

    AUDIO-VISUAL SPEECH SEPARATION
    25.
    Invention patent application

    Publication No.: US20200335121A1

    Publication date: 2020-10-22

    Application No.: US16761707

    Filing date: 2018-11-21

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

    Artificial intelligence based virtual agent trainer

    Publication No.: US10691897B1

    Publication date: 2020-06-23

    Application No.: US16555539

    Filing date: 2019-08-29

    Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.

    Methods and systems for speech signal processing

    Publication No.: US10460734B2

    Publication date: 2019-10-29

    Application No.: US16279418

    Filing date: 2019-02-19

    Applicant: Frontive, Inc.

    Abstract: Methods and systems for speech signal processing an interactive speech are described. Digitized audio data comprising a user query from a user is received over a network in association with a user identifier. A protocol associated with the user identifier is accessed. A personalized interaction model associated with the user identifier is accessed. A response is generated using the personalized interaction model and the protocol. The response is audibly reproduced by a voice assistance device.

    METHODS AND SYSTEMS FOR PROVIDING NON-AUDITORY FEEDBACK TO USERS

    Publication No.: US20180350264A1

    Publication date: 2018-12-06

    Application No.: US15607804

    Filing date: 2017-05-30

    Applicant: XEROX CORPORATION

    IPC classification: G09B21/00 G10L21/18

    Abstract: The present disclosure discloses methods and systems for providing non-auditory feedback to users related to sensitive information. The method includes receiving one or more characters on a first computing device. Each character is encoded into a braille code, the braille code is represented by a matrix of pre-defined size. For each character, the braille code is divided into a first part and a second part. A first vibration output is provided corresponding to the first part of braille code via the first computing device and a second vibration output is provided corresponding to the second part of the braille code via a second computing device. The combination of the first vibration output and the second vibration output is sensed by a user to recognize each character of the one or more characters.
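The encode-then-split step in the abstract above can be illustrated with a small sketch. It assumes a standard six-dot braille cell stored as a 3x2 matrix, and it assumes the "first part" is the left column (dots 1 to 3) and the "second part" is the right column (dots 4 to 6). The abstract says only that the matrix has a pre-defined size and is divided into two parts, so both choices are illustrative. The table covers three letters only, and all names are hypothetical.

```python
# Raised dots for a few letters in six-dot braille (a: 1, b: 1-2, c: 1-4).
BRAILLE_DOTS = {"a": {1}, "b": {1, 2}, "c": {1, 4}}


def to_matrix(dots):
    """3x2 cell: left column holds dots 1-3, right column holds dots 4-6."""
    return [[int(row + 1 in dots), int(row + 4 in dots)] for row in range(3)]


def split_cell(matrix):
    """Assumed split: left column for device one, right column for device two."""
    first_part = [row[0] for row in matrix]
    second_part = [row[1] for row in matrix]
    return first_part, second_part


def encode(text):
    """Return (first_part, second_part) for each character of the text."""
    return [split_cell(to_matrix(BRAILLE_DOTS[char])) for char in text]


for char, (first, second) in zip("cab", encode("cab")):
    print(char, "device 1:", first, "device 2:", second)
```

Each part would then drive the vibration output of one device; with this split, neither device alone carries enough information to identify the character whenever dots on both columns are raised.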

    Method for user communication with information dialogue system
    30.
    Granted invention patent (status: in force)

    Publication No.: US09564149B2

    Publication date: 2017-02-07

    Application No.: US14721012

    Filing date: 2015-05-26

    Abstract: Provided is a method for user communications with an information dialog system, which may be used for organizing user interactions with the information dialog system based on a natural language. The method may include activating a user input subsystem in response to a user entering a request; receiving and converting the request of the user into text by the user input subsystem; sending the text obtained as a result of the conversion of the request to a dialog module; processing, by the dialog module, the text; forming, by the dialog module, the response to the request; sending the response to the user; and displaying and/or reproducing the formed response, where, after the displaying and/or the reproducing of the formed response, the user input subsystem is automatically activated upon entering a further request or a clarification request by the user.
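The request/response cycle in the abstract above, including the automatic re-activation of the input subsystem for follow-up requests, can be sketched as a simple loop. The class and function names are hypothetical; whitespace trimming stands in for the unspecified conversion of a request to text, and the dialog module here simply echoes its input.

```python
class InputSubsystem:
    """Hypothetical stand-in for the user input subsystem."""

    def __init__(self):
        self.active = False

    def activate(self):
        self.active = True

    def receive(self, request):
        """Convert a request to text; only allowed while active."""
        if not self.active:
            raise RuntimeError("input subsystem is not active")
        self.active = False
        return request.strip()  # placeholder for real speech-to-text


def dialog_module(text):
    """Placeholder dialog module: forms a response from the text."""
    return f"You asked: {text}"


def run_dialog(requests):
    subsystem = InputSubsystem()
    shown = []
    for request in requests:
        # Activated when the user enters a request, including each further
        # or clarifying request after the previous response was shown.
        subsystem.activate()
        text = subsystem.receive(request)
        response = dialog_module(text)
        shown.append(response)  # stands in for displaying/reproducing
    return shown


transcript = run_dialog(["  weather today ", "and tomorrow?"])
print(transcript)
```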
