-
Publication No.: US11456005B2
Publication date: 2022-09-27
Application No.: US16761707
Filing date: 2018-11-21
Applicant: GOOGLE LLC
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
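The last two steps of this abstract (a per-speaker spectrogram mask, then a per-speaker isolated spectrogram) can be illustrated with a minimal sketch. The abstract does not say how a mask is turned into an isolated spectrogram; the elementwise multiplication below is an assumption borrowed from common mask-based separation, and `apply_masks`, the speaker keys, and the toy values are hypothetical.

```python
def apply_masks(mixture, masks):
    """Multiply a mixture spectrogram elementwise by each speaker's mask.

    mixture: list of rows (frequency bins) of magnitudes over time.
    masks:   dict mapping a speaker label to a mask of the same shape.
    Assumption: isolated = mask * mixture (not stated in the abstract).
    """
    isolated = {}
    for speaker, mask in masks.items():
        isolated[speaker] = [
            [m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)
        ]
    return isolated


# Toy 2x2 "spectrogram" and two hypothetical speaker masks.
mixture = [[1.0, 2.0], [4.0, 8.0]]
masks = {
    "speaker_a": [[1.0, 0.5], [0.0, 0.25]],
    "speaker_b": [[0.0, 0.5], [1.0, 0.75]],
}
isolated = apply_masks(mixture, masks)
```

In this toy case the two masks sum to one in every cell, so the isolated spectrograms add back up to the mixture; nothing in the abstract requires that property.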
-
Publication No.: US11289110B2
Publication date: 2022-03-29
Application No.: US16703907
Filing date: 2019-12-05
Inventors: Yajun Zheng, Hanlin Deng, Xiang Lu, Yulei Zhang
IPC classification: G10L25/48, G06F40/205, G10L21/16, G06F3/16, G06F3/0488, G10L21/18
Abstract: A method and device for audio signal processing is provided. The method includes steps of: obtaining an inputted audio signal; parsing the audio signal to obtain at least one audio feature; determining at least one vibration feature corresponding to the at least one audio feature; and generating a vibration signal corresponding to the audio signal according to the at least one vibration feature. The inputted audio signal is automatically converted into a vibration signal by the vibration feature corresponding to the audio feature of the inputted audio signal, which can avoid errors caused by manual operation and make the vibration signal possess high versatility.
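The audio-to-vibration pipeline in this abstract (parse an audio feature, map it to a vibration feature, generate a vibration signal) can be sketched as follows. The abstract names no specific features; using per-frame RMS energy as the audio feature and a 0 to 100 intensity level as the vibration feature is purely an assumption for illustration, and both function names are hypothetical.

```python
import math


def rms_per_frame(samples, frame_size):
    """Audio feature (assumed): root-mean-square energy of each frame."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [math.sqrt(sum(s * s for s in frame) / len(frame)) for frame in frames]


def to_vibration_levels(rms_values, max_level=100):
    """Vibration feature (assumed): intensity scaled to the loudest frame."""
    peak = max(rms_values, default=0.0)
    if peak == 0.0:
        # Silent input: no vibration at all.
        return [0 for _ in rms_values]
    return [round(max_level * value / peak) for value in rms_values]


# Six toy samples split into three frames of two samples each.
samples = [0.0, 0.0, 0.5, -0.5, 1.0, -1.0]
levels = to_vibration_levels(rms_per_frame(samples, frame_size=2))
```

Each entry of `levels` would then drive one time slice of the vibration signal; how a real device consumes such levels is outside what the abstract describes.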
-
Publication No.: US11270081B2
Publication date: 2022-03-08
Application No.: US16864790
Filing date: 2020-05-01
IPC classification: G06F40/35, G10L25/30, G06K9/62, G10L21/18, G06F40/247
Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.
-
Publication No.: US11138984B2
Publication date: 2021-10-05
Application No.: US16342125
Filing date: 2017-10-23
Applicant: SONY CORPORATION
Inventors: Takeshi Ogita, Ayumi Nakagawa, Ikuo Yamano, Yusuke Nakagawa
IPC classification: G10L19/02, G10L19/018, G06F3/01, H04M1/00, H04N21/235, G10L19/00, G10L21/0272, G10L21/16, G10L21/18
Abstract: Provided is an information processing apparatus including a file generation unit that generates a file including speech waveform data and vibration waveform data. The file generation unit cuts out waveform data in a to-be-synthesized band from first speech data, synthesizes waveform data extracted from a synthesizing band of vibration data with the to-be-synthesized band to generate second speech data, and encodes the second speech data to generate the file.
-
Publication No.: US20200335121A1
Publication date: 2020-10-22
Application No.: US16761707
Filing date: 2018-11-21
Applicant: GOOGLE LLC
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
-
Publication No.: US10691897B1
Publication date: 2020-06-23
Application No.: US16555539
Filing date: 2019-08-29
IPC classification: G06F40/35, G10L25/30, G10L21/18, G06K9/62, G06F40/247
Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.
-
Publication No.: US10460734B2
Publication date: 2019-10-29
Application No.: US16279418
Filing date: 2019-02-19
Applicant: Frontive, Inc.
Abstract: Methods and systems for speech signal processing an interactive speech are described. Digitized audio data comprising a user query from a user is received over a network in association with a user identifier. A protocol associated with the user identifier is accessed. A personalized interaction model associated with the user identifier is accessed. A response is generated using the personalized interaction model and the protocol. The response is audibly reproduced by a voice assistance device.
-
Publication No.: US20180350264A1
Publication date: 2018-12-06
Application No.: US15607804
Filing date: 2017-05-30
Applicant: XEROX CORPORATION
Inventors: Aritra Dhar, Kuldeep Yadav
Abstract: The present disclosure discloses methods and systems for providing non-auditory feedback to users related to sensitive information. The method includes receiving one or more characters on a first computing device. Each character is encoded into a braille code, the braille code is represented by a matrix of pre-defined size. For each character, the braille code is divided into a first part and a second part. A first vibration output is provided corresponding to the first part of braille code via the first computing device and a second vibration output is provided corresponding to the second part of the braille code via a second computing device. The combination of the first vibration output and the second vibration output is sensed by a user to recognize each character of the one or more characters.
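The encode-then-split step in this abstract can be sketched with standard six-dot braille. The abstract only says the code is a matrix of pre-defined size divided into a first and a second part; the 3x2 layout and the split into left and right columns are assumptions, and `braille_matrix` and `split_columns` are hypothetical names. Only the letters a to e are included to keep the table short.

```python
# Standard braille dot numbers: dots 1-3 run down the left column,
# dots 4-6 run down the right column.
BRAILLE_DOTS = {
    "a": {1},
    "b": {1, 2},
    "c": {1, 4},
    "d": {1, 4, 5},
    "e": {1, 5},
}


def braille_matrix(ch):
    """Return the 3x2 cell for a character as rows of [left, right] bits."""
    dots = BRAILLE_DOTS[ch]
    return [
        [1 if row + 1 in dots else 0, 1 if row + 4 in dots else 0]
        for row in range(3)
    ]


def split_columns(matrix):
    """Assumed split: left column for device one, right column for device two."""
    first_part = [row[0] for row in matrix]
    second_part = [row[1] for row in matrix]
    return first_part, second_part


cell = braille_matrix("d")
first_part, second_part = split_columns(cell)
```

Under this assumed split, neither device alone receives enough of the pattern to identify the character, which matches the abstract's point that the user recognizes it from the combination of both vibration outputs.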
-
Publication No.: US20170169827A1
Publication date: 2017-06-15
Application No.: US14967726
Filing date: 2015-12-14
CPC classification: G10L15/32, G10L15/26, G10L21/10, G10L21/18, G10L2021/065, H04N21/4394, H04N21/44008, H04N21/4884, H04N21/84, H04N21/8456
Abstract: Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.
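The select-then-apply flow in this abstract amounts to a lookup from media content category to recognizer. The sketch below shows that dispatch only; the category names, the `genre` feature, and the placeholder recognizers are all hypothetical stand-ins, since the abstract names no concrete categories or algorithms.

```python
def classify_media(features):
    """Determine a content category from a feature (hypothetical rule)."""
    genre = features.get("genre")
    if genre in ("sports", "news"):
        return genre
    return "general"


# Placeholder recognizers: real ones would return a transcript.
def recognize_sports(audio):
    return f"[sports model] {audio}"


def recognize_news(audio):
    return f"[news model] {audio}"


def recognize_general(audio):
    return f"[general model] {audio}"


# Each algorithm is associated with one content category.
RECOGNIZERS = {
    "sports": recognize_sports,
    "news": recognize_news,
    "general": recognize_general,
}


def transcribe(audio, features):
    """Pick the recognizer for the content's category, then apply it."""
    category = classify_media(features)
    recognizer = RECOGNIZERS[category]
    return category, recognizer(audio)


result = transcribe("clip-001", {"genre": "news"})
fallback = transcribe("clip-002", {"genre": "drama"})
```

A dictionary keeps the category-to-algorithm association in one place, so adding a category means adding one entry rather than another branch.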
-
Publication No.: US09564149B2
Publication date: 2017-02-07
Application No.: US14721012
Filing date: 2015-05-26
Applicant: OOO “Speaktoit”
Inventors: Ilya Genadevich Gelfenbeyn, Artem Goncharuk, Ilya Andreevich Platonov, Olga Aleksandrovna Gelfenbeyn, Pavel Aleksandrovich Sirotin
CPC classification: G10L25/63, G06F3/167, G06F17/30654, G10L15/18, G10L15/1807, G10L15/22, G10L15/265, G10L21/18, G10L2015/223, G10L2015/227
Abstract: Provided is a method for user communications with an information dialog system, which may be used for organizing user interactions with the information dialog system based on a natural language. The method may include activating a user input subsystem in response to a user entering a request; receiving and converting the request of the user into text by the user input subsystem; sending the text obtained as a result of the conversion of the request to a dialog module; processing, by the dialog module, the text; forming, by the dialog module, the response to the request; sending the response to the user; and displaying and/or reproducing the formed response, where, after the displaying and/or the reproducing of the formed response, the user input subsystem is automatically activated upon entering a further request or a clarification request by the user.
-