Patent search: ipc:"G10L21/18", page 3

21.

Granted invention patent
Audio-visual speech separation (Active)

Publication No.: US11456005B2

Publication date: 2022-09-27

Application No.: US16761707

Filing date: 2018-11-21

Applicant: GOOGLE LLC

Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim

IPC classification: G10L21/10, G06K9/62, G10L15/16, G10L21/18, G06V20/40, G06V40/16

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
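Illustrative note (not part of the patent record): the last two steps of the abstract can be sketched as follows. The abstract does not say how a mask and the soundtrack spectrogram are combined, so this sketch assumes element-wise multiplication of real-valued magnitudes. The names `apply_mask` and `isolate_speakers` and the toy values are hypothetical.

```python
def apply_mask(mixture, mask):
    """Element-wise product of a spectrogram and a mask of the same shape.

    Both arguments are lists of rows (frequency bins x time frames).
    Assumption: masking is plain multiplication; the abstract does not specify this.
    """
    if len(mixture) != len(mask) or any(len(a) != len(b) for a, b in zip(mixture, mask)):
        raise ValueError("spectrogram and mask must have the same shape")
    return [[s * m for s, m in zip(s_row, m_row)] for s_row, m_row in zip(mixture, mask)]


def isolate_speakers(mixture, masks):
    """Return one isolated spectrogram per speaker, keyed by speaker id."""
    return {speaker: apply_mask(mixture, mask) for speaker, mask in masks.items()}


# Toy 2x2 magnitude spectrogram of the mixed soundtrack and one mask per speaker.
mixture = [[2.0, 4.0], [1.0, 3.0]]
masks = {
    "speaker_a": [[1.0, 0.0], [0.5, 0.0]],
    "speaker_b": [[0.0, 1.0], [0.5, 1.0]],
}
isolated = isolate_speakers(mixture, masks)
```

In this toy case the two masks sum to one in every cell, so the isolated spectrograms add back up to the mixture; nothing in the abstract requires that property.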

22.

Granted invention patent
Method and device for audio signal processing, and storage medium (Active)

Publication No.: US11289110B2

Publication date: 2022-03-29

Application No.: US16703907

Filing date: 2019-12-05

Applicant: AAC Technologies Pte. Ltd.

Inventors: Yajun Zheng, Hanlin Deng, Xiang Lu, Yulei Zhang

IPC classification: G10L25/48, G06F40/205, G10L21/16, G06F3/16, G06F3/0488, G10L21/18

Abstract: A method and device for audio signal processing is provided. The method includes steps of: obtaining an inputted audio signal; parsing the audio signal to obtain at least one audio feature; determining at least one vibration feature corresponding to the at least one audio feature; and generating a vibration signal corresponding to the audio signal according to the at least one vibration feature. The inputted audio signal is automatically converted into a vibration signal by the vibration feature corresponding to the audio feature of the inputted audio signal, which can avoid errors caused by manual operation and make the vibration signal possess high versatility.

23.

Granted invention patent
Artificial intelligence based virtual agent trainer (Active)

Publication No.: US11270081B2

Publication date: 2022-03-08

Application No.: US16864790

Filing date: 2020-05-01

Applicant: Accenture Global Solutions Limited

Inventors: Vidya Rajagopal, Kokila Manickam, Marin Grace Mercylawrence, Gaurav Mengi

IPC classification: G06F40/35, G10L25/30, G06K9/62, G10L21/18, G06F40/247

Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.

24.

Granted invention patent
Information processing apparatus and information processing method for generating and processing a file including speech waveform data and vibration waveform data (Active)

Publication No.: US11138984B2

Publication date: 2021-10-05

Application No.: US16342125

Filing date: 2017-10-23

Applicant: SONY CORPORATION

Inventors: Takeshi Ogita, Ayumi Nakagawa, Ikuo Yamano, Yusuke Nakagawa

IPC classification: G10L19/02, G10L19/018, G06F3/01, H04M1/00, H04N21/235, G10L19/00, G10L21/0272, G10L21/16, G10L21/18

Abstract: Provided is an information processing apparatus including a file generation unit that generates a file including speech waveform data and vibration waveform data. The file generation unit cuts out waveform data in a to-be-synthesized band from first speech data, synthesizes waveform data extracted from a synthesizing band of vibration data with the to-be-synthesized band to generate second speech data, and encodes the second speech data to generate the file.

25.

Invention patent application
AUDIO-VISUAL SPEECH SEPARATION (Pending, published)

Publication No.: US20200335121A1

Publication date: 2020-10-22

Application No.: US16761707

Filing date: 2018-11-21

Applicant: GOOGLE LLC

Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim

IPC classification: G10L21/10, G10L21/18, G10L15/16, G06K9/00, G06K9/62

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

26.

Granted invention patent
Artificial intelligence based virtual agent trainer (Active)

Publication No.: US10691897B1

Publication date: 2020-06-23

Application No.: US16555539

Filing date: 2019-08-29

Applicant: Accenture Global Solutions Limited

Inventors: Vidya Rajagopal, Kokila Manickam, Marin Grace Mercylawrence, Gaurav Mengi

IPC classification: G06F40/35, G10L25/30, G10L21/18, G06K9/62, G06F40/247

Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.

27.

Granted invention patent
Methods and systems for speech signal processing (Active)

Publication No.: US10460734B2

Publication date: 2019-10-29

Application No.: US16279418

Filing date: 2019-02-19

Applicant: Frontive, Inc.

Inventors: Charles Anthony Jones, Kim Matthew Branson

IPC classification: G10L17/22, G10L21/18, G10L21/10, G10L17/26

Abstract: Methods and systems for speech signal processing an interactive speech are described. Digitized audio data comprising a user query from a user is received over a network in association with a user identifier. A protocol associated with the user identifier is accessed. A personalized interaction model associated with the user identifier is accessed. A response is generated using the personalized interaction model and the protocol. The response is audibly reproduced by a voice assistance device.

28.

Invention patent application
METHODS AND SYSTEMS FOR PROVIDING NON-AUDITORY FEEDBACK TO USERS (Pending, published)

Publication No.: US20180350264A1

Publication date: 2018-12-06

Application No.: US15607804

Filing date: 2017-05-30

Applicant: XEROX CORPORATION

Inventors: Aritra Dhar, Kuldeep Yadav

IPC classification: G09B21/00, G10L21/18

Abstract: The present disclosure discloses methods and systems for providing non-auditory feedback to users related to sensitive information. The method includes receiving one or more characters on a first computing device. Each character is encoded into a braille code, the braille code is represented by a matrix of pre-defined size. For each character, the braille code is divided into a first part and a second part. A first vibration output is provided corresponding to the first part of braille code via the first computing device and a second vibration output is provided corresponding to the second part of the braille code via a second computing device. The combination of the first vibration output and the second vibration output is sensed by a user to recognize each character of the one or more characters.
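Illustrative note (not part of the patent record): the encode-and-split steps in the abstract can be sketched as below. The abstract gives neither the matrix size nor the split rule, so this sketch assumes a standard 6-dot braille cell stored as a 3x2 matrix and a split into left and right columns. The function names and the restriction to letters a-j are hypothetical simplifications.

```python
# Standard 6-dot braille numbering: dots 1-3 run down the left column,
# dots 4-6 down the right column. Only letters a-j are included here.
BRAILLE_DOTS = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
    "f": {1, 2, 4}, "g": {1, 2, 4, 5}, "h": {1, 2, 5}, "i": {2, 4}, "j": {2, 4, 5},
}


def braille_matrix(ch):
    """Return the 3x2 cell for ch as rows of 0/1, where 1 marks a raised dot."""
    dots = BRAILLE_DOTS[ch.lower()]
    return [[int(row + 1 in dots), int(row + 4 in dots)] for row in range(3)]


def split_cell(matrix):
    """Split a cell into a first part (left column) and second part (right column).

    Assumption: the column-wise split is one possible rule, not the patent's.
    """
    first = [row[0] for row in matrix]
    second = [row[1] for row in matrix]
    return first, second


def vibration_plan(text):
    """For each character, list the on/off pulse pattern each device would emit."""
    plan = []
    for ch in text:
        first, second = split_cell(braille_matrix(ch))
        plan.append({"char": ch, "device_1": first, "device_2": second})
    return plan


plan = vibration_plan("bad")
```

Neither device's pattern identifies the character by itself: "a" and "d" both produce `[1, 0, 0]` on the first device, which matches the abstract's point that the user recognizes a character only from the combination of both outputs.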

29.

Invention patent application
MULTIMODAL SPEECH RECOGNITION FOR REAL-TIME VIDEO AUDIO-BASED DISPLAY INDICIA APPLICATION (Active)

Publication No.: US20170169827A1

Publication date: 2017-06-15

Application No.: US14967726

Filing date: 2015-12-14

Applicant: International Business Machines Corporation

Inventors: Priscilla Barreira Avegliano, Carlos Henrique Cardonha, Stefany Mazon, Julio Nogima

IPC classification: G10L15/32, H04N21/44, G10L21/18, H04N21/488, G10L15/26, G10L21/10

CPC classification: G10L15/32, G10L15/26, G10L21/10, G10L21/18, G10L2021/065, H04N21/4394, H04N21/44008, H04N21/4884, H04N21/84, H04N21/8456

Abstract: Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.

30.

Granted invention patent
Method for user communication with information dialogue system (Active)

Publication No.: US09564149B2

Publication date: 2017-02-07

Application No.: US14721012

Filing date: 2015-05-26

Applicant: OOO “Speaktoit”

Inventors: Ilya Genadevich Gelfenbeyn, Artem Goncharuk, Ilya Andreevich Platonov, Olga Aleksandrovna Gelfenbeyn, Pavel Aleksandrovich Sirotin

IPC classification: G10L15/22, G10L25/63, G06F17/30, G06F3/16, G10L15/26, G10L21/18, G10L15/18

CPC classification: G10L25/63, G06F3/167, G06F17/30654, G10L15/18, G10L15/1807, G10L15/22, G10L15/265, G10L21/18, G10L2015/223, G10L2015/227

Abstract: Provided is a method for user communications with an information dialog system, which may be used for organizing user interactions with the information dialog system based on a natural language. The method may include activating a user input subsystem in response to a user entering a request; receiving and converting the request of the user into text by the user input subsystem; sending the text obtained as a result of the conversion of the request to a dialog module; processing, by the dialog module, the text; forming, by the dialog module, the response to the request; sending the response to the user; and displaying and/or reproducing the formed response, where, after the displaying and/or the reproducing of the formed response, the user input subsystem is automatically activated upon entering a further request or a clarification request by the user.

