Patent search ipc:"G10L21/18" Page 1

1.

Patent application
TRANSCRIPTION SUMMARY PRESENTATION (Active)

Publication No.: US20220076681A1

Publication date: 2022-03-10

Application No.: US17526757

Filing date: 2021-11-15

Applicant: Sorenson IP Holdings, LLC

Inventors: Scott Boekweg, David Thomson

IPC classification: G10L15/26, G10L21/18, G10L15/30

Abstract: A method to present a summary of a transcription may include obtaining, at a first device, audio directed to the first device from a second device during a communication session between the first device and the second device. Additionally, the method may include sending, from the first device, the audio to a transcription system. The method may include obtaining, at the first device, a transcription during the communication session from the transcription system based on the audio. Additionally, the method may include obtaining, at the first device, a summary of the transcription during the communication session. Additionally, the method may include presenting, on a display, both the summary and the transcription simultaneously during the communication session.

2.

Patent application
ARTIFICIAL INTELLIGENCE BASED VIRTUAL AGENT TRAINER (Active)

Publication No.: US20210064826A1

Publication date: 2021-03-04

Application No.: US16864790

Filing date: 2020-05-01

Applicant: Accenture Global Solutions Limited

Inventors: Vidya RAJAGOPAL, Kokila MANICKAM, Marin GRACE MERCYLAWRENCE, Gaurav MENGI

IPC classification: G06F40/35, G10L21/18, G06K9/62, G06F40/247, G10L25/30

Abstract: The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When the processor executes the instructions, the instructions are configured to cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and maturity report based on verification results.

3.

Patent application
Audio Channel Monitoring by Voice to Keyword Matching with Notification (Pending, published)

Publication No.: US20190005978A1

Publication date: 2019-01-03

Application No.: US15639738

Filing date: 2017-06-30

Applicant: Richard E. Barnett, Terence Sean Sullivan

Inventors: Richard E. Barnett, Terence Sean Sullivan

IPC classification: G10L21/10, G06F3/16, G10L15/08, G10L25/51, G10L21/18, G10L15/30

Abstract: Systems and methods of monitoring communications channels and automatically providing selective notifications through a network that messages containing useful information, transmitted in the form of voice content, have been received. Keywords are compared with textual data transcribed from voice messages received on a channel. The textual data and the keywords are compared, and upon identifying a correlation therebetween, a notification is automatically generated that indicates receipt of a given message, the existence of the correlation with the keywords, and an identity of the channel, so that client terminals can receive the message and also receive subsequent or related messages.
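
The keyword-comparison step in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Notification` type, the `match_keywords` function, exact single-word matching as the "correlation" test, and the channel name are hypothetical and are not taken from the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class Notification:
    """Hypothetical notification record: which channel, which message, which keywords hit."""
    channel: str
    message: str
    matched: list


def match_keywords(channel, transcript, keywords):
    """Return a Notification if any single-word keyword occurs in the transcript, else None.

    Assumption: "correlation" is modeled as a case-insensitive exact word match.
    """
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    matched = sorted(k for k in keywords if k.lower() in words)
    if not matched:
        return None
    return Notification(channel=channel, message=transcript, matched=matched)


# Usage: a transcribed message on a hypothetical channel is checked against two keywords.
note = match_keywords(
    "fire-dispatch-1",
    "Engine 4 respond to structure fire on Main Street",
    ["fire", "flood"],
)
```

A real system would likely need phrase matching and tolerance for transcription errors; exact word matching is used here only to keep the sketch short.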

4.

Granted patent
Multimodal speech recognition for real-time video audio-based display indicia application (Active)

Publication No.: US09959872B2

Publication date: 2018-05-01

Application No.: US14967726

Filing date: 2015-12-14

Applicant: International Business Machines Corporation

Inventors: Priscilla Barreira Avegliano, Carlos Henrique Cardonha, Stefany Mazon, Julio Nogima

IPC classification: G10L15/00, G10L15/26, G10L15/32, G10L21/10, G10L21/18, H04N21/488, H04N21/44, H04N21/439, H04N21/84, H04N21/845, G10L21/06

CPC classification: G10L15/32, G10L15/26, G10L21/10, G10L21/18, G10L2021/065, H04N21/4394, H04N21/44008, H04N21/4884, H04N21/84, H04N21/8456

Abstract: Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.

5.

Patent application
TERMINAL AND SERVER OF SPEAKER-ADAPTATION SPEECH-RECOGNITION SYSTEM AND METHOD FOR OPERATING THE SYSTEM (Active)

Publication No.: US20150371634A1

Publication date: 2015-12-24

Application No.: US14709359

Filing date: 2015-05-11

Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventor: Dong Hyun KIM

IPC classification: G10L15/22, G10L21/18, G10L21/10, G10L15/02, G10L15/14

CPC classification: G10L15/07, G10L15/30, G10L2015/221

Abstract: Provided are a terminal and server of a speaker-adaptation speech-recognition system and a method for operating the system. The terminal in the speaker-adaptation speech-recognition system includes a speech recorder which transmits speech data of a speaker to a speech-recognition server, a statistical variable accumulator which receives a statistical variable including acoustic statistical information about speech of the speaker from the speech-recognition server which recognizes the transmitted speech data, and accumulates the received statistical variable, a conversion parameter generator which generates a conversion parameter about the speech of the speaker using the accumulated statistical variable and transmits the generated conversion parameter to the speech-recognition server, and a result displaying user interface which receives and displays result data when the speech-recognition server recognizes the speech data of the speaker using the transmitted conversion parameter and transmits the recognized result data.

6.

Granted patent
Systems and methods of rendering a textual animation (Active)

Publication No.: US09159338B2

Publication date: 2015-10-13

Application No.: US12960165

Filing date: 2010-12-03

Applicant: Rahul Powar, Avery Li-Chun Wang

Inventors: Rahul Powar, Avery Li-Chun Wang

IPC classification: G10L21/06, G10L21/18, G06F17/30, G10H1/36, G11B27/10, G11B27/11, G11B27/28, G10L15/26, G10L21/10

CPC classification: G10L21/06, G06F17/30769, G10H1/368, G10H2220/011, G10H2240/251, G10L15/26, G10L21/10, G10L21/18, G11B27/10, G11B27/11, G11B27/28

Abstract: Systems and methods of rendering a textual animation are provided. The methods include receiving an audio sample of an audio signal that is being rendered by a media rendering source. The methods also include receiving one or more descriptors for the audio signal based on at least one of a semantic vector, an audio vector, and an emotion vector. Based on the one or more descriptors, a client device may render the textual transcriptions of vocal elements of the audio signal in an animated manner. The client device may further render the textual transcriptions of the vocal elements of the audio signal to be substantially in synchrony to the audio signal being rendered by the media rendering source. In addition, the client device may further receive an identification of a song corresponding to the audio sample, and may render lyrics of the song in an animated manner.

7.

Patent application
Voice-Activated Signal Generator (Active)

Publication No.: US20140142949A1

Publication date: 2014-05-22

Application No.: US13678951

Filing date: 2012-11-16

Applicant: David Edward Newman

Inventor: David Edward Newman

IPC classification: G10L21/18

CPC classification: G10L15/26, G10L25/93, G10L2015/223

Abstract: A voice-activated signal generator is a device to produce output signals responsive to spoken commands. The device accepts only predetermined commands and responsively generates specific output signals such as a pulse, a series of pulses, a voltage level, or a periodic waveform. The device is suitable for triggering an oscilloscope, or controlling a circuit under test, or activating another instrument. The invention also enables safely controlling a hazardous system such as a high voltage system, hands-free and with precise timing determined by the user. Also disclosed are fast, compact, robust algorithms for analyzing spoken commands, and particularly for detecting voiced and unvoiced sound, and for identifying commands by comparing the order of sound intervals in the spoken command to templates that represent the predetermined commands. The device may have one output or multiple outputs in parallel, all controlled by voice commands with precision output timing.

8.

Patent application
ESTIMATING CONGNITIVE-LOAD IN HUMAN-MACHINE INTERACTION (Pending, published)

Publication No.: US20130325482A1

Publication date: 2013-12-05

Application No.: US13761541

Filing date: 2013-02-07

Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC

Inventors: Eli Tzirkel-Hancock, Omer Tsimhoni

IPC classification: G10L21/18

CPC classification: G10L21/18, G10L15/22

Abstract: Estimating cognitive-load of a user in human-machine interaction by identifying an expression of cognitive-load within a user expression captured by a dialogue system and using a user model to estimate a level of the cognitive-load based on the expression of cognitive-load.

9.

Patent application
FOVEATED BEAMFORMING FOR AUGMENTED REALITY DEVICES AND WEARABLES (Active)

Publication No.: US20230071778A1

Publication date: 2023-03-09

Application No.: US17446877

Filing date: 2021-09-03

Applicant: Google LLC

Inventors: Ruofei Du, Hendrik Wagenaar, Alex Olwal

IPC classification: G10L21/10, G06K9/00, G06F3/01, H04R3/00, H04R1/40, G06T7/70, G10L15/26, G10L15/22, H04R1/08, G10L15/25, G10L21/18, G06T7/50

Abstract: An augmented reality (AR) device, such as AR glasses, may include a microphone array. The sensitivity of the microphone array can be directed to a target by beamforming, which includes combining the audio of each microphone of the array in a particular way based on a location of the target. The present disclosure describes systems and methods to determine the location of the target based on a gaze of a user and beamform the audio accordingly. This eye-tracked beamforming (i.e., foveated beamforming) can be used by AR applications to enhance sounds from a gaze direction and to suppress sounds from other directions. Additionally, the gaze information can be used to help visualize the results of an AR application, such as speech-to-text.
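
As a rough illustration of steering a microphone array toward a gaze direction, the sketch below uses delay-and-sum beamforming, one common technique that may differ from what the application actually claims. The linear array geometry, the far-field assumption, integer-sample delays, and the function names `steering_delays` and `delay_and_sum` are all hypothetical choices made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air


def steering_delays(mic_x, gaze_angle_deg, sample_rate):
    """Per-microphone arrival delays (in whole samples) for a far-field source.

    mic_x: microphone positions in metres along a line (assumed linear array).
    gaze_angle_deg: source direction measured from broadside toward +x,
    assumed here to come from an eye tracker.
    """
    s = math.sin(math.radians(gaze_angle_deg))
    # A microphone further along +x hears a source on the +x side earlier.
    raw = [-x * s / SPEED_OF_SOUND * sample_rate for x in mic_x]
    base = min(raw)
    return [round(r - base) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for i in range(n - d):
            out[i] += signal[i + d]
    return [v / len(channels) for v in out]


# Usage: two microphones 10 cm apart, gaze 90 degrees off broadside.
delays = steering_delays([0.0, 0.1], 90.0, 3430)
```

Sounds arriving from the steered direction add coherently after alignment, while sounds from other directions are averaged out of phase and so are attenuated.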

10.

Patent application
CONCURRENT MULTI-PATH PROCESSING OF AUDIO SIGNALS FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS (Active)

Publication No.: US20220139368A1

Publication date: 2022-05-05

Application No.: US17433868

Filing date: 2019-02-28

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Yi ZHANG, Hui SONG, Yongtao SHA, Chengyun DENG

IPC classification: G10L15/00, G10L21/18, G10L19/02, G10L25/18

Abstract: A system and method for concurrent multi-path processing of audio signals for automatic speech recognition is presented. Audio information defining a set of audio signals may be obtained (502). The audio signals may convey mixed audio content produced by multiple audio sources. A set of source-specific audio signals may be determined by demixing the mixed audio content produced by the multiple audio sources. Determining the set of source-specific audio signals may comprise providing the set of audio signals to both a first signal processing path and a second signal processing path (504). The first signal processing path may determine a value of a demixing parameter for demixing the mixed audio content (506). The second signal processing path may apply the value of the demixing parameter to the individual audio signals of the set of audio signals (508) to generate the individual source-specific audio signals (510).
