-
Publication No.: US12131747B2
Publication Date: 2024-10-29
Application No.: US17283398
Filing Date: 2019-08-23
Applicant: SONY CORPORATION
Inventor(s): Ryuichi Namba, Seiji Miyama, Yoshihiro Manabe, Yoshiaki Oikawa
IPC Classification: G10L21/00, G10L21/0216, H04R1/32
CPC Classification: G10L21/0216, H04R1/326
Abstract: Noise suppression performance is enhanced by performing noise suppression suited to the noise environment. Noise dictionary data is acquired by reading it out from a noise database unit on the basis of installation environment information, which includes information regarding a type of noise and an orientation between a sound reception point and a noise source. Noise suppression processing is then performed on a voice signal obtained by a microphone arranged at the sound reception point, using the acquired noise dictionary data.
-
Publication No.: US20240354499A1
Publication Date: 2024-10-24
Application No.: US18652614
Filing Date: 2024-05-01
Inventor(s): JEFFREY PENROD ADAMS
CPC Classification: G06F40/20, G06F17/00, G10L15/00, G10L15/22, G10L21/00, G10L2015/223, G10L2015/228
Abstract: A device is configured with multiple applications that each respond to various commands. The correct application to receive a natural language command is identified by considering how well the command matches functions of the application. A target application to receive the command may additionally be selected by considering which application is most likely to receive a command. The likelihood that an application will receive a command may be determined by considering context. The command may be a voice input that is analyzed by speech recognition technology to determine word strings representing possible commands. Thus, the selection of a target application to receive the command may be based on any or all of the word strings from the natural language input, a closeness of fit between the command and an application, and the likelihood that an application is the target for the next incoming command.
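The selection described in this abstract can be illustrated with a minimal sketch. Multiplying the closeness-of-fit score by the context-based likelihood is an assumption made here for illustration; the abstract does not say how the two signals are combined. The function name `select_target` and the application names are hypothetical.

```python
from typing import Mapping


def select_target(fit: Mapping[str, float], likelihood: Mapping[str, float]) -> str:
    """Pick the application with the highest combined score.

    fit:        closeness of fit between the command and each application.
    likelihood: context-based likelihood that each application is the target.
    Assumption (not from the abstract): the two signals are multiplied.
    """
    scores = {app: fit[app] * likelihood.get(app, 0.0) for app in fit}
    return max(scores, key=scores.get)


# Hypothetical scores: "maps" fits the words slightly better,
# but context makes "music" the far more likely target.
fit = {"music": 0.6, "maps": 0.7}
likelihood = {"music": 0.8, "maps": 0.2}
print(select_target(fit, likelihood))
```

With equal likelihoods the better-fitting application wins, so context only changes the outcome when it is informative.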
-
Publication No.: US12039977B2
Publication Date: 2024-07-16
Application No.: US17388347
Filing Date: 2021-07-29
Inventor(s): Seong Soo Yae, Seo Hwan Choi, Hyun Woo Lee
CPC Classification: G10L15/22, H04M3/42221, G10L2015/223
Abstract: A call termination apparatus according to an embodiment includes a call termination word existence determination device configured to determine whether a call termination word exists in a call voice of a user, and a controller configured to compare the call termination word of the user with a call termination example word previously registered in a vehicle to control whether to terminate a call.
-
Publication No.: US12020722B2
Publication Date: 2024-06-25
Application No.: US17210108
Filing Date: 2021-03-23
Applicant: Otter.ai, Inc.
Inventor(s): Yun Fu, Simon Lau, Kaisuke Nakajima, Julius Cheng, Gelei Chen, Sam Song Liang, James Mason Altreuter, Kean Kheong Chin, Zhenhao Ge, Hitesh Anand Gupta, Xiaoke Huang, James Francis McAteer, Brian Francis Williams, Tao Xing
CPC Classification: G10L21/10, G06F16/438, G10L17/02, G10L17/04, G10L17/22, H04L63/104
Abstract: A system for processing and presenting a conversation includes a sensor, a processor, and a presenter. The sensor is configured to capture an audio-form conversation. The processor is configured to automatically transform the audio-form conversation into a transformed conversation. The transformed conversation includes a synchronized text, wherein the synchronized text is synchronized with the audio-form conversation. The presenter is configured to present the transformed conversation including the synchronized text and the audio-form conversation. The presenter is further configured to present the transformed conversation to be navigable, searchable, assignable, editable, and shareable.
-
Publication No.: US12014738B2
Publication Date: 2024-06-18
Application No.: US18144713
Filing Date: 2023-05-08
Applicant: GOOGLE LLC
CPC Classification: G10L15/22, G10L15/26, H04S7/301, G10L2015/223, G10L2015/228
Abstract: Techniques described herein are directed to arbitrating between multiple potentially-responsive, automated-assistant capable electronic devices to determine which should respond to the user's utterance, and/or which should defer to other electronic device(s). In various implementations, a spoken utterance of a user may be detected at a microphone of a first electronic device. Sound(s) emitted by additional electronic device(s) may also be detected at the microphone. Each of the sound(s) may encode a timestamp corresponding to detection of the spoken utterance at a respective electronic device. Timestamp(s) may be extracted from the sound(s) and compared to a local timestamp corresponding to detection of the spoken utterance at the first electronic device. Based on the comparison, the first electronic device may either invoke an automated assistant locally or defer to one of the additional electronic devices.
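The timestamp comparison in this abstract can be sketched roughly as follows. The rule that the earliest detection wins and the tie-break by device ID are assumptions made for illustration; the abstract states only that the device invokes the assistant or defers based on the comparison. The names `arbitrate` and `should_invoke_locally` are hypothetical, and decoding the timestamps from the emitted sounds is not shown.

```python
from typing import Mapping


def arbitrate(local_id: str, local_ts: float, remote_ts: Mapping[str, float]) -> str:
    """Return the ID of the device that should respond.

    Assumption (not from the abstract): the device that detected the
    utterance earliest responds; ties go to the smaller device ID so that
    every device reaches the same decision independently.
    """
    candidates = dict(remote_ts)
    candidates[local_id] = local_ts
    # Order by timestamp first, then by device ID for a stable tie-break.
    return min(candidates, key=lambda dev: (candidates[dev], dev))


def should_invoke_locally(local_id: str, local_ts: float,
                          remote_ts: Mapping[str, float]) -> bool:
    """True if this device should invoke the assistant, False if it should defer."""
    return arbitrate(local_id, local_ts, remote_ts) == local_id


# Hypothetical timestamps, in seconds, extracted from sounds of two other devices.
print(should_invoke_locally("kitchen", 10.02, {"hall": 10.05, "office": 10.09}))
```

A deterministic tie-break matters here because each device runs the comparison on its own; without one, two devices with equal timestamps could both respond or both defer.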
-
Publication No.: US11978478B2
Publication Date: 2024-05-07
Application No.: US18182811
Filing Date: 2023-03-13
IPC Classification: G10L25/87, G10L15/00, G10L21/00, G10L21/0216, G10L25/78
CPC Classification: G10L25/87, G10L15/00, G10L2021/02166, G10L25/78
Abstract: A speech recognition system utilizes automatic speech recognition techniques, such as end-pointing techniques, in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
-
Publication No.: US11977841B2
Publication Date: 2024-05-07
Application No.: US17645641
Filing Date: 2021-12-22
Inventor(s): Jeremy A. Geiman, Kongkuo Lu, Ron Papka
IPC Classification: G10L15/00, G06F40/279, G06F40/30, G10L21/00
CPC Classification: G06F40/279
Abstract: An apparatus includes a display device that displays an input document in a user interface and at least one processor configured to receive a command to determine a document type of the input document and classify the input document to assign at least one document type and a respective confidence score. The processor assigns a significance score to each word of the input document that is indicative of a degree of influence the word has in deciding that the input document is of the at least one document type. The processor determines a level of visual emphasis to be placed on each word of the input document based on the significance score of the word and displays the input document on the display device with each word of the input document visually emphasized in accordance with the determined level of visual emphasis of the word.
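The step from significance score to visual emphasis can be illustrated with a minimal sketch. It assumes scores normalized to the range 0 to 1 and a small number of equal-width emphasis levels; neither detail comes from the abstract, and the names `emphasis_level` and `emphasize` are hypothetical. How the scores are computed is not shown.

```python
from typing import Iterable, List, Tuple


def emphasis_level(score: float, levels: int = 4) -> int:
    """Map a significance score to an emphasis level in 0..levels-1.

    Assumption (not from the abstract): scores lie in [0, 1] and are
    bucketed into equal-width bands; out-of-range scores are clamped.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    clamped = min(max(score, 0.0), 1.0)
    # A score of exactly 1.0 would land in band `levels`, so cap it.
    return min(int(clamped * levels), levels - 1)


def emphasize(scored_words: Iterable[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Attach an emphasis level to each (word, score) pair."""
    return [(word, emphasis_level(score)) for word, score in scored_words]


# Hypothetical significance scores for words of an input document.
print(emphasize([("invoice", 0.9), ("the", 0.05), ("total", 0.5)]))
```

A user interface could then map each level to a concrete style, for example a highlight intensity or font weight.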
-
Publication No.: US20240137468A1
Publication Date: 2024-04-25
Application No.: US18403045
Filing Date: 2024-01-03
Inventor(s): Nils B. Lahr, Garrick C. Barr
IPC Classification: H04N7/173, G06F3/0482, G06F3/04842, G06F16/182, G06F16/583, G06F16/78, G06F16/783, G06F16/955, G06F16/958, G06F18/22, G06T1/00, G06T7/215, G06V20/40, G10L21/00, H04L67/02, H04L67/06, H04L67/10, H04N21/222, H04N21/239, H04N21/431, H04N21/432, H04N21/433, H04N21/442, H04N21/4627, H04N21/472, H04N21/4782, H04N21/81, H04N21/8358, H04N21/84
CPC Classification: H04N7/17318, G06F3/0482, G06F3/04842, G06F16/183, G06F16/5838, G06F16/78, G06F16/7847, G06F16/7867, G06F16/9562, G06F16/958, G06F18/22, G06T1/0021, G06T7/215, G06V20/40, G06V20/48, G10L21/00, H04L67/02, H04L67/06, H04L67/10, H04N21/2223, H04N21/2393, H04N21/4312, H04N21/4325, H04N21/4334, H04N21/44236, H04N21/4627, H04N21/47214, H04N21/4782, H04N21/812, H04N21/8173, H04N21/8358, H04N21/84, G06T2207/10016
Abstract: Systems and methods for replacing original media bookmarks of at least a portion of a digital media file with replacement bookmarks are described. A media fingerprint engine detects the location of the original fingerprints associated with the portion of the digital media file, and a region analysis algorithm characterizes regions of the media file spanning the location of the original bookmarks by data class types. The replacement bookmarks are associated with the data class types and are overwritten or otherwise substituted for the original bookmarks. The replacement bookmarks are then subjected to a fingerprint matching algorithm that incorporates media timeline and media-related metadata.
-
Publication No.: US11902711B2
Publication Date: 2024-02-13
Application No.: US17884964
Filing Date: 2022-08-10
Inventor(s): Nils B. Lahr, Garrick C. Barr
IPC Classification: G06K9/00, H04N7/173, G06F16/78, G06F16/182, G06F16/958, G06F16/583, G06F16/783, G06F16/955, H04N21/432, H04N21/433, H04N21/442, H04N21/472, H04N21/81, H04N21/8358, H04N21/84, G06T7/215, G06V20/40, G06F18/22, G06F3/04842, H04L67/02, G06T1/00, G06F3/0482, H04L67/06, H04L67/10, G10L21/00, H04N21/222, H04N21/239, H04N21/431, H04N21/4627, H04N21/4782
CPC Classification: H04N7/17318, G06F3/0482, G06F3/04842, G06F16/183, G06F16/5838, G06F16/78, G06F16/7847, G06F16/7867, G06F16/958, G06F16/9562, G06F18/22, G06T1/0021, G06T7/215, G06V20/40, G06V20/48, G10L21/00, H04L67/02, H04L67/06, H04L67/10, H04N21/2223, H04N21/2393, H04N21/4312, H04N21/4325, H04N21/4334, H04N21/44236, H04N21/4627, H04N21/4782, H04N21/47214, H04N21/812, H04N21/8173, H04N21/8358, H04N21/84, G06T2207/10016
Abstract: Systems and methods for replacing original media bookmarks of at least a portion of a digital media file with replacement bookmarks are described. A media fingerprint engine detects the location of the original fingerprints associated with the portion of the digital media file, and a region analysis algorithm characterizes regions of the media file spanning the location of the original bookmarks by data class types. The replacement bookmarks are associated with the data class types and are overwritten or otherwise substituted for the original bookmarks. The replacement bookmarks are then subjected to a fingerprint matching algorithm that incorporates media timeline and media-related metadata.
-
Publication No.: US11894008B2
Publication Date: 2024-02-06
Application No.: US16769122
Filing Date: 2018-11-28
Applicant: SONY CORPORATION
Inventor(s): Naoya Takahashi
IPC Classification: G10L21/00, G10L25/00, G10L21/007, G10L21/028, G10L21/013
CPC Classification: G10L21/007, G10L21/013, G10L21/028
Abstract: Provided is a signal processing apparatus that includes a voice quality conversion unit that converts acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on the basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data.