Patent search cpc:"G10L17/02" Page 1

1.

Invention publication
Systems and Methods for Audio Preparation and Delivery [Under examination, published]

Publication number: US20240321286A1

Publication date: 2024-09-26

Application number: US18189764

Application date: 2023-03-24

Applicant: Super Hi-Fi, LLC

Inventor: Brendon Patrick Cassidy, Zack J. Zalon

IPC: G10L21/007, G10L17/02, G10L21/028, G10L25/18, G10L25/30

CPC classification number: G10L21/007, G10L17/02, G10L21/028, G10L25/18, G10L25/30

Abstract: The present application relates to systems and methods for audio preparation and delivery. Such systems and methods may involve a controller configured to carry out operations. The operations include receiving source audio comprising a vocal portion. The operations also include selecting, using a trained machine learning model, a primary voice profile based on an analysis of the vocal portion of the received source audio. The primary voice profile is selected from a plurality of predetermined voice profiles. The operations also include adjusting, based on the selected primary voice profile, at least a portion of the source audio. The operations also include providing output audio based on the adjusted portion of source audio.

2.

Invention publication
Systems and Methods for Distinguishing Between Human Speech and Machine Generated Speech [Under examination, published]

Publication number: US20240312466A1

Publication date: 2024-09-19

Application number: US18602835

Application date: 2024-03-12

Applicant: Outbound AI Inc.

Inventor: Ronen Reouveni, Robert Piro, Jonathan Wiggs, Christina Quinn, Mohammed Soliman

IPC: G10L17/14, G10L17/02, G10L17/22, G10L17/26, H04M3/51

CPC classification number: G10L17/14, G10L17/02, G10L17/22, G10L17/26, H04M3/5166

Abstract: Systems, devices, and methods for determining whether a segment of speech was generated by a human or by a machine, such as a robotic voice that is synthesized and used as part of an IVR system. The disclosed approach can be used to assist in implementing a process to automate the detection of the start and end of a hold time during a call to a call center and in response execute a desired action.

3.

Granted invention patent
General speech enhancement method and apparatus using multi-source auxiliary information [In force]

Publication number: US12094484B2

Publication date: 2024-09-17

Application number: US18360838

Application date: 2023-07-28

Applicant: ZHEJIANG LAB

Inventor: Jingsong Li, Zhenchuan Zhang, Tianshu Zhou, Yu Tian

IPC: G10L21/0232, G10L17/02, G10L17/04, G10L25/30

CPC classification number: G10L21/0232, G10L17/02, G10L17/04, G10L25/30

Abstract: The present disclosure discloses a general speech enhancement method and apparatus using multi-source auxiliary information. The method includes the following steps: S1: building a training data set; S2: using the training data set to learn network parameters of a model, and building a speech enhancement model; S3: building a sound source information database in a pre-collection or on-site collection mode; S4: acquiring an input of the speech enhancement model; and S5: taking a noisy original signal as a main input of the speech enhancement model, taking auxiliary sound signals of a target source group and auxiliary sound signals of an interference source group as side inputs of the speech enhancement model for speech enhancement, and obtaining an enhanced speech signal.

4.

Invention publication
INFORMATION PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM [Under examination, published]

Publication number: US20240273883A1

Publication date: 2024-08-15

Application number: US18644689

Application date: 2024-04-24

Applicant: Panasonic Intellectual Property Corporation of America

Inventor: Shintaro OKADA, Masanari MIYAMOTO, Kousuke ITAKURA

IPC: G06V10/80, G06V10/74, G06V40/16, G10L17/02, G10L17/10

CPC classification number: G06V10/803, G06V10/761, G06V40/168, G06V40/172, G10L17/02, G10L17/10

Abstract: An information processing device performs: acquiring a face similarity indicating a similarity between a face of a first person and a face of a second person; acquiring a voice similarity indicating a similarity between a voice of the first person and a voice of the second person; calculating an integrated similarity by integrating the face similarity and the voice similarity, and determining the integrated similarity as a final similarity when the face similarity falls within an integrated range including a threshold which is used to determine whether the first person and the second person are identical to each other, and determining the face similarity as the final similarity when the face similarity is out of the integrated range; and outputting the final similarity.
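
The decision rule in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the two similarities are integrated or how wide the integrated range is, so the weighted average, the `threshold`, `margin`, and `face_weight` parameters, and the function name are all assumptions made for illustration.

```python
def final_similarity(face_sim, voice_sim, threshold=0.5, margin=0.1, face_weight=0.5):
    """Return a final similarity from face and voice similarities.

    Assumptions (not stated in the abstract): the integrated range is
    [threshold - margin, threshold + margin], and integration is a
    weighted average of the two similarities.
    """
    lower, upper = threshold - margin, threshold + margin
    if lower <= face_sim <= upper:
        # The face similarity is close to the decision threshold, so the
        # voice similarity is blended in to help resolve the ambiguity.
        return face_weight * face_sim + (1.0 - face_weight) * voice_sim
    # The face similarity is clearly above or below the threshold,
    # so it is used on its own.
    return face_sim


if __name__ == "__main__":
    print(final_similarity(0.90, 0.10))  # outside the range: face only
    print(final_similarity(0.55, 0.75))  # inside the range: blended
```

The design point is that the voice signal is consulted only when the face comparison is near the threshold, so a confident face match or mismatch is not disturbed by the voice score.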

5.

Invention publication
SYSTEM AND METHOD FOR ELECTRONIC COMMUNICATION [Under examination, published]

Publication number: US20240249723A1

Publication date: 2024-07-25

Application number: US18597703

Application date: 2024-03-06

Applicant: Capital One Services, LLC

Inventor: Christopher CAMENARES, Joseph BOAYUE, Lee ADCOCK, Ana CRUZ, Nahid Farhady GHALATY

IPC: G10L15/26, G06F21/30, G06F21/31, G06F40/253, G06F40/289, G10L15/18, G10L17/02

CPC classification number: G10L15/26, G06F40/289, G10L15/1822, G06F21/30, G06F21/316, G06F40/253, G10L17/02

Abstract: Systems, methods, and computer-readable storage media for providing communication recommendations to users. The system receives electronic transcripts associated with a first user and generates, based on the transcripts, a communication profile of the user. The system also receives additional user transcripts associated with a plurality of additional users and generates additional communication profiles for those additional users based on the additional transcripts. The system receives a request to communicate with at least one user within the plurality of additional users regarding a specified topic, identifies a second user from within the plurality of additional users, and generates a communication initiation recommendation for the first user to communicate with the second user. The system then transmits the communication initiation recommendation to a first user computing device associated with the first user.

6.

Granted invention patent
Audio signal processing method and apparatus, electronic device, and storage medium [In force]

Publication number: US12039995B2

Publication date: 2024-07-16

Application number: US17667370

Application date: 2022-02-08

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Jun Wang, Wingyip Lam

IPC: G10L21/0308, G10L13/02, G10L17/02, G10L17/04, G10L17/06, G10L17/22, G10L21/0208, G10L21/0232

CPC classification number: G10L21/0308, G10L13/02, G10L17/02, G10L17/04, G10L17/06, G10L17/22, G10L2021/02087, G10L21/0232

Abstract: This application discloses an audio signal processing method performed by an electronic device. According to this application, embedding processing is performed on a mixed audio signal by mapping the mixed audio signal to an embedding space, to obtain an embedding feature of the mixed audio signal in the embedding space; and generalized feature extraction is performed on the embedding feature, so that a generalized feature of a target component in the mixed audio signal can be obtained through extraction. The generalized feature of the target component has good generalization capability and expression capability, and can be used for different scenarios. Audio signal processing is performed on the mixed audio signal based on the generalized feature of the target component to obtain information of the audio signal of the target object, thereby improving the robustness and generalization of an audio signal processing process, and improving the accuracy of audio signal processing.

7.

Invention publication
System and Method for Podcast Repetitive Content Detection [Under examination, published]

Publication number: US20240233747A1

Publication date: 2024-07-11

Application number: US18405269

Application date: 2024-01-05

Applicant: Gracenote, Inc.

Inventor: Amanmeet Garg, Aneesh Vartakavi

IPC: G10L25/51, G10L17/02, G10L17/06, G10L25/90

CPC classification number: G10L25/51, G10L17/02, G10L17/06, G10L25/90

Abstract: In one aspect, a method includes detecting a fingerprint match between query fingerprint data representing at least one audio segment within podcast content and reference fingerprint data representing known repetitive content within other podcast content, detecting a feature match between a set of audio features across multiple time-windows of the podcast content, and detecting a text match between at least one query text sentence from a transcript of the podcast content and reference text sentences, the reference text sentences comprising text sentences from the known repetitive content within the other podcast content. The method also includes, responsive to the detections, generating sets of labels identifying potential repetitive content within the podcast content. The method also includes selecting, from the sets of labels, a consolidated set of labels identifying segments of repetitive content within the podcast content, and, responsive to selecting the consolidated set of labels, performing an action.
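
The consolidation step in this abstract can be pictured with a short Python sketch. The abstract does not describe how the consolidated set of labels is selected, so this sketch swaps in a plain interval merge: each detector is assumed to emit `(start_seconds, end_seconds)` labels, and overlapping labels from any detector are joined into one segment. The function name and the label format are hypothetical.

```python
def consolidate_labels(label_sets):
    """Merge overlapping (start, end) labels from several detectors.

    Assumption: a label is a (start_seconds, end_seconds) tuple, and two
    labels that overlap or touch describe the same repetitive segment.
    """
    # Pool the labels from all detectors and order them by start time.
    intervals = sorted(label for labels in label_sets for label in labels)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend that segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    fingerprint_labels = [(0, 30)]
    feature_labels = [(25, 40), (100, 120)]
    text_labels = [(110, 130)]
    print(consolidate_labels([fingerprint_labels, feature_labels, text_labels]))
```

A stricter variant might keep only segments supported by two or more detectors; that choice is likewise not specified in the abstract.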

8.

Invention publication
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING SYSTEM [Under examination, published]

Publication number: US20240233743A1

Publication date: 2024-07-11

Application number: US18561481

Application date: 2022-02-25

Applicant: SONY GROUP CORPORATION

Inventor: RINA KOTANI, SHIRO SUZUKI

IPC: G10L21/0272, G10L17/02

CPC classification number: G10L21/0272, G10L17/02

Abstract: An information processing apparatus (100) includes a signal acquiring unit (132), a signal identification unit (133), a signal processing unit (134), and a signal transmission unit (135). The signal acquiring unit (132) acquires, from a communication terminal, at least one of a first voice signal corresponding to a voice of a preceding speaker and a second voice signal corresponding to a voice of an intervening speaker. When the signal strengths of the first voice signal and the second voice signal exceed a predetermined threshold, the signal identification unit (133) specifies an overlapping section in which the first voice signal and the second voice signal overlap, and identifies either the first voice signal or the second voice signal as a phase inversion target in the overlapping section. The signal processing unit (134) performs phase inversion processing on one voice signal identified as the phase inversion target while the overlapping section continues. The signal transmission unit (135) adds one voice signal on which the phase inversion processing has been performed and the other voice signal on which the phase inversion processing has not been performed, and transmits the resulting signal to a communication terminal (10).
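
The processing chain in this abstract (detect the overlapping section, invert the phase of one signal there, then add the two signals) can be sketched in a few lines of Python. This is a simplified illustration: per-sample absolute amplitude stands in for "signal strength", the `threshold` value is arbitrary, and the choice of which signal to invert is passed in as a flag. None of these details come from the abstract.

```python
def mix_with_phase_inversion(first, second, threshold=0.1, invert_second=True):
    """Add two equal-length sample lists, inverting one where they overlap.

    Assumption: a sample belongs to the overlapping section when the
    absolute amplitude of both signals exceeds `threshold`.
    """
    if len(first) != len(second):
        raise ValueError("signals must have the same length")
    mixed = []
    for a, b in zip(first, second):
        if abs(a) > threshold and abs(b) > threshold:
            # Overlapping section: flip the sign (180 degree phase shift)
            # of the signal chosen as the phase inversion target.
            if invert_second:
                b = -b
            else:
                a = -a
        mixed.append(a + b)
    return mixed


if __name__ == "__main__":
    preceding = [0.5, 0.0, 0.3]
    intervening = [0.4, 0.2, 0.0]
    print(mix_with_phase_inversion(preceding, intervening))
```

Only the first sample exceeds the threshold in both signals, so only that sample of the second signal is inverted before the sum.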

9.

Granted invention patent
Display apparatus and processing method for display apparatus with camera [In force]

Publication number: US12028617B2

Publication date: 2024-07-02

Application number: US18060210

Application date: 2022-11-30

Applicant: HISENSE VISUAL TECHNOLOGY CO., LTD.

Inventor: Luming Yang, Dayong Wang, Xusheng Wang, Jin Cheng, Wenqin Yu, Le Ma, Jiayi Ding

IPC: G10L17/06, G06T7/70, G10L17/02, G10L17/22, H04N21/422, H04N21/431, H04N23/611, H04N23/695, H04R1/02, H04R1/40, H04R3/00

CPC classification number: H04N23/695, G06T7/70, G10L17/02, G10L17/06, G10L17/22, H04N21/422, H04N21/431, H04N23/611, H04R1/028, H04R1/406, H04R3/005, G06T2207/30196, H04R2499/15

Abstract: Disclosed are a display apparatus and a processing method for the display apparatus with a camera. The display apparatus includes a camera, a sound collector, and a controller. The controller is configured for: starting shooting at least one image through the camera; in response to the at least one image not including a portrait of a user, starting obtaining a first test audio signal input from the user through the sound collector; in response to the first test audio signal, determining a target azimuth corresponding to the user; generating a rotation instruction for the camera according to the target azimuth of the user; and sending the rotation instruction to the camera to adjust a shooting direction of the camera to the target azimuth.

10.

Invention publication
MANUAL-ENROLLMENT-FREE PERSONALIZED DENOISE [Under examination, published]

Publication number: US20240212702A1

Publication date: 2024-06-27

Application number: US18088070

Application date: 2022-12-23

Applicant: Zoom Video Communications, Inc.

Inventor: Jiachuan Deng, Cheng Lun Hu, Zhaofeng Jia, Qiyong Liu, Zhengwei Wei, Da-Yi Wu

IPC: G10L21/0232, G10L17/02

CPC classification number: G10L21/0232, G10L17/02, G10L25/18

Abstract: Various embodiments of an apparatus, method(s), system(s) and computer program product(s) described herein are directed to a Denoise Engine. The Denoise Engine collects segments of voice content of a first user account from audio data associated with a virtual meeting. The audio data further includes additional types of audio content. The Denoise Engine identifies an audio embedding model. The Denoise Engine receives a speaker embedding generated by the audio embedding model. The speaker embedding is based on the collected segments of voice content. The Denoise Engine generates personalized denoised voice content of the first user account for the virtual meeting by applying the speaker embedding to the audio data associated with the virtual meeting.
