-
Publication number: US20240321286A1
Publication date: 2024-09-26
Application number: US18189764
Application date: 2023-03-24
Applicant: Super Hi-Fi, LLC
Inventor: Brendon Patrick Cassidy , Zack J. Zalon
IPC: G10L21/007 , G10L17/02 , G10L21/028 , G10L25/18 , G10L25/30
CPC classification number: G10L21/007 , G10L17/02 , G10L21/028 , G10L25/18 , G10L25/30
Abstract: The present application relates to systems and methods for audio preparation and delivery. Such systems and methods may involve a controller configured to carry out operations. The operations include receiving source audio comprising a vocal portion. The operations also include selecting, using a trained machine learning model, a primary voice profile based on an analysis of the vocal portion of the received source audio. The primary voice profile is selected from a plurality of predetermined voice profiles. The operations also include adjusting, based on the selected primary voice profile, at least a portion of the source audio. The operations also include providing output audio based on the adjusted portion of source audio.
-
Publication number: US20240312466A1
Publication date: 2024-09-19
Application number: US18602835
Application date: 2024-03-12
Applicant: Outbound AI Inc.
Inventor: Ronen Reouveni , Robert Piro , Jonathan Wiggs , Christina Quinn , Mohammed Soliman
CPC classification number: G10L17/14 , G10L17/02 , G10L17/22 , G10L17/26 , H04M3/5166
Abstract: Systems, devices, and methods for determining whether a segment of speech was generated by a human or by a machine, such as a robotic voice that is synthesized and used as part of an IVR system. The disclosed approach can be used to assist in implementing a process to automate the detection of the start and end of a hold time during a call to a call center and in response execute a desired action.
-
Publication number: US12094484B2
Publication date: 2024-09-17
Application number: US18360838
Application date: 2023-07-28
Applicant: ZHEJIANG LAB
Inventor: Jingsong Li , Zhenchuan Zhang , Tianshu Zhou , Yu Tian
IPC: G10L21/0232 , G10L17/02 , G10L17/04 , G10L25/30
CPC classification number: G10L21/0232 , G10L17/02 , G10L17/04 , G10L25/30
Abstract: The present disclosure discloses a general speech enhancement method and apparatus using multi-source auxiliary information. The method includes the following steps: S1: building a training data set; S2: using the training data set to learn network parameters of a model, and building a speech enhancement model; S3: building a sound source information database in a pre-collection or on-site collection mode; S4: acquiring an input of the speech enhancement model; and S5: taking a noisy original signal as a main input of the speech enhancement model, taking auxiliary sound signals of a target source group and auxiliary sound signals of an interference source group as side inputs of the speech enhancement model for speech enhancement, and obtaining an enhanced speech signal.
-
Publication number: US20240273883A1
Publication date: 2024-08-15
Application number: US18644689
Application date: 2024-04-24
Inventor: Shintaro OKADA , Masanari MIYAMOTO , Kousuke ITAKURA
CPC classification number: G06V10/803 , G06V10/761 , G06V40/168 , G06V40/172 , G10L17/02 , G10L17/10
Abstract: An information processing device performs: acquiring a face similarity indicating a similarity between a face of a first person and a face of a second person; acquiring a voice similarity indicating a similarity between a voice of the first person and a voice of the second person; calculating an integrated similarity by integrating the face similarity and the voice similarity, and determining the integrated similarity as a final similarity when the face similarity falls within an integrated range including a threshold which is used to determine whether the first person and the second person are identical to each other, and calculating the face similarity as a final similarity when the face similarity is out of the integrated range; and outputting the final similarity.
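The decision rule described in this abstract can be sketched as follows. This is only an illustrative reading, not the patented implementation: the function name `final_similarity`, the default threshold of 0.5, the band half-width `margin`, and the weighted-average fusion are all assumptions, since the abstract does not state how the two similarities are integrated or how wide the integrated range is.

```python
def final_similarity(face_sim, voice_sim, threshold=0.5, margin=0.1, face_weight=0.5):
    """Return a final similarity score for two persons.

    Assumed reading of the abstract: the "integrated range" is a band of
    half-width `margin` around the identity threshold. Inside the band the
    face score alone is ambiguous, so it is fused with the voice score;
    outside the band the face score is used unchanged.
    """
    lower, upper = threshold - margin, threshold + margin
    if lower <= face_sim <= upper:
        # Hypothetical fusion: a simple weighted average of the two scores.
        return face_weight * face_sim + (1.0 - face_weight) * voice_sim
    # Face similarity is clearly above or below the threshold: keep it as is.
    return face_sim
```

With these assumed defaults, a face similarity of 0.9 is returned unchanged regardless of the voice score, while a face similarity of 0.5 is averaged with the voice similarity.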
-
Publication number: US20240249723A1
Publication date: 2024-07-25
Application number: US18597703
Application date: 2024-03-06
Applicant: Capital One Services, LLC
Inventor: Christopher CAMENARES , Joseph BOAYUE , Lee ADCOCK , Ana CRUZ , Nahid Farhady GHALATY
IPC: G10L15/26 , G06F21/30 , G06F21/31 , G06F40/253 , G06F40/289 , G10L15/18 , G10L17/02
CPC classification number: G10L15/26 , G06F40/289 , G10L15/1822 , G06F21/30 , G06F21/316 , G06F40/253 , G10L17/02
Abstract: Systems, methods, and computer-readable storage media for providing communication recommendations to users. The system receives electronic transcripts associated with a first user and generates, based on the transcripts, a communication profile of the user. The system also receives additional user transcripts associated with a plurality of additional users and generates additional communication profiles for those additional users based on the additional transcripts. The system receives a request to communicate with at least one user within the plurality of additional users regarding a specified topic, identifies a second user from within the plurality of additional users, and generates a communication initiation recommendation for the first user to communicate with the second user. The system then transmits the communication initiation recommendation to a first user computing device associated with the first user.
-
Publication number: US12039995B2
Publication date: 2024-07-16
Application number: US17667370
Application date: 2022-02-08
Inventor: Jun Wang , Wingyip Lam
IPC: G10L21/0308 , G10L13/02 , G10L17/02 , G10L17/04 , G10L17/06 , G10L17/22 , G10L21/0208 , G10L21/0232
CPC classification number: G10L21/0308 , G10L13/02 , G10L17/02 , G10L17/04 , G10L17/06 , G10L17/22 , G10L2021/02087 , G10L21/0232
Abstract: This application discloses an audio signal processing method performed by an electronic device. According to this application, embedding processing is performed on a mixed audio signal by mapping the mixed audio signal to an embedding space, to obtain an embedding feature of the mixed audio signal in the embedding space; and generalized feature extraction is performed on the embedding feature, so that a generalized feature of a target component in the mixed audio signal can be obtained through extraction. The generalized feature of the target component has good generalization capability and expression capability, and can be used for different scenarios. Audio signal processing is performed on the mixed audio signal based on the generalized feature of the target component to obtain information of the audio signal of the target object, thereby improving the robustness and generalization of an audio signal processing process, and improving the accuracy of audio signal processing.
-
Publication number: US20240233747A1
Publication date: 2024-07-11
Application number: US18405269
Application date: 2024-01-05
Applicant: Gracenote, Inc.
Inventor: Amanmeet Garg , Aneesh Vartakavi
Abstract: In one aspect, a method includes detecting a fingerprint match between query fingerprint data representing at least one audio segment within podcast content and reference fingerprint data representing known repetitive content within other podcast content, detecting a feature match between a set of audio features across multiple time-windows of the podcast content, and detecting a text match between at least one query text sentence from a transcript of the podcast content and reference text sentences, the reference text sentences comprising text sentences from the known repetitive content within the other podcast content. The method also includes, responsive to the detections, generating sets of labels identifying potential repetitive content within the podcast content. The method also includes selecting, from the sets of labels, a consolidated set of labels identifying segments of repetitive content within the podcast content, and responsive to selecting the consolidated set of labels, performing an action.
-
Publication number: US20240233743A1
Publication date: 2024-07-11
Application number: US18561481
Application date: 2022-02-25
Applicant: SONY GROUP CORPORATION
Inventor: RINA KOTANI , SHIRO SUZUKI
IPC: G10L21/0272 , G10L17/02
CPC classification number: G10L21/0272 , G10L17/02
Abstract: An information processing apparatus (100) includes a signal acquiring unit (132), a signal identification unit (133), a signal processing unit (134), and a signal transmission unit (135). The signal acquiring unit (132) acquires, from a communication terminal, at least one of a first voice signal corresponding to a voice of a preceding speaker and a second voice signal corresponding to a voice of an intervening speaker. When the signal strengths of the first voice signal and the second voice signal exceed a predetermined threshold, the signal identification unit (133) specifies an overlapping section in which the first voice signal and the second voice signal overlap, and identifies either the first voice signal or the second voice signal as a phase inversion target in the overlapping section. The signal processing unit (134) performs phase inversion processing on one voice signal identified as the phase inversion target while the overlapping section continues. The signal transmission unit (135) adds one voice signal on which the phase inversion processing has been performed and the other voice signal on which the phase inversion processing has not been performed, and transmits the resulting signal to a communication terminal (10).
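The overlap handling in this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the apparatus described in the patent: the function name `mix_with_phase_inversion` and the `invert_first` flag are hypothetical, signals are plain lists of samples, and "signal strength" is approximated by the absolute value of each individual sample rather than by a windowed level measure.

```python
def mix_with_phase_inversion(first, second, threshold, invert_first=False):
    """Mix two equal-length sample sequences into one signal.

    Where both samples exceed `threshold` in magnitude (the assumed
    "overlapping section"), one of the two signals is phase-inverted
    (multiplied by -1) before the signals are added.
    """
    if len(first) != len(second):
        raise ValueError("signals must have the same length")
    mixed = []
    for a, b in zip(first, second):
        if abs(a) > threshold and abs(b) > threshold:
            # Overlap: invert the signal chosen as the phase inversion target.
            if invert_first:
                a = -a
            else:
                b = -b
        # Add the (possibly inverted) signal to the untouched one.
        mixed.append(a + b)
    return mixed
```

Outside the overlapping samples the two signals are simply summed, so a speaker talking alone passes through unchanged.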
-
Publication number: US12028617B2
Publication date: 2024-07-02
Application number: US18060210
Application date: 2022-11-30
Applicant: HISENSE VISUAL TECHNOLOGY CO., LTD.
Inventor: Luming Yang , Dayong Wang , Xusheng Wang , Jin Cheng , Wenqin Yu , Le Ma , Jiayi Ding
IPC: G10L17/06 , G06T7/70 , G10L17/02 , G10L17/22 , H04N21/422 , H04N21/431 , H04N23/611 , H04N23/695 , H04R1/02 , H04R1/40 , H04R3/00
CPC classification number: H04N23/695 , G06T7/70 , G10L17/02 , G10L17/06 , G10L17/22 , H04N21/422 , H04N21/431 , H04N23/611 , H04R1/028 , H04R1/406 , H04R3/005 , G06T2207/30196 , H04R2499/15
Abstract: Disclosed are a display apparatus and a processing method for the display apparatus with a camera. The display apparatus includes a camera, a sound collector, and a controller. The controller is configured for: starting shooting at least one image through the camera; in response to the at least one image not including a portrait of a user, starting obtaining a first test audio signal input from the user through the sound collector; in response to the first test audio signal, determining a target azimuth corresponding to the user; generating a rotation instruction for the camera according to the target azimuth of the user; and sending the rotation instruction to the camera to adjust a shooting direction of the camera to the target azimuth.
-
Publication number: US20240212702A1
Publication date: 2024-06-27
Application number: US18088070
Application date: 2022-12-23
Applicant: Zoom Video Communications, Inc.
Inventor: Jiachuan Deng , Cheng Lun Hu , Zhaofeng Jia , Qiyong Liu , Zhengwei Wei , Da-Yi Wu
IPC: G10L21/0232 , G10L17/02
CPC classification number: G10L21/0232 , G10L17/02 , G10L25/18
Abstract: Various embodiments of an apparatus, method(s), system(s) and computer program product(s) described herein are directed to a Denoise Engine. The Denoise Engine collects segments of voice content of a first user account from audio data associated with a virtual meeting. The audio data further includes additional types of audio content. The Denoise Engine identifies an audio embedding model. The Denoise Engine receives a speaker embedding generated by the audio embedding model. The speaker embedding is based on the collected segments of voice content. The Denoise Engine generates personalized denoised voice content of the first user account for the virtual meeting by applying the speaker embedding to the audio data associated with the virtual meeting.
-