Systems and Methods for Audio Preparation and Delivery

    Publication number: US20240321286A1

    Publication date: 2024-09-26

    Application number: US18189764

    Application date: 2023-03-24

    CPC classification number: G10L21/007 G10L17/02 G10L21/028 G10L25/18 G10L25/30

    Abstract: The present application relates to systems and methods for audio preparation and delivery. Such systems and methods may involve a controller configured to carry out operations. The operations include receiving source audio comprising a vocal portion. The operations also include selecting, using a trained machine learning model, a primary voice profile based on an analysis of the vocal portion of the received source audio. The primary voice profile is selected from a plurality of predetermined voice profiles. The operations also include adjusting, based on the selected primary voice profile, at least a portion of the source audio. The operations also include providing output audio based on the adjusted portion of source audio.

    General speech enhancement method and apparatus using multi-source auxiliary information

    Publication number: US12094484B2

    Publication date: 2024-09-17

    Application number: US18360838

    Application date: 2023-07-28

    Applicant: ZHEJIANG LAB

    CPC classification number: G10L21/0232 G10L17/02 G10L17/04 G10L25/30

    Abstract: The present disclosure discloses a general speech enhancement method and apparatus using multi-source auxiliary information. The method includes the following steps: S1: building a training data set; S2: using the training data set to learn network parameters of a model, and building a speech enhancement model; S3: building a sound source information database in a pre-collection or on-site collection mode; S4: acquiring an input of the speech enhancement model; and S5: taking a noisy original signal as a main input of the speech enhancement model, taking auxiliary sound signals of a target source group and auxiliary sound signals of an interference source group as side inputs of the speech enhancement model for speech enhancement, and obtaining an enhanced speech signal.

    System and Method for Podcast Repetitive Content Detection

    Publication number: US20240233747A1

    Publication date: 2024-07-11

    Application number: US18405269

    Application date: 2024-01-05

    CPC classification number: G10L25/51 G10L17/02 G10L17/06 G10L25/90

    Abstract: In one aspect, a method includes detecting a fingerprint match between query fingerprint data representing at least one audio segment within podcast content and reference fingerprint data representing known repetitive content within other podcast content, detecting a feature match between a set of audio features across multiple time-windows of the podcast content, and detecting a text match between at least one query text sentence from a transcript of the podcast content and reference text sentences, the reference text sentences comprising text sentences from the known repetitive content within the other podcast content. The method also includes, responsive to the detections, generating sets of labels identifying potential repetitive content within the podcast content. The method also includes selecting, from the sets of labels, a consolidated set of labels identifying segments of repetitive content within the podcast content, and, responsive to selecting the consolidated set of labels, performing an action.

    INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING SYSTEM

    Publication number: US20240233743A1

    Publication date: 2024-07-11

    Application number: US18561481

    Application date: 2022-02-25

    CPC classification number: G10L21/0272 G10L17/02

    Abstract: An information processing apparatus (100) includes a signal acquiring unit (132), a signal identification unit (133), a signal processing unit (134), and a signal transmission unit (135). The signal acquiring unit (132) acquires, from a communication terminal, at least one of a first voice signal corresponding to a voice of a preceding speaker and a second voice signal corresponding to a voice of an intervening speaker. When the signal strengths of the first voice signal and the second voice signal exceed a predetermined threshold, the signal identification unit (133) specifies an overlapping section in which the first voice signal and the second voice signal overlap, and identifies either the first voice signal or the second voice signal as a phase inversion target in the overlapping section. The signal processing unit (134) performs phase inversion processing on one voice signal identified as the phase inversion target while the overlapping section continues. The signal transmission unit (135) adds one voice signal on which the phase inversion processing has been performed and the other voice signal on which the phase inversion processing has not been performed, and transmits the resulting signal to a communication terminal (10).
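The identify, invert, and add sequence in this abstract can be sketched in a few lines. This is a simplified illustration, not the claimed implementation: it assumes both voice signals are equal-rate lists of float samples, treats per-sample absolute amplitude as the "signal strength" (a real system would more likely use frame energy), and handles only the first overlapping section. The function names and the `invert_first` flag are hypothetical.

```python
from typing import List, Optional, Tuple


def overlapping_section(
    first: List[float], second: List[float], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices, end exclusive, of the first run
    in which both signals exceed the threshold, or None if there is none."""
    start = None
    for i, (a, b) in enumerate(zip(first, second)):
        both_active = abs(a) > threshold and abs(b) > threshold
        if both_active and start is None:
            start = i
        elif not both_active and start is not None:
            return (start, i)
    if start is not None:
        return (start, min(len(first), len(second)))
    return None


def mix_with_phase_inversion(
    first: List[float],
    second: List[float],
    threshold: float,
    invert_first: bool = False,
) -> List[float]:
    """Invert the phase of one signal inside the overlapping section,
    then add the two signals sample by sample."""
    n = min(len(first), len(second))
    a = list(first[:n])
    b = list(second[:n])
    section = overlapping_section(a, b, threshold)
    if section is not None:
        start, end = section
        target = a if invert_first else b  # the phase inversion target
        for i in range(start, end):
            target[i] = -target[i]  # a 180-degree phase shift negates the sample
    return [x + y for x, y in zip(a, b)]
```

With `first = [0.0, 0.5, 0.5, 0.0]`, `second = [0.0, 0.0, 0.25, 0.25]`, and a threshold of `0.125`, the overlap is sample 2 only, so the second signal is negated there and the mixed output is `[0.0, 0.5, 0.25, 0.25]`.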

    MANUAL-ENROLLMENT-FREE PERSONALIZED DENOISE

    Publication number: US20240212702A1

    Publication date: 2024-06-27

    Application number: US18088070

    Application date: 2022-12-23

    CPC classification number: G10L21/0232 G10L17/02 G10L25/18

    Abstract: Various embodiments of an apparatus, method(s), system(s) and computer program product(s) described herein are directed to a Denoise Engine. The Denoise Engine collects segments of voice content of a first user account from audio data associated with a virtual meeting. The audio data further includes additional types of audio content. The Denoise Engine identifies an audio embedding model. The Denoise Engine receives a speaker embedding generated by the audio embedding model. The speaker embedding is based on the collected segments of voice content. The Denoise Engine generates personalized denoised voice content of the first user account for the virtual meeting by applying the speaker embedding to the audio data associated with the virtual meeting.
