SPEECH ENHANCEMENT
    Invention publication
    Status: under examination, published

    Publication No.: US20240363131A1

    Publication date: 2024-10-31

    Application No.: US18577597

    Filing date: 2022-07-12

    CPC classification number: G10L21/0208 G10L25/27 G10L2021/02082

    Abstract: A method for dereverberating audio signals is provided. In some implementations, the method involves obtaining a real acoustic impulse response (AIR); identifying a first portion of the real AIR corresponding to early reflections of a direct sound and a second portion of the real AIR corresponding to late reflections of the direct sound; generating one or more synthesized AIRs by modifying the first portion of the real AIR and/or the second portion of the real AIR; and using the real AIR and the one or more synthesized AIRs to generate a plurality of training samples, each training sample comprising an input audio signal and a reverberated audio signal, wherein the reverberated audio signal is generated based on the input audio signal and one of the real AIR or one of the one or more synthesized AIRs, which plurality of training samples are used to train a machine learning model.
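
The augmentation steps in this abstract can be sketched roughly as follows. This is only an illustration under assumptions the abstract does not state: the early/late boundary is placed a fixed `early_ms` after the strongest tap (taken here as the direct sound), and the only modification shown is scaling the late portion by a gain. All function names are hypothetical.

```python
def split_air(air, sample_rate, early_ms=50.0):
    """Split an AIR into (early, late) parts at a fixed time after the strongest tap."""
    # Assumption: the strongest tap is the direct sound.
    direct = max(range(len(air)), key=lambda i: abs(air[i]))
    boundary = min(len(air), direct + int(sample_rate * early_ms / 1000.0))
    return air[:boundary], air[boundary:]


def synthesize_air(air, sample_rate, late_gain, early_ms=50.0):
    """Create a synthesized AIR by scaling the late-reflection portion (one possible modification)."""
    early, late = split_air(air, sample_rate, early_ms)
    return early + [late_gain * x for x in late]


def convolve(signal, air):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(air) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(air):
            out[i + j] += s * h
    return out


def make_training_samples(clean_signals, real_air, sample_rate, late_gains, early_ms=50.0):
    """Pair each input signal with a reverberated copy for the real AIR and each synthesized AIR."""
    airs = [real_air] + [
        synthesize_air(real_air, sample_rate, g, early_ms) for g in late_gains
    ]
    return [(clean, convolve(clean, air)) for clean in clean_signals for air in airs]
```

Each returned pair is (input audio signal, reverberated audio signal); one real AIR and N gains yield N + 1 pairs per input signal.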

    SPEECH ENHANCEMENT
    Invention publication
    Status: under examination, published

    Publication No.: US20240177726A1

    Publication date: 2024-05-30

    Application No.: US18577586

    Filing date: 2022-07-12

    CPC classification number: G10L21/0208 G06N3/08 G10L21/0232 G10L2021/02082

    Abstract: A method for enhancing audio signals is provided. In some implementations, the method involves (a) obtaining a training set comprising a plurality of training samples, each training sample comprising a distorted audio signal and a clean audio signal. In some implementations, the method involves (b), for a training sample of the plurality of training samples: obtaining a frequency-domain representation of the distorted audio signal; providing the frequency-domain representation to a convolutional neural network (CNN) comprising a plurality of convolutional layers and to a recurrent element, wherein an output of the recurrent element is provided to a subset of the plurality of convolutional layers; generating a predicted enhancement mask, wherein the CNN generates the predicted enhancement mask; generating a predicted enhanced audio signal based on the predicted enhancement mask; and updating weights associated with the CNN and the recurrent element based on the predicted enhanced audio signal.
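
The last two steps of this abstract (turning a predicted enhancement mask into a predicted enhanced signal, then scoring it against the clean signal) can be illustrated with a small sketch. The network itself is not modeled here. The mask is simply passed in, and the mean squared error is an assumed loss, since the abstract does not name one. Function names are hypothetical.

```python
def apply_mask(distorted_spec, mask):
    """Scale each time-frequency bin of the distorted spectrum by its mask value."""
    return [
        [m * x for m, x in zip(mask_row, spec_row)]
        for mask_row, spec_row in zip(mask, distorted_spec)
    ]


def mse_loss(predicted_spec, clean_spec):
    """Mean squared error over all bins (works for real or complex bins)."""
    total = 0.0
    count = 0
    for pred_row, clean_row in zip(predicted_spec, clean_spec):
        for p, c in zip(pred_row, clean_row):
            total += abs(p - c) ** 2
            count += 1
    return total / count
```

In a training loop, the loss value would drive the weight update for both the CNN and the recurrent element. That step depends on the chosen framework and is omitted.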

    ACOUSTIC ZONING WITH DISTRIBUTED MICROPHONES

    Publication No.: US20220335937A1

    Publication date: 2022-10-20

    Application No.: US17630895

    Filing date: 2020-07-28

    Abstract: A method for estimating a user's location in an environment may involve receiving output signals from each microphone of a plurality of microphones in the environment. At least two microphones of the plurality of microphones may be included in separate devices at separate locations in the environment and the output signals may correspond to a current utterance of a user. The method may involve determining multiple current acoustic features from the output signals of each microphone and applying a classifier to the multiple current acoustic features. Applying the classifier may involve applying a model trained on previously-determined acoustic features derived from a plurality of previous utterances made by the user in a plurality of user zones in the environment. The method may involve determining, based at least in part on output from the classifier, an estimate of the user zone in which the user is currently located.
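
As a rough illustration of the classification step, the sketch below trains a nearest-centroid model on feature vectors from previous utterances labeled by user zone, then estimates the zone for a new utterance. The nearest-centroid classifier and the example feature (one level value per microphone) are assumptions. The abstract does not specify the classifier or the acoustic features, and the function names are hypothetical.

```python
def train_zone_model(feature_vectors, zone_labels):
    """Average the feature vectors of previous utterances per zone (centroid model)."""
    sums = {}
    counts = {}
    for feats, zone in zip(feature_vectors, zone_labels):
        if zone not in sums:
            sums[zone] = [0.0] * len(feats)
            counts[zone] = 0
        sums[zone] = [s + f for s, f in zip(sums[zone], feats)]
        counts[zone] += 1
    return {zone: [s / counts[zone] for s in total] for zone, total in sums.items()}


def estimate_zone(model, feats):
    """Return the zone whose centroid is closest to the current utterance's features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, feats))

    return min(model, key=lambda zone: squared_distance(model[zone]))
```

With two microphones in separate devices, a vector such as `[0.7, 0.3]` would mean the utterance was picked up more strongly by the first device.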

    Post-Teleconference Playback Using Non-Destructive Audio Transport

    Publication No.: US20180295240A1

    Publication date: 2018-10-11

    Application No.: US15578386

    Filing date: 2016-06-15

    Abstract: Teleconference audio data, including a plurality of individual uplink data packet streams, may be received during a teleconference. Each uplink data packet stream may correspond to a telephone endpoint used by one or more teleconference participants. The teleconference audio data may be analyzed to determine a plurality of suppressive gain coefficients, which may be applied to first instances of the teleconference audio data during the teleconference, to produce first gain-suppressed audio data provided to the telephone endpoints during the teleconference. Second instances of the teleconference audio data, as well as gain coefficient data corresponding to the plurality of suppressive gain coefficients, may be sent to a memory system as individual uplink data packet streams. The second instances of the teleconference audio data may be less gain-suppressed than the first gain-suppressed audio data.
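
The "non-destructive" idea can be sketched as follows: the live path applies the suppressive gain, while the stored copy keeps the audio together with the gain coefficient so that playback can reapply or change it later. This sketch assumes one gain per frame and stores the audio with no suppression at all, which is only one case of "less gain-suppressed". The names and the record layout are hypothetical.

```python
def process_uplink_frame(frame, suppressive_gain):
    """Return (live_audio, stored_record) for one frame of one uplink stream."""
    # First instance: gain-suppressed audio sent to endpoints during the call.
    live_audio = [suppressive_gain * s for s in frame]
    # Second instance: unsuppressed audio plus the gain coefficient, kept for later playback.
    stored_record = {"audio": list(frame), "gain": suppressive_gain}
    return live_audio, stored_record


def replay(stored_record, gain_override=None):
    """Reproduce the live audio, or apply a different gain chosen at playback time."""
    gain = stored_record["gain"] if gain_override is None else gain_override
    return [gain * s for s in stored_record["audio"]]
```

Because the suppression is not baked into the stored audio, a playback system could choose a different gain after the teleconference.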

    Jitter Buffer Apparatus and Method
    Invention application

    Publication No.: US20170272375A1

    Publication date: 2017-09-21

    Application No.: US15460490

    Filing date: 2017-03-16

    Abstract: Disclosed is a method and apparatus operative to process packets of media received from a network, including a receiver unit, a jitter buffer data structure, a playback head defining a point in the jitter buffer data structure from which the ordered queue of packets is to be played back, and at least one prototype head. Each prototype head has a predetermined latency assigned to it and defines a point in the jitter buffer data structure from which the ordered queue of packets is played back with that latency. A processor is operable to determine a measure of conversational quality associated with the ordered queue of packets being played back by each prototype head. Also described is a head selector operable to compare these measures of conversational quality and select the prototype head with the highest measure, and a playback unit coupled to the playback head.
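
The head-selection logic can be sketched as below. The abstract does not define the measure of conversational quality, so the score here is a hypothetical stand-in: it penalizes both the head's latency and the fraction of packets that arrive too late for that latency. Names and the `late_penalty_ms` weight are assumptions.

```python
def quality(arrival_delays_ms, latency_ms, late_penalty_ms=200.0):
    """Hypothetical quality score for one prototype head (higher is better)."""
    # A packet is late if its network delay exceeds the head's assigned latency.
    late = sum(1 for d in arrival_delays_ms if d > latency_ms)
    late_fraction = late / len(arrival_delays_ms)
    return -(latency_ms + late_penalty_ms * late_fraction)


def select_head(arrival_delays_ms, prototype_latencies_ms):
    """Pick the prototype head latency with the highest quality score."""
    return max(
        prototype_latencies_ms,
        key=lambda latency: quality(arrival_delays_ms, latency),
    )
```

A short latency loses many late packets, while a long one adds conversational delay. The selector trades the two off through the score.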
