-
Publication number: US20230162754A1
Publication date: 2023-05-25
Application number: US17915074
Application date: 2021-03-25
Inventor: Chunghsin Yeh , Giulio Cengarle , Mark David de Burgh
IPC: G10L21/0364 , G10L25/30 , G10L25/21 , G10L25/84 , G10L17/20 , G10L21/028 , G10L21/034
CPC classification number: G10L21/0364 , G10L25/30 , G10L25/21 , G10L25/84 , G10L17/20 , G10L21/028 , G10L21/034 , G10L2025/786
Abstract: Embodiments are disclosed for automatic leveling of speech content. In an embodiment, a method comprises: receiving, using one or more processors, frames of an audio recording including speech and non-speech content; for each frame: determining, using the one or more processors, a speech probability; analyzing, using the one or more processors, a perceptual loudness of the frame; obtaining, using the one or more processors, a target loudness range for the frame; computing, using the one or more processors, gains to apply to the frame based on the target loudness range and the perceptual loudness analysis, where the gains include dynamic gains that change frame-by-frame and that are scaled based on the speech probability; and applying the gains to the frame so that a resulting loudness range of the speech content in the audio recording fits within the target loudness range.
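The per-frame gain computation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the RMS level in dB stands in for a perceptual loudness measure, the target range values are arbitrary, and the function names (frame_level_db, dynamic_gain_db, level_frames) are hypothetical.

```python
import math

def frame_level_db(frame):
    """RMS level in dB. Assumption: a crude stand-in for perceptual loudness."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))

def dynamic_gain_db(level_db, target_lo_db, target_hi_db, speech_prob):
    """Gain (dB) that would move the level into the target range,
    scaled by the frame's speech probability (0.0 to 1.0)."""
    if level_db < target_lo_db:
        needed = target_lo_db - level_db   # boost quiet frames
    elif level_db > target_hi_db:
        needed = target_hi_db - level_db   # attenuate loud frames
    else:
        needed = 0.0                       # already inside the range
    return speech_prob * needed

def level_frames(frames, speech_probs, target_lo_db=-26.0, target_hi_db=-20.0):
    """Apply the scaled dynamic gain to each frame (target values are illustrative)."""
    out = []
    for frame, p in zip(frames, speech_probs):
        g_db = dynamic_gain_db(frame_level_db(frame), target_lo_db, target_hi_db, p)
        g = 10.0 ** (g_db / 20.0)          # dB to linear amplitude factor
        out.append([x * g for x in frame])
    return out
```

With these assumptions, a frame judged to be pure speech receives the full correction, while a frame with speech probability 0 passes through unchanged.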
-
Publication number: US20220262387A1
Publication date: 2022-08-18
Application number: US17733397
Application date: 2022-04-29
Inventor: Giulio Cengarle , Antonio Mateos Sole , Brett G. Crockett
IPC: G10L21/0232 , G10L21/0264 , H03G3/30
Abstract: Methods, systems, and computer program products of automatic de-essing are disclosed. An automatic de-esser can be used without manually setting parameters and can perform reliable sibilance detection and reduction regardless of absolute signal level, singer gender and other extraneous factors. An audio processing device divides input audio signals into buffers each containing a number of samples, the buffers overlapping one another. The audio processing device transforms each buffer from the time domain into the frequency domain and implements de-essing as a multi-band compressor that only acts on a designated sibilance band. The audio processing device determines an amount of attenuation in the sibilance band based on comparison of energy level in sibilance band of a buffer to broadband energy level in a previous buffer. The amount of attenuation is also determined based on a zero-crossing rate, as well as a slope and onset of a compression curve.
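The level-independent detection idea in this abstract (sibilance-band energy compared against the previous buffer's broadband energy, then mapped through a compression curve) might look like the sketch below. All parameter names and default values (onset_db, slope, zcr_threshold, max_att_db) are hypothetical, and using the zero-crossing rate as a simple gate is an assumption; the abstract does not say how it enters the computation.

```python
import math

def zero_crossing_rate(buf):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(buf, buf[1:]) if (a < 0) != (b < 0))
    return crossings / (len(buf) - 1)

def sibilance_attenuation_db(sib_energy, prev_broadband_energy, zcr,
                             onset_db=-12.0, slope=0.5,
                             zcr_threshold=0.3, max_att_db=12.0):
    """Attenuation (positive dB) to apply to the sibilance band of one buffer."""
    # Assumption: a low zero-crossing rate means the buffer is not sibilant.
    if zcr < zcr_threshold or sib_energy <= 0.0 or prev_broadband_energy <= 0.0:
        return 0.0
    # A relative measure, so the result does not depend on absolute signal level.
    ratio_db = 10.0 * math.log10(sib_energy / prev_broadband_energy)
    excess_db = ratio_db - onset_db        # how far above the curve's onset
    if excess_db <= 0.0:
        return 0.0
    return min(slope * excess_db, max_att_db)
```

Because only the ratio of the two energies is used, scaling the whole signal up or down leaves the computed attenuation unchanged, which matches the abstract's claim of independence from absolute signal level.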
-
Publication number: US11322170B2
Publication date: 2022-05-03
Application number: US16753029
Application date: 2018-10-02
Inventor: Giulio Cengarle , Antonio Mateos Sole , Brett G. Crockett
IPC: H03G7/00 , H03G5/00 , H03G3/20 , G10L21/0232 , G10L21/0264 , H03G3/30
Abstract: Methods, systems, and computer program products of automatic de-essing are disclosed. An automatic de-esser can be used without manually setting parameters and can perform reliable sibilance detection and reduction regardless of absolute signal level, singer gender and other extraneous factors. An audio processing device divides input audio signals into buffers each containing a number of samples, the buffers overlapping one another. The audio processing device transforms each buffer from the time domain into the frequency domain and implements de-essing as a multi-band compressor that only acts on a designated sibilance band. The audio processing device determines an amount of attenuation in the sibilance band based on comparison of energy level in sibilance band of a buffer to broadband energy level in a previous buffer. The amount of attenuation is also determined based on a zero-crossing rate, as well as a slope and onset of a compression curve.
-
Publication number: US20210219083A1
Publication date: 2021-07-15
Application number: US17149683
Application date: 2021-01-14
Inventor: Jun Wang , Giulio Cengarle , Juan Felix Torres , Daniel Arteaga
Abstract: An audio object including audio content and object metadata is received. The object metadata indicates an object spatial position of the audio object to be rendered by audio speakers in a playback environment. Based on the object spatial position and source spatial positions of the audio speakers, initial gain values for the audio speakers are determined. The initial gain values can be used to select a set of audio speakers from among the audio speakers. Based on the object spatial position and a set of source spatial positions at which the set of audio speakers are respectively located in the playback environment, a set of non-negative optimized gain values for the set of audio speakers is determined. The audio object at the object spatial position is rendered with the set of optimized gain values for the set of audio speakers.
-
Publication number: US09712939B2
Publication date: 2017-07-18
Application number: US14908094
Application date: 2014-06-17
Inventor: Antonio Mateos Sole , Giulio Cengarle , Dirk Jeroen Breebart , Nicolas R. Tsingos
CPC classification number: H04S7/30 , H04S2400/03 , H04S2400/11
Abstract: A gain contribution of the audio signal for each of the N audio objects to at least one of M speakers may be determined. Determining the gain contribution may involve determining a center of loudness position that is a function of speaker (or cluster) positions and gains assigned to each speaker (or cluster). Determining the gain contribution also may involve determining a minimum value of a cost function. A first term of the cost function may represent a difference between the center of loudness position and an audio object position.
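As a rough illustration of the first cost-function term mentioned in this abstract, the sketch below assumes the center of loudness is the gain-weighted average of the speaker positions and measures its squared distance to the object position. That specific formula and the function names are assumptions; the abstract only states that the center of loudness is a function of positions and gains.

```python
def center_of_loudness(positions, gains):
    """Gain-weighted average of speaker positions (assumed definition)."""
    total = sum(gains)
    dims = len(positions[0])
    return tuple(
        sum(g * p[d] for g, p in zip(gains, positions)) / total
        for d in range(dims)
    )

def position_cost(positions, gains, object_pos):
    """Squared distance between the center of loudness and the object position."""
    col = center_of_loudness(positions, gains)
    return sum((c - o) ** 2 for c, o in zip(col, object_pos))
```

For two speakers at (-1, 0) and (1, 0) with gains 1 and 3, this places the center of loudness at (0.5, 0), so an object at that position gives zero cost for this term. A full solver would minimize the complete cost function over the gains, which is not shown here.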
-
Publication number: US20250126428A1
Publication date: 2025-04-17
Application number: US18704402
Application date: 2022-10-14
Inventor: Xu Li , Giulio Cengarle , Qingyuan Bin , Michael Getty Horgan
IPC: H04S7/00
Abstract: A method of audio processing includes generating a detection score based on the partial loudnesses of a reference audio signal, extracted audio objects, extracted bed channels, a rendered audio signal and a channel-based audio signal. The detection score is indicative of an audio artifact in one or more of the audio objects and the bed channels. The extracted audio objects and extracted bed channels may be modified, in accordance with the detection score, to reduce the audio artifact.
-
Publication number: US20240013799A1
Publication date: 2024-01-11
Application number: US18044777
Application date: 2021-09-21
Inventor: Davide Scaini , Chunghsin Yeh , Giulio Cengarle , Mark David de Burgh
IPC: G10L21/0232 , G10L21/028 , G10L25/18 , G10L25/84 , G10L21/034 , G10L21/0364 , G10L25/21
CPC classification number: G10L21/0232 , G10L21/028 , G10L25/18 , G10L25/84 , G10L21/034 , G10L21/0364 , G10L25/21
Abstract: In some embodiments, a method comprises: dividing, using at least one processor, an audio input into speech and non-speech segments; for each frame in each non-speech segment, estimating, using the at least one processor, a time-varying noise spectrum of the non-speech segment; for each frame in each speech segment, estimating, using the at least one processor, a speech spectrum of the speech segment; for each frame in each speech segment, identifying one or more non-speech frequency components in the speech spectrum; comparing the one or more non-speech frequency components with one or more corresponding frequency components in a plurality of estimated noise spectra; and selecting the estimated noise spectrum from the plurality of estimated noise spectra based on a result of the comparing.
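The final comparison-and-selection step could be sketched as below. The squared-difference distance and the function name select_noise_spectrum are assumptions made for illustration; the abstract does not specify how the comparison is scored.

```python
def select_noise_spectrum(speech_spectrum, non_speech_bins, noise_spectra):
    """Return the index of the candidate noise spectrum that best matches the
    speech frame at the bins identified as non-speech components."""
    def distance(noise):
        # Assumed metric: sum of squared differences over the non-speech bins only.
        return sum((speech_spectrum[k] - noise[k]) ** 2 for k in non_speech_bins)
    return min(range(len(noise_spectra)), key=lambda i: distance(noise_spectra[i]))
```

For example, with a speech spectrum [1.0, 0.2, 5.0, 0.3], non-speech bins [1, 3], and candidates [[0.0, 1.0, 0.0, 1.0], [0.0, 0.2, 0.0, 0.3]], the second candidate (index 1) matches the non-speech bins exactly and is selected.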
-
Publication number: US11425503B2
Publication date: 2022-08-23
Application number: US16987197
Application date: 2020-08-06
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Daniel Arteaga , Giulio Cengarle , David Matthew Fischer , Antonio Mateos Sole , Davide Scaini , Alan Seefeldt
Abstract: Embodiments are described for a method of simultaneously localizing a set of speakers and microphones, having only the times of arrival between each of the speakers and microphones. An autodiscovery process uses an external input to set: a global translation (3 continuous parameters), a global rotation (3 continuous parameters), and discrete symmetries, i.e., an exchange of any axis pairs and/or reversal of any axis. Different time of arrival acquisition techniques may be used, such as ultrasonic sweeps or generic multitrack audio content. The autodiscovery algorithm is based on minimizing a certain cost function, and the process allows for latencies in the recordings, possibly linked to the latencies in the emission.
-
Publication number: US11356787B2
Publication date: 2022-06-07
Application number: US17149683
Application date: 2021-01-14
Inventor: Jun Wang , Giulio Cengarle , Juan Felix Torres , Daniel Arteaga
Abstract: An audio object including audio content and object metadata is received. The object metadata indicates an object spatial position of the audio object to be rendered by audio speakers in a playback environment. Based on the object spatial position and source spatial positions of the audio speakers, initial gain values for the audio speakers are determined. The initial gain values can be used to select a set of audio speakers from among the audio speakers. Based on the object spatial position and a set of source spatial positions at which the set of audio speakers are respectively located in the playback environment, a set of non-negative optimized gain values for the set of audio speakers is determined. The audio object at the object spatial position is rendered with the set of optimized gain values for the set of audio speakers.
-
Publication number: US20190387342A1
Publication date: 2019-12-19
Application number: US16555126
Application date: 2019-08-29
Inventor: Jun Wang , Giulio Cengarle , Juan Felix Torres , Daniel Arteaga
Abstract: An audio object including audio content and object metadata is received. The object metadata indicates an object spatial position of the audio object to be rendered by audio speakers in a playback environment. Based on the object spatial position and source spatial positions of the audio speakers, initial gain values for the audio speakers are determined. The initial gain values can be used to select a set of audio speakers from among the audio speakers. Based on the object spatial position and a set of source spatial positions at which the set of audio speakers are respectively located in the playback environment, a set of non-negative optimized gain values for the set of audio speakers is determined. The audio object at the object spatial position is rendered with the set of optimized gain values for the set of audio speakers.