-
Publication No.: US11138989B2
Publication date: 2021-10-05
Application No.: US16296122
Filing date: 2019-03-07
Applicant: ADOBE INC.
Inventor: Prem Seetharaman, Gautham J. Mysore, Bryan A. Pardo
IPC: G10L25/60, G10L25/30, G10L21/0232, G10L25/84, G10L21/0208
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for sound quality prediction and real-time feedback about sound quality, such as room acoustics quality and background noise. Audio data can be sampled from a live sound source and stored in an audio buffer. The audio data in the buffer is analyzed to calculate a stream of values of one or more sound quality measures, such as speech transmission index and signal-to-noise ratio. Speech transmission index can be calculated using a convolutional neural network configured to predict speech transmission index from reverberant speech. The stream of values can be used to provide real-time feedback about sound quality of the audio data. For example, a visual indicator on a graphical user interface can be updated based on consistency of the values over time. The real-time feedback about sound quality can help users optimize their recording setup.
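The consistency-based feedback described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the `QualityIndicator` class, the window length, the 20 dB threshold, and the status labels are all assumptions made for the example, and the signal and noise powers are taken as given rather than estimated from an audio buffer.

```python
import math
from collections import deque


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two positive power values."""
    return 10.0 * math.log10(signal_power / noise_power)


class QualityIndicator:
    """Hypothetical indicator driven by the recent history of a quality measure."""

    def __init__(self, window=5, threshold_db=20.0):
        # Keep only the most recent `window` values of the stream.
        self.values = deque(maxlen=window)
        self.threshold_db = threshold_db

    def update(self, value_db):
        """Add one value and return the status to display."""
        self.values.append(value_db)
        if len(self.values) < self.values.maxlen:
            return "measuring"  # not enough history yet
        if all(v >= self.threshold_db for v in self.values):
            return "good"       # consistently above the threshold
        if all(v < self.threshold_db for v in self.values):
            return "poor"       # consistently below the threshold
        return "unstable"       # values disagree within the window
```

Requiring agreement across the whole window is one simple way to keep a display from flickering when a single value crosses the threshold; other smoothing rules would serve equally well.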
-
Publication No.: US20190130894A1
Publication date: 2019-05-02
Application No.: US15796292
Filing date: 2017-10-27
Applicant: Adobe Inc., The Trustees of Princeton University
Inventor: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
CPC classification number: G10L13/08, G06F17/24, G10L13/00, G10L13/04, G10L13/06, G10L13/07, G10L15/02, G10L21/00, G10L2021/0135, G11B27/022
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
-
Publication No.: US20190318726A1
Publication date: 2019-10-17
Application No.: US16108996
Filing date: 2018-08-22
Applicant: Adobe Inc., The Trustees of Princeton University
Inventor: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
Abstract: Techniques for a recursive deep-learning approach for performing speech synthesis using a repeatable structure that splits an input tensor into a left half and right half similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
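The repeatable structure in this abstract (split, convolve each half, sum, post-process) can be illustrated with a toy sketch. It assumes a single-channel input held in a plain list, a 1-D convolution of kernel width 1 (which reduces to multiplying by one scalar weight), and ReLU as the post-processing function. These choices and the names `fft_like_layer` and `stack` are illustrative assumptions, not details taken from the patent.

```python
def relu(v):
    """Assumed post-processing function for this sketch."""
    return max(0.0, v)


def fft_like_layer(x, w_left, w_right, post=relu):
    """One repeatable unit: split, convolve each half, sum, post-process.

    With a single channel and kernel width 1, each 1-D convolution is
    just a multiplication by a scalar weight.
    """
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(x) // 2
    left, right = x[:half], x[half:]
    return [post(w_left * l + w_right * r) for l, r in zip(left, right)]


def stack(x, weights):
    """Apply the unit in series; each layer halves the length."""
    for w_left, w_right in weights:
        x = fft_like_layer(x, w_left, w_right)
    return x
```

For example, `stack([1, 2, 3, 4, 5, 6, 7, 8], [(1, 1), (1, 1), (1, 1)])` reduces eight samples to one value through three layers, mirroring how an FFT halves its problem at each stage.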
-
Publication No.: US10249321B2
Publication date: 2019-04-02
Application No.: US13681643
Filing date: 2012-11-20
Applicant: Adobe Inc.
Inventor: Brian John King, Gautham J. Mysore, Paris Smaragdis
IPC: G10L21/00, G10L21/043
Abstract: Sound rate modification techniques are described. In one or more implementations, an indication is received of an amount that a rate of output of sound data is to be modified. One or more sound rate rules are applied to the sound data that, along with the received indication, are usable to calculate different rates at which different portions of the sound data are to be modified, respectively. The sound data is then output such that the calculated rates are applied.
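One way to picture how a single requested rate change could yield different rates for different portions is sketched below. The rule used here (a per-kind weight that scales how strongly the requested change applies) and the portion kinds `"speech"` and `"pause"` are hypothetical; the abstract does not specify what the sound rate rules are.

```python
def portion_rates(portions, requested_rate, rule_weights):
    """Return one playback rate per portion.

    portions: list of (kind, duration_seconds) tuples.
    rule_weights: hypothetical rule table mapping kind -> weight, where
    weight 1.0 applies the requested change fully and 0.0 leaves the
    portion at its original rate.
    """
    rates = []
    for kind, _duration in portions:
        weight = rule_weights.get(kind, 1.0)
        rate = 1.0 + (requested_rate - 1.0) * weight
        if rate <= 0:
            raise ValueError("rule produced a non-positive rate")
        rates.append(rate)
    return rates


def output_durations(portions, rates):
    """Duration of each portion after its rate is applied."""
    return [duration / rate for (_kind, duration), rate in zip(portions, rates)]
```

With a requested rate of 1.5 and weights `{"speech": 0.5, "pause": 2.0}`, speech would play at 1.25x and pauses at 2x, so pauses absorb more of the speed-up than speech does.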
-
Publication No.: US10262680B2
Publication date: 2019-04-16
Application No.: US13931450
Filing date: 2013-06-28
Applicant: Adobe Inc.
Inventor: Gautham J. Mysore, Paris Smaragdis
IPC: G10L21/0364, G10L15/20, G10L21/0208, G10L25/84, G10L25/51
Abstract: Variable sound decomposition masking techniques are described. In one or more implementations, a mask is generated that incorporates a user input. The user input is usable at least in part to define a threshold that varies with the user input and is configured for use in performing a sound decomposition process. The sound decomposition process is performed using the mask to assign portions of sound data to respective ones of a plurality of sources of the sound data.
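A mask with a user-controlled threshold might look something like the sketch below. It assumes two sources, per-bin magnitude estimates for each source that are already available, and a binary mask; the function names and the ratio-based rule are assumptions for illustration rather than the method claimed in the patent.

```python
def build_mask(source_a, source_b, user_threshold):
    """Binary mask: True where a bin is assigned to source A.

    source_a, source_b: estimated non-negative magnitudes per bin.
    user_threshold: value in [0, 1] supplied by the user; a bin goes to
    source A when A's share of the total reaches this threshold.
    """
    mask = []
    for a, b in zip(source_a, source_b):
        total = a + b
        share = a / total if total > 0 else 0.0
        mask.append(share >= user_threshold)
    return mask


def apply_mask(mixture, mask):
    """Split the mixture bins between the two sources using the mask."""
    part_a = [m if keep else 0.0 for m, keep in zip(mixture, mask)]
    part_b = [0.0 if keep else m for m, keep in zip(mixture, mask)]
    return part_a, part_b
```

Raising `user_threshold` makes the mask stricter about what counts as source A, so a slider bound to that value would let a user trade leakage in one source against leakage in the other.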
-
Publication No.: US10347238B2
Publication date: 2019-07-09
Application No.: US15796292
Filing date: 2017-10-27
Applicant: Adobe Inc., The Trustees of Princeton University
Inventor: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
-
Publication No.: US20200286504A1
Publication date: 2020-09-10
Application No.: US16296122
Filing date: 2019-03-07
Applicant: ADOBE INC.
Inventor: Prem Seetharaman, Gautham J. Mysore, Bryan A. Pardo
IPC: G10L25/60, G10L25/30, G10L25/84, G10L21/0232
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for sound quality prediction and real-time feedback about sound quality, such as room acoustics quality and background noise. Audio data can be sampled from a live sound source and stored in an audio buffer. The audio data in the buffer is analyzed to calculate a stream of values of one or more sound quality measures, such as speech transmission index and signal-to-noise ratio. Speech transmission index can be calculated using a convolutional neural network configured to predict speech transmission index from reverberant speech. The stream of values can be used to provide real-time feedback about sound quality of the audio data. For example, a visual indicator on a graphical user interface can be updated based on consistency of the values over time. The real-time feedback about sound quality can help users optimize their recording setup.
-
Publication No.: US10770063B2
Publication date: 2020-09-08
Application No.: US16108996
Filing date: 2018-08-22
Applicant: Adobe Inc., The Trustees of Princeton University
Inventor: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
Abstract: Techniques for a recursive deep-learning approach for performing speech synthesis using a repeatable structure that splits an input tensor into a left half and right half similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
-
Publication No.: US10460763B2
Publication date: 2019-10-29
Application No.: US15497433
Filing date: 2017-04-26
Applicant: Adobe Inc.
Inventor: Zhengshan Shi, Gautham J. Mysore
Abstract: Methods and systems for automatic audio loop generation from an audio track identify suitable portions of the audio track for generating audio loops. One or more embodiments identify portions of the audio track that include a beginning beat and an ending beat that have similar audio features that provide for seamless transitions when generating the audio loops. One or more embodiments generate scores for the portions based on the similarity of the audio features of the corresponding beginning and ending beats. Additionally, one or more embodiments use the generated scores to determine whether each portion is a suitable audio loop candidate. One or more embodiments then generate one or more audio loops using one or more suitable portions of the audio track.
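The scoring step in this abstract can be sketched as follows, assuming each beat is already described by a feature vector and that similarity is measured with cosine similarity. The fixed loop length, the 0.9 score cutoff, and the function names are assumptions for the example; the abstract does not state which features or similarity measure are used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def loop_candidates(beat_features, loop_len, min_score=0.9):
    """Score every portion of `loop_len` beats and keep the suitable ones.

    A portion starting at beat `start` is scored by comparing the features
    of its beginning beat with those of its ending beat (`start + loop_len`).
    Returns (start, end, score) tuples, best score first.
    """
    candidates = []
    for start in range(len(beat_features) - loop_len):
        end = start + loop_len
        score = cosine_similarity(beat_features[start], beat_features[end])
        if score >= min_score:
            candidates.append((start, end, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

A high score means the ending beat sounds like the beginning beat, which is what lets the portion repeat without an audible seam.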