-
Publication No.: US11514925B2
Publication date: 2022-11-29
Application No.: US16863591
Filing date: 2020-04-30
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
IPC classification: G10L21/0364, G10L25/30, G10L25/18, G06N3/08, G06N3/04
Abstract: Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
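The abstract above describes a loss that combines a spectrogram loss with an adversarial loss. The following is a minimal sketch of one way such a combination could look, not the patented method: the naive DFT, the L1 distance on log magnitudes, the generator-side `-log(D)` term, the weight `adv_weight`, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_size=8, hop=4):
    """Naive short-time DFT: one list of bin magnitudes per frame."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def spectrogram_loss(predicted, target, eps=1e-6):
    """Mean L1 distance between log-magnitude spectrograms (assumed form)."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(magnitude_spectrogram(predicted),
                                magnitude_spectrogram(target)):
        for p_mag, t_mag in zip(p_frame, t_frame):
            total += abs(math.log(p_mag + eps) - math.log(t_mag + eps))
            count += 1
    return total / count


def adversarial_loss(disc_score, eps=1e-6):
    """Generator-side loss, given the discriminator's probability in (0, 1]
    that the predicted audio is real (assumed form)."""
    return -math.log(max(disc_score, eps))


def combined_loss(predicted, target, disc_score, adv_weight=0.1):
    """Weighted sum of the two terms; adv_weight is a hypothetical setting."""
    return (spectrogram_loss(predicted, target)
            + adv_weight * adversarial_loss(disc_score))


# Usage: a target signal and a quieter "prediction" of it.
target = [math.sin(0.3 * n) for n in range(32)]
predicted = [0.5 * v for v in target]
loss = combined_loss(predicted, target, disc_score=0.5)
```

In this sketch the spectrogram term vanishes when prediction and target match, and the adversarial term vanishes when the discriminator is fully convinced, so the weight only trades off the two kinds of error.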
-
Publication No.: US20190130894A1
Publication date: 2019-05-02
Application No.: US15796292
Filing date: 2017-10-27
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
CPC classification: G10L13/08, G06F17/24, G10L13/00, G10L13/04, G10L13/06, G10L13/07, G10L15/02, G10L21/00, G10L2021/0135, G11B27/022
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
-
Publication No.: US20240281577A1
Publication date: 2024-08-22
Application No.: US18170735
Filing date: 2023-02-17
Abstract: Discontinuity modeling techniques of computing functions of a program are described. In one example, a program has a computing function that includes a discontinuity. An input is received by the data modeling system that identifies an axis. A plurality of samples is then generated by the data modeling system along the axis based on an output of the program. The samples are then used as a basis by the data modeling system to generate a data model that models the discontinuity. The data model includes, in one example, one or more gradients and models the discontinuity using a 1D box kernel.
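The abstract above mentions modeling a discontinuity with a 1D box kernel and gradients. As a generic illustration only (the patent's actual data model is not given here), the sketch below smooths a step function with a box kernel along one axis; the gradient of the box-smoothed function is then the finite difference `(f(x + h) - f(x - h)) / (2h)`. The function `step`, the half-width `h`, and the sample count are assumptions.

```python
def step(x, threshold=0.5):
    """A program output with a discontinuity at `threshold` (hypothetical)."""
    return 1.0 if x > threshold else 0.0


def box_smoothed(f, x, half_width, n_samples=1000):
    """Average of f over [x - h, x + h], estimated from midpoint samples
    taken along the axis."""
    total = 0.0
    for i in range(n_samples):
        t = x - half_width + (i + 0.5) * (2 * half_width) / n_samples
        total += f(t)
    return total / n_samples


def box_gradient(f, x, half_width):
    """Derivative of the box-smoothed f at x: the box kernel turns the
    derivative into a difference of two evaluations of f."""
    return (f(x + half_width) - f(x - half_width)) / (2 * half_width)


# Usage: the raw step has zero gradient almost everywhere, but the smoothed
# version has a finite, nonzero gradient near the jump.
value_at_jump = box_smoothed(step, 0.5, half_width=0.1)
grad_at_jump = box_gradient(step, 0.5, half_width=0.1)
grad_far_away = box_gradient(step, 0.9, half_width=0.1)
```

A narrower box gives a sharper but larger gradient near the jump, so the half-width acts as a smoothing scale.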
-
Publication No.: US11361467B2
Publication date: 2022-06-14
Application No.: US16692450
Filing date: 2019-11-22
Applicants: Adobe Inc., Princeton University
Inventors: Wilmot Li, Hijung Shin, Adam Finkelstein, Nora Willett
Abstract: This disclosure generally relates to character animation. More specifically, this disclosure relates to pose selection using data analytics techniques applied to training data, and generating 2D animations of illustrated characters using performance data and the selected poses. An example process or system includes extracting sets of joint positions from a training video including the subject, grouping the plurality of frames into frame groups using the sets of joint positions for each frame, identifying a representative frame for each frame group using the frame groups, clustering the frame groups into clusters using the representative frames, outputting a visualization of the clusters at a user interface, and receiving a selection of a cluster for animation of the subject.
-
Publication No.: US10347238B2
Publication date: 2019-07-09
Application No.: US15796292
Filing date: 2017-10-27
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
-
Publication No.: US20210343305A1
Publication date: 2021-11-04
Application No.: US16863591
Filing date: 2020-04-30
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
Abstract: Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
-
Publication No.: US20210158593A1
Publication date: 2021-05-27
Application No.: US16692471
Filing date: 2019-11-22
Applicants: Adobe Inc., Princeton University
Inventors: Wilmot Li, Hijung Shin, Adam Finkelstein, Nora Willett
Abstract: This disclosure generally relates to character animation. More specifically, but not by way of limitation, this disclosure relates to pose selection using data analytics techniques applied to training data, and generating 2D animations of illustrated characters using performance data and the selected poses. An example process or system includes obtaining a selection of training poses of the subject and a set of character poses, obtaining a performance video of the subject, wherein the performance video includes a plurality of performance frames that include poses performed by the subject, grouping the plurality of performance frames into groups of performance frames, assigning a selected training pose from the selection of training poses to each group of performance frames using the clusters of training frames, generating a sequence of character poses based on the groups of performance frames and their assigned training poses, and outputting the sequence of character poses.
-
Publication No.: US20210158565A1
Publication date: 2021-05-27
Application No.: US16692450
Filing date: 2019-11-22
Applicants: Adobe Inc., Princeton University
Inventors: Wilmot Li, Hijung Shin, Adam Finkelstein, Nora Willett
Abstract: This disclosure generally relates to character animation. More specifically, this disclosure relates to pose selection using data analytics techniques applied to training data, and generating 2D animations of illustrated characters using performance data and the selected poses. An example process or system includes extracting sets of joint positions from a training video including the subject, grouping the plurality of frames into frame groups using the sets of joint positions for each frame, identifying a representative frame for each frame group using the frame groups, clustering the frame groups into clusters using the representative frames, outputting a visualization of the clusters at a user interface, and receiving a selection of a cluster for animation of the subject.
-
Publication No.: US10770063B2
Publication date: 2020-09-08
Application No.: US16108996
Filing date: 2018-08-22
Inventors: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
Abstract: Techniques are described for a recursive deep-learning approach to speech synthesis using a repeatable structure that splits an input tensor into a left half and a right half, similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation, and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
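The repeatable structure in the abstract above (split into halves, convolve each half, sum, post-process) can be sketched in a few lines. This is an illustration under stated assumptions, not the patented network: the input is a single-channel list, each 1-D convolution has kernel size 1 (a scalar weight plus bias), the post-processing function is a ReLU, and the names `split_sum_layer` and `stack` are hypothetical.

```python
def conv1d_k1(xs, weight, bias=0.0):
    """1-D convolution with kernel size 1 on a single channel."""
    return [weight * x + bias for x in xs]


def split_sum_layer(xs, w_left, w_right):
    """One repeatable unit: split, convolve each half, sum, then ReLU."""
    half = len(xs) // 2
    left, right = xs[:half], xs[half:2 * half]
    conv_left = conv1d_k1(left, w_left)
    conv_right = conv1d_k1(right, w_right)
    return [max(0.0, a + b) for a, b in zip(conv_left, conv_right)]


def stack(xs, weights):
    """Series configuration: each layer halves the sequence length."""
    for w_left, w_right in weights:
        xs = split_sum_layer(xs, w_left, w_right)
    return xs


# Usage: three layers reduce eight samples to a single output value.
out = stack([1, 2, 3, 4, 5, 6, 7, 8], [(1.0, 1.0)] * 3)
```

Because every layer halves the length, a stack of n layers covers an input of 2^n samples, which is where the resemblance to the Fast Fourier Transform comes from.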
-
Publication No.: US11282257B2
Publication date: 2022-03-22
Application No.: US16692471
Filing date: 2019-11-22
Applicants: Adobe Inc., Princeton University
Inventors: Wilmot Li, Hijung Shin, Adam Finkelstein, Nora Willett
Abstract: This disclosure generally relates to character animation. More specifically, but not by way of limitation, this disclosure relates to pose selection using data analytics techniques applied to training data, and generating 2D animations of illustrated characters using performance data and the selected poses. An example process or system includes obtaining a selection of training poses of the subject and a set of character poses, obtaining a performance video of the subject, wherein the performance video includes a plurality of performance frames that include poses performed by the subject, grouping the plurality of performance frames into groups of performance frames, assigning a selected training pose from the selection of training poses to each group of performance frames using the clusters of training frames, generating a sequence of character poses based on the groups of performance frames and their assigned training poses, and outputting the sequence of character poses.