-
Publication number: US11830481B2
Publication date: 2023-11-28
Application number: US17538683
Filing date: 2021-11-30
Applicant: Adobe Inc.
Inventor: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
IPC: G10L15/18, G10L25/90, G10L15/187, G10L15/02, G10L15/04, G10L15/16, G10L21/0208
CPC classification number: G10L15/1807, G10L15/02, G10L15/04, G10L15/16, G10L15/187, G10L21/0208, G10L25/90, G10L2015/025, G10L2021/02082
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
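The final operation in the abstract, applying predicted phoneme durations and a predicted pitch contour to the audio in the edit region, can be illustrated with a minimal Python sketch. The abstract does not say how the predictions are produced or how they are applied to the signal, so everything here is an assumption for illustration: the `Phoneme` type, the `prosody_corrections` function, the use of one mean pitch value per phoneme, and the choice to express the correction as a time-stretch ratio and a pitch shift in semitones. The sketch also assumes every phoneme is voiced (positive pitch) and that the predictions are already available.

```python
import math
from dataclasses import dataclass


@dataclass
class Phoneme:
    """Hypothetical record for one phoneme in the audio edit region."""
    symbol: str
    duration_s: float  # current duration in seconds
    pitch_hz: float    # current mean pitch in Hz (assumed voiced, > 0)


def prosody_corrections(edit_region, predicted_durations_s, predicted_pitch_hz):
    """Return (symbol, time-stretch ratio, pitch shift in semitones) per phoneme.

    This only computes how far each phoneme is from its predicted prosody.
    Resynthesising the audio (e.g. with a vocoder) is out of scope here.
    """
    if not (len(edit_region) == len(predicted_durations_s) == len(predicted_pitch_hz)):
        raise ValueError("one predicted duration and pitch is required per phoneme")

    corrections = []
    for phoneme, target_s, target_hz in zip(
        edit_region, predicted_durations_s, predicted_pitch_hz
    ):
        if phoneme.duration_s <= 0 or phoneme.pitch_hz <= 0 or target_s <= 0 or target_hz <= 0:
            raise ValueError("durations and pitches must be positive")
        # Ratio > 1 means the phoneme should be lengthened.
        stretch = target_s / phoneme.duration_s
        # 12 semitones per octave; positive means the pitch should be raised.
        shift = 12.0 * math.log2(target_hz / phoneme.pitch_hz)
        corrections.append((phoneme.symbol, stretch, shift))
    return corrections
```

With made-up values, a phoneme that currently lasts 0.05 s at 100 Hz and is predicted to last 0.10 s at 200 Hz would get a stretch ratio of 2 and a shift of 12 semitones (one octave up).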
-
Publication number: US20230169961A1
Publication date: 2023-06-01
Application number: US17538683
Filing date: 2021-11-30
Applicant: Adobe Inc.
Inventor: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
IPC: G10L15/18, G10L25/90, G10L15/187, G10L15/02, G10L15/04, G10L21/0208, G10L15/16, G06N3/08
CPC classification number: G10L15/1807, G10L25/90, G10L15/187, G10L15/02, G10L15/04, G10L21/0208, G10L15/16, G06N3/088, G10L2015/025, G10L2021/02082, G06N3/0454
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
-