-
Publication No.: US20250013831A1
Publication Date: 2025-01-09
Application No.: US18493465
Filing Date: 2023-10-24
Applicant: Adobe Inc., University of Maryland
Inventor: Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain
IPC: G06F40/30
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a temporal dependency graph. For example, the disclosed systems generate, from a text document, a structural vector, a syntactic vector, and a semantic vector. In some embodiments, the disclosed systems generate a multi-dimensional vector by combining these vectors. In these or other embodiments, the disclosed systems generate an initial dependency graph structure and an adjacency matrix utilizing an iterative deep graph learning model. Further, in some embodiments, the disclosed systems generate an entity-level relation matrix utilizing a convolutional graph neural network. Moreover, in some embodiments, the disclosed systems generate a temporal dependency graph from the entity-level relation matrix and the adjacency matrix.
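The two numeric steps named in the abstract, combining the three vectors and propagating features over an adjacency matrix, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the vectors are combined (concatenation is used below), the function names `combine_vectors`, `normalize_adjacency`, and `graph_conv` are hypothetical, and the propagation step has no learned weights, so it is a stand-in for, not a reproduction of, the convolutional graph neural network the disclosure describes.

```python
# Illustrative sketch only; not the method claimed in the publication.

def combine_vectors(structural, syntactic, semantic):
    """Concatenate three feature vectors into one (assumed combination rule)."""
    return list(structural) + list(syntactic) + list(semantic)


def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so every row sums to 1."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return [[value / sum(row) for value in row] for row in with_loops]


def graph_conv(adj, features):
    """One weight-free propagation step: each node takes the average of
    its own features and its neighbors' features."""
    norm = normalize_adjacency(adj)
    n, dim = len(features), len(features[0])
    return [
        [sum(norm[i][k] * features[k][d] for k in range(n)) for d in range(dim)]
        for i in range(n)
    ]


# Two connected nodes exchange information through the adjacency matrix.
node_features = [[1.0], [3.0]]
adjacency = [[0.0, 1.0], [1.0, 0.0]]
smoothed = graph_conv(adjacency, node_features)
```

In a trained model each propagation step would also multiply by a learned weight matrix and apply a nonlinearity; those are omitted here to keep the arithmetic easy to follow.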
-
Publication No.: US20250095631A1
Publication Date: 2025-03-20
Application No.: US18528116
Filing Date: 2023-12-04
Applicant: Adobe Inc.
Inventor: Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, Jiuxiang Gu, Ani Nenkova, Vlad Ion Morariu, Rajiv Bhawanji Jain, Dinesh Manocha
Abstract: Position-based text-to-speech model and training techniques are described. A digital document, for instance, is received by an audio synthesis service. A text-to-speech model is utilized by the audio synthesis service to generate digital audio from text included in the digital document. The text-to-speech model, for instance, is configured to generate a text encoding and a document positional encoding from an initial text sequence of the digital document. The document positional encoding is based on a location of the text encoding within the digital document. By decoding the text encoding and the document positional encoding, the text-to-speech model then generates digital audio that includes a spectrogram having a reordered text sequence, which is different from the initial text sequence.
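The idea of pairing each piece of text with an encoding of where it sits in the document can be sketched as follows. This is a hedged illustration under several assumptions not stated in the abstract: location is represented as a hypothetical `(page, x, y)` triple, each coordinate is encoded with the standard sinusoidal scheme, and reordering is done by a simple sort on layout position. In the disclosure the reordering emerges from the model's decoder, so the sort below is only a stand-in, and all function names are hypothetical.

```python
import math

# Illustrative sketch only; not the model described in the publication.

def positional_encoding(position, dim=4):
    """Sinusoidal encoding of a scalar position (dim must be even)."""
    encoding = []
    for i in range(0, dim, 2):
        frequency = 1.0 / (10000 ** (i / dim))
        encoding.append(math.sin(position * frequency))
        encoding.append(math.cos(position * frequency))
    return encoding


def document_positional_encoding(page, x, y, dim=4):
    """Concatenate encodings of page index and x/y coordinates
    (an assumed representation of 'location within the document')."""
    return (
        positional_encoding(page, dim)
        + positional_encoding(x, dim)
        + positional_encoding(y, dim)
    )


def reorder_by_layout(blocks):
    """Stand-in for learned reordering: sort by page, then top to bottom,
    then left to right."""
    return sorted(blocks, key=lambda b: (b["page"], b["y"], b["x"]))


# The extracted sequence lists the footer first; layout order puts the title first.
initial_sequence = [
    {"text": "footer", "page": 0, "x": 0, "y": 90},
    {"text": "title", "page": 0, "x": 0, "y": 5},
]
reordered_texts = [b["text"] for b in reorder_by_layout(initial_sequence)]
```

A learned model would consume the concatenated text and positional encodings and decide the spoken order itself; the explicit sort only makes visible how position information can produce an output order that differs from the extraction order.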
-