-
Publication No.: US12190061B2
Publication date: 2025-01-07
Application No.: US17644856
Filing date: 2021-12-17
Applicant: ADOBE INC.
Inventor: Shashank Shailabh, Madhur Panwar, Milan Aggarwal, Pinkesh Badjatiya, Simra Shahid, Nikaash Puri, S Sejal Naidu, Sharat Chandra Racha, Balaji Krishnamurthy, Ganesh Karbhari Palwe
IPC: G06F40/289, G06F40/30, G06F40/40
Abstract: Systems and methods for topic modeling are described. The systems and methods include encoding words of a document using an embedding matrix to obtain word embeddings for the document. The words of the document comprise a subset of words in a vocabulary, and the embedding matrix is trained as part of a topic attention network based on a plurality of topics. The systems and methods further include encoding a topic-word distribution matrix using the embedding matrix to obtain a topic embedding matrix. The topic-word distribution matrix represents relationships between the plurality of topics and the words of the vocabulary. The systems and methods further include computing a topic context matrix based on the topic embedding matrix and the word embeddings and identifying a topic for the document based on the topic context matrix.
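The data flow named in the abstract (word embeddings, topic embedding matrix, topic context matrix, topic identification) can be illustrated with a minimal NumPy sketch. Everything beyond those four named quantities is an assumption made for illustration: the function names `softmax` and `identify_topic`, the use of dot-product attention to form the topic context matrix, and the final argmax scoring rule are hypothetical and are not taken from the patent claims. The matrices here are also passed in as fixed inputs, whereas the abstract describes the embedding matrix as trained within a topic attention network.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def identify_topic(doc_word_ids, embedding_matrix, topic_word_dist):
    """Illustrative forward pass; not the patented method.

    doc_word_ids:     (n,)   vocabulary indices of the document's words
    embedding_matrix: (V, d) one d-dimensional embedding per vocabulary word
    topic_word_dist:  (K, V) one distribution over the vocabulary per topic
    """
    # Encode the document's words with the embedding matrix.
    word_embeddings = embedding_matrix[doc_word_ids]          # (n, d)

    # Encode the topic-word distribution with the same embedding matrix,
    # giving one embedding per topic.
    topic_embeddings = topic_word_dist @ embedding_matrix     # (K, d)

    # ASSUMPTION: dot-product attention of each topic over the words.
    scores = topic_embeddings @ word_embeddings.T             # (K, n)
    attention = softmax(scores, axis=1)                       # rows sum to 1

    # Topic context matrix: attention-weighted word embeddings per topic.
    topic_context = attention @ word_embeddings               # (K, d)

    # ASSUMPTION: pick the topic whose context best matches its embedding.
    topic_scores = np.sum(topic_context * topic_embeddings, axis=1)  # (K,)
    return int(np.argmax(topic_scores)), topic_context
```

Calling `identify_topic(np.array([0, 2, 5]), E, T)` with a `(V, d)` matrix `E` and a row-normalized `(K, V)` matrix `T` returns a topic index and a `(K, d)` topic context matrix. Reusing one embedding matrix for both words and topics places them in the same space, which is what makes the attention scores in this sketch meaningful.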
-
Publication No.: US20230169271A1
Publication date: 2023-06-01
Application No.: US17644856
Filing date: 2021-12-17
Applicant: ADOBE INC.
Inventor: Shashank Shailabh, Madhur Panwar, Milan Aggarwal, Pinkesh Badjatiya, Simra Shahid, Nikaash Puri, S Sejal Naidu, Sharat Chandra Racha, Balaji Krishnamurthy, Ganesh Karbhari Palwe
IPC: G06F40/289, G06F40/40, G06F40/30
CPC classification number: G06F40/289, G06F40/40, G06F40/30
Abstract: Systems and methods for topic modeling are described. The systems and methods include encoding words of a document using an embedding matrix to obtain word embeddings for the document. The words of the document comprise a subset of words in a vocabulary, and the embedding matrix is trained as part of a topic attention network based on a plurality of topics. The systems and methods further include encoding a topic-word distribution matrix using the embedding matrix to obtain a topic embedding matrix. The topic-word distribution matrix represents relationships between the plurality of topics and the words of the vocabulary. The systems and methods further include computing a topic context matrix based on the topic embedding matrix and the word embeddings and identifying a topic for the document based on the topic context matrix.
-