Publication number: US20240160651A1
Publication date: 2024-05-16
Application number: US18065803
Filing date: 2022-12-14
Applicant: Amazon Technologies, Inc.
Inventor: Kasturi Bhattacharjee, Rashmi Gangadharaiah, Senthil C. Chidambaram, Ankit Kapoor, Sharon Shapira, Tony Chun Tung Ng, Deepak Seetharam Nadig
IPC: G06F16/2457, G06F16/242, G06F16/28
CPC classification number: G06F16/24578, G06F16/242, G06F16/285
Abstract: Systems and methods are used to detect underlying themes from a collection of documents at an aggregated level. A representative set of documents may be selected from a cluster of documents, with the representative set corresponding to a general theme of the cluster. Candidate theme phrases may then be extracted from the documents and used to generate document embeddings and candidate phrase embeddings, which may be ranked, such as with a diversity-based ranking approach. Certain candidates may be selected from the ranking. The documents forming the representative set may then be concatenated, and a query embedding may be generated and ranked against the candidate phrases. In this manner, a collection of phrases associated with both the general underlying theme of the cluster and the granular topics associated with that theme may be identified.
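
The two ranking stages described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything in it is an assumption made for illustration: the function names (`embed`, `cosine`, `mmr_select`, `theme_phrases`) are hypothetical, a bag-of-words count vector stands in for whatever learned embedding model is actually used, maximal marginal relevance (MMR) is used as one possible diversity-based ranking, and candidate phrases are supplied by the caller rather than extracted.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag-of-words term counts, not a learned encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec, phrases, k, lam=0.5):
    """Pick up to k phrases by maximal marginal relevance (one diversity-based ranking).

    Each step favors phrases similar to the document and penalizes phrases
    similar to those already selected; lam (assumed 0.5) balances the two.
    """
    vecs = {p: embed(p) for p in phrases}
    selected, remaining = [], list(phrases)
    while remaining and len(selected) < k:
        def score(p):
            relevance = cosine(vecs[p], doc_vec)
            redundancy = max((cosine(vecs[p], vecs[s]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def theme_phrases(representative_docs, candidates_per_doc, k_per_doc=2, top_n=3):
    """Stage 1: diversity-ranked candidates per document.
    Stage 2: rank the pooled candidates against the concatenated documents."""
    pool = []
    for doc, candidates in zip(representative_docs, candidates_per_doc):
        for phrase in mmr_select(embed(doc), candidates, k_per_doc):
            if phrase not in pool:
                pool.append(phrase)
    # Query embedding built from the concatenation of the representative set.
    query_vec = embed(" ".join(representative_docs))
    ranked = sorted(pool, key=lambda p: cosine(embed(p), query_vec), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    docs = [
        "battery drains fast after update",
        "battery drains overnight and phone overheats",
    ]
    candidates = [
        ["battery drains", "drains fast", "after update"],
        ["battery drains", "phone overheats", "drains overnight"],
    ]
    print(theme_phrases(docs, candidates))
```

In this toy input, the phrase shared by both documents ranks first against the concatenated query, while the diversity step keeps more specific, non-overlapping phrases in the pool. A real system would replace `embed` with a trained sentence or phrase encoder and add the candidate extraction step.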