-
Publication No.: US20160314182A1
Publication date: 2016-10-27
Application No.: US14414855
Filing date: 2014-09-18
Applicant: Google, Inc.
Inventor: Xincheng Zhang, Hui Tan, Zhiyu Wang, Jinan Lou
CPC classification number: G06F17/30598, G06F17/30705, H04L43/04
Abstract: Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.
-
Publication No.: US20140279864A1
Publication date: 2014-09-18
Application No.: US14143835
Filing date: 2013-12-30
Applicant: Google Inc.
Inventor: Mikhail Lopyrev, Gaurav Jain, Bote Deepak Narayan, Vitaly Repeshko, Chengling Chan, Jinan Lou
IPC: G06F17/30
CPC classification number: G06F17/2705, G06F16/258
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a first document, the first document being associated with a user, executing a plurality of parsers, each parser of the plurality of parsers processing the first document to provide one or more first data values, merging the one or more first data values provided from the plurality of parsers to populate a data record having one or more data fields, the data record being specific to the user, and storing the data record in computer-readable memory.
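The parse-and-merge flow in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the two toy parsers, the `build_record` helper, the in-memory `store`, and the "first parser to supply a field wins" merge policy are assumptions chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List

# A parser takes the raw document text and returns zero or more field values.
Parser = Callable[[str], Dict[str, str]]


def parse_sender(document: str) -> Dict[str, str]:
    """Hypothetical parser: picks up a 'From:' line if one exists."""
    for line in document.splitlines():
        if line.lower().startswith("from:"):
            return {"sender": line.split(":", 1)[1].strip()}
    return {}


def parse_total(document: str) -> Dict[str, str]:
    """Hypothetical parser: picks up a 'Total:' line if one exists."""
    for line in document.splitlines():
        if line.lower().startswith("total:"):
            return {"total": line.split(":", 1)[1].strip()}
    return {}


def build_record(user_id: str, document: str,
                 parsers: List[Parser]) -> Dict[str, str]:
    """Run every parser on the document and merge the values into one record."""
    record = {"user_id": user_id}  # the record is specific to this user
    for parser in parsers:
        for field, value in parser(document).items():
            # Assumed merge policy: keep the first value seen for a field.
            record.setdefault(field, value)
    return record


# Assumed storage: a plain in-memory dictionary keyed by user.
store: Dict[str, Dict[str, str]] = {}
doc = "From: shop@example.com\nTotal: $12.50"
store["u1"] = build_record("u1", doc, [parse_sender, parse_total])
```

A real system would need a conflict-resolution rule when two parsers disagree on a field; the abstract does not say which rule is used.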
-
Publication No.: US10657158B2
Publication date: 2020-05-19
Application No.: US15360939
Filing date: 2016-11-23
Applicant: Google Inc.
Inventor: Ying Sheng, Yifeng Lu, Jing Xie, Jie Yang, Luis Garcia Pueyo, Jinan Lou, James Wendt
IPC: G06F16/00, G06F16/28, G06N20/00, G06F16/93, G06Q10/10, G06N20/20, G06F40/174, G06F40/186
Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.
-
Publication No.: US20170140022A1
Publication date: 2017-05-18
Application No.: US14289355
Filing date: 2014-05-28
Applicant: Google Inc.
Inventor: Jinan Lou, Hongtao Zhong
CPC classification number: G06F16/284, G06F16/337, G06F16/955, G06N5/025, G06N20/00
Abstract: Methods, apparatus and computer-readable media (transitory and non-transitory) are disclosed for analyzing a document associated with a user to identify an assumption about the user, comparing the assumption with one or more signals that are associated with the user and separate from the document to determine a veracity of the assumption, and updating one or more techniques for identifying an assumption based on feedback that is generated based on the veracity.
-
Publication No.: US09600543B1
Publication date: 2017-03-21
Application No.: US14040466
Filing date: 2013-09-27
Applicant: Google Inc.
Inventor: Lucian Florin Cionca, Andre Rohe, Yonatan Zunger, Sangsoo Sung, Mohit Oberoi, Daniel Belov, Harish Rajamani, Jinan Lou
CPC classification number: G06F17/30554, G06F17/30867, G06Q50/10
Abstract: In one aspect, a method includes receiving an indication of a request from a user to view a stream associated with the user, generating a request for one or more items visible to the user for display within the stream, the request including a search query identifying search criteria including one or more tokens, the one or more tokens including at least a user token identifying the user, receiving one or more items in response to the request, the one or more items including at least one of the one or more tokens and further being visible to the user and providing the one or more items for display to the user within the stream in response to the request. Other aspects can be embodied in corresponding systems and apparatus, including computer program products.
-
Publication No.: US10360537B1
Publication date: 2019-07-23
Application No.: US15484933
Filing date: 2017-04-11
Applicant: Google Inc.
Inventor: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo
IPC: G06F17/30, G06Q10/10, G06F16/248, G06F16/9535, H04W4/029
Abstract: Techniques are described herein for generating and applying event data extraction templates. In various implementations, a data extraction template may be applied to structured communications to extract, from each structured communication, event data associated with a transient markup language path indicated in the data extraction template. The data extraction template may include an event-related semantic data type assigned to the transient markup language path and a strength of association between the transient structural path and the event-related semantic data type. Feedback may be obtained concerning event data extracted from one or more of the structured communications. Based on the feedback, the strength of association between the transient markup language path and the event-related semantic data type may be altered. The data extraction template may then be applied to a subsequent structured communication to extract new event data from the structured communication based on the altered strength of association.
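The feedback loop in this abstract, where a template's strength of association is altered and then affects later extraction, might look roughly like the sketch below. The `PathAssignment` class, the fixed step size, the clamping to [0, 1], and the `min_strength` cutoff are all illustrative assumptions; the abstract does not specify how the strength is represented or updated.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PathAssignment:
    """Hypothetical template entry: a semantic type plus a confidence-like strength."""
    semantic_type: str
    strength: float


def apply_feedback(assignment: PathAssignment, correct: bool,
                   step: float = 0.1) -> None:
    """Raise or lower the strength by an assumed fixed step, clamped to [0, 1]."""
    delta = step if correct else -step
    assignment.strength = min(1.0, max(0.0, assignment.strength + delta))


def extract(template: Dict[str, PathAssignment], message: Dict[str, str],
            min_strength: float = 0.5) -> Dict[str, str]:
    """Extract text at each templated path whose strength meets the assumed cutoff.

    `message` maps markup paths to the text found at that path.
    """
    return {
        assignment.semantic_type: message[path]
        for path, assignment in template.items()
        if path in message and assignment.strength >= min_strength
    }


path = "/html/body/table/tr[2]/td"  # made-up markup path
template = {path: PathAssignment("event_date", 0.55)}
message = {path: "2017-04-11"}

before = extract(template, message)          # path is trusted enough to extract
apply_feedback(template[path], correct=False)  # negative feedback lowers strength
after = extract(template, message)           # path now falls below the cutoff
```

Here a single piece of negative feedback is enough to suppress the field on the next message; a production system would presumably aggregate feedback over many communications before changing behavior.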
-
Publication No.: US10007717B2
Publication date: 2018-06-26
Application No.: US14414855
Filing date: 2014-09-18
Applicant: Google Inc.
Inventor: Xincheng Zhang, Hui Tan, Zhiyu Wang, Jinan Lou
CPC classification number: G06F16/285, G06F16/35, H04L43/04
Abstract: Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.
-
Publication No.: US20180144042A1
Publication date: 2018-05-24
Application No.: US15360939
Filing date: 2016-11-23
Applicant: Google Inc.
Inventor: Ying Sheng, Yifeng Lu, Jing Xie, Jie Yang, Luis Garcia Pueyo, Jinan Lou, James Wendt
CPC classification number: G06F16/285, G06F16/93, G06F17/243, G06F17/248, G06N20/00, G06N20/20, G06Q10/10
Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.
-
Publication No.: US09652530B1
Publication date: 2017-05-16
Application No.: US14470416
Filing date: 2014-08-27
Applicant: Google Inc.
Inventor: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo
IPC: G06F17/30
CPC classification number: G06F17/30705, G06F17/30923
Abstract: Methods and apparatus are described herein for generating and applying event data extraction templates. In various implementations, a set of structural paths may be identified from a corpus of communications. A first structural path of the set of structural paths, associated with a first segment of text, may be classified as transient in response to a determination that a frequency of occurrences of the first segment of text across the corpus satisfies a criterion. Event heuristics may be applied to the communications of the corpus. A determination may be made, based on the applying, that the communications of the corpus are event-related. An event data type may be assigned to the transient structural path based on the applying. An event data extraction template may be generated to extract, from one or more subsequent communications, one or more event-related segments of text associated with the transient structural path.
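The frequency test for labeling a structural path as transient can be sketched as follows. This is only one possible reading: the abstract says the frequency of a text segment across the corpus must "satisfy a criterion" without stating it, so the `max_share` threshold, the `classify_paths` name, and the dictionary representation of a communication (path mapped to text) are assumptions.

```python
from collections import Counter
from typing import Dict, List


def classify_paths(corpus: List[Dict[str, str]],
                   max_share: float = 0.5) -> Dict[str, str]:
    """Label each structural path 'transient' or 'fixed'.

    Each communication is a dict mapping a structural path to the text at
    that path. A path is called transient when its most common text appears
    in less than `max_share` of the corpus (an assumed criterion).
    """
    text_counts: Dict[str, Counter] = {}
    for message in corpus:
        for path, text in message.items():
            text_counts.setdefault(path, Counter())[text] += 1

    labels: Dict[str, str] = {}
    for path, counts in text_counts.items():
        top_count = counts.most_common(1)[0][1]
        share = top_count / len(corpus)
        labels[path] = "transient" if share < max_share else "fixed"
    return labels


# Made-up corpus: the heading repeats in every message, the date cell does not.
corpus = [
    {"/h1": "Your reservation", "/td[1]": "May 3"},
    {"/h1": "Your reservation", "/td[1]": "June 9"},
    {"/h1": "Your reservation", "/td[1]": "July 1"},
]
labels = classify_paths(corpus)
```

Paths labeled transient are the candidates to which an event data type would then be assigned; that assignment step depends on the event heuristics, which the abstract does not detail.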