-
Publication No.: US09607314B1
Publication Date: 2017-03-28
Application No.: US15066225
Application Date: 2016-03-10
Applicant: Google Inc.
Inventors: Gaofeng Zhao, Yingwei Cui, Hui Tan, Bahman Rabii, Wei Chai
CPC Classification: G06Q30/0249, G06F17/2247, G06F17/30867, G06Q30/0246, G06Q30/0253, G06Q30/0256, G06Q30/0277
Abstract: Systems and methods of evaluating information in a computer network environment are provided. A data processing system can obtain or receive a content placement criterion, such as a keyword, associated with a content item and can determine a quality metric of the content placement criterion. The data processing system can identify a candidate content placement criterion and expand placement criteria associated with the content item to include the content placement criterion and the candidate content placement criterion based at least in part on an evaluation of the quality metric of the content placement criterion. The data processing system can expand placement criteria based in part on a throttling parameter. The data processing system can identify a correlation between a document and the placement criteria to identify appropriate content items for the document.
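A minimal sketch of quality-gated, throttled keyword expansion, assuming the quality metric is a per-keyword score such as an observed click-through rate, that expansion is allowed only when the item's existing criteria average above a threshold, and that the throttling parameter caps how many candidates are added per pass. The names `Criterion`, `expand_criteria`, `quality_threshold` and `throttle` are hypothetical illustrations, not the patent's method.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    keyword: str
    quality: float  # hypothetical quality metric, e.g. an observed click-through rate


def expand_criteria(existing, candidates, quality_threshold=0.02, throttle=2):
    """Return the expanded placement criteria for one content item.

    quality_threshold and throttle are illustrative parameters, not values
    taken from the patent.
    """
    if not existing:
        return []
    # Evaluate the quality metric of the item's current placement criteria.
    avg_quality = sum(c.quality for c in existing) / len(existing)
    if avg_quality < quality_threshold:
        return list(existing)  # quality too low: leave the criteria unexpanded
    known = {c.keyword for c in existing}
    new = [c for c in candidates if c.keyword not in known]
    # The throttling parameter caps how many candidates are added in one pass.
    new.sort(key=lambda c: c.quality, reverse=True)
    return list(existing) + new[:throttle]


existing = [Criterion("running shoes", 0.05)]
candidates = [Criterion("trail shoes", 0.04), Criterion("sneakers", 0.03), Criterion("socks", 0.01)]
print([c.keyword for c in expand_criteria(existing, candidates)])
# ['running shoes', 'trail shoes', 'sneakers']
```

Gating on the existing criteria's quality keeps a poorly performing item from spreading to more placements, while the throttle bounds how fast a well-performing item can grow.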
-
Publication No.: US20160314182A1
Publication Date: 2016-10-27
Application No.: US14414855
Application Date: 2014-09-18
Applicant: Google, Inc.
Inventors: Xincheng Zhang, Hui Tan, Zhiyu Wang, Jinan Lou
CPC Classification: G06F17/30598, G06F17/30705, H04L43/04
Abstract: Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.
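A minimal sketch of term-based clustering over an unlabeled corpus, assuming single-word classification terms and a hypothetical rule that a communication joins the cluster when it contains at least `min_hits` distinct terms. The optional structural-path similarity step is omitted, and `cluster_by_terms` and `min_hits` are illustrative names only.

```python
import re


def cluster_by_terms(corpus, classification_terms, min_hits=2):
    """Return ids of unlabeled communications that mention enough classification terms.

    corpus maps a communication id to its text; min_hits is a hypothetical threshold.
    """
    terms = {t.lower() for t in classification_terms}
    cluster = []
    for doc_id, text in corpus.items():
        # Tokenize into lowercase alphanumeric words.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        # Count how many distinct classification terms occur in this communication.
        if len(terms & words) >= min_hits:
            cluster.append(doc_id)
    return cluster


corpus = {
    "m1": "Your flight itinerary and boarding pass",
    "m2": "Weekly newsletter: new recipes",
    "m3": "Flight confirmation: departure 9:00",
}
print(cluster_by_terms(corpus, ["flight", "itinerary", "departure", "boarding"]))
# ['m1', 'm3']
```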
-
Publication No.: US10360537B1
Publication Date: 2019-07-23
Application No.: US15484933
Application Date: 2017-04-11
Applicant: Google Inc.
Inventors: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo
IPC Classification: G06F17/30, G06Q10/10, G06F16/248, G06F16/9535, H04W4/029
Abstract: Techniques are described herein for generating and applying event data extraction templates. In various implementations, a data extraction template may be applied to structured communications to extract, from each structured communication, event data associated with a transient markup language path indicated in the data extraction template. The data extraction template may include an event-related semantic data type assigned to the transient markup language path and a strength of association between the transient structural path and the event-related semantic data type. Feedback may be obtained concerning event data extracted from one or more of the structured communications. Based on the feedback, the strength of association between the transient markup language path and the event-related semantic data type may be altered. The data extraction template may then be applied to a subsequent structured communication to extract new event data from the structured communication based on the altered strength of association.
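A minimal sketch of a feedback-adjusted extraction template, assuming each communication is already parsed into a mapping from markup-language path to text, that extraction happens only when the strength of association meets a hypothetical threshold, and that feedback moves the strength by a fixed step clamped to [0, 1]. `ExtractionTemplate`, `min_strength` and `step` are illustrative, not the patent's data model.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionTemplate:
    # Maps a transient markup-language path to
    # (event-related semantic data type, strength of association in [0, 1]).
    paths: dict = field(default_factory=dict)
    min_strength: float = 0.5  # hypothetical extraction threshold

    def extract(self, communication):
        """communication: dict mapping markup-language paths to text segments."""
        events = {}
        for path, (data_type, strength) in self.paths.items():
            if strength >= self.min_strength and path in communication:
                events[data_type] = communication[path]
        return events

    def apply_feedback(self, path, correct, step=0.2):
        """Raise the strength on positive feedback, lower it on negative feedback."""
        data_type, strength = self.paths[path]
        strength = strength + step if correct else strength - step
        self.paths[path] = (data_type, min(1.0, max(0.0, strength)))


path = "/html/body/table/tr[2]/td[1]"
template = ExtractionTemplate({path: ("event_date", 0.6)})
email = {path: "2017-04-11"}
print(template.extract(email))  # {'event_date': '2017-04-11'}
template.apply_feedback(path, correct=False)
print(template.extract(email))  # {} (strength dropped below the threshold)
```

Re-applying the same template after negative feedback stops extracting from that path, which mirrors the abstract's "altered strength of association" step.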
-
Publication No.: US10007717B2
Publication Date: 2018-06-26
Application No.: US14414855
Application Date: 2014-09-18
Applicant: Google Inc.
Inventors: Xincheng Zhang, Hui Tan, Zhiyu Wang, Jinan Lou
CPC Classification: G06F16/285, G06F16/35, H04L43/04
Abstract: Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.
-
Publication No.: US09652530B1
Publication Date: 2017-05-16
Application No.: US14470416
Application Date: 2014-08-27
Applicant: Google Inc.
Inventors: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo
IPC Classification: G06F17/30
CPC Classification: G06F17/30705, G06F17/30923
Abstract: Methods and apparatus are described herein for generating and applying event data extraction templates. In various implementations, a set of structural paths may be identified from a corpus of communications. A first structural path of the set of structural paths, associated with a first segment of text, may be classified as transient in response to a determination that a frequency of occurrences of the first segment of text across the corpus satisfies a criterion. Event heuristics may be applied to the communications of the corpus. A determination may be made, based on the applying, that the communications of the corpus are event-related. An event data type may be assigned to the transient structural path based on the applying. An event data extraction template may be generated to extract, from one or more subsequent communications, one or more event-related segments of text associated with the transient structural path.
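A minimal sketch of the first step only (classifying structural paths as fixed or transient), assuming each communication is a mapping from structural path to text and using a hypothetical criterion: a path is "fixed" when its most common text appears in more than `max_fraction` of the communications containing that path, otherwise "transient". The event-heuristics and template-generation steps are not shown, and `classify_paths` is an illustrative name.

```python
from collections import Counter, defaultdict


def classify_paths(corpus, max_fraction=0.5):
    """Label each structural path in the corpus as 'fixed' or 'transient'.

    corpus: list of dicts mapping structural path -> text segment.
    max_fraction: hypothetical frequency criterion.
    """
    texts_by_path = defaultdict(list)
    for comm in corpus:
        for path, text in comm.items():
            texts_by_path[path].append(text)
    labels = {}
    for path, texts in texts_by_path.items():
        # Frequency of the most common text segment seen at this path.
        top_count = Counter(texts).most_common(1)[0][1]
        labels[path] = "fixed" if top_count / len(texts) > max_fraction else "transient"
    return labels


corpus = [
    {"/h1": "Your order", "/td": "Order #123"},
    {"/h1": "Your order", "/td": "Order #456"},
    {"/h1": "Your order", "/td": "Order #789"},
]
print(classify_paths(corpus))  # {'/h1': 'fixed', '/td': 'transient'}
```

Text that repeats across nearly every communication is treated as boilerplate, while text that varies is a candidate for event data such as order numbers or dates.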
-
Publication No.: US10540610B1
Publication Date: 2020-01-21
Application No.: US15139807
Application Date: 2016-04-27
Applicant: Google Inc.
Inventors: Jie Yang, Amr Ahmed, Luis Garcia Pueyo, Mike Bendersky, Amitabh Saikia, Marc-Allen Cartright, Marc Alexander Najork, MyLinh Yang, Hui Tan, Weinan Zhang, Vanja Josifovski, Alexander J. Smola
Abstract: Methods, apparatus, and computer-readable media are provided for analyzing a cluster of communications, such as B2C emails, to generate a template for the cluster that defines transient segments and fixed segments of the cluster of communications. More particularly, methods, apparatus, and computer-readable media are provided for generating and/or applying a trained structured machine learning model for a generated template that can be used to determine, for one or more transient segments of subsequent communications, a corresponding probability that a given semantic label is the correct semantic label for extracted content of the transient segment(s).
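A deliberately simplified sketch of producing a probability per semantic label for one transient segment. It swaps in a per-segment linear model with a softmax and hand-set weights; the patent describes a trained *structured* model, which would also learn the weights and model dependencies between segments. The labels, features and `WEIGHTS` values are hypothetical.

```python
import math
import re

# Hypothetical semantic labels and hand-set weights; a real system would learn
# these (and dependencies between segments) from training data.
WEIGHTS = {
    "date": {"looks_like_date": 3.0, "has_digits": 1.0, "has_letters": -0.5},
    "order_id": {"looks_like_date": -1.0, "has_digits": 2.0, "has_letters": 0.5},
    "name": {"looks_like_date": -2.0, "has_digits": -1.5, "has_letters": 2.0},
}


def features(segment):
    """Simple binary features of the content extracted from a transient segment."""
    return {
        "looks_like_date": float(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", segment))),
        "has_digits": float(any(ch.isdigit() for ch in segment)),
        "has_letters": float(any(ch.isalpha() for ch in segment)),
    }


def label_probabilities(segment):
    """Softmax over per-label linear scores for one transient segment."""
    f = features(segment)
    scores = {label: sum(w[k] * f[k] for k in f) for label, w in WEIGHTS.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


probs = label_probabilities("2016-04-27")
print(max(probs, key=probs.get))  # date
```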