CLUSTERING COMMUNICATIONS BASED ON CLASSIFICATION
    1.
    Type: Invention application
    Status: In force

    Publication No.: US20160314182A1

    Publication date: 2016-10-27

    Application No.: US14414855

    Filing date: 2014-09-18

    Applicant: Google, Inc.

    IPC classification: G06F17/30 H04L12/26

    Abstract: Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.
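As an illustration of the term-occurrence step described in the abstract above, the following minimal Python sketch groups unlabeled communications that contain at least a given number of classification terms. The function name `cluster_by_terms`, the `min_hits` parameter, and the word-level tokenization are assumptions made for this example, not details taken from the application; the optional structural-path similarity is not modeled.

```python
import re

def cluster_by_terms(communications, classification_terms, min_hits=1):
    """Return ids of communications containing at least `min_hits` distinct terms.

    `communications` maps an id to raw text; none of them carries a label.
    """
    terms = {t.lower() for t in classification_terms}
    cluster = []
    for comm_id, text in communications.items():
        # Hypothetical tokenization: lowercase alphanumeric words.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        if len(tokens & terms) >= min_hits:
            cluster.append(comm_id)
    return cluster

corpus = {
    "a": "Your flight itinerary: boarding at gate 12",
    "b": "Lunch on Friday?",
    "c": "Flight delayed, new boarding time attached",
}
travel_cluster = cluster_by_terms(
    corpus, ["flight", "itinerary", "boarding"], min_hits=2
)
```

Raising `min_hits` trades recall for precision: a single shared word is weak evidence of a classification, while several co-occurring terms are stronger.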

    GENERATING DATA RECORDS BASED ON PARSING
    2.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20140279864A1

    Publication date: 2014-09-18

    Application No.: US14143835

    Filing date: 2013-12-30

    Applicant: Google Inc.

    IPC classification: G06F17/30

    CPC classification: G06F17/2705 G06F16/258

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a first document, the first document being associated with a user, executing a plurality of parsers, each parser of the plurality of parsers processing the first document to provide one or more first data values, merging the one or more first data values provided from the plurality of parsers to populate a data record having one or more data fields, the data record being specific to the user, and storing the data record in computer-readable memory.
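The parse-and-merge flow in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the two regex parsers, the field names, the "first parser to supply a field wins" merge rule, and the in-memory `RECORDS` store are all hypothetical and are not taken from the application.

```python
import re

def parse_order_number(doc):
    """Hypothetical parser: pull an order number such as 'Order #A123'."""
    m = re.search(r"Order #(\w+)", doc)
    return {"order_number": m.group(1)} if m else {}

def parse_total(doc):
    """Hypothetical parser: pull a dollar total such as 'Total: $19.99'."""
    m = re.search(r"Total: \$(\d+\.\d{2})", doc)
    return {"total": m.group(1)} if m else {}

def build_record(user_id, doc, parsers):
    """Run every parser on the document and merge their values into one record."""
    record = {"user_id": user_id}
    for parser in parsers:
        for field, value in parser(doc).items():
            # Assumed merge rule: keep the first value supplied for a field.
            record.setdefault(field, value)
    return record

RECORDS = {}  # stand-in for "computer-readable memory", keyed by user

def store(record):
    RECORDS.setdefault(record["user_id"], []).append(record)

doc = "Order #A123 shipped. Total: $19.99"
rec = build_record("u1", doc, [parse_order_number, parse_total])
store(rec)
```

Keeping each parser independent means a parser that finds nothing simply contributes no fields, and new parsers can be added without touching the merge step.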

    Template-based structured document classification and extraction

    Publication No.: US10657158B2

    Publication date: 2020-05-19

    Application No.: US15360939

    Filing date: 2016-11-23

    Applicant: Google Inc.

    Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.
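To make the template idea in the abstract above concrete, here is a rough Python sketch. It assumes a structured document can be represented as a dictionary from structural path to text, and it replaces the trained extraction model with a toy stand-in (`varying_paths`) that marks a path as transient when its text differs across the cluster. All function names and the document representation are assumptions for illustration only.

```python
def fixed_signature(doc, transient_paths):
    """Fixed content of a document: every (path, text) pair outside transient paths."""
    return frozenset((p, t) for p, t in doc.items() if p not in transient_paths)

def varying_paths(cluster):
    """Toy stand-in for the trained model: a path is transient if its text varies."""
    return {
        path.rsplit("/", 1)[-1]: path
        for path in cluster[0]
        if len({doc.get(path) for doc in cluster}) > 1
    }

def make_template(cluster, locate_transient_fields):
    """Store the association between a template and its transient field locations."""
    locations = locate_transient_fields(cluster)  # {field_name: path}
    signature = fixed_signature(cluster[0], set(locations.values()))
    return {"signature": signature, "locations": locations}

def extract(template, doc):
    """Extract data points only if the document shares the template's fixed content."""
    paths = set(template["locations"].values())
    if fixed_signature(doc, paths) != template["signature"]:
        return None
    return {name: doc.get(path) for name, path in template["locations"].items()}

cluster = [
    {"html/body/h1": "Your invoice", "html/body/span.total": "$10"},
    {"html/body/h1": "Your invoice", "html/body/span.total": "$25"},
]
template = make_template(cluster, varying_paths)
new_doc = {"html/body/h1": "Your invoice", "html/body/span.total": "$42"}
data_points = extract(template, new_doc)
```

Because the locator is passed in as a function, a real model could be swapped in without changing how templates are stored or applied.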

    Query-based stream
    5.
    Type: Granted invention patent

    Publication No.: US09600543B1

    Publication date: 2017-03-21

    Application No.: US14040466

    Filing date: 2013-09-27

    Applicant: Google Inc.

    IPC classification: G06F17/30 G06Q50/10

    Abstract: In one aspect, a method includes receiving an indication of a request from a user to view a stream associated with the user, generating a request for one or more items visible to the user for display within the stream, the request including a search query identifying search criteria including one or more tokens, the one or more tokens including at least a user token identifying the user, receiving one or more items in response to the request, the one or more items including at least one of the one or more tokens and further being visible to the user, and providing the one or more items for display to the user within the stream in response to the request. Other aspects can be embodied in corresponding systems and apparatus, including computer program products.
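The request flow in the abstract above can be sketched as a token query plus a visibility filter. The `Item` class, the `user:<id>` token format, and the function names below are assumptions made for this example; the patent does not specify them.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    tokens: set      # tokens indexed for this item
    visible_to: set  # user ids allowed to see this item

def build_query(user_id, extra_tokens=()):
    """Search criteria: always includes a token identifying the requesting user."""
    return {f"user:{user_id}", *extra_tokens}

def fetch_stream(user_id, items, extra_tokens=()):
    """Return items that match at least one query token and are visible to the user."""
    query = build_query(user_id, extra_tokens)
    return [it for it in items if it.tokens & query and user_id in it.visible_to]

items = [
    Item("p1", {"user:alice"}, {"alice"}),
    Item("p2", {"user:bob"}, {"alice", "bob"}),
    Item("p3", {"user:alice"}, {"bob"}),
]
own_stream = [it.item_id for it in fetch_stream("alice", items)]
wider_stream = [it.item_id for it in fetch_stream("alice", items, ["user:bob"])]
```

Here `p3` matches the user token but is dropped because it is not visible to the requesting user, which mirrors the two separate conditions in the abstract.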

    Generating and applying event data extraction templates

    Publication No.: US10360537B1

    Publication date: 2019-07-23

    Application No.: US15484933

    Filing date: 2017-04-11

    Applicant: Google Inc.

    Abstract: Techniques are described herein for generating and applying event data extraction templates. In various implementations, a data extraction template may be applied to structured communications to extract, from each structured communication, event data associated with a transient markup language path indicated in the data extraction template. The data extraction template may include an event-related semantic data type assigned to the transient markup language path and a strength of association between the transient structural path and the event-related semantic data type. Feedback may be obtained concerning event data extracted from one or more of the structured communications. Based on the feedback, the strength of association between the transient markup language path and the event-related semantic data type may be altered. The data extraction template may then be applied to a subsequent structured communication to extract new event data from the structured communication based on the altered strength of association.
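The feedback loop described in the abstract above might look like the following sketch. The template layout (path mapped to a data type and a strength), the extraction threshold of 0.5, and the fixed adjustment step are assumptions chosen for illustration; the patent does not state how strength is represented or updated.

```python
def apply_template(template, communication, threshold=0.5):
    """Extract event data for each path whose association strength meets the threshold."""
    extracted = {}
    for path, (data_type, strength) in template.items():
        if strength >= threshold and path in communication:
            extracted[data_type] = communication[path]
    return extracted

def update_strength(template, path, positive, step=0.2):
    """Raise or lower the strength of association for a path, clamped to [0, 1]."""
    data_type, strength = template[path]
    if positive:
        strength = min(1.0, strength + step)
    else:
        strength = max(0.0, strength - step)
    template[path] = (data_type, strength)

# Hypothetical template: one transient path assumed to hold an event date.
template = {"table/tr[2]/td": ("event_date", 0.6)}
message = {"table/tr[2]/td": "June 3"}

before = apply_template(template, message)
update_strength(template, "table/tr[2]/td", positive=False)  # user rejects the extraction
after = apply_template(template, message)
```

After the negative feedback the strength falls below the threshold, so the same path no longer yields an `event_date` for subsequent communications.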

    Clustering communications based on classification

    Publication No.: US10007717B2

    Publication date: 2018-06-26

    Application No.: US14414855

    Filing date: 2014-09-18

    Applicant: Google Inc.

    IPC classification: G06F17/30 H04L12/26

    Abstract: Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.

    TEMPLATE-BASED STRUCTURED DOCUMENT CLASSIFICATION AND EXTRACTION

    Publication No.: US20180144042A1

    Publication date: 2018-05-24

    Application No.: US15360939

    Filing date: 2016-11-23

    Applicant: Google Inc.

    IPC classification: G06F17/30 G06F17/24 G06N99/00

    Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.

    Generating and applying event data extraction templates

    Publication No.: US09652530B1

    Publication date: 2017-05-16

    Application No.: US14470416

    Filing date: 2014-08-27

    Applicant: Google Inc.

    IPC classification: G06F17/30

    CPC classification: G06F17/30705 G06F17/30923

    Abstract: Methods and apparatus are described herein for generating and applying event data extraction templates. In various implementations, a set of structural paths may be identified from a corpus of communications. A first structural path of the set of structural paths, associated with a first segment of text, may be classified as transient in response to a determination that a frequency of occurrences of the first segment of text across the corpus satisfies a criterion. Event heuristics may be applied to the communications of the corpus. A determination may be made, based on the applying, that the communications of the corpus are event-related. An event data type may be assigned to the transient structural path based on the applying. An event data extraction template may be generated to extract, from one or more subsequent communications, one or more event-related segments of text associated with the transient structural path.
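The frequency-based classification of structural paths in the abstract above can be illustrated with a short Python sketch. It assumes each communication is a dictionary from structural path to text, that the criterion is "the most common text at a path appears in at most half of the corpus", and that the only event heuristic is an ISO-style date pattern. The names `transient_paths`, `assign_event_types`, and `max_share` are hypothetical.

```python
import re
from collections import Counter

def transient_paths(corpus, max_share=0.5):
    """Paths whose most common text appears in at most `max_share` of the corpus."""
    counts = {}
    for comm in corpus:
        for path, text in comm.items():
            counts.setdefault(path, Counter())[text] += 1
    n = len(corpus)
    return {
        path for path, c in counts.items()
        if c.most_common(1)[0][1] / n <= max_share
    }

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # assumed event heuristic

def assign_event_types(corpus, paths):
    """Label a transient path as 'event_date' if its text looks like a date everywhere."""
    return {
        p: "event_date"
        for p in paths
        if all(DATE_RE.search(comm.get(p, "")) for comm in corpus)
    }

corpus = [
    {"h1": "Your reservation", "td.date": "2024-06-01", "td.name": "Ann"},
    {"h1": "Your reservation", "td.date": "2024-06-02", "td.name": "Bo"},
    {"h1": "Your reservation", "td.date": "2024-06-03", "td.name": "Cy"},
    {"h1": "Your reservation", "td.date": "2024-06-04", "td.name": "Di"},
]
paths = transient_paths(corpus)
template = assign_event_types(corpus, paths)
```

The resulting `template` maps each event-typed transient path to its data type and could then be applied to later communications that share the same fixed paths.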