Techniques for automatically identifying salient entities in documents

    Publication No.: US09619457B1

    Publication date: 2017-04-11

    Application No.: US14332996

    Filing date: 2014-07-16

    Applicant: Google Inc.

    Abstract: A computer-implemented technique can include obtaining a training corpus including pairs of (i) documents and (ii) corresponding abstracts. The technique can include identifying a set of entity mentions in each abstract and each corresponding document based on their respective part-of-speech (POS) tags and dependency parses. The technique can include clustering the sets of entity mentions referring to a same underlying entity to obtain clusters for each document and each corresponding abstract. The technique can include aligning specific abstract entity mentions to corresponding document entity mentions to obtain a set of aligned abstract and document entities. The technique can include labeling the set of aligned entities as salient and unaligned entities as non-salient to generate a labeled corpus. The technique can also include training features of a classifier using the labeled corpus to obtain a trained classifier.
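
The alignment and labeling steps in this abstract can be illustrated with a minimal sketch. It assumes that entity mentions have already been extracted and clustered (the POS tagging, dependency parsing, and clustering steps are not shown), and it aligns entities by normalized string match, which is an assumption made here for illustration and not necessarily the matching rule used in the patent. The names `normalize` and `label_salience` are hypothetical.

```python
from typing import Dict, List, Set


def normalize(mention: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(mention.lower().split())


def label_salience(
    doc_clusters: Dict[str, List[str]],
    abstract_clusters: List[List[str]],
) -> Dict[str, bool]:
    """Label each document entity as salient (True) or non-salient (False).

    Assumption: a document entity is "aligned" when any of its mentions
    matches, after normalization, any mention found in the abstract.
    """
    abstract_mentions: Set[str] = {
        normalize(m) for cluster in abstract_clusters for m in cluster
    }
    return {
        entity_id: any(normalize(m) in abstract_mentions for m in mentions)
        for entity_id, mentions in doc_clusters.items()
    }


# Toy document/abstract pair with pre-clustered mentions (illustrative data).
doc_clusters = {
    "e1": ["Jane Doe", "Doe", "the senator"],
    "e2": ["Springfield"],
    "e3": ["Clean Water Act"],
}
abstract_clusters = [["Doe"], ["clean water act"]]

labels = label_salience(doc_clusters, abstract_clusters)
```

The resulting `labels` mapping plays the role of the labeled corpus for one document; a classifier would then be trained on features of each entity against these labels.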

    Automatic annotation for training and evaluation of semantic analysis engines

    Type: Granted patent

    Status: In force

    Publication No.: US09224103B1

    Publication date: 2015-12-29

    Application No.: US13801197

    Filing date: 2013-03-13

    Applicant: Google Inc.

    CPC classification number: G06N99/005

    Abstract: Implementations include systems and methods that generate data for training or evaluating semantic analysis engines. For example, a method may include receiving documents from a corpus that includes an authoritative set of documents from an authoritative source. Each document in the authoritative set may be associated with an entity. A second set of documents from the corpus that do not overlap with the first set may include at least one link to a document in the authoritative set, the at least one link being associated with anchor text. For each document in the second set, the method may include identifying entity mentions in the document based on the anchor text. The method may include associating the entity mention with the entity in a graph-structured knowledge base or associating entity types with the entity mention. The method may also include training a semantic analysis engine using the identified entity mentions and associations.
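
The anchor-text idea in this abstract can be sketched as follows. The sketch assumes the second-set documents are HTML and that a lookup table from authoritative URLs to knowledge-base entity identifiers already exists; the class name `AnchorMentionParser`, the URLs, and the entity identifier format are all hypothetical. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class AnchorMentionParser(HTMLParser):
    """Collect (anchor text, entity id) pairs for links to authoritative pages."""

    def __init__(self, url_to_entity: Dict[str, str]):
        super().__init__()
        self.url_to_entity = url_to_entity
        self.current_entity: Optional[str] = None
        self.buffer: List[str] = []
        self.mentions: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # None when the link does not point into the authoritative set.
            self.current_entity = self.url_to_entity.get(href)
            self.buffer = []

    def handle_data(self, data):
        if self.current_entity is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_entity is not None:
            text = "".join(self.buffer).strip()
            self.mentions.append((text, self.current_entity))
            self.current_entity = None


# Hypothetical mapping from authoritative pages to knowledge-base entities.
url_to_entity = {"https://authority.example/Paris": "/entity/paris"}

page = (
    '<p>She moved to <a href="https://authority.example/Paris">the French '
    'capital</a> and wrote <a href="https://other.example/x">this post</a>.</p>'
)

parser = AnchorMentionParser(url_to_entity)
parser.feed(page)
mentions = parser.mentions
```

Each collected pair is an automatically annotated entity mention that could serve as a training or evaluation example; links outside the authoritative set are ignored.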

    Querying a data graph using natural language queries

    Publication No.: US10810193B1

    Publication date: 2020-10-20

    Application No.: US13801598

    Filing date: 2013-03-13

    Applicant: Google Inc.

    Abstract: Implementations include systems and methods for querying a data graph. An example method includes receiving a machine learning module trained to produce a model with multiple features for a query, each feature representing a path in a data graph. The method also includes receiving a search query that includes a first search term, mapping the search query to the query, and mapping the first search term to a first entity in the data graph. The method may also include identifying a second entity in the data graph using the first entity and at least one of the multiple weighted features, and providing information relating to the second entity in a response to the search query. Some implementations may also include training the machine learning module by, for example, generating positive and negative training examples from an answer to a query.
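
A minimal sketch of answering a query with weighted path features is given below. It assumes the data graph is a nested dictionary, that each feature is a sequence of relation names, and that candidate entities are scored by summing the weights of the paths that reach them; the summation rule, the weights, and the names `follow` and `rank_answers` are assumptions for illustration, and the training of the weights is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, List[str]]]  # entity -> relation -> target entities
Path = Tuple[str, ...]                   # a feature: a sequence of relations


def follow(graph: Graph, start: str, path: Path) -> List[str]:
    """Return the entities reached from `start` by walking `path`."""
    frontier = [start]
    for relation in path:
        frontier = [
            target
            for node in frontier
            for target in graph.get(node, {}).get(relation, [])
        ]
    return frontier


def rank_answers(
    graph: Graph, start: str, weighted_paths: Dict[Path, float]
) -> List[Tuple[str, float]]:
    """Score candidates by the summed weight of the paths that reach them."""
    scores: Dict[str, float] = defaultdict(float)
    for path, weight in weighted_paths.items():
        for entity in set(follow(graph, start, path)):
            scores[entity] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy graph and hypothetical weights for a "spouse of X" style query.
graph: Graph = {
    "alice": {"married_to": ["bob"], "child": ["carol"]},
    "carol": {"parent": ["alice", "bob"]},
}
spouse_features = {("married_to",): 0.5, ("child", "parent"): 0.25}

ranked = rank_answers(graph, "alice", spouse_features)
```

Here the search term has already been mapped to the first entity `"alice"`; the top-ranked entry in `ranked` stands in for the second entity whose information would be returned.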

    Information extraction from question and answer websites

    Publication No.: US09875296B2

    Publication date: 2018-01-23

    Application No.: US14667792

    Filing date: 2015-03-25

    Applicant: Google Inc.

    CPC classification number: G06F17/3064 G06F17/2705 G06F17/2785

    Abstract: Methods, systems, and apparatus for obtaining a resource, identifying a first portion of text of the resource that is characterized as a question, and a second part of text of the resource that is characterized as an answer to the question, identifying an entity that is referenced by one or more terms of the text that is characterized as the question, a relationship type that is referenced by one or more other terms of the text that is characterized as the question, and an entity that is referenced by the text that is characterized as the answer to the question, and adjusting a score for a relationship of the relationship type for the entity that is referenced by the one or more terms of the text that is characterized as the question and the entity that is referenced by the text that is characterized as the answer to the question.

    ASSOCIATING A SEGMENT OF AN ELECTRONIC MESSAGE WITH ONE OR MORE SEGMENT ADDRESSEES

    Type: Patent application

    Status: Under examination (published)

    Publication No.: US20170070469A1

    Publication date: 2017-03-09

    Application No.: US15331456

    Filing date: 2016-10-21

    Applicant: Google Inc.

    Abstract: Methods and apparatus related to associating a segment of an electronic message with one or more segment addressees. One or more message addressees of an electronic message may be identified, the one or more message addressees identifying at least one recipient of the electronic message. A segment of the electronic message may be identified via one or more processors. One or more segment addressees may be determined from the at least one recipient, the one or more segment addressees identifying an addressee for the identified segment. One or more aspects of the segment may be associated with the one or more segment addressees. An indication pertaining to the one or more aspects of the segment may be provided to the one or more segment addressees.

    Techniques for automatic photo album generation

    Type: Granted patent

    Status: In force

    Publication No.: US08983193B1

    Publication date: 2015-03-17

    Application No.: US13628735

    Filing date: 2012-09-27

    Applicant: Google Inc.

    Abstract: A computer-implemented technique can receive, at a computing device including one or more processors, a plurality of photos. The technique can extract quality features and similarity features for each of the plurality of photos and can obtain weights for the various quality features and similarity features based on an analysis of a reference photo collection. The technique can generate a quality metric for each of the plurality of photos and can generate a similarity matrix for the plurality of photos by analyzing the various quality features and similarity features and using the obtained weights. The technique can perform joint global maximization of photo quality and photo diversity using the quality metrics and the similarity matrix in order to select a subset of the plurality of photos having a high degree of representativeness. The technique can then store the subset of the plurality of photos in a memory.
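
The trade-off between photo quality and photo diversity can be illustrated with a small greedy selection. This is a simplified stand-in and not the joint global maximization described in the abstract: it assumes per-photo quality scores and a pairwise similarity matrix are already computed, and it picks photos one at a time by quality minus a penalty for similarity to photos already chosen. The function name `select_album` and the `diversity_weight` parameter are hypothetical.

```python
from typing import List, Sequence


def select_album(
    quality: Sequence[float],
    similarity: Sequence[Sequence[float]],
    k: int,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily pick up to k photo indices, balancing quality and diversity."""
    selected: List[int] = []
    remaining = set(range(len(quality)))
    while remaining and len(selected) < k:

        def gain(i: int) -> float:
            # Penalize a photo by its similarity to the closest chosen photo.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return quality[i] - diversity_weight * redundancy

        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Photos 0 and 1 are near-duplicates; photo 2 is lower quality but distinct.
quality = [0.9, 0.85, 0.4]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

album = select_album(quality, similarity, k=2)
```

With these toy values the second pick skips the near-duplicate photo 1 in favor of the distinct photo 2, which is the kind of representativeness the abstract aims for.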

    TECHNIQUES FOR USER CUSTOMIZATION IN A PHOTO MANAGEMENT SYSTEM

    Type: Patent application

    Status: In force

    Publication No.: US20150074574A1

    Publication date: 2015-03-12

    Application No.: US14547078

    Filing date: 2014-11-18

    Applicant: Google Inc.

    Abstract: A computer-implemented technique can receive a plurality of photos and automatically select a subset of the plurality of photos having a high degree of representativeness by jointly maximizing both photo quality and photo diversity to obtain a photo album. The technique can determine one or more clusters for the photo album using a hierarchical clustering algorithm, and store the photo album according to the one or more clusters. The technique can control the manner in which the photo album is displayed using the one or more clusters. The technique can adjust at least one of the one or more clusters and the automatic photo album generation based on user input. The user input can include at least one of adding, deleting, and moving a photo with respect to the one or more clusters. The technique can then re-cluster, automatically generate a new photo album, and/or adjust the presentation.
