-
Publication No.: US08924436B1
Publication Date: 2014-12-30
Application No.: US13854858
Filing Date: 2013-04-01
Applicant: Google Inc.
Inventor: Vinicius J. Fortuna , Andriy Bihun , Leonardo A. Laroco, Jr. , Daniel Loreto , Elena Erbiceanu , Jeffrey C. Reynar , Andrew William Hogue , Ankur Bhargava
CPC classification number: G06F17/2247 , G06F17/30864
Abstract: Methods, systems, and apparatus, including computer programs stored on computer storage media, for populating a structured presentation with new values. One aspect can be embodied in machine-implemented methods that include the actions of obtaining a plurality of instances and a plurality of attributes; for each instance: identifying one or more documents from an unstructured document collection that are relevant to the instance, where each of the one or more documents includes at least a value for an attribute in the plurality of attributes; and establishing a subset of the one or more values as characterizing the instance; and adding each instance, the respective attributes, and the respective subset of values to a structured data collection.
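As a rough illustration of the steps in this abstract, the following sketch fills an instance-by-attribute table from plain-text documents. Every concrete choice here is an assumption, not something the abstract specifies: the hypothetical function name `populate_table`, treating a document as relevant when it mentions the instance name, extracting values from "attribute: value" patterns, and keeping the most frequent candidate as the characterizing subset of values.

```python
import re
from collections import Counter


def populate_table(instances, attributes, documents):
    """Hypothetical sketch: fill an instance-by-attribute table from free text.

    The relevance test, the "attribute: value" extraction pattern and the
    most-frequent-value rule are illustrative assumptions only.
    """
    table = {}
    for instance in instances:
        # Treat a document as relevant if it mentions the instance name.
        relevant = [d for d in documents if instance.lower() in d.lower()]
        row = {}
        for attribute in attributes:
            # Collect candidate values written as "attribute: value".
            pattern = re.compile(
                rf"{re.escape(attribute)}\s*:\s*([^;.\n]+)", re.IGNORECASE
            )
            candidates = Counter(
                value.strip() for d in relevant for value in pattern.findall(d)
            )
            # Keep the most frequent candidate as the characterizing value.
            if candidates:
                row[attribute] = candidates.most_common(1)[0][0]
        # Add the instance with its attributes and chosen values.
        table[instance] = row
    return table


if __name__ == "__main__":
    docs = [
        "France facts. Capital: Paris; currency: euro",
        "France travel guide, capital: Paris",
        "A blog post on France. Capital: Lyon",
        "Germany overview. Capital: Berlin; currency: euro",
    ]
    print(populate_table(["France", "Germany"], ["capital", "currency"], docs))
```

Majority voting is only one simple way to pick the "subset of values"; weighting candidates by source reliability or keeping several values per attribute would be equally valid readings of the abstract.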
-
Publication No.: US20140379743A1
Publication Date: 2014-12-25
Application No.: US14457869
Filing Date: 2014-08-12
Applicant: Google Inc.
Inventor: Leonardo A. Laroco, Jr. , Nikola Jevtic , Nikolai V. Yakovenko , Jeffrey Reynar
IPC: G06F17/30
CPC classification number: G06F16/93 , G06F16/955 , G06N5/04 , G06N20/00
Abstract: A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
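The iterative loop in this abstract resembles self-training. A minimal sketch, assuming each document is reduced to a set of features (here, words), that a "model" is simply a set of indicative features, that a document refers to the entity when it shares at least `min_hits` features with the model, and that the next model keeps features occurring at least `min_support` times in the matched documents and more often there than in the rest. The function names and thresholds are hypothetical; the abstract does not say how its models are built.

```python
from collections import Counter


def refers_to_entity(doc, model, min_hits):
    # A document matches when it shares at least min_hits features with the model.
    return len(doc & model) >= min_hits


def iterative_disambiguation(documents, seed_model, min_hits=2, min_support=2, max_rounds=5):
    """Hypothetical sketch of iterative, model-refining disambiguation.

    documents:  list of feature sets, one per document that mentions the name.
    seed_model: initial set of features believed to indicate the target entity.
    """
    model = set(seed_model)
    matched = []
    for _ in range(max_rounds):
        # Identify the documents the current model attributes to the entity.
        new_matched = [i for i, d in enumerate(documents) if refers_to_entity(d, model, min_hits)]
        if new_matched == matched:
            break  # converged: the selection no longer changes
        matched = new_matched
        matched_set = set(matched)
        # Count feature occurrences inside and outside the matched documents.
        inside = Counter(f for i in matched for f in documents[i])
        outside = Counter(
            f for i, d in enumerate(documents) if i not in matched_set for f in d
        )
        # Next model: features that recur in matched documents and are more
        # frequent there than in the remaining documents.
        model = {f for f, n in inside.items() if n >= min_support and n > outside[f]}
    return matched, model


if __name__ == "__main__":
    docs = [
        {"jaguar", "engine", "sedan", "dealer"},
        {"jaguar", "engine", "sedan", "dealer"},
        {"jaguar", "sedan", "dealer"},
        {"jaguar", "rainforest", "prey"},
        {"jaguar", "rainforest", "cat"},
    ]
    matched, model = iterative_disambiguation(docs, {"jaguar", "engine"})
    print(matched, sorted(model))  # [0, 1, 2] ['dealer', 'engine', 'jaguar', 'sedan']
```

In this toy run, document 2 never mentions "engine", so the seed model misses it; after one round the learned features "sedan" and "dealer" pull it in, while the two animal documents stay excluded. The final feature set is one crude stand-in for the "additional features of the entity" that the abstract says can be extracted.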
-
Publication No.: US09760570B2
Publication Date: 2017-09-12
Application No.: US14300148
Filing Date: 2014-06-09
Applicant: Google Inc.
Inventor: Leonardo A. Laroco, Jr. , Nikola Jevtic , Nikolai V. Yakovenko , Jeffrey Reynar
CPC classification number: G06F17/30011 , G06F17/30876 , G06N5/04 , G06N99/005
Abstract: A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
-