-
Publication number: US09558186B2
Publication date: 2017-01-31
Application number: US14460117
Filing date: 2014-08-14
Applicant: GOOGLE INC.
Inventor: Jonathan T. Betz, Shubin Zhao
CPC classification number: G06F17/30011, G06F17/30707, G06F17/30864, G06F17/30914, Y10S707/962
Abstract: A system and method for extracting facts from documents. A fact is extracted from a first document. The attribute and value of the fact extracted from the first document are used as a seed attribute-value pair. A second document containing the seed attribute-value pair is analyzed to determine a contextual pattern used in the second document. The contextual pattern is used to extract other attribute-value pairs from the second document. The extracted attributes and values are stored as facts.
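The seed-pair idea in the abstract above can be illustrated with a minimal sketch. It assumes a "contextual pattern" is simply the literal text before, between, and after the seed attribute and value on one line; the patent may define patterns differently. The function names `learn_pattern` and `extract_pairs` and the sample document are hypothetical.

```python
import re

def learn_pattern(document, attribute, value):
    """Find the seed pair on one line and return the literal text around it
    as (prefix, infix, suffix), or None if the pair is not found."""
    for line in document.splitlines():
        a = line.find(attribute)
        if a == -1:
            continue
        v = line.find(value, a + len(attribute))
        if v == -1:
            continue
        return line[:a], line[a + len(attribute):v], line[v + len(value):]
    return None

def extract_pairs(document, pattern):
    """Apply the learned pattern to every line and collect (attribute, value)."""
    prefix, infix, suffix = pattern
    regex = re.compile(
        "^" + re.escape(prefix) + "(.+?)" + re.escape(infix)
        + "(.+?)" + re.escape(suffix) + "$"
    )
    pairs = []
    for line in document.splitlines():
        match = regex.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

# Hypothetical second document that contains the seed pair.
second_document = (
    "<tr><td>Population</td><td>8 million</td></tr>\n"
    "<tr><td>Mayor</td><td>Jane Doe</td></tr>\n"
    "<p>Unrelated text</p>"
)
seed = ("Population", "8 million")  # fact taken from a first document
pattern = learn_pattern(second_document, *seed)
facts = extract_pairs(second_document, pattern)
```

Here the seed pair reveals a table-row layout, and the same layout then yields the additional pair from the second row while the unrelated paragraph is ignored.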
-
Publication number: US20140372473A1
Publication date: 2014-12-18
Application number: US14460117
Filing date: 2014-08-14
Applicant: GOOGLE INC.
Inventor: Jonathan T. Betz, Shubin Zhao
IPC: G06F17/30
CPC classification number: G06F17/30011, G06F17/30707, G06F17/30864, G06F17/30914, Y10S707/962
Abstract: A system and method for extracting facts from documents. A fact is extracted from a first document. The attribute and value of the fact extracted from the first document are used as a seed attribute-value pair. A second document containing the seed attribute-value pair is analyzed to determine a contextual pattern used in the second document. The contextual pattern is used to extract other attribute-value pairs from the second document. The extracted attributes and values are stored as facts.
-
Publication number: US09785686B2
Publication date: 2017-10-10
Application number: US14616537
Filing date: 2015-02-06
Applicant: GOOGLE INC.
Inventor: Shubin Zhao, Krzysztof Czuba
CPC classification number: G06F17/30554, G06F17/30011, G06F17/30371, G06F17/30395, G06F17/30528, G06F17/30684, G06F17/30867, G06N5/041
Abstract: A query is defined that has an answer formed of terms from electronic documents. A repository having facts is examined to identify attributes corresponding to terms in the query. The electronic documents are examined to find other terms that commonly appear near the query terms. Hypothetical facts representing possible answers to the query are created based on the information identified in the fact repository and the commonly-appearing terms. These hypothetical facts are corroborated using the electronic documents to determine how many documents support each fact. Additionally, contextual clues in the documents are examined to determine whether the hypothetical facts can be expanded to include additional terms. A hypothetical fact that is supported by at least a certain number of documents, and is not contained within another fact with at least the same level of support, is presented as likely correct.
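The final selection rule in the abstract above (enough supporting documents, and not contained within another fact with at least the same support) can be sketched as follows. This is an illustration only: it assumes "support" means a case-insensitive substring match of the candidate answer in a document, and the names `count_support` and `select_answers`, the documents, and the candidates are all hypothetical.

```python
def count_support(candidates, documents):
    """Count, for each candidate answer, how many documents mention it."""
    return {
        c: sum(1 for d in documents if c.lower() in d.lower())
        for c in candidates
    }

def select_answers(support, min_support):
    """Keep candidates with enough support that are not contained within
    another candidate having at least the same level of support."""
    selected = []
    for fact, n in support.items():
        if n < min_support:
            continue
        subsumed = any(
            other != fact and fact.lower() in other.lower() and m >= n
            for other, m in support.items()
        )
        if not subsumed:
            selected.append(fact)
    return selected

documents = [
    "George Washington was the first U.S. president.",
    "The first president was George Washington.",
    "The Delaware is a river.",
]
candidates = ["Washington", "George Washington", "Delaware"]
support = count_support(candidates, documents)
answers = select_answers(support, min_support=2)
```

In this toy corpus "Washington" and "George Washington" are each supported by two documents, so the shorter candidate is dropped in favor of the expanded one, and "Delaware" falls below the threshold.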
-
Publication number: US20140372478A1
Publication date: 2014-12-18
Application number: US14463393
Filing date: 2014-08-19
Applicant: GOOGLE INC.
Inventor: Shubin Zhao
IPC: G06F17/30
CPC classification number: G06F16/9535, G06F16/31
Abstract: A system, method, and computer program product for learning objects and facts from documents. Embodiments of the method comprise selecting a source object and a source document and identifying a title pattern and a contextual pattern based on the source object and the source document. A set of documents matching the title pattern and the contextual pattern are selected. For each document in the selected set, a name and one or more facts are identified by applying the title pattern and the contextual pattern to the document. Objects are identified or created based on the identified names and associated with the identified facts.
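The combination of a title pattern and a contextual pattern described in the abstract above can be sketched roughly as below. It assumes documents are plain dictionaries with `title` and `body` keys, that the title pattern is the literal text around the object name in the title, and that the contextual pattern is the literal text around one known attribute-value pair on a line. All function names and sample data are hypothetical and not taken from the patent.

```python
import re

def learn_title_pattern(title, name):
    """Return the (prefix, suffix) surrounding the object name in the title."""
    i = title.find(name)
    return None if i == -1 else (title[:i], title[i + len(name):])

def apply_title_pattern(title, pattern):
    """Return the name captured by the title pattern, or None."""
    prefix, suffix = pattern
    if (len(title) > len(prefix) + len(suffix)
            and title.startswith(prefix) and title.endswith(suffix)):
        return title[len(prefix):len(title) - len(suffix)]
    return None

def learn_context_pattern(body, attribute, value):
    """Build a line regex from the text around a known attribute-value pair."""
    for line in body.splitlines():
        a = line.find(attribute)
        v = line.find(value, a + len(attribute)) if a != -1 else -1
        if v != -1:
            return re.compile(
                "^" + re.escape(line[:a]) + "(.+?)"
                + re.escape(line[a + len(attribute):v]) + "(.+?)"
                + re.escape(line[v + len(value):]) + "$"
            )
    return None

def learn_objects(source_name, source_fact, source_doc, corpus):
    """Map each name found via the title pattern to its extracted facts."""
    title_pattern = learn_title_pattern(source_doc["title"], source_name)
    context = learn_context_pattern(source_doc["body"], *source_fact)
    objects = {}
    if title_pattern is None or context is None:
        return objects
    for doc in corpus:
        name = apply_title_pattern(doc["title"], title_pattern)
        facts = [m.groups() for m in map(context.match, doc["body"].splitlines()) if m]
        if name and facts:  # the document must match both patterns
            objects.setdefault(name, []).extend(facts)
    return objects

source_doc = {"title": "Paris - City Guide", "body": "Country: France\nMayor: A. Example"}
corpus = [
    source_doc,
    {"title": "Rome - City Guide", "body": "Country: Italy\nRiver: Tiber"},
    {"title": "About us", "body": "Phone: 555"},
]
objects = learn_objects("Paris", ("Country", "France"), source_doc, corpus)
```

Only documents whose title fits the learned title pattern contribute objects, so the "About us" page is skipped even though its body matches the contextual pattern.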
-