-
Publication No.: US09547718B2
Publication date: 2017-01-17
Application No.: US13325072
Filing date: 2011-12-14
Applicant: Jiewen Huang , Zhimin Chen , Arvind Arasu , Vivek Narasayya
Inventor: Jiewen Huang , Zhimin Chen , Arvind Arasu , Vivek Narasayya
IPC classification: G06F17/30
CPC classification: G06F17/30867 , G06Q30/0201
Abstract: A set expansion system is described herein that improves precision, recall, and performance of prior set expansion methods for large sets of data. The system maintains high precision and recall by 1) identifying the quality of particular lists and applying that quality through a weight, 2) allowing for the specification of negative examples in a set of seeds to reduce the introduction of bad entities into the set, and 3) applying a cutoff to eliminate lists that include a low number of positive matches. The system may perform multiple passes to first generate a good candidate result set and then refine the set to find the set with the highest quality. The system may also apply MapReduce or other distributed processing techniques to allow calculation in parallel. Thus, the system efficiently expands large concept sets from a potentially small set of initial seeds using readily available web data.
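The three mechanisms named in the abstract (a per-list quality weight, negative seeds, and a cutoff on positive matches) can be illustrated with a small sketch. This is not the patented method: the weight formula, the function name `expand_set`, and the parameters `min_matches` and `top_k` are assumptions chosen for illustration, and the multi-pass refinement and MapReduce parallelism mentioned in the abstract are not modeled.

```python
from collections import defaultdict


def expand_set(lists, positive, negative=(), min_matches=2, top_k=5):
    """Rank candidate entities found in lists that contain the seeds.

    Hypothetical scoring: each list gets the weight
    (positive hits - negative hits) / list size, and each candidate
    entity accumulates the weights of the lists it appears in.
    """
    positive, negative = set(positive), set(negative)
    scores = defaultdict(float)
    for items in lists:
        items = set(items)
        pos_hits = len(items & positive)
        # Cutoff: skip lists with too few positive seed matches.
        if not items or pos_hits < min_matches:
            continue
        neg_hits = len(items & negative)
        # Assumed quality weight; negative seeds pull the weight down.
        weight = (pos_hits - neg_hits) / len(items)
        if weight <= 0:
            continue
        # Seeds themselves are not candidates.
        for entity in items - positive - negative:
            scores[entity] += weight
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [entity for entity, _ in ranked[:top_k]]


# Illustrative data, not taken from the patent.
web_lists = [
    ["red", "green", "blue", "yellow"],
    ["red", "green", "blue", "apple"],   # contains a negative seed
    ["red", "stop", "go"],               # only one positive match: cut off
    ["green", "red", "purple", "blue"],
]
print(expand_set(web_lists, positive={"red", "green"}, negative={"apple"}))
```

In this sketch the list containing the negative seed still contributes, but with a reduced weight, while the list with a single positive match is dropped by the cutoff before any of its entities are scored.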
-
Publication No.: US20130159317A1
Publication date: 2013-06-20
Application No.: US13325072
Filing date: 2011-12-14
Applicant: Jiewen Huang , Zhimin Chen , Arvind Arasu , Vivek Narasayya
Inventor: Jiewen Huang , Zhimin Chen , Arvind Arasu , Vivek Narasayya
IPC classification: G06F17/30
CPC classification: G06F17/30867 , G06Q30/0201
Abstract: A set expansion system is described herein that improves precision, recall, and performance of prior set expansion methods for large sets of data. The system maintains high precision and recall by 1) identifying the quality of particular lists and applying that quality through a weight, 2) allowing for the specification of negative examples in a set of seeds to reduce the introduction of bad entities into the set, and 3) applying a cutoff to eliminate lists that include a low number of positive matches. The system may perform multiple passes to first generate a good candidate result set and then refine the set to find the set with the highest quality. The system may also apply MapReduce or other distributed processing techniques to allow calculation in parallel. Thus, the system efficiently expands large concept sets from a potentially small set of initial seeds using readily available web data.
-
Publication No.: US08886631B2
Publication date: 2014-11-11
Application No.: US13538336
Filing date: 2012-06-29
Applicant: Daniel Abadi , Jiewen Huang
Inventor: Daniel Abadi , Jiewen Huang
IPC classification: G06F17/30
CPC classification: G06F17/30498 , G06F17/30445 , G06F17/30463 , G06F17/30471 , G06F17/30545
Abstract: A system, method, and computer program product for processing a query are disclosed. Query processing includes partitioning the stored data into a plurality of partitions based on at least one vertex in the plurality of vertexes, storing at least another triple in the plurality of triples on the at least one node, assigning, based on the triple containing the at least one vertex, at least one partition in the plurality of partitions corresponding to the triple to at least one node in the plurality of nodes, and processing, based on the assigning, the query by processing the plurality of partitions.
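The idea of partitioning triples by vertex and assigning each partition to a node can be sketched as follows. This is a simplified illustration, not the patented method: hashing on the subject vertex with CRC32, and the names `node_for`, `partition_triples`, and `query_subject`, are assumptions, and the storing of additional triples on a node that the abstract mentions is not modeled.

```python
import zlib
from collections import defaultdict


def node_for(vertex, num_nodes):
    """Map a vertex to a node index with a deterministic hash (assumed scheme)."""
    return zlib.crc32(vertex.encode("utf-8")) % num_nodes


def partition_triples(triples, num_nodes):
    """Place each (subject, predicate, object) triple on its subject's node."""
    nodes = defaultdict(list)
    for subject, predicate, obj in triples:
        nodes[node_for(subject, num_nodes)].append((subject, predicate, obj))
    return nodes


def query_subject(nodes, subject, num_nodes):
    """Answer 'all triples for this subject' by reading only one partition."""
    partition = nodes.get(node_for(subject, num_nodes), [])
    return [t for t in partition if t[0] == subject]


# Illustrative data, not taken from the patent.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "carol"),
    ("bob", "knows", "carol"),
]
nodes = partition_triples(triples, num_nodes=3)
print(query_subject(nodes, "alice", num_nodes=3))
```

Because every triple with the same subject lands in the same partition, a query about one vertex touches a single node in this sketch; queries that span several vertices would need to combine results from more than one partition.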
-
Publication No.: US20120310916A1
Publication date: 2012-12-06
Application No.: US13538336
Filing date: 2012-06-29
Applicant: Daniel Abadi , Jiewen Huang
Inventor: Daniel Abadi , Jiewen Huang
IPC classification: G06F17/30
CPC classification: G06F17/30498 , G06F17/30445 , G06F17/30463 , G06F17/30471 , G06F17/30545
Abstract: A system, method, and computer program product for processing a query are disclosed. Query processing includes partitioning the stored data into a plurality of partitions based on at least one vertex in the plurality of vertexes, storing at least another triple in the plurality of triples on the at least one node, assigning, based on the triple containing the at least one vertex, at least one partition in the plurality of partitions corresponding to the triple to at least one node in the plurality of nodes, and processing, based on the assigning, the query by processing the plurality of partitions.
-