-
Publication No.: US20120323921A1
Publication date: 2012-12-20
Application No.: US13160532
Filing date: 2011-06-15
Applicants: Zhimin Chen, Eduardo Laureano, Renfei Luo, Tsheko Mutungu, Vivek Narasayya, David Talby
Inventors: Zhimin Chen, Eduardo Laureano, Renfei Luo, Tsheko Mutungu, Vivek Narasayya, David Talby
IPC classification: G06F17/30
CPC classification: G06F17/30616
Abstract: A plurality of items included in a catalog may be obtained, each item associated with an item category. Brand indicators may be obtained, each brand indicator associated with the item category. The brand indicator associated with each item may be determined, and each item may be assigned to a partition group associated with that brand indicator. For each item, string tokens associated with the item may be determined whose correlation with the brand indicator of the item's partition group exceeds a predetermined correlation threshold value. A dictionary hierarchy may be generated based on the correlated string tokens.
-
Publication No.: US07720883B2
Publication date: 2010-05-18
Application No.: US11769050
Filing date: 2007-06-27
CPC classification: G06F17/30536
Abstract: Architecture that provides a data profile computation technique which employs key profile computation and data pattern profile computation. Key profile computation in a data table includes both exact keys and approximate keys, and is based on key strengths. A key strength of 100% is an exact key, and any other percentage is an approximate key. The key strength is estimated based on the number of table rows that have duplicated attribute values. Only column sets that exceed a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns which best describe the patterns within a given set of attribute values. Pattern profiling includes three phases: a first phase for determining token regular expressions, a second phase for determining candidate regular expressions, and a third phase for identifying the best regular expressions of the candidates that match the attribute values.
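The key-profile step in the abstract above can be illustrated with a minimal sketch. The abstract only says that key strength is estimated from the number of rows with duplicated attribute values; the exact formula below (the fraction of rows whose values on the column set occur in no other row) is an assumption, as are the function names `key_strength` and `candidate_keys` and the sample table.

```python
from collections import Counter
from typing import Iterable, Sequence


def key_strength(rows: Sequence[dict], columns: Iterable[str]) -> float:
    """Fraction of rows whose values on `columns` occur in no other row.

    Assumed formula: 1 - (rows with duplicated values) / (total rows).
    A result of 1.0 corresponds to an exact key.
    """
    columns = tuple(columns)
    if not rows:
        return 1.0  # assumption: an empty table has no duplicates
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    duplicated_rows = sum(n for n in counts.values() if n > 1)
    return 1.0 - duplicated_rows / len(rows)


def candidate_keys(rows, column_sets, threshold):
    """Return (column_set, strength) pairs whose strength exceeds `threshold`."""
    result = []
    for cols in column_sets:
        strength = key_strength(rows, cols)
        if strength > threshold:
            result.append((tuple(cols), strength))
    return result


# Hypothetical sample table.
rows = [
    {"id": 1, "email": "a@example.com", "city": "Oslo"},
    {"id": 2, "email": "b@example.com", "city": "Oslo"},
    {"id": 3, "email": "b@example.com", "city": "Rome"},
    {"id": 4, "email": "c@example.com", "city": "Rome"},
]

keys = candidate_keys(rows, [["id"], ["email"], ["city"]], threshold=0.4)
```

With this sample, `id` is an exact key (strength 1.0), `email` is an approximate key (0.5, since two of four rows share a value), and `city` falls below the threshold and is not returned.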
-
Publication No.: US20090006392A1
Publication date: 2009-01-01
Application No.: US11769050
Filing date: 2007-06-27
CPC classification: G06F17/30536
Abstract: Architecture that provides a data profile computation technique which employs key profile computation and data pattern profile computation. Key profile computation in a data table includes both exact keys and approximate keys, and is based on key strengths. A key strength of 100% is an exact key, and any other percentage is an approximate key. The key strength is estimated based on the number of table rows that have duplicated attribute values. Only column sets that exceed a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns which best describe the patterns within a given set of attribute values. Pattern profiling includes three phases: a first phase for determining token regular expressions, a second phase for determining candidate regular expressions, and a third phase for identifying the best regular expressions of the candidates that match the attribute values.
-
Publication No.: US20060253422A1
Publication date: 2006-11-09
Application No.: US11124516
Filing date: 2005-05-06
Applicants: Vivek Narasayya, Zhimin Chen
Inventors: Vivek Narasayya, Zhimin Chen
IPC classification: G06F17/30
CPC classification: G06F16/24535
Abstract: Systems and methodologies for computation of multiple group-by queries via an optimizer that examines the space of plans in a systematic and cost-based manner. The optimizer includes a merging component to merge pairs of sub-plans to facilitate a plan choice with the lowest cost. The merging component can take as input two sub-plans (e.g., sub-plan P1 with root node V1 and sub-plan P2 with root node V2, wherein each sub-plan is a sub-tree of a logical plan whose root node points directly to a relation “R”), and return as output a set of sub-plans with a root node V1∪V2 that is the smallest relation from which both V1 and V2 can be computed.
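The idea of a merged root V1∪V2 in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: group-by nodes are represented by their grouping columns, the aggregate is `COUNT(*)` (which can be rolled up from a finer grouping), and the helper names `group_count`, `rollup`, and `merged_root` are hypothetical. It does not model the cost-based search the abstract describes.

```python
from collections import Counter


def group_count(rows, columns):
    """COUNT(*) GROUP BY `columns`, computed directly over the base relation."""
    columns = tuple(columns)
    return Counter(tuple(row[c] for c in columns) for row in rows)


def rollup(parent_counts, parent_columns, child_columns):
    """Derive COUNT(*) GROUP BY `child_columns` from a finer parent grouping."""
    idx = [parent_columns.index(c) for c in child_columns]
    out = Counter()
    for key, n in parent_counts.items():
        out[tuple(key[i] for i in idx)] += n
    return out


def merged_root(v1, v2):
    """Grouping columns of the merged root V1 ∪ V2, in a stable order."""
    return tuple(dict.fromkeys(tuple(v1) + tuple(v2)))


# Hypothetical base relation R.
R = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "x"},
]

root = merged_root(("a",), ("b",))   # grouping on (a, b)
shared = group_count(R, root)        # the only pass over R
by_a = rollup(shared, root, ("a",))  # V1 computed from the merged root
by_b = rollup(shared, root, ("b",))  # V2 computed from the merged root
```

Both group-by results are derived from one shared intermediate instead of two separate scans of R. Whether that is cheaper depends on the size of the intermediate, which is why the abstract frames the choice as cost-based.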
-
Publication No.: US09547718B2
Publication date: 2017-01-17
Application No.: US13325072
Filing date: 2011-12-14
Applicants: Jiewen Huang, Zhimin Chen, Arvind Arasu, Vivek Narasayya
Inventors: Jiewen Huang, Zhimin Chen, Arvind Arasu, Vivek Narasayya
IPC classification: G06F17/30
CPC classification: G06F17/30867, G06Q30/0201
Abstract: A set expansion system is described herein that improves precision, recall, and performance of prior set expansion methods for large sets of data. The system maintains high precision and recall by 1) identifying the quality of particular lists and applying that quality through a weight, 2) allowing for the specification of negative examples in a set of seeds to reduce the introduction of bad entities into the set, and 3) applying a cutoff to eliminate lists that include a low number of positive matches. The system may perform multiple passes to first generate a good candidate result set and then refine the set to find a set with the highest quality. The system may also apply MapReduce or other distributed processing techniques to allow calculation in parallel. Thus, the system efficiently expands large concept sets from a potentially small set of initial seeds from readily available web data.
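The three mechanisms named in the abstract above (list weights, negative seeds, and a cutoff on positive matches) can be illustrated with a single-pass sketch. The weighting formula, the default cutoff, and the name `expand_set` are assumptions made for illustration. The abstract does not give the actual scoring function, and the multi-pass refinement and distributed execution it mentions are not modeled here.

```python
from collections import defaultdict


def expand_set(lists, positive_seeds, negative_seeds=(), min_positive=2):
    """Rank entities that co-occur with the positive seeds in weighted lists."""
    positive = set(positive_seeds)
    negative = set(negative_seeds)
    scores = defaultdict(float)
    for items in lists:
        members = set(items)
        if not members:
            continue
        pos_hits = len(members & positive)
        neg_hits = len(members & negative)
        if pos_hits < min_positive:
            continue  # cutoff: too few positive matches to trust this list
        # Assumed quality weight: positive hits, penalized by negative hits,
        # normalized by list size.
        weight = max(pos_hits - neg_hits, 0) / len(members)
        for entity in members - positive - negative:
            scores[entity] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(entity, score) for entity, score in ranked if score > 0]


# Hypothetical web lists; the seeds describe the concept "colors".
web_lists = [
    ["red", "green", "blue", "yellow"],
    ["red", "green", "apple", "mercury"],  # noisy list containing a negative seed
    ["red", "wine", "cheese"],             # only one positive match: cut off
    ["green", "red", "blue"],
]

expanded = expand_set(web_lists, ["red", "green"], negative_seeds=["apple"])
names = [entity for entity, _ in expanded]
```

In this sample, `blue` ranks first because it appears in two well-matched lists, `mercury` ranks last because its only list is down-weighted by the negative seed, and `wine` and `cheese` never enter the result because their list fails the cutoff.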
-
Publication No.: US08606788B2
Publication date: 2013-12-10
Application No.: US13160532
Filing date: 2011-06-15
Applicants: Zhimin Chen, Eduardo Laureano, Renfei Luo, Tsheko Mutungu, Vivek Narasayya, David Talby
Inventors: Zhimin Chen, Eduardo Laureano, Renfei Luo, Tsheko Mutungu, Vivek Narasayya, David Talby
IPC classification: G06F17/30
CPC classification: G06F17/30616
Abstract: A plurality of items included in a catalog may be obtained, each item associated with an item category. Brand indicators may be obtained, each brand indicator associated with the item category. The brand indicator associated with each item may be determined, and each item may be assigned to a partition group associated with that brand indicator. For each item, string tokens associated with the item may be determined whose correlation with the brand indicator of the item's partition group exceeds a predetermined correlation threshold value. A dictionary hierarchy may be generated based on the correlated string tokens.
-
Publication No.: US20130159317A1
Publication date: 2013-06-20
Application No.: US13325072
Filing date: 2011-12-14
Applicants: Jiewen Huang, Zhimin Chen, Arvind Arasu, Vivek Narasayya
Inventors: Jiewen Huang, Zhimin Chen, Arvind Arasu, Vivek Narasayya
IPC classification: G06F17/30
CPC classification: G06F17/30867, G06Q30/0201
Abstract: A set expansion system is described herein that improves precision, recall, and performance of prior set expansion methods for large sets of data. The system maintains high precision and recall by 1) identifying the quality of particular lists and applying that quality through a weight, 2) allowing for the specification of negative examples in a set of seeds to reduce the introduction of bad entities into the set, and 3) applying a cutoff to eliminate lists that include a low number of positive matches. The system may perform multiple passes to first generate a good candidate result set and then refine the set to find a set with the highest quality. The system may also apply MapReduce or other distributed processing techniques to allow calculation in parallel. Thus, the system efficiently expands large concept sets from a potentially small set of initial seeds from readily available web data.
-
Publication No.: US08332388B2
Publication date: 2012-12-11
Application No.: US12818237
Filing date: 2010-06-18
CPC classification: G06F17/30463
Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan, generated in the absence of the removed transformation rule, differs from the optimized query plan. An equivalent set of transformation rules can be created such that the test query plan generated from the equivalent set is equivalent to the optimized plan.
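The remove-one-rule-at-a-time check in the abstract above can be sketched as a greedy loop. This is a minimal illustration, not the patented method: `optimize` stands for any callable that maps a query and a rule set to a comparable plan, and `toy_optimize`, the rule names, and the query structure are hypothetical stand-ins for a real optimizer.

```python
def essential_rules(optimize, query, rules):
    """Greedily drop each rule whose removal leaves the optimized plan unchanged."""
    baseline = optimize(query, frozenset(rules))
    kept = set(rules)
    for rule in sorted(rules):
        trial = kept - {rule}
        if optimize(query, frozenset(trial)) == baseline:
            kept = trial  # the test plan matches, so the rule is not needed
    return kept


def toy_optimize(query, rules):
    """Hypothetical optimizer: the 'plan' is the tuple of enabled rules that fire."""
    return tuple(sorted(r for r in rules if r in query["applicable"]))


# Hypothetical query and host rule set.
query = {"applicable": {"push_filter", "join_commute"}}
all_rules = {"push_filter", "join_commute", "merge_projections", "split_aggregate"}

reduced = essential_rules(toy_optimize, query, all_rules)
```

For this toy query the reduced set contains only `push_filter` and `join_commute`, and optimizing with the reduced set yields the same plan as optimizing with all four rules.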
-
Publication No.: US20110314000A1
Publication date: 2011-12-22
Application No.: US12818237
Filing date: 2010-06-18
IPC classification: G06F17/30
CPC classification: G06F17/30463
Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan, generated in the absence of the removed transformation rule, differs from the optimized query plan. An equivalent set of transformation rules can be created such that the test query plan generated from the equivalent set is equivalent to the optimized plan.
-
Publication No.: US20100235347A1
Publication date: 2010-09-16
Application No.: US12404284
Filing date: 2009-03-14
IPC classification: G06F17/30
CPC classification: G06F17/30463
Abstract: An exact cardinality query optimization system and method for optimizing a query having a plurality of expressions to obtain a cardinality-optimal query execution plan for the query. Embodiments of the system and method use various techniques to shorten the time necessary to obtain the cardinality-optimal query execution plan, which is the query execution plan obtained when all cardinalities are exact. Embodiments of the system and method include a covering queries technique that leverages query execution feedback to obtain an unordered subset of relevant expressions for the query, an early termination technique that bounds the cardinality to determine whether the processing can be terminated before all of the expressions have been executed, and an expressions ordering technique that finds an ordering of expressions that yields the greatest reduction in time to obtain the cardinality-optimal query execution plan.