-
1.
Publication No.: US20100011030A1
Publication date: 2010-01-14
Application No.: US12557804
Filing date: 2009-09-11
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
CPC classification: G06F17/30536, G06F17/30442, G06F17/30935, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99944, Y10S707/99953
Abstract: Disclosed are a system, method, and computer readable medium for collecting statistics associated with data in a database. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated.
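The abstract above describes a single pass within a fixed memory allocation but does not say how the bound is enforced. The following is a minimal sketch under stated assumptions, not the patented method: it counts root-to-node paths in XML documents while holding at most `max_counters` counters, using the Misra-Gries frequent-items summary as a stand-in technique. The function names and the `max_counters` parameter (a proxy for the allocated memory) are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator


def root_to_node_paths(xml_text: str) -> Iterator[str]:
    """Yield the root-to-node path of every element in one XML document."""
    root = ET.fromstring(xml_text)
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        yield path
        for child in node:
            stack.append((child, path + "/" + child.tag))


def collect_path_counts(docs: Iterable[str], max_counters: int) -> Dict[str, int]:
    """Count paths in one pass, never holding more than max_counters counters.

    Assumption: Misra-Gries is used to respect the budget. The returned
    counts are lower bounds on the true counts, and rare paths may be dropped.
    """
    counters: Dict[str, int] = {}
    for doc in docs:
        for path in root_to_node_paths(doc):
            if path in counters:
                counters[path] += 1
            elif len(counters) < max_counters:
                counters[path] = 1
            else:
                # Table is full: decrement every counter and drop those at zero.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
    return counters
```

When `max_counters` is at least the number of distinct paths, the counts are exact; with a smaller budget the summary trades accuracy for the memory bound.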
-
2.
Publication No.: US07472108B2
Publication date: 2008-12-30
Application No.: US11435353
Filing date: 2006-05-16
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
CPC classification: G06F17/30442, G06F17/30306, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99944, Y10S707/99953
Abstract: A method for collecting statistics associated with data in a database is disclosed. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated. The performing includes at least determining a total number of instances of at least one path-identifier associated with a given value within a given set of documents.
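The last sentence of the abstract above names one concrete statistic: the total number of instances of a path-identifier associated with a given value within a set of documents. A rough sketch of that count follows. It assumes that a path-identifier is a small integer assigned to each distinct root-to-node path and that a node's "value" is its stripped text content; neither detail is stated in the abstract, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_path_value_pairs(docs: Iterable[str]) -> Tuple[Dict[str, int], Counter]:
    """Return (path -> path-id table, Counter keyed by (path-id, value))."""
    path_ids: Dict[str, int] = {}
    counts: Counter = Counter()
    for doc in docs:
        root = ET.fromstring(doc)
        stack = [(root, "/" + root.tag)]
        while stack:
            node, path = stack.pop()
            # Assign the next integer id the first time a path is seen.
            pid = path_ids.setdefault(path, len(path_ids))
            value = (node.text or "").strip()
            if value:  # only count nodes that carry a text value
                counts[(pid, value)] += 1
            stack.extend((child, path + "/" + child.tag) for child in node)
    return path_ids, counts
```

Looking up `counts[(path_ids[path], value)]` then gives the number of instances of that path carrying that value across the whole document set.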
-
3.
Publication No.: US20100161930A1
Publication date: 2010-06-24
Application No.: US12341309
Filing date: 2008-12-22
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
IPC classification: G06F12/02
CPC classification: G06F17/30935, G06F17/30911
Abstract: A method, system, and computer readable medium for collecting statistics associated with data in a database are disclosed. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated. The performing includes at least determining a total number of instances of at least one path-identifier associated with a given value within a given set of documents.
-
4.
Publication No.: US07613682B2
Publication date: 2009-11-03
Application No.: US11435017
Filing date: 2006-05-16
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
IPC classification: G06F17/30
CPC classification: G06F17/30536, G06F17/30442, G06F17/30935, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99944, Y10S707/99953
Abstract: Disclosed is a method for collecting statistics associated with data in a database. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated.
-
5.
Publication No.: US08489645B2
Publication date: 2013-07-16
Application No.: US10950800
Filing date: 2004-09-27
Applicant: George Andrei Mihaila, Min Wang
Inventor: George Andrei Mihaila, Min Wang
IPC classification: G06F7/00
CPC classification: G06F17/30539, G06Q10/10, G06Q30/02
Abstract: Techniques for estimating the frequencies of items (e.g., data items or objects) in large data sets are disclosed. For example, a technique for determining items and their frequencies at multiple levels of interest in a collection of nested bags includes the following steps. A hierarchy of a plurality of levels of nested bags and the levels of interest are inputted. Among the plurality of levels, a subset of bags is sampled from at least one level. At each level of interest, the frequency of each distinct item in the bags obtained in the sampling step is counted. At each level of interest, the item frequencies obtained in the counting step are extrapolated based on sampling ratios associated with the sampling step. At each level of interest, the items are sorted according to their frequencies obtained from the extrapolating step, and those items with the highest frequencies are retained. A bag may refer to one or more subsets or groups of data items or objects. Also, a bag may, itself, contain one or more other bags.
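The steps in the abstract above (sample, count, extrapolate, sort, retain) can be illustrated with a small sketch. It is a sketch under stated assumptions rather than the patented technique: it fixes a two-level hierarchy (outer bags containing inner bags of items), samples only at the outer level, and takes an item's frequency at a level to mean the number of bags at that level that contain it. The function name, the `ratio`, `k`, and `seed` parameters, and the level names `"outer"` and `"inner"` are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Bag = List[List[str]]  # one outer bag = list of inner bags; inner bag = list of items


def top_items_by_level(outer_bags: Sequence[Bag], ratio: float, k: int,
                       seed: int = 0) -> Dict[str, List[Tuple[str, float]]]:
    """Estimate the k most frequent items at the outer and inner levels."""
    if not outer_bags:
        return {"outer": [], "inner": []}

    # Sampling step: draw a subset of outer bags without replacement.
    rng = random.Random(seed)
    n = min(len(outer_bags), max(1, round(ratio * len(outer_bags))))
    sample = rng.sample(list(outer_bags), n)
    actual_ratio = n / len(outer_bags)

    # Counting step: bags containing each distinct item, per level.
    counts = {"outer": Counter(), "inner": Counter()}
    for outer in sample:
        seen_in_outer = set()
        for inner in outer:
            distinct = set(inner)
            counts["inner"].update(distinct)
            seen_in_outer |= distinct
        counts["outer"].update(seen_in_outer)

    # Extrapolating, sorting, and retaining steps.
    result: Dict[str, List[Tuple[str, float]]] = {}
    for level, counter in counts.items():
        estimates = {item: c / actual_ratio for item, c in counter.items()}
        ranked = sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))
        result[level] = ranked[:k]
    return result
```

Dividing each sampled count by the realized sampling ratio is the simplest extrapolation; a deeper hierarchy or sampling at several levels would need one ratio per sampled level.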
-
6.
Publication No.: US20070271218A1
Publication date: 2007-11-22
Application No.: US11435353
Filing date: 2006-05-16
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
IPC classification: G06F17/30
CPC classification: G06F17/30442, G06F17/30306, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99944, Y10S707/99953
Abstract: A method, system, and computer readable medium for collecting statistics associated with data in a database are disclosed. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated. The performing includes at least determining a total number of instances of at least one path-identifier associated with a given value within a given set of documents.
-
7.
Publication No.: US20070271217A1
Publication date: 2007-11-22
Application No.: US11435017
Filing date: 2006-05-16
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
IPC classification: G06F17/30
CPC classification: G06F17/30536, G06F17/30442, G06F17/30935, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99944, Y10S707/99953
Abstract: Disclosed are a system, method, and computer readable medium for collecting statistics associated with data in a database. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated.
-
8.
Publication No.: US09117005B2
Publication date: 2015-08-25
Application No.: US12341309
Filing date: 2008-12-22
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
CPC classification: G06F17/30935, G06F17/30911
Abstract: A method, system, and computer readable medium for collecting statistics associated with data in a database are disclosed. The method, which the computer readable medium implements, comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated. The performing includes at least determining a total number of instances of at least one path-identifier associated with a given value within a given set of documents.
-
9.
Publication No.: US08229924B2
Publication date: 2012-07-24
Application No.: US12557804
Filing date: 2009-09-11
Applicant: Lipyeow Lim, George Andrei Mihaila, Min Wang
Inventor: Lipyeow Lim, George Andrei Mihaila, Min Wang
IPC classification: G06F17/30
CPC classification: G06F17/30536, G06F17/30442, G06F17/30935, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99944, Y10S707/99953
Abstract: Disclosed are a system, method, and computer readable medium for collecting statistics associated with data in a database. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated.