Method of controlling the degree of parallelism when performing parallel processing on an inherently serial computer program
    1.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06223281B1

    Publication date: 2001-04-24

    Application No.: US08893077

    Filing date: 1997-07-15

    IPC classification: G06F11/00

    CPC classification: G06F8/45

    Abstract: An inherently serial program is processed in parallel, thus leading to higher processing speeds, while maintaining a close approximation to the specific result obtained through a serial running of the program. This goal has been attained based on the fact that the desired degree of closeness between a parallel result and the serial result depends on the particular inherently serial program being run and the type of analysis being performed. That is, some inherently serial processes require a “fine-tuned” result while for others a “coarser” result is acceptable. The frequency at which the parallel branches consolidate their respective results is changed accordingly to alter the degree of closeness between the parallel processed result and the serially processed result.

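The trade-off described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented method: the serial process is a toy running-average update, the branches are simulated in a single thread, and consolidation by averaging the branch states is an assumption. The names `serial_run`, `parallel_run`, `branches` and `sync_every` are hypothetical.

```python
def serial_run(data, state=0.0, rate=0.1):
    """Inherently serial update: each step depends on the previous state."""
    for x in data:
        state += rate * (x - state)
    return state


def parallel_run(data, branches, sync_every, state=0.0, rate=0.1):
    """Simulate `branches` workers that consolidate after `sync_every` records each.

    Consolidation here is a plain average of the branch states (an assumption).
    """
    chunk = branches * sync_every
    for start in range(0, len(data), chunk):
        block = data[start:start + chunk]
        results = []
        for b in range(branches):
            part = block[b * sync_every:(b + 1) * sync_every]
            if part:
                # Every branch starts from the last consolidated state.
                results.append(serial_run(part, state, rate))
        state = sum(results) / len(results)
    return state
```

With `branches=1` the simulation reduces to the serial run. `sync_every` is the knob that corresponds to the consolidation frequency in the abstract: a caller who needs a "fine-tuned" result would consolidate often, while a "coarser" analysis could consolidate rarely.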

    Method and system for data mining in high dimensional data spaces
    2.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07567972B2

    Publication date: 2009-07-28

    Application No.: US10787660

    Filing date: 2004-02-26

    IPC classification: G06F7/00 G06F17/00

    Abstract: A computerized method and system for analyzing a multitude of items in a high dimensional (n-dimensional) data space Dn each described by n item features. The method uses a mining function f with at least one control parameter Pi controlling the target of the data mining function. The method selects a transformation function T for reducing dimensions of the n-dimensional space by space-filling curves mapping said n-dimensional space to an m-dimensional space (m


    INPUT DATA STRUCTURE FOR DATA MINING
    3.
    Type: Invention patent application
    Status: In force

    Publication No.: US20070220030A1

    Publication date: 2007-09-20

    Application No.: US11671623

    Filing date: 2007-02-06

    IPC classification: G06F7/00

    CPC classification: G06F17/30539 G06F2216/03

    Abstract: Methods and apparatus, including computer program products, implementing and using techniques for compressing data included in several transactions. Each transaction has at least one item. A unique identifier is assigned to each different item and, if taxonomy is defined, to each different taxonomy parent. Sets of transactions are formed from the several transactions. The sets of transactions are stored using a computer data structure including: a list of identifiers of different items in the set of transactions, information indicating number of identifiers in the list, and bit field information indicating presence of the different items in the set of transactions, said bit field information being organized in accordance with the list for facilitating evaluation of patterns with respect to the set of transactions. A data structure for compressing data included in a set of transactions is also provided.

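One possible reading of the data structure in this abstract is sketched below. The layout is an assumption: one integer bit field per listed item, with bit t set when transaction t contains that item. Taxonomy parents are left out, and the names `TransactionSet`, `compress` and `support` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransactionSet:
    item_ids: List[int]    # identifiers of the distinct items, sorted
    n_items: int           # number of identifiers in the list
    n_transactions: int
    bitfields: List[int]   # bitfields[k]: bit t is set if transaction t holds item_ids[k]


def compress(transactions):
    """Build the list of item identifiers and one bit field per identifier."""
    item_ids = sorted({item for tx in transactions for item in tx})
    index = {item: k for k, item in enumerate(item_ids)}
    bitfields = [0] * len(item_ids)
    for t, tx in enumerate(transactions):
        for item in tx:
            bitfields[index[item]] |= 1 << t
    return TransactionSet(item_ids, len(item_ids), len(transactions), bitfields)


def support(ts, pattern):
    """Count the transactions that contain every item of `pattern`."""
    index = {item: k for k, item in enumerate(ts.item_ids)}
    mask = (1 << ts.n_transactions) - 1
    for item in pattern:
        if item not in index:
            return 0
        mask &= ts.bitfields[index[item]]
    return bin(mask).count("1")
```

Because the bit fields follow the order of the identifier list, evaluating a pattern reduces to a few bitwise AND operations, which is the kind of pattern evaluation the abstract says the layout is meant to facilitate.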

    Probabilistic data mining model comparison
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08990145B2

    Publication date: 2015-03-24

    Application No.: US13214105

    Filing date: 2011-08-19

    IPC classification: G06F17/30 G06F17/18 G06K9/62

    CPC classification: G06F17/18 G06K9/62

    Abstract: A first data mining model and a second data mining model are compared. A first data mining model M1 represents results of a first data mining task on a first data set D1 and provides a set of first prediction values. A second data mining model M2 represents results of a second data mining task on a second data set D2 and provides a set of second prediction values. A relation R is determined between said sets of prediction values. For at least a first record of an input data set, a first and second probability distribution is created based on the first and second data mining models applied to the first record. A distance measure d is calculated for said first record using the first and second probability distributions and the relation. At least one region of interest is determined based on said distance measure d.

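The comparison in this abstract can be sketched as follows. The abstract does not say which distance is used, so the total variation distance below is an assumption, as is the representation of the relation R as a dictionary from M1 prediction values to M2 prediction values. The names `distance` and `regions_of_interest` are hypothetical, and the two models are passed in as plain functions that return a label-to-probability dictionary.

```python
def distance(p1, p2, relation):
    """Total variation distance between p1 (mapped through R) and p2."""
    mapped = {}
    for label, prob in p1.items():
        target = relation[label]  # R: M1 prediction value -> M2 prediction value
        mapped[target] = mapped.get(target, 0.0) + prob
    labels = set(mapped) | set(p2)
    return 0.5 * sum(abs(mapped.get(l, 0.0) - p2.get(l, 0.0)) for l in labels)


def regions_of_interest(records, model1, model2, relation, threshold):
    """Return the records on which the two models disagree by more than `threshold`."""
    return [
        r for r in records
        if distance(model1(r), model2(r), relation) > threshold
    ]
```

In this sketch a "region of interest" is simply the list of records whose distance exceeds the threshold; grouping such records into contiguous regions of the input space would be a further step.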

    Predictive modeling
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08738549B2

    Publication date: 2014-05-27

    Application No.: US13214097

    Filing date: 2011-08-19

    IPC classification: G06N5/00

    Abstract: A predictive analysis generates a predictive model (Padj(Y|X)) based on two separate pieces of information, a set of original training data (Dorig), and a “true” distribution of indicators (Ptrue(X)). The predictive analysis begins by generating a base model distribution (Pgen(Y|X)) from the original training data set (Dorig) containing tuples (x,y) of indicators (x) and corresponding labels (y). Using the “true” distribution (Ptrue(X)) of indicators, a random data set (D′) of indicator records (x) is generated reflecting this “true” distribution (Ptrue(X)). Subsequently, the base model (Pgen(Y|X)) is applied to said random data set (D′), thus assigning a label (y) or a distribution of labels to each indicator record (x) in said random data set (D′) and generating an adjusted training set (Dadj). Finally, an adjusted predictive model (Padj(Y|X)) is trained based on said adjusted training set (Dadj).

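The four steps in this abstract (train a base model, sample indicators from the "true" distribution, label them with the base model, retrain) can be sketched with a deliberately simple model. Everything here is an assumption made for illustration: indicators are single categorical values, the model is a per-indicator label frequency table, each sampled record gets the single most likely label, and every indicator in `p_true` is assumed to occur in `d_orig`. The function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(data):
    """Fit P(Y|X) as a per-indicator label frequency table (toy model)."""
    counts = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1
    return {
        x: {y: n / sum(c.values()) for y, n in c.items()}
        for x, c in counts.items()
    }


def predict(model, x):
    """Return the most likely label for indicator x."""
    return max(model[x], key=model[x].get)


def adjusted_model(d_orig, p_true, n_samples, seed=0):
    rng = random.Random(seed)
    p_gen = train(d_orig)                               # base model from D_orig
    xs = list(p_true)
    d_prime = rng.choices(xs, weights=[p_true[x] for x in xs], k=n_samples)
    d_adj = [(x, predict(p_gen, x)) for x in d_prime]   # label D' with the base model
    return train(d_adj), d_adj                          # adjusted model and D_adj
```

With a frequency table the adjusted model carries little new information; the step matters for model families whose fit depends on how often each indicator value appears in the training set.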

    Method, system and program product for determining a time for retraining a data mining model
    6.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07937350B2

    Publication date: 2011-05-03

    Application No.: US11935694

    Filing date: 2007-11-06

    IPC classification: G06F17/18

    CPC classification: G06N99/005 G06F2216/03

    Abstract: The invention relates to a method for determining a time for retraining a data mining model, including the steps of: calculating multivariate statistics of a training model during a training phase; storing the multivariate statistics in the data mining model; evaluating reliability of the data mining model based on the multivariate statistics and at least one distribution parameter, and deciding to retrain the data mining model based on an arbitrary measure of one or more statistical parameters including an F-test statistical analysis.

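A rough sketch of the retraining decision follows. The abstract only states that stored statistics and an F-test are involved, so the details are assumptions: the stored statistics are per-column mean and sample variance, the test statistic is the ratio of the larger to the smaller variance, and the critical value `f_critical` is supplied by the caller (for example from an F-distribution table). The names are hypothetical, and each column needs at least two values.

```python
from statistics import mean, variance


def training_statistics(rows):
    """Per-column statistics that would be stored with the model."""
    return [
        {"mean": mean(col), "var": variance(col), "n": len(col)}
        for col in zip(*rows)
    ]


def needs_retraining(stored, new_rows, f_critical):
    """Flag retraining when any column's variance ratio exceeds f_critical."""
    for stats, column in zip(stored, zip(*new_rows)):
        new_var = variance(column)
        larger = max(new_var, stats["var"])
        smaller = min(new_var, stats["var"])
        if smaller == 0:
            if larger > 0:
                return True   # a constant column started to vary, or the reverse
            continue
        if larger / smaller > f_critical:
            return True
    return False
```

Storing the statistics with the model means the check can run on new data without access to the original training set.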

    Data mining by determining patterns in input data
    7.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07882128B2

    Publication date: 2011-02-01

    Application No.: US11671600

    Filing date: 2007-02-06

    IPC classification: G06F7/00

    CPC classification: G06F17/30539 G06F2216/03

    Abstract: Methods and apparatus, including computer program products, implementing and using techniques for pattern detection in input data containing several transactions, each transaction having at least one item. Filter conditions for interesting patterns are received, and a first set of filter conditions applicable in connection with generation of candidate patterns is determined. An evaluated candidate pattern is selected as a parent candidate pattern, and evaluation information about the parent candidate pattern is maintained. Child candidate patterns are generated by extending the parent candidate pattern and taking into account the first set of filter conditions. The child candidate patterns are evaluated with respect to the input data together in sets of similar candidate patterns and based on the evaluation information about the parent candidate pattern. At least one child candidate pattern successfully passing the evaluation step is recursively used as a parent candidate pattern.

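The parent/child scheme in this abstract resembles depth-first itemset growth, which can be sketched as below. This is a simplified illustration rather than the claimed method: the only generation-time filter is a maximum pattern length, the only evaluation criterion is minimum support, the parent's evaluation information is its set of matching transaction indices, and the batched evaluation of similar candidates is not modelled. The names `mine`, `min_support` and `max_len` are hypothetical.

```python
def mine(transactions, min_support, max_len):
    """Grow patterns depth-first, reusing each parent's matching transactions."""
    items = sorted({item for tx in transactions for item in tx})
    tidsets = {
        item: {t for t, tx in enumerate(transactions) if item in tx}
        for item in items
    }
    found = {}

    def extend(parent, parent_tids):
        if len(parent) >= max_len:          # filter applied while generating children
            return
        start = items.index(parent[-1]) + 1 if parent else 0
        for item in items[start:]:
            # Evaluate the child against the parent's matches only.
            child_tids = parent_tids & tidsets[item]
            if len(child_tids) >= min_support:
                child = parent + (item,)
                found[child] = len(child_tids)
                extend(child, child_tids)   # the child becomes the next parent

    extend((), set(range(len(transactions))))
    return found
```

Applying the length filter before a child is generated avoids evaluating candidates that could never be reported, which is the benefit of the "first set of filter conditions" in the abstract.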

    System and method of transforming data for use in data analysis tools
    8.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08655918B2

    Publication date: 2014-02-18

    Application No.: US11924840

    Filing date: 2007-10-26

    IPC classification: G06F7/00

    CPC classification: G06Q10/087 G06Q30/02

    Abstract: A process of transforming data residing in databases, such as relational databases, into forms suitable as input to data analysis tools, such as predictive modeling tools, includes the steps of defining a business process problem to be solved and identifying data requirements. For example, the business process problem may relate to predicting a customer's propensity to make purchases in the future or a store's requirements for inventory in the future. In the process, a computer implemented method is used for automatically transforming data for data analysis such as predictive modeling. Database metadata that describe database tables, their interrelationships, dimensional information, fact tables and measures are accessed. A mining transformation profile is created to encapsulate aggregations and transformation on data stored in relational databases in order to convert the data to forms suitable for predictive mining tools. The mining transformation profile specifies data transformations relative to the database metadata. Executable data transformation code is then generated from the database metadata and the mining transformation profile. Execution of this code results in aggregation and transformation of data residing in a database for input to a data analysis tool such as a predictive modeling tool. The data transformation code can be used by, for example, the predictive modeling tool to generate an output that provides a solution to a business process problem.


    Input data structure for data mining
    9.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08250105B2

    Publication date: 2012-08-21

    Application No.: US11671623

    Filing date: 2007-02-06

    IPC classification: G06F7/00

    CPC classification: G06F17/30539 G06F2216/03

    Abstract: Methods and apparatus, including computer program products, implementing and using techniques for compressing data included in several transactions. Each transaction has at least one item. A unique identifier is assigned to each different item and, if taxonomy is defined, to each different taxonomy parent. Sets of transactions are formed from the several transactions. The sets of transactions are stored using a computer data structure including: a list of identifiers of different items in the set of transactions, information indicating number of identifiers in the list, and bit field information indicating presence of the different items in the set of transactions, said bit field information being organized in accordance with the list for facilitating evaluation of patterns with respect to the set of transactions. A data structure for compressing data included in a set of transactions is also provided.


    Modeling user access to computer resources
    10.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08214364B2

    Publication date: 2012-07-03

    Application No.: US12124274

    Filing date: 2008-05-21

    IPC classification: G06F17/30

    CPC classification: G06F21/552 G06F21/316

    Abstract: Embodiments of the invention provide a method for detecting changes in behavior of authorized users of computer resources and reporting the detected changes to the relevant individuals. The method includes evaluating actions performed by each user against user behavioral models and business rules. As a result of the analysis, a subset of users may be identified and reported as having unusual or suspicious behavior. In response, the management may provide feedback indicating that the user behavior is due to the normal expected business needs or that the behavior warrants further review. The management feedback is available for use by machine learning algorithms to improve the analysis of user actions over time. Consequently, investigation of user actions regarding computer resources is facilitated and data loss is prevented more efficiently relative to the prior art approaches with only minimal disruption to the ongoing business processes.
