-
Publication No.: US20060010110A1
Publication Date: 2006-01-12
Application No.: US11049031
Filing Date: 2005-02-02
Applicants: Pyungchul Kim, ZhaoHui Tang, Ioan Crivat, C. MacLennan, Raman Iyer, Irina Gorbach
Inventors: Pyungchul Kim, ZhaoHui Tang, Ioan Crivat, C. MacLennan, Raman Iyer, Irina Gorbach
IPC Classification: G06F17/30
CPC Classification: G06F17/30539, G06F17/30595, G06Q40/00, Y10S707/99933
Abstract: A system that facilitates data mining comprises a reception component that receives command(s) in a declarative language that relate to utilizing an output of a first data mining model as an input to a second data mining model. An implementation component analyzes the received command(s) and implements the command(s) with respect to the first and second data mining models. In another aspect of the subject invention, the reception component can receive further command(s) in a declarative language with respect to causing one or more of the first and second data mining models to output a prediction, the prediction desirably generated without prediction input, the implementation component causes the one or more of the first and second data mining models to output the prediction.
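As an illustration only (not the patent's implementation or any product API), the model-chaining idea can be sketched with a hypothetical toy class `MajorityModel`: the prediction of a first model is fed as the input of a second model, and calling `predict()` with no argument returns the overall most likely value, standing in for a prediction generated without prediction input. The model names and training data are assumptions.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


class MajorityModel:
    """Hypothetical toy model: predicts the most frequent target seen for an input key."""

    def __init__(self) -> None:
        self.by_key: Dict[str, Counter] = {}
        self.overall: Counter = Counter()

    def train(self, keys: Sequence[str], targets: Sequence[str]) -> "MajorityModel":
        for key, target in zip(keys, targets):
            self.by_key.setdefault(key, Counter())[target] += 1
            self.overall[target] += 1
        return self

    def predict(self, key: Optional[str] = None) -> str:
        # With no input (or an unseen key), fall back to the overall most likely value,
        # i.e. a prediction produced without any prediction input.
        counts = self.by_key.get(key) if key is not None else None
        return (counts or self.overall).most_common(1)[0][0]


# First model maps a region to a customer segment; second maps a segment to an offer.
segment_model = MajorityModel().train(["north", "north", "south"], ["A", "A", "B"])
offer_model = MajorityModel().train(["A", "B", "B"], ["gold", "silver", "silver"])


def chained_predict(region: Optional[str] = None) -> str:
    """Use the first model's output as the second model's input."""
    return offer_model.predict(segment_model.predict(region))


print(chained_predict("north"))  # gold
print(offer_model.predict())     # silver (no prediction input)
```

In the patent the chaining is expressed through declarative-language commands; the plain function call here only mirrors the data flow.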
-
Publication No.: US20050021482A1
Publication Date: 2005-01-27
Application No.: US10611119
Filing Date: 2003-06-30
Applicants: Pyungchul Kim, C. MacLennan, Zhaohui Tang, Raman Iyer
Inventors: Pyungchul Kim, C. MacLennan, Zhaohui Tang, Raman Iyer
CPC Classification: G06F17/30395, G06F17/30398, G06F17/30539
Abstract: A drill-through feature is provided which provides a universal drill-through to mining model source data from a trained mining model. In order for a user or application to obtain model content information on a given node of a model, a universal function is provided whereby the user specifies the node for a model and data set, and the cases underlying that node for that model and data set are returned. A sampling of underlying cases may be provided, where only a sampling of the cases represented in the node is requested.
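A minimal sketch of drill-through, assuming (hypothetically) that each model node remembers the ids of the training cases that reached it. `Node`, `find_node`, and `drill_through` are illustrative names, not from the patent; the optional `sample_size` mirrors the "sampling of underlying cases" described above.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical model node that remembers which training cases reached it."""
    node_id: str
    case_ids: List[int]
    children: List["Node"] = field(default_factory=list)


def find_node(root: Node, node_id: str) -> Node:
    """Depth-first search for a node by id; raises KeyError if absent."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id == node_id:
            return node
        stack.extend(node.children)
    raise KeyError(node_id)


def drill_through(root: Node, source: List[Dict[str, object]], node_id: str,
                  sample_size: Optional[int] = None, seed: int = 0) -> List[Dict[str, object]]:
    """Return the source rows behind node_id, optionally only a random sample of them."""
    ids = find_node(root, node_id).case_ids
    if sample_size is not None and sample_size < len(ids):
        ids = random.Random(seed).sample(ids, sample_size)
    return [source[i] for i in ids]


source = [{"age": 25, "buys": "yes"}, {"age": 28, "buys": "no"},
          {"age": 41, "buys": "yes"}, {"age": 55, "buys": "yes"}]
root = Node("all", [0, 1, 2, 3], [Node("age<30", [0, 1]), Node("age>=30", [2, 3])])

print(drill_through(root, source, "age<30"))
print(drill_through(root, source, "all", sample_size=2))
```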
-
Publication No.: US20060020620A1
Publication Date: 2006-01-26
Application No.: US11157602
Filing Date: 2005-06-21
Applicants: Raman Iyer, Ioan Crivat, C. MacLennan, Scott Oveson, Rong Guan, ZhaoHui Tang, Pyungchul Kim, Irina Gorbach
Inventors: Raman Iyer, Ioan Crivat, C. MacLennan, Scott Oveson, Rong Guan, ZhaoHui Tang, Pyungchul Kim, Irina Gorbach
IPC Classification: G06F17/00
CPC Classification: G06F17/30539, G06F2216/03
Abstract: The subject disclosure pertains to extensible data mining systems, means, and methodologies. For example, a data mining system is disclosed that supports plug-in or integration of non-native mining algorithms, perhaps provided by third parties, such that they function the same as built-in algorithms. Furthermore, non-native data mining viewers may also be seamlessly integrated into the system for displaying the results of one or more algorithms including those provided by third parties as well as those built-in. Still further yet, support is provided for extending data mining languages to include user-defined functions (UDFs).
-
Publication No.: US20050021489A1
Publication Date: 2005-01-27
Application No.: US10624278
Filing Date: 2003-07-22
Applicants: C. MacLennan, Zhaohui Tang, Pyungchul Kim, Raman Iyer
Inventors: C. MacLennan, Zhaohui Tang, Pyungchul Kim, Raman Iyer
IPC Classification: G06F7/00
CPC Classification: G06Q30/02, G06F16/2465, G06F2216/03
Abstract: A mining structure is created which contains processed data from a data set. This data may be used to train one or more models. In addition to the selection of data to be used by model from data set, processing parameters are set, in one embodiment. For example, the discretization of a continuous variable into buckets, the number of buckets, and/or the sub-range corresponding to each bucket is set when the mining structure is created. The mining structure is processed, which causes the processing and storage of data from data set in the mining structure. After processing, the mining structure can be used by one or more models.
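The discretization step can be sketched as follows, assuming equal-width buckets (the abstract leaves the bucketing method open) and a hypothetical `MiningStructure` class: the bucket count is fixed when the structure is created, processing computes the sub-range boundaries and stores each case's bucket, and downstream consumers reuse the stored buckets.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MiningStructure:
    """Hypothetical structure: discretizes one continuous column when processed."""
    bucket_count: int
    edges: List[float] = field(default_factory=list)  # interior bucket boundaries
    cases: List[int] = field(default_factory=list)    # stored bucket index per case

    def process(self, values: Sequence[float]) -> "MiningStructure":
        if self.bucket_count < 1 or not values:
            raise ValueError("need at least one bucket and one value")
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.bucket_count
        # Equal-width buckets: bucket i covers [lo + i*width, lo + (i+1)*width).
        self.edges = [lo + width * i for i in range(1, self.bucket_count)]
        self.cases = [self.bucket_of(v) for v in values]
        return self

    def bucket_of(self, value: float) -> int:
        # Values at or above the last edge (including the maximum) go to the last bucket.
        return bisect_right(self.edges, value)


structure = MiningStructure(bucket_count=4).process([0.0, 2.5, 5.0, 7.5, 10.0])
# A simple downstream consumer reads the stored buckets instead of the raw source.
histogram = Counter(structure.cases)
most_common_bucket = histogram.most_common(1)[0][0]
print(structure.edges, structure.cases, most_common_bucket)
```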
-
Publication No.: US07188090B2
Publication Date: 2007-03-06
Application No.: US10611119
Filing Date: 2003-06-30
Applicants: Pyungchul Kim, C. James MacLennan, Zhaohui Tang, Raman Iyer
Inventors: Pyungchul Kim, C. James MacLennan, Zhaohui Tang, Raman Iyer
CPC Classification: G06F17/30395, G06F17/30398, G06F17/30539
Abstract: A drill-through feature is provided which provides a universal drill-through to mining model source data from a trained mining model. In order for a user or application to obtain model content information on a given node of a model, a universal function is provided whereby the user specifies the node for a model and data set, and the cases underlying that node for that model and data set are returned. A sampling of underlying cases may be provided, where only a sampling of the cases represented in the node is requested.
-
Publication No.: US20070214164A1
Publication Date: 2007-09-13
Application No.: US11373319
Filing Date: 2006-03-10
Applicants: C. MacLennan, Ioan Crivat, ZhaoHui Tang, Raman Iyer
Inventors: C. MacLennan, Ioan Crivat, ZhaoHui Tang, Raman Iyer
IPC Classification: G06F7/00
CPC Classification: G06F17/30943, G06F2216/03, Y10S707/99933
Abstract: A standard mechanism for directly accessing unstructured data types (e.g., image, audio, video, gene sequencing and text data) in accordance with data mining operations is provided. The subject innovation can enable access to unstructured data directly from within the data mining engine or tool. Accordingly, the innovation enables multiple vendors to provide algorithms for mining unstructured data on a data mining platform (e.g., an SQL-brand server), thereby increasing adoption. As well, the subject innovation allows users to directly mine unstructured data that is not fixed-length, without pre-processing and tokenizing the data external to the data mining engine. In accordance therewith, the innovation can provide a mechanism to expand declarative language content types to include an “unstructured” data type thereby enabling a user and/or application to affirmatively designate mining data as an unstructured type.
-
Publication No.: US20060026167A1
Publication Date: 2006-02-02
Application No.: US11069342
Filing Date: 2005-03-01
Applicants: Mosha Pasumansky, Marius Dumitru, Adrian Dumitrascu, Cristian Petculescu, Akshai Mirchandani, Paul Sanders, T.K. Anand, Richard Tkachuk, Raman Iyer, Thomas Conlon, Alexander Berger, Sergei Gringauze, Ioan Crivat, C. MacLennan, Rong Guan
Inventors: Mosha Pasumansky, Marius Dumitru, Adrian Dumitrascu, Cristian Petculescu, Akshai Mirchandani, Paul Sanders, T.K. Anand, Richard Tkachuk, Raman Iyer, Thomas Conlon, Alexander Berger, Sergei Gringauze, Ioan Crivat, C. MacLennan, Rong Guan
IPC Classification: G06F17/30
CPC Classification: G06F17/30893
Abstract: The subject invention relates to systems and methods that extend the network data access capabilities of mark-up language protocols. In one aspect, a network data transfer system is provided. The system includes a protocol component that employs a computerized mark-up language to facilitate data interactions between network components, whereby the data interactions were previously limited or based on a statement command associated with the markup language. An extension component operates with the protocol component to support the data transactions, where the extension component supplies at least one other command from the statement command to facilitate the data interactions.
-
Publication No.: US20070220034A1
Publication Date: 2007-09-20
Application No.: US11377024
Filing Date: 2006-03-16
Applicants: Raman Iyer, C. MacLennan, Ioan Crivat
Inventors: Raman Iyer, C. MacLennan, Ioan Crivat
IPC Classification: G06F7/00
CPC Classification: G06F16/2465
Abstract: A realtime training model update architecture for data mining models. The architecture facilitates automatic update processes with respect to evolving source/training data. Additionally, model update training can be performed at times other than in realtime. Scheduling can be invoked, for periodic and incremental updates, and refresh intervals applied through the training parameters for the mining structure and/or model. Training can also be triggered by user-defined events such as database notifications, and/or alerts from other operational systems. In support thereof, a data mining model component is provided for training a data mining model on a dataset in realtime, and an update component for incrementally training the data mining model according to predetermined criteria. Additionally, model versioning and version comparison can be employed to detect significant changes and retain updated models. Training data aging/weighting of training data can be applied.
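Two of the ideas above, incremental training with data aging and version comparison, can be sketched together. This assumes a hypothetical label-frequency model and exponential decay as the aging scheme; the patent does not specify either, and scheduling or event triggers are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable


class AgingFrequencyModel:
    """Hypothetical incrementally trained model of label frequencies with data aging."""

    def __init__(self, decay: float = 0.5) -> None:
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weights: Dict[str, float] = defaultdict(float)
        self.version = 0

    def update(self, labels: Iterable[str]) -> None:
        """Age existing evidence, add the new batch at full weight, bump the version."""
        for label in self.weights:
            self.weights[label] *= self.decay
        for label in labels:
            self.weights[label] += 1.0
        self.version += 1

    def distribution(self) -> Dict[str, float]:
        total = sum(self.weights.values())
        return {k: w / total for k, w in self.weights.items()} if total else {}


def significant_change(old: Dict[str, float], new: Dict[str, float],
                       threshold: float = 0.2) -> bool:
    """Compare two model versions: True if any label's probability moved past threshold."""
    labels = set(old) | set(new)
    return any(abs(old.get(k, 0.0) - new.get(k, 0.0)) > threshold for k in labels)


model = AgingFrequencyModel(decay=0.5)
model.update(["a", "a", "b", "b"])   # version 1: a=2, b=2
before = model.distribution()
model.update(["b", "b", "b", "b"])   # version 2: a=2*0.5=1, b=2*0.5+4=5
after = model.distribution()
print(model.version, after, significant_change(before, after))
```

Setting `decay=1.0` disables aging, so every batch counts equally; smaller values let recent data dominate faster.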
-
Publication No.: US20060010112A1
Publication Date: 2006-01-12
Application No.: US11069121
Filing Date: 2005-02-28
Applicants: Ioan Crivat, C. MacLennan, Raman Iyer, Marius Dumitru
Inventors: Ioan Crivat, C. MacLennan, Raman Iyer, Marius Dumitru
IPC Classification: G06F17/30
CPC Classification: G06F17/30539, G06F17/30421, G06F17/30595, Y10S707/99932, Y10S707/99934, Y10S707/99944
Abstract: Architecture that facilitates syntax processing for data mining statements. The system includes a syntax engine that receives as an input a query statement which, for example, is a data mining request. The statement can be generated from many different sources, e.g., a client application and/or a server application, and requests query processing of a data source (e.g., a relational database) to return a result set. The syntax engine includes a binding component that converts the query statement into an encapsulated statement in accordance with a predefined grammar. The encapsulated statement includes both data and data operations to be performed on the data of the data source, and which is understood by the data source. An execution component processes the encapsulated statement against the data source to return the desired result set.
-
Publication No.: US20070214135A1
Publication Date: 2007-09-13
Application No.: US11371477
Filing Date: 2006-03-09
Applicants: Ioan Crivat, Raman Iyer, C. MacLennan
Inventors: Ioan Crivat, Raman Iyer, C. MacLennan
IPC Classification: G06F17/30
CPC Classification: G06F17/30539
Abstract: A system that effectuates fetching a complete set of relational data into a mining services server and subsequently defining desired partitions upon the fetched data is provided. In accordance with the innovation, the data can be locally cached and partitioned therefrom. Accordingly, upon the same mining structure (e.g., cache) that has been partitioned, the novel innovation can build mining models for each partition. In other words, the innovation can employ the concept of mining structure as a data cache while manipulating only partitions of this cache in certain operations. The innovation can be employed in scenarios where a user wants to train a mining model using only data points that satisfy a particular Boolean condition, a user wants to split the training set into multiple partitions (e.g., training/testing) and/or a user wants to perform a data mining procedure known as “N-fold cross validation.”
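The three partitioning scenarios can be sketched over a single in-memory cache. This is an illustrative sketch, not the patent's server design: the data is fetched once into `cache`, partitions are defined on top of it by a Boolean predicate or by N folds (assigned round-robin here, an assumption), and a trivial hypothetical model (the training-set mean) is built per partition to run N-fold cross validation.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, float]


def filter_partition(cache: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Partition defined by a Boolean condition over the cached rows."""
    return [row for row in cache if predicate(row)]


def n_fold_partitions(cache: List[Row], n: int) -> Iterator[Tuple[List[Row], List[Row]]]:
    """Assign cached rows to n folds round-robin and yield (train, test) pairs."""
    if not 2 <= n <= len(cache):
        raise ValueError("n must be between 2 and the number of cached rows")
    folds = [cache[i::n] for i in range(n)]
    for k in range(n):
        train = [row for j, fold in enumerate(folds) if j != k for row in fold]
        yield train, folds[k]


def cross_validate(cache: List[Row], n: int, target: str = "y") -> List[float]:
    """Build one trivial model per partition (the training mean); return each fold's MAE."""
    errors = []
    for train, test in n_fold_partitions(cache, n):
        mean = sum(r[target] for r in train) / len(train)
        errors.append(sum(abs(r[target] - mean) for r in test) / len(test))
    return errors


cache = [{"x": float(i), "y": 2.0 * i} for i in range(6)]  # fetched once, reused below
print(len(filter_partition(cache, lambda r: r["x"] >= 3)))  # 3
print(cross_validate(cache, 3))  # [3.0, 3.0, 3.0]
```

None of the partitions copies the source data back from the database; each is just a view over the cached rows, which is the point the abstract makes about using the mining structure as a cache.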