Electronic mail data cleaning
    1.
    Patent application
    Electronic mail data cleaning (status: lapsed)

    Publication No.: US20070130263A1

    Publication date: 2007-06-07

    Application No.: US11293469

    Filing date: 2005-12-02

    IPC classification: G06F15/16

    CPC classification: G06Q10/107

    Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.
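
The cascade described in this abstract (non-text filtering first, then text normalization) can be illustrated with a minimal Python sketch. The specific rules used here (dropping header-like lines and quoted replies, collapsing whitespace) and all function names are assumptions chosen for illustration; the abstract does not say which items count as non-text or how normalization is performed.

```python
import re

# Hypothetical rule: lines that look like mail headers count as non-text.
HEADER_RE = re.compile(r"^(from|to|cc|subject|date):", re.IGNORECASE)


def filter_non_text(lines):
    """Stage 1 (assumed rules): drop header-like lines and quoted replies."""
    return [
        ln for ln in lines
        if not HEADER_RE.match(ln.strip()) and not ln.lstrip().startswith(">")
    ]


def normalize_text(lines):
    """Stage 2 (assumed rules): collapse whitespace runs and drop empty lines."""
    collapsed = (re.sub(r"\s+", " ", ln).strip() for ln in lines)
    return [ln for ln in collapsed if ln]


def clean_email(raw):
    """Run the two stages in sequence, as in a cascaded pipeline."""
    return "\n".join(normalize_text(filter_non_text(raw.splitlines())))


raw_message = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "\n"
    "Hello   there,\n"
    "> old quoted text\n"
    "  see  you   soon  \n"
)
cleaned = clean_email(raw_message)
```

Keeping the stages as separate functions mirrors the cascade: normalization only ever sees lines that survived filtering.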

    Text mining method
    2.
    Patent application
    Text mining method (status: under examination, published)

    Publication No.: US20050283357A1

    Publication date: 2005-12-22

    Application No.: US10970586

    Filing date: 2004-10-21

    IPC classification: G06F17/28 G06F17/30

    CPC classification: G06F16/313

    Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.
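
As a rough illustration of the flow in this abstract (data source, term-identifying transformation, destination database), the following Python sketch uses the standard `sqlite3` module as a stand-in destination. The tokenization rule, the `terms` table layout, and the function names are hypothetical; the abstract does not specify any of them.

```python
import re
import sqlite3
from collections import Counter


def extract_terms(text):
    """Hypothetical transformation: lowercase alphabetic tokens with counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def run_path(sources, transform, conn):
    """Connect each text source to the transformation and load the
    identified terms into the destination database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms (term TEXT PRIMARY KEY, freq INTEGER)"
    )
    totals = Counter()
    for text in sources:
        totals.update(transform(text))
    conn.executemany(
        "INSERT OR REPLACE INTO terms (term, freq) VALUES (?, ?)",
        list(totals.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the destination
run_path(["Data mining finds terms", "Mining data"], extract_terms, conn)
loaded = dict(conn.execute("SELECT term, freq FROM terms"))
```

Passing the transformation as a parameter keeps the path generic, so a different term extractor could be swapped in without touching the loading code.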

    Electronic mail data cleaning
    3.
    Granted patent
    Electronic mail data cleaning (status: lapsed)

    Publication No.: US07590608B2

    Publication date: 2009-09-15

    Application No.: US11293469

    Filing date: 2005-12-02

    IPC classification: G06N5/00 G06F17/00

    CPC classification: G06Q10/107

    Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.

    Unstructured data in a mining model language
    4.
    Patent application
    Unstructured data in a mining model language (status: lapsed)

    Publication No.: US20070214164A1

    Publication date: 2007-09-13

    Application No.: US11373319

    Filing date: 2006-03-10

    IPC classification: G06F7/00

    Abstract: A standard mechanism for directly accessing unstructured data types (e.g., image, audio, video, gene sequencing and text data) in accordance with data mining operations is provided. The subject innovation can enable access to unstructured data directly from within the data mining engine or tool. Accordingly, the innovation enables multiple vendors to provide algorithms for mining unstructured data on a data mining platform (e.g., an SQL-brand server), thereby increasing adoption. In addition, the subject innovation allows users to directly mine unstructured data that is not fixed-length, without pre-processing and tokenizing the data external to the data mining engine. In accordance therewith, the innovation can provide a mechanism to expand declarative language content types to include an “unstructured” data type, thereby enabling a user and/or application to affirmatively designate mining data as an unstructured type.

    Extensible data mining framework
    5.
    Patent application
    Extensible data mining framework (status: in force)

    Publication No.: US20060020620A1

    Publication date: 2006-01-26

    Application No.: US11157602

    Filing date: 2005-06-21

    IPC classification: G06F17/00

    CPC classification: G06F17/30539 G06F2216/03

    Abstract: The subject disclosure pertains to extensible data mining systems, means, and methodologies. For example, a data mining system is disclosed that supports plug-in or integration of non-native mining algorithms, perhaps provided by third parties, such that they function the same as built-in algorithms. Furthermore, non-native data mining viewers may also be seamlessly integrated into the system for displaying the results of one or more algorithms, including those provided by third parties as well as those built in. In addition, support is provided for extending data mining languages to include user-defined functions (UDFs).

    Web service platform for keyword technologies
    6.
    Patent application
    Web service platform for keyword technologies (status: in force)

    Publication No.: US20080256561A1

    Publication date: 2008-10-16

    Application No.: US11787371

    Filing date: 2007-04-16

    IPC classification: G06F9/44

    CPC classification: G06F9/4492

    Abstract: The present web service platform includes a set of application program interfaces (APIs) and a framework for adding services that correspond to the APIs. The web service platform may also support a stored procedure (sproc) that allows combining results from two or more services before transmitting results to an application. The services relate to keyword technologies.

    Systems and methods that facilitate data mining
    7.
    Granted patent
    Systems and methods that facilitate data mining (status: lapsed)

    Publication No.: US07398268B2

    Publication date: 2008-07-08

    Application No.: US11049031

    Filing date: 2005-02-02

    IPC classification: G06F17/30

    Abstract: A system that facilitates data mining comprises a reception component that receives command(s) in a declarative language that relate to utilizing an output of a first data mining model as an input to a second data mining model. An implementation component analyzes the received command(s) and implements the command(s) with respect to the first and second data mining models. In another aspect of the subject invention, the reception component can receive further command(s) in a declarative language with respect to causing one or more of the first and second data mining models to output a prediction, the prediction desirably generated without prediction input; the implementation component then causes the one or more of the first and second data mining models to output the prediction.
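
The core idea in this abstract, feeding the output of a first model into a second model, can be sketched in Python as plain function composition. This is only an illustration of the chaining; the patent expresses it through declarative-language commands, which are not reproduced here, and both toy models and the `chain` helper are hypothetical.

```python
def segment_model(age: int) -> str:
    """Hypothetical first model: assigns a coarse customer segment."""
    return "under_30" if age < 30 else "30_plus"


def purchase_model(segment: str) -> str:
    """Hypothetical second model: predicts a product category from a segment."""
    return {"under_30": "games", "30_plus": "books"}[segment]


def chain(first, second):
    """Build a predictor whose input goes to `first` and whose result
    is passed on as the input of `second`."""
    def chained(x):
        return second(first(x))
    return chained


predict = chain(segment_model, purchase_model)
young_prediction = predict(25)
older_prediction = predict(40)
```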

    Modeling sequence and time series data in predictive analytics
    8.
    Patent application
    Modeling sequence and time series data in predictive analytics (status: in force)

    Publication No.: US20060010142A1

    Publication date: 2006-01-12

    Application No.: US11116832

    Filing date: 2005-04-28

    IPC classification: G06F7/00

    CPC classification: G06F17/30539 G06F17/30548

    Abstract: The subject invention relates to systems and methods to extend the capabilities of declarative data modeling languages. In one aspect, a declarative data modeling language system is provided. The system includes a data modeling language component that generates one or more data mining models to extract predictive information from local or remote databases. A language extension component facilitates modeling capability in the data modeling language by providing a data sequence model or a time series model within the data modeling language to support various data mining applications.

    Modeling sequence and time series data in predictive analytics
    9.
    Granted patent
    Modeling sequence and time series data in predictive analytics (status: in force)

    Publication No.: US07747641B2

    Publication date: 2010-06-29

    Application No.: US11116832

    Filing date: 2005-04-28

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30539 G06F17/30548

    Abstract: The subject invention relates to systems and methods to extend the capabilities of declarative data modeling languages. In one aspect, a declarative data modeling language system is provided. The system includes a data modeling language component that generates one or more data mining models to extract predictive information from local or remote databases. A language extension component facilitates modeling capability in the data modeling language by providing a data sequence model or a time series model within the data modeling language to support various data mining applications.

    Unstructured data in a mining model language
    10.
    Granted patent
    Unstructured data in a mining model language (status: lapsed)

    Publication No.: US07593927B2

    Publication date: 2009-09-22

    Application No.: US11373319

    Filing date: 2006-03-10

    IPC classification: G06F7/00 G06F17/30

    Abstract: A standard mechanism for directly accessing unstructured data types (e.g., image, audio, video, gene sequencing and text data) in accordance with data mining operations is provided. The subject innovation can enable access to unstructured data directly from within the data mining engine or tool. Accordingly, the innovation enables multiple vendors to provide algorithms for mining unstructured data on a data mining platform (e.g., an SQL-brand server), thereby increasing adoption. In addition, the subject innovation allows users to directly mine unstructured data that is not fixed-length, without pre-processing and tokenizing the data external to the data mining engine. In accordance therewith, the innovation can provide a mechanism to expand declarative language content types to include an “unstructured” data type, thereby enabling a user and/or application to affirmatively designate mining data as an unstructured type.