Rapid Automatic Keyword Extraction for Information Retrieval and Analysis
    1.
    Patent application
    Legal status: in force

    Publication No.: US20110060747A1

    Publication date: 2011-03-10

    Application No.: US12555916

    Filing date: 2009-09-09

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.

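The steps named in this abstract can be illustrated with a minimal Python sketch. Several choices here are assumptions for illustration only, since the abstract leaves them open: the tiny `STOP_WORDS` list, the word score taken as degree divided by frequency, the keyword score taken as the sum of its word scores, and the `top_fraction` cutoff. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; a real list would be far larger.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "of", "on", "or", "the", "to", "with"}


def candidate_keywords(text, stop_words=STOP_WORDS):
    """Split text at phrase delimiters and stop words into candidate keywords."""
    candidates = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in stop_words:
                if current:
                    candidates.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(tuple(current))
    return candidates


def word_scores(candidates):
    """Score each word as co-occurrence degree / frequency (assumed function)."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for cand in candidates:
        for word in cand:
            freq[word] += 1
            degree[word] += len(cand)  # co-occurrences, counting the word itself
    return {w: degree[w] / freq[w] for w in freq}


def extract_keywords(text, top_fraction=1 / 3):
    """Return the highest-scoring candidates as (keyword, score) pairs."""
    candidates = candidate_keywords(text)
    scores = word_scores(candidates)
    # Assumed keyword score: sum of the scores of the member words.
    keyword_scores = {c: sum(scores[w] for w in c) for c in set(candidates)}
    ranked = sorted(keyword_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max(1, round(len(ranked) * top_fraction))
    return [(" ".join(c), s) for c, s in ranked[:keep]]
```

Calling `extract_keywords` on a single document returns the top share of candidates by score; longer candidates tend to rank higher under this particular scoring choice because each member word inherits the length of the phrase it occurs in.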

    System and method for use in text analysis of documents and records
    3.
    Granted patent
    Legal status: in force

    Publication No.: US06665661B1

    Publication date: 2003-12-16

    Application No.: US09672599

    Filing date: 2000-09-29

    IPC classification: G06F17/30

    Abstract: Methods and systems are provided that enable text in various sections of data records to be separately catalogued, indexed, or vectorized for analysis in a text visualization and mining system. A text processing system receives a plurality of data records, where each data record has one or a plurality of attribute fields associated with the records. The attribute fields containing textual information are identified. The specific textual content of each attribute field is identified. An index is generated that associates the textual content contained in each attribute field with the attribute field containing the textual content. The index is operable for use in text processing. The plurality of data records may be located in a data table and the textual information may be contained within cells of the data table. In another aspect, a plurality of data records is received, where at least some of the data records contain text terms. A first method is applied to weight text terms of the data records in a first manner to aid in distinguishing records from each other in response to selection of the first method. A second method is applied to weight text terms of the data records in a second manner to aid in distinguishing records from each other in response to selection of the second method. A vector is generated to distinguish each of the data records based on the text terms weighted by either the first or second method.

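As a rough illustration of the two aspects in this abstract, the Python sketch below builds an index keyed by attribute field and term, then vectorizes one text field with a selectable weighting method. The abstract does not name the two weighting methods; raw term frequency (`"tf"`) and TF-IDF (`"tfidf"`) are stand-ins assumed here, and all function names and the whitespace tokenizer are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Naive whitespace tokenizer (assumption; any tokenizer would do)."""
    return text.lower().split()


def build_field_index(records):
    """Map (field, term) -> set of record ids, for fields holding text."""
    index = defaultdict(set)
    for rec_id, record in enumerate(records):
        for field, value in record.items():
            if isinstance(value, str):  # treat string-valued fields as textual
                for term in tokenize(value):
                    index[(field, term)].add(rec_id)
    return index


def record_vectors(records, field, method="tf"):
    """Vectorize one text field using the selected weighting method."""
    counts = []
    for record in records:
        value = record.get(field)
        counts.append(Counter(tokenize(value)) if isinstance(value, str) else Counter())
    vocab = sorted({term for c in counts for term in c})
    n = len(records)
    doc_freq = {t: sum(1 for c in counts if t in c) for t in vocab}
    vectors = []
    for c in counts:
        if method == "tf":        # first method (assumed): raw term frequency
            vectors.append([float(c[t]) for t in vocab])
        elif method == "tfidf":   # second method (assumed): TF-IDF
            vectors.append([c[t] * math.log(n / doc_freq[t]) for t in vocab])
        else:
            raise ValueError(f"unknown weighting method: {method}")
    return vocab, vectors
```

Keeping the field name in the index key is what lets the same term be searched or weighted separately per attribute field, for example a word in a title versus the same word in a notes column.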

    Rapid automatic keyword extraction for information retrieval and analysis
    4.
    Granted patent
    Legal status: in force

    Publication No.: US08131735B2

    Publication date: 2012-03-06

    Application No.: US12555916

    Filing date: 2009-09-09

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.


    Data visualization methods, data visualization devices, data visualization apparatuses, and articles of manufacture
    6.
    Granted patent
    Legal status: in force

    Publication No.: US09069847B2

    Publication date: 2015-06-30

    Application No.: US11256225

    Filing date: 2005-10-21

    IPC classification: G06F17/00 G06F17/30

    Abstract: Data visualization methods, data visualization devices, data visualization apparatuses, and articles of manufacture are described according to some aspects. In one aspect, a data visualization method includes accessing a plurality of initial documents at a first moment in time, first processing the initial documents providing processed initial documents, first identifying a plurality of first associations of the initial documents using the processed initial documents, generating a first visualization depicting the first associations, accessing a plurality of additional documents at a second moment in time after the first moment in time, second processing the additional documents providing processed additional documents, second identifying a plurality of second associations of the additional documents and at least some of the initial documents, wherein the second identifying comprises identifying using the processed initial documents and the processed additional documents, and generating a second visualization depicting the second associations.


    Systems and methods for improving concept landscape visualizations as a data analysis tool
    7.
    Granted patent
    Legal status: in force

    Publication No.: US06940509B1

    Publication date: 2005-09-06

    Application No.: US09675515

    Filing date: 2000-09-29

    IPC classification: G06T11/20 G06T11/00

    CPC classification: G06T11/206

    Abstract: Systems and methods provide several enhancements for the viewing, analysis, and generation of landscape views in a data analysis system, including: allowing a user to select from multiple methods to generate a landscape view, providing labels for peaks of a landscape, enabling the user to replace labels displayed on the landscape view, enabling a landscape view to be recalculated based on the replacement labels, and allowing a user to switch or morph between two landscape views generated by different methods. Such methods or systems generate graphical landscape map visualizations from a set of data records.


    Computation and Analysis of Significant Themes
    8.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20110004465A1

    Publication date: 2011-01-06

    Application No.: US12568365

    Filing date: 2009-09-28

    IPC classification: G06F17/27

    CPC classification: G06F17/277 G06F16/35

    Abstract: Systems and computer-implemented processes for computation and analysis of significant themes in a corpus of documents. The computation and analysis of significant themes can be executed on a processor and involves generating a lexical unit document association (LUDA) vector for each lexical unit that has been provided and quantifying similarities between each unique pair of lexical units. The LUDA vector characterizes a measure of association between its corresponding lexical unit and documents in the corpus. The lexical units can then be grouped into clusters such that each cluster contains a set of lexical units that are most similar as determined by the LUDA vectors and a predetermined clustering threshold.

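The flow described in this abstract (one LUDA vector per lexical unit, pairwise similarity, threshold-based grouping) might be sketched in Python as follows. The abstract does not fix the association measure, the similarity function, or the clustering rule; this sketch assumes raw occurrence counts, cosine similarity, and single-link grouping via union-find, and every function name is hypothetical.

```python
import math
from itertools import combinations


def luda_vectors(lexical_units, documents):
    """One vector per lexical unit: its association with each document.

    Association is assumed here to be the raw occurrence count.
    """
    docs = [d.lower().split() for d in documents]
    return {u: [float(doc.count(u)) for doc in docs] for u in lexical_units}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_units(vectors, threshold):
    """Group units whose pairwise similarity meets the threshold (single link)."""
    parent = {u: u for u in vectors}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u

    # Quantify similarity for each unique pair and join sufficiently similar units.
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for u in vectors:
        clusters.setdefault(find(u), []).append(u)
    return sorted(sorted(c) for c in clusters.values())
```

Each resulting cluster can be read as a candidate theme: a set of lexical units that tend to appear in the same documents.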

    Data import system for data analysis system
    9.
    Granted patent
    Legal status: in force

    Publication No.: US06718336B1

    Publication date: 2004-04-06

    Application No.: US09672622

    Filing date: 2000-09-29

    IPC classification: G06F17/30

    Abstract: A data import system enables access to data of multiple types from multiple data sources of different formats and provides an interface for importing data into a data analysis system. The interface enables a user to customize the formatting of the data as the data is being imported into a data analysis system. A user may select first user defined options for operating on a first data set received during a data importation process. An intermediate representation of the data set is generated based on the first user defined options. A user may specify second user defined options based on the intermediate representation during the data importation process. The second user defined options are processed to produce a final data representation of the data set to be used for analysis of the data. The intermediate representation may be a data table. The processing of a data set may include merging a first and second data set to produce the final data representation. The second user defined options may enable a user to select a basic operation for merging the data sets or to select a non-basic operation for merging the data sets. The basic operation may combine data sets in response to a user's selection of a first graphical interface control, and the non-basic operation may combine the data sets based on user selection of at least two graphical interface controls from a group of graphical interface controls.

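The two-stage import described in this abstract can be pictured with a small Python sketch: first-stage options produce an intermediate table, second-stage options (chosen after inspecting that table) produce the final representation, and two data sets are then merged. The concrete options shown (a delimiter, a column selection, per-column converters) and the key-based join are assumptions for illustration; the graphical interface controls and the non-basic merge are not modeled, and all function names are hypothetical.

```python
import csv
import io


def to_intermediate(raw_text, delimiter=","):
    """First-stage options: parse delimited text into a table of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter=delimiter))


def to_final(table, keep_columns, converters=None):
    """Second-stage options: select columns and convert their values."""
    converters = converters or {}
    return [
        {col: converters.get(col, str)(row[col]) for col in keep_columns}
        for row in table
    ]


def merge_on_key(left, right, key):
    """A simple merge (assumed): join rows that share the same key value."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]
```

Separating the two stages lets the second set of options depend on what the intermediate table actually contains, such as which columns exist and what types their values appear to have.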