SYSTEMS AND METHODS FOR ANALYZING ELECTRONIC TEXT
    1.
    Patent application
    SYSTEMS AND METHODS FOR ANALYZING ELECTRONIC TEXT (status: in force)

    Publication number: US20100145940A1

    Publication date: 2010-06-10

    Application number: US12331271

    Filing date: 2008-12-09

    IPC classification: G06F17/21 G06F17/30

    CPC classification: G06F17/30864 G06F17/30705

    Abstract: Systems and methods for systematically analyzing an electronic text are described. In one embodiment, the method includes receiving the electronic text from a plurality of sources. The method also includes determining at least one term of interest to be identified in the electronic text. The method further includes identifying a plurality of locations within the electronic text including the at least one term of interest. The method also includes, for each location within a plurality of locations, creating a snippet from a text segment around the at least one term of interest at the location within the electronic text. The method further includes creating multiple taxonomies for the at least one term of interest from the snippets, wherein the taxonomies include at least one category. The method also includes determining co-occurrences between the multiple taxonomies to determine associations between categories of different taxonomies of the multiple taxonomies.
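
The snippet and co-occurrence steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the fixed character window, and the keyword-based taxonomies are all assumptions made for illustration (the abstract says taxonomies are created from the snippets but does not say how).

```python
import re
from collections import Counter
from itertools import product


def make_snippets(texts, term, window=40):
    """Return one snippet per occurrence of `term` across all source texts.

    Each snippet spans up to `window` characters on either side of the match
    (the window size is an assumption, not taken from the abstract).
    """
    snippets = []
    for text in texts:
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            snippets.append(text[start:end])
    return snippets


def categorize(snippet, taxonomy):
    """Return every category whose keywords appear in the snippet.

    `taxonomy` is a hypothetical mapping of category name -> list of keywords.
    """
    lowered = snippet.lower()
    return [cat for cat, keywords in taxonomy.items()
            if any(keyword in lowered for keyword in keywords)]


def co_occurrences(snippets, taxonomy_a, taxonomy_b):
    """Count how often a category of one taxonomy shares a snippet with a
    category of the other taxonomy."""
    counts = Counter()
    for snippet in snippets:
        pairs = product(categorize(snippet, taxonomy_a),
                        categorize(snippet, taxonomy_b))
        for pair in pairs:
            counts[pair] += 1
    return counts
```

Given two small taxonomies (for example sentiment and product feature), the resulting counter shows which categories of one taxonomy tend to appear together with which categories of the other.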

    Systems and methods for analyzing electronic text
    2.
    Granted patent
    Systems and methods for analyzing electronic text (status: in force)

    Publication number: US08606815B2

    Publication date: 2013-12-10

    Application number: US12331271

    Filing date: 2008-12-09

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864 G06F17/30705

    Abstract: Systems and methods for systematically analyzing an electronic text are described. In one embodiment, the method includes receiving the electronic text from a plurality of sources. The method also includes determining at least one term of interest to be identified in the electronic text. The method further includes identifying a plurality of locations within the electronic text including the at least one term of interest. The method also includes, for each location within a plurality of locations, creating a snippet from a text segment around the at least one term of interest at the location within the electronic text. The method further includes creating multiple taxonomies for the at least one term of interest from the snippets, wherein the taxonomies include at least one category. The method also includes determining co-occurrences between the multiple taxonomies to determine associations between categories of different taxonomies of the multiple taxonomies.

    Using rule induction to identify emerging trends in unstructured text streams
    3.
    Granted patent
    Using rule induction to identify emerging trends in unstructured text streams (status: lapsed)

    Publication number: US08712926B2

    Publication date: 2014-04-29

    Application number: US12126829

    Filing date: 2008-05-23

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: A method for identifying emerging concepts in unstructured text streams comprises: selecting a subset V of documents from a set U of documents; generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint inasmuch as each document of U is included in only one category of the partition; and generating a descriptive label for each of the disjoint categories from the Boolean combination of terms for that category.
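
As a rough illustration of the partitioning idea in this abstract, the Python sketch below picks terms that are more common in the subset V than in the rest of U, then applies them as an ordered rule list so that each document lands in exactly one category labeled by its Boolean combination of terms. The scoring rule, the decision-list form, and all names are assumptions; the abstract does not specify the actual induction algorithm.

```python
from collections import Counter


def pick_terms(subset_v, rest, k=1):
    """Rank terms by document frequency in V minus document frequency in the
    rest of U, and return the top k (a simple stand-in for rule induction)."""
    df_v = Counter(t for doc in subset_v for t in set(doc.lower().split()))
    df_rest = Counter(t for doc in rest for t in set(doc.lower().split()))
    n_v = max(len(subset_v), 1)
    n_rest = max(len(rest), 1)

    def score(term):
        return df_v[term] / n_v - df_rest[term] / n_rest

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(df_v, key=lambda t: (-score(t), t))[:k]


def partition(docs, terms):
    """Assign each document to exactly one category using an ordered rule list.

    A document falls into the category of the first term it contains; the label
    records the terms it failed to match, e.g. "NOT battery AND screen".
    """
    categories = {}
    for doc in docs:
        words = set(doc.lower().split())
        negated = []
        for term in terms:
            if term in words:
                label = " AND ".join(negated + [term])
                break
            negated.append(f"NOT {term}")
        else:
            label = " AND ".join(negated)
        categories.setdefault(label, []).append(doc)
    return categories
```

Because every document follows the rule list until its first match (or falls through to the final all-negated category), the categories are disjoint and together cover U.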

    Information mining using domain specific conceptual structures
    4.
    Granted patent
    Information mining using domain specific conceptual structures (status: in force)

    Publication number: US08805843B2

    Publication date: 2014-08-12

    Application number: US12132515

    Filing date: 2008-06-03

    IPC classification: G06F17/30 G06F7/00

    Abstract: A method and analytics tools for information mining incorporating domain specific knowledge and conceptual structures are disclosed, the method including: providing a first set of documents related to a first topic of interest; using a first taxonomy to categorize the first set of documents into a set of categories; providing a second set of documents related to a second topic of interest; categorizing the second set of documents according to the set of categories of the first set of documents; using an element of domain knowledge to re-categorize the first set of documents; and examining a category to identify a document of interest.

    Methodologies and analytics tools for identifying potential licensee markets
    5.
    Granted patent
    Methodologies and analytics tools for identifying potential licensee markets (status: lapsed)

    Publication number: US07711649B2

    Publication date: 2010-05-04

    Application number: US12418862

    Filing date: 2009-04-06

    IPC classification: G06F17/60

    CPC classification: G06Q30/02

    Abstract: A method is disclosed for use with at least one initial document describing a technical concept suitable for licensing, the method comprising: retrieving a set of intellectual property documents from a data warehouse; partitioning the set of intellectual property documents into a plurality of document categories; classifying the set of intellectual property documents by an industry parameter; constructing a contingency table that includes a listing of industry classifications for each of the document categories; and identifying documents within a particular one of the document categories that have different industry classifications so as to identify at least one potential new licensee industry of the technical concept described in the initial document.

    USING RULE INDUCTION TO IDENTIFY EMERGING TRENDS IN UNSTRUCTURED TEXT STREAMS
    6.
    Patent application
    USING RULE INDUCTION TO IDENTIFY EMERGING TRENDS IN UNSTRUCTURED TEXT STREAMS (status: lapsed)

    Publication number: US20090292660A1

    Publication date: 2009-11-26

    Application number: US12126829

    Filing date: 2008-05-23

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: A method for identifying emerging concepts in unstructured text streams comprises: selecting a subset V of documents from a set U of documents; generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint inasmuch as each document of U is included in only one category of the partition; and generating a descriptive label for each of the disjoint categories from the Boolean combination of terms for that category.

    METHODOLOGIES AND ANALYTICS TOOLS FOR LOCATING EXPERTS WITH SPECIFIC SETS OF EXPERTISE
    7.
    Patent application
    METHODOLOGIES AND ANALYTICS TOOLS FOR LOCATING EXPERTS WITH SPECIFIC SETS OF EXPERTISE (status: in force)

    Publication number: US20080301105A1

    Publication date: 2008-12-04

    Application number: US12134098

    Filing date: 2008-06-05

    IPC classification: G06F7/06 G06F17/30

    Abstract: A method and analytics tools for locating experts with specific sets of expertise are disclosed, the method including providing a collection of documents P0; generating categories representing fields of expertise derived from the collection of documents P0; refining the taxonomy of the categories by applying user domain knowledge; extracting structured fields from the collection of documents P0; constructing a contingency table having a first axis defined by the extracted structured fields and a second axis defined by the categories; and using the contingency table to identify a set of experts having a related expertise. The method may also include a network graph analysis that aids visualization of the relationship between people and expertise.
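
The contingency-table step can be sketched in Python as follows. This is only an illustration under stated assumptions: the structured field is taken to be an `authors` list, each document is assumed to already carry a single `category`, and the `min_docs` threshold for calling someone an expert is hypothetical; none of these details come from the abstract.

```python
from collections import Counter


def contingency_table(documents):
    """Count documents per (author, category) cell.

    The first axis is a structured field (here, author names) and the second
    axis is the expertise category assigned to each document.
    """
    table = Counter()
    for doc in documents:
        for author in doc["authors"]:
            table[(author, doc["category"])] += 1
    return table


def experts_for(table, category, min_docs=2):
    """Return authors with at least `min_docs` documents in `category`,
    most prolific first (ties broken alphabetically)."""
    ranked = sorted(
        ((author, count) for (author, cat), count in table.items()
         if cat == category and count >= min_docs),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [author for author, _ in ranked]
```

Reading a column of the table gives the people associated with one field of expertise; reading a row gives the fields associated with one person.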

    Versioning data warehouses
    8.
    Granted patent
    Versioning data warehouses (status: in force)

    Publication number: US08078570B2

    Publication date: 2011-12-13

    Application number: US12434378

    Filing date: 2009-05-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30592

    Abstract: A method, system, and computer program product are disclosed. Exemplary embodiments of the method, system, and computer program product may include hardware, process steps, and computer program instructions for supporting versioning in a data warehouse. The data warehouse may include a data warehouse engine for creating a data warehouse including a fact table and temporary tables. Updated or new data records may be transferred into the data warehouse and bulk loaded into the temporary tables. The updated or new data records may be evaluated for attributes matching existing data records. A version number may be assigned to data records and data records may be marked as being the most current version. Updated and new data records may be bulk loaded from the temporary tables into the fact table when a version number or a version status is calculated.
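
The versioning flow described here (stage records, match them against existing records, assign a version number, mark the current version, then bulk load) can be sketched in plain Python with lists of dictionaries standing in for the temporary and fact tables. The column names `version` and `is_current`, the `key_fields` parameter, and the function name are assumptions for illustration, not details from the abstract.

```python
def load_with_versions(fact_table, staged, key_fields=("record_id",)):
    """Version the staged rows against the fact table, then append them.

    `fact_table` and `staged` are lists of dicts. Rows whose key attributes
    match an existing row get the next version number, and the previously
    current row is marked as no longer current.
    """
    # Find the latest existing version for each key.
    latest = {}
    for row in fact_table:
        key = tuple(row[f] for f in key_fields)
        if key not in latest or row["version"] > latest[key]["version"]:
            latest[key] = row

    prepared = []
    for record in staged:
        key = tuple(record[f] for f in key_fields)
        previous = latest.get(key)
        if previous is not None:
            previous["is_current"] = False
        next_version = previous["version"] + 1 if previous else 1
        new_row = dict(record, version=next_version, is_current=True)
        latest[key] = new_row
        prepared.append(new_row)

    # Bulk load only after every staged row has a version and a status.
    fact_table.extend(prepared)
    return fact_table
```

In a real warehouse the same steps would typically be expressed as set-based SQL over temporary tables; the loop form is used here only to keep the logic visible.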

    Methodologies and analytics tools for locating experts with specific sets of expertise
    9.
    Granted patent
    Methodologies and analytics tools for locating experts with specific sets of expertise (status: lapsed)

    Publication number: US07792786B2

    Publication date: 2010-09-07

    Application number: US11674606

    Filing date: 2007-02-13

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method and analytics tools for locating experts with specific sets of expertise are disclosed, the method including providing a collection of documents P0; generating categories representing fields of expertise derived from the collection of documents P0; refining the taxonomy of the categories by applying user domain knowledge; extracting structured fields from the collection of documents P0; constructing a contingency table having a first axis defined by the extracted structured fields and a second axis defined by the categories; and using the contingency table to identify a set of experts having a related expertise. The method may also include a network graph analysis that aids visualization of the relationship between people and expertise.

    Methodologies and analytics tools for identifying white space opportunities in a given industry
    10.
    Granted patent
    Methodologies and analytics tools for identifying white space opportunities in a given industry (status: in force)

    Publication number: US08060505B2

    Publication date: 2011-11-15

    Application number: US11674598

    Filing date: 2007-02-13

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method for analyzing predefined subject matter in a patent database being for use with a set of target patents, each target patent related to the predefined subject matter, the method comprising: creating a feature space based on frequently occurring terms found in the set of target patents; creating a partition taxonomy based on a clustered configuration of the feature space; editing the partition taxonomy using domain expertise to produce an edited partition taxonomy; creating a classification taxonomy based on structured features present in the edited partition taxonomy; creating a contingency table by comparing the edited partition taxonomy and the classification taxonomy to provide entries in the contingency table; and identifying all significant relationships in the contingency table to help determine the presence of any white space.
