HOLISTIC DISAMBIGUATION FOR ENTITY NAME SPOTTING
    1.
    Patent application
    Status: in force

    Publication number: US20100223292A1

    Publication date: 2010-09-02

    Application number: US12394078

    Filing date: 2009-02-27

    IPC classification: G06F17/30 G06F17/27

    CPC classification: G06F17/278

    Abstract: A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then output to a user.
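
    The activation scheme in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the class name CandidateNode, the boost amounts, and the "Jaguar" example are all assumptions made for illustration. Each candidate meaning of an ambiguous name is a node; evidence raises its activation, a confirmed false positive deactivates it, and the name is assigned to the active node with the highest activation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateNode:
    """One possible meaning of an ambiguous entity name (hypothetical type)."""
    label: str
    activation: float = 0.0
    active: bool = True

    def boost(self, amount: float) -> None:
        # Domain knowledge or a corroborating spotted name raises activation.
        if self.active:
            self.activation += amount

    def deactivate(self) -> None:
        # A confirmed false positive removes the node from consideration.
        self.active = False
        self.activation = 0.0


def resolve(nodes: List[CandidateNode]) -> Optional[str]:
    """Return the label of the active node with the highest activation."""
    live = [n for n in nodes if n.active]
    if not live:
        return None
    return max(live, key=lambda n: n.activation).label


# Hypothetical example: the spotted name "Jaguar" has three candidate senses.
nodes = {
    "car": CandidateNode("Jaguar (car maker)"),
    "animal": CandidateNode("Jaguar (animal)"),
    "software": CandidateNode("Jaguar (software release)"),
}
nodes["car"].boost(0.5)        # outside domain knowledge: corpus is about cars
nodes["car"].boost(0.3)        # a related entity name was spotted nearby
nodes["animal"].boost(0.2)     # weak supporting evidence
nodes["software"].deactivate() # eliminated as a false positive

winner = resolve(list(nodes.values()))
print(winner)
```

    The additive update is only one possible choice; the abstract does not say how activation values are combined.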

    SYSTEM FOR MONITORING GLOBAL ONLINE OPINIONS VIA SEMANTIC EXTRACTION
    2.
    Patent application
    Status: in force

    Publication number: US20100223226A1

    Publication date: 2010-09-02

    Application number: US12394646

    Filing date: 2009-02-27

    IPC classification: G06N5/04

    CPC classification: G06Q30/02

    Abstract: A system for transforming domain specific unstructured data into structured data including an intake platform controlled by feedback from a control platform. The intake platform includes an intake acquisition module for acquiring data building baseline data related to a domain and problem of interest, an intake pre-processing module, an intake language module, an intake application descriptors module, and an intake adjudication module. The control platform includes a control data acquisition module, a control data consistency collator, a control auditor, a control event definition and policy repository, an error resolver, and an output that outputs results of the workflow into structured data enabled to be used in data analytics.

    System for monitoring global online opinions via semantic extraction
    3.
    Granted patent
    Status: in force

    Publication number: US08352412B2

    Publication date: 2013-01-08

    Application number: US12394646

    Filing date: 2009-02-27

    IPC classification: G06F17/00 G06N7/00 G06N7/08

    CPC classification: G06Q30/02

    Abstract: A system for transforming domain specific unstructured data into structured data including an intake platform controlled by feedback from a control platform. The intake platform includes an intake acquisition module for acquiring data building baseline data related to a domain and problem of interest, an intake pre-processing module, an intake language module, an intake application descriptors module, and an intake adjudication module. The control platform includes a control data acquisition module, a control data consistency collator, a control auditor, a control event definition and policy repository, an error resolver, and an output that outputs results of the workflow into structured data enabled to be used in data analytics.

    Holistic disambiguation for entity name spotting
    4.
    Granted patent
    Status: in force

    Publication number: US08856119B2

    Publication date: 2014-10-07

    Application number: US12394078

    Filing date: 2009-02-27

    IPC classification: G06F17/30 G06F7/00 G06F17/27

    CPC classification: G06F17/278

    Abstract: A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then output to a user.

    DATA DEDUPLICATION FOR STREAMING SEQUENTIAL DATA STORAGE APPLICATIONS
    5.
    Patent application
    Status: in force

    Publication number: US20110185149A1

    Publication date: 2011-07-28

    Application number: US12695127

    Filing date: 2010-01-27

    IPC classification: G06F12/10 G06F12/00

    Abstract: Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.
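
    The block-hashing steps listed in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: fixed-size blocks, SHA-256 as the hash, and the names dedup_stream and restore are all choices made here for the example.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


def dedup_stream(data: bytes, block_size: int = BLOCK_SIZE
                 ) -> Tuple[List[bytes], List[Tuple[str, int]]]:
    """Split sequential data into blocks and encode duplicates as references."""
    seen: Dict[bytes, int] = {}          # in-memory lookup table: hash -> block index
    unique_blocks: List[bytes] = []      # blocks stored in full, in stream order
    segments: List[Tuple[str, int]] = [] # ("new" | "ref", index into unique_blocks)
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            # Hash match: emit a reference instead of the block itself.
            segments.append(("ref", seen[digest]))
        else:
            # No match: store the block and remember its hash.
            seen[digest] = len(unique_blocks)
            unique_blocks.append(block)
            segments.append(("new", seen[digest]))
    return unique_blocks, segments


def restore(unique_blocks: List[bytes], segments: List[Tuple[str, int]]) -> bytes:
    """Rebuild the original data by following the segment list in order."""
    return b"".join(unique_blocks[index] for _, index in segments)


data = b"ABCDABCDEFGHABCD"
blocks, segs = dedup_stream(data)
print(blocks)
print(segs)
print(restore(blocks, segs) == data)
```

    Because every reference points only to a block that appeared earlier in the stream, the archive can be written and read strictly sequentially, which is the property a streaming medium needs.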

    Method and apparatus for data compression
    6.
    Granted patent
    Status: lapsed

    Publication number: US08380688B2

    Publication date: 2013-02-19

    Application number: US12613597

    Filing date: 2009-11-06

    IPC classification: G06F7/00 G06F17/00

    CPC classification: H03M7/30

    Abstract: A method, system, and article for compressing an input stream of uncompressed data. The input stream is divided into one or more data segments. A hash is applied to a first data segment, and an offset and length are associated with this first segment. This hash, together with the offset and length data for the first segment, is stored in a hash table. Thereafter, a subsequent segment within the input stream is evaluated and compared with all other hash entries in the hash table, and a reference is written to a prior hash for an identified duplicate segment. The reference includes a new offset location for the subsequent segment. Similarly, a new hash is applied to an identified non-duplicate segment, with the new hash and its corresponding offset stored in the hash table. A compressed output stream of data is created from the hash table retained on storage media.

    Method and Apparatus for Data Compression
    7.
    Patent application
    Status: lapsed

    Publication number: US20110113016A1

    Publication date: 2011-05-12

    Application number: US12613597

    Filing date: 2009-11-06

    IPC classification: G06F17/30 G06F7/00 G06F12/16

    CPC classification: H03M7/30

    Abstract: A method, system, and article for compressing an input stream of uncompressed data. The input stream is divided into one or more data segments. A hash is applied to a first data segment, and an offset and length are associated with this first segment. This hash, together with the offset and length data for the first segment, is stored in a hash table. Thereafter, a subsequent segment within the input stream is evaluated and compared with all other hash entries in the hash table, and a reference is written to a prior hash for an identified duplicate segment. The reference includes a new offset location for the subsequent segment. Similarly, a new hash is applied to an identified non-duplicate segment, with the new hash and its corresponding offset stored in the hash table. A compressed output stream of data is created from the hash table retained on storage media.

    Data deduplication for streaming sequential data storage applications
    8.
    Granted patent
    Status: in force

    Publication number: US08407193B2

    Publication date: 2013-03-26

    Application number: US12695127

    Filing date: 2010-01-27

    IPC classification: G06F17/00

    Abstract: Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.

    Operating System and File System Independent Incremental Data Backup
    9.
    Patent application
    Status: in force

    Publication number: US20110113012A1

    Publication date: 2011-05-12

    Application number: US12614134

    Filing date: 2009-11-06

    IPC classification: G06F17/30

    CPC classification: G06F11/1451 G06F2201/815

    Abstract: Embodiments of the invention relate to creating an operating system and file system independent incremental data backup. A first data backup of a source system and a second version of the data on the source system are received. A second data backup of the second version of the data is created by determining differences between the first data backup and the second version of the data. Each portion of the second version of the data that is the same as a portion of the first data backup is referenced in the second data backup. Each portion of the second version of the data that is different from all portions of the first data backup is included in the second data backup. The second data backup is appended to the first data backup to create an incremental data backup.
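
    The reference-or-include rule in the abstract above can be sketched in a few lines. This is an illustration under assumptions, not the claimed method: "portions" are taken to be fixed-size byte chunks compared by SHA-256, and the function names are hypothetical. Working on raw bytes rather than files is what keeps such a scheme independent of the operating system and file system.

```python
import hashlib
from typing import List, Tuple, Union

CHUNK = 4  # tiny portion size for illustration (assumption)

# A second-backup entry is either a reference into the first backup
# or the literal bytes of a portion not found there.
Entry = Tuple[str, Union[int, bytes]]


def chunks(data: bytes, size: int = CHUNK) -> List[bytes]:
    """Split raw bytes into fixed-size portions."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental_backup(first_backup: List[bytes], second_version: bytes) -> List[Entry]:
    """Describe second_version in terms of the portions of first_backup."""
    index = {hashlib.sha256(p).digest(): i for i, p in enumerate(first_backup)}
    second: List[Entry] = []
    for portion in chunks(second_version):
        match = index.get(hashlib.sha256(portion).digest())
        if match is not None:
            second.append(("ref", match))      # same as a portion of the first backup
        else:
            second.append(("data", portion))   # differs from all portions: include it
    return second


def restore_second(first_backup: List[bytes], second: List[Entry]) -> bytes:
    """Rebuild the second version from the first backup plus the increment."""
    return b"".join(first_backup[v] if kind == "ref" else v for kind, v in second)


v1 = b"AAAABBBBCCCC"
v2 = b"AAAAXXXXCCCCDDDD"
first = chunks(v1)
second = incremental_backup(first, v2)
archive = (first, second)  # the second backup is appended to the first
print(second)
print(restore_second(*archive) == v2)
```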

    Method and framework to support indexing and searching taxonomies in large scale full text indexes
    10.
    Granted patent
    Status: in force

    Publication number: US08600997B2

    Publication date: 2013-12-03

    Application number: US11241687

    Filing date: 2005-09-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30734

    Abstract: A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.
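
    One way to read the abstract above is that a taxonomy group name is indexed at every location where one of its member entities occurs, with the member's name carried as the data of that posting. The sketch below illustrates that reading only. It simplifies heavily: entities are single tokens rather than sets of terms, aliases are omitted, and the taxonomy, the sample text, and the function name build_index are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Posting: (location in the token stream, data associated with the occurrence)
Posting = Tuple[int, Optional[str]]

# Hypothetical taxonomy mapping an entity to the group that contains it.
taxonomy = {"apple": "fruit", "banana": "fruit", "carrot": "vegetable"}


def build_index(tokens: List[str], taxonomy: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Build an inverted index whose postings carry per-occurrence data."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for location, token in enumerate(tokens):
        # Ordinary posting for the term itself, with no extra data.
        index[token].append((location, None))
        group = taxonomy.get(token)
        if group is not None:
            # Index the group name at the entity's location; the data
            # records which entity produced this occurrence.
            index[group].append((location, token))
    return index


tokens = "apple pie with banana and carrot".split()
index = build_index(tokens, taxonomy)
print(index["fruit"])
print(index["vegetable"])
```

    A query for the group name then returns every location of every member entity in a single posting-list lookup, and the data field tells the caller which entity matched.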