3. SYSTEM AND METHOD FOR SEARCH INDEXING
    Patent application

    Publication No.: US20170116240A1

    Publication date: 2017-04-27

    Application No.: US15401980

    Filing date: 2017-01-09

    Applicant: FUJITSU LIMITED

    IPC classification: G06F17/30

    Abstract: A system includes circuitry configured to: read a plurality of pieces of character information and a plurality of identifiers that are included in a text file; determine whether a piece of character information among the plurality of pieces of character information is included between at least one pair of identifiers among the plurality of identifiers in the text file; and associate the character information with the at least one pair of identifiers when it is determined that the character information is included between the at least one pair of identifiers.
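A minimal sketch of the association step in Python, assuming (hypothetically) that the identifiers are XML-style tags such as `<title>` and `</title>` and that "character information" means whitespace-separated words. Each word found between an opening and a closing identifier is associated with that pair; words outside every pair are not indexed. This only illustrates the abstract's idea and is not Fujitsu's implementation.

```python
import re
from collections import defaultdict

# Hypothetical identifier syntax: "<name>" opens a pair, "</name>" closes it.
# Any other run of non-space, non-angle-bracket characters is treated as a word.
TOKEN = re.compile(r"</?([A-Za-z]\w*)>|[^\s<>]+")


def associate(text: str) -> dict[str, set[str]]:
    """Map each word to the names of every identifier pair that encloses it."""
    open_tags: list[str] = []                 # identifiers opened but not yet closed
    index: dict[str, set[str]] = defaultdict(set)
    for m in TOKEN.finditer(text):
        token, name = m.group(0), m.group(1)
        if name is None:                      # character information
            for tag in open_tags:             # words outside every pair are skipped
                index[token].add(tag)
        elif token.startswith("</"):          # closing identifier
            if name in open_tags:
                # Close the most recent matching opener (and anything left open inside it).
                last = len(open_tags) - 1 - open_tags[::-1].index(name)
                del open_tags[last:]
        else:                                 # opening identifier
            open_tags.append(name)
    return dict(index)
```

For example, `associate("<doc><title>Search indexing</title> body</doc> tail")` associates `Search` with both `doc` and `title`, `body` with `doc` only, and leaves `tail` unindexed. The stack of open identifiers handles nested pairs without a full parser.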

5. IDENTIFYING ENTITY MAPPINGS ACROSS DATA ASSETS
    Patent application (legal status: valid)

    Publication No.: US20170075984A1

    Publication date: 2017-03-16

    Application No.: US14853823

    Filing date: 2015-09-14

    IPC classification: G06F17/30

    Abstract: Entity mappings that produce matching entities for a first data asset and a second data asset, each having attributes with attribute values, are generated by: matching the attribute values of the attributes of the first data asset with the attribute values of the attributes of the second data asset, using the matching attribute values to generate matching attribute pairs, and using the matching attribute pairs to identify entity mappings; computing an entity mapping score for each of the entity mappings based on a combination of factors; ranking the entity mappings by their entity mapping scores; and using some of the ranked entity mappings to determine whether the same real-world entity is described by both the first data asset and the second data asset.
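The abstract's pipeline (match values, form matching attribute pairs, score, rank, decide) can be sketched in Python as below. This is a simplification under stated assumptions: each data asset is a list of records (dicts), an entity mapping is reduced to a single matching attribute pair, and the "combination of factors" is a hypothetical equal-weight sum of value overlap (Jaccard) and value uniqueness in the first asset. The function names and the 0.5 decision threshold are illustrative, not taken from the patent.

```python
from itertools import product

Record = dict[str, str]  # hypothetical: one record maps attribute name -> value


def value_sets(asset: list[Record]) -> dict[str, set[str]]:
    """Collect the distinct values observed for each attribute of a data asset."""
    values: dict[str, set[str]] = {}
    for record in asset:
        for attr, val in record.items():
            values.setdefault(attr, set()).add(val)
    return values


def rank_mappings(a: list[Record], b: list[Record],
                  w_overlap: float = 0.5, w_unique: float = 0.5):
    """Return (score, attr_in_a, attr_in_b) tuples, best first.

    Hypothetical score: weighted Jaccard overlap of the two value sets plus
    the fraction of distinct values in asset `a` (key-like attributes win).
    """
    va, vb = value_sets(a), value_sets(b)
    ranked = []
    for (x, sx), (y, sy) in product(va.items(), vb.items()):
        shared = sx & sy
        if not shared:                   # no matching values: not a matching pair
            continue
        overlap = len(shared) / len(sx | sy)
        uniqueness = len(sx) / len(a)
        ranked.append((w_overlap * overlap + w_unique * uniqueness, x, y))
    ranked.sort(reverse=True)
    return ranked


def same_entity(a: list[Record], b: list[Record], threshold: float = 0.5) -> bool:
    """Use the top-ranked mapping to decide whether both assets describe the same entities."""
    ranked = rank_mappings(a, b)
    return bool(ranked) and ranked[0][0] >= threshold
```

Weighting uniqueness rewards key-like attributes (such as IDs) over low-cardinality ones (such as city names), which is one plausible reading of "a combination of factors".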

6. Text sample entry group formulation
    Granted patent (legal status: valid)

    Publication No.: US09535983B2

    Publication date: 2017-01-03

    Application No.: US14066505

    Filing date: 2013-10-29

    IPC classification: G06F17/30

    Abstract: Storing text samples in a manner that allows them to be quickly searched. Each text sample is assigned a text sample identifier and is parsed to extract text components from it. Text components that have the same content are assigned the same text component identifier. For each parsed text component, a text component entry is created that includes the assigned text component identifier as well as the identifier of the text sample from which the text component was parsed. For each text sample, a text sample entry group is created that contains, in sequence, the text component entries for the text components found within that text sample. The text sample entry groups are stored so as to be scannable during a future search.
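The storage layout described above can be sketched in Python as follows, assuming (hypothetically) that text components are whitespace-separated tokens and that identifiers are consecutive integers. Every distinct component content receives one component identifier, each entry pairs that identifier with its sample identifier, and each sample's entries are kept in order as one entry group that a later search scans. The helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    component_id: int  # identical component content -> identical id
    sample_id: int     # text sample the component was parsed from


def build_groups(samples: list[str]) -> tuple[dict[str, int], list[list[Entry]]]:
    """Assign ids and build one ordered entry group per text sample."""
    component_ids: dict[str, int] = {}
    groups: list[list[Entry]] = []
    for sample_id, text in enumerate(samples):
        group = []
        for component in text.split():  # hypothetical parser: whitespace tokens
            cid = component_ids.setdefault(component, len(component_ids))
            group.append(Entry(cid, sample_id))
        groups.append(group)
    return component_ids, groups


def scan(groups: list[list[Entry]], component_ids: dict[str, int], word: str) -> list[int]:
    """Linearly scan the stored groups for samples containing `word`."""
    cid = component_ids.get(word)
    if cid is None:
        return []
    return [sid for sid, group in enumerate(groups)
            if any(e.component_id == cid for e in group)]
```

Because identical components share an identifier, the scan compares small integers instead of strings; keeping entries in sequence also preserves word order within each group.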

7. Incremental maintenance of inverted indexes for approximate string matching
    Granted patent (legal status: valid)

    Publication No.: US09514172B2

    Publication date: 2016-12-06

    Application No.: US13595270

    Filing date: 2012-08-27

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost compared with updating the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes rather than the few hours needed to rebuild an index, and a query may be answered with assurance that the results are accurate or within a threshold of accuracy.
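The abstract does not spell out how the precision thresholds are computed, so the following Python sketch substitutes a deliberately simple stand-in to illustrate lazy index maintenance in general: a q-gram inverted index in which new postings are buffered per q-gram and merged into a posting list only once the buffer grows beyond a hypothetical fraction `max_stale` of that list. Queries read only the merged lists, so answers may miss recently inserted strings, but each list's staleness stays bounded. Deletions and the patent's actual threshold logic are not modelled.

```python
from collections import defaultdict


def qgrams(s: str, q: int) -> set[str]:
    """All distinct substrings of length q."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


class LazyQGramIndex:
    """Inverted q-gram index whose posting lists are merged lazily.

    Hypothetical policy: new postings wait in a per-q-gram buffer and are
    merged only when the buffer exceeds `max_stale` times the list length,
    so no list is ever more than that fraction out of date.
    """

    def __init__(self, q: int = 2, max_stale: float = 0.1):
        self.q, self.max_stale = q, max_stale
        self.lists: dict[str, set[int]] = defaultdict(set)    # merged postings
        self.pending: dict[str, set[int]] = defaultdict(set)  # buffered postings
        self.merges = 0  # number of posting-list rewrites (the costly step)

    def insert(self, sid: int, s: str) -> None:
        for g in qgrams(s, self.q):
            self.pending[g].add(sid)
            if len(self.pending[g]) > self.max_stale * len(self.lists[g]):
                self.lists[g] |= self.pending.pop(g)
                self.merges += 1

    def query(self, s: str, min_shared: int) -> set[int]:
        """Count filter: ids sharing at least `min_shared` q-grams with `s`."""
        counts: dict[int, int] = defaultdict(int)
        for g in qgrams(s, self.q):
            for sid in self.lists.get(g, ()):
                counts[sid] += 1
        return {sid for sid, c in counts.items() if c >= min_shared}
```

With `max_stale=1.0`, inserting `"abc"` and then `"abd"` leaves the second string's `"ab"` posting buffered, so `query("abd", 2)` is still empty; inserting `"abx"` pushes that buffer past the bound, the list is merged, and the same query then returns `{1}`.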

8. Encoding and accessing position data
    Granted patent (legal status: valid)

    Publication No.: US09507827B1

    Publication date: 2016-11-29

    Application No.: US13173532

    Filing date: 2011-06-30

    Applicant: Tomi Poutanen

    Inventor: Tomi Poutanen

    IPC classification: G06F7/00 G06F17/30

    Abstract: In one embodiment, a data structure comprises: a primary index comprising one or more position-block references; and one or more position blocks sequentially following the primary index, wherein: each of the position-block references corresponds to one of the position blocks; and each of the position blocks comprises: a secondary index comprising one or more position-data references; and one or more sets of positions sequentially following the secondary index, wherein each of the position-data references corresponds to one of the sets of positions in the position block. In one embodiment, an instance of the data structure is stored in a computer-readable memory and accessible by an application executed by a process.
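A minimal Python sketch of the two-level layout, assuming (hypothetically) that everything is flattened into one list of integers and that references are absolute offsets into that list. The primary index holds a count and one reference per position block; each block begins with its own secondary index (a count and one reference per set), followed by the sets, each stored as a length followed by its positions. The function names are illustrative.

```python
def encode(blocks: list[list[list[int]]]) -> list[int]:
    """Flatten position blocks (each a list of position sets) into one int list.

    Hypothetical layout, all references absolute offsets into the list:
      primary index : [block_count, ref_block_0, ref_block_1, ...]
      each block    : [set_count, ref_set_0, ...] then its sets
      each set      : [position_count, pos_0, pos_1, ...]
    """
    out = [len(blocks)] + [0] * len(blocks)      # primary index, refs patched below
    for b, sets in enumerate(blocks):
        out[1 + b] = len(out)                    # position-block reference
        block_start = len(out)
        out += [len(sets)] + [0] * len(sets)     # secondary index
        for s, pos_set in enumerate(sets):
            out[block_start + 1 + s] = len(out)  # position-data reference
            out += [len(pos_set)] + pos_set
    return out


def read_set(data: list[int], block: int, set_no: int) -> list[int]:
    """Follow the primary, then the secondary reference, then read one set."""
    block_start = data[1 + block]
    set_start = data[block_start + 1 + set_no]
    count = data[set_start]
    return data[set_start + 1:set_start + 1 + count]
```

Reading one set takes only two reference lookups, so a reader can jump straight to the positions it needs without decoding the blocks and sets that precede them.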

9. Information generating computer product, apparatus, and method; and information search computer product, apparatus, and method
    Granted patent (legal status: valid)

    Publication No.: US09501557B2

    Publication date: 2016-11-22

    Application No.: US13686228

    Filing date: 2012-11-27

    Applicant: FUJITSU LIMITED

    Inventor: Masahiro Kataoka

    IPC classification: G06F7/00 G06F17/30 G06F17/00

    Abstract: A computer-readable recording medium stores a program causing a computer to execute an information generating process that includes: tabulating an appearance frequency for each designated word in an object file group in which character strings are described; identifying, for each designated word and based on the appearance frequency tabulated for that word, a rank in descending order up to a target appearance rate for the designated words; detecting, in an object file selected from the object file group, specific designated words among the identified ranks; and generating, for each of the detected specific designated words, index information that indicates the presence or absence of that word in each object file of the object file group.
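One reading of the abstract, sketched in Python: count how often each designated word appears across the file group, take words in descending order of frequency until the words already taken cover a target share of all designated-word occurrences (an assumed interpretation of "target appearance rate"), then build a presence/absence bitmap per selected word with one bit per file. Pre-tokenized files and integer bitmaps are assumptions for illustration.

```python
from collections import Counter


def select_words(files: list[list[str]], designated: set[str],
                 target_rate: float) -> list[str]:
    """Take designated words by descending frequency until they cover target_rate."""
    freq = Counter(w for words in files for w in words if w in designated)
    total = sum(freq.values())
    chosen: list[str] = []
    covered = 0
    for word, count in freq.most_common():     # descending appearance frequency
        if covered >= target_rate * total:     # target appearance rate reached
            break
        chosen.append(word)
        covered += count
    return chosen


def presence_bitmaps(files: list[list[str]], words: list[str]) -> dict[str, int]:
    """Bit i of a word's bitmap is 1 exactly when the word occurs in file i."""
    bitmaps = {w: 0 for w in words}
    for i, words_in_file in enumerate(files):
        for w in set(words_in_file).intersection(bitmaps):
            bitmaps[w] |= 1 << i
    return bitmaps
```

With an integer bitmap, testing whether a word may occur in file `i` is a single bit check, `bitmaps[word] >> i & 1`.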

10. DYNAMIC THRESHOLD GATES FOR INDEXING QUEUES
    Patent application (legal status: valid)

    Publication No.: US20160259785A1

    Publication date: 2016-09-08

    Application No.: US14635093

    Filing date: 2015-03-02

    IPC classification: G06F17/30

    Abstract: Electronic files are selectively assigned to a plurality of different indexing queues by one or more dynamic throughput threshold gates based on characteristics of the different indexing queues as well as the static file characteristics associated with each of the files. The files are then indexed. Upon detecting a change in a dynamic characteristic of one or more indexed files, the throughput threshold gate(s) are modified to obtain, maintain, or modify a desired throughput for one or more of the indexing queues.
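A hedged Python sketch of threshold-gated queue assignment. It assumes (hypothetically) that each gate is a file-size limit, that queues are tried from the smallest limit upward, and that backlog length stands in for throughput; `adjust_gate` would be called whenever a change in a file's dynamic characteristics is detected. None of these names or rules come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class IndexingQueue:
    name: str
    size_limit: float        # gate threshold on a static characteristic (bytes)
    target_backlog: int      # hypothetical throughput target: files waiting
    files: list[str] = field(default_factory=list)


def assign(queues: list[IndexingQueue], file_name: str, file_size: int) -> IndexingQueue:
    """Route a file to the queue with the smallest gate that still admits it."""
    for queue in sorted(queues, key=lambda q: q.size_limit):
        if file_size <= queue.size_limit:
            queue.files.append(file_name)
            return queue
    raise ValueError("no gate admits this file; give one queue an infinite limit")


def adjust_gate(queue: IndexingQueue, factor: float = 0.5) -> None:
    """Re-tune one gate after a change in the files' dynamic characteristics.

    Backlog above target: tighten the gate. Below target: loosen it.
    """
    if len(queue.files) > queue.target_backlog:
        queue.size_limit *= factor
    elif len(queue.files) < queue.target_backlog:
        queue.size_limit /= factor
```

Tightening the gate of an overloaded queue diverts borderline files to the next queue, which is one simple way to trade per-queue latency against overall throughput.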
