DICTIONARY CREATION DEVICE, WORD GATHERING METHOD AND RECORDING MEDIUM
    2.
    Invention application
    DICTIONARY CREATION DEVICE, WORD GATHERING METHOD AND RECORDING MEDIUM (Under examination, published)

    Publication No.: US20120303359A1

    Publication date: 2012-11-29

    Application No.: US13515135

    Filing date: 2010-12-03

    IPC classification: G06F17/21

    CPC classification: G06F17/2735 G06F16/353

    Abstract: When gathering words through a dictionary growth process, a dictionary growth unit (102) stores, in a gathering process memory unit (107), information indicating through what process of input and output each word was gathered. Then, a clustering unit (103) classifies the words gathered by the dictionary growth process into clusters on the basis of the information recorded in the gathering process memory unit (107). Next, for each cluster into which the words have been classified, a type determination unit (104) determines, on the basis of the information recorded in the gathering process memory unit (107), whether the words in the cluster are of the same type as the seed word or of a different type. In addition, an output unit (105) associates each gathered word with the cluster to which it belongs and with information indicating whether that cluster is of the same type as the seed word or of a different type, and displays the result.

    Dictionary creation device, word gathering method and recording medium

    Publication No.: US09600468B2

    Publication date: 2017-03-21

    Application No.: US13515181

    Filing date: 2010-12-03

    IPC classification: G06F17/30 G06F17/27

    CPC classification: G06F17/2735 G06F17/30731

    Abstract: A boundary word identification unit (103) identifies a boundary word belonging to a plurality of categories among words gathered in dictionary growth processing. Then, a category membership degree calculation unit (104) calculates, for each category to which the boundary word belongs, a category membership degree indicating a degree to which the boundary word belongs to the category on the basis of information recorded in a gathering process memory unit (108). Next, a category update unit (105) determines the category to which the boundary word belongs on the basis of the category membership degree calculated by the category membership degree calculation unit (104) and updates information stored in a gathered-by-category word memory unit (109) so that the determination result is reflected.

    Document evaluation apparatus, document evaluation method, and computer-readable recording medium using missing patterns
    4.
    Invention grant
    Document evaluation apparatus, document evaluation method, and computer-readable recording medium using missing patterns (In force)

    Publication No.: US09249287B2

    Publication date: 2016-02-02

    Application No.: US14002692

    Filing date: 2013-02-18

    Abstract: In order to accurately learn a function for evaluating documents, even in the case where sample documents having missing feature values are included as training data, a document evaluation apparatus is provided with: a data classification unit (3) that classifies a set of sample documents based on missing patterns of a first feature vector; a first learning unit (4) that uses feature values that are not missing in the first feature vector and evaluation values to learn a first function for calculating a first score, which is a weighted evaluation value for each classification; a feature vector generation unit (5) that computes a feature value corresponding to each classification using the first score, and generates a second feature vector having the computed feature values; and a second learning unit (6) that uses the second feature vector and the evaluation values to learn a second function for calculating a second score for evaluating documents targeted for evaluation.

    UNEXPECTEDNESS DETERMINATION SYSTEM, UNEXPECTEDNESS DETERMINATION METHOD AND PROGRAM
    5.
    Invention application
    UNEXPECTEDNESS DETERMINATION SYSTEM, UNEXPECTEDNESS DETERMINATION METHOD AND PROGRAM (Under examination, published)

    Publication No.: US20130282727A1

    Publication date: 2013-10-24

    Application No.: US13978811

    Filing date: 2012-01-06

    IPC classification: G06F17/30

    Abstract: The present invention more suitably determines whether a combination of words is an unexpected combination by the use of a smaller corpus. Disclosed is an unexpectedness determination system provided with: category identifying means which identifies a category to which a word belongs; category co-occurrence frequency identifying means which identifies a category co-occurrence frequency between two categories; and unexpectedness index calculating means which calculates an index representing a degree of unexpectedness of a combination of two words. The category identifying means identifies a first category, to which an inputted first word belongs, and a second category, to which an inputted second word belongs; the category co-occurrence frequency identifying means identifies the category co-occurrence frequencies between the first category and categories other than the first category; and the unexpectedness index calculating means calculates an index representing the degree of unexpectedness of the combination of the first word and the second word on the basis of the category co-occurrence frequencies identified by the category co-occurrence frequency identifying means.
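
    The abstract does not state how the index is computed from the category co-occurrence frequencies. As a minimal sketch under stated assumptions, the following takes the index to be one minus the share of the first category's co-occurrences (with all other categories) that fall on the second category. The formula, the function name `unexpectedness`, the frozenset-keyed frequency table, and the sample data are all hypothetical and are not taken from the patent.

```python
def unexpectedness(word1, word2, word_category, cooccurrence):
    """Return a score in [0, 1]; higher means a more unexpected word pair.

    word_category: maps each word to its category (hypothetical layout).
    cooccurrence: maps frozenset({cat_a, cat_b}) to a co-occurrence count.
    The scoring formula is an illustrative assumption.
    """
    c1 = word_category[word1]
    c2 = word_category[word2]
    if c1 == c2:
        # Assumption: two words from the same category are never unexpected.
        return 0.0
    # Co-occurrence counts between the first category and every other category.
    with_c1 = {}
    for pair, count in cooccurrence.items():
        if c1 in pair and len(pair) == 2:
            (other,) = pair - {c1}
            with_c1[other] = count
    total = sum(with_c1.values())
    if total == 0:
        # No evidence at all for the first category: treat as fully unexpected.
        return 1.0
    return 1.0 - with_c1.get(c2, 0) / total


# Hypothetical sample data.
word_category = {"ramen": "food", "chopsticks": "tableware", "violin": "instrument"}
cooccurrence = {
    frozenset({"food", "tableware"}): 90,
    frozenset({"food", "instrument"}): 10,
    frozenset({"tableware", "instrument"}): 5,
}
common = unexpectedness("ramen", "chopsticks", word_category, cooccurrence)
rare = unexpectedness("ramen", "violin", word_category, cooccurrence)
```

    Because counts are kept per category rather than per word pair, a score can be produced even for word pairs that never appear together in the corpus, which is consistent with the abstract's aim of working from a smaller corpus.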

    LINKAGE INFORMATION OUTPUT APPARATUS, LINKAGE INFORMATION OUTPUT METHOD AND COMPUTER-READABLE RECORDING MEDIUM
    6.
    Invention application
    LINKAGE INFORMATION OUTPUT APPARATUS, LINKAGE INFORMATION OUTPUT METHOD AND COMPUTER-READABLE RECORDING MEDIUM (In force)

    Publication No.: US20130007021A1

    Publication date: 2013-01-03

    Application No.: US13583805

    Filing date: 2010-12-28

    IPC classification: G06F17/30

    CPC classification: G06F17/3061 G06F17/30882

    Abstract: A linkage information output apparatus includes: a linkage information retrieval unit for acquiring from a linkage information accumulation unit, upon receiving source information, destination information linked with the source information, a frequency of occurrence of the source information, a frequency of occurrence of each piece of linked destination information, and a frequency of occurrence of the link between the source information and each piece of destination information; a recognition degree calculation unit calculating, based on each acquired frequency of occurrence, a recognition degree of the source information, a recognition degree of each piece of acquired destination information, and a recognition degree of each link; and a high interest information narrowing unit selecting destination information to output from among the pieces of destination information based on a combination of two or more of the recognition degree of the source information, the recognition degree of the destination information, and the recognition degree of the link.

    INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
    7.
    Invention application
    INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM (In force)

    Publication No.: US20120303611A1

    Publication date: 2012-11-29

    Application No.: US13522278

    Filing date: 2010-12-21

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: The information processing device 1 processes document collections in which tags permitting semantic class identification are appended to each document. It comprises a search unit 2, which creates multiple semantic class units, each containing one or more semantic classes, based on a taxonomy that identifies relationships between semantic classes, and a frequency calculation unit 3, which, for each of the semantic class units, identifies documents in the document collections that match that semantic class unit and, for these matching documents, calculates a first frequency that represents the frequency of occurrence in a designated document collection and a second frequency that represents the frequency of occurrence in non-designated document collections. Once the calculations have been performed, the search unit 2 identifies any of the semantic class units based on the first frequency and the second frequency of the matching documents.

    Document management and retrieval system and document management and retrieval method
    8.
    Invention grant
    Document management and retrieval system and document management and retrieval method (In force)

    Publication No.: US09454597B2

    Publication date: 2016-09-27

    Application No.: US12741302

    Filing date: 2008-11-06

    IPC classification: G06F7/00 G06F17/30

    Abstract: A document management and retrieval system is configured to: store, for each word in a set of words, the appearance positions of the word in a set of documents as a word index; store, for each tag in a set of tags attached to words, the set of words that appear to the right and left of the tag, and also store, as a tag LR index, the appearance positions of each tag in the set of documents, keyed by a combination of the tag and a word appearing to its right or a combination of the tag and a word appearing to its left; and, in a tag search where a query phrase contains words and a tag next to each other, refer to the index using the tag and the word to its right or left as a key, thereby reducing the size of the document list to be read without needing to have a tag name as a secondary key. A tag is updated by just updating two places in the tag LR index.
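
    The tag LR index described above can be illustrated with a short sketch. It assumes a simplified model that is not taken from the patent: each document is a list of word tokens, and each tag is a span `(tag_name, start, end)` over those tokens. The function names, the `"L"`/`"R"` side markers, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_tag_lr_index(docs):
    """Build an index keyed by (tag, side, neighbouring word).

    docs: {doc_id: (tokens, tags)} where tags is a list of
    (tag_name, start, end) spans covering tokens[start:end].
    Each tag occurrence is stored in at most two places: under its
    left-neighbour key and under its right-neighbour key.
    """
    index = defaultdict(list)
    for doc_id, (tokens, tags) in docs.items():
        for name, start, end in tags:
            if start > 0:  # a word exists to the left of the tag
                index[(name, "L", tokens[start - 1])].append((doc_id, start))
            if end < len(tokens):  # a word exists to the right of the tag
                index[(name, "R", tokens[end])].append((doc_id, start))
    return index


def search_tag_next_to_word(index, tag, side, word):
    """Return (doc_id, position) pairs where `tag` has `word` on the given side."""
    return index.get((tag, side, word), [])


# Hypothetical sample documents.
docs = {
    1: (["visit", "Tokyo", "Tower", "today"], [("PLACE", 1, 3)]),
    2: (["visit", "Kyoto", "tomorrow"], [("PLACE", 1, 2)]),
}
index = build_tag_lr_index(docs)
before_today = search_tag_next_to_word(index, "PLACE", "R", "today")
after_visit = search_tag_next_to_word(index, "PLACE", "L", "visit")
```

    A query such as "visit followed by a PLACE tag" reads only the posting list under the key `("PLACE", "L", "visit")` instead of every occurrence of the tag, and changing one tag occurrence touches only its left-neighbour and right-neighbour entries.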

    RELIABILITY CALCULATION APPARATUS, RELIABILITY CALCULATION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
    9.
    Invention application
    RELIABILITY CALCULATION APPARATUS, RELIABILITY CALCULATION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM (Under examination, published)

    Publication No.: US20140114930A1

    Publication date: 2014-04-24

    Application No.: US14127592

    Filing date: 2012-12-19

    IPC classification: G06F17/30

    CPC classification: G06F16/93 G06Q50/10

    Abstract: In order to calculate a reliability that serves as an index of how reliable an evaluator who evaluated a document is, a reliability calculation apparatus (2) is provided with a reliability calculation unit (21) that specifies the evaluation given by each evaluator with respect to each author, based on first information specifying the respective correspondence relationships between documents targeted for evaluation, the evaluators who evaluated the documents, and the contents of the evaluations, and on second information specifying the respective correspondence relationships between the documents and the authors of the documents, and that calculates the reliability of each evaluator based on the specified evaluations with respect to each author.

    DATA STRUCTURE, INDEX CREATION DEVICE, DATA SEARCH DEVICE, INDEX CREATION METHOD, DATA SEARCH METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
    10.
    Invention application
    DATA STRUCTURE, INDEX CREATION DEVICE, DATA SEARCH DEVICE, INDEX CREATION METHOD, DATA SEARCH METHOD, AND COMPUTER-READABLE RECORDING MEDIUM (In force)

    Publication No.: US20130262470A1

    Publication date: 2013-10-03

    Application No.: US13824740

    Filing date: 2011-06-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30622

    Abstract: Among the inverted lists of the nodes in a taxonomy, the inverted list of the highest node is a list of integer values indicating identifiers of search subject data, and the inverted list of each node other than the highest node is, in place of the identifiers, a list of integer values indicating positions in the inverted list of the node one level higher than that node. Furthermore, the list of integer values in the inverted list of each node is divided into two or more blocks, and the differential value between each integer value and the integer value directly before it in the block is converted into a bit string of a variable-length integer code.
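
    The block-wise differential coding in this abstract can be sketched as follows. The abstract does not name a specific variable-length integer code, so this sketch assumes a common byte-oriented scheme (7 payload bits per byte, high bit as a continuation flag) and assumes that the first value of each block is stored as its difference from zero. All function names and the sample lists are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer with 7 payload bits per byte (assumed code)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a byte string produced by concatenating encode_varint outputs."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_blocks(postings, block_size):
    """Split an ascending list into blocks and store gaps within each block."""
    blocks = []
    for start in range(0, len(postings), block_size):
        prev = 0  # assumption: each block restarts from zero
        encoded = bytearray()
        for value in postings[start:start + block_size]:
            encoded += encode_varint(value - prev)
            prev = value
        blocks.append(bytes(encoded))
    return blocks


def decode_blocks(blocks):
    """Rebuild the original list by summing the gaps inside each block."""
    postings = []
    for block in blocks:
        prev = 0
        for gap in decode_varints(block):
            prev += gap
            postings.append(prev)
    return postings


def to_parent_positions(parent_ids, child_ids):
    """Replace identifiers in a child node's list by positions in its parent's list."""
    position = {doc_id: pos for pos, doc_id in enumerate(parent_ids)}
    return [position[doc_id] for doc_id in child_ids]


# Hypothetical sample data.
root_list = [3, 7, 300, 301, 1000]    # highest node: identifiers
child_list = to_parent_positions(root_list, [7, 301, 1000])
blocks = encode_blocks(root_list, block_size=2)
restored = decode_blocks(blocks)
```

    Because every block restarts its gaps independently, a single block can be decoded without reading the blocks before it, and storing positions in the parent's list keeps the integers in lower nodes small, so they need fewer bytes under a variable-length code.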
