DATA PROCESSING METHOD, DATA PROCESSING SYSTEM, AND PROGRAM
    22.
    Invention publication
    Status: under examination (published)

    Publication number: EP1429258A1

    Publication date: 2004-06-16

    Application number: EP02746128.4

    Filing date: 2002-07-19

    IPC classification: G06F17/28

    Abstract: [Object] To provide a support system and method that enable efficient generation of candidate synonyms when creating a thesaurus usable in text mining.
    [Constitution] A candidate synonym acquisition device 130 acquires, from data 110 for each writer, a set of candidate synonyms similar to an input word for each writer, and acquires a set of candidate synonyms similar to the input word from collective data 120. The generated candidate synonym set 140 is input to a candidate synonym determination device 150, which evaluates the candidate synonyms of the collective data 120. In the evaluation, the status "absolute" is given to a word matching the word ranked first among the candidate synonyms for each writer, and the status "negative" is given to words matching the words ranked second and lower.
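The evaluation step described in this abstract can be illustrated with a minimal Python sketch. The function name `evaluate_candidates`, the data layout, and the "undetermined" fallback status are hypothetical and not taken from the patent. The sketch also assumes that "absolute" takes precedence when a word is ranked first for one writer and lower for another, which the abstract does not specify.

```python
def evaluate_candidates(collective, per_writer):
    """Assign a status to each candidate synonym from the collective data.

    collective: ranked list of candidate synonyms from the collective data.
    per_writer: dict mapping a writer id to that writer's ranked candidates.
    Returns a dict mapping each collective candidate to a status string.
    """
    # Words ranked first for at least one writer.
    absolute = {ranked[0] for ranked in per_writer.values() if ranked}
    # Words ranked second or lower for at least one writer.
    negative = {w for ranked in per_writer.values() for w in ranked[1:]}

    status = {}
    for word in collective:
        if word in absolute:          # assumption: "absolute" wins conflicts
            status[word] = "absolute"
        elif word in negative:
            status[word] = "negative"
        else:                         # hypothetical fallback, not in the abstract
            status[word] = "undetermined"
    return status


result = evaluate_candidates(
    ["client", "customer", "user", "vendor"],
    {"writer_a": ["customer", "user"], "writer_b": ["client", "customer"]},
)
```

Here "customer" is ranked first for one writer and second for another, so the precedence assumption decides its status.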

    STATISTICAL THESAURUS, METHOD OF FORMING SAME, AND USE THEREOF IN QUERY EXPANSION IN AUTOMATED TEXT SEARCHING
    23.
    Invention publication
    Status: lapsed

    Publication number: EP0901660A4

    Publication date: 2001-07-04

    Application number: EP97908789

    Filing date: 1997-03-07

    IPC classification: G06F17/30

    Abstract: A statistical thesaurus is built dynamically from the same text collection that is being searched, allowing improved generation of expanded query terms. The thesaurus is dynamic in that thesaurus records are collected, ranked, accessed, and applied dynamically. Thesaurus "records" are actually formed as indexed documents arranged in "collections". The collections are preferably distinguished based on text source. Each record has terms assembled in indexed groups which inherently reflect a ranking based on relevance to an initial query. After an initial query is received, the appropriate collection(s) of records may be searched by a conventional search and retrieval engine, the searches inherently returning records ranked by degree of relevance due to the record indexing scheme. A record ranking scheme avoids contamination of relevant records by less relevant records. The record selection and the expansion query term generation processes are each divided into parallel threads. The separate threads correspond to respective text sources to enable the improved expansion query term generation to be provided in real time.

    Method and apparatus for integrating a dynamic lexicon into a full-text information retrieval system
    24.
    Invention publication
    Status: lapsed

    Publication number: EP0520488A3

    Publication date: 1993-10-13

    Application number: EP92110807.2

    Filing date: 1992-06-26

    IPC classification: G06F15/401

    Abstract: An information retrieval system including a plurality of indices representative of information stored in the information retrieval system and a dynamic lexicon is disclosed. The system includes memory having a database stored therein, the database being logically divided to include the plurality of indices, an information database having information objects stored therein, and a dynamic lexicon which includes a plurality of data items and groups of data items that appear in the information database. A predetermined time variable represents the last time the plurality of indices were reindexed. When changes are made to the lexicon, a time stamp is attached to each change to indicate when it was made. At some specified time interval later, the reindexing process is invoked. This process involves selecting a subset of the changes made to the lexicon after the predetermined time variable, locating all information objects in the information database that are affected by those changes, reindexing the portions of the plurality of indices representative of the affected information objects to reflect the changes in the lexicon, and then updating the predetermined time variable to indicate that the changes have been processed. The foregoing process is repeated until all changes made to the lexicon after the predetermined time have been applied to the plurality of indices.
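The batched reindexing loop described in this abstract might look roughly like the following Python sketch. All names (`LexiconChange`, `RetrievalSystem`, `batch_size`) are hypothetical. Substring matching stands in for whatever mechanism the patent uses to locate affected information objects, and the index is reduced to a single term-to-documents map.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconChange:
    term: str          # lexicon entry that was added or changed
    timestamp: float   # when the change was made


@dataclass
class RetrievalSystem:
    documents: dict                                  # doc id -> text
    index: dict = field(default_factory=dict)        # term -> set of doc ids
    changes: list = field(default_factory=list)      # time-stamped changes
    last_reindexed: float = 0.0                      # the "time variable"

    def add_lexicon_term(self, term, now):
        self.changes.append(LexiconChange(term, now))

    def reindex(self, batch_size=2):
        """Apply pending lexicon changes batch by batch until none remain."""
        while True:
            pending = sorted(
                (c for c in self.changes if c.timestamp > self.last_reindexed),
                key=lambda c: c.timestamp,
            )
            if not pending:
                break
            # Take a subset; include ties so no change is skipped.
            cutoff = pending[min(batch_size, len(pending)) - 1].timestamp
            batch = [c for c in pending if c.timestamp <= cutoff]
            for change in batch:
                # Locate affected documents (assumption: substring match).
                affected = {d for d, text in self.documents.items()
                            if change.term in text}
                self.index[change.term] = affected
            # Record that changes up to the cutoff have been processed.
            self.last_reindexed = cutoff


system = RetrievalSystem({1: "new york city", 2: "york minster", 3: "paris"})
system.add_lexicon_term("new york", 1.0)
system.add_lexicon_term("york", 2.0)
system.add_lexicon_term("paris", 3.0)
system.reindex(batch_size=2)
```

Processing a bounded subset per pass keeps each reindexing step short, at the cost of repeating the scan until the time variable catches up with the newest change.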

    METHODS FOR MANAGING APPLICATIONS USING SEMANTIC MODELING AND TAGGING AND DEVICES THEREOF
    25.
    Invention publication
    Status: under examination (published)

    Publication number: EP2973047A1

    Publication date: 2016-01-20

    Application number: EP14762904.2

    Filing date: 2014-03-14

    Applicant: PTC Inc.

    IPC classification: G06F17/30

    Abstract: The present disclosure provides a system and method for managing data using semantic tags. The method may include providing a data model corresponding to a first set of tangible objects, where the data model includes a first template class having both properties describing the set of tangible objects and a set of semantic tags corresponding to the properties. The method may include receiving a class definition for a second template class for a second set of tangible objects, where the second template class inherits, by the class definition, the properties and the semantic tags for the second set of tangible objects.
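As a rough illustration of a template class whose properties and semantic tags are inherited by a second class, consider the Python sketch below. The class names, property names, and tag vocabulary are invented for this example, and the extra `dose_volume` property on the subclass goes beyond what the abstract states.

```python
class Pump:
    """First template class: properties plus semantic tags describing them."""

    # property name -> set of semantic tags (hypothetical tag vocabulary)
    semantic_tags = {
        "flow_rate": {"measurement", "liters-per-minute"},
        "serial_number": {"identifier"},
    }

    def __init__(self, flow_rate, serial_number):
        self.flow_rate = flow_rate
        self.serial_number = serial_number

    @classmethod
    def properties_tagged(cls, tag):
        """Return the names of properties carrying the given semantic tag."""
        return sorted(p for p, tags in cls.semantic_tags.items() if tag in tags)


class DosingPump(Pump):
    """Second template class: inherits properties and tags by its definition."""

    # Inherited tags plus one illustrative addition (not in the abstract).
    semantic_tags = {**Pump.semantic_tags,
                     "dose_volume": {"measurement", "milliliters"}}

    def __init__(self, flow_rate, serial_number, dose_volume):
        super().__init__(flow_rate, serial_number)
        self.dose_volume = dose_volume


measurements = DosingPump.properties_tagged("measurement")
```

Querying by tag rather than by property name lets code address both classes uniformly, which appears to be the point of attaching tags to the template.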

    METHOD AND APPARATUS FOR IDENTIFYING SYNONYMS AND USING SYNONYMS TO SEARCH
    27.
    Invention publication
    Status: under examination (published)

    Publication number: EP2425353A4

    Publication date: 2014-05-28

    Application number: EP10769390

    Filing date: 2010-04-23

    IPC classification: G06F17/30, G06F17/27

    Abstract: A method and an apparatus for identifying synonyms and using the identified synonyms to conduct a search are disclosed. The disclosed method includes: obtaining any two words to be identified; determining whether the shortest edit distance between the two words is less than or equal to an edit distance threshold; determining whether the two words exist in a preset knowledge database and, if so, looking up for each word the smallest-granularity type with the highest weight value in the knowledge database; and, if the two words have the same smallest-granularity type with the highest weight value, determining that the two words are synonyms, and otherwise that they are not. The disclosed techniques improve the accuracy and effectiveness of synonym identification.
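The two checks in this abstract (edit distance, then type comparison in a knowledge database) can be sketched in Python as follows. The knowledge-base layout, the convention that a smaller granularity number means a finer type, the default threshold, and the function names are all assumptions made for illustration. The abstract also does not say how the two checks combine, so this sketch requires both to pass.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_type(word, kb):
    """Highest-weight type among the word's finest-granularity types.

    kb maps a word to a list of (type_name, granularity, weight) tuples;
    a smaller granularity number is assumed to mean a finer type.
    """
    entries = kb.get(word)
    if not entries:
        return None
    finest = min(g for _, g, _ in entries)
    return max((w, t) for t, g, w in entries if g == finest)[1]


def are_synonyms(w1, w2, kb, max_distance=2):
    """Both checks must pass (an assumption about how they combine)."""
    if edit_distance(w1, w2) > max_distance:
        return False
    t1, t2 = best_type(w1, kb), best_type(w2, kb)
    return t1 is not None and t1 == t2


kb = {
    "colour": [("attribute", 1, 0.9), ("thing", 3, 0.5)],
    "color": [("attribute", 1, 0.8)],
    "colon": [("punctuation", 1, 0.7)],
}
same = are_synonyms("colour", "color", kb)
different = are_synonyms("color", "colon", kb)
```

The pair "color" and "colon" shows why the edit-distance check alone is insufficient: the strings differ by one character, but their types in the knowledge base differ.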

    Related-word registration device, information processing device, related-word registration method, program for related-word registration device, and recording medium
    28.
    Invention publication
    Status: granted

    Publication number: EP2650805A2

    Publication date: 2013-10-16

    Application number: EP13175515.9

    Filing date: 2011-11-07

    Applicant: Rakuten, Inc.

    Inventor: Hirate, Yu

    IPC classification: G06F17/30

    Abstract: A search query consisting of a search word entered by a user is received; the received search queries are stored in order of reception in a search query storing means (12a); a preceding search query, received earlier than the received search query, is extracted from the search query storing means on the basis of a preset search query extracting condition; a preceding search word constituting the extracted preceding search query and the search word constituting the received search query are stored as a character string set in a character string set storing means (12d); character string sets having a search word that is the same as or similar to the preceding search word are extracted from the character string set storing means in accordance with a preset character string set extraction start condition (S51); a character string set is specified as related words from the extracted character string sets on the basis of a preset registration condition (S53); and the specified character string set is registered as related words in a related-word database (S54).

    RELATED-WORD REGISTRATION DEVICE, INFORMATION PROCESSING DEVICE, RELATED-WORD REGISTRATION METHOD, PROGRAM FOR RELATED-WORD REGISTRATION DEVICE, AND RECORDING MEDIUM
    29.
    Invention publication
    Status: granted

    Publication number: EP2639705A1

    Publication date: 2013-09-18

    Application number: EP11839828.8

    Filing date: 2011-11-07

    Applicant: Rakuten, Inc.

    Inventor: HIRATE Yu

    IPC classification: G06F17/30

    Abstract: A related-word candidate group (12b) is generated by extracting related-word candidates from a search query log (12a) on the basis of a predetermined condition (S1 to S4); a search query of a search word entered by the user is received (S10); partial character strings are generated from the character string of the search word (S13); on the basis of the partial character strings, candidate character strings are extracted from the related-word candidate group (S14); a suitability score is calculated for each candidate character string (S16); the candidate character strings are ranked in order of score (S17); a reference line L1 of the suitability score against the ranking is generated on the basis of the suitability scores and the ranking (S18); a candidate character string whose suitability score deviates from the reference line by a preset threshold or more is extracted as a registration character string to be registered as a related word (S19); and the extracted registration character string and the search word are registered as related words in the related-word DB 12c (S20).

    INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM
    30.
    Invention publication
    Status: under examination (published)

    Publication number: EP1574968A1

    Publication date: 2005-09-14

    Application number: EP03778809.8

    Filing date: 2003-12-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30737, G06F17/2795

    Abstract: The invention makes it possible to detect the characteristics of text data and to infer, by analogy, potential hidden meaning in the text data.
    A word-cutting unit 3 performs a word-cutting process on the text data input from an input unit 1, a syntax-analysis unit 4 performs syntax analysis, and a thesaurus-creation unit 5 creates thesauruses from the results. After word cutting and syntax analysis are performed again, a thesaurus-sorting unit 7 performs sorting, a frequency-of-appearance unit calculates the frequency of appearance of the thesauruses, a correlation-coefficient-calculation unit 11 calculates correlation coefficients between thesauruses, a correlation-coefficient-total-calculation unit 13 calculates the total of the correlation coefficients for each thesaurus, and a graph-creation-display unit 15 creates a graph based on the frequency of appearance and the total of the correlation coefficients for each thesaurus.
