COMPUTER IMPLEMENTED METHOD FOR INDEXING AND RETRIEVING DOCUMENTS IN DATABASE AND INFORMATION RETRIEVAL SYSTEM
    1.
    Published patent application
    Status: under examination (published)

    Publication No.: EP2248051A1

    Publication date: 2010-11-10

    Application No.: EP09715807.5

    Application date: 2009-02-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30681 G10L15/26

    Abstract: An information retrieval system stores and retrieves documents using particles and a particle-based language model. A set of particles for a collection of documents in a particular language is constructed from training documents such that the perplexity of the particle-based language model is substantially lower than the perplexity of a word-based language model constructed from the same training documents. The documents can then be converted to document particle graphs, from which particle-based keys are extracted to form an index to the documents. Users can then retrieve relevant documents using queries that are also in the form of particle graphs.
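
    The indexing and query path described in this abstract can be sketched as below. This is a minimal illustration under loud assumptions: `to_particles` is a hypothetical stand-in that chops words into fixed three-letter chunks (the patented system instead learns its particle set from training documents so as to lower language-model perplexity), each particle graph is reduced to a single linear path, and keys are particle bigrams along that path. `ParticleIndex` and its methods are illustrative names, not taken from the source.

```python
from collections import defaultdict


def to_particles(text: str, size: int = 3) -> list[str]:
    """HYPOTHETICAL particle segmenter: cut each word into fixed-size letter
    chunks. The patented system learns its particle set from training data;
    this split only stands in for it."""
    particles: list[str] = []
    for word in text.lower().split():
        particles.extend(word[i:i + size] for i in range(0, len(word), size))
    return particles


def particle_keys(particles: list[str], n: int = 2) -> set[tuple[str, ...]]:
    """Keys are particle n-grams along one path, i.e. a particle graph
    simplified to a single linear sequence (an assumption)."""
    if len(particles) < n:
        return {tuple(particles)} if particles else set()
    return {tuple(particles[i:i + n]) for i in range(len(particles) - n + 1)}


class ParticleIndex:
    """Inverted index from particle keys to document ids."""

    def __init__(self) -> None:
        self.postings: dict[tuple[str, ...], set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for key in particle_keys(to_particles(text)):
            self.postings[key].add(doc_id)

    def search(self, query: str) -> list[tuple[int, int]]:
        """Return (doc_id, shared_key_count) pairs, best match first."""
        scores: dict[int, int] = defaultdict(int)
        for key in particle_keys(to_particles(query)):
            for doc_id in self.postings.get(key, set()):
                scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = ParticleIndex()
index.add(1, "information retrieval")
index.add(2, "speech recognition")
print(index.search("retrieval"))  # [(1, 2)]
```

    Because documents and queries pass through the same segmenter and key extractor, a query matches any document that shares particle keys with it, even across word boundaries.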

    CONCEPT BASED CROSS MEDIA INDEXING AND RETRIEVAL OF SPEECH DOCUMENTS
    4.
    Published patent application
    Status: under examination (published)

    Publication No.: EP2030132A2

    Publication date: 2009-03-04

    Application No.: EP07777361.2

    Application date: 2007-06-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30746 G06F17/30681

    Abstract: Indexing, searching, and retrieving the content of speech documents (including but not limited to recorded books, audio broadcasts, and recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if the speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space in which term-phoneme and document vectors are related conceptually as nearest neighbors. A query engine computes a cosine value between the query vector and all other vectors in the space and returns a list of the term-phonemes and/or documents with the highest cosine values.
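
    The matrix, SVD, and cosine steps can be sketched as a small latent-semantic-indexing example using NumPy. Assumptions not in the source: tokens are plain strings standing in for term-phonemes (which in the patent's setting would be derived from speech), the matrix holds raw counts with no weighting, all documents are placed in the matrix at once rather than added after training, only documents (not term-phonemes) are ranked, and the helper names are hypothetical.

```python
import numpy as np


def build_matrix(docs: list[list[str]]) -> tuple[list[str], np.ndarray]:
    """Term/document count matrix: one row per term, one column per document."""
    vocab = sorted({t for doc in docs for t in doc})
    row = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[row[t], j] += 1.0
    return vocab, A


def lsi_cosines(query: list[str], vocab: list[str], A: np.ndarray, k: int) -> np.ndarray:
    """Cosine between the query and every document in a rank-k SVD space."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]                      # first k left singular vectors
    row = {t: i for i, t in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for t in query:
        if t in row:                   # terms unseen in training are ignored
            q[row[t]] += 1.0
    qk = Uk.T @ q                      # query projected into the reduced space
    Dk = Uk.T @ A                      # documents projected the same way (columns)
    norms = np.linalg.norm(Dk, axis=0) * np.linalg.norm(qk)
    return (qk @ Dk) / np.where(norms == 0.0, 1.0, norms)


docs = [["car", "engine"], ["automobile", "engine"], ["banana", "fruit"]]
vocab, A = build_matrix(docs)
print(lsi_cosines(["car"], vocab, A, k=2))
```

    With k=2 the query "car" scores the "automobile engine" document at a cosine of about 1, although that document never contains "car", because both documents load on the same reduced dimension through the shared term "engine"; with k=3 (no reduction for this 3-document matrix) its cosine falls to 0. This is the conceptual nearest-neighbour effect the abstract describes.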

    A morphological/phonetic method for ranking word similarities
    8.
    Published patent application
    Status: lapsed

    Publication No.: EP0271664A3

    Publication date: 1991-11-27

    Application No.: EP87115183.3

    Application date: 1987-10-16

    IPC classification: G06F17/20

    Abstract: A computer method is disclosed for ranking word similarities that is applicable to a variety of dictionary applications, such as synonym generation, linguistic analysis, and document characterization. The method transforms an input word string into a key word that is invariant under certain types of errors in the input word, such as doubled letters, consonant/vowel transpositions, and consonant/consonant transpositions. The specific mapping technique is a morphological mapping that generates keys whose similarities can be detected during a subsequent ranking procedure. The mapping lists the unique consonants of the input word in their original order, followed by the unique vowels of the input word, also in their original order. Keys generated this way are invariant under consonant/vowel transpositions and doubled letters. The utility of the keys is further improved by arranging the consonants of each key in alphabetical order, followed by its vowels in alphabetical order. The resulting mapping is insensitive to consonant/consonant transpositions as well as to consonant/vowel transpositions and doubled letters. The method then applies a ranking technique that uses a compound measure of similarity to rank the key words. It first measures the number of basic operations needed to convert an input-derived key word into a dictionary-derived key word (the higher the number, the less similar the words), and then measures the length of identical character segments in each pair of key words being matched (the longer the length, the greater the similarity). Together these yield a scoring system for ranking the similarity of an input word to dictionary-derived key words that ignores misspellings in the input word.
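
    The two key mappings and the compound ranking can be sketched as follows. Assumptions not stated in the abstract: vowels are a, e, i, o, u (so y counts as a consonant), non-letters are dropped, the "basic operations" are Levenshtein insertions, deletions, and substitutions, the "identical character segments" are measured as the longest common substring, and the two measures are combined lexicographically (fewer edits first, a longer shared run breaks ties). All function names are hypothetical.

```python
VOWELS = frozenset("aeiou")  # assumption: y is treated as a consonant


def _letters(word: str) -> list[str]:
    return [ch for ch in word.lower() if ch.isalpha()]


def order_key(word: str) -> str:
    """Unique consonants in original order, then unique vowels in original order."""
    unique = list(dict.fromkeys(_letters(word)))  # de-duplicate, keep first occurrence
    return "".join([c for c in unique if c not in VOWELS] + [c for c in unique if c in VOWELS])


def sorted_key(word: str) -> str:
    """Unique consonants alphabetically, then unique vowels alphabetically."""
    unique = set(_letters(word))
    return "".join(sorted(unique - VOWELS) + sorted(unique & VOWELS))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest identical character segment shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
        best = max(best, max(cur))
        prev = cur
    return best


def rank(word: str, dictionary: list[str]) -> list[str]:
    """Fewer edits first; among equal edit counts, a longer shared run wins."""
    key = sorted_key(word)
    scored = []
    for entry in dictionary:
        other = sorted_key(entry)
        scored.append((edit_distance(key, other), -longest_common_run(key, other), entry))
    return [entry for _, _, entry in sorted(scored)]


print(rank("recieve", ["deceive", "relieve", "receive"]))  # ['receive', 'relieve', 'deceive']
```

    The misspelling "recieve" maps to the same key as "receive" (both become "crvei"), so it ranks first with zero edits; "relieve" and "deceive" both need one edit, and "relieve" wins the tie because its key shares the longer run "rvei".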

    METHOD AND SYSTEM FOR GENERATING PHONETICALLY SIMILAR MASKED DATA
    10.
    Published patent application
    Status: under examination (published)

    Publication No.: EP3258399A1

    Publication date: 2017-12-20

    Application No.: EP17176384.0

    Application date: 2017-06-16

    IPC classification: G06F17/30 G06Q10/10

    Abstract: A method and system are provided for generating a group of phonetically similar masked data. The method comprises: preprocessing input dataset values comprising a list of fictitious data values to be used as masked data; determining a plurality of groups of phonetically similar data values present in the dataset list; deriving a Metaphone code for each input data value to be masked; generating a first numeric code from the derived Metaphone value of the input data value to be masked; selecting one group of phonetically similar data values out of the plurality of groups based on the generated first numeric code; and generating a second numeric code from the input data value for selecting a masked value from a plurality of fictitious data groups.
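
    The masking pipeline can be sketched as follows. The abstract names Metaphone; to stay short, this sketch substitutes the simpler American Soundex code, which is a different phonetic algorithm. Also assumed: preprocessing means trimming, dropping empty entries, and de-duplicating; both numeric codes are CRC-32 checksums (`zlib.crc32`) reduced modulo the number of groups or group members; `soundex`, `numeric_code`, `build_groups`, and `mask` are hypothetical names.

```python
import zlib
from collections import defaultdict

# Soundex digit classes (stand-in for Metaphone); vowels, h, w, y get no digit.
_CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
          **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}


def soundex(name: str) -> str:
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out, prev = letters[0].upper(), _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (out + "000")[:4]


def numeric_code(text: str) -> int:
    """Deterministic numeric code (CRC-32), stable across runs unlike hash()."""
    return zlib.crc32(text.encode("utf-8"))


def build_groups(fictitious: list[str]) -> list[list[str]]:
    """Preprocess the fictitious values and group them by phonetic code."""
    cleaned = sorted({v.strip() for v in fictitious if v.strip()})
    groups: dict[str, list[str]] = defaultdict(list)
    for value in cleaned:
        groups[soundex(value)].append(value)
    return [groups[key] for key in sorted(groups)]


def mask(value: str, groups: list[list[str]]) -> str:
    first = numeric_code(soundex(value))           # first code: from the phonetic code
    group = groups[first % len(groups)]
    second = numeric_code(value.strip().lower())   # second code: from the raw input value
    return group[second % len(group)]


fake = ["Smith", "Smyth", "Schmidt", "Jones", "Johns", "Brown", "Braun"]
groups = build_groups(fake)
print(mask("Smith", groups), mask("Smithe", groups))
```

    Because phonetically equivalent inputs share a first numeric code, spelling variants such as "Smith" and "Smithe" are always masked from the same group of phonetically similar fictitious values, while the second code, computed from the raw value, can pick different members of that group. Masking is deterministic, so the same input always yields the same masked value.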
