System and method for the recognition of organic chemical names in text documents
    1.
    Type: Invention application
    Status: Lapsed

    Publication number: US20050065776A1

    Publication date: 2005-03-24

    Application number: US10670675

    Filing date: 2003-09-24

    CPC classification number: G06F17/278

    Abstract: This invention provides a method, a system and a computer program for recognizing technical terms. In the preferred embodiment the technical terms are chemical names, and in a most preferred embodiment the technical terms are organic chemical names. A computer program product stores in a computer readable form a set of computer program instructions for directing at least one computer to process a text document. The set of computer program instructions include instructions for assigning corresponding associated parts of speech to words found in the document. The instructions for assigning include instructions to apply a plurality of regular expressions, rules and a plurality of dictionaries to recognize organic chemical name fragments, to combine recognized organic chemical name fragments into a complete organic chemical name, and to assign the complete organic chemical name with one part of speech. The regular expressions include a plurality of patterns, individual ones of which are comprised of at least one of characters, numbers and punctuation. For example, the punctuation can comprise at least one of parenthesis, square bracket, hyphen, colon and semi-colon, and the characters can comprise at least one of upper case C, O, R, N and H, and further comprise strings of at least one of lower case xy, ene, ine, yl, ane and oic.
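The recognition step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary contents, the stop list, the exact regular expressions, and the `CHEM`/`WORD` tags are all assumptions made for illustration, loosely based on the cues the abstract names (locant numbers, brackets, upper case C, O, R, N, H, and lower case suffix strings). The sketch flags tokens that look like name fragments, merges adjacent fragments into one name, and gives that name a single tag.

```python
import re

# Hypothetical mini-dictionary of name fragments; a real system would use far larger ones.
FRAGMENT_DICTIONARY = {"acid", "acetic", "methyl", "ethyl", "chloride", "benzene"}

# Hypothetical stop list for everyday words that end in a chemical-looking suffix.
COMMON_WORDS = {"machine", "line", "scene", "plane", "galaxy", "medicine"}

# Illustrative patterns only; the patent's actual expressions are not given in the abstract.
FRAGMENT_PATTERNS = [
    re.compile(r"^\d+(?:,\d+)*-"),              # locant prefix such as "2,4-"
    re.compile(r"(?:xy|ene|ine|yl|ane|oic)$"),  # suffix strings listed in the abstract
    re.compile(r"^[\(\[]\S+[\)\]]"),            # leading parenthesised or bracketed group
    re.compile(r"^(?=.*\d)(?:[CORNH]\d*)+$"),   # formula-like token such as "CH3"
]

CHEM_TAG = "CHEM"   # assumed name for the single part of speech given to a full name
OTHER_TAG = "WORD"  # placeholder; a real tagger would assign ordinary parts of speech


def is_fragment(token):
    """Return True if the token looks like a chemical name fragment."""
    lowered = token.lower()
    if lowered in COMMON_WORDS:
        return False
    if lowered in FRAGMENT_DICTIONARY:
        return True
    return any(pattern.search(token) for pattern in FRAGMENT_PATTERNS)


def tag_tokens(tokens):
    """Merge runs of adjacent fragments into one name and tag each unit."""
    tagged, run = [], []
    for token in tokens:
        if is_fragment(token):
            run.append(token)
            continue
        if run:
            tagged.append((" ".join(run), CHEM_TAG))
            run = []
        tagged.append((token, OTHER_TAG))
    if run:
        tagged.append((" ".join(run), CHEM_TAG))
    return tagged


sentence = "The sample contained 2,4-dinitrophenyl acetic acid in water"
result = tag_tokens(sentence.split())
```

Here the three adjacent tokens `2,4-dinitrophenyl`, `acetic` and `acid` are combined into one unit tagged `CHEM`. Suffix patterns alone over-match ordinary English words, which is presumably why the abstract pairs the regular expressions with rules and dictionaries.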


    System and method for the indexing of organic chemical structures mined from text documents
    2.
    Type: Invention application
    Status: Lapsed

    Publication number: US20050203898A1

    Publication date: 2005-09-15

    Application number: US10797359

    Filing date: 2004-03-09

    CPC classification number: G06F19/707

    Abstract: Disclosed is a method, a computer program product and a system for processing documents that contain chemical names. The system has a unit to partition document text and to assign semantic meaning to words; a unit to recognize any substructures present in the chemical name fragments; and a unit to determine structural connectivity information of the chemical name fragments and recognized substructures and to store the determined structural connectivity information in a searchable index. The system further includes a unit to search a text index using at least one of a fragment name and a substructure name and to search the structure index by at least one of fragment connectivity and substructure connectivity. At an intersection of the search results from the structure index and the text index, the system operates to identify at least one document that contains a reference to a corresponding chemical compound.
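The two-index search described in the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented system: the class name `ChemicalDocumentIndex`, its methods, and the representation of connectivity as an unordered pair of fragment names are all hypothetical. The sketch keeps a text index keyed by fragment or substructure name and a structure index keyed by connectivity, then returns the documents found at the intersection of both lookups.

```python
from collections import defaultdict


class ChemicalDocumentIndex:
    """Toy text index plus structure index (all names are hypothetical)."""

    def __init__(self):
        # fragment or substructure name -> ids of documents mentioning it
        self.text_index = defaultdict(set)
        # unordered pair of connected fragments -> ids of documents containing it
        self.structure_index = defaultdict(set)

    @staticmethod
    def _connection_key(first, second):
        # Connectivity is modelled as an unordered pair, so (a, b) equals (b, a).
        return frozenset((first.lower(), second.lower()))

    def add_document(self, doc_id, names, connections):
        """Record the names and fragment connections extracted from one document."""
        for name in names:
            self.text_index[name.lower()].add(doc_id)
        for first, second in connections:
            self.structure_index[self._connection_key(first, second)].add(doc_id)

    def search(self, name, connection):
        """Return ids of documents matching both the name and the connectivity."""
        text_hits = self.text_index.get(name.lower(), set())
        structure_hits = self.structure_index.get(
            self._connection_key(*connection), set()
        )
        return text_hits & structure_hits


index = ChemicalDocumentIndex()
index.add_document("doc1", ["phenyl", "acetic acid"], [("phenyl", "acetic acid")])
index.add_document("doc2", ["phenyl", "ethanol"], [("phenyl", "ethanol")])

matches = index.search("phenyl", ("acetic acid", "phenyl"))
```

Both documents mention `phenyl`, but only `doc1` records a phenyl to acetic acid connection, so the intersection narrows the result to that one document.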

