ANALYSING DATA FILES
    1.
    Published patent application
    Status: under examination (published)
    Chinese-language title: Analysing Data Files

    Publication number: EP1072003A1

    Publication date: 2001-01-31

    Application number: EP99918135.7

    Filing date: 1999-04-21

    IPC classification: G06F17/30

    Abstract: Data files (205) are categorised to facilitate searching for information. The analysis identifies items that may be considered high value even though they are not directly specified. Occurrences of unspecified candidate items are identified (207) in contexts for a preferred specified category. Occurrences of unspecified candidate items are also identified (209) in non-preferred contexts. For each candidate item, the preferred occurrences are processed (211) together with the non-preferred occurrences in order to select candidate items as high-value items. In the preferred embodiment, data relating to companies is identified without specific company names being defined.
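
The abstract does not say how preferred and non-preferred occurrences are combined. As a minimal sketch, assuming that candidate items are capitalised words, that a "preferred context" means a company-related cue word (a hypothetical list such as "Ltd" or "shares") within two tokens, and that selection uses a hypothetical count-and-ratio threshold, the comparison might look like this in Python:

```python
import re
from collections import Counter

# Hypothetical cue words suggesting a company-related ("preferred") context.
PREFERRED_CUES = {"ltd", "plc", "inc", "shares", "chairman"}


def tokenize(text):
    """Split text into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)


def candidate_items(tokens):
    """Yield (index, token) for capitalised tokens that are not cue words.

    Treating capitalised words as 'unspecified candidate items' is an assumption.
    """
    for i, tok in enumerate(tokens):
        if tok[0].isupper() and tok.lower() not in PREFERRED_CUES:
            yield i, tok


def in_preferred_context(tokens, i, window=2):
    """Return True if a cue word appears within `window` tokens of position i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return any(tokens[j].lower() in PREFERRED_CUES for j in range(lo, hi) if j != i)


def select_high_value(documents, min_preferred=2, min_ratio=0.6):
    """Compare preferred and non-preferred occurrences of each candidate item.

    Returns {item: share of its occurrences that were in a preferred context}
    for items passing both hypothetical thresholds.
    """
    preferred, other = Counter(), Counter()
    for text in documents:
        tokens = tokenize(text)
        for i, tok in candidate_items(tokens):
            if in_preferred_context(tokens, i):
                preferred[tok] += 1
            else:
                other[tok] += 1
    selected = {}
    for tok, p in preferred.items():
        ratio = p / (p + other[tok])
        if p >= min_preferred and ratio >= min_ratio:
            selected[tok] = round(ratio, 2)
    return selected


docs = [
    "Shares in Acme rose after Acme Ltd reported profits",
    "Analysts said the chairman of Acme resigned",
    "Monday was quiet",
    "Acme Ltd and Globex plc agreed terms",
]
print(select_high_value(docs))  # {'Acme': 1.0}
```

Here "Acme" is selected because all four of its occurrences sit near a cue word, "Globex" has too few preferred occurrences, and "Monday" never appears in a preferred context. The cue list, window size and thresholds (`PREFERRED_CUES`, `window`, `min_preferred`, `min_ratio`) are illustrative assumptions, not values taken from the patent.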

    METHOD AND APPARATUS FOR GENERATING MACHINE-READABLE ASSOCIATION FILES
    2.
    Published patent application
    Status: under examination (published)
    Chinese-language title: Method and apparatus for creating computer-readable association files

    Publication number: EP1073974A1

    Publication date: 2001-02-07

    Application number: EP99918127.4

    Filing date: 1999-04-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30705

    Abstract: Association files (153, 154, 155) are generated that can be used to determine whether a data file (151) belongs to a predetermined category (A, B). A plurality of included files (156) that belong to the category are stored together with a plurality of excluded files (157) that do not. The included files (156) are processed to identify candidate terms for an association file (155). The suitability of each candidate term is assessed with reference to its occurrences in the included files (156). In addition, suitability is assessed with reference to occurrences in the excluded files (157), so as to provide definition terms for an association file. Thus, if a term identified as a candidate also appears frequently in the excluded files (157), it is likely to be assessed as unsuitable for inclusion in the new association file.
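
The abstract leaves the suitability test unspecified. The following Python sketch assumes one simple choice: a candidate term must appear in at least half of the included files, and it is kept only if its document frequency in the included files exceeds its document frequency in the excluded files by a hypothetical margin (`min_margin`). The membership test `belongs` and its `min_hits` threshold are likewise illustrative assumptions about how an association file might be applied to a new data file.

```python
import re


def terms(text):
    """Distinct lower-cased alphabetic terms in a file."""
    return set(re.findall(r"[a-z]+", text.lower()))


def doc_freq(files):
    """Fraction of files in which each term occurs at least once."""
    counts = {}
    for text in files:
        for t in terms(text):
            counts[t] = counts.get(t, 0) + 1
    return {t: c / len(files) for t, c in counts.items()}


def build_association_terms(included, excluded, min_included=0.5, min_margin=0.3):
    """Return definition terms for a hypothetical association file."""
    inc, exc = doc_freq(included), doc_freq(excluded)
    # Candidate terms: common across the included files.
    candidates = [t for t, f in inc.items() if f >= min_included]
    # Suitability: a term that is also common in the excluded files is rejected.
    return sorted(t for t in candidates if inc[t] - exc.get(t, 0.0) >= min_margin)


def belongs(text, association_terms, min_hits=2):
    """Hypothetical membership test: count association terms present in the file."""
    return len(terms(text) & set(association_terms)) >= min_hits


included = [
    "the striker scored a late goal",
    "the match ended after a penalty",
    "a goal in the match",
]
excluded = [
    "the budget vote was held in a late session",
    "a committee reviewed the budget",
    "the minister lit a match",
]
assoc = build_association_terms(included, excluded)
print(assoc)  # ['goal', 'match']
print(belongs("a late goal won the match", assoc))  # True
print(belongs("the minister struck a match", assoc))  # False
```

With these toy files, "the" and "a" become candidates but are rejected because they are just as common in the excluded files, while "goal" and "match" are kept. "match" survives despite one excluded occurrence because its margin (2/3 - 1/3) still clears the assumed threshold.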