SYSTEM AND METHOD FOR PROVIDING A SEARCHABLE LIBRARY OF ELECTRONIC DOCUMENTS TO A USER

    Publication No.: US20080065609A1

    Publication date: 2008-03-13

    Application No.: US11873681

    Filing date: 2007-10-17

    IPC classification: G06F17/30

    CPC classification: G06F16/284 Y10S707/99936

    Abstract: A method and system for publishing a plurality of books for user access to information includes selecting a plurality of books, converting each book from a publisher's digital form, e.g., by training a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable library database arranged, for example, as an XML database indexed by book structure such that a user may remotely, over the internet or other network, access the database, search desired content, and view an image of a portion of the book with the desired data. The system includes a user registration module to identify an authorized user, and may maintain a personal bookshelf for the user. A search engine may score search results based on their position in the hierarchy or other factors, determining the degree of relevance of text or data information located by the search engine. The other factors may include position of located search data in the hierarchy, identification of search data in the user's personal library or in a prior search by the user, or degree of match of data identified in the search. An interface with a commercially available search engine may operate to adapt the search. When provided a search query by a user, it may search for an exact match and score hits for relevance, and in the event an exact match is not found, operate to expand the query and return hits in order of rank together with an indication of the expanded search. The user may thus ascertain a degree of likely relevance of returned text or data information. The relational database may include hyperlinks to section headings and related data passages, such that a user accessing a page of a book may immediately view the related data and context of that page.
The relational database is indexed by logical subunits of the book such that expanded searches for Boolean combinations or proximity of elements span page breaks of book text to identify all instances of the desired search data. The search engine may expand a search if all hits have low ranking, and may suppress hits of low ranking when the search produces hits of high ranking. In further embodiments, the search engine may search tables, drawings and formulae of the converted book file.
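
The search behaviour summarized in this abstract (relevance scoring by position in the book hierarchy, an exact-match pass followed by query expansion with an indication that expansion occurred, suppression of low-ranking hits when high-ranking ones exist, and indexing by logical subunit so that matches span page breaks) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Subunit` record, the `LEVEL_WEIGHT` table, the `HIGH_RANK` cut-off, and the any-term expansion rule are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: a hit in a chapter heading (level 1) is assumed to be
# more relevant than one in a section heading (2) or in body text (3).
LEVEL_WEIGHT = {1: 3.0, 2: 2.0, 3: 1.0}
HIGH_RANK = 2.0  # hypothetical cut-off between "high" and "low" ranking hits


@dataclass
class Subunit:
    path: str          # position in the book hierarchy, e.g. "Ch1/Sec1.1"
    level: int         # 1 = chapter heading, 2 = section heading, 3 = body text
    pages: list[str]   # fragments of this subunit as printed on successive pages

    def tokens(self) -> list[str]:
        # Tokenize all page fragments as one stream: the logical subunit, not
        # the printed page, is the unit of search, so phrases span page breaks.
        return re.findall(r"\w+", " ".join(self.pages).lower())


def phrase_count(toks: list[str], phrase: list[str]) -> int:
    """Count occurrences of the token sequence `phrase` in `toks`."""
    n = len(phrase)
    return sum(1 for i in range(len(toks) - n + 1) if toks[i:i + n] == phrase)


def rank(units, count_fn):
    """Score each matching subunit as hierarchy weight x match count, best first."""
    hits = []
    for u in units:
        n = count_fn(u.tokens())
        if n:
            hits.append((LEVEL_WEIGHT.get(u.level, 1.0) * n, u.path))
    return sorted(hits, reverse=True)


def search(units, query: str):
    """Return (expanded, hits); `expanded` tells the user the query was widened."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return False, []
    # Pass 1: exact phrase match.
    hits = rank(units, lambda toks: phrase_count(toks, terms))
    expanded = False
    if not hits or all(score < HIGH_RANK for score, _ in hits):
        # No exact match, or only low-ranking ones: expand to any query term.
        expanded = True
        vocab = set(terms)
        hits = rank(units, lambda toks: sum(t in vocab for t in toks))
    if any(score >= HIGH_RANK for score, _ in hits):
        # High-ranking hits exist, so suppress the low-ranking ones.
        hits = [h for h in hits if h[0] >= HIGH_RANK]
    return expanded, hits


book = [
    Subunit("Ch1", 1, ["Thermal Expansion"]),
    Subunit("Ch1/Sec1.1", 3, ["Metals show linear thermal", "expansion when heated."]),
    Subunit("Ch2/Sec2.3", 3, ["Heat capacity of water."]),
]
print(search(book, "thermal expansion"))  # (False, [(3.0, 'Ch1')])
print(search(book, "linear expansion"))   # (True, [(3.0, 'Ch1'), (2.0, 'Ch1/Sec1.1')])
```

Here "thermal expansion" matches exactly in the chapter heading and, across a page break, in the body of Sec1.1; the body hit ranks low and is suppressed. "linear expansion" has no exact match, so the query is expanded and the `True` flag signals that the returned hits come from the expanded search.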

    2.
    Granted invention patent
    System and method for providing a searchable library of electronic documents to a user (in force)

    Publication No.: US07287214B1

    Publication date: 2007-10-23

    Application No.: US09734494

    Filing date: 2000-12-11

    IPC classification: G06F15/00 G06F17/00

    Abstract: A method and system for publishing a plurality of books for user access includes selecting a plurality of books, converting each book from a layout or publication digital data form by providing and applying a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable publishing database indexed by book structure such that a user may remotely access the database, search desired text data, and view an image of a portion of the book with the desired data. A search engine may score search results based on their position in the hierarchy or other factors, determining the degree of relevance of text or data information located by the search engine.
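
The conversion step in this abstract (detecting characteristic features such as typeface and headings, then extracting text tagged with those features into a database indexed by book structure) might look roughly like the following Python sketch. The abstract describes a detection tool; here a few fixed, hand-written typeface rules stand in for it, and the `LayoutLine` input format, the size thresholds, and the XML element names are all hypothetical assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class LayoutLine:
    text: str
    size: float  # typeface point size, assumed present in the layout data
    bold: bool


def classify(line: LayoutLine) -> str:
    # Hand-written stand-in for the feature-detection tool (hypothetical
    # thresholds): typeface size and weight mark headings, and a leading
    # "Figure"/"Table" marks a caption.
    if line.size >= 16:
        return "chapter"
    if line.bold and line.size >= 12:
        return "section"
    if line.text.startswith(("Figure", "Table")):
        return "caption"
    return "para"


def to_xml(title: str, lines: list[LayoutLine]) -> ET.Element:
    """Nest tagged text under <chapter>/<section> so it is indexed by book structure."""
    book = ET.Element("book", title=title)
    chapter = section = None
    for line in lines:
        kind = classify(line)
        if kind == "chapter":
            chapter = ET.SubElement(book, "chapter", heading=line.text)
            section = None
        elif kind == "section":
            parent = chapter if chapter is not None else book
            section = ET.SubElement(parent, "section", heading=line.text)
        else:
            # Attach body text and captions to the innermost open container.
            parent = next(p for p in (section, chapter, book) if p is not None)
            ET.SubElement(parent, kind).text = line.text
    return book


lines = [
    LayoutLine("1 Heat", 18, True),
    LayoutLine("1.1 Expansion", 12, True),
    LayoutLine("Metals expand when heated.", 10, False),
    LayoutLine("Table 1.1 Expansion coefficients", 9, False),
]
root = to_xml("Physics", lines)
print(root.find("chapter/section/caption").text)  # Table 1.1 Expansion coefficients
```

Because every text element sits under its chapter and section, a search hit can be reported together with its position in the hierarchy, which is what the relevance scoring in this abstract relies on.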
