Abstract:
A method and system for publishing a plurality of books for user access to information includes selecting a plurality of books, converting each book from a publisher's digital form, e.g., by training a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable library database arranged, for example, as an XML database indexed by book structure, such that a user may remotely, over the Internet or other network, access the database, search for desired content, and view an image of a portion of the book containing the desired data. The system includes a user registration module to identify an authorized user, and may maintain a personal bookshelf for the user. A search engine may score search results based on their position in the hierarchy or other factors, thereby determining the degree of relevance of text or data information located by the search engine. The other factors may include position of located search data in the hierarchy, identification of search data in the user's personal library or in a prior search by the user, or degree of match of data identified in the search. An interface with a commercially available search engine may operate to adapt the search. When provided with a search query by a user, it may search for an exact match and score hits for relevance, and in the event an exact match is not found, operate to expand the query and return hits in order of rank together with an indication of the expanded search. The user may thus ascertain a degree of likely relevance of returned text or data information. The relational database may include hyperlinks to section headings and related data passages, such that a user accessing a page of a book may immediately view related data and the context of the page.
The relational database is indexed by logical subunits of the book such that expanded searches for Boolean combinations or proximity of elements span page breaks of book text to identify all instances of the desired search data. The search engine may expand a search if all hits have low ranking, and may suppress hits of low ranking when the search produces hits of high ranking. In further embodiments, the search engine may search tables, drawings and formulae of the converted book file.
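The search behavior described above (exact match first, query expansion as a fallback, scoring by position in the hierarchy, and suppression of low-ranking hits) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the abstract: the `Unit` record, the `LEVEL_WEIGHT` values, the 1.5 bookshelf boost, and the `high` threshold. The sketch also simplifies by expanding only when no exact match exists, not when all hits rank low.

```python
from dataclasses import dataclass

# Hypothetical weights: a hit in a heading counts for more than a hit in body text.
LEVEL_WEIGHT = {"chapter_heading": 3.0, "caption": 2.0, "body": 1.0}


@dataclass
class Unit:
    """One logical subunit of a book (not a page), so its text spans page breaks."""
    book: str
    kind: str  # key into LEVEL_WEIGHT
    text: str


def score(unit, terms, on_shelf):
    """Weight the fraction of matched terms by hierarchy level; boost bookshelf titles."""
    words = unit.text.lower().split()
    matched = sum(1 for t in terms if t in words)
    base = LEVEL_WEIGHT[unit.kind] * matched / len(terms)
    return base * (1.5 if unit.book in on_shelf else 1.0)


def search(units, query, on_shelf=frozenset(), high=2.0):
    """Return (ranked_hits, expanded): exact phrase first, any-term match as fallback."""
    phrase = query.lower().strip()
    terms = phrase.split()
    if not terms:
        return [], False
    hits = [u for u in units if phrase in u.text.lower()]
    expanded = not hits
    if expanded:
        # Expanded query: accept any unit containing at least one query term.
        hits = [u for u in units if any(t in u.text.lower().split() for t in terms)]
    ranked = sorted(((score(u, terms, on_shelf), u) for u in hits),
                    key=lambda pair: pair[0], reverse=True)
    # Suppress low-ranking hits only when at least one high-ranking hit exists.
    if any(s >= high for s, _ in ranked):
        ranked = [(s, u) for s, u in ranked if s >= high]
    return ranked, expanded


units = [
    Unit("A", "chapter_heading", "Heat transfer basics"),
    Unit("A", "body", "The heat transfer coefficient varies across the page break"),
    Unit("B", "body", "Transfer of mass"),
]
exact_hits, exact_expanded = search(units, "heat transfer")
loose_hits, loose_expanded = search(units, "mass heat")
```

The `expanded` flag plays the role of the "indication of the expanded search" returned to the user, who can then judge how far to trust the ranked hits.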
Abstract:
A method and system for publishing a plurality of books for user access includes selecting a plurality of books, converting each book from a layout or publication digital data form by providing and applying a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable publishing database indexed by book structure, such that a user may remotely access the database, search for desired text data, and view an image of a portion of the book containing the desired data. A search engine may score search results based on their position in the hierarchy or other factors, thereby determining the degree of relevance of text or data information located by the search engine.
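As a rough illustration of the conversion step, the sketch below tags extracted lines with a structural feature and groups them under their chapter heading, keeping the page number so that a page image could later be retrieved. The `Line` record, the font-size and bold heuristics, and the caption pattern are hypothetical stand-ins; the abstract does not specify how the tool detects features.

```python
import re
from dataclasses import dataclass


@dataclass
class Line:
    """A line of extracted text with the layout attributes assumed to be available."""
    text: str
    font_size: float
    bold: bool
    page: int


def tag(line):
    """Assign a structural tag from typeface and text cues (illustrative rules only)."""
    if line.bold and line.font_size >= 16:
        return "chapter_heading"
    if re.match(r"(Figure|Table)\s+\d+", line.text):
        return "caption"
    return "body"


def build_index(lines):
    """Group tagged lines under the most recent chapter heading."""
    index = {}
    current = "front_matter"
    for line in lines:
        kind = tag(line)
        if kind == "chapter_heading":
            current = line.text
            index.setdefault(current, [])
        else:
            index.setdefault(current, []).append((kind, line.page, line.text))
    return index


lines = [
    Line("Chapter 1 Heat", 18.0, True, 1),
    Line("Heat flows from hot to cold.", 10.0, False, 1),
    Line("Figure 1 A heated rod", 9.0, False, 2),
]
index = build_index(lines)
```

Because entries are keyed by chapter and not by page, a passage that continues onto the next page stays within one index entry.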