Abstract:
A generic and expandable document aspect system and method for searching, browsing, presenting, and interacting with data assembled from document contents and related external data is provided. New varieties of document aspects are added to existing installations and can be accessed by users without requiring upgrades to server or clients, for example by using plug-in technology.
Abstract:
An end user, by way of a submission interface, instructs an engine to select particular collections of documents to process. The engine processes all the text within all the documents in the selected collections. The result of this processing is a distilled data set, which a browser accesses through APIs. Different views of the distilled data set may be presented to the end user via the browser, allowing the user to assess the utility of the presented data more effectively and to tune the presented data set to the research task at hand. One or more of these views may be used to create a new document from the sentences, paragraphs, chapters or documents in the distilled data set that correspond to those views, for presentation to the end user.
Abstract:
A method and system for making information available over a computer system and compensating information owners or creators for access to said information. In various aspects, the invention provides a mechanism for giving users meaningful access to information via a computer system and network while protecting the interests of publishers and creators in information. The invention provides a solution for information including, but not limited to: text, graphics, photos, executable files, data tables, audio, video, and three dimensional data. In a further aspect, the invention comprises a new method for allowing a user to review a document while connected to a network while preventing the user from downloading, printing, or copying the document unless a fee is paid. In a further aspect, the invention comprises a new method for allowing a user to review documents at a first cost basis (which may be free), while providing other access to documents, such as copying, printing, or downloading, only on a second cost basis. In a further aspect, the invention comprises a new method for allowing a user to purchase a selectable portion of a document at a price based on the amount of material selected, where that amount of material can include a portion of a document, an entire document, or an anthology of components of multiple documents.
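The two cost bases and the amount-based price described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the abstract: the `Selection` type, the `price_cents` function, measuring the "amount of material" in words, and the rate of 25 cents per 100 words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """A selected portion of one document (hypothetical type)."""
    document_id: str
    word_count: int  # assumed unit for "amount of material selected"


VIEW = "view"
PAID_ACTIONS = {"copy", "print", "download"}


def price_cents(action, selections, view_fee_cents=0, rate_cents_per_100_words=25):
    """Return the fee for an action over one or more selections.

    Viewing uses the first cost basis (free by default). Copying, printing,
    and downloading use the second cost basis, scaled by the amount selected.
    The selections may span several documents, which models an anthology.
    """
    if action == VIEW:
        return view_fee_cents
    if action not in PAID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    total_words = sum(s.word_count for s in selections)
    units = -(-total_words // 100)  # ceiling division: started blocks of 100 words
    return units * rate_cents_per_100_words


anthology = [Selection("doc-a", 250), Selection("doc-b", 100)]
view_fee = price_cents("view", anthology)
download_fee = price_cents("download", anthology)
```

With these assumed rates, viewing the anthology costs nothing, while downloading its 350 words is billed as four blocks of 100 words.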
Abstract:
Methods and systems for analyzing an image that includes text, such as a newspaper or magazine page or the like, by mapping the image to determine regions of text and analyzing portions of the image in accordance with characteristics of selected regions of the text to develop a desired ordering of at least the selected regions in accordance with a textual relationship between the selected regions. The desired order may be related to the order in which the selected regions, and/or words therein, are to be presented in a different format appropriate for a specific use, such as by a human reader, for transferring the text over a network, for use in a database or by a search function, word processor or printer. Normalizing, columnizing, regionalizing, frameset building and article tracing functions may be used to develop the desired order in related regions in an article within the image.
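As a rough illustration of the columnizing and ordering ideas above, the sketch below groups text regions into columns by horizontal overlap and then reads each column top to bottom. It is a simplified stand-in, not the normalizing, frameset building, or article tracing functions named in the abstract. The `Region` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Bounding box of a text region; the origin is the top-left of the page."""
    name: str
    x: float
    y: float
    width: float
    height: float


def overlaps_horizontally(a, b):
    """True when the horizontal extents of the two regions intersect."""
    return a.x < b.x + b.width and b.x < a.x + a.width


def columnize(regions):
    """Group regions into columns, scanning from left to right."""
    columns = []
    for region in sorted(regions, key=lambda r: r.x):
        for column in columns:
            if any(overlaps_horizontally(region, member) for member in column):
                column.append(region)
                break
        else:
            # No existing column overlaps this region, so start a new one.
            columns.append([region])
    return columns


def reading_order(regions):
    """Return region names column by column, top to bottom within each column."""
    ordered = []
    for column in columnize(regions):
        ordered.extend(sorted(column, key=lambda r: r.y))
    return [r.name for r in ordered]


page = [
    Region("right-top", 110, 0, 100, 50),
    Region("left-bottom", 0, 60, 100, 50),
    Region("left-top", 0, 0, 100, 50),
    Region("right-bottom", 110, 60, 100, 50),
]
order = reading_order(page)
```

A real page would also need headlines that span several columns and articles that jump between columns, which this sketch does not attempt.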
Abstract:
A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency.
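The fingerprinting idea above can be sketched as follows: each sentence is a window, the highest-weighted words in the window are kept, and every ordered pair of those words is hashed. This is an illustrative reading of the abstract, not the patented algorithm. The function names, the smoothed TF-IDF formula, the `top_k` cutoff, the SHA-1 hash, and the Jaccard comparison are all assumptions.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations


def sentences(text):
    """Split text into sentences at terminal punctuation (a naive splitter)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_weights(doc_tokens, corpus_token_sets):
    """Weight each word by term frequency times a smoothed inverse document frequency."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus_token_sets)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus_token_sets if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(doc_tokens)) * idf
    return weights


def fingerprint(text, corpus, top_k=3):
    """Return a set of hashes over ordered pairs of significant words per sentence."""
    corpus_sets = [set(tokens(doc)) for doc in corpus]
    weights = tfidf_weights(tokens(text), corpus_sets)
    hashes = set()
    for sentence in sentences(text):
        # Keep the top_k heaviest words; ties are broken alphabetically.
        words = sorted(set(tokens(sentence)),
                       key=lambda w: (-weights.get(w, 0.0), w))[:top_k]
        for first, second in permutations(words, 2):
            digest = hashlib.sha1(f"{first} {second}".encode("utf-8")).hexdigest()
            hashes.add(digest[:16])
    return hashes


def similarity(fp_a, fp_b):
    """Jaccard overlap of two fingerprints (an assumed comparison measure)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


doc_a = "The cat sat on the mat. Dogs chase cats."
doc_b = "Stock markets fell sharply today. Investors sold bonds."
corpus = [doc_a, doc_b]
fp_a = fingerprint(doc_a, corpus)
fp_b = fingerprint(doc_b, corpus)
```

Comparing `fp_a` with itself gives a similarity of 1.0, while `doc_a` and `doc_b` share no words and therefore no hashed pairs. Because both orderings of each pair are hashed, the order in which two significant words appear within a sentence does not affect the fingerprint.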
Abstract:
A mechanism gives users meaningful access to information while protecting the interests of publishers and creators of information, including text, graphics, photos, executable files, data tables, audio, video, and three dimensional data. It allows a user to review a document while connected to a network but prevents the user from downloading, printing, or copying the document unless a fee is paid. The user is allowed to review documents at a first cost basis, while other access to documents, such as copying, printing, or downloading, is provided only on a second cost basis. The user is also allowed to purchase a selectable portion of a document at a price based on the amount of material selected, where that amount of material can include a portion of a document, an entire document, or an anthology of components of multiple documents.