Abstract:
A system and method are provided for extracting main content from a web page. Web page segmentation is performed on a web page to provide affinity-grouped segments. Descriptive features of at least one of the affinity-grouped segments are computed. At least one of the affinity-grouped segments is classified as a main body segment based on the computed descriptive features. Additional affinity-grouped segments are classified as to a document function based on the computed descriptive features. Classified affinity-grouped segments are assembled according to their classified document functions to provide the main content.
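The pipeline described above (compute features, classify a main body segment, classify the remaining segments by function, assemble) can be sketched as follows. This is a minimal illustration, not the patented method: the sample segments, the `link_words` field, the two features, the 0.5 link-density cutoff, and the role names are all assumptions introduced here.

```python
# Hypothetical pre-segmented page; "link_words" counts words inside hyperlinks.
segments = [
    {"text": "Home News Sports Contact", "link_words": 4},
    {"text": "Local Team Wins Final", "link_words": 0},
    {"text": "The match ended late on Sunday after extra time "
             "and a tense penalty shootout", "link_words": 0},
    {"text": "Privacy Terms", "link_words": 2},
]

def features(seg):
    """Assumed descriptive features: word count and link density."""
    words = len(seg["text"].split())
    density = seg["link_words"] / words if words else 0.0
    return {"words": words, "link_density": density}

def classify(segs):
    """Label one segment as the main body, then label the rest by function."""
    feats = [features(s) for s in segs]
    candidates = [i for i, f in enumerate(feats) if f["link_density"] < 0.5]
    if not candidates:
        return ["other"] * len(segs)
    # Assumed rule: the main body is the wordiest low-link segment.
    body = max(candidates, key=lambda i: feats[i]["words"])
    labels = []
    for i, f in enumerate(feats):
        if i == body:
            labels.append("body")
        elif f["link_density"] >= 0.5:
            labels.append("navigation")
        elif i < body:
            labels.append("title")
        else:
            labels.append("other")
    return labels

def assemble(segs, labels, order=("title", "body")):
    """Join the segments whose role is in `order`, grouped by role."""
    parts = [s["text"] for role in order
             for s, label in zip(segs, labels) if label == role]
    return "\n\n".join(parts)

labels = classify(segments)
main_content = assemble(segments, labels)
```

A real classifier would likely be trained on many features rather than built from two hand-set rules; the sketch only shows how the stages connect.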
Abstract:
Segmenting a web page (110) into coherent function blocks (705-1 to 705-8) includes parsing content from the web page (110) into multiple coherent, collectively exhaustive nodes (405-1 to 405-37); calculating at least one matrix (500, 600, 605-1 to 605-4) of affinity values between each of the nodes (405-1 to 405-37); and clustering the nodes (405-1 to 405-37) into functional blocks (705-1 to 705-8) based on the affinity values in the at least one matrix (500, 600, 605-1 to 605-4).
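The three steps above (parse into nodes, build an affinity matrix, cluster into blocks) can be sketched as follows. The abstract does not say how affinity is computed or which clustering algorithm is used, so the affinity function (shared DOM-path prefix blended with vertical proximity), the sample nodes, and the single-linkage threshold clustering are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical leaf nodes: (text, DOM path, vertical position in pixels).
nodes = [
    ("Home", "body/nav/a", 10),
    ("About", "body/nav/a", 12),
    ("Headline", "body/article/h1", 100),
    ("First paragraph", "body/article/p", 140),
    ("Copyright", "body/footer/p", 900),
]

def affinity(a, b):
    """Assumed affinity in [0, 1]: DOM-path overlap blended with proximity."""
    pa, pb = a[1].split("/"), b[1].split("/")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    path_score = shared / max(len(pa), len(pb))
    proximity = 1.0 / (1.0 + abs(a[2] - b[2]) / 100.0)
    return 0.5 * path_score + 0.5 * proximity

def affinity_matrix(items):
    """Symmetric matrix of pairwise affinities, with 1.0 on the diagonal."""
    n = len(items)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = affinity(items[i], items[j])
    return matrix

def cluster(matrix, threshold):
    """Single-linkage grouping: merge nodes whose affinity meets the threshold."""
    parent = list(range(len(matrix)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(matrix)), 2):
        if matrix[i][j] >= threshold:
            parent[find(j)] = find(i)
    groups = {}
    for i in range(len(matrix)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

blocks = cluster(affinity_matrix(nodes), threshold=0.6)
```

With these sample values the two navigation links form one block, the headline and paragraph form a second, and the footer stays on its own. The abstract mentions more than one matrix; combining several matrices (for example, one per feature) would replace the fixed 0.5/0.5 blend used here.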
Abstract:
Disclosed is a computer-implemented method of determining similarity between first and second elements of an electronic document. The method uses a computer to calculate a plurality of measures of similarity between the first and second elements in at least two representations of the electronic document. A computer program product and system implementing this method are also disclosed.
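As a rough illustration of computing several similarity measures across two representations, the sketch below compares two elements in a DOM representation (tag path) and a visual representation (rendered box). The `Element` type and both measures are hypothetical; the abstract does not specify which representations or measures are used.

```python
from dataclasses import dataclass

@dataclass
class Element:
    dom_path: tuple  # DOM representation: tag names from the root
    box: tuple       # visual representation: (left, top, width, height)

def dom_similarity(a, b):
    """Fraction of the longer DOM path that the two elements share as a prefix."""
    shared = 0
    for x, y in zip(a.dom_path, b.dom_path):
        if x != y:
            break
        shared += 1
    return shared / max(len(a.dom_path), len(b.dom_path), 1)

def visual_similarity(a, b):
    """Ratio of smaller to larger rendered area (1.0 means equal size)."""
    area_a = a.box[2] * a.box[3]
    area_b = b.box[2] * b.box[3]
    if max(area_a, area_b) == 0:
        return 1.0
    return min(area_a, area_b) / max(area_a, area_b)

def similarity_measures(a, b):
    """One measure per representation, returned separately rather than merged."""
    return {"dom": dom_similarity(a, b), "visual": visual_similarity(a, b)}

first = Element(("html", "body", "ul", "li"), (0, 0, 200, 20))
second = Element(("html", "body", "ul", "li"), (0, 20, 200, 40))
measures = similarity_measures(first, second)
```

Here the two list items are identical in the DOM representation but differ in rendered size, which is the kind of disagreement that motivates looking at more than one representation.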
Abstract:
A computer-implemented method for obtaining the rendering co-ordinates of visible text elements on a web page is disclosed. The web page is represented by an input data structure comprising a plurality of text nodes, each of which represents a text element on the web page. The method comprises the following steps: a) using a computer device, wrapping each of the plurality of text nodes in a pair of mark-up language tags; b) using said computer device, obtaining the co-ordinates of a bounding rectangle for each text node using the mark-up language tags; c) using said computer device, attaching an attribute specifying the co-ordinates of the bounding rectangle to each text node; and d) using said computer device, determining whether each text node is invisible, and if it is, excluding it from an output data structure comprising the plurality of text nodes and attached attributes.
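Steps c) and d) can be sketched in isolation, assuming steps a) and b) have already run in a browser and produced a bounding rectangle for each text node. The input shape, the `data-rect` attribute name, and the rule that a node with zero width or height counts as invisible are assumptions for illustration; the abstract does not define the visibility test.

```python
def attach_and_filter(text_nodes):
    """Attach rectangle attributes and drop nodes assumed to be invisible.

    text_nodes: list of dicts with 'text' and 'rect' = (left, top, width, height),
    where 'rect' is taken to come from measuring the wrapping tags in a browser.
    """
    output = []
    for node in text_nodes:
        left, top, width, height = node["rect"]
        # Assumed visibility rule: a node that occupies no area is not rendered.
        if width <= 0 or height <= 0:
            continue
        output.append({
            "text": node["text"],
            "attrs": {"data-rect": f"{left},{top},{width},{height}"},
        })
    return output

measured = [
    {"text": "Visible heading", "rect": (8, 8, 300, 32)},
    {"text": "Hidden tracking label", "rect": (0, 0, 0, 0)},
    {"text": "Body text", "rect": (8, 48, 600, 18)},
]
visible = attach_and_filter(measured)
```

A fuller visibility check would probably also consider computed style (for example hidden or off-screen elements), which this sketch does not model.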
Abstract:
A method, system, and computer program product for selecting web page content based on user permission for collecting user-selected content within web pages (FIG. 4, 400) may comprise accessing web page data associated with a currently viewed web page (FIG. 4, 400), the web page data comprising a popular selection of content on the currently viewed web page (FIG. 4, 408) (505), with an electronic client device, presenting the popular selection of content of the currently viewed web page (FIG. 4, 400) to a user (535), and prompting the user to agree to the use of the user's selected content within a number of web pages in exchange for use of the popular selection of content on the web page (FIG. 4, 400). The web page content is selected, based on the user's response.
Abstract:
A request for print content is received at a network server system. The request includes variable user input. Webpage content is obtained based at least in part on the variable user input. A subset of the webpage content is identified as print content. A print-ready layout of the print content is formed and the print content in the print-ready layout is provided, via network connection, to a client in response to the request.
Abstract:
A method of creating an application for the popular selection of content on a web page (FIG. 4, 400) may comprise collecting web page data associated with a web page (FIG. 4, 400), the web page data comprising a selection of content on the web page (FIG. 4, 400) (Block 505), with a processor, determining among the selection of content of the web page, which content is popular (Block 510), and creating an application based on the popular selection of content of the web page (Block 515).
Abstract:
Systems and methods are provided for transforming a document into interactive media content. A system can include a memory for storing computer executable instructions and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions can include an engine to generate a dynamic composition of the text blocks and visual blocks of the document, based on semantic features of the text blocks and the visual blocks, to provide the interactive media content.