Abstract:
Disclosed is a computer-implemented method of determining similarity between first and second elements of an electronic document. The method uses a computer to calculate a plurality of measures of similarity between the first and second elements in at least two representations of the electronic document. A computer program product and system implementing this method are also disclosed.
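As an illustration only, the Python sketch below computes one similarity measure per representation for a pair of elements: word-level Jaccard similarity over the element text, and the shared-prefix ratio of the element tag paths. The two representations, the `Element` type, and both measures are assumptions chosen for the example. The abstract does not say which representations or measures are used.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Element:
    text: str            # hypothetical textual representation of the element
    tag_path: List[str]  # hypothetical structural representation, e.g. ["html", "body", "p"]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_ratio(p: List[str], q: List[str]) -> float:
    """Fraction of the longer tag path that is shared as a common prefix."""
    shared = 0
    for x, y in zip(p, q):
        if x != y:
            break
        shared += 1
    longest = max(len(p), len(q))
    return shared / longest if longest else 1.0


def similarity_measures(e1: Element, e2: Element) -> Dict[str, float]:
    """Return one similarity measure for each representation of the document."""
    return {
        "text_jaccard": jaccard(set(e1.text.lower().split()), set(e2.text.lower().split())),
        "path_prefix": prefix_ratio(e1.tag_path, e2.tag_path),
    }
```

For "Breaking news today" under html/body/div/p and "news today" under html/body/div/span, the sketch yields a text similarity of 2/3 and a structural similarity of 0.75, showing how two elements can look alike in one representation and less alike in another.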
Abstract:
A computer-implemented method for obtaining the rendering co-ordinates of visible text elements on a web page is disclosed. The web page is represented by an input data structure comprising a plurality of text nodes, each of which represents a text element on the web page. The method comprises the following steps: a) using a computer device, wrapping each of the plurality of text nodes in a pair of mark-up language tags; b) using said computer device, obtaining the co-ordinates of a bounding rectangle for each text node using the mark-up language tags; c) using said computer device, attaching an attribute specifying the co-ordinates of the bounding rectangle to each text node; and d) using said computer device, determining whether each text node is invisible, and if it is, excluding it from an output data structure comprising the plurality of text nodes and attached attributes.
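The four steps can be sketched in Python as follows. A browser-based implementation would read each wrapper's box with the DOM `getBoundingClientRect()` method; here that measurement is replaced by a caller-supplied `measure` function so the example runs without a browser. The `<span data-tn=...>` wrapper, the `Rect` type, and the rule that a zero-area rectangle means "invisible" are all assumptions made for the example.

```python
import html
from typing import Callable, Dict, List, NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def wrap_text_nodes(texts: List[str]) -> List[str]:
    """Step (a): wrap each text node in a pair of tags carrying a hypothetical index attribute."""
    return [f'<span data-tn="{i}">{html.escape(t)}</span>' for i, t in enumerate(texts)]


def visible_text_nodes(texts: List[str], measure: Callable[[int], Rect]) -> List[Dict]:
    """Steps (b)-(d): measure each wrapped node, attach its rectangle, drop invisible nodes."""
    output = []
    for i, text in enumerate(texts):
        rect = measure(i)                        # (b) bounding rectangle of wrapper i
        node = {"text": text, "rect": rect}      # (c) attach the co-ordinates as an attribute
        if rect.width > 0 and rect.height > 0:   # (d) assumed rule: zero area means invisible
            output.append(node)
    return output
```

With a layout table `{0: Rect(0, 0, 300, 24), 1: Rect(0, 0, 0, 0), 2: Rect(0, 40, 300, 18)}` passed as `layout.__getitem__`, the texts "Headline", "hidden tracking text", and "Body copy" reduce to "Headline" and "Body copy", each carrying its rectangle.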
Abstract:
A system and method for segmenting a Web page using an adaptive threshold is disclosed. In one embodiment, a method performed by a physical computing system having one or more processors, for segmenting a Web page that includes a plurality of nodes, includes: parsing content in the Web page into the plurality of nodes; obtaining feature values between each pair of nodes; estimating an adaptive threshold value from the obtained feature values; and segmenting the Web page by comparing the feature values associated with each pair of nodes with the estimated adaptive threshold value, each step being performed by the physical computing system.
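A minimal Python sketch of the segmentation loop follows. It rests on simplifying assumptions the abstract does not state: nodes arrive in reading order, feature values are computed only for adjacent pairs by a caller-supplied `dissimilarity` function, and the adaptive threshold is the mean plus `k` population standard deviations of the observed values. A segment boundary is placed wherever a pair's value exceeds the threshold.

```python
import statistics
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def estimate_threshold(values: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical adaptive rule: mean + k * population standard deviation."""
    return statistics.fmean(values) + k * statistics.pstdev(values)


def segment(nodes: Sequence[T], dissimilarity: Callable[[T, T], float],
            k: float = 1.0) -> List[List[T]]:
    if len(nodes) < 2:
        return [list(nodes)] if nodes else []
    # Feature value for each adjacent pair of nodes.
    values = [dissimilarity(a, b) for a, b in zip(nodes, nodes[1:])]
    threshold = estimate_threshold(values, k)
    segments = [[nodes[0]]]
    for node, value in zip(nodes[1:], values):
        if value > threshold:
            segments.append([node])      # large gap: start a new segment
        else:
            segments[-1].append(node)    # small gap: extend the current segment
    return segments
```

For nodes at vertical positions 0, 1, 2, 10, 11, 12 with absolute difference as the feature, the gaps are 1, 1, 8, 1, 1, the threshold is 2.4 + 2.8 = 5.2, and the page splits into [0, 1, 2] and [10, 11, 12]. Because the threshold is derived from the page's own values, a densely packed page and a sparse one get different cut-offs.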
Abstract:
Semantically ranking content in a website (110) with a computerized ranking device (105) includes: parsing content from the website (110) into multiple autonomous content blocks (415-1 to 415-17) with the computerized ranking device (105) and assigning an importance ranking with said computerized ranking device (105) to each of the content blocks (415-1 to 415-17) based on a degree to which a substance of the content block (415-1 to 415-17) is relevant to one of a plurality of predefined categories.
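The ranking step can be illustrated with a simple keyword-overlap score. In the Python sketch below, the category names and keyword sets in `CATEGORIES`, and the scoring rule (the share of a block's words that belong to its best-matching category), are hypothetical. The abstract does not specify how relevance to a category is measured.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical predefined categories and their keyword sets.
CATEGORIES: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "team", "score"},
    "finance": {"stock", "market", "shares", "price"},
}


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def best_category(block: str, categories: Dict[str, Set[str]]) -> Tuple[Optional[str], float]:
    """Return the most relevant category and the share of the block's words matching it."""
    words = tokens(block)
    best, best_score = None, 0.0
    for name, keywords in categories.items():
        score = sum(w in keywords for w in words) / len(words) if words else 0.0
        if score > best_score:
            best, best_score = name, score
    return best, best_score


def rank_blocks(blocks: List[str],
                categories: Dict[str, Set[str]] = CATEGORIES) -> List[Tuple[str, Optional[str], float]]:
    """Sort content blocks by importance, most relevant first."""
    scored = [(b, *best_category(b, categories)) for b in blocks]
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

Ranking a copyright notice, "Stock market price rises", and "The team scored a late goal" puts the finance block first (score 0.75), the sports block second, and the copyright notice last with no category.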
Abstract:
A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.
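The block, section, and candidate pipeline can be sketched in Python as below. Everything concrete in it is an assumption rather than part of the abstract: the `Block` fields, starting a new section at each heading or container change, grouping consecutive sections from the same container into one candidate, the article test (at least `min_words` words and a link density no higher than `max_link_density`), and choosing the largest qualifying candidate.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Block:
    container: str        # hypothetical id of the enclosing DOM container
    text: str
    is_heading: bool = False
    link_words: int = 0   # number of words inside links


Section = List[Block]
Candidate = List[Section]


def assemble_sections(blocks: List[Block]) -> List[Section]:
    sections: List[Section] = []
    for b in blocks:
        # Assumed rule: a heading or a change of container starts a new section.
        if not sections or b.is_heading or b.container != sections[-1][-1].container:
            sections.append([b])
        else:
            sections[-1].append(b)
    return sections


def assemble_candidates(sections: List[Section]) -> List[Candidate]:
    # Assumed rule: consecutive sections from the same container form one candidate.
    return [list(group) for _, group in groupby(sections, key=lambda s: s[0].container)]


def word_count(c: Candidate) -> int:
    return sum(len(b.text.split()) for s in c for b in s)


def is_article(c: Candidate, min_words: int = 20, max_link_density: float = 0.3) -> bool:
    words = word_count(c)
    links = sum(b.link_words for s in c for b in s)
    return words >= min_words and words > 0 and links / words <= max_link_density


def produce_content(blocks: List[Block]) -> str:
    candidates = assemble_candidates(assemble_sections(blocks))
    articles = [c for c in candidates if is_article(c)]
    if not articles:
        return ""
    best = max(articles, key=word_count)   # assumed choice: largest qualifying candidate
    return "\n".join(b.text for s in best for b in s)
```

Given a link-heavy navigation bar, a main story, and a short footer, only the main story passes the article test, so only its text is produced.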
Abstract:
Recursive data naming is disclosed. A name corresponding to a desired data item is provided. A get procedure is defined and applied to the name. The get procedure recursively applies itself to a metadata name to retrieve a metadata item associated with the desired data item, and then retrieves the desired data item.
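One way to picture recursive naming is path resolution, where the metadata item for a data item is the directory that records where it is stored. In the Python sketch below, the convention that the metadata name of `a/b/c` is `a/b`, the fixed root location, and the `storage` dictionary are hypothetical. The recursion bottoms out at the empty name, whose metadata item lives at a known location.

```python
from typing import Any, Dict

ROOT_LOCATION = 0  # hypothetical fixed location of the root metadata item


def metadata_name(name: str) -> str:
    # Hypothetical convention: the metadata for "a/b/c" is the directory item "a/b".
    return name.rpartition("/")[0]


def get(name: str, storage: Dict[int, Any]) -> Any:
    if name == "":
        return storage[ROOT_LOCATION]               # base case: known location
    meta = get(metadata_name(name), storage)        # recursive call on the metadata name
    leaf = name.rpartition("/")[2]
    return storage[meta[leaf]]                      # metadata maps leaf name -> location
```

With `storage = {0: {"docs": 1}, 1: {"report.txt": 2}, 2: "quarterly numbers"}`, calling `get("docs/report.txt", storage)` first resolves `docs` (which itself resolves the root) and then returns "quarterly numbers".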
Abstract:
The invention provides for encryption of hierarchically structured information. In one embodiment, a method is provided for encrypting hierarchically structured information. The hierarchically structured information includes a particular node and zero or more descendant nodes, each node having a name and zero or more items of additional data. The name of the particular node is encrypted, and the encrypted name is stored. Any descendant nodes are stored with their parent-child relationships exposed. Additional data for the particular node may be encrypted. Further, some, none, or all of the data for the descendant nodes may be encrypted.
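The storage layout can be sketched in Python as follows: the particular node's name (and optionally its data) is encrypted, every descendant is stored with its parent-child links in the clear, and a flag controls whether descendant data is encrypted. The `Node` type, the flags, and keeping descendant names in plaintext are assumptions for the example. The SHA-256 keystream cipher is a toy used only to keep the sketch dependency-free; it is unauthenticated and no substitute for a vetted AEAD cipher such as AES-GCM.

```python
import hashlib
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    name: str
    data: str = ""
    children: List["Node"] = field(default_factory=list)


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # TOY cipher: SHA-256 in counter mode. Illustration only, not for real use.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def toy_encrypt(key: bytes, plaintext: str) -> bytes:
    nonce = os.urandom(16)
    pt = plaintext.encode()
    return nonce + bytes(a ^ b for a, b in zip(pt, _keystream(key, nonce, len(pt))))


def toy_decrypt(key: bytes, blob: bytes) -> str:
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct)))).decode()


def store_encrypted(root: Node, key: bytes, encrypt_root_data: bool = True,
                    encrypt_descendant_data: bool = False) -> Dict[str, Any]:
    def store(node: Node, is_root: bool) -> Dict[str, Any]:
        encrypt_data = encrypt_root_data if is_root else encrypt_descendant_data
        return {
            # Only the particular (root) node's name is encrypted.
            "name": toy_encrypt(key, node.name) if is_root else node.name,
            "data": toy_encrypt(key, node.data) if encrypt_data else node.data,
            # Parent-child links stay visible: children nest in plaintext structure.
            "children": [store(child, False) for child in node.children],
        }
    return store(root, True)
```

A reader without the key can still see that the root has two children and walk them, but cannot learn the root's name or, by default, its data.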
Abstract:
Techniques for verifying whether an incremental update was correctly applied to a set of hierarchically structured information include determining an overall integrity code for the hierarchically structured information and attaching the overall integrity code to the hierarchically structured information. An incremental update according to the present techniques includes an integrity code that is combined into the overall integrity code attached to the hierarchically structured information when the incremental update is applied to the hierarchically structured information. The integrity code of the incremental update is generated such that when the overall integrity code is recomputed it will match the overall integrity code attached to the hierarchically structured information if the incremental update was correctly applied.
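The scheme needs an integrity code that can be updated by combining codes rather than rehashing the whole structure. The Python sketch below uses an additive hash for this: each node contributes SHA-256 of its path and value, and the overall code is the sum modulo 2^256, so an update's code is the sum of its new node codes minus the codes of the nodes it replaces or deletes. The flat path-to-value representation of the hierarchy and the additive construction are assumptions for illustration; with a 256-bit modulus this should be treated as detecting accidental mis-application, not deliberate forgery.

```python
import hashlib
from typing import Dict, Optional, Tuple

MOD = 2 ** 256


def node_code(path: str, value: str) -> int:
    """Integrity code contributed by one node."""
    digest = hashlib.sha256(f"{path}\0{value}".encode()).digest()
    return int.from_bytes(digest, "big")


def overall_code(tree: Dict[str, str]) -> int:
    """Combine node codes by addition modulo 2**256 (order-independent)."""
    return sum(node_code(p, v) for p, v in tree.items()) % MOD


def make_update(base: Dict[str, str], changes: Dict[str, Optional[str]]) -> Dict:
    """Build an incremental update; None as a new value means delete that path."""
    code = 0
    for path, new_value in changes.items():
        if path in base:
            code -= node_code(path, base[path])   # remove the old node's contribution
        if new_value is not None:
            code += node_code(path, new_value)    # add the new node's contribution
    return {"changes": changes, "code": code % MOD}


def apply_update(tree: Dict[str, str], attached_code: int, update: Dict) -> Tuple[Dict[str, str], int]:
    new_tree = dict(tree)
    for path, new_value in update["changes"].items():
        if new_value is None:
            new_tree.pop(path, None)
        else:
            new_tree[path] = new_value
    # Fold the update's code into the attached overall code.
    return new_tree, (attached_code + update["code"]) % MOD


def verify(tree: Dict[str, str], attached_code: int) -> bool:
    """Recompute the overall code and compare it with the attached one."""
    return overall_code(tree) == attached_code
```

If the update is applied correctly, recomputing the overall code over the new structure matches the attached code; if any change is skipped or applied to a different base, the recomputed code differs.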