Abstract:
A similarity search may be performed on the image of a person, using visual characteristics and information that is known about the person. The search identifies images of other persons that are similar in appearance to the person in the image.
Abstract:
Methods and systems label a web page by collecting a set of inbound labels for the web page, estimating a language model for the web page, computing the likelihood of generating each inbound label given the language model, assigning a score to each inbound label based on this likelihood, and assigning a label to the web page based on the scores assigned to the inbound labels. Inbound labels are preferably collected from the set of web documents linking to the web page. The assigned labels are useful in providing labeled links to web pages from top hosts in search results pages.
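The labeling steps in this abstract can be illustrated with a minimal sketch. It assumes a unigram language model with add-alpha smoothing and a length-normalized log-likelihood as the score; the abstract specifies neither choice. The function names (estimate_unigram_model, score_label, assign_label) and the sample text are hypothetical.

```python
import math
from collections import Counter


def estimate_unigram_model(page_text, alpha=1.0):
    """Return a word-probability function estimated from the page text.

    Assumption: add-alpha smoothing over the page vocabulary plus one
    reserved slot for unseen words. The abstract does not name a model.
    """
    tokens = page_text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # +1 reserves mass for unseen words

    def prob(word):
        return (counts[word] + alpha) / (total + alpha * vocab_size)

    return prob


def score_label(label, prob):
    """Length-normalized log-likelihood of generating the label.

    Normalizing by label length is an assumption, made so that longer
    labels are not penalized only for having more words.
    """
    words = label.lower().split()
    if not words:
        return float("-inf")
    return sum(math.log(prob(w)) for w in words) / len(words)


def assign_label(page_text, inbound_labels):
    """Score every inbound label and return (best_label, all_scores)."""
    if not inbound_labels:
        return None, {}
    prob = estimate_unigram_model(page_text)
    scores = {label: score_label(label, prob) for label in inbound_labels}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    page = "cheap flights book cheap flights and hotels online"
    anchors = ["cheap flights", "click here", "hotels"]
    print(assign_label(page, anchors)[0])
```

In this sketch a generic anchor text such as "click here" scores low because none of its words occur on the page, which is the behavior the likelihood-based scoring is meant to capture.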
Abstract:
A method includes generating a plurality of sets of pairs of records from a set of records, for each attribute-position pair in the set of records, where each attribute-position pair indicates a position of an attribute in a record. Further, the method includes forming, electronically, a plurality of groups, each group comprising two attribute-position pairs having different attributes. The method also includes determining, electronically for each group, the number of pairs of records that are common to the two attribute-position pairs of that group. Furthermore, the method includes extracting results based on a first group of the plurality of groups if the number of pairs of records that are common to the two attribute-position pairs of the first group is greater than a second threshold, is the highest among the plurality of groups, and no group having three or more attribute-position pairs with different attributes is possible.
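A rough sketch of the grouping step follows. The abstract does not say how record pairs are generated for an attribute-position pair, so this example assumes two records are paired when they hold the same value at that attribute and position. The record layout (a dict mapping attribute names to lists of values) and all function names are hypothetical, and the final condition about groups of three or more attribute-position pairs is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def record_pairs_by_attribute_position(records):
    """Map each (attribute, position) to a set of record-id pairs.

    Assumption: two records form a pair when they share the same value
    at that attribute and position.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for rec_id, record in enumerate(records):
        for attribute, values in record.items():
            for position, value in enumerate(values):
                buckets[(attribute, position)][value].append(rec_id)

    pair_sets = {}
    for key, by_value in buckets.items():
        pairs = set()
        for ids in by_value.values():
            # ids are in increasing order, so every pair is (low, high)
            pairs.update(combinations(ids, 2))
        pair_sets[key] = pairs
    return pair_sets


def best_group(pair_sets, threshold):
    """Return ((ap1, ap2), common_pairs) for the strongest group, or None.

    A group is two attribute-position pairs with different attributes.
    It qualifies only if its common record pairs exceed the threshold.
    """
    best = None
    for ap1, ap2 in combinations(sorted(pair_sets), 2):
        if ap1[0] == ap2[0]:
            continue  # same attribute: not a valid group
        common = pair_sets[ap1] & pair_sets[ap2]
        if len(common) > threshold and (
            best is None or len(common) > len(best[1])
        ):
            best = ((ap1, ap2), common)
    return best


if __name__ == "__main__":
    records = [
        {"name": ["ann", "lee"], "city": ["oslo"]},
        {"name": ["ann", "ray"], "city": ["oslo"]},
        {"name": ["bob", "lee"], "city": ["rome"]},
        {"name": ["ann", "kim"], "city": ["oslo"]},
    ]
    print(best_group(record_pairs_by_attribute_position(records), threshold=1))
```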
Abstract:
A method includes generating, electronically, one or more matching patterns for one or more pairs of attribute values. Each pair includes two attribute values: a first attribute value from a first record and a second attribute value from a second record. The first attribute value and the second attribute value satisfy a first criterion. Further, the method includes identifying, electronically, a matching segment between the first attribute value and the second attribute value of a first pair. The method also includes repeating the identifying for each pair. Moreover, the method includes computing a similarity score for the first pair using one of the first pair and the matching segment, based on the one or more matching patterns and on the matching segments of the one or more pairs satisfying a second criterion. The method also includes repeating the computing for each pair.
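As a simplified illustration of identifying a matching segment and deriving a score from it, the sketch below treats the matching segment as the longest common substring of the two attribute values and scores the pair by the segment's share of the longer value. Both choices are assumptions: the abstract does not define the matching patterns, the criteria, or the scoring formula. The function names are hypothetical; the substring search uses Python's standard difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher


def matching_segment(first, second):
    """Return the longest contiguous substring shared by both values."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    match = matcher.find_longest_match(0, len(first), 0, len(second))
    return first[match.a:match.a + match.size]


def similarity(first, second):
    """Score a pair as segment length divided by the longer value's length.

    This ratio is an illustrative stand-in, not the scoring rule of the
    method described in the abstract.
    """
    longest = max(len(first), len(second))
    if longest == 0:
        return 0.0
    return len(matching_segment(first, second)) / longest


if __name__ == "__main__":
    pairs = [("john smith", "jon smith"), ("acme corp", "zenith ltd")]
    for first, second in pairs:  # repeat the computation for each pair
        print(first, second, similarity(first, second))
```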
Abstract:
Methods and systems that label a web page collect a set of inbound labels for the web page, estimate a language model for the web page, compute the likelihood of generating each inbound label given the language model and assign a score to each inbound label based on this likelihood, and assign a label to the web page based on the score assigned to each of the set of inbound labels. Inbound labels are preferably collected from the set of web documents linking to the web page. Labels assigned are useful in providing labeled links to web pages from top hosts in search result pages.
Abstract:
Embodiments of methods, systems, and/or apparatuses relating to data processing in distributed computing environments are disclosed. In particular, methods, systems, and/or apparatuses for determining information similarity and/or performing related statistical techniques, which may be implemented or operated in a distributed computing environment, are disclosed.
Abstract:
Web pages are efficiently categorized in a data processor without analyzing the content of the web pages. According to at least one embodiment, data is maintained that represents sample URLs grouped into a plurality of clusters. The sample URLs of a cluster are used to produce a URL regular expression pattern (“URL-regex”) that differentiates the sample URLs of the cluster from the sample URLs of other clusters and that covers at least a specified percentage of the sample URLs in the cluster. The process of producing a URL-regex is repeated for each of the clusters producing a URL-regex for each cluster. Web pages are then categorized into one of the clusters by determining which of the URL-regex patterns produced for the clusters match URLs that refer to the web pages. Thus, a web page may be categorized based on a URL that refers to the web page without having to obtain and analyze the content of the web page.
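The categorization step can be sketched as follows, assuming the per-cluster URL-regex patterns have already been produced. The two patterns, the cluster names, and the example URLs are hand-written and hypothetical; the sketch does not attempt the pattern-generation procedure itself. A small coverage helper shows how the "specified percentage of the sample URLs" condition could be checked for a candidate pattern.

```python
import re

# Hypothetical URL-regex patterns, one per cluster (not from the source).
CLUSTER_PATTERNS = {
    "product": re.compile(r"^https?://[^/]+/item/\d+$"),
    "forum": re.compile(r"^https?://[^/]+/forum/thread-\d+\.html$"),
}


def coverage(pattern, sample_urls):
    """Fraction of a cluster's sample URLs matched by the pattern."""
    if not sample_urls:
        return 0.0
    matched = sum(1 for url in sample_urls if pattern.match(url))
    return matched / len(sample_urls)


def categorize(url, patterns=CLUSTER_PATTERNS):
    """Return the first cluster whose pattern matches the URL, else None.

    Only the URL string is inspected; page content is never fetched.
    """
    for cluster, pattern in patterns.items():
        if pattern.match(url):
            return cluster
    return None


if __name__ == "__main__":
    print(categorize("http://shop.example.com/item/123"))
    print(categorize("http://example.com/about"))
```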