Abstract:
Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
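The matching, ingestion, and tagging steps described above can be sketched as follows. This is a minimal illustration, not the method of the abstract: the record fields (`name`, `phone`, `tags`), the matching key (normalized name plus the last ten phone digits), and the tag string are all assumptions introduced for the example.

```python
import re
from typing import Dict, List, Tuple


def record_key(name: str, phone: str) -> Tuple[str, str]:
    """Build a hypothetical matching key: normalized name + last 10 phone digits."""
    digits = re.sub(r"\D", "", phone)
    norm_name = " ".join(name.lower().split())
    return (norm_name, digits[-10:])


def ingest(extracted: List[Dict], listings: List[Dict], tag: str) -> List[Dict]:
    """Match extracted dealer records against the listing database.

    Matched listings receive the metadata tag; unmatched records are
    ingested as new listings carrying the same tag.
    """
    index = {record_key(l["name"], l["phone"]): l for l in listings}
    for rec in extracted:
        key = record_key(rec["name"], rec["phone"])
        listing = index.get(key)
        if listing is None:
            # No match found: ingest the record as a new listing.
            listing = {"name": rec["name"], "phone": rec["phone"], "tags": []}
            listings.append(listing)
            index[key] = listing
        if tag not in listing["tags"]:
            listing["tags"].append(tag)
    return listings
```

An exact-key match is used here only to keep the sketch short; a real matcher would likely need fuzzy comparison of names and addresses.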
Abstract:
A method for identifying a brand name is described herein. The method involves obtaining category keywords associated with a category, designating a subgroup of the category keywords as brand name keywords for a particular brand name, receiving a search term, determining that the search term is a brand name keyword, and identifying the particular brand name corresponding to the brand name keyword.
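The steps of this method can be sketched in a few lines. The category, the keywords, and the brand names below are made up for illustration, and the exact-match lookup after normalization is an assumption; the abstract does not say how a search term is compared to the keywords.

```python
from typing import Dict, Optional, Set

# Hypothetical category keywords for an illustrative "athletic shoes" category.
CATEGORY_KEYWORDS: Set[str] = {
    "running shoes", "sneakers", "acme", "acme runner", "zephyr",
}

# Hypothetical designation of keyword subgroups as brand name keywords.
BRAND_KEYWORDS: Dict[str, Set[str]] = {
    "Acme": {"acme", "acme runner"},
    "Zephyr": {"zephyr"},
}


def normalize(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def identify_brand(search_term: str) -> Optional[str]:
    """Return the brand whose keyword subgroup contains the search term, if any."""
    term = normalize(search_term)
    for brand, keywords in BRAND_KEYWORDS.items():
        if term in keywords:
            return brand
    return None
```

For example, `identify_brand("  ACME   Runner ")` returns `"Acme"`, while `identify_brand("sneakers")` returns `None` because that keyword belongs to the category but was not designated as a brand name keyword.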
Abstract:
Methods, systems, and computer-readable media are provided for indexing network resources. One method includes accessing, using one or more computer systems, a data store of menu items. The method further includes accessing identification information associated with one or more food providers from one or more data sources. One or more network resources are crawled based on the identification information to search for menu items in the data store that are associated with corresponding food providers. Using the one or more computer systems, an index feed is generated, the index feed comprising the identification information of one or more of the food providers and, based on the crawl and search, one or more menu items associated with each of those food providers.
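A minimal sketch of the feed generation step follows. To stay self-contained it replaces the crawl with an in-memory dictionary of page text; the provider records, URLs, menu items, and the substring search are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical data store of menu items.
MENU_ITEMS: Set[str] = {"margherita pizza", "caesar salad", "pad thai"}

# Hypothetical identification information for food providers.
PROVIDERS: List[Dict[str, str]] = [
    {"id": "p1", "name": "Luigi's", "url": "https://example.com/luigis"},
    {"id": "p2", "name": "Thai Garden", "url": "https://example.com/thai-garden"},
]

# Stand-in for crawled network resources (no network access in this sketch).
PAGES: Dict[str, str] = {
    "https://example.com/luigis": "Try our Margherita Pizza or a fresh Caesar Salad.",
    "https://example.com/thai-garden": "House special: Pad Thai with tofu.",
}


def find_menu_items(text: str, menu_items: Set[str]) -> List[str]:
    """Return the known menu items that appear in the page text, sorted."""
    lowered = text.lower()
    return sorted(item for item in menu_items if item in lowered)


def build_index_feed(providers, pages, menu_items) -> List[Dict]:
    """Pair each provider's identification info with the menu items found for it."""
    feed = []
    for provider in providers:
        items = find_menu_items(pages.get(provider["url"], ""), menu_items)
        if items:
            feed.append({
                "provider_id": provider["id"],
                "name": provider["name"],
                "menu_items": items,
            })
    return feed
```

Providers with no matching menu items are left out of the feed here; whether to include them with an empty list is a design choice the abstract does not settle.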
Abstract:
The invention provides a method and system to compare data objects. Each data object is converted into a directed acyclic graph forest, which comprises one or more directed acyclic graphs. The directed acyclic graph forests corresponding to data objects are then compared to calculate a similarity score between the data objects. The similarity score is then used as a measure to determine the extent of similarity between the data objects.
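One simple way to picture this comparison is sketched below. The abstract does not specify how the graphs are built or scored, so this example makes two assumptions: a nested dictionary or list is turned into a tree (a special case of a directed acyclic graph) whose edges are labeled by path, and the similarity score is the Jaccard overlap of the two edge sets.

```python
from typing import Any, Set, Tuple

Edge = Tuple[str, str]


def to_edges(obj: Any, parent: str = "root") -> Set[Edge]:
    """Flatten a nested data object into a set of parent -> child edges."""
    edges: Set[Edge] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            node = f"{parent}/{key}"
            edges.add((parent, node))
            edges |= to_edges(value, node)
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            node = f"{parent}[{i}]"
            edges.add((parent, node))
            edges |= to_edges(value, node)
    else:
        # Leaf value: attach it as a terminal node under its parent.
        edges.add((parent, f"{parent}={obj!r}"))
    return edges


def similarity(a: Any, b: Any) -> float:
    """Jaccard similarity of the two edge sets, in the range [0.0, 1.0]."""
    edges_a, edges_b = to_edges(a), to_edges(b)
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)
```

With `a = {"name": "x", "tags": ["a", "b"]}` and `b = {"name": "x", "tags": ["a", "c"]}`, the two objects share 6 of 8 distinct edges, so `similarity(a, b)` is 0.75.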
Abstract:
Embodiments are directed towards managing a display of search results by employing a query-classification for a search query to selectively display trust search results that are displayed distinct from non-trust search results. A search query is classified into a query-class. A search is then performed over non-trust sources, and selectively over trust data sources, to obtain non-trust and trust search results, respectively. The trust search results are rank ordered based on various categories of search criteria, including, for example, explicit and implicit relationships. Based on the query-class, a different number of trust search results may be displayed. Further, the position at which the trust search results are displayed may be based on the query-class. Moreover, the non-trust search results are displayed distinct or separate from the trust search results to readily distinguish the type of source of the search results.
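The interplay between query-class, result count, and position can be sketched as follows. The class names, the keyword-based classifier, the per-class policy table, and the numeric relationship scores are all hypothetical; the abstract only states that count and position may depend on the query-class.

```python
from typing import Dict, List, Tuple

# Hypothetical policy: query-class -> (max trust results, position of trust block).
DISPLAY_POLICY: Dict[str, Tuple[int, str]] = {
    "local": (3, "top"),
    "product": (2, "middle"),
    "other": (0, "none"),
}


def classify_query(query: str) -> str:
    """Toy keyword-based classifier standing in for a real query classifier."""
    q = query.lower()
    if "near me" in q or "restaurant" in q:
        return "local"
    if "buy" in q or "price" in q:
        return "product"
    return "other"


def layout_results(
    query: str,
    trust: List[Tuple[str, float]],  # (title, assumed relationship score)
    non_trust: List[str],
) -> List[Tuple[str, List[str]]]:
    """Return labeled blocks so trust and non-trust results stay visually distinct."""
    limit, position = DISPLAY_POLICY[classify_query(query)]
    ranked = [t for t, _ in sorted(trust, key=lambda r: r[1], reverse=True)][:limit]
    if not ranked or position == "none":
        return [("non_trust", list(non_trust))]
    trust_block = ("trust", ranked)
    if position == "top":
        return [trust_block, ("non_trust", list(non_trust))]
    # "middle": place the trust block between two halves of the non-trust results.
    mid = len(non_trust) // 2
    return [("non_trust", non_trust[:mid]), trust_block, ("non_trust", non_trust[mid:])]
```

Returning labeled blocks rather than one merged list keeps the source type of each result explicit, which is the property the abstract emphasizes.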
Abstract:
A system for serving advertisements in a networked environment includes a web feed ad server operable to receive web feed information, identify concept terms in the web feed information, match advertisements to the concept terms, and communicate the advertisement to a terminal. Concept terms are identified by comparing terms in the web feed to information in an encyclopedia database, a product listing database, and/or a bidded keyword database. Rewrites associated with the concept terms are generated by a sponsored search ad system. The concept terms and rewrites are placed in a document and communicated to a context matching ad system operable to match an advertisement to the content of the document.
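The concept-term identification and ad matching steps can be illustrated with a small sketch. The three term sets, the ad inventory, and the substring comparison are assumptions for the example; the rewrite generation and the context matching ad system described in the abstract are not modeled here.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for the three databases named in the abstract.
ENCYCLOPEDIA: Set[str] = {"solar panel", "electric car"}
PRODUCT_LISTINGS: Set[str] = {"roadster x"}
BIDDED_KEYWORDS: Set[str] = {"electric car", "home battery"}

# Hypothetical ad inventory keyed by concept term.
ADS: Dict[str, str] = {
    "electric car": "Ad: Test-drive an electric car today",
    "home battery": "Ad: Store your solar power at home",
}


def identify_concept_terms(feed_text: str) -> List[str]:
    """Return the known terms that occur in the web feed text, sorted."""
    text = " ".join(feed_text.lower().split())
    known = ENCYCLOPEDIA | PRODUCT_LISTINGS | BIDDED_KEYWORDS
    return sorted(term for term in known if term in text)


def match_ads(concept_terms: List[str]) -> List[str]:
    """Return the ads registered for any of the concept terms."""
    return [ADS[term] for term in concept_terms if term in ADS]
```

A term such as "roadster x" is recognized as a concept but yields no ad in this sketch because nothing in the hypothetical inventory targets it.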
Abstract:
A method, apparatus, and computer-readable medium are provided for matching items of user-generated content to entities. Items of user-generated content, such as status updates, are gathered. For each of the items, a machine determines a degree to which the item is associated with an entity. In one aspect, items are matched to an entity by matching the content of the items to attributes of the entity. In another aspect, items are matched to an entity by predicting attributes of an author of the items and determining a distance between the predicted attributes of the author and the attributes of the entity. The distance may be a physical distance between the locations of the entity and the author or a contextual distance between categories for the entity and posts by the author. Items matched to the entity may be displayed on an interface concurrently with information about the entity.
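The two aspects above can each be sketched with a small function. Both are illustrative assumptions rather than the described method: the content match is scored as the fraction of entity attributes that appear as words in the item, and the physical distance is computed with the haversine formula from a predicted author location given as latitude and longitude.

```python
import math
from typing import Iterable


def attribute_score(item_text: str, entity_attributes: Iterable[str]) -> float:
    """Fraction of entity attributes that appear as whole words in the item."""
    tokens = set(item_text.lower().split())
    attrs = {a.lower() for a in entity_attributes}
    if not attrs:
        return 0.0
    return len(tokens & attrs) / len(attrs)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points on Earth."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))
```

For instance, `attribute_score("great espresso at the cafe", ["espresso", "cafe", "bakery", "wifi"])` is 0.5, since two of the four attributes occur in the item. A threshold on either value could then decide whether the item is shown alongside the entity.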