Abstract:
Techniques are provided for efficiently locating, processing, and retrieving local product information from web pages that are generally reachable only by submitting queries through web forms. This portion of the web is often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located through a web page form such as a dealer-locator form. After a suitable web page form is located, editorial wrapping is performed to create an automated information extractor. Deep-web crawling is then performed using that extractor. Individual business records are extracted using grid-based extraction, and the records are matched against and ingested into a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags may also be added to entries in other databases.
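The matching, ingestion and tagging step can be illustrated with a minimal Python sketch. It assumes the records have already been pulled out of the crawled result pages (the grid-based extraction itself is not shown). It also assumes that a dealer record is a dict with "name" and "phone" keys, and that listings are matched by normalized phone number first and normalized name second. `Listing`, `ingest`, and the tag format are hypothetical illustrations and not the method described in the abstract.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Listing:
    """A hypothetical entry in the business listing database."""
    name: str
    phone: str
    tags: Set[str] = field(default_factory=set)


def normalize_name(name: str) -> str:
    # Lowercase and drop punctuation: "Joe's Bikes, Inc." -> "joes bikes inc"
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def normalize_phone(phone: str) -> str:
    # Keep digits only, then the last 10 (assumes 10-digit national numbers)
    return re.sub(r"\D", "", phone)[-10:]


def ingest(records: List[Dict[str, str]], listings: List[Listing],
           product_tag: str) -> List[Dict[str, str]]:
    """Match crawled dealer records to listings (phone first, then name) and
    add the product tag to each matched listing. Returns unmatched records."""
    by_phone = {normalize_phone(e.phone): e for e in listings if normalize_phone(e.phone)}
    by_name = {normalize_name(e.name): e for e in listings}
    unmatched = []
    for rec in records:
        phone_key = normalize_phone(rec.get("phone", ""))
        listing = by_phone.get(phone_key) if phone_key else None
        if listing is None:
            # Fall back to the normalized business name
            listing = by_name.get(normalize_name(rec.get("name", "")))
        if listing is None:
            unmatched.append(rec)
        else:
            listing.tags.add(product_tag)
    return unmatched
```

The sketch returns unmatched records instead of discarding them, so that they remain available for review.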
Abstract:
A method for identifying a brand name is described herein. The method involves obtaining category keywords associated with a category, designating a subgroup of the category keywords as brand name keywords for a particular brand name, receiving a search term, determining that the search term is a brand name keyword, and identifying the particular brand name corresponding to the brand name keyword.
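The steps of this method map onto a simple lookup table, shown here as a minimal Python sketch. It assumes that keywords are compared case-insensitively and that a search term counts as a brand-name keyword only on an exact match after trimming whitespace. The brand names and keywords are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def build_brand_index(category_keywords: Iterable[str],
                      brand_subgroups: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each brand-name keyword to its brand. Every brand keyword must be
    one of the category's keywords (i.e., a designated subgroup of them)."""
    allowed = {kw.lower() for kw in category_keywords}
    index: Dict[str, str] = {}
    for brand, keywords in brand_subgroups.items():
        for kw in keywords:
            key = kw.lower()
            if key not in allowed:
                raise ValueError(f"{kw!r} is not a keyword of this category")
            index[key] = brand
    return index


def identify_brand(search_term: str, index: Dict[str, str]) -> Optional[str]:
    """Return the brand if the search term is a brand-name keyword, else None."""
    return index.get(search_term.strip().lower())
```

Usage: build the index once per category, for example with the category keywords {"espresso machine", "acme", "roadrunner pro"} and the subgroup {"Acme": {"acme", "roadrunner pro"}}. Then `identify_brand("Roadrunner Pro", index)` yields "Acme", while the generic keyword "espresso machine" yields None.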
Abstract:
A system is disclosed for obtaining and aggregating opinions generated by multiple sources with respect to one or more objects. The disclosed system uses observed variables associated with an opinion and a probabilistic model to estimate latent properties of that opinion. With those latent properties, the disclosed system may enable publishers to reliably and comprehensively present object information to interested users.
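One simple probabilistic model estimates a latent property of an opinion from its observed variables. The minimal Python sketch below uses a naive Bayes estimate of a latent binary sentiment. The abstract does not name its model, so naive Bayes is a stand-in chosen for illustration. The feature names, likelihood values, and prior are hypothetical and not taken from any real data. The sketch then aggregates opinions about one object from several sources by averaging their posterior probabilities.

```python
import math
from typing import Iterable, Set

# Hypothetical likelihoods P(observed feature | latent sentiment); illustrative only.
LIKELIHOODS = {
    "high_rating": {"pos": 0.8, "neg": 0.1},
    "recommends": {"pos": 0.6, "neg": 0.05},
    "mentions_refund": {"pos": 0.05, "neg": 0.4},
}
PRIOR_POS = 0.5  # hypothetical prior P(sentiment = pos)


def p_positive(observed: Set[str]) -> float:
    """Posterior P(sentiment = pos | observed binary features) under naive Bayes."""
    log_pos = math.log(PRIOR_POS)
    log_neg = math.log(1 - PRIOR_POS)
    for feature, probs in LIKELIHOODS.items():
        present = feature in observed
        # Absent features also carry evidence: use 1 - P(feature | class)
        log_pos += math.log(probs["pos"] if present else 1 - probs["pos"])
        log_neg += math.log(probs["neg"] if present else 1 - probs["neg"])
    m = max(log_pos, log_neg)  # subtract the max for numerical stability
    a, b = math.exp(log_pos - m), math.exp(log_neg - m)
    return a / (a + b)


def aggregate(opinions: Iterable[Set[str]]) -> float:
    """Expected fraction of positive opinions about one object across sources."""
    probs = [p_positive(o) for o in opinions]
    return sum(probs) / len(probs)
```

Working in log space avoids underflow when an opinion has many observed features.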
Abstract:
Embodiments are directed towards managing the display of search results by classifying a search query and using that classification to selectively display trust search results distinctly from non-trust search results. A search query is classified into a query-class. A search is then performed over non-trust data sources and, selectively, over trust data sources to obtain non-trust and trust search results, respectively. The trust search results are rank-ordered based on various categories of search criteria, including, for example, explicit and implicit relationships. The number of trust search results displayed may differ depending on the query-class, and the position at which the trust search results are displayed may also depend on the query-class. Moreover, the non-trust search results are displayed distinctly or separately from the trust search results, so that the type of source of each search result is readily distinguishable.
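The flow from query-class to display can be outlined in a minimal Python sketch. It rests on several hypothetical assumptions. The toy rule-based classifier stands in for a real query classifier. A lookup table fixes how many trust results each query-class shows and where they appear. Trust results are ranked by a weighted sum of relevance and explicit and implicit relationship strengths, with weights chosen for illustration. The trust sources are searched only when the query-class allows trust results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Result:
    url: str
    score: float            # relevance score from the underlying search
    explicit: float = 0.0   # hypothetical explicit-relationship strength
    implicit: float = 0.0   # hypothetical implicit-relationship strength


# Hypothetical policy: query-class -> (max trust results, where to show them)
DISPLAY_POLICY: Dict[str, Tuple[int, Optional[str]]] = {
    "local": (3, "top"),
    "shopping": (2, "side"),
    "navigational": (0, None),
}
DEFAULT_POLICY: Tuple[int, Optional[str]] = (1, "bottom")
SHOPPING_WORDS = {"buy", "price", "cheap", "deal"}


def classify(query: str) -> str:
    """Toy rule-based query classifier (a stand-in for a real one)."""
    tokens = query.lower().split()
    if any(t.startswith("www.") or t.endswith(".com") for t in tokens):
        return "navigational"
    if "near" in tokens:
        return "local"
    if SHOPPING_WORDS & set(tokens):
        return "shopping"
    return "general"


def trust_rank(results: List[Result], w_explicit: float = 2.0,
               w_implicit: float = 1.0) -> List[Result]:
    # Weighted linear combination; weighting explicit ties higher is an assumption
    return sorted(results,
                  key=lambda r: r.score + w_explicit * r.explicit + w_implicit * r.implicit,
                  reverse=True)


def build_page(query: str,
               search_web: Callable[[str], List[Result]],
               search_trust: Callable[[str], List[Result]]) -> dict:
    qclass = classify(query)
    limit, position = DISPLAY_POLICY.get(qclass, DEFAULT_POLICY)
    web = search_web(query)
    # Trust sources are searched only when this query-class shows trust results
    trusted = trust_rank(search_trust(query))[:limit] if limit > 0 else []
    return {
        "class": qclass,
        "trust_position": position if trusted else None,
        "trust": [r.url for r in trusted],  # kept separate from web results
        "web": [r.url for r in web],
    }
```

Because the trust and non-trust results are returned under separate keys, a renderer can draw them in visually distinct sections.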
Abstract:
Disclosed are methods and apparatus for extracting information from one or more documents. A training and execution plan is received. The plan specifies invocation of a trainer operator that trains a trainee operator on a set of training documents. This training generates a new trained operator, which is then invoked to extract information from one or more unknown documents. The trainee operator is configured to extract information from one or more unknown documents, and each training document is associated with classified information. After receipt of the training and execution plan, the trainer operator is automatically executed to train the trainee operator on the specified training documents, generating a new trained operator for extracting information from documents. The new trained operator is a new version of the trainee operator. After receipt of the plan, both the trainee operator and the new trained operator are automatically retained for later use in extracting information from one or more unknown documents. The new trained operator is also automatically executed on one or more unknown documents to extract information from them.
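The plan can be illustrated end to end with a minimal Python sketch: train, retain both versions, then execute the new version. The operator is a deliberately simple hypothetical extractor that returns the token following a learned context token. Each training document is assumed to be a (text, labeled value) pair, and the trainer learns the token that most often precedes the labeled value. A plain dict keyed by (name, version) stands in for whatever store retains operators. None of these names come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Operator:
    """Hypothetical extractor: returns the token after `context`, if present."""
    name: str
    version: int
    context: Optional[str] = None

    def run(self, document: str) -> Optional[str]:
        tokens = document.split()
        for i, tok in enumerate(tokens[:-1]):
            if tok == self.context:
                return tokens[i + 1]
        return None


def trainer(trainee: Operator, training_docs: List[Tuple[str, str]]) -> Operator:
    """Learn the token most often preceding the labeled value and return
    a new version of the trainee (the trainee itself is left unchanged)."""
    counts: Counter = Counter()
    for text, value in training_docs:
        tokens = text.split()
        for i in range(1, len(tokens)):
            if tokens[i] == value:
                counts[tokens[i - 1]] += 1
    context = counts.most_common(1)[0][0] if counts else trainee.context
    return Operator(trainee.name, trainee.version + 1, context)


def execute_plan(registry: Dict[Tuple[str, int], Operator], trainee: Operator,
                 training_docs: List[Tuple[str, str]],
                 unknown_docs: List[str]) -> List[Optional[str]]:
    trained = trainer(trainee, training_docs)
    # Retain both the trainee and the new trained version for later use
    registry[(trainee.name, trainee.version)] = trainee
    registry[(trained.name, trained.version)] = trained
    # Execute the new trained operator on the unknown documents
    return [trained.run(doc) for doc in unknown_docs]
```

Making `Operator` immutable means that training can only produce a new version, so the older version stays usable after retraining.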
Abstract:
After receipt of a training and execution plan, a trainer operator is automatically executed to train a trainee operator on specified training documents, generating a new trained operator for extracting information from documents. The new trained operator is a new version of the trainee operator. Both the trainee operator and the new trained operator are automatically retained for later use in extracting information from one or more unknown documents. After receipt of the training and execution plan, the new trained operator is automatically executed on one or more unknown documents to extract information from them.