摘要:
The described subject matter provides systems and procedures to make query similarity determinations, wherein the queries are used in information retrieval operations. A same document and/or multiple similar documents are identified that have been selected by a user in response to multiple queries. Responsive to identifying the same document and/or the similar documents, a query cluster is generated that indicates that the queries used to obtain the same and/or similar documents. This is accomplished in a manner that is independent of whether individual ones of the queries are compositionally similar with respect to other ones of the queries.
摘要:
Systems and methods for clustering Web queries are described. In one aspect, one or more of a same document and a plurality of similar documents selected by a user in response to a plurality of queries is identified. Responsive to this identification, a query cluster is generated. The cleric the query cluster indicates that the queries are similar independent of whether individual ones of the queries comprise similar composition with respect to other ones of the queries.
摘要:
Systems and methods for clustering Web queries are described. In one aspect, one or more of a same document and a plurality of similar documents selected by a user in response to a plurality of queries is identified. Responsive to this identification, a query cluster is generated. The cleric the query cluster indicates that the queries are similar independent of whether individual ones of the queries comprise similar composition with respect to other ones of the queries.
摘要:
In community mining based on core objects and affiliated objects, a set of core objects for a community of objects are identified from a plurality of objects. The community is expanded, based on the set of core objects, to include a set of affiliated objects. According to one aspect, a model of a community of objects is obtained by grouping a first collection of a plurality of objects into a center portion, and grouping a second collection of the plurality of objects into one or more concentric portions around the center portion. The groupings of the first and second collections of the objects are identified as the community of objects.
摘要:
In community mining based on core objects and affiliated objects, a set of core objects for a community of objects are identified from a plurality of objects. The community is expanded, based on the set of core objects, to include a set of affiliated objects. According to one aspect, a model of a community of objects is obtained by grouping a first collection of a plurality of objects into a center portion, and grouping a second collection of the plurality of objects into one or more concentric portions around the center portion. The groupings of the first and second collections of the objects are identified as the community of objects.
摘要:
Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
摘要:
A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.
摘要:
The automated social networking graph mining and visualization technique described herein mines social connections and allows creation of a social networking graph from general (not necessarily social-application specific) Web pages. The technique uses the distances between a person's/entity's name and related people's/entities names on one or more Web pages to determine connections between people/entities and the strengths of the connections. In one embodiment, the technique lays out these connections, and then clusters them, in a 2-D layout of a social networking graph that represents the Web connection strengths among the related people's or entities' names, by using a force-directed model.
摘要:
Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.
摘要:
A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, and document classification, image search, and image classification.