Abstract:
Technologies are described herein for coreference resolution in an ambiguity-sensitive natural language processing system. Techniques for integrating reference resolution functionality into a natural language processing system can process documents to be indexed within an information search and retrieval system. Ambiguity awareness features, as well as ambiguity resolution functionality, can operate in coordination with coreference resolution. Annotation of coreference entities, as well as ambiguous interpretations, can be supported by in-line markup within text content or by external entity maps. Information expressed within documents can be formally organized in terms of facts, or relationships between entities in the text. Expansion can support applying multiple aliases, or ambiguities, to an entity being indexed so that all of the possible references or interpretations for that entity are captured in the index. Alternative stored descriptions can support retrieval of a fact by either the original description or a coreferential description.
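The expansion step described above can be illustrated with a minimal sketch. It assumes facts are stored as (subject, relation, object) triples and that coreference output arrives as an external entity map from a mention to its aliases; the function name `build_index`, the sample fact, and the alias map are all hypothetical and not taken from the abstract.

```python
from collections import defaultdict


def build_index(facts, aliases):
    """Index each fact under its original descriptions and every alias.

    facts   -- iterable of (subject, relation, object) triples (assumed format)
    aliases -- external entity map: mention -> list of coreferential
               descriptions or alternative interpretations (assumed format)
    """
    index = defaultdict(set)
    for fact in facts:
        subject, _relation, obj = fact
        for entity in (subject, obj):
            # The original description plus all of its aliases become keys,
            # so the fact is retrievable through any of them.
            for description in [entity] + aliases.get(entity, []):
                index[description.lower()].add(fact)
    return index


# Hypothetical data: "he" was resolved to "John Smith" by coreference.
facts = [("he", "visited", "Paris")]
aliases = {"he": ["John Smith"]}
index = build_index(facts, aliases)

print(index["john smith"])  # the fact is found via the coreferential description
print(index["he"])          # and via the original description
```

Storing the original fact under every alias trades index size for recall, which matches the stated goal of capturing all possible references at indexing time rather than at query time.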
Abstract:
Concept disambiguation is provided for search queries by analyzing search results in conjunction with an ontology of concepts. An ontology of concepts is identified, and at least one document is associated with each concept. The document associated with a concept is representative of the concept and used to generate a concept signature. When a search query is received, it is processed to obtain search results. The search results are used to generate a search results signature, which is compared to the concept signatures to identify one or more concepts that are relevant to the search query.
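As a rough illustration of the signature comparison, the sketch below assumes a signature is a bag-of-words term-frequency vector and that signatures are compared by cosine similarity. The abstract does not specify either choice, so both are assumptions, as are the function names and the sample concepts.

```python
import math
import re
from collections import Counter


def signature(texts):
    """Build a term-frequency signature from one or more texts."""
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two term-frequency signatures."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_concepts(search_results, concept_signatures):
    """Return (score, concept) pairs, most similar concept first."""
    results_signature = signature(search_results)
    scored = [
        (cosine(results_signature, sig), name)
        for name, sig in concept_signatures.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical ontology: one representative document per concept.
concept_signatures = {
    "jaguar (animal)": signature(["the jaguar is a large cat native to the americas"]),
    "jaguar (car)": signature(["jaguar is a british maker of luxury cars"]),
}
search_results = ["jaguar cars luxury sedan review", "new jaguar car models"]
ranking = rank_concepts(search_results, concept_signatures)
print(ranking[0][1])  # jaguar (car)
```

A production system would likely weight terms (for example with TF-IDF) and remove stop words; raw counts are used here only to keep the sketch short.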
Abstract:
Referring expressions are identified for concepts by analyzing search query and result selection information. An ontology of concepts is identified, and at least one document is associated with each concept. The document associated with a concept is representative of the concept. Search query information from a search engine is analyzed to identify search queries that resulted in user selections of documents associated with the concepts. Referring expressions that refer to the concepts are identified based on the search queries that resulted in user selections of documents corresponding with the concepts. After identifying referring expressions for concepts, search queries may be mapped to referring expressions to identify concepts to which the search queries pertain, and search result pages may be generated based on knowledge of the concepts.
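The mining step can be sketched as follows, assuming the query log is available as (query, clicked document) pairs and that a query counts as a referring expression once it has led to a minimum number of selections of a concept's document. The log format, the `min_clicks` threshold, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def referring_expressions(click_log, doc_to_concept, min_clicks=2):
    """Map each concept to the queries that led users to its document.

    click_log      -- iterable of (query, clicked_doc_id) pairs (assumed format)
    doc_to_concept -- doc id -> concept, from the ontology (assumed format)
    min_clicks     -- assumed threshold that filters out one-off queries
    """
    counts = defaultdict(Counter)
    for query, clicked_doc in click_log:
        concept = doc_to_concept.get(clicked_doc)
        if concept is not None:
            # Normalise case and whitespace so variants are counted together.
            counts[concept][query.strip().lower()] += 1
    return {
        concept: {q for q, n in query_counts.items() if n >= min_clicks}
        for concept, query_counts in counts.items()
    }


# Hypothetical log and ontology mapping.
click_log = [
    ("big apple", "doc_nyc"),
    ("Big Apple", "doc_nyc"),
    ("nyc", "doc_nyc"),
    ("nyc", "doc_nyc"),
    ("apple", "doc_fruit"),
    ("big apple", "doc_other"),  # clicked document is not tied to any concept
]
doc_to_concept = {"doc_nyc": "New York City", "doc_fruit": "Apple (fruit)"}
expressions = referring_expressions(click_log, doc_to_concept)
print(expressions["New York City"])
```

Inverting the resulting dictionary gives the query-to-concept lookup that the abstract describes for later search requests.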
Abstract:
The present technology is related to identifying, from within a corpus of documents, a subject (e.g., person, location, date, etc.) that is relevant to a topic and that is usable to enhance a topic-describing document. Documents within the corpus of documents share a link structure, such that some documents include hyperlinks that enable navigation to the topic-describing document, and the topic-describing document includes hyperlinks that enable navigation to other documents. Text of documents within the corpus is parsed to identify the subject, and a context of the subject suggests a degree of relevance of the subject to the topic. An enhancement type of the subject is determined, and a version of the topic-describing document is enhanced to include a presentation of the subject.
Abstract:
Computer-readable media and a computing device are described for providing geotemporal search and a search interface therefor. A search interface having a location portion and a timeline portion is provided. A geographic area is selected in the location portion by adjusting the visible area of a map. A temporal window is selected in the timeline portion by adjusting sliders along a timeline to a desired start and end time. The start and end times can be in the past, present, or future. A geotemporal search is executed based on the selected geographic area and temporal window to identify search results having associated metadata indicating a relationship to the selected geographic area and temporal window. One or more search terms are optionally provided to further refine the geotemporal search.
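A minimal sketch of the filtering behind such a search is shown below. It assumes the visible map area is reported as a (south, west, north, east) bounding box, that each result carries a single latitude/longitude and timestamp in its metadata, and that search terms are matched against the title only; the `Result` type and all sample data are hypothetical. The box test does not handle areas that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Result:
    title: str
    lat: float
    lon: float
    when: datetime


def geotemporal_search(results, bbox, start, end, terms=()):
    """Keep results inside the map area and time window.

    bbox  -- (south, west, north, east) of the visible map area (assumed format)
    start -- start of the temporal window selected on the timeline
    end   -- end of the temporal window selected on the timeline
    terms -- optional search terms that further refine the search
    """
    south, west, north, east = bbox
    hits = []
    for r in results:
        in_area = south <= r.lat <= north and west <= r.lon <= east
        in_window = start <= r.when <= end
        matches_terms = all(t.lower() in r.title.lower() for t in terms)
        if in_area and in_window and matches_terms:
            hits.append(r)
    return hits


# Hypothetical results and a bounding box roughly around Seattle.
results = [
    Result("Jazz festival downtown", 47.61, -122.33, datetime(2024, 7, 4, 18)),
    Result("Jazz night", 40.71, -74.00, datetime(2024, 7, 4, 20)),
    Result("Farmers market", 47.62, -122.35, datetime(2023, 7, 4, 9)),
]
seattle = (47.4, -122.5, 47.8, -122.2)
hits = geotemporal_search(
    results, seattle, datetime(2024, 7, 1), datetime(2024, 7, 31), terms=["jazz"]
)
print([r.title for r in hits])  # ['Jazz festival downtown']
```

Because the window bounds are ordinary timestamps, the same function covers past, present, and future ranges without special cases.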
Abstract:
Summaries of entities (e.g., people, places, things, concepts, etc.) may provide additional useful information to a user. For example, a search engine may provide a summary of an entity within search results. A category (e.g., “writer”, “politician”, etc.) of the entity that is short and concise may be advantageous to provide within a summary of the entity. The category may allow a user to quickly determine whether the information of the entity relates to the intended entity (e.g., search results of an entity as “a writer” vs. search results of an entity as “a politician”). Potential categories and summary text may be extracted from pre-labeled data. The potential categories and summary text may be intersected to determine a set of candidate categories that may be ranked. An entity category having a desired rank may be determined as the entity category that describes the entity in a desired way.
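The intersect-then-rank idea might look like the sketch below. The abstract does not state how candidates are ranked, so ordering by first mention in the summary text is purely an assumption, as are the function name and the sample data. Matching is by plain substring, which is crude (for instance, "writer" would also match inside "screenwriter").

```python
def rank_categories(potential_categories, summary_text):
    """Intersect potential categories with the summary text and rank them.

    A category is a candidate only if it occurs in the summary text.
    Candidates are ordered by where they first appear (assumed heuristic:
    an earlier mention is taken to be more descriptive).
    """
    text = summary_text.lower()
    candidates = []
    for category in potential_categories:
        position = text.find(category.lower())
        if position != -1:
            candidates.append((position, category))
    return [category for _position, category in sorted(candidates)]


# Hypothetical pre-labeled data for one entity.
potential_categories = ["politician", "writer", "painter"]
summary_text = "Jane Doe is an American writer and former politician."
ranked = rank_categories(potential_categories, summary_text)
print(ranked)  # ['writer', 'politician']
```

Under this heuristic the top-ranked candidate, here "writer", would be the category shown alongside the entity summary.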
Abstract:
Architecture that enables an optional display of a longer version of each subsnippet in response to user interactions such as clicking, hovering, or other suitable form of interaction. More specifically, options are provided to display additional text from a search result at the point where a subsnippet (a subsegment in a snippet that is delimited by ellipses) ends. Selecting suitable boundaries for both initial subsnippets and expanded subsnippets enables relevant information to be presented and increases readability.
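One way to pick an expanded boundary is sketched below. It assumes the full text of the search result is available, that a subsnippet is an exact substring of it, and that a good stopping point is the next sentence-ending punctuation mark within a character budget; the function name, the `max_extra` budget, and the sample text are hypothetical.

```python
def expand_subsnippet(document, subsnippet, max_extra=200):
    """Extend a subsnippet from where it ends to a readable boundary.

    document   -- full text of the search result (assumed to be available)
    subsnippet -- text shown between ellipses in the snippet
    max_extra  -- assumed cap on the number of characters added
    """
    start = document.find(subsnippet)
    if start == -1:
        return subsnippet  # cannot locate it, so leave it unchanged
    end = start + len(subsnippet)
    limit = min(len(document), end + max_extra)
    # Prefer to stop just after the next sentence-ending punctuation mark.
    for i in range(end, limit):
        if document[i] in ".!?":
            return document[start:i + 1]
    # No sentence end within the budget: take the rest if the document ends.
    if limit == len(document):
        return document[start:]
    # Otherwise cut at the last space so that no word is split in half.
    cut = document.rfind(" ", end, limit)
    return document[start:cut] if cut != -1 else document[start:limit]


# Hypothetical search result text and one of its subsnippets.
document = (
    "The jaguar is a large cat. It lives in the Americas and hunts at night. "
    "Jaguars swim well."
)
expanded = expand_subsnippet(document, "It lives in the Americas")
print(expanded)  # It lives in the Americas and hunts at night.
```

A user interface could call this when the user clicks or hovers on the trailing ellipsis and replace the short subsnippet with the returned text.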