Abstract:
The present invention provides a facility for determining, for a semantic relation that does not occur in a lexical knowledge base, whether this semantic relation should be inferred despite its absence from the lexical knowledge base. This semantic relation to be inferred is preferably made up of a first word, a second word, and a relation type relating the meanings of the first and second words. In a preferred embodiment, the facility identifies a salient semantic relation having the relation type of the semantic relation to be inferred and relating the first word to an intermediate word other than the second word. The facility then generates a quantitative measure of the similarity in meaning between the intermediate word and the second word. The facility further generates a confidence weight for the semantic relation to be inferred based upon the generated measure of similarity in meaning between the intermediate word and the second word. The facility may also generate a confidence weight for the semantic relation to be inferred based upon the weights of one or more paths connecting the first and second words.
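The inference step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary layout of the knowledge base, the function name `infer_confidence`, and the choice to multiply salience by similarity are all hypothetical and are not taken from the abstract, which says only that the confidence weight is based on the similarity measure.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy layout (an assumption, not from the source):
# (first_word, relation_type) -> {related_word: salience weight}
KnowledgeBase = Dict[Tuple[str, str], Dict[str, float]]


def infer_confidence(
    kb: KnowledgeBase,
    similarity: Callable[[str, str], float],
    first: str,
    relation_type: str,
    second: str,
) -> Optional[float]:
    """Return a confidence weight for (first, relation_type, second).

    Returns None when the first word has no relation of this type to
    any intermediate word, so nothing supports the inference.
    """
    related = kb.get((first, relation_type), {})
    if second in related:
        # The relation already occurs in the knowledge base.
        return related[second]
    best = None
    for intermediate, salience in related.items():
        # Assumption: combine salience and similarity by multiplication.
        score = salience * similarity(intermediate, second)
        if best is None or score > best:
            best = score
    return best


# Toy data: "observe" has a Means relation to "binoculars" only.
kb = {("observe", "Means"): {"binoculars": 0.8}}


def toy_similarity(a: str, b: str) -> float:
    # Stand-in for a real similarity measure.
    return 0.9 if {a, b} == {"binoculars", "telescope"} else 0.0


confidence = infer_confidence(kb, toy_similarity, "observe", "Means", "telescope")
```

Here the relation "observe, Means, telescope" is absent from the toy knowledge base, but it receives a non-zero confidence because "telescope" is similar to the intermediate word "binoculars".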
Abstract:
A computer-implemented machine translation system translates text from a first language to a second language. The system includes a plurality of mappings, each associating a dependency structure of the first language with a dependency structure of the second language. At least some of the mappings correspond to dependency structures of the first language that share some common elements but vary in context, together with their associated dependency structures of the second language. A module receives input text in the first language and, by accessing the plurality of mappings, outputs text in the second language.
Abstract:
The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an "is a" relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
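The construction of alternative logical forms can be illustrated with a small sketch. It assumes, purely for illustration, that a logical form is a (deep subject, verb, deep object) tuple and that hypernyms come from a plain dictionary; the function name `expand_logical_form` and the sample words are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

LogicalForm = Tuple[str, str, str]  # assumed shape: (subject, verb, object)


def expand_logical_form(
    primary: LogicalForm, hypernyms: Dict[str, List[str]]
) -> List[LogicalForm]:
    """Return the primary logical form followed by every alternative
    obtained by replacing one or more words with one of its hypernyms."""
    # For each position, the choices are the original word and its hypernyms.
    options = [[word] + hypernyms.get(word, []) for word in primary]
    # The first combination keeps every original word, i.e. the primary form.
    return [tuple(combo) for combo in product(*options)]


primary = ("man", "kiss", "pig")
hypernyms = {"man": ["person"], "kiss": ["touch"], "pig": ["swine"]}
tokens = expand_logical_form(primary, hypernyms)
```

Using the same expansion when indexing documents and when processing a query lets a query about a "person" match a document about a "man", because both produce an overlapping token.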
Abstract:
The present invention provides a facility for determining similarity between two input words utilizing the frequencies with which path patterns occurring between the words occur between words known to be synonyms. A preferred embodiment of the facility utilizes a training phase and a similarity determination phase. In the training phase, the facility first identifies, for a number of pairs of synonyms, the most salient semantic relation paths between each pair of synonyms. The facility then extracts from these semantic relation paths their path patterns, which each comprise a series of directional relation types. The number of times that each path pattern occurs between pairs of synonyms, called the frequency of the path pattern, is counted. In the similarity determination phase, the facility identifies the most salient semantic relation paths between the input words, and extracts their path patterns. The facility then averages the frequencies counted in the training phase for the path patterns extracted for the input words in order to obtain a quantitative measure of the similarity between the input words.
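The two phases can be sketched as a count followed by an average. This is a minimal sketch that assumes the salient paths have already been found and reduced to path patterns; the tuple-of-strings pattern encoding and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed encoding: a path pattern is a tuple of directional relation
# types, e.g. ("Hypernym>", "<Hypernym").
PathPattern = Tuple[str, ...]


def count_pattern_frequencies(
    patterns_per_synonym_pair: Iterable[List[PathPattern]],
) -> Counter:
    """Training phase: count how often each path pattern occurs
    between pairs of known synonyms."""
    frequencies = Counter()
    for patterns in patterns_per_synonym_pair:
        frequencies.update(patterns)
    return frequencies


def similarity(frequencies: Counter, input_patterns: List[PathPattern]) -> float:
    """Similarity determination phase: average the trained frequencies
    of the path patterns found between the two input words."""
    if not input_patterns:
        return 0.0
    # Counter returns 0 for patterns never seen during training.
    return sum(frequencies[p] for p in input_patterns) / len(input_patterns)


training = [
    [("Hypernym>", "<Hypernym")],
    [("Hypernym>", "<Hypernym"), ("Synonym>",)],
]
frequencies = count_pattern_frequencies(training)
score = similarity(frequencies, [("Hypernym>", "<Hypernym"), ("Part>",)])
```

In this toy run the first input pattern was seen twice in training and the second never, so the averaged score is (2 + 0) / 2.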
Abstract:
A lexical knowledge base is compiled automatically from a machine-readable source (such as an on-line dictionary or unstructured text). The preferred embodiment of the invention makes use of “backward linking,” by which inverse semantic relations are discerned from the text and used to augment the knowledge base. By this arrangement, on-line dictionaries and other texts can provide formidable sources of “common sense” knowledge about the world.
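A rough sketch of backward linking follows. It assumes that semantic relations have already been extracted from dictionary definitions as (headword, relation, word) triples; the `Of` suffix used to name inverse relations and the function name are hypothetical conventions chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # assumed shape: (headword, relation, word)


def build_knowledge_base(extracted: Iterable[Triple]) -> Dict[str, Set[Tuple[str, str]]]:
    """Index each relation under its headword and, via backward linking,
    index the inverse relation under the related word as well."""
    kb = defaultdict(set)
    for headword, relation, word in extracted:
        kb[headword].add((relation, word))          # forward link
        kb[word].add((relation + "Of", headword))   # backward (inverse) link
    return dict(kb)


# Relations one might extract from a definition such as
# "car: a vehicle with wheels" (illustrative data only).
extracted = [("car", "Hypernym", "vehicle"), ("car", "Part", "wheel")]
kb = build_knowledge_base(extracted)
```

After this step the entry for "wheel" records that a wheel is part of a car, even though that fact appeared only in the definition of "car".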
Abstract:
A computer-implemented method for providing information to an automatic machine translation system to improve translation accuracy is disclosed. The method includes receiving a collection of source text. An attempted translation that corresponds to the collection of source text is received from the automatic machine translation system. A correction input, which is configured to effectuate a correction of at least one error in the attempted translation, is also received. Finally, information is provided to the automatic machine translation system to reduce the likelihood that the error will be repeated in subsequent translations generated by the automatic machine translation system.
Abstract:
A computer-implemented machine translation system translates text from a first language to a second language. The system includes a plurality of mappings, each associating a dependency structure of the first language with a dependency structure of the second language. At least some of the mappings correspond to dependency structures of the first language that share some common elements but vary in context, together with their associated dependency structures of the second language. A module receives input text in the first language and, by accessing the plurality of mappings, outputs text in the second language.
Abstract:
A method of aligning nodes of dependency structures obtained from a bilingual corpus includes a two-phase approach wherein a first phase comprises associating nodes of the dependency structures to form tentative correspondences. The nodes of the dependency structures are then aligned as a function of the tentative correspondences and structural considerations. Mappings are obtained from the aligned dependency structures. The mappings can be expanded with varying types and amounts of local context in order that a more fluent translation can be obtained when translation is performed.
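The two-phase idea can be sketched on toy trees. This is a simplified illustration, not the patented procedure: it assumes tentative correspondences are given as node-id pairs (for example from a bilingual lexicon), accepts the unambiguous ones first, and then resolves ambiguous candidates with a single structural rule (the parents must already be aligned to each other). All names and data are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def align(
    src_parent: Dict[str, Optional[str]],
    tgt_parent: Dict[str, Optional[str]],
    tentative: Set[Tuple[str, str]],
) -> Dict[str, str]:
    """Align source nodes to target nodes using tentative
    correspondences plus a simple structural constraint."""
    aligned: Dict[str, str] = {}
    # Accept correspondences that are one-to-one in both directions.
    for s, t in tentative:
        s_options = [t2 for s2, t2 in tentative if s2 == s]
        t_options = [s2 for s2, t2 in tentative if t2 == t]
        if len(s_options) == 1 and len(t_options) == 1:
            aligned[s] = t
    # Resolve the rest: accept a candidate pair when the parents of the
    # two nodes are already aligned to each other. Repeat until stable.
    changed = True
    while changed:
        changed = False
        for s, t in sorted(tentative):
            if s in aligned or t in aligned.values():
                continue
            t_par = tgt_parent.get(t)
            if t_par is not None and aligned.get(src_parent.get(s)) == t_par:
                aligned[s] = t
                changed = True
    return aligned


# Toy trees: the same word occurs twice on each side, under different parents.
src_parent = {"see": None, "owner": "see", "cat_a": "see", "cat_b": "owner"}
tgt_parent = {"t_see": None, "t_owner": "t_see", "t_cat_a": "t_see", "t_cat_b": "t_owner"}
tentative = {
    ("see", "t_see"), ("owner", "t_owner"),
    ("cat_a", "t_cat_a"), ("cat_a", "t_cat_b"),
    ("cat_b", "t_cat_a"), ("cat_b", "t_cat_b"),
}
alignment = align(src_parent, tgt_parent, tentative)
```

The lexical evidence alone cannot tell the two "cat" nodes apart; the structural pass disambiguates them through their already aligned parents.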
Abstract:
The present invention can be used in a natural language processing system to determine a relationship (such as similarity in meaning) between two textual segments. The relationship can be identified or determined based on logical graphs generated from the textual segments. A relationship between first and second logical graphs is determined. This is accomplished regardless of whether there is an exact match between the first and second logical graphs. In one embodiment, the first graph represents an input textual discourse unit. The second graph, in one embodiment, represents information in a lexical knowledge base (LKB). The input graph can be matched against the second graph, if they have similar meaning, even if the two differ lexically or structurally.
Abstract:
A machine translation system is trained to generate confidence scores indicative of a quality of a translation result. A source string is translated with a machine translator to generate a target string. Features indicative of translation operations performed are extracted from the machine translator. A trusted entity-assigned translation score is obtained and is indicative of a trusted entity-assigned translation quality of the translated string. A relationship between a subset of the extracted features and the trusted entity-assigned translation score is identified.
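One simple way to identify a relationship between extracted features and trusted scores is to keep the features whose values correlate with the scores. The sketch below uses Pearson correlation with a threshold; this technique, the feature names, and the threshold value are assumptions made for illustration, since the abstract does not say how the relationship is identified.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(
    feature_rows: List[Dict[str, float]],
    trusted_scores: List[float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Return the subset of features whose correlation with the
    trusted scores has magnitude at least `threshold`."""
    selected = {}
    for name in feature_rows[0]:
        r = pearson([row[name] for row in feature_rows], trusted_scores)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Hypothetical features extracted per translated sentence, with
# hypothetical human-assigned quality scores.
rows = [
    {"learned_mappings_used": 5, "sentence_length": 10},
    {"learned_mappings_used": 3, "sentence_length": 10},
    {"learned_mappings_used": 1, "sentence_length": 10},
]
scores = [4.0, 3.0, 2.0]
selected = select_features(rows, scores)
```

In this toy data only `learned_mappings_used` varies with the trusted scores, so it is the only feature retained; a real system would then fit a model on the retained subset to produce confidence scores.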