Abstract:
A Bell tree data structure is provided to model the process of chaining mentions, drawn from one or more documents, into entities, and to track that process from start to finish. The data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
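The following is a minimal sketch of how such a tree could be searched, not the patented implementation. It assumes mentions are processed left to right, that each step either links the current mention to an existing entity or starts a new one, and that a hypothesis is scored by the product of its step probabilities. The names `bell_tree_search`, `link_prob`, `start_prob`, and `beam` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Entity = Tuple[int, ...]          # indices of the mentions chained into one entity
Partition = Tuple[Entity, ...]    # one node of the tree: a partial set of entities
Hypothesis = Tuple[float, Partition]


def bell_tree_search(
    num_mentions: int,
    link_prob: Callable[[Entity, int], float],      # hypothetical scoring model
    start_prob: Callable[[Partition, int], float],  # hypothetical scoring model
    beam: Optional[int] = None,
) -> List[Hypothesis]:
    """Expand the tree one mention at a time; return hypotheses, best first."""
    hyps: List[Hypothesis] = [(1.0, ())]  # root: no mentions processed yet
    for m in range(num_mentions):
        expanded: List[Hypothesis] = []
        for score, part in hyps:
            # Branch 1: link mention m to each existing entity in turn.
            for i, entity in enumerate(part):
                new_part = part[:i] + (entity + (m,),) + part[i + 1:]
                expanded.append((score * link_prob(entity, m), new_part))
            # Branch 2: start a new entity containing only mention m.
            expanded.append((score * start_prob(part, m), part + ((m,),)))
        # Rank by the product of probabilities; optionally prune to a beam.
        expanded.sort(key=lambda h: h[0], reverse=True)
        hyps = expanded if beam is None else expanded[:beam]
    return hyps


if __name__ == "__main__":
    # Toy constant scores, used only to show the shape of the search.
    results = bell_tree_search(3, lambda e, m: 0.6, lambda p, m: 0.4)
    for score, partition in results:
        print(round(score, 3), partition)
```

Without pruning, the number of leaves after n mentions is the n-th Bell number (5 for three mentions, 15 for four), which is why some form of pruning such as the `beam` cutoff assumed here becomes necessary for longer documents.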
Abstract:
A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.
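The grouping and ordering steps can be illustrated with a short sketch. The abstract does not say how equivalence is decided or how the display order is computed, so both are assumptions here: equivalence is determined by a caller-supplied canonical key, and larger classes are displayed first. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def group_into_classes(items: List[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group items whose canonical key matches into one equivalence class."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


def display_order(classes: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Assumed ordering: largest class first, ties broken by key."""
    return sorted(classes.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def combined_representation(classes: Dict[str, List[str]]) -> List[str]:
    """One displayable line per class, emitted in the computed order."""
    return [f"{k}: {', '.join(members)}" for k, members in display_order(classes)]


if __name__ == "__main__":
    detected = ["IBM", "I.B.M.", "ibm", "Acme"]
    canonical = lambda s: s.replace(".", "").lower()  # toy equivalence test
    for line in combined_representation(group_into_classes(detected, canonical)):
        print(line)
```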
Abstract:
A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.
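The projection step can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: a target-language pronoun with no aligned source word is treated as dropped in the source, and it is projected to the source position of the next aligned target word. The pronoun list, the function name, and the romanized toy sentence are all hypothetical; feature extraction and classifier training are omitted.

```python
from typing import Dict, List, Tuple

# Toy pronoun inventory for the target language (an assumption for the example).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def project_dropped_pronouns(
    src: List[str],
    tgt: List[str],
    alignment: List[Tuple[int, int]],  # (source index, target index) pairs
) -> List[Tuple[int, str]]:
    """Return (source position, pronoun) pairs for pronouns missing in the source."""
    tgt_to_src: Dict[int, List[int]] = {}
    for s, t in alignment:
        tgt_to_src.setdefault(t, []).append(s)

    projected: List[Tuple[int, str]] = []
    for t, word in enumerate(tgt):
        if word.lower() not in PRONOUNS or t in tgt_to_src:
            continue  # not a pronoun, or the pronoun is overt in the source
        # Project to the source slot just before the next aligned target word;
        # fall back to the end of the source sentence if none follows.
        position = len(src)
        for t_next in range(t + 1, len(tgt)):
            if t_next in tgt_to_src:
                position = min(tgt_to_src[t_next])
                break
        projected.append((position, word.lower()))
    return projected


if __name__ == "__main__":
    source = ["xihuan", "yinyue"]      # romanized toy source with the subject dropped
    target = ["I", "like", "music"]
    links = [(0, 1), (1, 2)]
    print(project_dropped_pronouns(source, target, links))
```

Each returned pair could then serve as a training label (position and spelling) for the classifier the abstract describes.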
Abstract:
An arrangement for adapting statistical parsers to new data using a mathematical transform, particularly a Markov transform. In particular, it is assumed that an initial statistical parser is available and a batch of new data is given. The initial model is mapped to a new model by a Markov matrix, each of whose rows sums to one. In the unsupervised setup, where “true” parses are missing, the transform matrix is obtained by maximizing the log likelihood of the parses of test data decoded using the model before adaptation. The proposed algorithm can be applied to supervised adaptation, as well.
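To make the mapping concrete, the sketch below applies a Markov matrix to a single discrete distribution. Treating the parser model as one probability vector is a simplifying assumption, and the function names are hypothetical. Estimating the matrix by maximizing log likelihood is not shown.

```python
from typing import List

Matrix = List[List[float]]


def is_markov_matrix(q: Matrix, tol: float = 1e-9) -> bool:
    """True if every entry is non-negative and every row sums to one."""
    return all(
        all(x >= 0.0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in q
    )


def adapt(p: List[float], q: Matrix) -> List[float]:
    """Map the initial distribution p to the adapted one: p'[j] = sum_i p[i] * q[i][j]."""
    if len(q) != len(p) or not is_markov_matrix(q):
        raise ValueError("q must be a Markov matrix with one row per entry of p")
    return [sum(p[i] * q[i][j] for i in range(len(p))) for j in range(len(q[0]))]


if __name__ == "__main__":
    initial = [0.7, 0.3]
    transform = [[0.9, 0.1],
                 [0.2, 0.8]]
    print(adapt(initial, transform))
```

Because each row of the matrix sums to one, the adapted vector remains a valid probability distribution whenever the initial one is, so the transformed model needs no renormalization.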
Abstract:
A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.
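The derivation of character-level tags can be sketched as below. The abstract does not give the tag inventory, so the `-B`/`-M`/`-E`/`-S` suffixes (begin, middle, end, single-character word) are an assumption, as are the function names. The sketch covers only the tagging of leaf characters, not the conversion of full parse trees.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, part-of-speech tag)
TaggedChar = Tuple[str, str]  # (character, POS tag plus positional suffix)


def word_to_char_tags(word: str, pos: str) -> List[TaggedChar]:
    """Derive one tag per character from the word-level POS and the character's position."""
    if len(word) == 1:
        return [(word, pos + "-S")]
    tags = [(word[0], pos + "-B")]
    tags += [(ch, pos + "-M") for ch in word[1:-1]]
    tags.append((word[-1], pos + "-E"))
    return tags


def sentence_to_char_tags(tagged_words: List[TaggedWord]) -> List[TaggedChar]:
    """Deterministically convert a word-tagged sentence to character-level tags."""
    return [pair for word, pos in tagged_words for pair in word_to_char_tags(word, pos)]


def chars_to_words(tagged_chars: List[TaggedChar]) -> List[TaggedWord]:
    """Recover word boundaries and POS tags from the positional suffixes."""
    words: List[TaggedWord] = []
    current = ""
    for ch, tag in tagged_chars:
        pos, position = tag.rsplit("-", 1)
        current += ch
        if position in ("E", "S"):  # a word ends at this character
            words.append((current, pos))
            current = ""
    return words


if __name__ == "__main__":
    sentence = [("北京", "NR"), ("是", "VC"), ("首都", "NN")]
    chars = sentence_to_char_tags(sentence)
    print(chars)
    print(chars_to_words(chars))
```

The round trip shows why the positional tag is sufficient to encode word boundaries: a tagger or parser that predicts character-level tags implicitly performs word segmentation as well.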