Abstract:
A method and system for determining the location of origin and the time period in which a document was written are disclosed. A text is received, and a set of linguistic characteristics of the text is identified. A set of possible locations and time periods for the text is determined based on those linguistic characteristics. A set of reference documents is used to determine a proximity rating for the text, based on how close the text is to the reference documents. The candidate locations and time periods are ranked and returned for presentation.
Abstract:
A method includes receiving a text and identifying a set of linguistic characteristics contained in the text, where the linguistic characteristics include grammatical, syntactic, and idiomatic features of the text. The method also includes determining, based on the set of linguistic characteristics, a plurality of locations of origin in which the text was potentially written. The method also includes retrieving a set of reference documents for each location of origin in the plurality of locations of origin and producing a set of proximity scores by performing a set of proximity checks using the set of linguistic characteristics, the set of reference documents, and the text, wherein the proximity checks analyze how often the linguistic characteristics occur and how close they are to one another. The method also includes ranking the plurality of locations of origin based on the set of proximity scores and returning a set of one or more ranked locations of origin.
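The proximity check and ranking steps described in this abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the method claimed in the abstract: the marker `lexicon`, the whitespace tokenization, the scoring formula (relative frequency plus mean inverse gap between hits), and the function names `extract_characteristics`, `proximity_score`, and `rank_locations` are all hypothetical.

```python
def extract_characteristics(text, lexicon):
    """Return the lexicon terms found in the text (a stand-in for real feature extraction)."""
    return {tok for tok in text.lower().split() if tok in lexicon}


def proximity_score(tokens, characteristics):
    """Combine how often the characteristics occur with how close together they sit."""
    positions = [i for i, tok in enumerate(tokens) if tok in characteristics]
    if not positions:
        return 0.0
    # "How often": share of tokens that are characteristic hits.
    frequency = len(positions) / len(tokens)
    # "How close": mean inverse distance between consecutive hits (assumed formula).
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    closeness = sum(1.0 / g for g in gaps) / len(gaps) if gaps else 0.0
    return frequency + closeness


def rank_locations(text, lexicon, references_by_location):
    """Score each candidate location against its reference documents, best first."""
    characteristics = extract_characteristics(text, lexicon)
    ranked = []
    for location, docs in references_by_location.items():
        scores = [proximity_score(doc.lower().split(), characteristics) for doc in docs]
        ranked.append((location, sum(scores) / len(scores) if scores else 0.0))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical data: two candidate locations with one reference document each.
lexicon = {"colour", "color", "lorry", "truck"}
references = {
    "UK": ["a red lorry of bright colour passed"],
    "US": ["a red truck of bright color passed"],
}
ranking = rank_locations("the lorry had a strange colour", lexicon, references)
```

In this toy setup the text shares its marker terms only with the "UK" reference document, so that location is ranked first. A real system would presumably rely on richer grammatical and idiomatic features than single-word matches.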
Abstract:
Test cases for a text annotator are generated by determining types of inputs to the annotator and analyzing language structures in a corpus to identify sentence types and grammar constructs. An input type can correspond to multiple grammar constructs. Test cases are generated by performing grammar tree transformations on selected fragments from the corpus based on the sentence types and the grammar constructs. Additional test cases are generated by replacing starting phrases in selected fragments with substitute phrases from dictionaries associated with the input types (a dictionary can include a false synonym for an input type for purposes of negative testing). The two generating approaches can be combined, i.e., performing one or more successive (different) grammar tree transformations to yield a sentence which is then subjected to phrase substitution.
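The phrase-substitution approach in this abstract can be sketched in Python as follows. This covers only the dictionary-based replacement of a starting phrase, not the grammar tree transformations. The function name `generate_test_cases`, the dictionary layout (substitute phrase mapped to a flag that is `False` for a false synonym), and the sample phrases are assumptions made for illustration.

```python
def generate_test_cases(fragment, start_phrase, dictionary):
    """Replace the starting phrase of a corpus fragment with each dictionary entry.

    `dictionary` maps a substitute phrase to True when it is a genuine synonym
    (the annotator should fire) or False when it is a false synonym kept for
    negative testing (the annotator should not fire).
    """
    if not fragment.startswith(start_phrase):
        return []
    rest = fragment[len(start_phrase):]
    return [(phrase + rest, expected) for phrase, expected in dictionary.items()]


# Hypothetical dictionary for one input type, including a false synonym.
symptom_dictionary = {
    "Patient complains of": True,
    "Patient denies": False,
}
cases = generate_test_cases(
    "Patient reports chest pain after exercise",
    "Patient reports",
    symptom_dictionary,
)
```

Each generated case pairs a sentence with the outcome the annotator is expected to produce, so positive and negative tests come out of the same loop.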
Abstract:
Mechanisms are provided for processing logical relationships in natural language content. A logical parse of a first parse of a natural language content is generated by identifying latent logical operators within the first parse indicative of logical relationships between elements of the natural language content. The logical parse comprises nodes and edges linking nodes. At least one knowledge value is associated with each node in the logical parse. The at least one knowledge value of at least a subset of the nodes in the logical parse is propagated to one or more other nodes in the logical parse based on propagation rules. A reasoning operation is performed on the logical parse to generate a knowledge output indicative of knowledge associated with one or more of the logical relationships between elements of the natural language content.
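One way to picture knowledge values propagating through a logical parse is the small Python sketch below. It assumes a tree whose internal nodes carry AND, OR, or NOT operators, whose knowledge values are `True`, `False`, or `None` (unknown), and whose propagation rules run bottom-up. The abstract fixes none of these choices, so the `Node` class and the `propagate` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    operator: Optional[str] = None      # "AND", "OR", "NOT", or None for a leaf
    children: List["Node"] = field(default_factory=list)
    value: Optional[bool] = None        # knowledge value; None means unknown


def propagate(node):
    """Propagate knowledge values from the leaves up to the operator nodes."""
    if node.operator is None:
        return node.value
    values = [propagate(child) for child in node.children]
    if node.operator == "NOT":
        node.value = None if values[0] is None else not values[0]
    elif node.operator == "AND":
        if any(v is False for v in values):
            node.value = False
        elif all(v is True for v in values):
            node.value = True
        else:
            node.value = None
    elif node.operator == "OR":
        if any(v is True for v in values):
            node.value = True
        elif all(v is False for v in values):
            node.value = False
        else:
            node.value = None
    else:
        raise ValueError(f"unknown operator: {node.operator}")
    return node.value


# "The patient has a fever and does not have a cough."
fever = Node("fever", value=True)
cough = Node("cough", value=False)
root = Node("and", "AND", [fever, Node("not", "NOT", [cough])])
result = propagate(root)
```

Using `None` for unknown values lets an OR node resolve to true as soon as one child is known to be true, while an AND node with one unknown child and no false child stays unknown.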
Abstract:
A method includes receiving a text. The method also includes identifying a set of linguistic characteristics contained in the text. The method also includes determining, based on the set of linguistic characteristics, a plurality of time periods in which the text was potentially written. The method also includes retrieving a set of reference documents for each time period. The method also includes producing a set of proximity scores by performing a set of proximity checks using the set of linguistic characteristics, the set of reference documents, and the text, where the proximity checks analyze how often the linguistic characteristics occur and how close they are to one another. The method also includes ranking the plurality of time periods based on the set of proximity scores and returning a set of one or more ranked time periods of the plurality of time periods.