摘要:
Test cases for a text annotator are generated by determining types of inputs to the annotator and analyzing language structures in a corpus to identify sentence types and grammar constructs. An input type can correspond to multiple grammar constructs. Test cases are generated by performing grammar tree transformations on selected fragments from the corpus based on the sentence types and the grammar constructs. Additional test cases are generated by replacing starting phrases in selected fragments with substitute phrases from dictionaries associated with the input types (a dictionary can include a false synonym for an input type for purposes of negative testing). The two generating approaches can be combined, i.e., performing one or more successive (different) grammar tree transformations to yield a sentence which is then subjected to phrase substitution.
摘要:
A first question may be received. A first tag may be identified. The first tag may correspond to a first demographic trait. The first tag may be for use in providing a context for generating a first answer estimate to the first question. The first answer estimate may be generated using natural language processing and based on the first tag.
摘要:
An approach is provided in which an information handling system analyzes correction information corresponding to an answer generated by a question answer system. The correction information includes a correction to the answer and captured variable information utilized by the question answer system to generate the answer. The information handling system selects input variables based upon the correction analysis and generates a test case that includes the selected input variables and the variable information. In turn, the information handling system tests the question answer system using the generated test case.
摘要:
A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
摘要:
A first question may be received. A first tag may be identified. The first tag may correspond to a first demographic trait. The first tag may be for use in providing a context for generating a first answer estimate to the first question. The first answer estimate may be generated using natural language processing and based on the first tag.
摘要:
A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
摘要:
Test cases for a text annotator are generated by determining types of inputs to the annotator and analyzing language structures in a corpus to identify sentence types and grammar constructs. An input type can correspond to multiple grammar constructs. Test cases are generated by performing grammar tree transformations on selected fragments from the corpus based on the sentence types and the grammar constructs. Additional test cases are generated by replacing starting phrases in selected fragments with substitute phrases from dictionaries associated with the input types (a dictionary can include a false synonym for an input type for purposes of negative testing). The two generating approaches can be combined, i.e., performing one or more successive (different) grammar tree transformations to yield a sentence which is then subjected to phrase substitution.
摘要:
A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
摘要:
A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
摘要:
An approach is provided in which an information handling system analyzes correction information corresponding to an answer generated by a question answer system. The correction information includes a correction to the answer and captured variable information utilized by the question answer system to generate the answer. The information handling system selects input variables based upon the correction analysis and generates a test case that includes the selected input variables and the variable information. In turn, the information handling system tests the question answer system using the generated test case.