Abstract:
Systems, apparatuses, and methods for generating a parser training set, and ultimately a correct treebank, for a corpus of text by using an existing parser that was trained on a different corpus. Also disclosed are systems, apparatuses, and methods for improving the operation of a parser when using a less familiar set of training data than is typically used to train conventional parsers. These techniques can be used to generate a more effective and accurate parser for a new corpus (and hence more accurate parse trees) with significantly less effort than would be required if it were necessary to generate a standard-size training set.