Abstract:
A method includes training a neural network having parameters on training data, in which the neural network receives an input state and processes the input state to generate a respective score for each decision in a set of decisions. The method includes receiving training data including training text sequences and, for each training text sequence, a corresponding gold decision sequence. The method includes training the neural network on the training data to determine trained values of the parameters of the neural network. Training the neural network includes, for each training text sequence: maintaining a beam of candidate decision sequences for the training text sequence, updating each candidate decision sequence by adding one decision at a time, determining that a gold candidate decision sequence matching a prefix of the gold decision sequence has dropped out of the beam, and, in response, performing an iteration of gradient descent to optimize an objective function.
Abstract:
A computer-implemented method can include receiving a speech input representing a question, converting the speech input to a string of characters, and obtaining tokens each representing a potential word. The method can include determining one or more part-of-speech (POS) tags for each token and determining sequences of the POS tags for the tokens, each sequence of the POS tags including one POS tag per token. The method can include determining one or more parses for each sequence of the POS tags for the tokens and determining a most-likely parse and its corresponding sequence of the POS tags for the tokens to obtain a selected parse and a selected sequence of the POS tags for the tokens. The method can also include determining a most-likely answer to the question using the selected parse and the selected sequence of the POS tags for the tokens and outputting the most-likely answer.
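As a rough sketch of the selection step described above, the code below enumerates tag sequences with one POS tag per token and picks the highest-probability parse across all sequences. Speech recognition and answer retrieval are left out. The names `tag_sequences`, `select_parse`, `tags_for`, and `parses_for` are hypothetical, and the parser is assumed to return `(parse, probability)` pairs.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

TagSeq = Tuple[str, ...]
Parse = str  # placeholder type; a real parse would be a tree or arc set


def tag_sequences(tokens: Sequence[str],
                  tags_for: Dict[str, List[str]]) -> List[TagSeq]:
    """Enumerate every sequence that assigns one candidate tag per token."""
    return [tuple(seq) for seq in product(*(tags_for[t] for t in tokens))]


def select_parse(
    tokens: Sequence[str],
    tags_for: Dict[str, List[str]],
    parses_for: Callable[[Sequence[str], TagSeq], List[Tuple[Parse, float]]],
) -> Optional[Tuple[Parse, TagSeq, float]]:
    """Return the most likely (parse, tag sequence, probability) triple.

    `parses_for` is a hypothetical parser that returns scored parses
    for a given token list and tag sequence.
    """
    best: Optional[Tuple[Parse, TagSeq, float]] = None
    for seq in tag_sequences(tokens, tags_for):
        for parse, prob in parses_for(tokens, seq):
            if best is None or prob > best[2]:
                best = (parse, seq, prob)
    return best
```

Exhaustive enumeration grows exponentially with sentence length, so a practical system would likely restrict the candidate tags per token or search the space lazily.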
Abstract:
A method and system are provided for a part-of-speech tagger that may be particularly useful for resource-poor languages. Manually constructed tag dictionaries can be used as type constraints to overcome the scarcity of annotated data in some instances. Additional token constraints can be projected from a resource-rich source language via word-aligned bitext. Several example models are provided to demonstrate this, such as a partially observed conditional random field model, in which coupled token and type constraints may provide a partial signal for training. The disclosed method achieves a significant relative error reduction over the prior state of the art.
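One plausible way to couple the two kinds of constraint is sketched below: the tag dictionary supplies the set of tags a word type may take, and a projected tag narrows that set to a single tag only when the two agree. The abstract does not state the combination rule, so this rule, the function name `allowed_tags`, and its parameters are assumptions for illustration.

```python
from typing import Dict, Optional, Set


def allowed_tags(
    word: str,
    projected_tag: Optional[str],
    tag_dict: Dict[str, Set[str]],
    all_tags: Set[str],
) -> Set[str]:
    """Return the tags a token may take during training.

    Assumed rule (not from the source): start from the type constraint,
    then keep only the projected tag if it is compatible with it.
    """
    # Type constraint: dictionary entry, or every tag for unknown words.
    type_set = tag_dict.get(word, all_tags)
    # Token constraint: tag projected through word alignment, if any.
    if projected_tag is not None and projected_tag in type_set:
        return {projected_tag}
    # No projection, or projection conflicts with the dictionary.
    return set(type_set)
```

The resulting per-token tag sets form the partial supervision that a partially observed model can be trained against.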
Abstract:
A source language sentence is tagged with non-lexical tags, such as part-of-speech tags, and is parsed using a lexicalized parser trained in the source language. A target language sentence that is a translation of the source language sentence is tagged with non-lexical labels (e.g., part-of-speech tags) and is parsed, using a delexicalized parser that has been trained in the source language, to produce k-best parses. The best parse is selected based on the parse's alignment with the lexicalized parse of the source language sentence. The selected best parse can be used to update the parameter vector of a lexicalized parser for the target language.
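The selection among the k-best parses can be sketched as follows. Parses are represented as sets of `(head, modifier)` word-index pairs, and a candidate is scored by how many of its arcs map, through the word alignment, onto arcs of the source parse. This scoring function is an assumption chosen for illustration; the abstract does not specify how alignment is measured, and the names `alignment_score` and `select_best` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, int]  # (head index, modifier index)


def alignment_score(target_arcs: Set[Arc],
                    source_arcs: Set[Arc],
                    align: Dict[int, int]) -> int:
    """Count target arcs whose aligned endpoints form a source arc.

    `align` maps a target word index to its aligned source word index;
    unaligned target words are simply absent from the mapping.
    """
    return sum(
        1
        for head, mod in target_arcs
        if head in align and mod in align
        and (align[head], align[mod]) in source_arcs
    )


def select_best(k_best: List[Set[Arc]],
                source_arcs: Set[Arc],
                align: Dict[int, int]) -> Set[Arc]:
    """Pick the candidate parse that agrees most with the source parse."""
    return max(k_best, key=lambda arcs: alignment_score(arcs, source_arcs, align))
```

The selected parse would then serve as the training target for the update to the target-language parser's parameter vector.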
Abstract:
A dependency parsing method can include determining an index set of possible head-modifier dependencies for a sentence. The index set can include inner arcs and outer arcs, inner arcs representing a possible dependency between words in the sentence separated by a distance less than or equal to a threshold and outer arcs representing a possible dependency between words in the sentence separated by a distance greater than the threshold. The index set can be pruned to include: (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs when a likelihood that there exists any possible outer arc that is appropriate is greater than the first threshold. The method can include further pruning the pruned index set based on a second parsing algorithm, and determining a most-likely parse for the sentence from the pruned index set.
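The first pruning pass can be sketched as below. Arcs are split by word distance into inner and outer arcs; inner arcs are kept individually, and the outer arcs are kept or dropped as a group. The sketch treats the outer-arc likelihood as a single sentence-level number, which is an assumption; the function name `prune_arcs` and the parameters `inner_prob` and `any_outer_prob` are hypothetical stand-ins for a first-stage model.

```python
from typing import Callable, List, Tuple

Arc = Tuple[int, int]  # (head index, modifier index)


def prune_arcs(
    n_words: int,
    max_inner_dist: int,
    inner_prob: Callable[[Arc], float],
    any_outer_prob: float,
    keep_threshold: float,
) -> List[Arc]:
    """Return the arcs that survive the first pruning pass."""
    inner: List[Arc] = []
    outer: List[Arc] = []
    for head in range(n_words):
        for mod in range(n_words):
            if head == mod:
                continue
            # Distance threshold separates inner arcs from outer arcs.
            if abs(head - mod) <= max_inner_dist:
                inner.append((head, mod))
            else:
                outer.append((head, mod))
    # (i) Keep each inner arc whose own likelihood clears the threshold.
    kept = [arc for arc in inner if inner_prob(arc) > keep_threshold]
    # (ii) Keep the outer arcs only if some outer arc is likely to exist.
    if any_outer_prob > keep_threshold:
        kept.extend(outer)
    return kept
```

The surviving arcs would then be passed to the second parsing algorithm for further pruning before the final parse is chosen.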