摘要:
Described is a bimodal data input technology by which handwriting recognition results are combined with speech recognition results to improve overall recognition accuracy. Handwriting data and speech data corresponding to mathematical symbols are received and processed (including being recognized) into respective graphs. A fusion mechanism uses the speech graph to enhance the handwriting graph, e.g., to better distinguish between similar handwritten symbols that are often misrecognized. The graphs include nodes representing symbols, and arcs between the nodes representing probability scores. When arcs in the first and second graphs are determined to match one another, such as aligned in time and associated with corresponding symbols, the probability score in the second graph for that arc is used to adjust the matching probability score in the first graph. Normalization and smoothing may be performed to correspond the graphs to one another and to control the influence of one graph on the other.
摘要:
Described is a bimodal data input technology by which handwriting recognition results are combined with speech recognition results to improve overall recognition accuracy. Handwriting data and speech data corresponding to mathematical symbols are received and processed (including being recognized) into respective graphs. A fusion mechanism uses the speech graph to enhance the handwriting graph, e.g., to better distinguish between similar handwritten symbols that are often misrecognized. The graphs include nodes representing symbols, and arcs between the nodes representing probability scores. When arcs in the first and second graphs are determined to match one another, such as aligned in time and associated with corresponding symbols, the probability score in the second graph for that arc is used to adjust the matching probability score in the first graph. Normalization and smoothing may be performed to correspond the graphs to one another and to control the influence of one graph on the other.
摘要:
A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
摘要:
A forward pass through a sequence of strokes representing a handwritten equation is performed from the first stroke to the last stroke in the sequence. At each stroke, a path score is determined for a plurality of symbol-relation pairs that each represents a symbol and its spatial relation to a predecessor symbol. A symbol graph having nodes and links is constructed by backtracking through the strokes from the last stroke to the first stroke and assigning scores to the links based on the path scores for the symbol-relation pairs. The symbol graph is used to recognize a mathematical expression based in part on the scores for the links and the mathematical expression is stored.
摘要:
Boundary points for speech in an audio signal are determined based on posterior probabilities for the boundary points given a set of possible segmentations of the audio signal. The boundary point posterior probability is determined based on a set of level posterior probabilities that each provide the probability of a sequence of feature vectors given one of the segmentations in the set of possible segmentations.
摘要:
Possible segmentations for an audio signal are scored based on distortions for feature vectors of the audio signal and the total number of segments in the segmentation. The scores are used to select a segmentation and the selected segmentation is used to identify a starting point and an ending point for a speech signal in the audio signal.
摘要:
A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
摘要:
One way of recognizing online handwritten mathematical expressions is to use a one-pass dynamic programming based symbol decoding generation algorithm. This method embeds segmentation into symbol identification to form a unified framework for symbol recognition. Along with decoding, a symbol graph is produced. Besides accurately recognizing handwritten mathematical expressions, this method can produce high quality symbol graphs. This method uses six knowledge source models to help search for possible symbol hypotheses during the decoding process. Here, knowledge source exponential weights and a symbol insertion penalty are used to weigh the various knowledge source model probabilities to increase accuracy.
摘要:
A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
摘要:
A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.