Abstract:
According to one embodiment, a machine learning apparatus includes a processing circuit. The processing circuit generates a training sample in a VQA format regarding a VQA task based on a sample in a non-VQA format. The training sample in the VQA format includes a combination of an object, a question text regarding the object and an answer text in response to the question text as elements, and the sample in the non-VQA format includes a combination of an object and a label related to the object as elements. The processing circuit trains a statistical model of the VQA task based on the generated training sample in the VQA format.
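The conversion from a non-VQA sample to a VQA training sample can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the class names, the `to_vqa` helper, and the fixed question template are all assumptions, and the object is represented by a file name for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSample:
    """Non-VQA sample: an object and a label related to it."""
    obj: str    # stand-in for the object, e.g. an image file name (assumption)
    label: str


@dataclass(frozen=True)
class VQASample:
    """VQA sample: an object, a question about it, and an answer."""
    obj: str
    question: str
    answer: str


# Hypothetical question template; the abstract does not say how the
# question text is produced.
QUESTION_TEMPLATE = "What is shown in the image?"


def to_vqa(sample: LabeledSample, question: str = QUESTION_TEMPLATE) -> VQASample:
    """Reuse the label as the answer text to a templated question."""
    return VQASample(obj=sample.obj, question=question, answer=sample.label)


labeled = [LabeledSample("img_001.png", "cat"), LabeledSample("img_002.png", "dog")]
vqa_training_set = [to_vqa(s) for s in labeled]
```

The resulting `vqa_training_set` would then be passed to whatever training routine the statistical model of the VQA task uses.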
Abstract:
According to one embodiment, an information processing apparatus includes one or more processors configured to detect a trigger from a voice signal, the trigger indicating the start of voice recognition, and to perform voice recognition of a recognition sound section subsequent to a trigger sound section including the detected trigger, by referring to a trigger-and-voice-recognition dictionary corresponding to the trigger.
Abstract:
According to an embodiment, a speech recognition result output device includes a storage and processing circuitry. The storage is configured to store a language model for speech recognition. The processing circuitry is coupled to the storage and configured to acquire a phonetic sequence, convert the phonetic sequence into a phonetic sequence feature vector, convert the phonetic sequence feature vector into graphemes using the language model, and output the graphemes.
Abstract:
According to an embodiment, an arithmetic operation apparatus for a neural network includes an input layer calculator, a correction unit calculator, a hidden layer calculator, and an output layer calculator. The input layer calculator is configured to convert an input pattern into features as outputs of an input layer. The correction unit calculator is configured to perform calculation on N unit groups corresponding respectively to N classes of the input pattern and including correction units that each multiply a value based on inputs by a weight determined for the corresponding class. The hidden layer calculator is configured to perform calculation in a hidden layer based on the outputs of the input layer, another hidden layer, or the correction unit calculator. The output layer calculator is configured to perform calculation in an output layer based on the calculation for the hidden layer or the outputs of the correction unit calculator.
Abstract:
According to an embodiment, an information processing device includes an obtainer, a display controller, a detector, a plurality of correctors, and a selector. The obtainer obtains a keyword. The display controller performs control to display the keyword on a display. The detector detects a gesture operation performed on the display. The plurality of correctors are each capable of correcting a correction target keyword among one or more keywords displayed on the display, and implement mutually different correction methods, one of which deletes the correction target keyword. The selector selects, according to the gesture operation detected by the detector, one of the plurality of correctors as the corrector for correcting the correction target keyword.
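The selector's role can be pictured as a lookup from gesture to corrector. The sketch below is only illustrative: the gesture names and the second corrector (`uppercase_keyword`) are hypothetical, since the abstract specifies only that one of the correction methods deletes the target keyword.

```python
from typing import Callable, Dict, List

# A corrector takes the displayed keywords and the index of the correction
# target, and returns the corrected keyword list.
Corrector = Callable[[List[str], int], List[str]]


def delete_keyword(keywords: List[str], index: int) -> List[str]:
    """Correction method named in the abstract: delete the target keyword."""
    return keywords[:index] + keywords[index + 1:]


def uppercase_keyword(keywords: List[str], index: int) -> List[str]:
    """Hypothetical second correction method, used only for illustration."""
    return keywords[:index] + [keywords[index].upper()] + keywords[index + 1:]


# Hypothetical gesture names mapped to correctors.
CORRECTORS_BY_GESTURE: Dict[str, Corrector] = {
    "swipe": delete_keyword,
    "double_tap": uppercase_keyword,
}


def correct(keywords: List[str], index: int, gesture: str) -> List[str]:
    """Select a corrector according to the detected gesture and apply it."""
    corrector = CORRECTORS_BY_GESTURE[gesture]
    return corrector(keywords, index)


displayed = ["speech", "recogniton", "model"]
after_swipe = correct(displayed, 1, "swipe")
```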
Abstract:
According to one embodiment, a difference extraction device includes processing circuitry. The processing circuitry acquires a text in which an input notation string is described. The processing circuitry converts the input notation string into a pronunciation string. The processing circuitry executes a pronunciation string conversion process in which the pronunciation string is converted into an output notation string. The processing circuitry extracts a difference by comparing the input notation string and the output notation string with each other.
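The round trip described above (notation to pronunciation, pronunciation back to notation, then comparison) can be sketched as follows. The two toy lexicons and the function names are assumptions made for illustration, and the comparison uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever comparison the embodiment performs.

```python
import difflib
from typing import List, Tuple

# Hypothetical toy lexicons; a real system would use a pronunciation
# dictionary and a pronunciation-to-notation converter.
NOTATION_TO_PRONUNCIATION = {
    "the": "DH AH", "colour": "K AH L ER", "is": "IH Z", "red": "R EH D",
}
PRONUNCIATION_TO_NOTATION = {
    "DH AH": "the", "K AH L ER": "color", "IH Z": "is", "R EH D": "red",
}


def to_pronunciation(tokens: List[str]) -> List[str]:
    """Convert the input notation string into a pronunciation string."""
    return [NOTATION_TO_PRONUNCIATION[t] for t in tokens]


def to_notation(pronunciations: List[str]) -> List[str]:
    """Convert the pronunciation string into an output notation string."""
    return [PRONUNCIATION_TO_NOTATION[p] for p in pronunciations]


def extract_differences(input_tokens: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return (input span, output span) pairs where the two notations differ."""
    output_tokens = to_notation(to_pronunciation(input_tokens))
    matcher = difflib.SequenceMatcher(a=input_tokens, b=output_tokens, autojunk=False)
    return [
        (input_tokens[i1:i2], output_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


differences = extract_differences(["the", "colour", "is", "red"])
```

With these toy lexicons, the only extracted difference is the pair `colour` / `color`.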
Abstract:
According to one embodiment, an interface-providing apparatus includes an identifying unit and a generating unit. The identifying unit identifies a keyword from dialogue data including a question text that requests information and a response text in reply thereto. The generating unit generates display information to display a user interface for receiving feedback input on how useful the keyword is when searching for the requested information.
Abstract:
According to one embodiment, an information processing apparatus includes an acquisition unit, a conversion unit, and a display controller. The acquisition unit acquires multimedia data associated with an item of record data having a plurality of items. The conversion unit performs a conversion process from the multimedia data to first display data showing a content of the multimedia data. The display controller displays the first display data when the conversion process is completed, and displays second display data showing a progress status of the conversion process when the conversion process is incomplete in association with the item of the record data.
Abstract:
According to one embodiment, an information processing apparatus includes a first acquisition unit, a second acquisition unit, a first calculation unit, a second calculation unit, and a determination unit. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
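One way to picture the two scores is shown below. The abstract does not state how the scores are computed or combined, so everything here is an assumption: the class indices, the average-log-probability score, and the decision rule that compares the score difference with a threshold.

```python
import math
from typing import Sequence

# Hypothetical class layout of the model output: classes 0 and 1 are
# components of the keyword, class 2 is a background-noise component.
KEYWORD_CLASSES = (0, 1)
NOISE_CLASSES = (2,)


def class_score(frame_posteriors: Sequence[Sequence[float]],
                classes: Sequence[int]) -> float:
    """Average log-probability mass assigned to `classes` over all frames."""
    total = 0.0
    for frame in frame_posteriors:
        mass = sum(frame[c] for c in classes)
        total += math.log(max(mass, 1e-12))  # guard against log(0)
    return total / len(frame_posteriors)


def contains_keyword(frame_posteriors: Sequence[Sequence[float]],
                     threshold: float = 0.0) -> bool:
    """Assumed rule: keyword score must exceed noise score by `threshold`."""
    keyword_score = class_score(frame_posteriors, KEYWORD_CLASSES)
    noise_score = class_score(frame_posteriors, NOISE_CLASSES)
    return keyword_score - noise_score > threshold


# Per-frame class likelihoods as the trained model might output them.
keyword_like = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
noise_like = [[0.05, 0.05, 0.9], [0.1, 0.1, 0.8]]
```

Under these assumptions, `contains_keyword(keyword_like)` is true and `contains_keyword(noise_like)` is false, because the noise class dominates the second input.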
Abstract:
According to one embodiment, a voice keyword detection apparatus includes a memory and a circuit coupled with the memory. The circuit calculates a first score for a first sub-keyword and a second score for a second sub-keyword. The circuit detects the first and second sub-keywords based on the first and second scores. The circuit determines, when the first sub-keyword is detected from one or more first frames, to accept the first sub-keyword. The circuit determines, when the second sub-keyword is detected from one or more second frames, whether to accept the second sub-keyword based on a start time and/or an end time of the one or more first frames and a start time and/or an end time of the one or more second frames.
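The timing check on the second sub-keyword can be sketched as follows. The abstract says only that the decision uses the start and/or end times of the two frame ranges, so the specific rule here (the second sub-keyword must begin after the first ends, within a maximum gap) and the names `Detection`, `accept_second`, and `max_gap` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A detected sub-keyword and the time span of the frames it covers."""
    sub_keyword: str
    start: float  # seconds
    end: float    # seconds


def accept_second(first: Detection, second: Detection, max_gap: float = 0.5) -> bool:
    """Assumed rule: accept the second sub-keyword only if it starts after
    the first one ends and no later than `max_gap` seconds afterwards."""
    gap = second.start - first.end
    return 0.0 <= gap <= max_gap


first = Detection("hello", start=0.0, end=0.4)
second_in_time = Detection("world", start=0.5, end=0.9)
second_too_late = Detection("world", start=2.0, end=2.4)
```

Here `accept_second(first, second_in_time)` is true, while `accept_second(first, second_too_late)` is false because the gap between the sub-keywords exceeds `max_gap`.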