Abstract:
An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to the speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
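The association step described above could be sketched as follows. This is a minimal illustration, not the patented implementation: the tuple shapes for SSR segments and ASR word regions, the confidence threshold, and the `is_likely_name` predicate are all assumptions made for the example.

```python
from collections import defaultdict

def likely_name_regions(ssr_segments, asr_regions, is_likely_name, threshold=0.5):
    """Map each speaker name to the low-confidence word regions that are
    likely names, per the NER rule that a likely-name region may belong to
    the speaker of the current, previous, or next time segment.

    ssr_segments: list of (start, end, speaker_name)
    asr_regions:  list of (start, end, word, confidence)
    """
    by_speaker = defaultdict(list)
    for (w_start, w_end, word, conf) in asr_regions:
        if conf >= threshold:
            continue  # only relatively low-confidence regions are candidates
        if not is_likely_name(word):
            continue  # NER rule: keep only regions that are likely names
        for i, (s, e, _speaker) in enumerate(ssr_segments):
            if s <= w_start < e:
                # Associate with the current, previous, and next segments'
                # speakers; selection per speaker then falls out of the map.
                for j in (i - 1, i, i + 1):
                    if 0 <= j < len(ssr_segments):
                        by_speaker[ssr_segments[j][2]].append((w_start, w_end, word))
                break
    return dict(by_speaker)
```

Selecting "all of the likely name regions associated with that speaker's name" is then a dictionary lookup on the returned map.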
Abstract:
An example method is provided and includes receiving recorded meeting information, selecting a meeting participant from the recorded meeting information, determining at least one of meeting participant emotion information, meeting participant speaker role information, or meeting participant engagement information based, at least in part, on the meeting information, and determining an interaction map associated with the meeting participant based, at least in part, on at least one of the meeting participant emotion information, the meeting participant speaker role information, or the meeting participant engagement information.
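The flow of this example method could be sketched as below. The dict-based recording format, the three signal-extraction callables, and the shape of the "interaction map" are illustrative assumptions, not details from the abstract.

```python
def interaction_map(recording, participant, emotion_fn, role_fn, engagement_fn):
    """Select one participant's segments from recorded meeting information,
    derive emotion, speaker-role, and engagement signals from them, and
    combine the signals into an interaction map for that participant."""
    segments = [s for s in recording if s["speaker"] == participant]
    return {
        "participant": participant,
        "emotion": emotion_fn(segments),       # meeting participant emotion info
        "role": role_fn(segments),             # speaker role info
        "engagement": engagement_fn(segments), # engagement info
    }
```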
Abstract:
Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. Second speech segments are received from the speakers; each second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
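The pipeline above could be sketched as follows. Every callable here (segmentation, speaker routing, re-speaking, recognition) is a stand-in supplied by the caller; a real system would use audio and ASR models, and would typically reconcile multiple speakers' partials rather than take one as done here.

```python
def transcribe_via_respeaking(speech_audio, speakers, split_fn, route_fn,
                              respeak_fn, asr_fn):
    """Divide audio into first speech segments, send each to a determined
    subset of speakers, collect the re-spoken second speech segments,
    process them into partial transcripts, and combine the partials."""
    partials = []
    for first_segment in split_fn(speech_audio):
        subset = route_fn(first_segment, speakers)  # subset per segment
        second_segments = [respeak_fn(s, first_segment) for s in subset]
        partials.append(asr_fn(second_segments))    # one partial transcript
    return " ".join(partials)                       # the complete transcript
```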
Abstract:
In one embodiment, a method includes obtaining a text representation, and identifying a current first topic structure for the text representation. The first topic structure is initially identified as an initial first topic structure. The method also includes identifying at least a first document that has a first document topic structure that is similar to the current first topic structure, refining the current first topic structure based on the first document topic structure, and introducing topic labels in the text representation based on the current first topic structure.
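The refinement loop in this method could be sketched as below, under loud assumptions: topic structures are modeled as ordered lists of labels, and the extraction, similarity, and refinement functions are caller-supplied placeholders rather than anything specified in the abstract.

```python
def label_topics(text, corpus, extract_fn, similarity_fn, refine_fn, threshold=0.5):
    """Identify an initial first topic structure for the text, refine it
    against documents whose topic structures are similar, and return the
    topic labels to introduce into the text representation."""
    current = extract_fn(text)  # initial first topic structure
    for _doc, doc_structure in corpus:
        if similarity_fn(current, doc_structure) >= threshold:
            # Refine the current structure based on the similar document's.
            current = refine_fn(current, doc_structure)
    return current
```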