摘要:
A method, system and apparatus for facilitating transcription and captioning of multi-media content are presented. The method, system, and apparatus include automatic multi-media analysis operations that produce information which is presented to an operator as suggestions for spoken words, spoken word timing, caption segmentation, caption playback timing, caption mark-up such as non-spoken cues or speaker identification, caption formatting, and caption placement. Spoken word suggestions are primarily created through an automatic speech recognition operation, but may be enhanced by leveraging other elements of the multi-media content, such as correlated text and imagery by using text extracted with an optical character recognition operation. Also included is an operator interface that allows the operator to efficiently correct any of the aforementioned suggestions. In the case of word suggestions, in addition to best hypothesis word choices being presented to the operator, alternate word choices are presented for quick selection via the operator interface. Ongoing operator corrections can be leveraged to improve the remaining suggestions. Additionally, an automatic multi-media playback control capability further assists the operator during the correction process.