System to correct closed captioning display using context from audio/video
Abstract:
Embodiments are directed toward analyzing audiovisual content to adjust the timing, duration, or positioning of closed captioning so that the closed captioning more closely aligns with the scene being presented. Content that includes video, audio, and closed captioning is obtained, and the audio is converted to text. A duration and timing for the closed captioning are determined based on a comparison between the closed captioning and the audio text. Scene context is determined for the content based on analysis of the video and the audio, such as by employing trained artificial neural networks. A display position of the closed captioning is determined based on the scene context. The duration and timing of the closed captioning are modified based on the scene context. The video and closed captioning are provided to a content receiver for presentation to a user based on the display position, duration, and timing.
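The comparison between the closed captioning and the audio text can be illustrated with a minimal sketch. It is not the claimed method. It assumes that a speech-to-text step has already produced word-level timestamps, and that the words passed in come from a window of audio near the caption. The names `Word`, `Cue`, `normalize`, and `retime_cue` are hypothetical and chosen for illustration only; the matching uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever comparison an embodiment actually employs.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Word:
    """One recognized word from the audio text (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the content
    end: float


@dataclass
class Cue:
    """One closed-caption cue with its original timing (hypothetical structure)."""
    text: str
    start: float
    end: float


def normalize(token: str) -> str:
    """Lowercase a token and drop punctuation so captions and audio text compare cleanly."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def retime_cue(cue: Cue, words: List[Word]) -> Cue:
    """Return a copy of the cue whose timing spans the audio words it matches.

    If no caption word matches any audio word, the cue is returned unchanged.
    """
    cue_tokens = [t for t in (normalize(t) for t in cue.text.split()) if t]
    audio_tokens = [normalize(w.text) for w in words]

    matcher = SequenceMatcher(None, cue_tokens, audio_tokens, autojunk=False)
    # The final block returned by get_matching_blocks() is a zero-size sentinel.
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return cue

    first = blocks[0].b                       # index of first matched audio word
    last = blocks[-1].b + blocks[-1].size - 1  # index of last matched audio word
    return Cue(cue.text, words[first].start, words[last].end)


# Illustrative data only; the timestamps are made up for this sketch.
words = [
    Word("hello", 1.0, 1.4),
    Word("there", 1.5, 1.9),
    Word("general", 2.0, 2.6),
    Word("kenobi", 2.7, 3.3),
]
early_cue = Cue("Hello there!", 0.0, 0.8)
fixed = retime_cue(early_cue, words)
print(fixed.start, fixed.end, fixed.end - fixed.start)
```

In this sketch the cue that was originally shown from 0.0 s to 0.8 s is moved to the interval in which its words are actually spoken, and its duration follows from the new start and end. A later step, such as the scene-context analysis described above, could then modify that timing further.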