-
Publication No.: WO2022020235A1
Publication Date: 2022-01-27
Application No.: PCT/US2021/042174
Filing Date: 2021-07-19
Applicant: NETFLIX, INC.
Inventor: WANG, Yadong , WU, Chih-Wei , TACKE, Kyle , RAO, Shilpa , SEKH, Boney , SWAN, Andrew , SENAPATI, Raja
IPC: G11B27/031 , G11B27/10 , G11B27/28
Abstract: The disclosed computer-implemented method may include (1) accessing a first media data object and a different, second media data object that, when played back, each render temporally sequenced content, (2) comparing first temporally sequenced content represented by the first media data object with second temporally sequenced content represented by the second media data object to identify a set of common temporal subsequences between the first media data object and the second media data object, (3) identifying a set of edits relative to the set of common temporal subsequences that describe a difference between the temporally sequenced content of the first media data object and the temporally sequenced content of the second media data object, and (4) executing a workflow relating to the first media data object and/or the second media data object based on the set of edits. Various other methods, systems, and computer-readable media are also disclosed.
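Steps (2) and (3) of the abstract above can be illustrated with a minimal sketch. It assumes each media data object has already been reduced to a list of hashable per-segment fingerprints (for example, one per shot or audio frame); that reduction, the function name `diff_media`, and the use of Python's standard `difflib.SequenceMatcher` are illustrative assumptions, not the comparison technique the application itself specifies.

```python
from difflib import SequenceMatcher


def diff_media(first, second):
    """Split two fingerprint sequences into common runs and edits.

    `first` and `second` are hypothetical lists of hashable segment
    fingerprints, one entry per temporal segment of each media object.
    """
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    common, edits = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # A common temporal subsequence: first[i1:i2] == second[j1:j2].
            common.append(((i1, i2), (j1, j2)))
        else:
            # "replace", "delete", or "insert" relative to the common runs.
            edits.append((tag, (i1, i2), (j1, j2)))
    return common, edits


# Toy fingerprints: the second cut replaces one segment and appends another.
first = ["a", "b", "c", "d", "e"]
second = ["a", "b", "x", "d", "e", "f"]
common, edits = diff_media(first, second)
print(common)
print(edits)
```

A downstream workflow, as in step (4), could then be keyed on the returned edit list, for instance by re-checking only the segments that were replaced or inserted.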
-
Publication No.: WO2021262737A1
Publication Date: 2021-12-30
Application No.: PCT/US2021/038515
Filing Date: 2021-06-22
Applicant: NETFLIX, INC.
Inventor: WANG, Yadong , RAO, Shilpa Jois
Abstract: The disclosed computer-implemented method includes analyzing, by a speech detection system, a media file to detect lip movement of a speaker who is visually rendered in media content of the media file. The method additionally includes identifying, by the speech detection system, audio content within the media file, and improving accuracy of a temporal correlation of the speech detection system. The method may involve correlating the lip movement of the speaker with the audio content, and determining, based on the correlation between the lip movement of the speaker and the audio content, that the audio content comprises speech from the speaker. The method may further involve recording, based on the determination that the audio content comprises speech from the speaker, the temporal correlation between the speech and the lip movement of the speaker as metadata of the media file. Various other methods, systems, and computer-readable media are disclosed.
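The correlation step in the abstract above can be sketched as follows. The sketch assumes that lip movement has already been reduced to a per-frame mouth-opening value and the audio to a per-frame energy value, sampled at the same rate; those signals, the Pearson correlation measure, the function names, and the 0.5 threshold are all illustrative assumptions rather than details taken from the application.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant signal carries no evidence of correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def is_speaker_talking(lip_opening, audio_energy, threshold=0.5):
    """Hypothetical decision rule: speech is attributed to the on-screen
    speaker when lip movement tracks audio energy closely enough."""
    return pearson(lip_opening, audio_energy) >= threshold


lip_opening = [0.0, 0.2, 0.8, 0.6, 0.1]
matching_audio = [0.1, 0.3, 0.9, 0.7, 0.2]   # rises and falls with the lips
unrelated_audio = [0.5, 0.5, 0.5, 0.5, 0.5]  # constant background level
print(is_speaker_talking(lip_opening, matching_audio))
print(is_speaker_talking(lip_opening, unrelated_audio))
```

When the rule fires, the time range of the analyzed frames is what would be recorded as metadata of the media file.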
-
Publication No.: WO2019143575A1
Publication Date: 2019-07-25
Application No.: PCT/US2019/013536
Filing Date: 2019-01-14
Applicant: NETFLIX, INC.
Inventor: PARTHASARATHI, Murthy , WANG, Yadong , SEKH, Boney
IPC: H04N21/8549 , H04N21/485 , G11B27/00 , H04N21/233
Abstract: In various embodiments, a subtitle application generates a subtitle list for a trailer. In operation, the subtitle application performs matching operation(s) between trailer audio associated with a trailer and source audio associated with an audiovisual program. The subtitle application then maps a subtitle associated with the source audio from a source timeline associated with the source audio to a trailer timeline associated with the trailer audio to generate a mapped subtitle. Subsequently, the subtitle application generates a trailer subtitle list based on the mapped subtitle and at least one additional mapped subtitle. Because the subtitle application generates the trailer subtitle list based on audio comparisons, the subtitle application ensures that the proper subtitles are included in the trailer subtitle list without requiring a subtitler to view the trailer.
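The timeline mapping described in the abstract above can be sketched under simple assumptions. The sketch takes the audio matching as already done and represented as a list of matched intervals; the `Match` and `Subtitle` types, the `map_subtitles` function, and the choice to clip subtitles at match boundaries are hypothetical illustrations, not the application's own data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """A stretch of trailer audio found in the source audio (seconds)."""
    trailer_start: float
    source_start: float
    duration: float


@dataclass(frozen=True)
class Subtitle:
    start: float
    end: float
    text: str


def map_subtitles(matches, source_subtitles):
    """Shift source subtitles onto the trailer timeline, one match at a time."""
    mapped = []
    for m in matches:
        source_end = m.source_start + m.duration
        offset = m.trailer_start - m.source_start
        for sub in source_subtitles:
            # Keep only the part of the subtitle inside the matched interval.
            start = max(sub.start, m.source_start)
            end = min(sub.end, source_end)
            if start < end:
                mapped.append(Subtitle(start + offset, end + offset, sub.text))
    return sorted(mapped, key=lambda s: s.start)


matches = [Match(0.0, 100.0, 5.0), Match(5.0, 300.0, 4.0)]
source_subtitles = [
    Subtitle(101.0, 103.0, "Hello."),
    Subtitle(302.0, 306.0, "Goodbye."),
    Subtitle(500.0, 502.0, "Not in the trailer."),
]
trailer_subtitles = map_subtitles(matches, source_subtitles)
for sub in trailer_subtitles:
    print(sub)
```

Subtitles that fall outside every matched interval are dropped, which mirrors the abstract's point that only subtitles for audio actually present in the trailer end up in the trailer subtitle list.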
-
Publication No.: WO2021257316A1
Publication Date: 2021-12-23
Application No.: PCT/US2021/036268
Filing Date: 2021-06-07
Applicant: NETFLIX, INC.
Inventor: WANG, Yadong , RAO, Shilpa Jois , PARTHASARATHI, Murthy
IPC: G10L21/10 , G06N20/00 , G10L15/02 , G10L15/04 , G10L15/08 , G10L15/24 , G10L2015/025 , G10L2021/105 , G10L21/0232
Abstract: The disclosed computer-implemented method may include training a machine-learning algorithm to use look-ahead to improve effectiveness of identifying visemes corresponding to audio signals by, for one or more audio segments in a set of training audio signals, evaluating an audio segment, where the audio segment includes at least a portion of a phoneme, and a subsequent segment that includes contextual audio that comes after the audio segment and potentially contains context about a viseme that maps to the phoneme. The method may also include using the trained machine-learning algorithm to identify one or more probable visemes corresponding to speech in a target audio signal. Additionally, the method may include recording, as metadata of the target audio signal, where a probable viseme occurs within the target audio signal. Various other methods, systems, and computer-readable media are also disclosed.
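The look-ahead idea in the abstract above can be illustrated by how training inputs might be assembled. The sketch only pairs each audio segment with the segments that follow it; the function name `with_lookahead`, the fixed-size window, and the string placeholders for audio segments are assumptions for illustration, and no model training or viseme classification is shown.

```python
def with_lookahead(segments, lookahead):
    """Pair each segment with up to `lookahead` segments that follow it.

    `segments` is a hypothetical list of audio segments (feature vectors
    in practice; short strings here). The trailing context is what a
    look-ahead model would see in addition to the current segment.
    """
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")
    return [
        (segment, segments[i + 1 : i + 1 + lookahead])
        for i, segment in enumerate(segments)
    ]


segments = ["seg0", "seg1", "seg2", "seg3"]
pairs = with_lookahead(segments, lookahead=2)
for current, context in pairs:
    print(current, context)
```

Segments near the end of the signal receive a shorter context; a real pipeline would need to decide whether to pad those windows or drop them.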
-