-
Publication Number: US12075187B2
Publication Date: 2024-08-27
Application Number: US17841564
Application Date: 2022-06-15
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Shilpa Jois Rao
CPC classification number: H04N5/9305 , G10L15/005 , G10L15/04 , G10L15/22 , G10L15/26 , G10L25/57 , G10L25/81 , H04N5/278
Abstract: The disclosed systems and methods may automatically generate sound event subtitles for digital videos. For example, the systems and methods described herein can automatically generate subtitles for sound events within a digital video soundtrack that includes sounds other than speech. Additionally, the systems and methods described herein can automatically generate sound event subtitles as part of an automatic and comprehensive approach that generates subtitles for all sounds within a soundtrack of a digital video—thereby avoiding the need for any manual inputs as part of the subtitling process.
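A minimal illustrative sketch of the idea in this abstract, not the patented implementation: it assumes an upstream audio tagger has already produced labeled sound events with timestamps, and it simply formats the non-speech events as SRT-style subtitle cues.

```python
# Illustrative sketch only -- the SoundEvent shape and SRT output are assumptions,
# not taken from the patent. Speech events are skipped because dialogue subtitles
# would be produced by a separate step.
from dataclasses import dataclass

@dataclass
class SoundEvent:
    start: float   # seconds
    end: float     # seconds
    label: str     # e.g. "speech", "door slams", "ominous music"

def to_timestamp(t: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(t * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def sound_event_subtitles(events: list[SoundEvent]) -> str:
    """Emit bracketed subtitle cues for every non-speech sound event."""
    cues = []
    index = 1
    for ev in events:
        if ev.label == "speech":          # handled by the dialogue subtitler instead
            continue
        cues.append(f"{index}\n{to_timestamp(ev.start)} --> {to_timestamp(ev.end)}\n[{ev.label}]\n")
        index += 1
    return "\n".join(cues)

if __name__ == "__main__":
    demo = [SoundEvent(0.0, 2.5, "speech"),
            SoundEvent(2.7, 3.4, "door slams"),
            SoundEvent(5.0, 8.0, "ominous music")]
    print(sound_event_subtitles(demo))
```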
-
Publication Number: US11924481B2
Publication Date: 2024-03-05
Application Number: US18186366
Application Date: 2023-03-20
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Chih-Wei Wu , Kyle Tacke , Shilpa Jois Rao , Boney Sekh , Andrew Swan , Raja Ranjan Senapati
IPC: H04N21/2343 , G06Q10/0631 , G11B27/031 , G11B27/10 , H04N21/234
CPC classification number: H04N21/2343 , G06Q10/06312 , G11B27/031 , G11B27/10 , H04N21/23412 , H04N21/23418
Abstract: The disclosed computer-implemented method may include (1) accessing a first media data object and a different, second media data object that, when played back, each render temporally sequenced content, (2) comparing first temporally sequenced content represented by the first media data object with second temporally sequenced content represented by the second media data object to identify a set of common temporal subsequences between the first media data object and the second media data object, (3) identifying a set of edits relative to the set of common temporal subsequences that describe a difference between the temporally sequenced content of the first media data object and the temporally sequenced content of the second media data object, and (4) executing a workflow relating to the first media data object and/or the second media data object based on the set of edits. Various other methods, systems, and computer-readable media are also disclosed.
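A hedged sketch of steps (2) and (3) of this abstract, under the assumption that each media data object has been reduced to a sequence of coarse per-second fingerprints (hypothetical integer hashes here); the standard-library SequenceMatcher stands in for whatever comparison the patent actually uses.

```python
# Minimal sketch, not the patented method: difflib finds the common temporal
# subsequences between the two fingerprint sequences and the edits between them.
from difflib import SequenceMatcher

def common_subsequences(fp_a: list[int], fp_b: list[int]):
    """Return (start_a, start_b, length) blocks shared by both fingerprint sequences."""
    sm = SequenceMatcher(a=fp_a, b=fp_b, autojunk=False)
    return [blk for blk in sm.get_matching_blocks() if blk.size > 0]

def edit_set(fp_a: list[int], fp_b: list[int]):
    """Return the insert/delete/replace operations relative to the common blocks."""
    sm = SequenceMatcher(a=fp_a, b=fp_b, autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]

if __name__ == "__main__":
    original = [1, 2, 3, 4, 5, 6, 7, 8]        # e.g. fingerprints of a theatrical cut
    recut    = [1, 2, 3, 9, 9, 4, 5, 6, 8]     # e.g. a recut with an inserted scene
    print(common_subsequences(original, recut))
    print(edit_set(original, recut))
```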
-
Publication Number: US20220115030A1
Publication Date: 2022-04-14
Application Number: US17555175
Application Date: 2021-12-17
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Shilpa Jois Rao , Murthy Parthasarathi , Kyle Tacke
Abstract: The disclosed computer-implemented method may include obtaining an audio sample from a content source, inputting the obtained audio sample into a trained machine learning model, obtaining the output of the trained machine learning model, wherein the output is a profile of an environment in which the input audio sample was recorded, obtaining an acoustic impulse response corresponding to the profile of the environment in which the input audio sample was recorded, obtaining a second audio sample, processing the obtained acoustic impulse response with the second audio sample, and inserting a result of processing the obtained acoustic impulse response and the second audio sample into an audio track. Various other methods, systems, and computer-readable media are also disclosed.
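A rough sketch of the processing-and-insertion steps near the end of this abstract, under stated assumptions: the environment profile and its impulse response are placeholder NumPy arrays, and the trained-model step from the abstract is omitted entirely.

```python
# Sketch only: convolve a dry second audio sample with an assumed room impulse
# response and mix the result into an audio track at a given offset.
import numpy as np

def apply_room(dry_audio: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Convolve a dry recording (e.g. dubbed dialogue) with a room impulse response."""
    wet = np.convolve(dry_audio, impulse_response)
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet   # normalize to avoid clipping

def insert_into_track(track: np.ndarray, clip: np.ndarray, start: int) -> np.ndarray:
    """Mix the processed clip into the audio track at the given sample offset."""
    out = track.copy()
    end = min(start + len(clip), len(out))
    out[start:end] += clip[: end - start]
    return out

if __name__ == "__main__":
    sr = 16_000
    dry = np.random.randn(sr)                      # 1 s of placeholder dialogue
    ir = np.exp(-np.linspace(0, 8, sr // 4))       # toy exponential-decay "room"
    track = np.zeros(sr * 3)
    track = insert_into_track(track, apply_room(dry, ir), start=sr)
    print(track.shape)
```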
-
Publication Number: US20210390949A1
Publication Date: 2021-12-16
Application Number: US16903373
Application Date: 2020-06-16
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Shilpa Jois Rao , Murthy Parthasarathi
IPC: G10L15/08 , G10L15/04 , G10L15/02 , G10L21/0232 , G06N20/00
Abstract: The disclosed computer-implemented method may include training a machine-learning algorithm to use look-ahead to improve effectiveness of identifying visemes corresponding to audio signals by, for one or more audio segments in a set of training audio signals, evaluating an audio segment, where the audio segment includes at least a portion of a phoneme, and a subsequent segment that includes contextual audio that comes after the audio segment and potentially contains context about a viseme that maps to the phoneme. The method may also include using the trained machine-learning algorithm to identify one or more probable visemes corresponding to speech in a target audio signal. Additionally, the method may include recording, as metadata of the target audio signal, where a probable viseme occurs within the target audio signal. Various other methods, systems, and computer-readable media are also disclosed.
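A toy sketch of the "look-ahead" framing this abstract describes, not the patented training pipeline: each audio segment is paired with the contextual segment that follows it before both would be handed to a viseme classifier.

```python
# Sketch under assumptions: the frames are placeholder MFCC-like feature vectors,
# and the classifier itself is omitted. Only the look-ahead pairing is shown.
import numpy as np

def lookahead_pairs(frames: np.ndarray, context: int = 2):
    """Yield (current_frame, lookahead_frames) pairs for viseme prediction.

    frames: array of shape (num_frames, num_features), e.g. per-frame features.
    context: how many future frames to attach as look-ahead context.
    """
    for i in range(len(frames)):
        future = frames[i + 1 : i + 1 + context]
        # Pad the tail so every training example has the same shape.
        if len(future) < context:
            pad = np.zeros((context - len(future), frames.shape[1]))
            future = np.vstack([future, pad]) if len(future) else pad
        yield frames[i], future

if __name__ == "__main__":
    feats = np.random.randn(10, 13)               # 10 frames of fake features
    for current, future in lookahead_pairs(feats):
        pass                                      # a real model would train on these
    print(current.shape, future.shape)            # (13,) (2, 13)
```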
-
Publication Number: US12266361B2
Publication Date: 2025-04-01
Application Number: US16911247
Application Date: 2020-06-24
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Shilpa Jois Rao
Abstract: The disclosed computer-implemented method includes analyzing, by a speech detection system, a media file to detect lip movement of a speaker who is visually rendered in media content of the media file. The method additionally includes identifying, by the speech detection system, audio content within the media file, and improving accuracy of a temporal correlation of the speech detection system. The method may involve correlating the lip movement of the speaker with the audio content, and determining, based on the correlation between the lip movement of the speaker and the audio content, that the audio content comprises speech from the speaker. The method may further involve recording, based on the determination that the audio content comprises speech from the speaker, the temporal correlation between the speech and the lip movement of the speaker as metadata of the media file. Various other methods, systems, and computer-readable media are disclosed.
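A hedged sketch of the correlation step only: the per-frame lip-openness and speech-energy values are assumed to come from upstream face tracking and audio analysis, and the threshold and metadata format are illustrative, not taken from the patent.

```python
# Illustrative only: correlate mouth movement with audio energy to decide whether
# the audio is speech from the visually rendered speaker.
import numpy as np

def speech_lip_correlation(lip_openness: np.ndarray, speech_energy: np.ndarray) -> float:
    """Pearson correlation between lip movement and audio energy per video frame."""
    if len(lip_openness) != len(speech_energy):
        raise ValueError("signals must cover the same frames")
    return float(np.corrcoef(lip_openness, speech_energy)[0, 1])

def is_onscreen_speaker(lip_openness, speech_energy, threshold: float = 0.5) -> bool:
    """Decide whether the audio comprises speech from the on-screen speaker."""
    return speech_lip_correlation(np.asarray(lip_openness),
                                  np.asarray(speech_energy)) >= threshold

if __name__ == "__main__":
    lips = np.array([0.1, 0.8, 0.9, 0.2, 0.7, 0.1])   # lip openness per frame
    rms  = np.array([0.0, 0.6, 0.7, 0.1, 0.5, 0.0])   # speech energy per frame
    print(is_onscreen_speaker(lips, rms))             # True: movement tracks the audio
```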
-
Publication Number: US20220021911A1
Publication Date: 2022-01-20
Application Number: US17245252
Application Date: 2021-04-30
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Chih-Wei Wu , Kyle Tacke , Shilpa Jois Rao , Boney Sekh , Andrew Swan , Raja Ranjan Senapati
IPC: H04N21/2343 , H04N21/234 , G06Q10/06
Abstract: The disclosed computer-implemented method may include (1) accessing a first media data object and a different, second media data object that, when played back, each render temporally sequenced content, (2) comparing first temporally sequenced content represented by the first media data object with second temporally sequenced content represented by the second media data object to identify a set of common temporal subsequences between the first media data object and the second media data object, (3) identifying a set of edits relative to the set of common temporal subsequences that describe a difference between the temporally sequenced content of the first media data object and the temporally sequenced content of the second media data object, and (4) executing a workflow relating to the first media data object and/or the second media data object based on the set of edits. Various other methods, systems, and computer-readable media are also disclosed.
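This application shares its abstract with US11924481B2 above; as a complement to the earlier sketch, here is a hedged illustration of one possible downstream workflow in step (4), not taken from the patent text itself: using the matching blocks between two cuts to remap a timestamp from the first cut into the second, e.g. to conform existing subtitles to a new edit.

```python
# Assumption-laden sketch: fingerprints are placeholder integers and difflib
# stands in for the actual comparison; only the timecode remapping idea is shown.
from difflib import SequenceMatcher

def remap_time(t_a: int, fp_a: list[int], fp_b: list[int]) -> int | None:
    """Map a per-second index in sequence A to the matching index in sequence B."""
    sm = SequenceMatcher(a=fp_a, b=fp_b, autojunk=False)
    for start_a, start_b, size in sm.get_matching_blocks():
        if start_a <= t_a < start_a + size:
            return start_b + (t_a - start_a)
    return None   # the moment was cut and has no counterpart in B

if __name__ == "__main__":
    original = [1, 2, 3, 4, 5, 6, 7, 8]
    recut    = [1, 2, 3, 9, 9, 4, 5, 6, 8]
    print(remap_time(4, original, recut))   # second 4 of A lands at index 6 of B
```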
-
Publication Number: US20210151082A1
Publication Date: 2021-05-20
Application Number: US16747314
Application Date: 2020-01-20
Applicant: Netflix, Inc.
Inventor: Yadong Wang , Murthy Parthasarathi , Andrew Swan , Raja Ranjan Senapati , Shilpa Jois Rao , Anjali Chablani , Kyle Tacke
IPC: G11B27/036 , G11B27/034 , H04N21/84 , H04N21/81 , H04N21/485 , G10L13/04 , G10L13/08
Abstract: The disclosed computer-implemented method may include accessing an audio track that is associated with a video recording, identifying a section of the accessed audio track having a specific audio characteristic, reducing a volume level of the audio track in the identified section, accessing an audio segment that includes a synthesized voice and inserting the accessed audio segment into the identified section of the audio track, where the inserted segment has a higher volume level than the reduced volume level of the audio track in the identified section. The synthesized voice description can be used to provide additional information to a visually impaired viewer without interrupting the audio track that is associated with the video recording, typically by inserting the synthesized voice description into a segment of the audio track in which there is no dialog. Various other methods, systems, and computer-readable media are also disclosed.
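A simplified sketch of the duck-and-insert step this abstract describes; the gain value, sample rate, and the location of the dialog-free section are all assumptions, and the synthesized voice is a placeholder signal.

```python
# Sketch only: lower the original track inside the identified section and mix a
# synthesized audio-description clip on top at a higher level.
import numpy as np

def insert_description(track: np.ndarray, description: np.ndarray,
                       start: int, duck_gain: float = 0.3) -> np.ndarray:
    """Duck the track over [start, start+len(description)) and add the description."""
    out = track.copy()
    end = min(start + len(description), len(out))
    out[start:end] *= duck_gain                   # reduce the original volume
    out[start:end] += description[: end - start]  # description now sits above it
    return out

if __name__ == "__main__":
    sr = 16_000
    soundtrack = 0.5 * np.random.randn(sr * 5)        # 5 s of placeholder audio
    tts_clip = 0.8 * np.random.randn(sr * 2)          # 2 s of synthesized voice
    mixed = insert_description(soundtrack, tts_clip, start=sr * 2)
    print(mixed.shape)
```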
-
Publication Number: US10798271B2
Publication Date: 2020-10-06
Application Number: US15863772
Application Date: 2018-01-05
Applicant: NETFLIX, INC.
Inventor: Murthy Parthasarathi , Andrew Swan , Yadong Wang , Thomas E. Mack
IPC: H04N5/14 , H04N9/82 , H04N21/234 , H04N21/242 , H04N5/445 , H04N21/44 , H04N9/89 , H04N21/43 , H04N21/488
Abstract: In various embodiments, a subtitle timing application detects timing errors between subtitles and shot changes. In operation, the subtitle timing application determines that a temporal edge associated with a subtitle does not satisfy a timing guideline based on a shot change. The shot change occurs within a sequence of frames of an audiovisual program. The subtitle timing application then determines a new temporal edge that satisfies the timing guideline relative to the shot change. Subsequently, the subtitle timing application causes a modification to a temporal location of the subtitle within the sequence of frames based on the new temporal edge. Advantageously, the modification to the subtitle improves a quality of a viewing experience for a viewer. Notably, by automatically detecting timing errors, the subtitle timing application facilitates proper and efficient re-scheduling of subtitles that are not optimally timed with shot changes.
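An illustrative sketch of the timing check and correction described in this abstract; the 0.5-second guideline and the snap-to-cut policy are assumptions for the example, not values from the patent.

```python
# Sketch only: if a subtitle's temporal edge violates the timing guideline
# relative to a shot change, move it to a new edge that satisfies the guideline.
GUIDELINE = 0.5   # assumed rule: an edge should not sit within 0.5 s of a shot
                  # change unless it lands exactly on it

def fix_edge(edge: float, shot_changes: list[float], guideline: float = GUIDELINE) -> float:
    """Return a new temporal edge that satisfies the guideline, or the edge unchanged."""
    for cut in shot_changes:
        if edge == cut:
            return edge                      # already aligned with the shot change
        if abs(edge - cut) < guideline:
            return cut                       # snap the offending edge onto the cut
    return edge

if __name__ == "__main__":
    cuts = [10.0, 14.2, 21.6]                # shot-change times in seconds
    subtitle = (13.9, 16.0)                  # (in, out) times of one subtitle
    fixed = (fix_edge(subtitle[0], cuts), fix_edge(subtitle[1], cuts))
    print(fixed)                             # (14.2, 16.0)
```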
-