-
Publication No.: US20240355119A1
Publication date: 2024-10-24
Application No.: US18305587
Filing date: 2023-04-24
Applicant: ADOBE INC.
Inventor(s): Ioana Croitoru, Trung Huu Bui, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin
CPC classification: G06V20/41, G06V10/774, G06V20/49, G06V20/70, G10L15/04, G10L15/1815, G10L15/22, G10L25/57, G10L15/16
Abstract: One or more aspects of the method, apparatus, and non-transitory computer readable medium include receiving a query relating to a long video. One or more aspects of the method, apparatus, and non-transitory computer readable medium further include generating a segment of the long video corresponding to the query using a machine learning model trained to identify relevant segments from long videos. One or more aspects of the method, apparatus, and non-transitory computer readable medium further include responding to the query based on the generated segment.
-
Publication No.: US12119028B2
Publication date: 2024-10-15
Application No.: US17967364
Filing date: 2022-10-17
Applicant: Adobe Inc.
Inventor(s): Xue Bai, Justin Jonathan Salamon, Aseem Omprakash Agarwala, Hijung Shin, Haoran Cai, Joel Richard Brandt, Lubomira Assenova Dontcheva, Cristin Ailidh Fraser
IPC classification: G11B27/036, G06F40/166, G10L15/26, G10L25/57, G11B27/34, G06F3/0482, G06F3/04845, G06F3/0485
CPC classification: G11B27/036, G06F40/166, G10L15/26, G10L25/57, G11B27/34, G06F3/0482, G06F3/04845, G06F3/0485
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for identifying candidate boundaries for video segments, video segment selection using those boundaries, and text-based video editing of video segments selected via transcript interactions. In an example implementation, boundaries of detected sentences and words are extracted from a transcript, the boundaries are retimed into an adjacent speech gap to a location where voice or audio activity is a minimum, and the resulting boundaries are stored as candidate boundaries for video segments. As such, a transcript interface presents the transcript, interprets input selecting transcript text as an instruction to select a video segment with corresponding boundaries selected from the candidate boundaries, and interprets commands that are traditionally thought of as text-based operations (e.g., cut, copy, paste) as an instruction to perform a corresponding video editing operation using the selected video segment.
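The retiming step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the per-frame activity scores, the frame-index representation of the speech gap, and the name `retime_boundary` are all assumptions made for illustration.

```python
from typing import Sequence


def retime_boundary(activity: Sequence[float], gap_start: int, gap_end: int) -> int:
    """Return the frame index in [gap_start, gap_end) with the lowest activity.

    `activity` is a hypothetical per-frame voice/audio activity score;
    the gap is the speech gap adjacent to a transcript boundary.
    """
    if not 0 <= gap_start < gap_end <= len(activity):
        raise ValueError("gap must be a non-empty range inside the activity track")
    # Ties resolve to the earliest frame because min() keeps the first minimum.
    return min(range(gap_start, gap_end), key=lambda i: activity[i])


# Made-up activity scores: speech ends around frame 2 and resumes at frame 5.
activity = [0.9, 0.8, 0.3, 0.05, 0.2, 0.7, 0.9]
candidate = retime_boundary(activity, gap_start=2, gap_end=5)
print(candidate)
```

The returned index would then be stored as a candidate boundary for segment selection; how activity is measured is left open here.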
-
Publication No.: US12114097B2
Publication date: 2024-10-08
Application No.: US18190227
Filing date: 2023-03-27
Abstract: In aspects of personal content managed during extended display screen recording, a screen recording system includes a wireless device that provides digital image content for display on an extended display device, and a screen recording session on the wireless device captures the digital image content and audio data. The wireless device implements a content control module that can determine the screen recording session would capture personal content associated with a user of the wireless device. The content control module can initiate a private screen review mode in which the personal content is displayable on a display screen of the wireless device and is prevented from visual display on the extended display device. The content control module can also generate a shareable screen recording that includes the audio data and the digital image content displayed on the extended display device, without including the personal content.
-
Publication No.: US12106777B2
Publication date: 2024-10-01
Application No.: US17940057
Filing date: 2022-09-08
Inventor(s): Jixiang Hu
IPC classification: G11B27/02, G06F3/16, G06F40/166, G10L25/57
CPC classification: G11B27/02, G06F3/165, G06F40/166, G10L25/57
Abstract: Embodiments of the present disclosure provide an audio processing method and an electronic device. The method includes: first obtaining text information corresponding to a to-be-processed audio, where the text information includes a to-be-processed text and a playback period corresponding to each field in the to-be-processed text; then receiving a first input on the to-be-processed text; in response to the first input, determining, as a to-be-processed field, a field indicated by the first input in the to-be-processed text; then receiving a second input on the to-be-processed field; obtaining a target audio segment in response to the second input; and finally modifying an audio segment at a playback period corresponding to the to-be-processed field according to the target audio segment, to obtain a target audio.
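A minimal sketch of the final modification step follows, assuming the audio is a plain list of samples and each field's playback period is stored as a pair of sample indices. The `Field` type and `replace_field_audio` are hypothetical names, and the abstract does not say whether the modification is a replacement, so straight substitution is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Field:
    text: str
    start: int  # first sample of the field's playback period (inclusive)
    end: int    # one past the last sample of the playback period


def replace_field_audio(
    samples: List[float], field: Field, target_segment: Sequence[float]
) -> List[float]:
    """Return new audio with the field's playback period swapped for target_segment."""
    if not 0 <= field.start <= field.end <= len(samples):
        raise ValueError("playback period lies outside the audio")
    return samples[: field.start] + list(target_segment) + samples[field.end :]


# Made-up data: ten samples, with the selected field covering samples 3..5.
samples = [float(i) for i in range(10)]
selected = Field(text="word", start=3, end=6)
target_audio = replace_field_audio(samples, selected, [100.0, 101.0])
print(target_audio)
```

Because the target segment may differ in length from the period it replaces, the playback periods of later fields would need to be shifted accordingly; that bookkeeping is omitted here.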
-
Publication No.: US12101516B1
Publication date: 2024-09-24
Application No.: US17364448
Filing date: 2021-06-30
IPC classification: H04N21/233, G06F40/279, G06F40/58, G06V40/10, G10L15/00, G10L25/54, G10L25/57, H04N21/234, H04N21/239, H04N21/25
CPC classification: H04N21/233, G06F40/279, G06F40/58, G06V40/10, G10L15/005, G10L25/54, G10L25/57, H04N21/23418, H04N21/2393, H04N21/251
Abstract: Techniques and apparatus for selecting audio content for a content entity in audio-visual content are described. An example technique involves identifying at least one content entity associated with a content item that is accessible to one or more users in a first language over a communication network. One or more attributes of the at least one content entity are determined. A plurality of audio content samples in a second language are obtained. Each audio content sample includes a different audio sample of a portion of speech of the content entity in the second language. A first audio content sample that satisfies a predetermined condition is determined, based on the plurality of audio content samples and the one or more attributes of the at least one content entity. An indication of the first audio content sample is provided.
-
Publication No.: US20240297954A1
Publication date: 2024-09-05
Application No.: US18572683
Filing date: 2022-06-24
Applicant: KAKEAI, Inc.
Inventor(s): Hidetaka HONDA
Abstract: This computer system for assisting communication between two parties is configured to: receive information indicating a response that a first party of the two parties expects a second party of the two parties to make in the communication; receive speech and/or video during the communication; derive an advice relating to the communication on the basis of the information and the speech and/or the video; and provide the derived advice during the communication.
-
Publication No.: US20240296694A1
Publication date: 2024-09-05
Application No.: US18661087
Filing date: 2024-05-10
Applicant: GN Audio A/S
IPC classification: G06V40/10, G06T7/20, G06T7/70, G06V10/25, G06V20/40, G10L17/00, G10L25/57, H04N5/262, H04R1/40, H04R3/00
CPC classification: G06V40/10, G06T7/20, G06T7/70, G06V10/25, G06V20/49, G10L17/00, G10L25/57, H04N5/2628, G06T2207/10016, G06T2207/20084, G06T2207/20132, G06T2207/30196, H04R1/406, H04R3/005
Abstract: For video applications, a method for dynamically switching from a current ROI to a target ROI is disclosed, wherein the target ROI only includes active speakers. Advantageously, if the target ROI crops a non-speaker, then the target ROI is expanded to include said non-speaker. Transitioning from the current ROI to the target ROI may be achieved based on a cutover transition technique, or a smooth transition technique. The cutover transition technique achieves the change from the current ROI to the target ROI in a single interval, whereas the smooth transition technique achieves the change over a number of intervals, wherein a percentage of the total change required is allocated to each interval. A system for implementing the above method is also disclosed.
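The two transition techniques can be sketched as one function, under the assumption that an ROI is an `(x, y, width, height)` tuple and that each interval's share of the total change is given as a fraction. The name `transition_steps` and the linear blending are illustrative choices, not details taken from the patent.

```python
from typing import List, Sequence, Tuple

ROI = Tuple[float, float, float, float]  # x, y, width, height (assumed layout)


def transition_steps(current: ROI, target: ROI, fractions: Sequence[float]) -> List[ROI]:
    """Return the ROI after each interval.

    `fractions` holds the share of the total change allocated to each
    interval and must sum to 1. A cutover is the special case [1.0].
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    steps: List[ROI] = []
    done = 0.0  # cumulative share of the change applied so far
    for share in fractions:
        done += share
        steps.append(tuple(c + (t - c) * done for c, t in zip(current, target)))
    return steps


current_roi: ROI = (0.0, 0.0, 100.0, 100.0)
target_roi: ROI = (50.0, 20.0, 200.0, 100.0)

cutover = transition_steps(current_roi, target_roi, [1.0])
smooth = transition_steps(current_roi, target_roi, [0.25, 0.25, 0.5])
print(cutover)
print(smooth)
```

With `[1.0]` the target ROI is reached in a single interval; with several fractions the last step still lands exactly on the target, and unequal fractions allow easing in or out.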
-
Publication No.: US20240273139A1
Publication date: 2024-08-15
Application No.: US18408792
Filing date: 2024-01-10
Inventor(s): Adi Miller, Haim Somech, Michael Sterenberg
IPC classification: G06F16/683, G06F3/0481, G06F3/0484, G06F16/332, G06F40/295, G06F40/30, G10L17/22, G10L25/57, H04N7/15
CPC classification: G06F16/685, G06F3/0481, G06F3/0484, G06F16/3329, G06F40/295, G06F40/30, G10L17/22, G10L25/57, H04N7/15
Abstract: Systems and methods for surfacing natural language queries from one or more transcripts. An example method may include converting received audio to text, through automated speech recognition, to form a transcript of the audio, wherein the transcript includes text of the audio and identifications of speakers associated with portions of the text corresponding to utterances from the respective speakers; generating input signals based on at least the transcript; executing at least one of one or more heuristics or a trained machine-learning (ML) model, using the generated input signals as an input, to generate at least one of a suggested natural language query for searching the transcript or a key moment within the received audio; and causing at least one of the suggested natural language query or the key moment to be surfaced on one or more remote devices.
-
Publication No.: US20240249743A1
Publication date: 2024-07-25
Application No.: US18562663
Filing date: 2021-05-25
Applicant: Google LLC
Inventor(s): Snehitha Singaraju
IPC classification: G10L21/0364, G06V10/70, G06V20/40, G06V20/50, G10L21/034, G10L25/57
CPC classification: G10L21/0364, G06V10/768, G06V20/40, G06V20/50, G10L21/034, G10L25/57
Abstract: This document describes systems and methods for dynamically enhancing audio content of a captured scene (104). As part of the described systems and methods, an electronic device (102) may include a content-enhancement manager module (216) that directs the electronic device (102) to perform operations to enhance the audio content. Operations may include determining a context (504) surrounding the capture of the scene, determining an audio focus point (604) within the scene, or determining an intent of a user directing the electronic device (102) to capture the scene (104). Based on one or more of these determinations, the electronic device (102) may use a variety of techniques to enhance the audio content associated with the captured scene so as to present the captured scene (104) with relevant audio content.
-
Publication No.: US20240211704A1
Publication date: 2024-06-27
Application No.: US18069438
Filing date: 2022-12-21
Applicant: Meta Platforms, Inc.
CPC classification: G06F40/58, G10L17/20, G10L19/167, G10L25/57
Abstract: An audio processing system includes: a receiver configured to receive the original audio data; a processor configured to execute the instructions stored in the memory to cause the audio processing system to: separate a background noise audio data, a first speaker audio data, and a second speaker audio data; recognize first speaker speech, convert the first speaker speech to first speaker text, translate the first speaker text to a second language text, and convert the second language text to a second speech; recognize second speaker speech, convert the second speaker speech to second speaker text, translate the second speaker text to the second language text, and convert the second language text of the second speaker to a second speech for the second speaker; and generate encoded audio data; and a transmitter configured to transmit the encoded audio data to a content user device.