-
Publication No.: US20170364752A1
Publication date: 2017-12-21
Application No.: US15624475
Filing date: 2017-06-15
Inventors: Cong ZHOU, Timo KUNKEL, Cristina Michel VASCO
IPC classification: G06K9/00, H04N13/00, H04N5/92, G06K9/46, G11B27/031, G11B27/10, G10K11/34, H04R1/32, H04H60/04
CPC classification: G06K9/00718, G06K9/00255, G06K9/00288, G06K9/4671, G10K1/38, G10K11/34, G10K2200/10, G11B27/031, G11B27/10, H04H60/04, H04H60/07, H04H60/48, H04H60/58, H04H60/59, H04H60/66, H04N9/802, H04N13/161, H04N13/189, H04R1/326, H04R3/005, H04S7/30, H04S2400/11
Abstract: Image data relating to real-world objects or persons is collected from a scene while audio data relating to the same real-world objects or persons is collected from the same scene. The audio data is used to derive sound objects corresponding to the real-world objects or persons. The image data is used to derive video objects corresponding to the real-world objects or persons. Based on the sound objects and the video objects, candidate salient objects are generated. A salient object is selected from among the candidate salient objects. Perceptual enhancement operations are performed on the selected salient object.
-
Publication No.: US20180234612A1
Publication date: 2018-08-16
Application No.: US15785977
Filing date: 2017-10-17
Inventors: Timo KUNKEL, Cong ZHOU, Vivek KUMAR, Remi S. AUDFRAY
CPC classification: H04N5/23203, B64C39/024, B64C2201/027, B64C2201/108, B64C2201/127, G06T7/70, G06T2207/10016, H04N5/04, H04N5/23206, H04N5/23216, H04N5/23222, H04N5/23293
Abstract: Methods, systems, and computer program products for automatically positioning a content capturing device are disclosed. A vehicle, e.g., a UAV, carries the content capturing device, e.g., a camcorder. The UAV can position the content capturing device at a best location for viewing a subject based on one or more audio or visual cues. The UAV can follow movement of the subject to achieve the best audio or visual effect. In some implementations, a controller device carried by the subject can generate one or more signals for the UAV to follow. The controller device may be coupled to a microphone that records audio. The signals can be used to temporally synchronize video captured at the UAV and audio captured by the microphone.
-
Publication No.: US20230386486A1
Publication date: 2023-11-30
Application No.: US18248294
Filing date: 2021-10-15
Inventors: Cong ZHOU, Grant A. DAVIDSON, Mark S. VINTON
IPC classification: G10L19/022, G10L25/30, G10L19/032, G10L19/04
CPC classification: G10L19/022, G10L19/04, G10L19/032, G10L25/30
Abstract: The present invention relates to a method for predicting transform coefficients representing frequency content of an adaptive block length media signal, by receiving a frame and receiving block length information indicating a number of quantized transform coefficients for each block in the frame, the number of quantized transform coefficients being one of a first or second number, wherein the first number is greater than the second number; determining that a first block has the second number of quantized transform coefficients; converting the first block into a converted block having the first number of quantized transform coefficients; conditioning a main neural network trained to predict at least one output variable given at least one conditioning variable, the at least one conditioning variable being based on information regarding the converted block and block length information for the first block; and providing at least one predicted transform coefficient from an output stage of the main neural network.
-
Publication No.: US20230395086A1
Publication date: 2023-12-07
Application No.: US18031790
Filing date: 2021-10-14
Inventors: Mark S. VINTON, Cong ZHOU, Roy M. FEJGIN, Grant A. DAVIDSON
IPC classification: G10L19/032, G10L19/06
CPC classification: G10L19/032, G10L19/06
Abstract: Described herein is a method of processing an audio signal using a neural network or using a first and a second neural network. Further described is a method of training said neural network or of jointly training a set of said first and said second neural network. Moreover, described is a method of obtaining and transmitting a latent feature space representation of a perceptual domain audio signal using a neural network and a method of obtaining an audio signal from a latent feature space representation of a perceptual domain audio signal using a neural network. Also described are respective apparatuses and computer program products.
-
Publication No.: US20220335925A1
Publication date: 2022-10-20
Application No.: US17636851
Filing date: 2020-08-18
Inventors: Cong ZHOU, Xiaoyu LIU, Michael Getty HORGAN, Vivek KUMAR
IPC classification: G10L13/033, G10L13/047
Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.
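The embedding initialization described in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes the parameterized utterances are already plain fixed-length feature vectors, treats all utterances as a single cluster, uses Euclidean distance, and omits the speaker identification neural network entirely. The names `centroid`, `closest_stored_embedding`, and `init_embedding` are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Return the component-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one utterance vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all utterance vectors must have the same length")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed metric; the abstract names none)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_stored_embedding(
    query: Sequence[float], stored: Dict[str, Sequence[float]]
) -> str:
    """Return the key of the stored embedding nearest to the query."""
    if not stored:
        raise ValueError("no stored embeddings to search")
    return min(stored, key=lambda name: euclidean(query, stored[name]))


def init_embedding(
    utterance_vectors: Sequence[Sequence[float]],
    stored: Optional[Dict[str, Sequence[float]]] = None,
) -> Vector:
    """Initialize a speaker embedding from utterance vectors.

    With no stored embeddings, the centroid itself is returned. Otherwise
    the stored embedding closest to the centroid is returned (hypothetical
    combination of two of the options the abstract lists).
    """
    center = centroid(utterance_vectors)
    if stored:
        return list(stored[closest_stored_embedding(center, stored)])
    return center
```

Calling `init_embedding(utterances)` alone gives the centroid start point; passing a dictionary of previously trained speaker embeddings instead snaps the start point to the nearest known speaker, which is one plausible reading of the "and/or" in the abstract.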
-