Abstract:
A signature actor determination method for video identification includes setting a list of actors who appear in each of a plurality of videos, generating a plurality of subsets including the actors, and determining that an actor included in a single final set indicating a first video among the plurality of subsets is a signature actor of the first video. Accordingly, a video can be identified using only a small amount of information.
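One way to read the subset step above is as a search for the smallest group of actors whose joint appearance points to exactly one video. The following is a minimal sketch under that assumption, not the claimed procedure; the function name `signature_actors`, the `max_size` limit, and the sample cast lists are all hypothetical.

```python
from itertools import combinations

def signature_actors(casts, video, max_size=3):
    """Return the smallest actor subset of `video` that no other cast contains.

    `casts` maps a video id to the set of actors appearing in it.
    Returns None if no subset of up to `max_size` actors is unique.
    """
    cast = sorted(casts[video])  # sorted for a deterministic search order
    for size in range(1, max_size + 1):
        for subset in combinations(cast, size):
            candidate = set(subset)
            # Videos whose cast contains every actor in the candidate subset.
            matches = [v for v, actors in casts.items() if candidate <= set(actors)]
            if matches == [video]:
                return candidate
    return None

# Hypothetical cast lists, for illustration only.
casts = {
    "video_a": {"Kim", "Lee", "Park"},
    "video_b": {"Kim", "Lee", "Choi"},
    "video_c": {"Lee", "Park", "Choi"},
}
result_a = signature_actors(casts, "video_a")
result_b = signature_actors(casts, "video_b")
```

In this toy data no single actor identifies a video, so the search falls through to pairs, which is the sense in which a small amount of information can be enough.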
Abstract:
Provided are a face de-identification method and system and a graphical user interface (GUI) provision method for face de-identification employing facial image generation. According to the face de-identification method and system and the GUI provision method, a facial area including eyes, a nose, and a mouth in a face of a person detected in an input image is replaced with a de-identified facial area generated through deep learning to maintain the face in a natural shape while protecting the person's portrait right. Accordingly, qualitative degradation of content is prevented, and viewers' concentration on the image is increased.
Abstract:
A method for receiving a mono sound source audio signal including phase information as an input and separating it into a plurality of signals may comprise performing initial convolution and down-sampling on the inputted mono sound source audio signal; generating an encoded signal by encoding the inputted signal using at least one first dense block and at least one down-transition layer; generating a decoded signal by decoding the encoded signal using at least one second dense block and at least one up-transition layer; and performing final convolution and resizing on the decoded signal.
Abstract:
An apparatus for generating text from an image may comprise: a memory configured to store at least one instruction; and a processor configured to execute the at least one instruction, wherein the processor is further configured to generate encoding information for an image based on the image and extract text information related to content of the image based on a degree of association with the encoding information.
Abstract:
Disclosed are a method and apparatus for generating the learning data needed to learn animation characters on the basis of deep learning. The learning data generation method may include collecting various images from an external source using wired/wireless communication, acquiring character images from the collected images using a character detection module, clustering the acquired character images, selecting learning data from among the clustered images, and inputting the selected learning data to an artificial neural network for character recognition.
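The clustering and selection steps can be illustrated with a small sketch. It assumes each detected character image has already been reduced to a non-zero feature vector, and it uses a simple greedy cosine-similarity grouping as a stand-in, since the abstract does not name a clustering algorithm. The names `cluster`, `select_learning_data`, and the thresholds are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster(features, threshold=0.9):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # each cluster is a list of indices; index 0 is the representative
    for i, feature in enumerate(features):
        for members in clusters:
            if cosine(features[members[0]], feature) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def select_learning_data(clusters, min_size=2, per_cluster=2):
    """Drop tiny clusters (assumed to be noise) and keep a few samples from the rest."""
    return [members[:per_cluster] for members in clusters if len(members) >= min_size]

# Hypothetical 2-D features standing in for character image embeddings.
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (1.0, 1.0)]
clusters = cluster(features)
selected = select_learning_data(clusters)
```

The selected index lists would then be the candidates passed on to the character recognition network.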
Abstract:
A method for determining a video-related emotion and a method of generating data for learning video-related emotions include separating an input video into a video stream and an audio stream; analyzing the audio stream to detect a music section; extracting at least one video clip matching the music section; extracting emotion information from the music section; tagging the video clip with the extracted emotion information and outputting the video clip; learning video-related emotions by using the at least one video clip tagged with the emotion information to generate a video-related emotion classification model; and determining an emotion related to an input query video by using the video-related emotion classification model to provide the emotion.
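The data generation part of this pipeline (detect music sections, cut matching clips, tag them with the music's emotion) can be sketched as follows. This is only an illustration: music detection and emotion extraction are replaced by precomputed sections and a caller-supplied function, and `TaggedClip`, `tag_clips`, and `min_length` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class TaggedClip:
    start: float   # clip start time in seconds
    end: float     # clip end time in seconds
    emotion: str   # emotion label extracted from the matching music section

def tag_clips(music_sections, classify_emotion, min_length=1.0):
    """Create one tagged clip per music section that is long enough.

    `music_sections` is a list of (start, end) times from an assumed detector;
    `classify_emotion(start, end)` stands in for the emotion extraction step.
    """
    clips = []
    for start, end in music_sections:
        if end - start < min_length:
            continue  # skip sections too short to be useful as training data
        clips.append(TaggedClip(start, end, classify_emotion(start, end)))
    return clips

# Stand-in classifier that always answers "calm", for illustration only.
clips = tag_clips([(10.0, 25.0), (40.0, 40.5)], lambda start, end: "calm")
```

The resulting tagged clips are what a video-related emotion classification model would be trained on.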
Abstract:
The present invention relates to an apparatus and method for identifying music in content. The present invention includes extracting and storing a fingerprint of an original audio in an audio fingerprint DB; extracting a first fingerprint of a first audio in the content; and searching the audio fingerprint DB for a fingerprint corresponding to the first fingerprint, wherein the first audio is audio data in a music section detected from the content.
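The store, extract, and search flow can be sketched with a toy fingerprint. The fingerprint used here (one bit per frame, set when frame energy rises) is an assumption chosen for brevity and is not the fingerprint of the invention; `toy_fingerprint`, `build_db`, and `identify` are hypothetical names, and lookup is exact-match only.

```python
def toy_fingerprint(samples, frame=4):
    """One bit per frame boundary: 1 if energy rises from one frame to the next."""
    energies = [
        sum(x * x for x in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]
    return tuple(int(b > a) for a, b in zip(energies, energies[1:]))

def build_db(originals):
    """Map the fingerprint of each original audio to its title."""
    return {toy_fingerprint(samples): title for title, samples in originals.items()}

def identify(content, music_section, db):
    """Fingerprint only the detected music section and look it up in the DB."""
    start, end = music_section
    return db.get(toy_fingerprint(content[start:end]))

# Hypothetical sample data: the original is embedded in the content at offset 2.
original = [0, 1, 0, 1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 4, 4, 4]
db = build_db({"song_x": original})
content = [9, 9] + original + [9, 9]
hit = identify(content, (2, 18), db)
miss = identify(content, (0, 16), db)
```

Restricting the lookup to the detected music section matters here: the misaligned section produces a different fingerprint and finds nothing. A practical system would need a fingerprint and search that tolerate noise and misalignment.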
Abstract:
An apparatus and method for analyzing and identifying a song with high performance identify a subject song in a way that reflects both global and local characteristics of a feature vector, and quickly identify a cover song whose tempo and key have been changed. To do so, they use a feature vector extracting part, a feature vector condensing part, and a feature vector comparing part, and condense a feature vector sequence into global and local characteristics that reflect a melody characteristic.
Abstract:
An apparatus for recognizing a person includes a content separator configured to receive contents and separate the contents into video content and audio content; a video processor configured to recognize a face from an image in the video content received from the content separator and obtain information on a face recognition section by analyzing the video content; an audio processor configured to recognize a speaker from voice data in the audio content received from the content separator and obtain information on a speaker recognition section by analyzing the audio content; and a person recognized section information provider configured to provide information on a section of the contents in which a person appears based on the information on the face recognition section and the information on the speaker recognition section.
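The last component combines face recognition sections and speaker recognition sections into sections in which a person appears. The abstract does not say how they are combined, so the sketch below assumes the simplest rule, a union of the two sets of time intervals with overlapping intervals merged. The name `merge_sections` and the sample times are hypothetical.

```python
def merge_sections(face_sections, speaker_sections):
    """Union of (start, end) intervals from both modalities, merging overlaps."""
    merged = []
    for start, end in sorted(face_sections + speaker_sections):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous section: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical sections (in seconds) for one person.
face = [(0.0, 5.0), (12.0, 15.0)]
speaker = [(4.0, 8.0), (20.0, 22.0)]
person_sections = merge_sections(face, speaker)
```

With a union, a person counts as present when either the face or the voice is recognized. An intersection would be the stricter alternative if both cues were required.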
Abstract:
An apparatus and method for managing a representative video image select representative images based on human visual aesthetic criteria and create an album by arranging the selected representative images in an album template with various layouts, based on a region of interest (ROI).