ENHANCED USER EXPERIENCE THROUGH BI-DIRECTIONAL AUDIO AND VISUAL SIGNAL GENERATION

    Publication No.: US20220343543A1

    Publication Date: 2022-10-27

    Application No.: US17240510

    Application Date: 2021-04-26

    Abstract: In various embodiments, a computer-implemented method of training a neural network to create an output signal of a different modality from an input signal is described. In embodiments, the input modality may be a sound signal or a visual image, and the output signal would be a visual image or a sound signal, respectively. In embodiments, a model is trained using a first pair of visual and audio networks to train a set of codebooks on known visual and audio signals, and a second pair of visual and audio networks to further train the set of codebooks on augmented visual and audio signals. Further, the first and second visual networks are equally weighted, as are the first and second audio networks. In aspects of the present disclosure, the set of codebooks comprises a visual codebook, an audio codebook, and a correlation codebook. These codebooks are then used to create a visual image from a sound signal and/or a sound signal from a visual image.
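The cross-modal lookup that the codebooks enable can be sketched in miniature. This is a hypothetical toy, not the patented training procedure: the codebook contents, the pairing-by-index convention, and the `audio_to_visual` helper are all illustrative assumptions.

```python
# Toy sketch of cross-modal lookup via paired codebooks. Hypothetical:
# entries at the same index in each codebook are assumed correlated,
# standing in for the learned visual/audio/correlation codebooks.

def nearest_index(vec, codebook):
    """Return the index of the codeword closest to vec (squared L2)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))

# Hypothetical codebooks (real ones would be learned embeddings).
audio_codebook  = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
visual_codebook = [["dark"], ["bright"], ["noisy"]]

def audio_to_visual(audio_feature):
    """Quantize an audio feature, then return the paired visual codeword."""
    return visual_codebook[nearest_index(audio_feature, audio_codebook)]

print(audio_to_visual([0.9, 0.1]))  # → ['bright']
```

In the actual method the mapping runs through trained networks and a correlation codebook rather than index pairing; the sketch only shows the quantize-then-look-up shape of codebook use.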

    COMMENT-CENTERED NEWS READER

    Publication No.: US20180150450A1

    Publication Date: 2018-05-31

    Application No.: US15578195

    Application Date: 2015-05-29

    Abstract: Methods and systems for providing a comments-centered news reader. Configurations allow live comments to be presented alongside news or similar website content. While a user scrolls up and down in a browser presenting a news article on the user's computing device (e.g., a mobile device), linked comments are shown in a selected region. The displayed comments change automatically to match which parts (paragraphs, sentences) of the news article the user is currently reading. At the same time, users can publish their own comments without having to navigate to a separate section of the browser, thus saving viewer actions and improving the user's experience. The user's system or a remote server records each comment along with the article and the position in the article where the comment was entered.
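The scroll-linked behavior above can be approximated with a small position-to-comments mapping. This is a minimal sketch under a strong simplifying assumption: scroll position maps linearly onto the paragraph list, whereas a real reader would use element geometry in the page.

```python
def visible_comments(scroll_fraction, paragraph_comments):
    """Return the comments linked to the paragraph the reader is viewing.
    scroll_fraction in [0, 1) is mapped linearly onto the paragraph list
    (a simplification; a real reader would measure element positions)."""
    idx = min(int(scroll_fraction * len(paragraph_comments)),
              len(paragraph_comments) - 1)
    return paragraph_comments[idx]

# Hypothetical per-paragraph comment lists for a three-paragraph article.
comments_list = [["Great intro"], ["Source?"], ["Nice conclusion"]]

print(visible_comments(0.5, comments_list))  # → ['Source?']
```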

    CLASSIFYING AUDIO SCENE USING SYNTHETIC IMAGE FEATURES

    Publication No.: US20210216817A1

    Publication Date: 2021-07-15

    Application No.: US16844930

    Application Date: 2020-04-09

    Abstract: A computing system includes an encoder that receives an input image and encodes it into real image features; a decoder that decodes the real image features into a reconstructed image; a generator that receives first audio data corresponding to the input image and generates first synthetic image features from it, and receives second audio data and generates second synthetic image features from it; a discriminator that receives both the real and synthetic image features and determines whether a target feature is real or synthetic; and a classifier that classifies a scene of the second audio data based on the second synthetic image features.
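The inference path described above (audio → synthetic image features → scene label) can be sketched with stand-in functions. Everything here is hypothetical: `generator` and `classifier` are trivial stubs replacing the trained networks, purely to show the shape of the pipeline.

```python
# Minimal pipeline sketch; the real system uses trained neural networks
# (and a discriminator during training), not these hand-written stubs.

def generator(audio):
    """Stand-in generator: derive 'image features' from simple audio stats."""
    mean = sum(audio) / len(audio)
    peak = max(audio)
    return [mean, peak]

def classifier(features):
    """Stand-in classifier: a threshold rule in place of a trained model."""
    return "loud_scene" if features[1] > 0.5 else "quiet_scene"

def classify_audio_scene(audio):
    """Classify a scene from audio via synthetic image features."""
    return classifier(generator(audio))

print(classify_audio_scene([0.1, 0.9, 0.2]))  # → loud_scene
```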

    OCTREE-BASED CONVOLUTIONAL NEURAL NETWORK

    Publication No.: US20200042863A1

    Publication Date: 2020-02-06

    Application No.: US16606653

    Application Date: 2018-04-20

    Abstract: The implementations of the subject matter described herein relate to an octree-based convolutional neural network. In some implementations, there is provided a computer-implemented method for processing a three-dimensional shape. The method comprises obtaining an octree representing the three-dimensional shape. Nodes of the octree include empty nodes and non-empty nodes. The empty nodes contain no part of the three-dimensional shape and are leaf nodes of the octree, while the non-empty nodes contain at least a part of the three-dimensional shape. The method further comprises, for nodes in the octree at a depth associated with a convolutional layer of a convolutional neural network, performing a convolution operation of that layer to obtain the layer's output.
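The key structural idea, operating only on non-empty nodes at the depth tied to a given layer while empty subtrees are never expanded, can be sketched as a traversal. This is a sketch of that traversal only (hypothetical `Node` class), not the patented convolution itself.

```python
# Sketch of depth-restricted octree traversal: empty nodes are leaves,
# so their subtrees are skipped; convolution (omitted here) would run
# only at the non-empty nodes this traversal yields.

class Node:
    def __init__(self, depth, empty, children=()):
        self.depth, self.empty, self.children = depth, empty, children

def nodes_at_depth(node, depth):
    """Yield non-empty nodes at the target depth."""
    if node.empty:
        return                      # empty leaf: nothing beneath it
    if node.depth == depth:
        yield node
        return
    for child in node.children:
        yield from nodes_at_depth(child, depth)

root = Node(0, False, children=[
    Node(1, True),                                  # empty leaf, skipped
    Node(1, False, children=[Node(2, False)]),      # non-empty subtree
])
print(len(list(nodes_at_depth(root, 1))))  # → 1
```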

    TRAINING AND USING A DEEP LEARNING MODEL FOR TRANSCRIPT TOPIC SEGMENTATION

    Publication No.: US20250061277A1

    Publication Date: 2025-02-20

    Application No.: US18720606

    Application Date: 2021-12-15

    Abstract: The disclosure herein describes using a deep learning model to identify topic segments of a communication transcript. A communication transcript including a set of utterances is obtained. The set of utterances is divided into a plurality of utterance windows, wherein each utterance window of the plurality of utterance windows includes a different subset of utterances of the set of utterances, and wherein each utterance of the set of utterances is included in at least one utterance window of the plurality of utterance windows. For each utterance window of the plurality of utterance windows, each utterance in the utterance window is classified as a topic boundary or a non-boundary using a deep learning model. Topic segments of the communication transcript are identified based on utterances of the set of utterances that are classified as topic boundaries. A communication transcript summary is generated using the communication transcript and the identified topic segments.
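The windowing constraint above (every utterance appears in at least one window) can be illustrated with overlapping fixed-size windows. This is one simple scheme that satisfies the constraint; the `size` and `stride` parameters are assumptions, and the patent's actual windowing may differ.

```python
def utterance_windows(utterances, size, stride):
    """Split utterances into overlapping windows so that every utterance
    lands in at least one window (requires stride <= size)."""
    windows = []
    i = 0
    while True:
        windows.append(utterances[i:i + size])
        if i + size >= len(utterances):
            break                     # last window reaches the end
        i += stride
    return windows

utts = ["u1", "u2", "u3", "u4", "u5"]
print(utterance_windows(utts, size=3, stride=2))
# → [['u1', 'u2', 'u3'], ['u3', 'u4', 'u5']]
```

Each window would then be passed to the boundary classifier, and utterances labeled as topic boundaries delimit the final segments.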
