-
Publication No.: US20250054491A1
Publication date: 2025-02-13
Application No.: US18721121
Filing date: 2021-12-22
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Sayan Dev PATHAK , Hosam Adel KHALIL , Naveen PARIHAR , Piyush BEHRE , Shuangyu CHANG , Christopher Hakan BASOGLU , Sharman W TAN , Eva SHARMA , Jian WU , Yang LIU , Edward C LIN , Amit Kumar AGARWAL
Abstract: Systems and methods are provided for smart audio segmentation using look-ahead based acousto-linguistic features. For example, systems and methods are provided for obtaining audio, processing the audio, identifying a potential segmentation boundary within the audio, and determining whether to generate a segment break at the potential segmentation boundary. One or more look-ahead words occurring after the potential segmentation boundary are identified, wherein an acoustic segmentation score and a language segmentation score associated with the potential segmentation boundary and the one or more look-ahead words are generated. Systems then either refrain from generating a segment break at the potential segmentation boundary or generate the segment break at the potential segmentation boundary based on the acoustic and/or language segmentation score at least meeting or exceeding a segmentation score threshold.
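The thresholding step in this abstract can be illustrated with a minimal sketch. The names `BoundaryScores` and `should_break`, the default threshold of 0.5, and the `require_both` switch are assumptions made for illustration; the abstract only says the decision is based on the acoustic and/or language score meeting or exceeding a threshold, and does not say how the scores are computed from the look-ahead words.

```python
from dataclasses import dataclass


@dataclass
class BoundaryScores:
    """Scores for one potential segmentation boundary (hypothetical container)."""
    acoustic: float  # acoustic segmentation score
    language: float  # language segmentation score, informed by look-ahead words


def should_break(scores: BoundaryScores,
                 threshold: float = 0.5,
                 require_both: bool = False) -> bool:
    """Return True to generate a segment break, False to refrain.

    A score "meets or exceeds" the threshold when score >= threshold.
    require_both selects between the "and" and "or" readings of the
    abstract's "acoustic and/or language" wording (an assumption).
    """
    meets_acoustic = scores.acoustic >= threshold
    meets_language = scores.language >= threshold
    if require_both:
        return meets_acoustic and meets_language
    return meets_acoustic or meets_language
```

With the "or" reading, a strong acoustic pause alone triggers a break; with `require_both=True`, the language score computed with the look-ahead words must agree as well.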
-
Publication No.: US20220343543A1
Publication date: 2022-10-27
Application No.: US17240510
Filing date: 2021-04-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Sunando SENGUPTA , Alexandros NEOFYTOU , Eric Chris Wolfgang SOMMERLADE , Yang LIU
IPC: G06T9/00 , G06T3/60 , G10L19/012 , G06K9/62 , G10L25/51
Abstract: In various embodiments, a computer-implemented method of training a neural network for creating an output signal of a different modality from an input signal is described. In embodiments, the input signal may be a sound signal or a visual image, and the output signal is a visual image or a sound signal, respectively. In embodiments, a model is trained using a first pair of visual and audio networks to train a set of codebooks using known visual signals and audio signals, and using a second pair of visual and audio networks to further train the set of codebooks using augmented visual signals and augmented audio signals. Further, the first and second visual networks are equally weighted, and the first and second audio networks are equally weighted. In aspects of the present disclosure, the set of codebooks comprises a visual codebook, an audio codebook and a correlation codebook. These codebooks are then used to create a visual image from a sound signal and/or a sound signal from a visual image.
-
Publication No.: US20180159804A1
Publication date: 2018-06-07
Application No.: US15578203
Filing date: 2015-05-29
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Furu WEI , Ming ZHOU , Yang LIU , Ziqiang CAO , Shaohan HUANG , Li DONG , Lei CUI
CPC classification number: H04L51/04 , G06F17/18 , G06F17/2229 , G06F17/2235 , G06F17/241 , G06F17/277 , G06F17/2775 , G06F17/278 , G06F17/2785 , G06N5/022
Abstract: Methods and systems for linking comments to portions of content items. An example computing device receives information associated with a content item produced by a source system, the content item being accessible to other computing devices via a network, and receives a comment associated with the content item, the comment produced by one of the other computing devices. In response to receiving the information and the comment, the computing device predicts a subsection of the content item to link to the received comment based at least on details associated with the content item and the comment, then makes information associated with the predicted subsection of the content item available to other computing devices requesting access to the content item.
-
Publication No.: US20180150450A1
Publication date: 2018-05-31
Application No.: US15578195
Filing date: 2015-05-29
Applicant: Furu WEI , Ming ZHOU , Yang LIU , Ziqiang CAO , Shaohan HUANG , Li DONG , Lei CUI , MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Furu WEI , Ming ZHOU , Yang LIU , Ziqiang CAO , Shaohan HUANG , Li DONG , Lei CUI
Abstract: Methods and systems for providing a comments-centered news reader. Configurations allow live comments to be presented along with the news or similar website content. While a user scrolls up and down in a browser presenting a news article on the user's computer device (e.g., mobile device), linked comments are shown in a selected region. The displayed comments automatically change to match the parts (paragraphs, sentences) of the news article that the user is currently reading. At the same time, users can publish their own comments without having to proceed to a separate section of the browser, thus saving the viewer actions and improving the user's experience. The user's system or a remote server records each comment along with the article or the place in the article where the user was when the comment was entered.
-
Publication No.: US20230076387A1
Publication date: 2023-03-09
Application No.: US18050287
Filing date: 2022-10-27
Applicant: Microsoft Technology Licensing, LLC
Inventor: Furu WEI , Ming ZHOU , Yang LIU , Ziqiang CAO , Shaohan HUANG , Li DONG , Lei CUI
IPC: H04L51/04 , G06F40/30 , G06F40/131 , G06F40/134 , G06F40/169 , G06F40/284 , G06F40/289 , G06F40/295
Abstract: Methods and systems for linking comments to portions of content items. An example computing device receives information associated with a content item produced by a source system, the content item being accessible to other computing devices via a network, and receives a comment associated with the content item, the comment produced by one of the other computing devices. In response to receiving the information and the comment, the computing device predicts a subsection of the content item to link to the received comment based at least on details associated with the content item and the comment, then makes information associated with the predicted subsection of the content item available to other computing devices requesting access to the content item.
-
Publication No.: US20210216817A1
Publication date: 2021-07-15
Application No.: US16844930
Filing date: 2020-04-09
Applicant: Microsoft Technology Licensing, LLC
Inventor: Eric Chris Wolfgang SOMMERLADE , Yang LIU , Alexandros NEOFYTOU , Sunando SENGUPTA
Abstract: A computing system includes an encoder that receives an input image and encodes the input image into real image features, a decoder that decodes the real image features into a reconstructed image, a generator that receives first audio data corresponding to the input image and generates first synthetic image features from the first audio data, and receives second audio data and generates second synthetic image features from the second audio data, a discriminator that receives both the real and synthetic image features and determines whether a target feature is real or synthetic, and a classifier that classifies a scene of the second audio data based on the second synthetic image features.
-
Publication No.: US20200042863A1
Publication date: 2020-02-06
Application No.: US16606653
Filing date: 2018-04-20
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Pengshuai WANG , Yang LIU , Xin TONG
IPC: G06N3/04 , G06F17/15 , G06F16/901 , G06F1/20
Abstract: The implementations of the subject matter described herein relate to an octree-based convolutional neural network. In some implementations, there is provided a computer-implemented method for processing a three-dimensional shape. The method comprises obtaining an octree for representing the three-dimensional shape. Nodes of the octree include empty nodes and non-empty nodes. The empty nodes exclude the three-dimensional shape and are leaf nodes of the octree, and the non-empty nodes include at least a part of the three-dimensional shape. The method further comprises, for nodes in the octree with a depth associated with a convolutional layer of a convolutional neural network, performing a convolutional operation of the convolutional layer to obtain an output of the convolutional layer.
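The octree structure described in this abstract can be sketched as follows. This is only an illustration of the data structure and of selecting nodes by depth; it does not perform the convolution itself. The names `Node`, `build_octree` and `nodes_at_depth`, the use of a point list to stand in for the three-dimensional shape, and the unit-cube root are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    depth: int
    origin: Point            # minimum corner of the node's cube
    size: float              # edge length of the cube
    non_empty: bool = False  # True if the cube contains part of the shape
    children: List["Node"] = field(default_factory=list)


def build_octree(points: List[Point], origin: Point = (0.0, 0.0, 0.0),
                 size: float = 1.0, depth: int = 0, max_depth: int = 2) -> Node:
    """Subdivide only non-empty nodes, so empty nodes are always leaves."""
    inside = [p for p in points
              if all(o <= c < o + size for c, o in zip(p, origin))]
    node = Node(depth, origin, size, non_empty=bool(inside))
    if node.non_empty and depth < max_depth:
        half = size / 2
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    child_origin = (origin[0] + dx * half,
                                    origin[1] + dy * half,
                                    origin[2] + dz * half)
                    node.children.append(
                        build_octree(inside, child_origin, half,
                                     depth + 1, max_depth))
    return node


def nodes_at_depth(node: Node, depth: int) -> List[Node]:
    """Collect the nodes a layer associated with `depth` would operate on."""
    if node.depth == depth:
        return [node]
    return [n for child in node.children for n in nodes_at_depth(child, depth)]
```

Because empty regions are never subdivided, the number of nodes at each depth grows with the surface of the shape rather than with the full volume, which is the usual motivation for running a layer per octree depth.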
-
Publication No.: US20250061277A1
Publication date: 2025-02-20
Application No.: US18720606
Filing date: 2021-12-15
Applicant: Chenguang ZHU , Yang LIU , David HUNG , Nanshan ZENG , Microsoft Technology Licensing, LLC
Inventor: Chenguang ZHU , Yang LIU , David Peace HUNG , Nanshan ZENG
IPC: G06F40/284 , G06F40/30 , G10L15/26
Abstract: The disclosure herein describes using a deep learning model to identify topic segments of a communication transcript. A communication transcript including a set of utterances is obtained. The set of utterances is divided into a plurality of utterance windows, wherein each utterance window of the plurality of utterance windows includes a different subset of utterances of the set of utterances, and wherein each utterance of the set of utterances is included in at least one utterance window of the plurality of utterance windows. For each utterance window of the plurality of utterance windows, each utterance in the utterance window is classified as a topic boundary or a non-boundary using a deep learning model. Topic segments of the communication transcript are identified based on utterances of the set of utterances that are classified as topic boundaries. A communication transcript summary is generated using the communication transcript and the identified topic segments.
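The windowing and segment-identification steps in this abstract can be sketched as below. The function names, the fixed `window_size` and `stride` parameters, the rule that an utterance counts as a boundary if any window flags it, and the convention that a boundary utterance starts a new segment are all assumptions. The `classify_window` callable is a placeholder for the deep learning model, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Set, Tuple


def make_windows(n_utterances: int, window_size: int,
                 stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges so every utterance is in >= 1 window."""
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("need window_size > 0 and 0 < stride <= window_size")
    if n_utterances == 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + window_size, n_utterances)
        windows.append((start, end))
        if end >= n_utterances:
            return windows
        start += stride


def find_boundaries(utterances: Sequence[str],
                    windows: Sequence[Tuple[int, int]],
                    classify_window: Callable[[Sequence[str]], List[bool]]
                    ) -> Set[int]:
    """Classify each utterance in each window; keep indices flagged as boundaries."""
    boundaries: Set[int] = set()
    for start, end in windows:
        flags = classify_window(utterances[start:end])
        boundaries.update(start + i for i, flag in enumerate(flags) if flag)
    return boundaries


def segments_from_boundaries(n_utterances: int,
                             boundaries: Set[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) topic segments."""
    if n_utterances == 0:
        return []
    starts = sorted({0, *(b for b in boundaries if 0 < b < n_utterances)})
    ends = starts[1:] + [n_utterances]
    return list(zip(starts, ends))


def toy_classifier(window: Sequence[str]) -> List[bool]:
    """Stand-in for the model: flags utterances that begin with 'Next,'."""
    return [u.startswith("Next,") for u in window]
```

Overlapping windows (a stride smaller than the window size) give each utterance more than one chance to be classified with different surrounding context, at the cost of extra model calls.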
-
Publication No.: US20240346254A1
Publication date: 2024-10-17
Application No.: US18133938
Filing date: 2023-04-12
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yang LIU , Yichong XU , Dan ITER , Chenguang ZHU , Nanshan ZENG , Shuohang WANG , Hiteshi SHARMA
IPC: G06F40/40 , G06F40/186 , G06F40/20 , G06F40/35 , G06N20/00
CPC classification number: G06F40/40 , G06F40/186 , G06F40/20 , G06F40/35 , G06N20/00
Abstract: The techniques described herein enhance the operations of natural language generation systems through training and/or augmentation by a large language model. In a first example, the large language model can execute training operations by processing a training dataset to produce a natural language output. The natural language generation system can analyze the training dataset and the natural language output to generate a natural language output mimicking the output of the large language model. The large language model can then evaluate the output of the natural language generation system to iteratively adjust and improve the quality of natural language outputs. In a second example, the large language model can augment a small language model in executing natural language tasks. This is accomplished by retrieving external information using the large language model to generate an augmentation input that provides context and a language framework to the small language model to enhance overall outputs.
-
Publication No.: US20240054683A1
Publication date: 2024-02-15
Application No.: US18383956
Filing date: 2023-10-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Sunando SENGUPTA , Alexandros NEOFYTOU , Eric Chris Wolfgang SOMMERLADE , Yang LIU
IPC: G06T9/00 , G06T3/60 , G10L19/012 , G10L25/51 , G06F18/21
CPC classification number: G06T9/00 , G06T3/60 , G10L19/012 , G10L25/51 , G06F18/21 , G10L2019/0002
Abstract: In various embodiments, a computer-implemented method of training a neural network for creating an output signal of a different modality from an input signal is described. In embodiments, the input signal may be a sound signal or a visual image, and the output signal is a visual image or a sound signal, respectively. In embodiments, a model is trained using a first pair of visual and audio networks to train a set of codebooks using known visual signals and audio signals, and using a second pair of visual and audio networks to further train the set of codebooks using augmented visual signals and augmented audio signals. Further, the first and second visual networks are equally weighted, and the first and second audio networks are equally weighted.