CONSTRUCTING, EVALUATING, AND IMPROVING A SEARCH STRING FOR RETRIEVING IMAGES INDICATING ITEM USE

    Publication No.: US20190205435A1

    Publication date: 2019-07-04

    Application No.: US15856511

    Filing date: 2017-12-28

    IPC classification: G06F17/30 G06T1/00

    Abstract: Examples of techniques for constructing, evaluating, and improving a search string for retrieving images are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes constructing, by a processing device, a search string based at least in part on a tuple including an item class, an action, and an actor. The method further includes retrieving, by the processing device, a plurality of images based at least in part on the search string for an item. The method further includes evaluating, by the processing device, the retrieved plurality of images based on a similarity to determine whether the search string is effective at indicating a common item use. The method further includes, based at least in part on determining that the search string is ineffective at indicating the item use, generating, by the processing device, an alternative search string.
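
Illustrative sketch (not part of the patent record): the construct, evaluate, and improve loop in the abstract above could look roughly like the following. The record does not say how image similarity is measured or how alternatives are generated, so this sketch assumes each retrieved image has already been reduced to a feature vector, uses mean pairwise cosine similarity with a hypothetical threshold, and produces alternatives by swapping in action synonyms. All names here are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, NamedTuple, Sequence


class UseTuple(NamedTuple):
    """The (item class, action, actor) tuple named in the abstract."""
    item_class: str
    action: str
    actor: str


def build_search_string(use: UseTuple) -> str:
    # Assumed word order: actor, action, item class.
    return f"{use.actor} {use.action} {use.item_class}"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(features: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all pairs of image feature vectors."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def is_effective(features: Sequence[Sequence[float]], threshold: float = 0.8) -> bool:
    # Assumption: similar-looking results suggest one common item use.
    return mean_pairwise_similarity(features) >= threshold


def alternative_search_strings(use: UseTuple, action_synonyms: Sequence[str]) -> List[str]:
    """Hypothetical improvement step: retry with a different action word."""
    return [
        build_search_string(use._replace(action=alt))
        for alt in action_synonyms
        if alt != use.action
    ]
```

If `is_effective` returns `False` for the images retrieved with the first string, the caller would retry with each string from `alternative_search_strings` in turn.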

    ACOUSTIC MODEL TRAINING
    34.
    Patent application

    Publication No.: US20170287469A1

    Publication date: 2017-10-05

    Application No.: US15479304

    Filing date: 2017-04-05

    Abstract: A method, executed by a computer, includes receiving a channel recording corresponding to a conversation, receiving a transcription for the conversation, generating a conversation-specific language model for the conversation using the transcription, and conducting speech recognition on the channel recording using the conversation-specific language model to provide time boundaries and written language corresponding to utterances within the channel recording. The method further includes determining sentence or phrase boundaries for the transcription, aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording, and training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording. A computer system and computer program product corresponding to the method are also disclosed herein.
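
Illustrative sketch (not part of the patent record): the alignment step in the abstract above, which transfers sentence boundaries from the transcription onto the recording, could be approximated as below. The record does not name an alignment algorithm; this sketch assumes the recognizer output is a list of `(word, start_seconds, end_seconds)` tuples and uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner. The function name and data layout are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

Word = Tuple[str, float, float]  # (word, start time, end time), assumed layout


def sentence_time_boundaries(
    sentences: Sequence[Sequence[str]],
    recognized: Sequence[Word],
) -> List[Optional[Tuple[float, float]]]:
    """Return (start, end) times per transcript sentence, or None if unaligned."""
    ref = [w.lower() for sentence in sentences for w in sentence]
    hyp = [w.lower() for w, _, _ in recognized]

    # Map each transcript word index to the recognized word it matches.
    ref_to_hyp = {}
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            ref_to_hyp[block.a + k] = block.b + k

    boundaries: List[Optional[Tuple[float, float]]] = []
    pos = 0
    for sentence in sentences:
        matched = [ref_to_hyp[i] for i in range(pos, pos + len(sentence)) if i in ref_to_hyp]
        pos += len(sentence)
        if matched:
            # Sentence spans from its first matched word to its last matched word.
            boundaries.append((recognized[matched[0]][1], recognized[matched[-1]][2]))
        else:
            boundaries.append(None)
    return boundaries
```

Words the recognizer got wrong are simply skipped, so a sentence's boundary comes from its matched words only; a production aligner would likely interpolate times for the unmatched ones.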

    Acoustic model training
    35.
    Granted patent

    Publication No.: US09697835B1

    Publication date: 2017-07-04

    Application No.: US15086949

    Filing date: 2016-03-31

    Abstract: A method, executed by a computer, includes receiving a channel recording corresponding to a conversation, receiving a transcription for the conversation, generating a conversation-specific language model for the conversation using the transcription, and conducting speech recognition on the channel recording using the conversation-specific language model to provide time boundaries and written language corresponding to utterances within the channel recording. The method further includes determining sentence or phrase boundaries for the transcription, aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording, and training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording. A computer system and computer program product corresponding to the method are also disclosed herein.

    Combining installed audio-visual sensors with ad-hoc mobile audio-visual sensors for smart meeting rooms
    36.
    Granted patent (in force)

    Publication No.: US09584758B1

    Publication date: 2017-02-28

    Application No.: US14952751

    Filing date: 2015-11-25

    IPC classification: H04N7/14 H04N7/15 H04L29/06

    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices, including: forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
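
Illustrative sketch (not part of the patent record): the retain-and-combine step in the abstract above reduces to a threshold filter over the mobile streams. The record does not define the quality measure, so this sketch assumes a single precomputed score per stream; the `Stream` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stream:
    source: str     # e.g. a device identifier (hypothetical field)
    kind: str       # "audio" or "video"
    quality: float  # assumed precomputed quality score


def combine_streams(
    fixed: Sequence[Stream],
    mobile: Sequence[Stream],
    threshold: float,
) -> List[Stream]:
    """Keep mobile streams strictly above the threshold, then add them to the fixed set."""
    retained = [s for s in mobile if s.quality > threshold]
    return list(fixed) + retained
```

Streams from fixed sensors pass through unfiltered, matching the abstract, which applies the quality test only to streams from personal mobile devices.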

    Training teacher machine learning models using lossless and lossy branches

    Publication No.: US11907845B2

    Publication date: 2024-02-20

    Application No.: US16994656

    Filing date: 2020-08-17

    IPC classification: G06N3/084 G10L15/16 G06N3/045

    CPC classification: G06N3/084 G06N3/045 G10L15/16

    Abstract: Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lossy branch). Weights for the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. These generated soft targets benefit from the training of the lossless branch through the weights that were tied together between each branch, despite isolating the lossless branch from the lossy branch during soft-target generation.
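
Illustrative sketch (not part of the patent record): soft targets in teacher-student training are commonly produced by a temperature-scaled softmax over the teacher's logits. The abstract above does not give the network architecture or say how the weights are tied, so this sketch models tying as both branches sharing one weight object and shows only the soft-target generation from lossy input. `TiedLinear`, `soft_targets`, and the temperature value are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class TiedLinear:
    """One weight matrix used by both the lossless and the lossy branch (assumed tying)."""

    def __init__(self, weights: List[List[float]]) -> None:
        self.weights = weights

    def logits(self, features: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]


def soft_targets(layer: TiedLinear, lossy_features: Sequence[float],
                 temperature: float = 2.0) -> List[float]:
    # Only lossy input is used here, but the shared weights carry
    # whatever was learned while training on lossless data.
    return softmax(layer.logits(lossy_features), temperature)
```

Because the weights are shared, any update made while training on lossless data changes the distribution that `soft_targets` returns for lossy input.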

    GLOBAL NEURAL TRANSDUCER MODELS LEVERAGING SUB-TASK NETWORKS

    Publication No.: US20230153601A1

    Publication date: 2023-05-18

    Application No.: US17526350

    Filing date: 2021-11-15

    IPC classification: G06N3/08 G06N3/04 G10L15/00

    CPC classification: G06N3/08 G06N3/0454 G10L15/00

    Abstract: A computer-implemented method for training a neural transducer for speech recognition is provided. The method includes initializing the neural transducer having a prediction network and an encoder network and a joint network. The method further includes expanding the prediction network by changing the prediction network to a plurality of prediction-net branches. Each of the prediction-net branches is a prediction network for a respective specific sub-task from among a plurality of specific sub-tasks. The method also includes training, by a hardware processor, an entirety of the neural transducer by using training data sets for all of the plurality of specific sub-tasks. The method additionally includes obtaining a trained neural transducer by fusing the plurality of prediction-net branches.
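
Illustrative sketch (not part of the patent record): the expand-then-fuse flow in the abstract above can be pictured with plain parameter dictionaries. The abstract does not state how the branches are fused; element-wise parameter averaging is used here purely as an assumed placeholder, and the training step between expansion and fusion is omitted. The function names and the parameter layout are hypothetical.

```python
import copy
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # parameter name -> flat list of weights (assumed layout)


def expand(prediction_net: Params, sub_tasks: Sequence[str]) -> Dict[str, Params]:
    """Create one independent copy of the prediction network per sub-task."""
    return {task: copy.deepcopy(prediction_net) for task in sub_tasks}


def fuse(branches: Dict[str, Params]) -> Params:
    """Assumed fusion rule: average each parameter element-wise across branches."""
    nets = list(branches.values())
    return {
        name: [sum(values) / len(nets) for values in zip(*(net[name] for net in nets))]
        for name in nets[0]
    }
```

In a real system each branch would be updated by training on its sub-task's data before `fuse` is called; here the branches only diverge if the caller modifies them.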