GLOBALLY NORMALIZED NEURAL NETWORKS
    1.
    Invention application

    Publication No.: US20170270407A1

    Publication date: 2017-09-21

    Application No.: US15407470

    Filing date: 2017-01-17

    Applicant: Google Inc.

    CPC classification number: G06N3/08 G06N3/04 G06N5/003 G06N7/005

    Abstract: A method includes training a neural network having parameters on training data, in which the neural network receives an input state and processes the input state to generate a respective score for each decision in a set of decisions. The method includes receiving training data including training text sequences and, for each training text sequence, a corresponding gold decision sequence. The method includes training the neural network on the training data to determine trained values of parameters of the neural network. Training the neural network includes for each training text sequence: maintaining a beam of candidate decision sequences for the training text sequence, updating each candidate decision sequence by adding one decision at a time, determining that a gold candidate decision sequence matching a prefix of the gold decision sequence has dropped out of the beam, and in response, performing an iteration of gradient descent to optimize an objective function.
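The training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `score_fn` stands in for the neural network, and the names `beam_search_until_gold_drops` and `beam_loss` are hypothetical. The abstract does not state the form of the objective function, so the softmax-over-beam loss used here is an assumption.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Prefix = Tuple[str, ...]
# Hypothetical stand-in for the network: maps a decision prefix (the state)
# to a score for each possible next decision.
ScoreFn = Callable[[Prefix], Dict[str, float]]


def beam_search_until_gold_drops(
    score_fn: ScoreFn, gold: Sequence[str], beam_size: int
) -> Tuple[Optional[int], List[Tuple[Prefix, float]], float]:
    """Grow the beam one decision at a time.

    Returns (step, beam, gold_score). `step` is the index at which the gold
    prefix dropped out of the beam, or None if it survived to the end.
    """
    beam: List[Tuple[Prefix, float]] = [((), 0.0)]
    gold_score = 0.0
    for step, gold_decision in enumerate(gold):
        # Extend every candidate in the beam by each possible decision.
        candidates = [
            (prefix + (decision,), total + score)
            for prefix, total in beam
            for decision, score in score_fn(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]

        # Track the cumulative score of the gold prefix separately.
        gold_prefix = tuple(gold[: step + 1])
        gold_score += score_fn(gold_prefix[:-1])[gold_decision]

        if all(prefix != gold_prefix for prefix, _ in beam):
            return step, beam, gold_score  # gold dropped out: update now
    return None, beam, gold_score


def beam_loss(
    beam: List[Tuple[Prefix, float]], gold_score: float, gold_in_beam: bool
) -> float:
    """Assumed objective: negative log-probability of the gold prefix,
    normalized over the beam (plus the gold prefix if it dropped out)."""
    scores = [total for _, total in beam]
    if not gold_in_beam:
        scores.append(gold_score)
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - gold_score


def toy_score_fn(prefix: Prefix) -> Dict[str, float]:
    # Toy scores, independent of the prefix, for illustration only.
    return {"SHIFT": 1.0, "REDUCE": 0.5}


step, beam, gold_score = beam_search_until_gold_drops(
    toy_score_fn, ["REDUCE", "REDUCE"], beam_size=1
)
loss = beam_loss(beam, gold_score, gold_in_beam=step is None)
```

In a real trainer, the gradient of `loss` with respect to the network parameters would drive the gradient descent iteration that the abstract mentions; here the scores are fixed, so only the control flow is shown.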

    GENERATION OF TIMED TEXT USING SPEECH-TO-TEXT TECHNOLOGY, AND APPLICATIONS THEREOF
    2.
    Invention application
    Status: Under examination (published)

    Publication No.: US20140142941A1

    Publication date: 2014-05-22

    Application No.: US14165484

    Filing date: 2014-01-27

    Applicant: Google Inc.

    CPC classification number: G10L15/26 G10L15/30 G11B27/105 G11B27/28 G11B27/34

    Abstract: Embodiments relate to generation of timed text in web video. In an embodiment, a computer-implemented method generates timed text for online video. In the method, a request to play a timed text track of a video incorporated into a web video service is received from a client computing device. Prior to receipt of the request, audio of the video is processed to determine intermediate timed text data. The intermediate timed text data lacks a complete text transcription of the audio, but includes data to enable the complete text transcription to be generated when playing the video. In response to receipt of the request, a text transcription of the audio is determined using the intermediate data with an automated speech-to-text algorithm. Finally, the text transcription of the audio is sent to the client computing device for display along with the video.

