-
Publication No.: US11023690B2
Publication date: 2021-06-01
Application No.: US16398836
Filing date: 2019-04-30
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to a distributed device of the plurality of distributed devices and determines a preferred language of the user. A transcript is generated from the received audio streams. The meeting system translates the transcript into the preferred language of the user to form a translated transcript. The translated transcript is provided to the distributed device of the user.
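The per-user flow in this abstract (device → user → preferred language → translated transcript) can be sketched as below. This is a minimal illustration, not the patented system: `Utterance`, `translated_transcripts`, the `translate` callback and the `"en"` default are all hypothetical names and choices introduced here, and the translator is passed in rather than tied to any real service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Utterance:
    speaker: str   # speaker label attached by the meeting system
    text: str      # recognized text
    language: str  # language code of the recognized text, e.g. "en"


def translated_transcripts(
    transcript: List[Utterance],
    device_to_user: Dict[str, str],
    preferred_language: Dict[str, str],
    translate: Callable[[str, str, str], str],
    default_language: str = "en",  # assumed fallback when no preference is known
) -> Dict[str, List[str]]:
    """Return one rendered transcript per distributed device.

    `translate(text, source_lang, target_lang)` is a hypothetical hook standing
    in for whatever machine-translation service is available.
    """
    per_device: Dict[str, List[str]] = {}
    for device, user in device_to_user.items():
        target = preferred_language.get(user, default_language)
        lines = []
        for utt in transcript:
            # Only call the translator when the utterance is not already in the target language.
            if utt.language == target:
                text = utt.text
            else:
                text = translate(utt.text, utt.language, target)
            lines.append(f"{utt.speaker}: {text}")
        per_device[device] = lines
    return per_device
```

Keeping the translator as a callback means the same routing logic works with any back end, and a fake translator is enough to exercise it.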
-
Publication No.: US10812921B1
Publication date: 2020-10-20
Application No.: US16399122
Filing date: 2019-04-30
Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
IPC classification: H04R3/00, H04S3/00, H04N21/43, H04N21/422, H04R5/04
Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker attributed meeting transcript.
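The abstract does not say how the unmixing model picks the two channels. As a purely geometric stand-in, assuming a circular array whose microphones face known azimuths, one could take the microphone facing closest to the estimated direction of arrival (DOA) as primary and the one facing farthest away as secondary. Both the heuristic and the names `angular_distance` and `select_channels` are hypothetical, not the patented model.

```python
from typing import List, Tuple


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_channels(mic_azimuths: List[float], doa: float) -> Tuple[int, int]:
    """Return (primary, secondary) channel indices for an estimated DOA in degrees.

    Assumption (illustrative only): primary = mic facing closest to the talker,
    secondary = mic facing farthest away, used as a crude interference reference.
    """
    order = sorted(range(len(mic_azimuths)),
                   key=lambda i: angular_distance(mic_azimuths[i], doa))
    return order[0], order[-1]
```

For a six-microphone ring at 0°, 60°, …, 300° and a talker at 70°, this selects channel 1 (60°) as primary and channel 4 (240°) as secondary. A learned unmixing model would replace the sort key with its own scoring.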
-
Publication No.: US09900423B2
Publication date: 2018-02-20
Application No.: US15249231
Filing date: 2016-08-26
Inventors: Tony He, Susan Chory, Gregory Howard, Peter Bergler, Lijuan Qin, Jon Arnett, Janis Jungeun Lee, Petteri Mikkola, Issa Y. Khoury
IPC classification: H04W8/22, H04M1/725, H04W4/12, H04W4/00, H04W8/18, H04W4/16, H04L12/24, H04L29/12, H04L29/06, H04W88/06
CPC classification: H04M1/72563, H04L41/22, H04L61/1594, H04L63/0853, H04M1/72519, H04W4/02, H04W4/12, H04W4/16, H04W4/50, H04W4/60, H04W8/18, H04W8/183, H04W8/22, H04W88/06
Abstract: Various user interfaces and other technologies for interacting with devices that support multiple communication lines can be implemented. Scenarios providing separate communication lines, such as voice over internet protocol (VOIP), social network communications, and the like can be supported. For example, communication-line-separated and communication-line-aggregated user interface paradigms can be supported. Intelligent selection of an appropriate paradigm can support user preferences, conversation user interfaces, and the like. Other features such as communication line defaults can help users deal with multiple communication line scenarios. A consistent, compact user interface for switching communication lines can be supported. Users can interact with their devices more efficiently and with less frustration. A wide variety of use scenarios are supported.
-
Publication No.: US11468895B2
Publication date: 2022-10-11
Application No.: US16399152
Filing date: 2019-04-30
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
IPC classification: G10L15/26, H04L65/403, H04R1/40
Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that the received audio streams are representative of sound from the ad-hoc meeting, generating a meeting instance to process the audio streams in response to the comparing determining that the audio streams are representative of sound from the ad-hoc meeting, and processing the received audio streams to generate a transcript of the ad-hoc meeting.
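One plausible way to "compare the received audio streams", sketched here as an assumption rather than the claimed method, is to compute the peak normalized cross-correlation over a window of candidate lags and treat the streams as the same meeting when that peak clears a threshold. The `max_lag` and `threshold` defaults are illustrative values, not figures from the patent.

```python
import math
from typing import Sequence


def max_normalized_xcorr(x: Sequence[float], y: Sequence[float], max_lag: int) -> float:
    """Peak normalized cross-correlation of x and y over lags in [-max_lag, max_lag]."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Pair x[n] with y[n + lag] over the region where both streams have samples.
        pairs = [(x[n], y[n + lag]) for n in range(len(x)) if 0 <= n + lag < len(y)]
        if not pairs:
            continue
        num = sum(a * b for a, b in pairs)
        ex = math.sqrt(sum(a * a for a, _ in pairs))
        ey = math.sqrt(sum(b * b for _, b in pairs))
        if ex > 0 and ey > 0:
            best = max(best, num / (ex * ey))
    return best


def same_meeting(x: Sequence[float], y: Sequence[float],
                 max_lag: int = 100, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: streams belong together if they correlate strongly at some lag."""
    return max_normalized_xcorr(x, y, max_lag) >= threshold
```

Searching over lags matters because two devices in the same room receive the same sound with different network and acoustic delays. A production system would more likely compare short-time features with an FFT-based correlation than raw samples in a Python loop.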
-
Publication No.: US20200351603A1
Publication date: 2020-11-05
Application No.: US16399122
Filing date: 2019-04-30
Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
IPC classification: H04S3/00, H04R3/00, H04R5/04, H04N21/43, H04N21/422
Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker attributed meeting transcript.
-
Publication No.: US11875796B2
Publication date: 2024-01-16
Application No.: US16399081
Filing date: 2019-04-30
Inventors: Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, William Isaac Hinthorn, Xuedong Huang
IPC classification: G10L15/26, G10L15/22, H04L65/403, H04N7/15, H04R1/40
CPC classification: G10L15/26, G10L15/22, H04L65/403, H04N7/15, H04R1/406
Abstract: A computer implemented method includes receiving information streams on a meeting server from a set of multiple distributed devices included in a meeting, receiving audio signals representative of speech by at least two users in at least two of the information streams, receiving at least one video signal of at least one user in the information streams, associating a specific user with speech in the received audio signals as a function of the received audio and video signals, and generating a transcript of the meeting with an indication of the specific user associated with the speech.
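Associating speech with a user "as a function of the received audio and video signals" could, in the simplest case, be a late fusion of per-user scores from each modality. The sketch below assumes a speaker-ID model and a face or lip-activity model each emit per-user probabilities, and it combines them with a weighted sum. The weighting scheme, the `audio_weight` default and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def attribute_speaker(audio_scores: Dict[str, float],
                      video_scores: Dict[str, float],
                      audio_weight: float = 0.5) -> str:
    """Return the user with the highest weighted audio+video score.

    Scores are assumed to be per-user probabilities in [0, 1]; a user absent
    from one modality contributes 0 for that modality.
    """
    users = sorted(set(audio_scores) | set(video_scores))  # sorted for deterministic ties

    def fused(user: str) -> float:
        return (audio_weight * audio_scores.get(user, 0.0)
                + (1.0 - audio_weight) * video_scores.get(user, 0.0))

    return max(users, key=fused)


def label_segments(segments: List[Tuple[str, Dict[str, float], Dict[str, float]]]) -> List[str]:
    """segments: (text, audio_scores, video_scores) -> 'user: text' transcript lines."""
    return [f"{attribute_speaker(a, v)}: {text}" for text, a, v in segments]
```

In this toy rule, a segment where the voice model is undecided (0.55 vs. 0.45) but the video strongly favors one person is attributed to that person. When no video is available, the decision falls back to the audio scores alone.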
-
Publication No.: US11322148B2
Publication date: 2022-05-03
Application No.: US16399166
Filing date: 2019-04-30
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
IPC classification: G10L15/26, G10L15/08, G10L19/018
Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices. Operations include performing speech recognition on each audio stream by a corresponding speech recognition system to generate utterance-level posterior probabilities as hypotheses for each audio stream, aligning the hypotheses and formatting them as word confusion networks with associated word-level posterior probabilities, performing speaker recognition on each audio stream by a speaker identification algorithm that generates a stream of speaker-attributed word hypotheses, formatting speaker hypotheses with associated speaker label posterior probabilities and speaker-attributed hypotheses for each audio stream as a speaker confusion network, aligning the word and speaker confusion networks from all audio streams to each other to merge the posterior probabilities and align word and speaker labels, and creating a best speaker-attributed word transcript by selecting the sequence of word and speaker labels with the highest posterior probabilities.
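The final two operations (merge posteriors across streams, then keep the highest-posterior word and speaker labels) can be sketched as follows, assuming the confusion networks have already been aligned slot-by-slot. The alignment itself is not shown. Averaging posteriors across streams and using an empty string for the "no word" arc are illustrative assumptions, and the names `merge_slot` and `best_transcript` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Slot = Dict[str, float]  # label -> posterior probability at one aligned position
EPSILON = ""             # empty label = "no word here" (deletion arc)


def merge_slot(slots: List[Slot]) -> Slot:
    """Average the posteriors of one aligned position across all streams."""
    merged: Dict[str, float] = defaultdict(float)
    for slot in slots:
        for label, p in slot.items():
            merged[label] += p / len(slots)
    return dict(merged)


def best_transcript(word_cns: List[List[Slot]],
                    speaker_cns: List[List[Slot]]) -> List[Tuple[str, str]]:
    """Pick the highest-posterior word and speaker label at every aligned slot.

    word_cns[s][i] and speaker_cns[s][i] hold stream s's posteriors at slot i;
    all networks are assumed to be aligned to the same slots already.
    """
    result = []
    for i in range(len(word_cns[0])):
        words = merge_slot([cn[i] for cn in word_cns])
        speakers = merge_slot([cn[i] for cn in speaker_cns])
        word = max(words, key=words.get)
        if word != EPSILON:  # drop slots where "no word" wins
            result.append((word, max(speakers, key=speakers.get)))
    return result
```

Merging at the posterior level lets a stream that is confident about a word outvote a stream that is not, which is the benefit the abstract attributes to combining all devices' networks. Per-stream weights (for example by signal quality) would be a natural extension.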
-
Publication No.: US10743107B1
Publication date: 2020-08-11
Application No.: US16399369
Filing date: 2019-04-30
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio channels transmitted from corresponding multiple distributed devices, designating one of the audio channels as a reference channel, for each of the remaining audio channels determining a difference in time from the reference channel, and correcting each remaining audio channel by compensating for the corresponding difference in time from the reference channel.
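The abstract does not specify how the time difference is measured. A common choice, used here as an assumption, is to take the lag that maximizes the cross-correlation with the reference channel (GCC-PHAT is a frequency-weighted variant of the same idea) and then shift the channel by that many samples. The sketch assumes integer-sample offsets, a shared sample rate and no clock drift. `estimate_lag` and `align_to_reference` are hypothetical names.

```python
from typing import List, Sequence


def estimate_lag(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which `sig` trails `ref`, found by maximizing cross-correlation."""
    def corr(lag: int) -> float:
        return sum(ref[n] * sig[n + lag] for n in range(len(ref)) if 0 <= n + lag < len(sig))
    return max(range(-max_lag, max_lag + 1), key=corr)


def align_to_reference(channels: List[List[float]], ref_index: int = 0,
                       max_lag: int = 200) -> List[List[float]]:
    """Shift every non-reference channel so that it lines up with the reference channel."""
    ref = channels[ref_index]
    aligned = []
    for i, ch in enumerate(channels):
        if i == ref_index:
            aligned.append(list(ch))
            continue
        lag = estimate_lag(ref, ch, max_lag)
        if lag >= 0:
            shifted = ch[lag:] + [0.0] * lag     # channel is late: drop its first `lag` samples
        else:
            shifted = [0.0] * (-lag) + ch[:lag]  # channel is early: delay it with leading zeros
        aligned.append(shifted)
    return aligned
```

Each corrected channel keeps its original length, so downstream processing can treat the aligned channels as one multichannel array. Real devices also drift over time, so a practical system would likely re-estimate the offset periodically rather than once.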
-
Publication No.: US20190236416A1
Publication date: 2019-08-01
Application No.: US15885518
Filing date: 2018-01-31
Inventors: Zhenghao Wang, Xuedong Huang, Lijuan Qin, Kun Wu, Huaming Wang
IPC classification: G06K9/62, H04N5/232, H04N5/262, G06K9/00, G10L17/22, G06F3/16, G06F3/01, H04R1/22, G06K7/14, G06K7/10, G06N3/08
CPC classification: G06K9/6289, G06F3/017, G06F3/16, G06F3/167, G06K7/10722, G06K7/1417, G06K9/00288, G06N3/08, G10L17/22, H04N5/23216, H04N5/23238, H04N5/2628, H04N13/204, H04R1/222, H04R1/2892, H04R2201/401
Abstract: In some embodiments, the disclosed subject matter involves a system and method relating to using an ambient capture device including a fisheye camera and a microphone array to capture audio and video in an environment, for use in an artificial intelligence (AI) application. The device with fisheye camera may provide approximately a 360° audio and video view, at relatively low cost. An embodiment may utilize a speech and vision fusion model component. The speech and vision fusion model may be trained using deep learning to combine features from many different sources, including available sensor data from the capture device. A long short-term memory (LSTM) model may infer or identify features such as, but not limited to: audio direction; vision detection and tracking; voice signature; facial signature; gesture recognition; and object identification. The fusion processing may be performed by a cloud server, enabling the capture device to remain less complex.
-
Publication No.: US11138980B2
Publication date: 2021-10-05
Application No.: US16399175
Filing date: 2019-04-30
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
IPC classification: G10L21/0272, G10L25/30, G10L15/30, G10L15/16, G10L21/0208
Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.