Automatic collection of speaker name pronunciations
    1.
    Type: granted invention patent
    Status: in force

    Publication number: US09240181B2

    Publication date: 2016-01-19

    Application number: US13970850

    Filing date: 2013-08-20

    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to the speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the translation, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the translation are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
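
The pipeline in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Segment` and `WordRegion` types, the confidence threshold, and the cue phrases in `CUE_RULES` are all hypothetical stand-ins for the SSR transcript, the ASR transcript, and the NER rules, chosen only to show how a low-confidence region could be tied to the current, previous, or next speaker.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One SSR time segment labelled with a speaker name (assumed shape)."""
    start: float
    end: float
    speaker: str


@dataclass(frozen=True)
class WordRegion:
    """One ASR word region with its confidence score (assumed shape)."""
    start: float
    end: float
    text: str
    confidence: float


# Hypothetical NER rules: a cue phrase right before a low-confidence region
# maps to a segment offset (0 = current, -1 = previous, +1 = next speaker).
CUE_RULES = {
    ("my", "name", "is"): 0,
    ("this", "is"): 0,
    ("thank", "you"): -1,
    ("over", "to"): +1,
}


def segment_index(segments, t):
    """Index of the segment whose start is the latest one not after t."""
    starts = [s.start for s in segments]
    return max(bisect_right(starts, t) - 1, 0)


def collect_name_regions(segments, regions, threshold=0.5):
    """Group likely-name regions by the speaker name they are associated with."""
    by_speaker = defaultdict(list)
    for i, region in enumerate(regions):
        if region.confidence >= threshold:
            continue  # only low-confidence regions are candidates
        preceding = tuple(r.text.lower() for r in regions[max(0, i - 3):i])
        for cue, offset in CUE_RULES.items():
            if preceding[-len(cue):] == cue:
                j = segment_index(segments, region.start) + offset
                if 0 <= j < len(segments):
                    by_speaker[segments[j].speaker].append(region)
                break
    return dict(by_speaker)
```

Each speaker name ends up mapped to the audio regions where that name was probably spoken, which is the selection step the abstract ends on. Low-confidence regions with no matching cue phrase are dropped by the filter.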

    Method and system for facial recognition for a videoconference
    2.
    Type: granted invention patent
    Status: in force

    Publication number: US09282284B2

    Publication date: 2016-03-08

    Application number: US13897476

    Filing date: 2013-05-20

    CPC classification number: H04N7/15 G06K9/00288

    Abstract: Videoconferencing may be provided. A participant may be identified from audio information and in video information. From the video information, a plurality of images may be captured of the participant identified in the video information. A unique identifier may be associated with the captured plurality of images. The unique identifier may correspond to the participant identified from the audio information. The captured plurality of images and the associated unique identifier may be saved in a database.
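
The storage step described here, several captured images saved under one unique identifier, might look like the following Python sketch. It assumes the participant has already been identified from the audio and located in the video; the table name `face_images`, the function names, and the use of SQLite are illustrative choices, not details from the patent.

```python
import sqlite3


def open_face_db(path=":memory:"):
    """Open (or create) a database with one row per captured image."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS face_images ("
        " participant_id TEXT NOT NULL,"
        " image BLOB NOT NULL)"
    )
    return conn


def save_captures(conn, participant_id, images):
    """Associate every captured image with the participant's unique identifier."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO face_images (participant_id, image) VALUES (?, ?)",
            [(participant_id, img) for img in images],
        )


def images_for(conn, participant_id):
    """Return the stored images for one identifier, in insertion order."""
    rows = conn.execute(
        "SELECT image FROM face_images WHERE participant_id = ? ORDER BY rowid",
        (participant_id,),
    )
    return [row[0] for row in rows]
```

Keeping one row per image lets the same identifier accumulate captures across several conferences without any schema change.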

    Method and apparatus for using face detection information to improve speaker segmentation
    3.
    Type: granted invention patent
    Status: in force

    Publication number: US09165182B2

    Publication date: 2015-10-20

    Application number: US13969914

    Filing date: 2013-08-19

    CPC classification number: G06K9/00228 H04N7/147 H04S7/303

    Abstract: In one embodiment, a method includes obtaining media that includes a video stream and an audio stream. The method also includes detecting a number of faces visible in the video stream, and performing a speaker segmentation on the media. Performing the speaker segmentation on the media includes utilizing the number of faces visible in the video stream to augment the speaker segmentation.
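
The abstract does not say how the face count augments the segmentation. One plausible reading, shown below purely as an assumption, is to treat the number of visible faces as an upper bound on the number of speaker clusters. The sketch merges per-segment audio embeddings (hypothetical feature vectors) with a simple centroid-based agglomerative clustering until no more clusters remain than faces.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cluster_speakers(embeddings, num_faces):
    """Return one cluster label per embedding, using at most num_faces labels.

    Assumption: the face count caps the speaker count. Clusters are merged
    greedily, closest centroids first, until that cap is met.
    """
    clusters = [[i] for i in range(len(embeddings))]
    target = max(1, num_faces)
    while len(clusters) > target:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([embeddings[i] for i in clusters[a]])
                cb = centroid([embeddings[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

A production system would more likely feed the face count into an existing diarization model as a prior rather than a hard cap, since a visible participant may stay silent and a speaker may be off camera.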

    Automatic Collection of Speaker Name Pronunciations
    4.
    Type: invention patent application
    Status: in force

    Publication number: US20150058005A1

    Publication date: 2015-02-26

    Application number: US13970850

    Filing date: 2013-08-20

    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to the speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the translation, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the translation are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.

    METHOD AND APPARATUS FOR USING FACE DETECTION INFORMATION TO IMPROVE SPEAKER SEGMENTATION
    5.
    Type: invention patent application
    Status: in force

    Publication number: US20150049247A1

    Publication date: 2015-02-19

    Application number: US13969914

    Filing date: 2013-08-19

    CPC classification number: G06K9/00228 H04N7/147 H04S7/303

    Abstract: In one embodiment, a method includes obtaining media that includes a video stream and an audio stream. The method also includes detecting a number of faces visible in the video stream, and performing a speaker segmentation on the media. Performing the speaker segmentation on the media includes utilizing the number of faces visible in the video stream to augment the speaker segmentation.

    Method and System for Facial Recognition for a Videoconference
    6.
    Type: invention patent application
    Status: in force

    Publication number: US20140340467A1

    Publication date: 2014-11-20

    Application number: US13897476

    Filing date: 2013-05-20

    CPC classification number: H04N7/15 G06K9/00288

    Abstract: Videoconferencing may be provided. A participant may be identified from audio information and in video information. From the video information, a plurality of images may be captured of the participant identified in the video information. A unique identifier may be associated with the captured plurality of images. The unique identifier may correspond to the participant identified from the audio information. The captured plurality of images and the associated unique identifier may be saved in a database.
