SYSTEM AND METHOD FOR JOINT SPEAKER AND SCENE RECOGNITION IN A VIDEO/AUDIO PROCESSING ENVIRONMENT
    1.
    Patent application
    Status: Under examination (published)

    Publication No.: US20130300939A1

    Publication date: 2013-11-14

    Application No.: US13469886

    Filing date: 2012-05-11

    IPC classification: H04N5/14

    Abstract: An example method is provided and includes receiving a media file that includes video data and audio data; determining an initial scene sequence in the media file; determining an initial speaker sequence in the media file; and updating a selected one of the initial scene sequence and the initial speaker sequence in order to generate an updated scene sequence and an updated speaker sequence, respectively. The initial scene sequence is updated based on the initial speaker sequence, and the initial speaker sequence is updated based on the initial scene sequence.
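
    The abstract does not say how one sequence informs the other. The Python sketch below is only an illustration under an assumed heuristic: a (scene, speaker) pairing that occurs very rarely is treated as a probable error and replaced by the label most often seen with that context. The names update_sequence, update_selected, and min_count are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


def update_sequence(target, context, min_count=2):
    """Relabel entries of `target` that rarely co-occur with their `context` label.

    Assumed heuristic (not from the patent): a (context, target) pair seen
    fewer than `min_count` times is replaced by the target label that most
    often appears together with that context label.
    """
    if len(target) != len(context):
        raise ValueError("sequences must be aligned frame by frame")
    pair_counts = Counter(zip(context, target))
    by_context = defaultdict(Counter)
    for c, t in zip(context, target):
        by_context[c][t] += 1
    updated = []
    for c, t in zip(context, target):
        if pair_counts[(c, t)] < min_count:
            updated.append(by_context[c].most_common(1)[0][0])
        else:
            updated.append(t)
    return updated


def update_selected(scenes, speakers, select):
    """Update the selected sequence, using the other sequence as context."""
    if select == "speaker":
        return scenes, update_sequence(speakers, scenes)
    if select == "scene":
        return update_sequence(scenes, speakers), speakers
    raise ValueError("select must be 'scene' or 'speaker'")


# Toy per-frame labels: one 'reporter' frame is probably a mistake in the studio scene.
scenes = ["studio"] * 4 + ["field"] * 4
speakers = ["anchor", "anchor", "reporter", "anchor"] + ["reporter"] * 4
_, new_speakers = update_selected(scenes, speakers, "speaker")
print(new_speakers)
```

    A real system would more likely use probabilistic models over both sequences; the co-occurrence count here only stands in for that idea.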

    Speaker segmentation and recognition based on list of speakers
    2.
    Granted patent
    Status: In force

    Publication No.: US09058806B2

    Publication date: 2015-06-16

    Application No.: US13608420

    Filing date: 2012-09-10

    IPC classification: G10L17/06 G10L17/22 G10L17/02

    CPC classification: G10L17/02

    Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
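
    As a rough illustration of segmenting against a short candidate list, the Python sketch below labels each frame with the nearest candidate and merges consecutive frames into segments. It assumes a toy one-dimensional feature per frame and one reference value per candidate; candidate_models, the feature values, and the function name are hypothetical and not taken from the patent.

```python
from itertools import groupby


def segment_by_candidates(frames, candidate_models):
    """Split a recording into (start, end, speaker) segments.

    frames: one toy scalar feature per frame (assumption for illustration).
    candidate_models: maps each speaker on the approximate list to a
    reference feature value (assumption for illustration).
    """
    if not candidate_models:
        raise ValueError("the approximate speaker list must not be empty")
    # Recognition is restricted to the candidates on the approximate list.
    labels = [
        min(candidate_models, key=lambda name: abs(candidate_models[name] - x))
        for x in frames
    ]
    segments = []
    start = 0
    for speaker, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, speaker))  # end is exclusive
        start += length
    return segments


# The candidate list might come from, say, a meeting invitation (hypothetical).
models = {"alice": 200.0, "bob": 110.0}
print(segment_by_candidates([110, 115, 108, 210, 205, 112], models))
```

    Restricting the search to a short list is what keeps the per-frame decision cheap and reduces confusion with speakers who cannot be in the recording.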

    SYSTEM AND METHOD FOR IMPROVING SPEAKER SEGMENTATION AND RECOGNITION ACCURACY IN A MEDIA PROCESSING ENVIRONMENT
    3.
    Patent application
    Status: In force

    Publication No.: US20140074471A1

    Publication date: 2014-03-13

    Application No.: US13608420

    Filing date: 2012-09-10

    IPC classification: G10L17/00

    CPC classification: G10L17/02

    Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.

    METHOD AND APPARATUS FOR DISCOVERING AND LABELING SPEAKERS IN A LARGE AND GROWING COLLECTION OF VIDEOS WITH MINIMAL USER EFFORT
    4.
    Patent application
    Status: Under examination (published)

    Publication No.: US20130144414A1

    Publication date: 2013-06-06

    Application No.: US13312800

    Filing date: 2011-12-06

    IPC classification: G06F17/00

    CPC classification: G10L17/02

    Abstract: In one embodiment, an audio stream is partitioned into a plurality of segments such that the plurality of segments are clustered into one or more clusters, each of the one or more clusters identifying a subset of the plurality of segments in the audio stream and corresponding to one of a first set of one or more speaker models, each speaker model in the first set of speaker models representing one of a first set of hypothetical speakers. The speaker models in the first set of speaker models are compared with a second set of one or more speaker models, where each speaker model in the second set of speaker models represents one of a second set of hypothetical speakers. Labels associated with one or more speaker models in the second set of speaker models are propagated to one or more speaker models in the first set of speaker models according to a result of the comparing step.
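
    The compare-and-propagate step can be pictured with the Python sketch below. It assumes each speaker model is a plain feature vector and that cosine similarity with a fixed threshold is the comparison; both are illustrative assumptions, and propagate_labels and threshold are hypothetical names rather than anything specified in the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(first_models, second_models, second_labels, threshold=0.9):
    """Copy labels from labeled models in the second set to similar models in the first.

    first_models, second_models: dicts of model id -> feature vector
    (an assumed representation for illustration).
    second_labels: dict of second-set model id -> label; unlabeled models are absent.
    """
    propagated = {}
    for model_id, vec in first_models.items():
        best_id, best_sim = None, threshold
        for other_id, other_vec in second_models.items():
            if other_id not in second_labels:
                continue  # nothing to propagate from an unlabeled model
            sim = cosine(vec, other_vec)
            if sim >= best_sim:
                best_id, best_sim = other_id, sim
        if best_id is not None:
            propagated[model_id] = second_labels[best_id]
    return propagated


new_video = {"c0": [1.0, 0.1], "c1": [0.0, 1.0], "c2": [-1.0, 0.0]}
collection = {"m0": [0.9, 0.1], "m1": [0.1, 0.95], "m2": [-1.0, 0.1]}
labels = {"m0": "Alice", "m1": "Bob"}  # m2 has not been labeled by a user yet
print(propagate_labels(new_video, collection, labels))
```

    Clusters that match no labeled model stay unlabeled, which is where a user would be asked for input in a workflow that aims to minimize user effort.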

    Dynamic backlight control for video displays
    5.
    Patent application
    Status: Under examination (published)

    Publication No.: US20120200484A1

    Publication date: 2012-08-09

    Application No.: US12931553

    Filing date: 2011-02-04

    IPC classification: G09G3/36

    Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
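
    To make the idea of changing the backlight only at scene changes concrete, the Python sketch below holds one backlight level per scene. It assumes per-frame peak luminance values in [0, 1] and a precomputed list of scene-start frames; the choice of the scene's peak as its level, and the names backlight_schedule, frame_peaks, and scene_starts, are illustrative assumptions.

```python
def backlight_schedule(frame_peaks, scene_starts):
    """Return one backlight level per frame, constant within each scene.

    frame_peaks: peak luminance of each frame, in [0, 1].
    scene_starts: strictly increasing frame indices where scenes begin,
    starting with 0 (assumed to come from separate scene-change detection,
    for example metadata downloaded along with the video).
    """
    if not scene_starts or scene_starts[0] != 0:
        raise ValueError("scene_starts must begin with frame 0")
    bounds = list(scene_starts) + [len(frame_peaks)]
    levels = []
    for start, end in zip(bounds, bounds[1:]):
        if start >= end:
            raise ValueError("scene_starts must be strictly increasing and in range")
        # Assumed policy: bright enough for the brightest frame of the scene.
        scene_level = max(frame_peaks[start:end])
        levels.extend([scene_level] * (end - start))
    return levels


print(backlight_schedule([0.2, 0.4, 0.3, 0.9, 0.8], [0, 3]))
```

    Because the level only steps at indices listed in scene_starts, any brightness jump coincides with a cut, where it is least likely to be perceived as flicker.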

    Method of and apparatus for signal recognition that compensates for mismatching
    6.
    Granted patent
    Status: Expired

    Publication No.: US5727124A

    Publication date: 1998-03-10

    Application No.: US263284

    Filing date: 1994-06-21

    Abstract: Disclosed is a method for drastically reducing the average error rate for signals under mismatched conditions. The method takes a signal (e.g., speech signal) and a set of stored representations (e.g., stored representations of keywords) and performs at least one transformation that results in the signal more closely emulating the stored representations. This is accomplished by using one of three techniques. First, one may transform the signal so that the signal may be better approximated by (e.g., is closer to) one of the stored representations. Second, one may transform the set of stored representations so that one of the stored representations better approximates the signal. Third, one may transform both the signal and the set of stored representations.
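
    The first technique (transforming the signal) can be illustrated with a deliberately simple case: an unknown constant offset between test and training conditions. The Python sketch below estimates that offset against each stored representation, removes it, and then picks the closest match. The additive-bias model and the names match_with_bias_compensation and stored are assumptions for illustration; the abstract does not specify the form of the transformation.

```python
def match_with_bias_compensation(signal, stored):
    """Return (best_name, distance, bias) after removing a constant offset.

    signal: list of feature values observed under mismatched conditions.
    stored: dict of name -> reference feature list of the same length.
    Assumed mismatch model (illustrative only): signal = reference + bias.
    """
    best = None
    for name, rep in stored.items():
        if len(rep) != len(signal):
            raise ValueError("signal and stored representation lengths differ")
        # Least-squares estimate of a constant offset is the mean difference.
        bias = sum(s - r for s, r in zip(signal, rep)) / len(signal)
        compensated = [s - bias for s in signal]
        dist = sum((c - r) ** 2 for c, r in zip(compensated, rep))
        if best is None or dist < best[1]:
            best = (name, dist, bias)
    return best


stored = {"yes": [1.0, 3.0, 2.0], "no": [5.0, 5.0, 5.0]}
signal = [6.0, 8.0, 7.0]  # the shape of "yes", shifted by a channel offset
print(match_with_bias_compensation(signal, stored))
```

    In this toy data the uncompensated squared distance favors "no", while the compensated one favors "yes", which is the kind of error the transformation is meant to remove.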

    Method and system for learning linguistically valid word pronunciations from acoustic data
    7.
    Granted patent
    Status: In force

    Publication No.: US07266495B1

    Publication date: 2007-09-04

    Application No.: US10661106

    Filing date: 2003-09-12

    IPC classification: G10L15/06 G10L15/10

    CPC classification: G10L15/06 G10L15/187

    Abstract: A computerized pronunciation system is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The system includes a word list including at least one word; transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform; a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including: sets of initial pronunciations of the word, a scoring module configured to score pronunciations and to generate phone probabilities, and a set of alternate pronunciations of the word, wherein the set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
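
    The substitution step named in the abstract, replacing the lowest-probability phone with the highest-scoring substitute, can be sketched in Python as follows. The phone symbols, probabilities, and substitute scores are made-up inputs, and alternate_pronunciation is a hypothetical helper; in the described system these values would come from the scoring module.

```python
def alternate_pronunciation(phones, phone_probs, substitute_scores):
    """Build an alternate pronunciation from a highest-scoring initial one.

    phones: phone sequence of the initial pronunciation.
    phone_probs: one probability per phone (assumed scoring-module output).
    substitute_scores: dict of candidate substitute phone -> score
    (assumed scoring-module output).
    """
    if len(phones) != len(phone_probs):
        raise ValueError("need exactly one probability per phone")
    if not phones or not substitute_scores:
        raise ValueError("phones and substitute_scores must not be empty")
    # Position of the phone the acoustic data supports least.
    weakest = min(range(len(phones)), key=lambda i: phone_probs[i])
    # Candidate phone with the best score.
    replacement = max(substitute_scores, key=substitute_scores.get)
    alternate = list(phones)
    alternate[weakest] = replacement
    return alternate


initial = ["t", "ah", "m", "ey", "t", "ow"]
probs = [0.9, 0.8, 0.95, 0.3, 0.9, 0.85]
print(alternate_pronunciation(initial, probs, {"aa": 0.7, "ae": 0.4}))
```

    Both the initial pronunciation and the alternate would then be written to the pronunciation dictionary, as the abstract describes.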

    Distributed processing for video enhancement and display power management
    8.
    Granted patent
    Status: In force

    Publication No.: US07873229B2

    Publication date: 2011-01-18

    Application No.: US11496191

    Filing date: 2006-07-31

    IPC classification: G06K9/40

    Abstract: In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of the device's battery power. In the interest of displaying a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with a transform determined from pixel luminance statistics. Aside from, or in addition to, being used for such minimizing, a transform can also be used for image enhancement, so that a displayed image better meets a visual perception quality. In either case, the transform preferably is constrained for enforcing one or several display attributes. In a network setting, the technique can be implemented in distributed fashion, so that subtasks of the technique are performed by different, interconnected processors such as server, client and proxy processors.
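
    One way to picture a transform "determined from pixel luminance statistics" is the Python sketch below: choose the backlight at a high percentile of the image's luminance, then scale pixel values up so that backlight times pixel value stays close to the original. The percentile rule, the linear scaling, and the names dim_with_compensation, clip_fraction, and min_backlight are illustrative assumptions, not the patent's specific transform.

```python
def dim_with_compensation(luma, clip_fraction=0.05, min_backlight=0.05):
    """Return (backlight, transformed_luma) for one image.

    luma: pixel luminance values in [0, 1].
    clip_fraction: share of the brightest pixels allowed to saturate
    (assumed quality constraint for illustration).
    min_backlight: lower bound on the backlight level (assumption).
    """
    if not luma:
        raise ValueError("luma must not be empty")
    ordered = sorted(luma)
    # Luminance statistic: the value below which (1 - clip_fraction) of pixels fall.
    k = min(len(ordered) - 1, int((1.0 - clip_fraction) * len(ordered)))
    backlight = max(ordered[k], min_backlight)
    # Compensating transform: brighten pixel values, saturating at full transmittance.
    transformed = [min(1.0, v / backlight) for v in luma]
    return backlight, transformed


backlight, pixels = dim_with_compensation([0.1, 0.2, 0.3, 0.4, 0.5], clip_fraction=0.0)
print(backlight, pixels)
```

    In a distributed setting of the kind the abstract mentions, the statistics and the backlight level could be computed on a server or proxy, leaving only the per-pixel scaling to the client.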

    Distributed processing for video enhancement and display power management
    9.
    Patent application
    Status: In force

    Publication No.: US20070183678A1

    Publication date: 2007-08-09

    Application No.: US11496191

    Filing date: 2006-07-31

    IPC classification: G06K9/40 G09G3/36

    Abstract: In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of the device's battery power. In the interest of displaying a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with a transform determined from pixel luminance statistics. Aside from, or in addition to, being used for such minimizing, a transform can also be used for image enhancement, so that a displayed image better meets a visual perception quality. In either case, the transform preferably is constrained for enforcing one or several display attributes. In a network setting, the technique can be implemented in distributed fashion, so that subtasks of the technique are performed by different, interconnected processors such as server, client and proxy processors.
