Methods and apparatus for automatically synchronizing electronic audio files with electronic text files
    11.
    Granted invention patent (legal status: in force)

    Publication No.: US06260011B1

    Publication date: 2001-07-10

    Application No.: US09531054

    Filing date: 2000-03-20

    IPC classification: G10L15/08

    Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. A statistical language model is generated from the text data. A speech recognition operation is then performed on the audio data using the generated language model and a speaker-independent acoustic model. Silence is modeled as a word which can be recognized. The speech recognition operation produces a time-indexed set of recognized words, some of which may be silence. The recognized words are globally aligned with the words in the text data. Recognized periods of silence that correspond to expected periods of silence and are adjoined by one or more correctly recognized words are identified as points where the text and audio files should be synchronized, e.g., by the insertion of bi-directional pointers. In one embodiment, for a text location to be identified for synchronization purposes, both words which bracket, e.g., precede and follow, the recognized silence must be correctly identified. Pointers corresponding to identified locations of silence to be used for synchronization purposes are inserted into the text and/or audio files at the identified locations. Audio time stamps obtained from the speech recognition operation may be used as the bi-directional pointers. Synchronized text and audio data may be output in a variety of file formats.
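
    A minimal Python sketch of the synchronization-point rule described above. Assumptions not taken from the patent: the recognizer output is a list of (token, start time) pairs with silence emitted as a hypothetical `<sil>` token; word-level global alignment is done with a plain edit-distance (Needleman-Wunsch style) dynamic program; and a pause is "expected" after any text word ending in punctuation. Language modeling and acoustic recognition are out of scope.

```python
from typing import Dict, List, Optional, Tuple

SIL = "<sil>"  # hypothetical token a recognizer emits when it "recognizes" silence
PAUSE_PUNCT = ".,;:!?"  # assumption: punctuation marks where a pause is expected


def align(hyp: List[str], ref: List[str]) -> List[Optional[int]]:
    """Global edit-distance alignment of recognized words (hyp) to text words (ref).

    Returns, for each hyp index, the ref index it was matched to when the two
    words are identical (a correctly recognized word), otherwise None.
    """
    n, m = len(hyp), len(ref)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # extra recognized word
                             cost[i][j - 1] + 1)        # text word not recognized
    match: List[Optional[int]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:  # trace back one optimal path
        sub = 0 if hyp[i - 1] == ref[j - 1] else 1
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            if sub == 0:
                match[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return match


def sync_points(recognized: List[Tuple[str, float]], text_words: List[str]) -> List[Tuple[int, float]]:
    """Return (text word index, audio time) pointers at trusted silences."""
    ref = [w.strip(PAUSE_PUNCT).lower() for w in text_words]
    expects_pause = [w[-1] in PAUSE_PUNCT for w in text_words]
    spoken = [k for k, (tok, _) in enumerate(recognized) if tok != SIL]
    match = align([recognized[k][0].lower() for k in spoken], ref)
    ref_of: Dict[int, Optional[int]] = dict(zip(spoken, match))
    points = []
    for k, (tok, start) in enumerate(recognized):
        if tok != SIL:
            continue
        before, after = ref_of.get(k - 1), ref_of.get(k + 1)
        # Both bracketing words must be correctly recognized and adjacent in the
        # text, and the text must predict a pause at that boundary.
        if before is not None and after == before + 1 and expects_pause[before]:
            points.append((after, start))
    return points
```

    The returned pairs can act as bi-directional pointers: the index locates the text word that follows the silence, and the time stamp locates the same spot in the audio. Requiring both bracketing words to be correct mirrors the stricter embodiment in the abstract; accepting a single correct neighbor would yield more, but less reliable, sync points.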

    Message rendering for identification of content features
    12.
    Granted invention patent (legal status: in force)

    Publication No.: US08250159B2

    Publication date: 2012-08-21

    Application No.: US12359126

    Filing date: 2009-01-23

    IPC classification: G06F15/16 G06F12/16

    CPC classification: G06Q10/107 H04L51/12

    Abstract: Architecture for detecting and removing obfuscating clutter from the subject and/or body of a message, e.g., e-mail, prior to filtering of the message, to identify junk messages commonly referred to as spam. The technique utilizes the powerful features built into an HTML rendering engine to strip the HTML instructions for all non-substantive aspects of the message. Pre-processing includes pre-rendering the message into a final format, namely the format that the rendering engine displays to the user. The final-format message is then converted to a text-only format to remove graphics, color, non-text decoration, and spacing that cannot be rendered as ASCII-style or Unicode-style characters. The result is essentially to reduce each message to its common-denominator essentials so that the junk mail filter can view each message on an equal basis.
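
    The pre-processing idea can be approximated in a few lines of Python. This sketch uses the standard-library `html.parser.HTMLParser` as a crude stand-in for a real rendering engine (an assumption: it performs no CSS layout, so text hidden by styles survives). It decodes character references, drops comments and `<script>`/`<style>` content, and joins text split by empty or inline tags, so obfuscated tokens collapse back into the words a reader would see. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List

BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # content never shown to the user


class VisibleTextExtractor(HTMLParser):
    """Collects the text a reader would see; comments are ignored by default."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # "&#101;" arrives as "e"
        self.parts: List[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)  # inline tags add nothing, so "Fr<b></b>ee" -> "Free"


def to_plain_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).splitlines()
    # Collapse runs of spaces, tabs, and non-breaking spaces used as visual padding.
    cleaned = (re.sub(r"[ \t\u00a0]+", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)
```

    The junk-mail filter would then score the returned text instead of the raw HTML, so two messages that render identically are treated identically.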

    Personal audio-video recorder for live meetings
    13.
    Granted invention patent (legal status: in force)

    Publication No.: US08209181B2

    Publication date: 2012-06-26

    Application No.: US11353382

    Filing date: 2006-02-14

    IPC classification: G10L21/00

    CPC classification: G09B19/00

    Abstract: A unique recording system and method are provided that facilitate recording live meetings, discussions, or conversations so that the recordings are available for immediate or near-immediate playback. As a result, a user who has momentarily become distracted or inattentive during the meeting can quickly re-listen to what was missed or misunderstood in order to readily catch up to the current discussion. The current discussion can continue to be recorded during playback of any previously recorded data. User behavior can be monitored to estimate when the user started to become inattentive, and likely segments or time points of the recordings can be suggested for playback. One or more portions of the recordings can be filtered or selected for playback so that any desired content can be eliminated or skipped in the playback version.
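
    A minimal in-memory Python sketch of the record-while-replaying behavior. Everything here is hypothetical: audio is modeled as fixed-length byte chunks, `MeetingRecorder` and its methods are illustrative names, and the attention-monitoring component is left out. What it shows is that playback reads from the same buffer that recording keeps appending to, and that filtered segments are skipped.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class MeetingRecorder:
    """Hypothetical recorder whose buffer keeps growing while earlier audio is replayed."""

    chunk_seconds: float = 1.0
    chunks: List[bytes] = field(default_factory=list)
    skipped: Set[int] = field(default_factory=set)  # chunk indices filtered out of playback

    def record(self, chunk: bytes) -> None:
        # Recording never pauses, even while a replay is in progress.
        self.chunks.append(chunk)

    def now(self) -> float:
        return len(self.chunks) * self.chunk_seconds

    def mark_skip(self, start: float, end: float) -> None:
        """Filter [start, end) seconds, rounded down to chunk boundaries, out of playback."""
        first = int(start // self.chunk_seconds)
        last = int(end // self.chunk_seconds)
        self.skipped.update(range(first, last))

    def replay_from(self, seconds_ago: float) -> Iterator[Tuple[int, bytes]]:
        """Yield (index, chunk) from `seconds_ago` before now up to the live edge.

        len(self.chunks) is re-read on every step, so chunks recorded while the
        replay is running are yielded too.
        """
        i = max(0, int((self.now() - seconds_ago) // self.chunk_seconds))
        while i < len(self.chunks):
            if i not in self.skipped:
                yield i, self.chunks[i]
            i += 1
```

    A suggested rewind point from the attention monitor would simply become the `seconds_ago` argument.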

    Bayesian approach for learning regression decision graph models and regression models for time series analysis
    14.
    Granted invention patent (legal status: in force)

    Publication No.: US07660705B1

    Publication date: 2010-02-09

    Application No.: US10102116

    Filing date: 2002-03-19

    IPC classification: G06F17/10

    CPC classification: G06K9/6297

    Abstract: Methods and systems are disclosed for learning a regression decision graph model using a Bayesian model selection approach. In a disclosed aspect, the model structure and/or model parameters can be learned using a greedy search algorithm applied to grow the model so long as the model improves. This approach enables construction of a decision graph having a model structure that includes a plurality of leaves, at least one of which includes a non-trivial linear regression. The resulting model thus can be employed for forecasting, such as for time series data, which can include single- or multi-step forecasting.
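
    A simplified Python sketch of greedy, score-driven growth with linear-regression leaves. It makes substitutions that should be read as assumptions: BIC (a large-sample approximation to the Bayesian marginal likelihood) replaces the patent's Bayesian score, only binary threshold splits at a few quantiles are tried, and the merge operations that distinguish a decision graph from a tree are omitted, so the result is a tree, which is a special case of a decision graph.

```python
import numpy as np


def fit_leaf(X, y):
    """Least-squares linear regression with intercept; returns (coefficients, RSS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)


def leaf_bic(rss, n, k, n_total):
    # Gaussian BIC for one leaf (lower is better); k = free parameters in the leaf.
    return n * np.log(max(rss, 1e-12) / n) + k * np.log(n_total)


def grow(X, y, n_total, min_leaf=10):
    """Greedily split a leaf only while doing so lowers the total BIC."""
    n, d = X.shape
    k = d + 2  # slope per feature, intercept, noise variance
    coef, rss = fit_leaf(X, y)
    best = (leaf_bic(rss, n, k, n_total), None)
    for f in range(d):
        for t in np.unique(np.quantile(X[:, f], np.linspace(0.1, 0.9, 9))):
            left = X[:, f] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            # Total BIC is a sum over leaves, so comparing this leaf's term with
            # its children's terms decides whether the whole model improves.
            score = sum(leaf_bic(fit_leaf(X[m], y[m])[1], int(m.sum()), k, n_total)
                        for m in (left, ~left))
            if score < best[0]:
                best = (score, (f, t, left))
    if best[1] is None:
        return ("leaf", coef)
    f, t, left = best[1]
    return ("split", f, t, grow(X[left], y[left], n_total, min_leaf),
            grow(X[~left], y[~left], n_total, min_leaf))


def predict(node, x):
    while node[0] == "split":
        _, f, t, low, high = node
        node = low if x[f] <= t else high
    coef = node[1]
    return float(coef[0] + coef[1:] @ x)
```

    For time-series forecasting, each row of `X` could hold lagged values of the series (an autoregressive setup, assumed here), and multi-step forecasts could be produced by feeding predictions back in as inputs.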

    Feedback loop for spam prevention
    15.
    Granted invention patent (legal status: in force)

    Publication No.: US07558832B2

    Publication date: 2009-07-07

    Application No.: US11743466

    Filing date: 2007-05-02

    IPC classification: G06F15/16

    CPC classification: H04L51/12 G06Q10/107

    Abstract: The subject invention provides for a feedback loop system and method that facilitate classifying items in connection with spam prevention in server and/or client-based architectures. The invention makes use of a machine-learning approach as applied to spam filters, and in particular, randomly samples incoming email messages so that examples of both legitimate and junk/spam mail are obtained to generate sets of training data. Users who are identified as spam-fighters are asked to vote on whether each of a selection of their incoming email messages is legitimate mail or junk mail. A database stores the properties of each mail and voting transaction, such as user information, message properties and content summary, and polling results for each message, to generate training data for machine learning systems. The machine learning systems facilitate creating improved spam filter(s) that are trained to recognize both legitimate mail and spam mail and to distinguish between them.
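
    The data-collection half of the loop can be sketched in Python with the standard-library `sqlite3` module standing in for the database. All names (`FeedbackLoop`, the `votes` table, its columns) are hypothetical, and the stored message properties are reduced to sender and subject for brevity. Sampling is random over incoming mail rather than limited to mail the current filter flags, which is what lets both legitimate and spam examples reach the training set.

```python
import random
import sqlite3
from typing import Dict, List, Optional, Tuple


class FeedbackLoop:
    """Hypothetical sketch of polling spam-fighter users for training labels."""

    def __init__(self, sample_rate: float = 0.1, seed: Optional[int] = None) -> None:
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE votes (msg_id TEXT, user TEXT, sender TEXT, subject TEXT, is_spam INTEGER)"
        )
        self.pending: Dict[str, Tuple[str, str, str]] = {}  # msg_id -> (user, sender, subject)

    def on_incoming(self, msg_id: str, user: str, sender: str, subject: str,
                    spam_fighter: bool) -> bool:
        """Randomly select a message for polling; True means the user is asked to vote."""
        if spam_fighter and self.rng.random() < self.sample_rate:
            self.pending[msg_id] = (user, sender, subject)
            return True
        return False

    def record_vote(self, msg_id: str, is_spam: bool) -> None:
        user, sender, subject = self.pending.pop(msg_id)
        self.db.execute("INSERT INTO votes VALUES (?, ?, ?, ?, ?)",
                        (msg_id, user, sender, subject, int(is_spam)))

    def training_data(self) -> List[Tuple[str, int]]:
        # (text feature, label) pairs for whichever learner trains the next filter.
        rows = self.db.execute("SELECT subject, is_spam FROM votes ORDER BY rowid")
        return [(subject, label) for subject, label in rows]
```

    The labeled pairs could feed any supervised learner (naive Bayes, logistic regression, and so on); deploying the retrained filter while polling continues is what closes the loop.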

    Feedback loop for spam prevention
    16.
    Granted invention patent (legal status: in force)

    Publication No.: US07219148B2

    Publication date: 2007-05-15

    Application No.: US10378463

    Filing date: 2003-03-03

    IPC classification: G06F15/173

    CPC classification: H04L51/12 G06Q10/107

    Abstract: The subject invention provides for a feedback loop system and method that facilitate classifying items in connection with spam prevention in server and/or client-based architectures. The invention makes use of a machine-learning approach as applied to spam filters, and in particular, randomly samples incoming email messages so that examples of both legitimate and junk/spam mail are obtained to generate sets of training data. Users who are identified as spam-fighters are asked to vote on whether each of a selection of their incoming email messages is legitimate mail or junk mail. A database stores the properties of each mail and voting transaction, such as user information, message properties and content summary, and polling results for each message, to generate training data for machine learning systems. The machine learning systems facilitate creating improved spam filter(s) that are trained to recognize both legitimate mail and spam mail and to distinguish between them.

    Visualization of high-dimensional data
    17.
    Granted invention patent (legal status: in force)

    Publication No.: US06519599B1

    Publication date: 2003-02-11

    Application No.: US09517138

    Filing date: 2000-03-02

    IPC classification: G06F17/30

    Abstract: Visualization of high-dimensional data sets is disclosed, particularly the display of a network model for a data set. The network, such as a dependency network or a Bayesian network, has a number of nodes having dependencies among them. The network can be displayed as items and connections, corresponding to nodes and dependencies, respectively. Selection of a particular item in one embodiment results in the display of the local distribution associated with the node for the item. In one embodiment, only a predetermined number of the items are shown, such as only the items representing the most popular nodes. Furthermore, in one embodiment, in response to receiving a user input, a subset of the connections is displayed, proportional to the user input. In another embodiment, a particular item is displayed in an emphasized manner, and the particular connections representing dependencies that include the node represented by the particular item, as well as the items representing the other nodes in those dependencies, are also displayed in the emphasized manner. Furthermore, in one embodiment, only an indicated subset of the items is displayed.
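
    The display rules above map onto small selection functions. This Python sketch is illustrative only: the per-node popularity scores and per-edge strength values are hypothetical inputs (the abstract does not say how they are computed), and rendering itself is left out.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (parent, child, strength); strength is a hypothetical edge score


def most_popular(popularity: Dict[str, float], k: int) -> List[str]:
    """Only the k most popular nodes are drawn (ties broken by name)."""
    return sorted(popularity, key=lambda n: (-popularity[n], n))[:k]


def edges_for_slider(edges: List[Edge], slider: float) -> List[Edge]:
    """Show a number of connections proportional to a slider position in [0, 1]."""
    count = round(max(0.0, min(1.0, slider)) * len(edges))
    return sorted(edges, key=lambda e: -e[2])[:count]  # strongest edges first


def emphasize(edges: List[Edge], item: str) -> Tuple[Set[str], List[Edge]]:
    """Items and connections to highlight when `item` is selected."""
    hot = [e for e in edges if item in (e[0], e[1])]
    nodes = {item} | {e[0] for e in hot} | {e[1] for e in hot}
    return nodes, hot
```

    Sorting edges by strength before truncating means that moving the slider up reveals the strongest dependencies first, which is one plausible reading of "proportional to the user input".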

    Data visualization using association networks
    18.
    Granted invention patent (legal status: in force)

    Publication No.: US08605089B1

    Publication date: 2013-12-10

    Application No.: US09781727

    Filing date: 2001-02-12

    IPC classification: G06T11/20 G06T11/40 G09G5/20

    Abstract: A system and method are employed to construct an association network to visualize relationships between variables of a data set. The relationships characterized by the association network may include symmetric or asymmetric measures of association between variables learned from the data. The association network includes nodes, which represent variables, and edges, which represent associations between variables. As a result, the association network helps a user to visualize useful information from data according to the determined measure of association.
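
    One common way to obtain a symmetric and an asymmetric association measure between discrete variables is mutual information, I(X;Y) = H(X) + H(Y) - H(X,Y), and the uncertainty coefficient U(Y|X) = I(X;Y) / H(Y). This Python sketch uses those two measures as assumed examples (the abstract does not name specific measures) and keeps a directed edge X -> Y whenever U(Y|X) exceeds a chosen threshold.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def entropy(xs: Sequence) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    # Symmetric: I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def uncertainty_coefficient(xs: Sequence, ys: Sequence) -> float:
    # Asymmetric: fraction of H(Y) explained by knowing X
    hy = entropy(ys)
    return mutual_information(xs, ys) / hy if hy > 0 else 0.0


def association_network(data: Dict[str, List], threshold: float) -> List[Tuple[str, str, float]]:
    """Directed edges (X, Y, U(Y|X)) for every ordered pair above `threshold`."""
    edges = []
    for x, y in permutations(data, 2):
        u = uncertainty_coefficient(data[x], data[y])
        if u > threshold:
            edges.append((x, y, round(u, 3)))
    return edges
```

    In the example test, `city` fully determines `country` (U = 1.0) while `country` only partly determines `city`, which is exactly the kind of asymmetry a symmetric measure such as mutual information cannot show.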

    Diary-free calorimeter
    19.
    Granted invention patent (legal status: in force)

    Publication No.: US08182424B2

    Publication date: 2012-05-22

    Application No.: US12051431

    Filing date: 2008-03-19

    IPC classification: A61B5/00

    Abstract: An indirect calorimeter estimates nutritional caloric intake by periodically monitoring weight and sensing physical exercise (i.e., physiological data and/or motion data related to physical exertion), which can then be used in a calorimetry model derived from regression analysis of a population (e.g., linear regression, feed-forward neural network, Gaussian process, boosted regression tree, etc.). Exercise can be tracked by a strap-on user device that detects one or more of heart rate, body temperature, skin resistance, motion/acceleration (e.g., pedometer, accelerometer), and velocity (e.g., global positioning system (GPS)), or by an intelligent, integrated exercise machine (e.g., treadmill, exercise bike, etc.). To gain further fidelity, the user can fine-tune the estimate by following a journal-based routine for a relatively short period of time or by undergoing clinical calorimetry measurement (e.g., respiratory calorimeter), thereby providing a baseline for resting or exercising metabolic rate.
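
    A hedged Python sketch of the population-regression idea using ordinary least squares, the simplest of the model families the abstract lists. The two predictors (daily weight change estimated from periodic weigh-ins, and sensor-derived activity energy) and all function names are assumptions made for illustration; the abstract does not specify the model inputs at this level of detail.

```python
import numpy as np


def daily_weight_change(weights_kg, days):
    """Slope (kg/day) of a straight line through periodic weigh-ins."""
    return float(np.polyfit(days, weights_kg, 1)[0])


def fit_calorimetry_model(weight_change_kg_per_day, activity_kcal_per_day, intake_kcal_per_day):
    """Least-squares fit of intake ~ b0 + b1 * weight_change + b2 * activity over a population."""
    y = np.asarray(intake_kcal_per_day, dtype=float)
    A = np.column_stack([
        np.ones(len(y)),
        np.asarray(weight_change_kg_per_day, dtype=float),
        np.asarray(activity_kcal_per_day, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def estimate_intake(coef, weight_change_kg_per_day, activity_kcal_per_day):
    """Apply the fitted population model to one user's measurements."""
    return float(coef[0] + coef[1] * weight_change_kg_per_day + coef[2] * activity_kcal_per_day)
```

    The population data in the test is synthetic, with made-up coefficients chosen only so the fit can be checked. A clinical baseline, as mentioned in the abstract, could for example be used to recalibrate the intercept for an individual user; that refinement is not shown.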

    Identifying associations using graphical models
    20.
    Granted invention patent (legal status: in force)

    Publication No.: US08050870B2

    Publication date: 2011-11-01

    Application No.: US11622895

    Filing date: 2007-01-12

    IPC classification: G01N33/48 G06F19/14 G06F19/00

    CPC classification: G06F19/14

    Abstract: Statistical models for identifying associations are described herein. By way of example, a system for identifying associations between variables can include a model builder and an association identifier. The model builder can receive observations about the variables and generate a null model and a non-null model. The association identifier can assess the strength of the association between the variables by determining how much better the non-null model explains the observed data than the null model. Additionally or alternatively, the structure of the observed data can be inferred simultaneously with the statistical model.
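
    A minimal Python instance of the null-versus-non-null comparison for two discrete variables, using a likelihood-ratio (G) test as an assumed concrete choice: the null model treats the variables as independent, the non-null model fits their full joint distribution, and the log-likelihood gap measures how much better the non-null model explains the data. The p-value formula below is valid only for two binary variables (a 2x2 table, one degree of freedom).

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def log_likelihood_ratio(xs: Sequence, ys: Sequence) -> float:
    """log P(data | joint model) - log P(data | independence model), both at their MLEs."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed cells of count * log(p(x, y) / (p(x) * p(y))).
    return sum(c * math.log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())


def association_test(xs: Sequence, ys: Sequence) -> Tuple[float, float]:
    """Return (G statistic, p-value) for two binary variables."""
    g = 2.0 * log_likelihood_ratio(xs, ys)
    # Chi-square survival function with 1 degree of freedom: P(chi2 > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p
```

    A larger G (smaller p) means the dependent model explains the observations markedly better than independence; for variables with more than two values, the chi-square degrees of freedom become (r - 1)(c - 1) and a general chi-square survival function is needed.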
