Automatic organization of documents through email clustering
    11.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07765212B2

    Publication date: 2010-07-27

    Application No.: US11321963

    Filing date: 2005-12-29

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06Q10/107 H04L51/00

    Abstract: A system that facilitates organization of emails comprises a clustering component that clusters a plurality of emails and creates topics for emails by assigning key phrases extracted from emails within one or more clusters. An organization component then utilizes the key phrases to organize documents. Furthermore, the organization component can comprise a probability component that determines a probability that a document belongs to a certain topic.
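The probability component mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the probability is computed, so the scoring rule here (key-phrase counts with add-one smoothing, normalised across topics) and the function name `topic_probabilities` are assumptions made for illustration only.

```python
def topic_probabilities(document, topic_phrases):
    """Estimate P(topic | document) from key-phrase matches.

    topic_phrases maps a topic name to the key phrases assigned to it.
    Hypothetical scoring rule, not taken from the patent: count phrase
    occurrences, add one to every topic, then normalise.
    """
    text = document.lower()
    scores = {
        topic: 1 + sum(text.count(phrase.lower()) for phrase in phrases)
        for topic, phrases in topic_phrases.items()
    }
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


topics = {
    "budget": ["quarterly budget", "forecast"],
    "hiring": ["interview", "offer letter"],
}
probs = topic_probabilities("Updated forecast for the quarterly budget", topics)
best_topic = max(probs, key=probs.get)  # the document would be filed here
```

With smoothing, a document that matches no key phrase receives a uniform distribution rather than a division by zero.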

    DEVELOPING IMPLICIT METADATA FOR DATA STORES
    13.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20130275434A1

    Publication date: 2013-10-17

    Application No.: US13444482

    Filing date: 2012-04-11

    IPC classification: G06F17/30

    Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of, or communication regarding, a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
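The monitor, extract, associate, and match steps described in this abstract can be sketched as follows. The class `DataStoreMetadata`, the stop-word list, and the tokenisation rule are all hypothetical; the abstract does not specify how keywords are extracted.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "for", "to", "in", "on", "and", "is", "are",
     "please", "with"}
)


def extract_keywords(text):
    """Lower-case the text and keep alphanumeric tokens longer than two characters."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}


class DataStoreMetadata:
    """Implicit metadata accumulated for one data store."""

    def __init__(self, name):
        self.name = name
        self.keywords = set()

    def record_communication(self, text):
        # Called whenever usage of, or communication about, the store is observed.
        self.keywords |= extract_keywords(text)

    def matches(self, query):
        # A query matches if it shares at least one keyword with the metadata.
        return bool(extract_keywords(query) & self.keywords)


meta = DataStoreMetadata("sales_db")
meta.record_communication("Please refresh the quarterly revenue table")
found = meta.matches("revenue by region")
```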

    Learning Discriminative Projections for Text Similarity Measures
    15.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20120323968A1

    Publication date: 2012-12-20

    Application No.: US13160485

    Filing date: 2011-06-14

    IPC classification: G06F17/30

    CPC classification: G06F16/31

    Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
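The forward computation described in this abstract (project, score, compare with the label) can be sketched as below. This assumes the linear case, cosine similarity, and a mean squared-error loss; the abstract permits other choices, and the function names are illustrative. The tuning step that minimises the loss is not shown.

```python
import math


def project(matrix, x):
    """Linear model: each output element is a weighted sum of the input terms."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_loss(matrix, pairs):
    """Mean squared error between similarity scores and pair labels.

    pairs is a list of (input_vector_1, input_vector_2, label) tuples,
    where the label is a binary or real-valued degree of similarity.
    """
    total = 0.0
    for x, y, label in pairs:
        score = cosine(project(matrix, x), project(matrix, y))
        total += (score - label) ** 2
    return total / len(pairs)


# Three input terms projected to a two-dimensional output space.
matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0),  # labelled similar
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # labelled dissimilar
]
loss = pair_loss(matrix, pairs)
```

Training would adjust the entries of `matrix` (for example by gradient descent) to reduce `pair_loss` over the labelled pairs.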

    Signal detection using multiple detectors
    16.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08103011B2

    Publication date: 2012-01-24

    Application No.: US11669549

    Filing date: 2007-01-31

    CPC classification: H04M19/04 H04B3/234

    Abstract: Signal detectors are described herein. By way of example, a system for detecting signals can include a microphone signal detector, a loudspeaker signal detector, a signal discriminator and a decision component. When the microphone signal detector detects the presence of a microphone signal, the loudspeaker signal detector detects the presence of a loudspeaker signal and the signal discriminator determines that near-end speech dominates loudspeaker echo, the decision component can confirm the presence of doubletalk. When the microphone signal detector detects the presence of a microphone signal and the signal discriminator determines that near-end speech dominates loudspeaker echo, the decision component confirms the presence of a near-end signal.
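The decision rules stated in this abstract reduce to two Boolean conditions, sketched below. The type and field names are illustrative; how each detector reaches its own verdict is outside the abstract and is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorOutputs:
    mic_active: bool          # microphone signal detector
    speaker_active: bool      # loudspeaker signal detector
    near_end_dominates: bool  # signal discriminator: near-end speech dominates echo


@dataclass(frozen=True)
class Decision:
    near_end: bool
    doubletalk: bool


def decide(outputs):
    """Combine the three detector verdicts as the abstract describes."""
    near_end = outputs.mic_active and outputs.near_end_dominates
    doubletalk = near_end and outputs.speaker_active
    return Decision(near_end=near_end, doubletalk=doubletalk)


result = decide(DetectorOutputs(mic_active=True, speaker_active=True,
                                near_end_dominates=True))
```

Because the near-end condition in the abstract does not mention the loudspeaker detector, every doubletalk frame is also reported as a near-end frame in this sketch.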

    Method of classifying and active learning that ranks entries based on multiple scores, presents entries to human analysts, and detects and/or prevents malicious behavior
    18.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07941382B2

    Publication date: 2011-05-10

    Application No.: US11871587

    Filing date: 2007-10-12

    IPC classification: G06E1/00

    CPC classification: G06F15/16

    Abstract: A malicious behavior detection/prevention system, such as an intrusion detection system, is provided that uses active learning to classify entries into multiple classes. A single entry can correspond to either the occurrence of one or more events or the non-occurrence of one or more events. During a training phase, entries are automatically classified into one of multiple classes. After classifying an entry, a generated model for the determined class is utilized to determine how well the entry corresponds to the model. Ambiguous classifications, along with entries that do not fit the model well for the determined class, are selected for labeling by a human analyst. The selected entries are presented to a human analyst for labeling. These labels are used to further train the classifier and the models. During an evaluation phase, entries are automatically classified using the trained classifier and a policy associated with the determined class is applied.
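The selection step of the training phase (pick ambiguous classifications plus entries that fit their class model poorly) can be sketched as follows. The abstract does not define "ambiguous" or "fit"; this sketch assumes the classifier emits per-class probabilities, measures ambiguity as the margin between the top two classes, and takes a precomputed fit score in [0, 1]. Both thresholds are hypothetical parameters.

```python
def select_for_labeling(entries, margin_threshold, fit_threshold):
    """Return the ids of entries a human analyst should label.

    entries: iterable of (entry_id, class_probs, fit_score) where
      class_probs maps class name -> classifier probability, and
      fit_score says how well the entry fits the model of its top class.
    """
    selected = []
    for entry_id, class_probs, fit_score in entries:
        ranked = sorted(class_probs.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        margin = ranked[0] - runner_up
        ambiguous = margin < margin_threshold
        poor_fit = fit_score < fit_threshold
        if ambiguous or poor_fit:
            selected.append(entry_id)
    return selected


entries = [
    ("a", {"benign": 0.95, "attack": 0.05}, 0.9),  # confident, fits well
    ("b", {"benign": 0.55, "attack": 0.45}, 0.9),  # ambiguous classification
    ("c", {"benign": 0.90, "attack": 0.10}, 0.1),  # confident, poor model fit
]
to_label = select_for_labeling(entries, margin_threshold=0.2, fit_threshold=0.3)
```

The analyst's labels for the selected entries would then be fed back to retrain both the classifier and the per-class models.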

    SYSTEM AND PROCESS FOR REGRESSION-BASED RESIDUAL ACOUSTIC ECHO SUPPRESSION
    19.
    Type: Invention patent application
    Status: In force

    Publication No.: US20110013781A1

    Publication date: 2011-01-20

    Application No.: US12890075

    Filing date: 2010-09-24

    IPC classification: H04B3/20

    CPC classification: H04M9/082

    Abstract: A regression-based residual echo suppression (RES) system and process is presented for suppressing the portion of the microphone signal corresponding to a playback of a speaker audio signal that was not suppressed by an acoustic echo canceller (AEC). In general, a prescribed regression technique is used between a prescribed spectral attribute of multiple past and present, fixed-length, periods (e.g., frames) of the speaker signal and the same spectral attribute of a current period (e.g., frame) of the echo residual in the output of the AEC. This automatically takes into consideration the correlation between the time periods of the speaker signal. The parameters of the regression can be easily tracked using adaptive methods. Multiple applications of RES can be used to produce better results and this system and process can be applied to stereo-RES as well.
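One way to picture the regression in this abstract is a per-frequency-bin linear predictor that is updated adaptively. The sketch below assumes the spectral attribute is magnitude, the regression is linear, and the adaptive method is a normalised LMS update; none of these choices is confirmed by the abstract, and `res_step` is a hypothetical name.

```python
def res_step(weights, speaker_mags, aec_out_mag, step=0.1, eps=1e-8):
    """Process one frame of one frequency bin.

    weights:      regression parameters, one per speaker frame in the window
    speaker_mags: magnitudes of the current and past speaker frames
    aec_out_mag:  magnitude of the current AEC output frame
    Returns (updated weights, magnitude after residual echo suppression).
    """
    # Predict the residual echo from the recent speaker frames.
    predicted = sum(w * m for w, m in zip(weights, speaker_mags))
    error = aec_out_mag - predicted
    # Normalised LMS update: track the regression parameters adaptively.
    power = sum(m * m for m in speaker_mags) + eps
    new_weights = [w + step * error * m / power
                   for w, m in zip(weights, speaker_mags)]
    # Subtract the predicted echo; a magnitude cannot go below zero.
    suppressed = max(error, 0.0)
    return new_weights, suppressed


# Far-end only: the AEC output is pure residual echo, 0.3*1.0 + 0.2*0.5 = 0.4.
weights = [0.0, 0.0]
for _ in range(200):
    weights, out = res_step(weights, [1.0, 0.5], 0.4)
```

A practical system would freeze or slow the update while near-end speech is present, since the sketch would otherwise adapt towards cancelling it.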

    Strategies for identifying anomalies in time-series data
    20.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07716011B2

    Publication date: 2010-05-11

    Application No.: US11680590

    Filing date: 2007-02-28

    IPC classification: G06F17/18

    CPC classification: G06K9/00536 H04L63/1425

    Abstract: A strategy is described for identifying anomalies in time-series data. The strategy involves dividing the time-series data into a plurality of collected data segments and then using a modeling technique to fit local models to the collected data segments. Large deviations of the time-series data from the local models are indicative of anomalies. In one approach, the modeling technique can use an absolute value (L1) measure of error value for all of the collected data segments. In another approach, the modeling technique can use the L1 measure for only those portions of the time-series data that are projected to be anomalous. The modeling technique can use a squared-term (L2) measure of error value for normal portions of the time-series data. In another approach, the modeling technique can use an iterative expectation-maximization strategy in applying the L1 and L2 measures.
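The first approach in this abstract (L1 error for every segment) can be sketched with the simplest possible local model. This assumes fixed-length segments and a constant model per segment, for which the L1-optimal fit is the segment median; the fixed `threshold` for a "large deviation" is a hypothetical parameter, and the patent may use richer local models.

```python
from statistics import median


def find_anomalies(series, segment_len, threshold):
    """Return indices whose deviation from the local model exceeds threshold."""
    anomalies = []
    for start in range(0, len(series), segment_len):
        segment = series[start:start + segment_len]
        # The constant that minimises the sum of absolute errors is the median.
        level = median(segment)
        for offset, value in enumerate(segment):
            if abs(value - level) > threshold:
                anomalies.append(start + offset)
    return anomalies


series = [1, 1, 1, 9, 1, 5, 5, 5, 5, 5]
flagged = find_anomalies(series, segment_len=5, threshold=3)
```

The median is barely moved by the spike at index 3, which is why an L1 fit keeps the local model close to the normal data; a mean (the L2-optimal constant) would be pulled towards the anomaly.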
