SYSTEM AND METHOD FOR CLASSIFYING DATA STREAMS WITH VERY LARGE CARDINALITY
    1.
    Document type: Patent application
    Legal status: In force

    Publication No.: US20090281971A1

    Publication date: 2009-11-12

    Application No.: US12118405

    Filing date: 2008-05-09

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06K9/6267

    Abstract: Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The sketch table values associated with the selected attribute patterns are summed for each sketch table. The sketch table with the largest overall sum is identified, and the class associated with that sketch table is assigned to the object to which the attribute patterns belong.
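
The classification flow in this abstract can be illustrated with a minimal Python sketch. Several details are assumptions rather than anything stated in the source: each sketch table is modeled as a count-min style structure, attribute patterns are taken to be all single attributes and all pairs, and discriminatory power is taken to be the largest class share of the per-class estimates. The names `SketchTable`, `attribute_patterns`, `train`, and `classify`, the table sizes, and the 0.6 threshold are hypothetical.

```python
import hashlib
from itertools import combinations


class SketchTable:
    """One sketch per class: `depth` parallel hash tables of `width` counters.

    Assumption: modeled here as a count-min style sketch.
    """

    def __init__(self, depth=4, width=1024):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _cell(self, row, pattern):
        # A different salt per row stands in for `depth` distinct hash functions.
        digest = hashlib.blake2b(
            repr(pattern).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, pattern):
        for row in range(self.depth):
            self.rows[row][self._cell(row, pattern)] += 1

    def estimate(self, pattern):
        # The lowest counter is the one least inflated by hash collisions.
        return min(self.rows[row][self._cell(row, pattern)]
                   for row in range(self.depth))


def attribute_patterns(obj, max_size=2):
    """Assumption: patterns are all groups of 1..max_size (position, value) pairs."""
    items = list(enumerate(obj))
    for size in range(1, max_size + 1):
        yield from combinations(items, size)


def train(labelled_objects, depth=4, width=1024):
    """Build one sketch table per class from (object, class label) pairs."""
    sketches = {}
    for obj, label in labelled_objects:
        if label not in sketches:
            sketches[label] = SketchTable(depth, width)
        for pattern in attribute_patterns(obj):
            sketches[label].add(pattern)
    return sketches


def classify(sketches, obj, threshold=0.6):
    """Sum the estimates of sufficiently discriminative patterns per class."""
    totals = {label: 0 for label in sketches}
    for pattern in attribute_patterns(obj):
        estimates = {label: table.estimate(pattern)
                     for label, table in sketches.items()}
        seen = sum(estimates.values())
        if seen == 0:
            continue  # pattern never observed during training
        # Assumption: discriminatory power = share held by the dominant class.
        power = max(estimates.values()) / seen
        if power > threshold:
            for label, value in estimates.items():
                totals[label] += value
    return max(totals, key=totals.get)


training = [
    (("tcp", "80", "GET"), "web"),
    (("tcp", "443", "GET"), "web"),
    (("udp", "53", "QUERY"), "dns"),
    (("udp", "53", "AXFR"), "dns"),
]
sketches = train(training)
print(classify(sketches, ("udp", "53", "GET")))
```

Because a count-min estimate can only overcount, taking the minimum across the parallel tables gives the tightest available bound, which is one plausible reading of "the lowest value is selected for each sketch table".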

    Content based method for product-peer filtering
    2.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US06356879B2

    Publication date: 2002-03-12

    Application No.: US09169029

    Filing date: 1998-10-09

    IPC classification: G06F17/60

    Abstract: The present invention derives product characterizations for products offered at an e-commerce site based on the text descriptions of the products provided at the site. A customer characterization is generated for any customer browsing the e-commerce site. The characterizations include an aggregation of derived product characterizations associated with products bought and/or browsed by that customer. A peer group is formed by clustering customers having similar customer characterizations. Recommendations are then made to a customer based on the processed characterization and peer group data.
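
As a rough illustration of how a customer characterization could be aggregated from product text and compared between customers, here is a minimal Python sketch. It assumes word counts as the product characterization and cosine similarity as the measure of closeness; neither choice is specified in the abstract, and the function names and sample descriptions are hypothetical.

```python
import math
from collections import Counter


def characterize_product(description):
    """Assumption: a product is characterized by its lowercase word counts."""
    return Counter(description.lower().split())


def characterize_customer(descriptions):
    """Aggregate the characterizations of products a customer bought or browsed."""
    profile = Counter()
    for description in descriptions:
        profile.update(characterize_product(description))
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


alice = characterize_customer(["red running shoes", "trail running socks"])
bob = characterize_customer(["running shoes blue"])
carol = characterize_customer(["cast iron skillet"])

# Customers with similar profiles would be candidates for the same peer group.
print(cosine(alice, bob) > cosine(alice, carol))
```

Any clustering step that groups customers by such a similarity score would then yield the peer groups; the clustering method itself is left open here because the abstract does not name one.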

    System and Method for Classifying Data Streams with Very Large Cardinality
    3.
    Document type: Patent application
    Legal status: Lapsed

    Publication No.: US20120166382A1

    Publication date: 2012-06-28

    Application No.: US13400863

    Filing date: 2012-02-21

    IPC classification: G06N5/02

    CPC classification: G06N99/005 G06K9/6267

    Abstract: An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected, and their associated sketch table values are summed. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object to which the attribute patterns belong.

    Systems and methods for condensation-based privacy in strings
    4.
    Document type: Granted patent
    Legal status: Lapsed

    Publication No.: US08010541B2

    Publication date: 2011-08-30

    Application No.: US11540406

    Filing date: 2006-09-30

    IPC classification: G06F17/30

    CPC classification: G06F21/6245

    Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.

    Semantic based collaborative filtering
    5.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US06487539B1

    Publication date: 2002-11-26

    Application No.: US09369741

    Filing date: 1999-08-06

    IPC classification: G06F17/60

    Abstract: A method for providing product recommendations to customers in an e-commerce environment includes the step of generating content and compatibility representations of products corresponding to a plurality of customers. A similarity function is calculated between pairs of content attributes corresponding to the products. A similarity function is calculated between pairs of compatibility attributes corresponding to the products. The plurality of customers are clustered into a plurality of peer groups. For a given customer, a closest peer group of the plurality of peer groups is determined. At least one potential recommendation is then generated for the given customer based on the closest peer group.

    System and method for resource adaptive classification of data streams
    6.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US08165979B2

    Publication date: 2012-04-24

    Application No.: US13078419

    Filing date: 2011-04-01

    IPC classification: G06N5/00

    CPC classification: G06K9/6282 G06N99/005

    Abstract: A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide for classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances, and adaptively classifying said received data based on statistics of said subspace sampling.

    Systems and methods for condensation-based privacy in strings
    7.
    Document type: Patent application
    Legal status: Lapsed

    Publication No.: US20080082566A1

    Publication date: 2008-04-03

    Application No.: US11540406

    Filing date: 2006-09-30

    IPC classification: G06F7/00

    CPC classification: G06F21/6245

    Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.

    Method for optimizing profits in electronic delivery of digital objects
    8.
    Document type: Granted patent
    Legal status: Lapsed

    Publication No.: US06631413B1

    Publication date: 2003-10-07

    Application No.: US09239008

    Filing date: 1999-01-28

    IPC classification: G06F13/00

    CPC classification: G06Q10/08

    Abstract: In accordance with the present invention, a method for selecting a channel and delivery time for digital objects for a broadcast delivery service including multiple channels of varying bandwidths includes the steps of selecting digital objects to be sent over the multiple channels, generating a schedule and pricing for the digital objects based on the digital objects selected and existing delivery commitments, and manipulating the schedule and pricing to provide a profitable delivery of the digital objects. A system is also included.

    Systems and methods for metadata embedding in streaming medical data
    10.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US08229191B2

    Publication date: 2012-07-24

    Application No.: US12042961

    Filing date: 2008-03-05

    IPC classification: G06K9/00 G06K9/36

    Abstract: Systems and methods for embedding metadata such as personal patient information within actual medical data signals obtained from a patient are provided, wherein two watermarks, a robust watermark and a fragile watermark, are embedded in a given medical data signal. The robust watermark includes a binary coded representation of the metadata that is incorporated into the frequency domain of the medical data signal using discrete Fourier transformations and additive embedding. Error correcting code can also be added to the binary representation of the metadata using Hamming coding. A given robust watermark can be incorporated multiple times in the medical data signal. The fragile watermark is added on top of the modified medical signal containing the robust watermark in the spatial domain of the modified medical signal. The fragile watermark utilizes a hash function to generate random sequences that are incorporated through the medical data signal.
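
The error-correction step mentioned in this abstract can be illustrated with a short Python sketch of Hamming coding applied to a metadata string. The abstract does not say which Hamming code is used, so the (7,4) variant, the UTF-8 bit layout, the function names, and the sample metadata string are all assumptions made for illustration; the watermark embedding itself is not shown.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] as 7 bits [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    bits = list(codeword)
    # Each syndrome bit re-checks one parity group; read together they give
    # the 1-based position of a single-bit error (0 means none detected).
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        bits[error_position - 1] ^= 1
    return [bits[2], bits[4], bits[5], bits[6]]


def encode_metadata(text):
    """Turn text into bits (most significant bit first) and protect each nibble."""
    bits = [(byte >> shift) & 1
            for byte in text.encode("utf-8")
            for shift in range(7, -1, -1)]
    coded = []
    for i in range(0, len(bits), 4):
        coded.extend(hamming74_encode(bits[i:i + 4]))
    return coded


def decode_metadata(coded):
    """Inverse of encode_metadata, tolerating one bit error per 7-bit codeword."""
    bits = []
    for i in range(0, len(coded), 7):
        bits.extend(hamming74_decode(coded[i:i + 7]))
    data = bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )
    return data.decode("utf-8")


coded = encode_metadata("ID:42")
coded[3] ^= 1   # simulate a bit damaged in the first codeword
coded[40] ^= 1  # and another in a later codeword
print(decode_metadata(coded))
```

A (7,4) code corrects one error per codeword at the cost of 75% overhead, which is why a watermark payload protected this way would be expected to survive only sparse bit damage.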
