System and method of feature selection for text classification using subspace sampling
    1.
    Granted invention patent
    Legal status: In force

    Publication No.: US08046317B2

    Publication date: 2011-10-25

    Application No.: US12006178

    Filing date: 2007-12-31

    IPC classification: G06N5/00

    Abstract: An improved system and method is provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided for selecting a small set of features using subspace sampling from the corpus of training data to train a text classifier for using the small set of features for classification of texts. To select the small set of features, a subspace of features from the corpus of training data may be randomly sampled according to a probability distribution over the set of features where a probability may be assigned to each of the features that is proportional to the square of the Euclidean norms of the rows of left singular vectors of a matrix of the features representing the corpus of training texts. The small set of features may classify texts using only the relevant features among a very large number of training features.

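The sampling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a small dense matrix `A` with one row per feature and one column per training document, and the names `leverage_scores` and `sample_features` are hypothetical. Instead of a full SVD it orthonormalises the columns with Gram-Schmidt, which gives the same squared row norms as the left singular vectors that belong to nonzero singular values; a numerical library's SVD routine would be the usual choice for real data.

```python
import math
import random


def leverage_scores(A, tol=1e-12):
    """Squared row norms of an orthonormal basis for the column space of A.

    A is a list of rows (assumption: one row per feature, one column per
    document). Any orthonormal basis of the column space has the same row
    norms as the left singular vectors with nonzero singular values.
    """
    n_rows, n_cols = len(A), len(A[0])
    basis = []  # orthonormal column vectors, each of length n_rows
    for j in range(n_cols):
        v = [A[i][j] for i in range(n_rows)]
        for q in basis:  # modified Gram-Schmidt: remove existing directions
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip columns that add no new direction
            basis.append([vi / norm for vi in v])
    return [sum(q[i] ** 2 for q in basis) for i in range(n_rows)]


def sample_features(A, r, seed=0):
    """Draw r feature indices (with replacement) with probability
    proportional to the leverage scores; return the distinct indices
    and the sampling probabilities."""
    scores = leverage_scores(A)
    total = sum(scores)
    probs = [s / total for s in scores]
    rng = random.Random(seed)
    picks = rng.choices(range(len(A)), weights=probs, k=r)
    return sorted(set(picks)), probs


# Toy corpus: 4 features x 2 documents; feature 2 never occurs.
A = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
selected, probs = sample_features(A, r=3)
```

A feature that never occurs in the corpus has an all-zero row, so its probability is zero and it is never selected. The scores sum to the rank of the matrix, which is why they normalise cleanly into a distribution.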

    System and method of feature selection for text classification using subspace sampling
    2.
    Invention application
    Legal status: In force

    Publication No.: US20090171870A1

    Publication date: 2009-07-02

    Application No.: US12006178

    Filing date: 2007-12-31

    IPC classification: G06F15/18

    Abstract: An improved system and method is provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided for selecting a small set of features using subspace sampling from the corpus of training data to train a text classifier for using the small set of features for classification of texts. To select the small set of features, a subspace of features from the corpus of training data may be randomly sampled according to a probability distribution over the set of features where a probability may be assigned to each of the features that is proportional to the square of the Euclidean norms of the rows of left singular vectors of a matrix of the features representing the corpus of training texts. The small set of features may classify texts using only the relevant features among a very large number of training features.

    ALGORITHM FOR STORYBOARDING IN DISPLAY ADVERTISING
    3.
    Invention application
    Legal status: Under examination (published)

    Publication No.: US20100063881A1

    Publication date: 2010-03-11

    Application No.: US12205809

    Filing date: 2008-09-05

    IPC classification: G06Q30/00

    Abstract: Methods and systems for optimally allocating ad space to advertisers on a webpage viewed by a user in a single browsing session include identifying a plurality of advertisement stories that match the content of the webpage. An advertisement pool is generated using the identified ad stories. Each ad story in the advertisement pool includes one or more advertisement pages and is associated with a corresponding ad value. An ad story is chosen from the pool by dynamically evaluating the ad value associated with each ad story in the pool based on continued surfing by the user, such that the identified ad story provides the maximum ad value when rendered on the webpage. The identified ad story is scheduled for rendering on the webpage while providing relevant ad content at the webpage.

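The selection step in the abstract (re-evaluating each story's value as the user keeps browsing and picking the maximum) can be illustrated with a toy model. Everything here is an assumption made for illustration: the abstract does not say how ad value is computed, so this sketch simply sums hypothetical per-page values over the number of pages the user is still expected to view. `AdStory`, `story_value`, and `choose_story` are invented names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdStory:
    name: str
    page_values: List[float]  # hypothetical value of each successive ad page


def story_value(story, pages_left):
    """Value realised if the user views `pages_left` more pages."""
    return sum(story.page_values[:pages_left])


def choose_story(pool, pages_left):
    """Pick the story with the highest value for the current estimate."""
    return max(pool, key=lambda s: story_value(s, pages_left))


pool = [
    AdStory("single-banner", [5.0]),
    AdStory("three-part", [1.0, 2.0, 6.0]),
]

short_visit = choose_story(pool, pages_left=1)  # single-banner (5.0 vs 1.0)
long_visit = choose_story(pool, pages_left=3)   # three-part (5.0 vs 9.0)
```

Calling `choose_story` again whenever the estimate of remaining page views changes is one simple way to model the "dynamic" evaluation; a multi-page story only pays off when the session is expected to last long enough to show its later pages.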

    METHOD AND SYSTEM FOR FAST SIMILARITY COMPUTATION IN HIGH DIMENSIONAL SPACE
    5.
    Invention application
    Legal status: In force

    Publication No.: US20130031059A1

    Publication date: 2013-01-31

    Application No.: US13189696

    Filing date: 2011-07-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30628

    Abstract: Method, system, and programs for computing similarity. Input data is first received from one or more data sources and then analyzed to obtain an input feature vector that characterizes the input data. An index is then generated based on the input feature vector and is used to archive the input data, where the value of the index is computed based on an improved Johnson-Lindenstrauss transformation (FJLT) process. With the improved FJLT process, first, the sign of each feature in the input feature vector is randomly flipped to obtain a flipped vector. A Hadamard transformation is then applied to the flipped vector to obtain a transformed vector. An inner product between the transformed vector and a sparse vector is then computed to obtain a base vector, based on which the value of the index is determined.

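The three steps named in the abstract (random sign flip, Hadamard transform, inner product with a sparse vector) can be sketched as below. This is a hedged illustration, not the patented process: the function names, the `density` parameter, the Gaussian weights of the sparse vectors, and the choice to compute `k` inner products are all assumptions, and the abstract does not say how the index value is derived from the result, so that step is left out.

```python
import math
import random


def hadamard(x):
    """Normalised fast Walsh-Hadamard transform (length must be a power of two)."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def fjlt(x, k, density=0.5, seed=0):
    """Project x down to k numbers using the three steps from the abstract."""
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("pad x to a power-of-two length first")
    rng = random.Random(seed)
    # Step 1: flip the sign of each feature at random.
    flipped = [rng.choice((-1.0, 1.0)) * v for v in x]
    # Step 2: apply the Hadamard transform to the flipped vector.
    transformed = hadamard(flipped)
    # Step 3: inner products with k random sparse vectors (assumed form).
    out = []
    for _ in range(k):
        sparse = {i: rng.gauss(0.0, 1.0) for i in range(n) if rng.random() < density}
        out.append(sum(w * transformed[i] for i, w in sparse.items()))
    return out


projected = fjlt([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0], k=3)
```

The sign flip followed by the Hadamard transform spreads the energy of a sparse input across all coordinates, which is what lets the final projection be sparse. Using the same `seed` for every input keeps the projection consistent, so the outputs of different inputs remain comparable.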

    PLAYFUL INCENTIVE FOR LABELING CONTENT
    6.
    Invention application
    Legal status: Under examination (published)

    Publication No.: US20090327168A1

    Publication date: 2009-12-31

    Application No.: US12147342

    Filing date: 2008-06-26

    IPC classification: G06F3/048

    CPC classification: H04L51/12

    Abstract: Embodiments are directed towards employing a playful incentive to encourage users to provide feedback that is useable to train a classifier. The classifier may be associated with any of a variety of different settings, including but not limited to classifying: messages as ham/spam, images, advertising, bookmarking, music, videos, photographs, shopping, or the like. An animated image, such as a pet, provides an interface to the classifier that encourages and responds to user feedback. Users may share their classifiers or aspects thereof with other users to enable a community of knowledge to be applied to a classification task, while preserving privacy of the user feedback. One form of sharing may be within the context of a competitive game. Various evaluations may be performed on a classifier to indicate user feedback consistency, or quality. Classifiers may also be used to provide users with advertisements, products, or services based on the user's feedback.

    User trustworthiness
    7.
    Granted invention patent
    Legal status: In force

    Publication No.: US09519682B1

    Publication date: 2016-12-13

    Application No.: US13117037

    Filing date: 2011-05-26

    Abstract: Embodiments are directed towards generating a unified user account trustworthiness system through user account trustworthiness scores. A trusted group of user accounts may be identified for a given action by grouping a plurality of user accounts into tiers based on a trustworthiness score of each user account for the given action. The tiers and/or trustworthiness scores may be employed to classify an item, such as a message as spam or non-spam, based on input from the user accounts. The trustworthiness scores may also be employed to determine if a user account is a robot account or a human account. The trusted group for a given action may dynamically evolve over time by regrouping the user accounts based on modified trustworthiness scores. A trustworthiness score of an individual user account may be modified based on input received from the individual user account and input from other user accounts.

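The tiering, classification, and score-update ideas in the abstract can be illustrated with a small sketch. The thresholds, the trust-weighted vote, and the fixed update step are assumptions chosen for illustration; the abstract does not specify any of them, and the function names are hypothetical.

```python
def group_into_tiers(scores, thresholds=(0.8, 0.5)):
    """Map each account to a tier index (0 = most trusted).

    `thresholds` are hypothetical cut-offs in descending order.
    """
    tiers = {}
    for account, score in scores.items():
        tier = len(thresholds)  # lowest tier unless a cut-off is met
        for idx, cutoff in enumerate(thresholds):
            if score >= cutoff:
                tier = idx
                break
        tiers[account] = tier
    return tiers


def classify(votes, scores):
    """Trust-weighted vote; votes maps account -> True if reported as spam."""
    spam = sum(scores[a] for a, is_spam in votes.items() if is_spam)
    ham = sum(scores[a] for a, is_spam in votes.items() if not is_spam)
    return "spam" if spam > ham else "non-spam"


def update_score(score, agreed, step=0.1):
    """Raise the score when the account agreed with the outcome, lower it
    otherwise, and clamp the result to [0, 1]."""
    return min(1.0, max(0.0, score + (step if agreed else -step)))


scores = {"alice": 0.9, "bob": 0.6, "carol": 0.2}
tiers = group_into_tiers(scores)
verdict = classify({"alice": True, "bob": False, "carol": False}, scores)
```

Re-running `group_into_tiers` after each round of `update_score` calls is one simple way to model a trusted group that changes over time. In this toy data a single highly trusted account (0.9) outweighs two less trusted ones (0.6 + 0.2), so the message is classified as spam.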

    Multi-step captcha with serial time-consuming decryption of puzzles
    8.
    Granted invention patent
    Legal status: In force

    Publication No.: US08522327B2

    Publication date: 2013-08-27

    Application No.: US13206583

    Filing date: 2011-08-10

    IPC classification: H04L29/06

    Abstract: A system and method for implementing a multi-step challenge and response test includes steps or acts of: using an input/output subsystem for presenting a series of challenges to a user that require said user to correctly solve each challenge before a next challenge is revealed to the user; receiving the user's response to each challenge; and submitting a last response in the series of challenges to a server for validation. The method further includes: using a processor device configured to perform for each challenge in the series of challenges: internally validating the response by comparing the user's response to a correct response; and using the user's response, decrypting the next challenge to reveal the next challenge; wherein the next challenge remains obfuscated until a previous challenge is correctly solved.

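The chained structure described in the abstract, where each answer unlocks the next challenge, can be sketched as below. This is an illustration under assumptions, not the patented scheme: it uses PBKDF2 as a deliberately slow key derivation and a simple XOR with the derived key as the cipher, and it checks answers against a stored SHA-256 hash. The salt, round count, and function names are hypothetical, and a real deployment would need a proper cipher and would leave the final check to the server, as the abstract states.

```python
import hashlib


def derive_key(answer, length, rounds=10_000):
    """Slow key derivation from an answer (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", answer.encode(), b"captcha-demo-salt", rounds, dklen=length
    )


def xor(data, key):
    """XOR two equal-length byte strings (toy cipher for illustration only)."""
    return bytes(d ^ k for d, k in zip(data, key))


def build_chain(challenges):
    """challenges: list of (question, answer) pairs.

    Returns the first question in clear text plus, for each step, a hash of
    the correct answer and the next question encrypted under that answer.
    """
    steps = []
    for i, (_, answer) in enumerate(challenges):
        check = hashlib.sha256(answer.encode()).hexdigest()
        locked = None
        if i + 1 < len(challenges):
            nxt = challenges[i + 1][0].encode()
            locked = xor(nxt, derive_key(answer, len(nxt)))
        steps.append((check, locked))
    return challenges[0][0], steps


def solve_step(step, response):
    """Validate a response locally; on success, decrypt the next question."""
    check, locked = step
    if hashlib.sha256(response.encode()).hexdigest() != check:
        return False, None
    if locked is None:  # last challenge: nothing further to reveal
        return True, None
    return True, xor(locked, derive_key(response, len(locked))).decode()


first, steps = build_chain([("2+3?", "5"), ("capital of France?", "Paris")])
ok, second = solve_step(steps[0], "5")
```

Because the key for each challenge depends on the previous answer, the challenges cannot be worked on in parallel, and the slow derivation adds a fixed cost to every step.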

    Mail compression scheme with individual message decompressability
    9.
    Granted invention patent
    Legal status: In force

    Publication No.: US07836099B2

    Publication date: 2010-11-16

    Application No.: US11831828

    Filing date: 2007-07-31

    IPC classification: G06F17/30

    CPC classification: H04L51/00 Y10S707/99942

    Abstract: Embodiments of the present invention relate to a two-pass compression scheme that achieves compression performance on par with existing methods while admitting individual message decompression. These methods provide both storage savings and lower end-user latency. They preserve the advantages of standard text compression in exploiting short-range similarities in data, while introducing a second step to take advantage of long-range similarities often present in certain types of structured data, e.g., email archival files.

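One way to get the property the abstract describes (archive-wide redundancy exploited, yet each message decompressible on its own) is a shared preset dictionary. The sketch below swaps in that technique using Python's `zlib` and its `zdict` parameter; it is not the scheme from the patent, whose details the abstract does not give. The dictionary-building heuristic (lines that recur in more than one message) and the function names are assumptions.

```python
import zlib
from collections import Counter


def build_dictionary(messages, max_size=32 * 1024):
    """Pass 1: collect lines that recur across messages (long-range similarity)."""
    counts = Counter()
    for message in messages:
        counts.update(set(message.splitlines(keepends=True)))
    common = [line for line, count in counts.most_common() if count > 1]
    # zlib favours material near the end of the dictionary, so the most
    # common lines go last; keep at most one 32 KiB window.
    return b"".join(reversed(common))[-max_size:]


def compress_message(message, zdict):
    """Pass 2: standard DEFLATE per message, primed with the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(message) + compressor.flush()


def decompress_message(blob, zdict):
    """Decompress one message without touching any other message."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


archive = [
    b"Received: from mail.example.com by mx.example.com with ESMTP\n"
    b"X-Mailer: demo-client 1.0\nhello there\n",
    b"Received: from mail.example.com by mx.example.com with ESMTP\n"
    b"X-Mailer: demo-client 1.0\nsee you soon\n",
]
shared = build_dictionary(archive)
blobs = [compress_message(m, shared) for m in archive]
```

Only the shared dictionary and a single blob are needed to recover one message, for example `decompress_message(blobs[1], shared)`. The dictionary has to be stored alongside the archive, so the approach pays off when many messages share headers or boilerplate.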

    CONSTRUCTING IMAGE CAPTCHAS UTILIZING PRIVATE INFORMATION OF THE IMAGES
    10.
    Invention application
    Legal status: Under examination (published)

    Publication No.: US20100228804A1

    Publication date: 2010-09-09

    Application No.: US12397561

    Filing date: 2009-03-04

    IPC classification: H04L9/32 G06F17/30

    Abstract: An image CAPTCHA having one or more images, a challenge, and a correct answer to the challenge is constructed by selecting the one or more images from a plurality of candidate images based at least in part on each image's public information and private information. The private information of each of the images is accessible only to an entity responsible for constructing the CAPTCHA. Optionally, the one or more images are selected further based on the specific type of the CAPTCHA to be constructed.