Method and system for form-filling crawl and associating rich keywords
    2.
    Granted invention patent
    Legal status: in force

    Publication No.: US08793239B2

    Publication date: 2014-07-29

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

    System and method for generating a maximum utility slate of advertisements for online advertisement auctions
    3.
    Granted invention patent
    Legal status: in force

    Publication No.: US08719096B2

    Publication date: 2014-05-06

    Application No.: US11642433

    Filing date: 2006-12-20

    IPC classification: G06Q30/00 G06Q40/00

    Abstract: An improved system and method for generating a maximum utility slate of advertisements for online advertisement auctions is provided. Various utility factors for each advertisement that may be a candidate in a slate of advertisements may be applied within a framework in order to generate a maximum utility slate of advertisements. Either backward or forward dynamic programming may be applied to recursively evaluate the utility of subslates of advertisements in order to generate a maximum utility slate of advertisements. In an embodiment, a network with directed edges and associated costs may be defined, and the longest path may be found in the directed network for constructing a maximum utility slate of advertisements. Various utility factors may be applied for different objectives of an auctioneer, and the framework presented may be extended for revenue ordering, exclusion of bidders, ordering slates according to first and second price utilities, and so forth.
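The backward dynamic programming mentioned in the abstract can be illustrated with a small sketch. This is not the patented method; it assumes a hypothetical table `utility[i][p]` (the utility of showing candidate ad `i` in slate position `p`), assumes candidates are already ordered (for example by bid) and that a slate must preserve that order, and allows positions to stay empty. Under those assumptions the recursion is equivalent to a longest-path search in a layered directed acyclic network.

```python
def max_utility_slate(utility):
    """Return (best_total, slate) for an ordered list of candidate ads.

    utility[i][p] is an assumed utility of placing candidate i in position p.
    slate[p] is the index of the candidate chosen for position p.
    """
    n = len(utility)
    k = len(utility[0]) if n else 0
    # best[i][p]: best utility achievable with candidates i.. and positions p..
    # Row n and column k stay 0.0 (nothing left to place).
    best = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):          # backward over candidates
        for p in range(k - 1, -1, -1):      # backward over positions
            skip = best[i + 1][p]
            take = utility[i][p] + best[i + 1][p + 1]
            best[i][p] = max(skip, take)
    # Walk forward through the table to recover the chosen slate.
    slate, i, p = [], 0, 0
    while i < n and p < k:
        if best[i][p] == best[i + 1][p]:
            i += 1                          # candidate i was skipped
        else:
            slate.append(i)                 # candidate i fills position p
            i += 1
            p += 1
    return best[0][0], slate


if __name__ == "__main__":
    table = [[5, 1], [4, 3], [1, 2]]
    print(max_utility_slate(table))
```

Changing the entries of `utility` (revenue, click probability, and so on) changes the objective without changing the recursion, which is the sense in which different utility factors fit one framework.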

    Efficient algorithm for pairwise preference learning
    4.
    Granted invention patent
    Legal status: in force

    Publication No.: US08280829B2

    Publication date: 2012-10-02

    Application No.: US12504460

    Filing date: 2009-07-16

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: In one embodiment, training a ranking model comprises: accessing the ranking model and an objective function of the ranking model; accessing one or more preference pairs of objects, wherein for each of the preference pairs of objects comprising a first object and a second object, there is a preference between the first object and the second object with respect to the particular reference, and the first object and the second object each has a feature vector comprising one or more feature values; and training the ranking model by minimizing the objective function using the preference pairs of objects, wherein for each of the preference pairs of objects, a difference between the first feature vector of the first object and the second feature vector of the second object is not calculated.
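One common way to avoid forming per-pair difference vectors, shown here only as an illustrative sketch and not as the claimed algorithm, is to score each object once and accumulate a per-object coefficient from the violated pairs. The linear model, the squared hinge loss, and the names `pairwise_loss_and_gradient`, `X`, `pairs`, and `margin` are all assumptions made for this example.

```python
def pairwise_loss_and_gradient(w, X, pairs, margin=1.0):
    """Squared hinge loss over preference pairs for a linear scorer w.

    X is a list of feature vectors; pairs holds (i, j) meaning object i is
    preferred over object j. No x_i - x_j vector is ever built.
    """
    # Score every object once.
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    coeff = [0.0] * len(X)
    loss = 0.0
    for i, j in pairs:
        slack = margin - (scores[i] - scores[j])
        if slack > 0:                 # pair violates the margin
            loss += slack * slack
            coeff[i] -= 2.0 * slack   # contribution to d(loss)/d(score_i)
            coeff[j] += 2.0 * slack   # contribution to d(loss)/d(score_j)
    # Chain rule: gradient = sum_n coeff[n] * X[n], one pass over the objects.
    grad = [sum(coeff[n] * X[n][k] for n in range(len(X)))
            for k in range(len(w))]
    return loss, grad


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(pairwise_loss_and_gradient([0.0, 0.0], X, [(0, 1), (2, 1)]))
```

The pair loop touches only scalars, so its cost does not grow with the number of features; the feature vectors are read once for scoring and once for the final gradient.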

    TRANSDUCTIVE APPROACH TO CATEGORY-SPECIFIC RECORD ATTRIBUTE EXTRACTION
    5.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20100274770A1

    Publication date: 2010-10-28

    Application No.: US12429442

    Filing date: 2009-04-24

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/285

    Abstract: Disclosed are methods and apparatus for segmenting and labeling a collection of token sequences. A plurality of segments of one or more tokens in a token sequence collection are partially labeled with labels from a set of target labels using high precision domain-specific labelers so as to generate a partially labeled sequence collection having a plurality of labeled segments and a plurality of unlabeled segments. Any label conflicts in the partially labeled sequence collection are resolved. One or more of the labeled segments of the partially labeled sequence collection are expanded so as to cover one or more additional tokens of the partially labeled sequence collection. A statistical model, for labeling segments using local token and segment features of the sequence collection, is trained based on the partially labeled sequence collection. This trained model is then used to label the unlabeled segments and the labeled segments of the sequence collection so as to generate a labeled sequence collection. The labeled sequence collection is then stored as structured output records in a database.

    EFFICIENTLY BUILDING COMPACT MODELS FOR LARGE TAXONOMY TEXT CLASSIFICATION
    6.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20100161527A1

    Publication date: 2010-06-24

    Application No.: US12342750

    Filing date: 2008-12-23

    IPC classification: G06F15/18

    CPC classification: G06F16/58 G06F16/51

    Abstract: A taxonomy model is determined with a reduced number of weights. For example, the taxonomy model is a tangible representation of a hierarchy of nodes that represents a hierarchy of classes that, when labeled with a representation of a combination of weights, is usable to classify documents having known features but unknown class. For each node of the taxonomy, the training example documents are processed to determine the features for which there are a sufficient number of training example documents having a class label corresponding to at least one of the leaf nodes of a subtree having that node as a root node. For each node of the taxonomy, a sparse weight vector is determined for that node, including setting to zero the weights, for that node, of those features determined not to appear at least a minimum number of times in a given set of leaf nodes in the subtree with that node as a root node. The sparse weight vectors can be learned by solving an optimization problem using a maximum entropy classifier, or a large margin classifier with a sequential dual method (SDM) with margin or slack rescaling. The determined sparse weight vectors are tangibly embodied in a computer-readable medium in association with the tangible representation of the nodes of the taxonomy.
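The per-node feature selection step can be sketched as follows. This is only an illustration under stated assumptions: the taxonomy is a hypothetical `children` dictionary, each training document is a `(leaf_label, features)` pair, a feature counts once per document, and `min_count` stands in for the "minimum number of times". Features outside the returned set for a node would have their weight fixed at zero for that node; the weight learning itself is not shown.

```python
from collections import Counter


def active_features(children, docs, min_count):
    """Map each taxonomy node to the features allowed a nonzero weight.

    children: dict mapping a node to its list of child nodes (leaves absent
              or mapped to an empty list).
    docs:     iterable of (leaf_label, features) training examples.
    """
    def leaves(node):
        kids = children.get(node, [])
        if not kids:
            return {node}
        found = set()
        for kid in kids:
            found |= leaves(kid)
        return found

    nodes = set(children) | {k for kids in children.values() for k in kids}
    result = {}
    for node in nodes:
        subtree_leaves = leaves(node)
        # Count, per feature, the documents labeled with a leaf of this subtree.
        counts = Counter(
            f
            for label, feats in docs
            if label in subtree_leaves
            for f in set(feats)
        )
        result[node] = {f for f, c in counts.items() if c >= min_count}
    return result


if __name__ == "__main__":
    tree = {"root": ["sports", "news"], "sports": ["soccer", "tennis"]}
    examples = [
        ("soccer", ["goal", "ball"]),
        ("tennis", ["ball", "net"]),
        ("news", ["election"]),
    ]
    print(active_features(tree, examples, min_count=2))
```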

    System and method for training a multi-class support vector machine to select a common subset of features for classifying objects
    7.
    Invention patent application
    Legal status: in force

    Publication No.: US20090150309A1

    Publication date: 2009-06-11

    Application No.: US12001932

    Filing date: 2007-12-10

    IPC classification: G06F15/18

    CPC classification: G06K9/6249 G06K9/6269

    Abstract: An improved system and method is provided for training a multi-class support vector machine to select a common subset of features for classifying objects. A multi-class support vector machine generator may be provided for learning classification functions to classify sets of objects into classes and may include a sparse support vector machine modeling engine for training a multi-class support vector machine using scaling factors by simultaneously selecting a common subset of features iteratively for all classes from sets of features representing each of the classes. An objective function using scaling factors to ensure sparsity of features may be iteratively minimized, and features may be retained and added until a small set of features stabilizes. Alternatively, a common subset of features may be found by iteratively removing at least one feature simultaneously for all classes from an active set of features initialized to represent the entire set of training features.

    PAIRWISE RANKING-BASED CLASSIFIER
    8.
    Invention patent application
    Legal status: in force

    Publication No.: US20110099131A1

    Publication date: 2011-04-28

    Application No.: US12603763

    Filing date: 2009-10-22

    IPC classification: G06F15/18 G06N5/02

    CPC classification: G06N99/005 G06F17/30707

    Abstract: The present invention provides methods and systems for binary classification of items. Methods and systems are provided for constructing a machine learning-based and pairwise ranking method-based classification model for binary classification of items as positive or negative with regard to a single class, based on training using a training set of examples including positive examples and unlabelled examples. The model includes only one hyperparameter and only one threshold parameter, which are selected to optimize the model with regard to constraining positive items to be classified as positive while minimizing a number of unlabelled items classified as positive.
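The role of the single threshold parameter can be illustrated with a short sketch. It assumes a ranking model has already produced a score for every item, and it introduces a hypothetical `min_recall` knob (the fraction of known positives that must score at or above the threshold); neither the function name nor that knob comes from the abstract. The threshold is set as high as the recall constraint permits, which keeps the count of unlabelled items flagged as positive as low as possible for that constraint.

```python
import math


def pick_threshold(pos_scores, unl_scores, min_recall=1.0):
    """Return (threshold, flagged_unlabelled) for scored items.

    pos_scores: scores of known positive examples (must be non-empty).
    unl_scores: scores of unlabelled examples.
    Items with score >= threshold are classified as positive.
    """
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must fall at or above the threshold.
    needed = max(1, math.ceil(min_recall * len(ranked)))
    threshold = ranked[needed - 1]
    flagged = sum(1 for s in unl_scores if s >= threshold)
    return threshold, flagged


if __name__ == "__main__":
    print(pick_threshold([0.9, 0.8, 0.6], [0.95, 0.7, 0.5, 0.1]))
```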

    System and method for scheduling online keyword auctions over multiple time periods subject to budget and query volume constraints
    9.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20090112691A1

    Publication date: 2009-04-30

    Application No.: US11981319

    Filing date: 2007-10-30

    IPC classification: G06Q30/00 G06F17/30 G06Q10/00

    Abstract: An improved system and method for scheduling online keyword auctions over multiple time periods subject to budget constraints is provided. A linear programming model of slates of advertisements may be created for predicting the volume and order in which queries may appear throughout multiple time periods for use in allocating bidders to auctions to optimize revenue of an auctioneer. Each slate of advertisements may represent a candidate set of advertisements in order of optimal revenue to an auctioneer. Linear programming using column generation with the keyword as a constraint and a bidder's budget as a constraint may be applied for each time period to generate a column that may be added to a linear programming model of slates of advertisements. Upon receiving a query request, a slate of advertisements for the time period may be output for sending to a web browser for display.

    System and method for generating a maximum utility slate of advertisements for online advertisement auctions
    10.
    Invention patent application
    Legal status: in force

    Publication No.: US20080154662A1

    Publication date: 2008-06-26

    Application No.: US11642433

    Filing date: 2006-12-20

    IPC classification: G06Q30/00

    Abstract: An improved system and method for generating a maximum utility slate of advertisements for online advertisement auctions is provided. Various utility factors for each advertisement that may be a candidate in a slate of advertisements may be applied within a framework in order to generate a maximum utility slate of advertisements. Either backward or forward dynamic programming may be applied to recursively evaluate the utility of subslates of advertisements in order to generate a maximum utility slate of advertisements. In an embodiment, a network with directed edges and associated costs may be defined, and the longest path may be found in the directed network for constructing a maximum utility slate of advertisements. Various utility factors may be applied for different objectives of an auctioneer, and the framework presented may be extended for revenue ordering, exclusion of bidders, ordering slates according to first and second price utilities, and so forth.
