APPARATUS AND METHODS FOR CONCEPT-CENTRIC INFORMATION EXTRACTION
    3.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20100241639A1

    Publication date: 2010-09-23

    Application No.: US12408450

    Filing date: 2009-03-20

    IPC classification: G06F17/30

    CPC classification: G06F16/345 G06F16/313

    Abstract: Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain-specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain. Extraction of the structured data instances is accomplished by (i) using the domain-specific concept labeler to annotate a subset of nodes of the tree instances; and (ii) using a locally adaptive concept annotator to extract the structured data instances based on the annotated segments and the local properties associated with such annotated segments. The extracted structured data instance is stored as structured output records in a database.
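
    The two-stage idea in this abstract (label tree nodes, then group labeled nodes into records that conform to a concept schema) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Node` class, the `name`/`phone` schema, the regex-based labeler, and the grouping heuristic (emit a record at the deepest subtree that covers every schema field), which merely stands in for the locally adaptive concept annotator and is not the patented method.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical concept schema: the fields every extracted record must have.
CONCEPT_SCHEMA = ("name", "phone")

@dataclass
class Node:
    """One web object in the tree representation of a page."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None

def concept_labeler(node):
    """Hypothetical domain-specific labeler: returns a concept or None."""
    if re.fullmatch(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}", node.text):
        return "phone"
    if node.tag == "h1" and node.text:
        return "name"
    return None

def annotate(node):
    """Stage (i): annotate the subset of nodes the labeler recognizes."""
    node.label = concept_labeler(node)
    for child in node.children:
        annotate(child)

def collect(node):
    """Gather the first value seen for each concept in a subtree."""
    found = {}
    if node.label:
        found[node.label] = node.text
    for child in node.children:
        for concept, value in collect(child).items():
            found.setdefault(concept, value)
    return found

def extract_records(node):
    """Stage (ii), simplified: one record per deepest subtree that
    contains every concept in the schema."""
    records = []
    for child in node.children:
        records.extend(extract_records(child))
    if not records:
        found = collect(node)
        if all(concept in found for concept in CONCEPT_SCHEMA):
            records.append({c: found[c] for c in CONCEPT_SCHEMA})
    return records
```

    Calling `annotate(root)` and then `extract_records(root)` on a page tree with two listing blocks, each holding an `h1` and a phone number, yields two records, one per block.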

    System for opinion reconciliation
    4.
    Granted invention patent
    Legal status: in force

    Publication No.: US07895149B2

    Publication date: 2011-02-22

    Application No.: US11957779

    Filing date: 2007-12-17

    IPC classification: G06N5/00

    CPC classification: G06N5/04

    Abstract: A system is disclosed for reconciling opinions generated by agents with respect to one or more predicates. The disclosed system may use observed variables and a probabilistic model including latent parameters to estimate a truth score associated with each of the predicates. The truth score, as well as one or more of the latent parameters of the probabilistic model, may be estimated based on the observed variables. The truth score generated by the disclosed system may enable publishers to reliably represent the truth of a predicate to interested users.
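
    As a rough illustration of estimating truth scores jointly with latent parameters, the sketch below assumes a much simpler model than the patent may use: opinions are binary votes, each agent has one latent reliability (the probability of voting in line with the truth), and the two sets of quantities are re-estimated alternately in an EM-style loop. The function name `reconcile`, the starting reliability of 0.7, and the clamping bounds are illustrative choices, not details from the source.

```python
from collections import defaultdict

def reconcile(opinions, iterations=20, prior=0.5):
    """Estimate a truth score per predicate from binary agent opinions.

    opinions: list of (agent, predicate, vote) tuples, vote is True/False.
    Returns (truth, reliability) dictionaries.
    """
    agents = {a for a, _, _ in opinions}
    predicates = {p for _, p, _ in opinions}
    reliability = {a: 0.7 for a in agents}   # latent parameter, initial guess
    truth = {p: prior for p in predicates}   # truth score per predicate

    for _ in range(iterations):
        # E-step: posterior probability that each predicate is true,
        # given the current reliabilities.
        like_true = {p: prior for p in predicates}
        like_false = {p: 1.0 - prior for p in predicates}
        for a, p, vote in opinions:
            r = reliability[a]
            like_true[p] *= r if vote else 1.0 - r
            like_false[p] *= 1.0 - r if vote else r
        for p in predicates:
            truth[p] = like_true[p] / (like_true[p] + like_false[p])

        # M-step: reliability is the expected share of an agent's votes
        # that agree with the current truth scores.
        agree = defaultdict(float)
        count = defaultdict(int)
        for a, p, vote in opinions:
            agree[a] += truth[p] if vote else 1.0 - truth[p]
            count[a] += 1
        for a in agents:
            # Clamp so the likelihoods never collapse to exactly zero.
            reliability[a] = min(0.99, max(0.01, agree[a] / count[a]))
    return truth, reliability
```

    With two agents that agree with each other and a third that often dissents, the loop raises the reliability of the agreeing pair and pulls the truth scores toward their votes.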

    Extracting rich temporal context for business entities and events
    5.
    Granted invention patent
    Legal status: in force

    Publication No.: US08606564B2

    Publication date: 2013-12-10

    Application No.: US12917389

    Filing date: 2010-11-01

    IPC classification: G06F17/27 G06F17/30

    Abstract: Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.
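
    The label-then-structure pipeline described here can be sketched as follows. This is only an illustration under stated assumptions: the two classes (`DATE`, `TIME_RANGE`), the regular expressions that stand in for the segment labeler, and the output field names are all hypothetical. The abstract does not say how labels are assigned or what structures the schematic rules define.

```python
import re

# Hypothetical classes of temporal data; a regex per class stands in for
# whatever labeler assigns classes to text segments.
LABEL_PATTERNS = {
    "DATE": re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),
    "TIME_RANGE": re.compile(r"\b(\d{1,2})(am|pm)-(\d{1,2})(am|pm)\b"),
}

def to_24h(hour, meridiem):
    """Convert a 12-hour clock hour to a 24-hour clock hour."""
    hour = int(hour) % 12
    return hour + 12 if meridiem == "pm" else hour

# Hypothetical schematic rules: one per class, each stating the structure
# in which temporal data of that class is stored.
SCHEMATIC_RULES = {
    "DATE": lambda m: {
        "year": int(m.group(1)),
        "month": int(m.group(2)),
        "day": int(m.group(3)),
    },
    "TIME_RANGE": lambda m: {
        "start_hour": to_24h(m.group(1), m.group(2)),
        "end_hour": to_24h(m.group(3), m.group(4)),
    },
}

def label_segments(text):
    """Assign a temporal class label to each matching segment of the text."""
    segments = []
    for label, pattern in LABEL_PATTERNS.items():
        for match in pattern.finditer(text):
            segments.append((match.start(), label, match))
    segments.sort(key=lambda item: item[0])  # keep document order
    return [(label, match) for _, label, match in segments]

def extract_temporal(text):
    """Apply the schematic rule for each labeled segment's class."""
    return [
        {"class": label, "text": match.group(0),
         "value": SCHEMATIC_RULES[label](match)}
        for label, match in label_segments(text)
    ]
```

    For the input `"Open 9am-5pm starting 2010-11-01."`, the sketch returns a `TIME_RANGE` entry with hours 9 and 17 followed by a `DATE` entry for 1 November 2010.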

    EXTRACTING RICH TEMPORAL CONTEXT FOR BUSINESS ENTITIES AND EVENTS
    6.
    Invention patent application
    Legal status: in force

    Publication No.: US20120109637A1

    Publication date: 2012-05-03

    Application No.: US12917389

    Filing date: 2010-11-01

    IPC classification: G06F17/27 G06F17/30

    Abstract: Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.

    TRANSDUCTIVE APPROACH TO CATEGORY-SPECIFIC RECORD ATTRIBUTE EXTRACTION
    7.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20100274770A1

    Publication date: 2010-10-28

    Application No.: US12429442

    Filing date: 2009-04-24

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/285

    Abstract: Disclosed are methods and apparatus for segmenting and labeling a collection of token sequences. A plurality of segments of one or more tokens in a token sequence collection are partially labeled with labels from a set of target labels using high precision domain-specific labelers so as to generate a partially labeled sequence collection having a plurality of labeled segments and a plurality of unlabeled segments. Any label conflicts in the partially labeled sequence collection are resolved. One or more of the labeled segments of the partially labeled sequence collection are expanded so as to cover one or more additional tokens of the partially labeled sequence collection. A statistical model, for labeling segments using local token and segment features of the sequence collection, is trained based on the partially labeled sequence collection. This trained model is then used to label the unlabeled segments and the labeled segments of the sequence collection so as to generate a labeled sequence collection. The labeled sequence collection is then stored as structured output records in a database.
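
    A heavily simplified sketch of the transductive flow: label what high-precision labelers can, resolve conflicts, train a model on those partial labels, then relabel the whole collection. The assumptions are substantial: segments are single tokens, conflicts are resolved by labeler priority, the segment-expansion step is omitted, and a feature-count vote stands in for the statistical model. The labelers, label names, and features are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical high-precision labelers: each returns a label or None.
def price_labeler(token):
    is_price = token.startswith("$") and token[1:].replace(".", "", 1).isdigit()
    return "PRICE" if is_price else None

def brand_labeler(token):
    return "BRAND" if token.lower() in {"acme", "globex"} else None

# Assumed conflict policy: the earliest labeler in this list wins.
LABELERS = [price_labeler, brand_labeler]

def partial_label(sequence):
    """Label tokens where a labeler fires; leave the rest as None."""
    labels = []
    for token in sequence:
        fired = [lab(token) for lab in LABELERS]
        fired = [label for label in fired if label is not None]
        labels.append(fired[0] if fired else None)
    return labels

def features(sequence, i):
    """Local token features: coarse token shape and the previous token."""
    token = sequence[i]
    if any(ch.isdigit() for ch in token):
        shape = "num"
    elif token[:1].isupper():
        shape = "cap"
    else:
        shape = "low"
    prev = sequence[i - 1].lower() if i > 0 else "<s>"
    return [f"shape={shape}", f"prev={prev}"]

def train(collection, partial):
    """Count how often each feature co-occurs with each known label."""
    counts = defaultdict(Counter)
    for sequence, labels in zip(collection, partial):
        for i, label in enumerate(labels):
            if label is not None:
                for feat in features(sequence, i):
                    counts[feat][label] += 1
    return counts

def predict(counts, sequence, i):
    """Pick the label with the most feature votes, or None if no votes."""
    votes = Counter()
    for feat in features(sequence, i):
        votes.update(counts.get(feat, {}))
    return votes.most_common(1)[0][0] if votes else None

def label_collection(collection):
    """Train on the partial labels, then relabel every token."""
    partial = [partial_label(seq) for seq in collection]
    counts = train(collection, partial)
    return [[predict(counts, seq, i) for i in range(len(seq))]
            for seq in collection]
```

    The point of the sketch is that a sequence on which no labeler fires (for example one with an unknown brand and a price written without `$`) can still be labeled, because the model is trained on features drawn from the same collection it labels.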

    Rapid iterative development of classifiers
    8.
    Granted invention patent
    Legal status: in force

    Publication No.: US08849790B2

    Publication date: 2014-09-30

    Application No.: US12344132

    Filing date: 2008-12-24

    IPC classification: G06F17/30

    CPC classification: G06F17/30265 G06F17/3028

    Abstract: A classifier development process seamlessly and intelligently integrates different forms of human feedback on instances and features into the data preparation, learning and evaluation stages. A query utility based active learning approach is applicable to different types of editorial feedback. A bi-clustering based technique may be used to further speed up the active learning process.

    SQUASHED MATRIX FACTORIZATION FOR MODELING INCOMPLETE DYADIC DATA
    9.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20100169158A1

    Publication date: 2010-07-01

    Application No.: US12346641

    Filing date: 2008-12-30

    IPC classification: G06N5/02 G06Q10/00

    Abstract: A method of predicting a response relationship between elements of two sets includes: specifying a dyadic response matrix; specifying covariates that measure additional dyadic relationships; specifying a number of row clusters and a number of column clusters for clustering the rows and columns of the response matrix; specifying a rank for cluster factors that model average interactions between row clusters and column clusters by products of cluster factors; and determining prediction parameters for predicting responses between elements of the first set and the second set by improving a likelihood value that relates the prediction parameters to the response matrix, the covariates, the observation weights, the row clusters and the column clusters. Determining the prediction parameters includes: updating the prediction parameters for fixed assignments of row clusters and column clusters, and updating assignments for row clusters and column clusters for fixed prediction parameters.
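
    The alternating scheme in the last sentence (update parameters with cluster assignments fixed, then update assignments with parameters fixed) can be illustrated with a deliberately reduced sketch. It is not the claimed method: covariates, observation weights, the low-rank cluster factors, and the likelihood are all left out. Instead, the parameters are plain block means and the objective is squared error over the observed entries. The function name `co_cluster` and the round-robin initialization are illustrative assumptions.

```python
def co_cluster(matrix, n_row_clusters, n_col_clusters, iterations=10):
    """Alternate between fitting block means and reassigning clusters.

    matrix: list of rows; None marks an unobserved response.
    Returns (row_assignments, col_assignments, block_means).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows = [i % n_row_clusters for i in range(n_rows)]  # round-robin start
    cols = [j % n_col_clusters for j in range(n_cols)]

    def block_means():
        # Parameter update for fixed assignments: mean observed response
        # in each (row cluster, column cluster) block.
        total = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
        count = [[0] * n_col_clusters for _ in range(n_row_clusters)]
        for i in range(n_rows):
            for j in range(n_cols):
                if matrix[i][j] is not None:
                    total[rows[i]][cols[j]] += matrix[i][j]
                    count[rows[i]][cols[j]] += 1
        return [[total[r][c] / count[r][c] if count[r][c] else 0.0
                 for c in range(n_col_clusters)]
                for r in range(n_row_clusters)]

    def row_error(i, r, means):
        return sum((matrix[i][j] - means[r][cols[j]]) ** 2
                   for j in range(n_cols) if matrix[i][j] is not None)

    def col_error(j, c, means):
        return sum((matrix[i][j] - means[rows[i]][c]) ** 2
                   for i in range(n_rows) if matrix[i][j] is not None)

    for _ in range(iterations):
        # Assignment updates for fixed parameters: move each row, then
        # each column, to the cluster with the smallest squared error.
        means = block_means()
        rows = [min(range(n_row_clusters),
                    key=lambda r: row_error(i, r, means))
                for i in range(n_rows)]
        means = block_means()
        cols = [min(range(n_col_clusters),
                    key=lambda c: col_error(j, c, means))
                for j in range(n_cols)]
    return rows, cols, block_means()
```

    A missing response at row `i`, column `j` is then predicted as `means[rows[i]][cols[j]]`. Like other alternating schemes, this one can stall in a poor local optimum, so the result depends on the initial assignments.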

    User interface for managing questions and answers across multiple social media data sources
    10.
    Granted invention patent
    Legal status: in force

    Publication No.: US09026916B2

    Publication date: 2015-05-05

    Application No.: US13167452

    Filing date: 2011-06-23

    Abstract: A method for managing user-generated questions and answers across multiple social media data sources can begin with the receiving of query parameters, including a user-entered question, via the user interface of a social media Q&A manager. Social media data sources can be queried for knowledge related to the user-entered question. When knowledge related to the user-entered question exists, the existing related knowledge can be organized and presented in the user interface according to a determined answer quality. When knowledge related to the user-entered question does not exist or is deemed unsatisfactory by a user, the user-entered question can be automatically submitted to applicable social media data sources by the social media Q&A manager on behalf of the user. A status of the submitted user-entered question can be monitored. When the status of the submitted user-entered question changes, the method can be re-executed at the querying step.
