1. Utilizing grammatical parsing for structured layout analysis
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20060245654A1

    Publication date: 2006-11-02

    Application No.: US11119451

    Filing date: 2005-04-29

    IPC classes: G06K9/72 G06F7/00

    Abstract: Grammatical parsing is utilized to parse structured layouts that are modeled as grammars. This type of parsing provides an optimal parse tree for the structured layout based on a grammatical cost function associated with a global search. Machine learning techniques facilitate discriminatively selecting features and setting parameters in the grammatical parsing process. In one instance, labeled examples are parsed and a chart is generated. The chart is then converted into a subsequent set of labeled learning examples. Classifiers are then trained utilizing conventional machine learning and the subsequent example set. The classifiers are then employed to facilitate scoring of succedent sub-parses. A global reference grammar can also be established to facilitate completing varying tasks without requiring additional grammar learning, substantially increasing the efficiency of the structured layout analysis techniques.
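
    Illustrative sketch (not from the patent): the abstract picks the parse tree that minimizes a global grammatical cost. Assuming a hypothetical toy layout grammar in Chomsky normal form with hand-set rule costs (where the patent would use scores from trained classifiers), a CKY chart parser that keeps the cheapest sub-parse for each span and symbol shows the idea:

```python
from math import inf

# Hypothetical toy grammar in Chomsky normal form. Binary rules map
# (left_child, right_child) -> [(parent, cost)]; lexical rules map a token
# type -> [(symbol, cost)]. Lower total cost means a better parse.
BINARY = {
    ("Title", "Body"): [("Page", 0.5)],
    ("Line", "Line"): [("Body", 1.0)],
    ("Line", "Body"): [("Body", 1.0)],
}
LEXICAL = {
    "big_text": [("Title", 0.2), ("Line", 1.5)],
    "text": [("Line", 0.3)],
}


def cky_min_cost(tokens, root="Page"):
    """Return (cost, tree) of the cheapest parse rooted at `root`, or (inf, None)."""
    n = len(tokens)
    # chart[i][j] maps symbol -> (cost, tree) for the span tokens[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for sym, cost in LEXICAL.get(tok, []):
            chart[i][i + 1][sym] = (cost, (sym, tok))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # every split point of the span
                for lsym, (lcost, ltree) in chart[i][k].items():
                    for rsym, (rcost, rtree) in chart[k][j].items():
                        for parent, rule_cost in BINARY.get((lsym, rsym), []):
                            total = lcost + rcost + rule_cost
                            # keep only the cheapest sub-parse per (span, symbol)
                            if total < cell.get(parent, (inf, None))[0]:
                                cell[parent] = (total, (parent, ltree, rtree))
    return chart[0][n].get(root, (inf, None))
```

    Calling `cky_min_cost(["big_text", "text", "text"])` returns a cost of about 2.3 with the tree `("Page", ("Title", "big_text"), ("Body", ("Line", "text"), ("Line", "text")))`. The chart built along the way plays the role of the chart mentioned in the abstract.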

2. Continuous inference for sequence data
    Type: invention patent application
    Status: in force

    Publication No.: US20070282538A1

    Publication date: 2007-12-06

    Application No.: US11421585

    Filing date: 2006-06-01

    IPC classes: G06F19/00

    CPC classes: G06F19/22

    Abstract: Dynamic inference is leveraged to provide online sequence data labeling. This provides real-time alternatives to current methods of inference for sequence data. Instances estimate an amount of uncertainty in a prediction of labels of sequence data and then dynamically predict a label when an uncertainty in the prediction is deemed acceptable. The techniques utilized to determine when the label can be generated are tunable and can be personalized for a given user and/or a system. Employed decoding techniques can be dynamically adjusted to trade off system resources for accuracy. This allows for fine tuning of a system based on available system resources. Instances also allow for online inference because the inference does not require knowledge of a complete set of sequence data.
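
    Illustrative sketch (not from the patent): the abstract estimates the uncertainty of each pending label and emits the label once that uncertainty is acceptable. The sketch below assumes a discrete hidden Markov model with known parameters, computes each pending position's posterior by forward-backward over the observations seen so far, and uses a fixed `threshold` plus a hypothetical `max_lag` cap as the tunable criterion:

```python
def _normalize(v):
    s = sum(v)
    return [x / s for x in v]


def online_label(obs_stream, start, trans, emit, threshold=0.9, max_lag=5):
    """Yield (position, label) as soon as each label is confident enough.

    start[s]: initial probability; trans[i][j] = P(next=j | current=i);
    emit[s][o] = P(observation o | state s). A pending label is committed when
    its posterior, given all observations so far, reaches `threshold`, or is
    forced once it lags `max_lag` steps behind the newest observation.
    """
    obs, post, committed = [], [], 0
    n = len(start)
    for o in obs_stream:
        obs.append(o)
        t_len = len(obs)
        # Scaled forward pass over the observations seen so far.
        alpha = [_normalize([start[s] * emit[s][obs[0]] for s in range(n)])]
        for t in range(1, t_len):
            prev = alpha[-1]
            alpha.append(_normalize([
                sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)]))
        # Scaled backward pass.
        beta = [[1.0] * n for _ in range(t_len)]
        for t in range(t_len - 2, -1, -1):
            beta[t] = _normalize([
                sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                for i in range(n)])
        post = [_normalize([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(t_len)]
        # Commit pending labels in order while they are confident enough.
        while committed < t_len:
            p = post[committed]
            best = max(range(n), key=p.__getitem__)
            if p[best] >= threshold or t_len - 1 - committed >= max_lag:
                yield committed, best
                committed += 1
            else:
                break
    # End of stream: flush any remaining labels from the final posteriors.
    for pos in range(committed, len(obs)):
        yield pos, max(range(n), key=post[pos].__getitem__)
```

    For clarity the posteriors are recomputed from scratch at every step, which is quadratic in the stream length; an incremental update would be the natural refinement. Raising `threshold` delays labels in exchange for accuracy, which loosely mirrors the tunable trade-off the abstract describes.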

3. Extracting data from semi-structured information utilizing a discriminative context free grammar
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20060245641A1

    Publication date: 2006-11-02

    Application No.: US11119467

    Filing date: 2005-04-29

    IPC classes: G06K9/62 G06F17/27 G06K9/46

    Abstract: A discriminative grammar framework utilizing a machine learning algorithm is employed to facilitate learning scoring functions for parsing unstructured information. The framework includes a discriminative context free grammar that is trained based on features of an example input. The flexibility of the framework allows information features and/or features output by arbitrary processes to be utilized as the example input as well. Myopic inside scoring is circumvented in the parsing process because contextual information is utilized to facilitate scoring function training.
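
    Illustrative sketch (not from the patent): the abstract avoids "myopic inside scoring" by letting a constituent's score use context outside the span it covers. Assuming hypothetical feature templates and a binary perceptron as a stand-in for the unspecified machine learning algorithm, a linear span scorer with inside and context features looks like this:

```python
from collections import defaultdict


def span_features(tokens, i, j, label):
    """Hypothetical features for labelling tokens[i:j] as `label`."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[j] if j < len(tokens) else "</s>"
    inside = [f"{label}:first={tokens[i]}", f"{label}:len={j - i}"]
    # Context features look outside the span; an inside-only score cannot.
    context = [f"{label}:left={left}", f"{label}:right={right}"]
    return inside + context


def score(weights, feats):
    return sum(weights[f] for f in feats)


def perceptron_train(examples, epochs=5):
    """examples: (tokens, i, j, label, is_correct) span decisions.

    Mistake-driven training so correct spans score above 0, wrong ones below.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, i, j, label, is_correct in examples:
            feats = span_features(tokens, i, j, label)
            y = 1 if is_correct else -1
            if y * score(weights, feats) <= 0:
                for f in feats:
                    weights[f] += y
    return weights
```

    Trained on `(["tel:", "555", "1234"], 1, 3, "Phone", True)` and `(["fax:", "555", "1234"], 1, 3, "Phone", False)`, the scorer separates the two spans even though their inside features are identical; only the left-context feature differs.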

4. Continuous inference for sequence data
    Type: granted invention patent
    Status: in force

    Publication No.: US07551784B2

    Publication date: 2009-06-23

    Application No.: US11421585

    Filing date: 2006-06-01

    IPC classes: G06K9/00 G06K9/62

    CPC classes: G06F19/22

    Abstract: Dynamic inference is leveraged to provide online sequence data labeling. This provides real-time alternatives to current methods of inference for sequence data. Instances estimate an amount of uncertainty in a prediction of labels of sequence data and then dynamically predict a label when an uncertainty in the prediction is deemed acceptable. The techniques utilized to determine when the label can be generated are tunable and can be personalized for a given user and/or a system. Employed decoding techniques can be dynamically adjusted to trade off system resources for accuracy. This allows for fine tuning of a system based on available system resources. Instances also allow for online inference because the inference does not require knowledge of a complete set of sequence data.

5. Automatically generating training data
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20110314011A1

    Publication date: 2011-12-22

    Application No.: US12818377

    Filing date: 2010-06-18

    IPC classes: G06F17/30

    CPC classes: G06F16/951

    Abstract: Computer-readable media, computer systems, and computing devices facilitate generating binary classifier and entity extractor training data. Seed URLs are selected and URL patterns within the seed URLs are identified. Matching URLs in a data structure are identified and corresponding queries and their associated weights are added to a potential training data set from which training data is selected.
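
    Illustrative sketch (not from the patent): the steps in the abstract (select seed URLs, identify URL patterns, find matching URLs in a data structure, collect weighted queries, select training data) can be sketched as below. The pattern definition (host plus first path segment), the in-memory list of (query, url, weight) click-log records, and top-k selection are all assumptions:

```python
from collections import Counter
from urllib.parse import urlparse


def url_pattern(url):
    """Hypothetical pattern: lower-cased host plus the first path segment."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    return (parts.netloc.lower(), segments[0] if segments else "")


def generate_training_data(seed_urls, click_log, k=3):
    """Return up to k (query, weight) pairs whose URL matches a seed pattern.

    click_log: iterable of (query, url, weight) records.
    """
    patterns = {url_pattern(u) for u in seed_urls}
    candidates = Counter()  # potential training set: query -> summed weight
    for query, url, weight in click_log:
        if url_pattern(url) in patterns:
            candidates[query] += weight
    return candidates.most_common(k)
```

    With the seed `https://example.com/recipes/pasta`, clicks on other `example.com/recipes/...` pages contribute their queries, while a click on `example.com/weather/...` does not.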

9. Implicit geolocation of social networking users
    Type: granted invention patent
    Status: in force

    Publication No.: US08972570B1

    Publication date: 2015-03-03

    Application No.: US13588956

    Filing date: 2012-08-17

    IPC classes: G06F15/16 H04L12/26

    Abstract: In one embodiment, one or more computing systems receive a request for a location prediction for a user from a service. The computing systems access one or more real-time location signals and one or more aggregated location signals, generate one or more location predictions from the one or more real-time location signals and the one or more aggregated location signals, and calculate a single location prediction for the user from the one or more location predictions. The computing systems then transmit, in response to the request, the single location prediction for the user to the requesting service.
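
    Illustrative sketch (not from the patent): the last step of the abstract reduces several location predictions to one. Assuming each real-time or aggregated signal yields a (city, confidence) prediction and each signal type has a hypothetical fixed weight, a weighted vote gives a single prediction:

```python
from collections import defaultdict

# Hypothetical per-signal weights (real-time GPS trusted most in this sketch).
SIGNAL_WEIGHTS = {"gps": 3.0, "ip": 1.0, "checkins": 2.0, "friends": 1.5}


def single_location_prediction(predictions):
    """predictions: list of (signal_name, city, confidence in [0, 1]).

    Returns (city, share of total weighted vote), or None if there is no input.
    """
    totals = defaultdict(float)
    for signal, city, confidence in predictions:
        totals[city] += SIGNAL_WEIGHTS.get(signal, 1.0) * confidence
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())
```

    A learned combiner or a distance-based merge of coordinates would be equally plausible; the vote is used here only because it is short.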

10. Learning A* priority function from unlabeled data
    Type: invention patent application
    Status: in force

    Publication No.: US20080256007A1

    Publication date: 2008-10-16

    Application No.: US11786006

    Filing date: 2007-04-10

    IPC classes: G06F15/18

    CPC classes: G06K9/6297 G06N99/005

    Abstract: A technique for increasing efficiency of inference of structure variables (e.g., an inference problem) using a priority-driven algorithm rather than conventional dynamic programming. The technique employs a probable approximate underestimate which can be used to compute a probable approximate solution to the inference problem when used as a priority function (“a probable approximate underestimate function”) for a more computationally complex classification function. The probable approximate underestimate function can have a functional form of a simpler, easier to decode model. The model can be learned from unlabeled data by solving a linear/quadratic optimization problem. The priority function can be computed quickly, and can result in solutions that are substantially optimal. Using the priority function, computation efficiency of a classification function (e.g., discriminative classifier) can be increased using a generalization of the A* algorithm.
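
    Illustrative sketch (not from the patent): priority-driven decoding expands partial solutions in order of cost so far plus an estimate of the remaining cost. The sketch below applies A* to a label trellis with hypothetical cost tables and swaps in an exact admissible heuristic (the sum of the cheapest emission cost at each remaining position) for the learned "probable approximate underestimate"; it assumes all costs are non-negative and the sequence is non-empty:

```python
import heapq


def astar_decode(emit_cost, trans_cost):
    """Return (cost, labels) of the cheapest label sequence in a trellis.

    emit_cost[t][s]: cost of label s at position t.
    trans_cost[a][b]: cost of moving from label a to label b.
    """
    n, n_labels = len(emit_cost), len(emit_cost[0])
    # h[t] underestimates the cost of positions t..n-1 (cheapest emissions,
    # transitions ignored), so it is admissible and consistent here.
    h = [0.0] * (n + 1)
    for t in range(n - 1, -1, -1):
        h[t] = h[t + 1] + min(emit_cost[t])
    # Heap entries: (priority g + h, cost so far g, next position, path)
    heap = [(emit_cost[0][s] + h[1], emit_cost[0][s], 1, (s,)) for s in range(n_labels)]
    heapq.heapify(heap)
    closed = {}  # (position, last label) -> cheapest cost already expanded
    while heap:
        _, g, t, path = heapq.heappop(heap)
        if t == n:
            return g, list(path)  # first complete path popped is optimal
        key = (t, path[-1])
        if key in closed and closed[key] <= g:
            continue
        closed[key] = g
        for s in range(n_labels):
            g2 = g + trans_cost[path[-1]][s] + emit_cost[t][s]
            heapq.heappush(heap, (g2 + h[t + 1], g2, t + 1, path + (s,)))
    return None
```

    Because this heuristic never overestimates, the first complete sequence popped is optimal. With a learned priority function that only probably underestimates, as in the abstract, the same loop would still run but would lose that guarantee.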
