Using source-channel models for word segmentation
    31.
    Granted invention patent (status: in force)

    Publication number: US07493251B2

    Publication date: 2009-02-17

    Application number: US10448644

    Filing date: 2003-05-30

    CPC classification number: G06F17/2755 G06F17/277

    Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
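
    The abstract describes choosing a sequence of entity types by combining a model of how likely each entity-type sequence is with models of how likely each entity type is to produce a given run of characters. The sketch below shows that source-channel search as a Viterbi-style dynamic program. The two entity types (LW for lexicon word, PN for person name), all probability tables, and the function name `segment` are hypothetical toy choices, not the patent's actual models, and the organization-name pass mentioned in the abstract is not shown.

```python
import math

# Hypothetical toy models. Entity types: "LW" (lexicon word), "PN" (person name).
# Source model: bigram over entity types, P(type_i | type_{i-1}).
TYPE_BIGRAM = {
    ("<s>", "LW"): 0.6, ("<s>", "PN"): 0.4,
    ("LW", "LW"): 0.5, ("LW", "PN"): 0.3, ("LW", "</s>"): 0.2,
    ("PN", "LW"): 0.7, ("PN", "PN"): 0.1, ("PN", "</s>"): 0.2,
}
# Channel models: P(characters | entity type), also hypothetical.
CHANNEL = {
    "LW": {"在": 0.2, "北京": 0.2, "大学": 0.2, "北京大学": 0.2, "学生": 0.2},
    "PN": {"王小明": 0.5, "李华": 0.5},
}


def segment(text, max_len=4):
    """Find the entity-type sequence maximising P(types) * P(chars | types);
    the character spans of that sequence are the segmentation."""
    n = len(text)
    # best[i][t] = (log prob, backpointer) of the best analysis of text[:i] ending in type t
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(n):
        for prev_type, (score, _) in best[i].items():
            for j in range(i + 1, min(n, i + max_len) + 1):
                chars = text[i:j]
                for etype, table in CHANNEL.items():
                    p_chan = table.get(chars, 0.0)
                    p_src = TYPE_BIGRAM.get((prev_type, etype), 0.0)
                    if p_chan == 0.0 or p_src == 0.0:
                        continue
                    cand = score + math.log(p_src) + math.log(p_chan)
                    if etype not in best[j] or cand > best[j][etype][0]:
                        best[j][etype] = (cand, (i, prev_type, chars))
    # Close each complete analysis with the end-of-sentence transition.
    finals = [
        (score + math.log(TYPE_BIGRAM[(t, "</s>")]), t)
        for t, (score, _) in best[n].items()
        if (t, "</s>") in TYPE_BIGRAM
    ]
    if not finals:
        return None  # no analysis covers the whole string
    _, etype = max(finals)
    out, pos = [], n
    while best[pos][etype][1] is not None:  # follow backpointers to the start
        i, prev_type, chars = best[pos][etype][1]
        out.append((chars, etype))
        pos, etype = i, prev_type
    return out[::-1]


print(segment("王小明在北京大学"))  # [('王小明', 'PN'), ('在', 'LW'), ('北京大学', 'LW')]
```

    Because each span's score multiplies a source probability by a channel probability, the one-word reading 北京大学 (0.5 × 0.2) beats the two-word reading 北京 + 大学 ((0.5 × 0.2)²) under these toy numbers.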

    QUERY SPELLER
    32.
    Patent application

    Publication number: US20080046405A1

    Publication date: 2008-02-21

    Application number: US11465023

    Filing date: 2006-08-16

    CPC classification number: G06F17/3064

    Abstract: Candidate suggestions for correcting misspelled query terms input into a search application are automatically generated. A score for each candidate suggestion can be generated using a first decoding pass and paths through the suggestions can be ranked in a second decoding pass. Candidate suggestions can be generated based on typographical errors, phonetic mistakes and/or compounding mistakes. Furthermore, a ranking model can be developed to rank candidate suggestions to be presented to a user.
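
    The abstract outlines generating candidate corrections for each query term (from typographical, phonetic, or compounding mistakes), scoring each candidate in a first decoding pass, and ranking whole paths through the candidates in a second pass. The sketch below illustrates that two-pass shape under stated assumptions: the toy vocabulary and counts, the single-edit typo model, the fixed error penalty, the interpolation weights, and all function names are hypothetical; phonetic candidates are omitted; and the second pass is a plain bigram Viterbi search, which may differ from the patent's decoder.

```python
import math

# Hypothetical toy vocabulary with unigram and bigram counts (illustrative only).
UNIGRAM = {"new": 50, "york": 30, "pizza": 20, "piazza": 5, "hotel": 25, "hostel": 10}
BIGRAM = {("new", "york"): 25, ("york", "pizza"): 8, ("york", "hotel"): 6}
TOTAL = sum(UNIGRAM.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"
ERROR_LP = math.log(0.01)  # assumed log-probability of one typo or one split


def edits1(word):
    """Strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def candidates(term):
    """Map each suggestion (a tuple of words) to its error log-probability."""
    cands = {}
    if term in UNIGRAM:
        cands[(term,)] = 0.0  # the term as typed
    for w in edits1(term) & set(UNIGRAM):  # typographical errors
        cands.setdefault((w,), ERROR_LP)
    for i in range(1, len(term)):  # compounding errors: split run-together words
        if term[:i] in UNIGRAM and term[i:] in UNIGRAM:
            cands.setdefault((term[:i], term[i:]), ERROR_LP)
    return cands or {(term,): 0.0}  # nothing found: keep the original term


def unigram_lp(w):
    return math.log((UNIGRAM.get(w, 0) + 1) / (TOTAL + len(UNIGRAM)))  # add-one


def bigram_lp(prev, w):
    """Bigram interpolated with the unigram; the 0.7/0.3 weights are assumptions."""
    p_bi = BIGRAM.get((prev, w), 0) / UNIGRAM.get(prev, 1)
    return math.log(0.7 * p_bi + 0.3 * math.exp(unigram_lp(w)))


def lm_lp(prev, words):
    """LM log-probability of `words`, conditioning the first on `prev` if given."""
    total = 0.0
    for w in words:
        total += unigram_lp(w) if prev is None else bigram_lp(prev, w)
        prev = w
    return total


def correct(query, beam=3):
    # Pass 1: score every candidate in isolation and keep the best `beam` per term.
    lattice = []
    for term in query.lower().split():
        scored = sorted(
            ((err + lm_lp(None, ws), ws, err) for ws, err in candidates(term).items()),
            reverse=True,
        )
        lattice.append([(ws, err) for _, ws, err in scored[:beam]])
    # Pass 2: Viterbi over the pruned lattice; state = last word (bigram context).
    layer = {None: (0.0, [])}
    for cands in lattice:
        nxt = {}
        for last, (score, words) in layer.items():
            for ws, err in cands:
                s = score + err + lm_lp(last, ws)
                if ws[-1] not in nxt or s > nxt[ws[-1]][0]:
                    nxt[ws[-1]] = (s, words + list(ws))
        layer = nxt
    return " ".join(max(layer.values())[1])


print(correct("newyork piza"))  # new york pizza
```

    Keying each Viterbi state on the last word is sufficient here because a bigram model only looks one word back; a higher-order model would need a longer history in the state.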

    Unsupervised training for overlapping ambiguity resolution in word segmentation
    33.
    Patent application (status: under examination, published)

    Publication number: US20050060150A1

    Publication date: 2005-03-17

    Application number: US10662502

    Filing date: 2003-09-15

    Applicants: Mu Li, Jianfeng Gao

    Inventors: Mu Li, Jianfeng Gao

    CPC classification number: G06F17/2863 G06F17/2775

    Abstract: A method is provided for resolving overlapping ambiguity strings in unsegmented languages such as Chinese. The methodology includes segmenting sentences into two possible segmentations and recognizing overlapping ambiguity strings in the sentences. One of the two possible segmentations is selected as a function of probability information. The probability information is derived from unsupervised training data. A method of constructing a knowledge base containing the probability information needed to select one of the segmentations is also provided.
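
    The abstract says each sentence is segmented two ways and an overlapping ambiguity string is recognized where the two disagree, with the choice made from probabilities learned without labelled data. One common way to obtain two such segmentations is forward and backward maximum matching (FMM/BMM); the sketch below assumes that, and further assumes the "knowledge base" is simply word counts gathered from raw sentences on which FMM and BMM agree, with add-one smoothing. The lexicon, corpus, and function names are hypothetical, and the patent's actual probability model may be richer (for example, using context around the ambiguous string).

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would use a full dictionary.
LEXICON = {"结合", "成分", "分子", "成", "子", "分"}
MAX_LEN = max(len(w) for w in LEXICON)


def fmm(text):
    """Forward maximum matching: take the longest lexicon word starting at i."""
    words, i = [], 0
    while i < len(text):
        j = min(len(text), i + MAX_LEN)
        while j > i + 1 and text[i:j] not in LEXICON:
            j -= 1  # shrink the window; a single character is always accepted
        words.append(text[i:j])
        i = j
    return words


def bmm(text):
    """Backward maximum matching: take the longest lexicon word ending at j."""
    words, j = [], len(text)
    while j > 0:
        i = max(0, j - MAX_LEN)
        while i < j - 1 and text[i:j] not in LEXICON:
            i += 1
        words.append(text[i:j])
        j = i
    return words[::-1]


def train(raw_sentences):
    """Unsupervised knowledge base: word counts from sentences where FMM == BMM."""
    counts = Counter()
    for sentence in raw_sentences:
        seg = fmm(sentence)
        if seg == bmm(sentence):
            counts.update(seg)
    return counts


def log_prob(words, counts):
    """Add-one smoothed unigram log-probability of a segmentation."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + len(LEXICON))) for w in words)


def resolve(text, counts):
    """Treat an FMM/BMM disagreement as an ambiguity; keep the likelier segmentation."""
    forward, backward = fmm(text), bmm(text)
    if forward == backward:
        return forward  # no ambiguity detected
    return max((forward, backward), key=lambda seg: log_prob(seg, counts))


counts = train(["分子结合", "成分", "分子", "结合"])
print(resolve("结合成分子", counts))  # ['结合', '成', '分子']
```

    In this toy run, 结合成分子 ("combine into molecules") is an overlapping case: FMM yields 结合/成分/子 and BMM yields 结合/成/分子, and the counts learned from the unlabelled sentences favour the BMM reading.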
