Automated Interpretation and Replacement of Date References in Unstructured Text
    1.
    Invention application
    Status: under examination (published)

    Publication No.: US20080154897A1

    Publication date: 2008-06-26

    Application No.: US11942127

    Filing date: 2007-11-19

    IPC classification: G06F17/30

    CPC classification: G06F17/277 G06F16/258

    Abstract: A method for interpreting date information from unstructured text includes performing phrase tokenization on the unstructured text to identify one or more temporal phrases. Word categorization is performed on the one or more temporal phrases to categorize one or more words of each temporal phrase. Grammar analysis is performed to match each temporal phrase to an understood syntax using the categorizations of the words of each temporal phrase. Each temporal phrase is interpreted based on the matched syntax.
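    The four stages named in the abstract (phrase tokenization, word categorization, grammar matching, interpretation) can be illustrated with a minimal Python sketch. The word categories, the two supported syntaxes, the regular expression, and the function names below are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
from datetime import date, timedelta

# Hypothetical word categories; the abstract does not list the real ones.
CATEGORIES = {
    "yesterday": "REL_DAY", "today": "REL_DAY", "tomorrow": "REL_DAY",
    "day": "UNIT", "days": "UNIT", "week": "UNIT", "weeks": "UNIT",
    "ago": "DIRECTION",
}
REL_DAY_OFFSET = {"yesterday": -1, "today": 0, "tomorrow": 1}
UNIT_DAYS = {"day": 1, "days": 1, "week": 7, "weeks": 7}

# Phrase tokenization: pick candidate temporal phrases out of free text.
PHRASE_RE = re.compile(
    r"\b(?:yesterday|today|tomorrow|\d+ (?:days?|weeks?) ago)\b",
    re.IGNORECASE,
)


def categorize(phrase):
    """Word categorization: tag each word of a temporal phrase."""
    tags = []
    for word in phrase.lower().split():
        tag = "NUMBER" if word.isdigit() else CATEGORIES.get(word, "OTHER")
        tags.append((word, tag))
    return tags


def interpret(phrase, reference):
    """Grammar matching and interpretation against a reference date."""
    tags = categorize(phrase)
    words = [word for word, _ in tags]
    syntax = tuple(tag for _, tag in tags)
    if syntax == ("REL_DAY",):
        return reference + timedelta(days=REL_DAY_OFFSET[words[0]])
    if syntax == ("NUMBER", "UNIT", "DIRECTION"):
        return reference - timedelta(days=int(words[0]) * UNIT_DAYS[words[1]])
    return None  # no understood syntax matched


def replace_dates(text, reference):
    """Replace each understood temporal phrase with an ISO date."""
    def _sub(match):
        resolved = interpret(match.group(0), reference)
        return resolved.isoformat() if resolved else match.group(0)
    return PHRASE_RE.sub(_sub, text)
```

    For instance, with a reference date of 2007-11-19, `replace_dates("Patient seen yesterday, symptoms began 2 weeks ago.", date(2007, 11, 19))` rewrites the two phrases as `2007-11-18` and `2007-11-05`.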

    Data De-Identification By Obfuscation
    3.
    Invention application
    Status: under examination (published)

    Publication No.: US20080240425A1

    Publication date: 2008-10-02

    Application No.: US12048361

    Filing date: 2008-03-14

    IPC classification: H04L9/28

    CPC classification: G06F21/6254

    Abstract: Medical or other data is de-identified by obfuscation. Located instances are replaced. By replacing with values in a same format and level of generality, multiple possible identifications (the replacement values and the instances not located) are provided in the data, obfuscating the original identification. By replacing as a function of a probability, the resulting data set has different instances distributed in a way making identification of the actual or original instances not located by searching more difficult.
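    A minimal Python sketch of the idea: located identifiers are replaced, with some probability, by surrogate values of the same format, so that any identifier the locator missed is indistinguishable from a surrogate. The title-based locator, the surrogate pool, and the function name `obfuscate` are assumptions for illustration and are not taken from the application.

```python
import random
import re

# Hypothetical surrogate pool, same format and generality as a surname.
SURROGATE_SURNAMES = ["Adams", "Baker", "Clark", "Davis"]

# Crude locator (an assumption): a title followed by a capitalized surname.
# Surnames that appear without a title are the "instances not located".
NAME_RE = re.compile(r"\b(Mr|Mrs|Ms|Dr)\. [A-Z][a-z]+")


def obfuscate(text, replace_prob, rng):
    """Replace each located surname with a surrogate with probability replace_prob."""
    def _sub(match):
        if rng.random() < replace_prob:
            return f"{match.group(1)}. {rng.choice(SURROGATE_SURNAMES)}"
        return match.group(0)  # left as-is by the probabilistic choice
    return NAME_RE.sub(_sub, text)


if __name__ == "__main__":
    note = "Dr. Jones referred Mr. Smith. Smith returned later."
    print(obfuscate(note, 0.8, random.Random(0)))
```

    In the output, the untitled "Smith" that the locator missed sits among surrogate surnames of the same form, which is the obfuscation effect the abstract describes.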

System and method for text tagging and segmentation using a generative/discriminative hybrid hidden Markov model
    4.
    Granted invention patent
    Status: in force

    Publication No.: US08086443B2

    Publication date: 2011-12-27

    Application No.: US12195932

    Filing date: 2008-08-21

    CPC classification: G10L15/142

    Abstract: A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters $\theta$ and discriminative parameters $\tilde{\theta}$, providing a functional $LL - C \times \mathrm{Penalty}$, where $LL$ is a log-likelihood function $LL = \log p(\theta, \tilde{\theta}) + \prod_{l=1}^{M} \left[ \log p(X_l, Y_l \mid \tilde{\theta}) - \log p(X_l \mid \tilde{\theta}) \right] + \prod_{l=1}^{M} \log p(X_l \mid \theta)$, $\mathrm{Penalty} = \sum_{y \in V_Y} \left( em_y^2 + tr_y^2 + e\tilde{m}_y^2 + t\tilde{r}_y^2 \right)$, where $em_y = 1 - \sum_{\forall x_i \in V_X} p(x_i \mid y)$, $e\tilde{m}_y = 1 - \sum_{\forall x_i \in V_X} \tilde{p}(x_i \mid y)$ are emission probability constraints, $tr_y = 1 - \sum_{\forall y_i \in V_Y} p(y_i \mid y)$, $t\tilde{r}_y = 1 - \sum_{\forall y_i \in V_Y} \tilde{p}(y_i \mid y)$ are transition probability constraints, and extracting gradients of $LL - C \times \mathrm{Penalty}$ with respect to the transition and emission probabilities and solving $\theta_k^*, \tilde{\theta}_k^*$ that maximize $LL - C \times \mathrm{Penalty}$, initializing a new iteration with $\theta_k^*, \tilde{\theta}_k^*$ and incrementing $C$ and repeating until solutions have converged, where parameters $\theta, \tilde{\theta}$ are the probabilities that a new sentence $X'$ is labeled as $Y'$.
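    The penalty term in this abstract measures how far each row of the emission and transition tables is from summing to one, for both the generative and the discriminative parameters. A minimal Python sketch of that term and of the penalized objective follows. The nested-dictionary table layout and all function names are assumptions for illustration, and the log-likelihood is taken as a precomputed number rather than derived from a corpus.

```python
def constraint_gap(table, y):
    """Return 1 - sum of p(. | y); zero when the row for state y is normalized."""
    return 1.0 - sum(table[y].values())


def penalty(emit, trans, emit_disc, trans_disc):
    """Sum over states of the squared emission and transition constraint gaps.

    Each table maps a state y to a dict of conditional probabilities given y.
    emit/trans hold generative parameters, emit_disc/trans_disc discriminative ones.
    """
    total = 0.0
    for y in trans:
        for table in (emit, trans, emit_disc, trans_disc):
            total += constraint_gap(table, y) ** 2
    return total


def penalized_objective(log_likelihood, c, emit, trans, emit_disc, trans_disc):
    """LL - C x Penalty for a given log-likelihood value and penalty weight C."""
    return log_likelihood - c * penalty(emit, trans, emit_disc, trans_disc)
```

    As C is incremented between iterations, any row that fails to sum to one costs more, which pushes the maximizer toward properly normalized probabilities.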

    System and Method for Text Tagging and Segmentation Using a Generative/Discriminative Hybrid Hidden Markov Model

    Publication No.: US20090055183A1

    Publication date: 2009-02-26

    Application No.: US12195932

    Filing date: 2008-08-21

    IPC classification: G10L15/14

    CPC classification: G10L15/142

    Abstract: A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters $\theta$ and discriminative parameters $\tilde{\theta}$, providing a functional $LL - C \times \mathrm{Penalty}$, where $LL$ is a log-likelihood function $LL = \log p(\theta, \tilde{\theta}) + \prod_{l=1}^{M} \left[ \log p(X_l, Y_l \mid \tilde{\theta}) - \log p(X_l \mid \tilde{\theta}) \right] + \prod_{l=1}^{M} \log p(X_l \mid \theta)$, $\mathrm{Penalty} = \sum_{y \in V_Y} \left( em_y^2 + tr_y^2 + e\tilde{m}_y^2 + t\tilde{r}_y^2 \right)$, where $em_y = 1 - \sum_{\forall x_i \in V_X} p(x_i \mid y)$, $e\tilde{m}_y = 1 - \sum_{\forall x_i \in V_X} \tilde{p}(x_i \mid y)$ are emission probability constraints, $tr_y = 1 - \sum_{\forall y_i \in V_Y} p(y_i \mid y)$, $t\tilde{r}_y = 1 - \sum_{\forall y_i \in V_Y} \tilde{p}(y_i \mid y)$ are transition probability constraints, and extracting gradients of $LL - C \times \mathrm{Penalty}$ with respect to the transition and emission probabilities and solving $\theta_k^*, \tilde{\theta}_k^*$ that maximize $LL - C \times \mathrm{Penalty}$, initializing a new iteration with $\theta_k^*, \tilde{\theta}_k^*$ and incrementing $C$ and repeating until solutions have converged, where parameters $\theta, \tilde{\theta}$ are the probabilities that a new sentence $X'$ is labeled as $Y'$.

    System and Method for Creating and Searching Medical Ontologies
    9.
    Invention application
    Status: in force

    Publication No.: US20090024615A1

    Publication date: 2009-01-22

    Application No.: US12172367

    Filing date: 2008-07-14

    IPC classification: G06F17/30

    Abstract: A method for creating and searching medical ontologies includes providing a semi-structured information source comprising a plurality of articles linked to each other, each article having one or more sections and each article is associated with a concept, creating a directed unlabeled graph representative of the information source, providing a plurality of labels, labeling a subset of edges, and assigning each unlabeled edge an equal probability of being assigned one of the labels. For each node, the probability of each outgoing edge is updated by smoothing each probability by an overall probability distribution of labels over all outgoing edges of each node, and the probability of each incoming edge is updated the same way. A label with a maximum probability is assigned to an edge if said maximum probability is greater than a predetermined threshold to create a labeled graph.
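    The edge-labeling procedure in this abstract can be sketched in Python as follows. The abstract does not give the smoothing formula, so this sketch assumes a linear blend (weight `alpha`) between an edge's own distribution and the mean distribution over its node's outgoing or incoming edges, and it assumes that the initially labeled edges stay fixed. The function name, parameters, and default values are hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, labels, alpha=0.5, rounds=10, threshold=0.6):
    """Label directed edges (src, dst) from a few seed labels.

    seeds maps some edges to a known label; all other edges start uniform.
    """
    uniform = 1.0 / len(labels)
    dist = {}
    for edge in edges:
        if edge in seeds:
            dist[edge] = {lab: float(lab == seeds[edge]) for lab in labels}
        else:
            dist[edge] = {lab: uniform for lab in labels}

    def smooth(node_of):
        # Group edges by shared node, then blend each with the group mean.
        groups = defaultdict(list)
        for edge in edges:
            groups[node_of(edge)].append(edge)
        for members in groups.values():
            mean = {
                lab: sum(dist[e][lab] for e in members) / len(members)
                for lab in labels
            }
            for e in members:
                if e in seeds:
                    continue  # assumption: seed labels are not updated
                dist[e] = {
                    lab: (1 - alpha) * dist[e][lab] + alpha * mean[lab]
                    for lab in labels
                }

    for _ in range(rounds):
        smooth(lambda e: e[0])  # outgoing edges share a source node
        smooth(lambda e: e[1])  # incoming edges share a target node

    labeled = {}
    for edge in edges:
        best = max(labels, key=lambda lab: dist[edge][lab])
        if dist[edge][best] > threshold:
            labeled[edge] = best
    return labeled
```

    Edges whose best label never clears the threshold are simply left out of the returned mapping, mirroring the abstract's rule that a label is assigned only when its probability exceeds a predetermined value.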

    Medical Entity Extraction From Patient Data
    10.
    Invention application
    Status: under examination (published)

    Publication No.: US20080228769A1

    Publication date: 2008-09-18

    Application No.: US12047416

    Filing date: 2008-03-13

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G16H50/20

    Abstract: Members of a medical entity class are extracted from patient data. A semi-supervised approach uses one or more initial medical terms such as terms from an ontology, for a given category or medical canonical entity. A larger set of medical terms is extracted from the medical information. In one example, the extraction is performed using lexical surface form features, rather than syntactical parsing.
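    One simple way to picture extraction from lexical surface-form features is to learn frequent word endings from a few seed terms and then collect other words in the text that share them. The Python sketch below does only that. Using suffixes as the surface feature, the suffix length, the minimum count, and the function names are all assumptions for illustration; the abstract does not state which features the application uses.

```python
import re
from collections import Counter


def learn_suffixes(seed_terms, length=4, min_count=2):
    """Collect word endings shared by at least min_count seed terms."""
    counts = Counter(
        term.lower()[-length:] for term in seed_terms if len(term) > length
    )
    return {suffix for suffix, n in counts.items() if n >= min_count}


def extract_terms(text, seed_terms, length=4, min_count=2):
    """Return words in text whose ending matches a suffix learned from the seeds."""
    suffixes = learn_suffixes(seed_terms, length, min_count)
    found = set()
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if len(word) > length and word[-length:] in suffixes:
            found.add(word)
    return found


if __name__ == "__main__":
    seeds = ["arthritis", "bronchitis", "hepatitis"]
    note = "History of gastritis and dermatitis; no fever. Prior arthritis."
    print(sorted(extract_terms(note, seeds)))
```

    Starting from three seed terms, the sketch picks up "gastritis" and "dermatitis" without any syntactic parsing, which is the kind of expansion from a small initial set that the abstract describes.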
