Methods, apparatus and computer program products for information retrieval and document classification utilizing a multidimensional subspace
    1.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06701305B1

    Publication date: 2004-03-02

    Application No.: US09693114

    Application date: 2000-10-20

    IPC classification: G06F17/00

    Abstract: Methods, apparatus and computer program products are provided for retrieving information from a text data collection and for classifying a document into none, one or more of a plurality of predefined classes. In each aspect, a representation of at least a portion of the original matrix is projected into a lower dimensional subspace, and those portions of the subspace representation that relate to the term(s) of the query are weighted following the projection into the lower dimensional subspace. To retrieve the documents that are most relevant to a query, the documents are then scored, with documents having better scores generally being of greater relevance. Alternatively, to classify a document, the relationship of the document to the classes of documents is scored, and the document is then classified in those classes, if any, that have the best scores.
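
The retrieval step described in this abstract can be sketched roughly as follows. This is only one possible reading, not the patented method: the orthonormal subspace basis is supplied by hand (in practice it would presumably come from a truncated decomposition of the term-by-document matrix), and the names `project_documents`, `score_documents` and the toy data are hypothetical. The query-term weights are applied only after the documents have been projected.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_documents(basis, term_doc):
    """Return, for each document column, its coordinates in the subspace."""
    n_docs = len(term_doc[0])
    columns = [[row[d] for row in term_doc] for d in range(n_docs)]
    return [[dot(b, col) for b in basis] for col in columns]

def score_documents(basis, coords, query_weights):
    """Weight the query-term entries of the projected representation, then sum."""
    scores = []
    for doc_coords in coords:
        score = 0.0
        for term_index, weight in query_weights.items():
            # Value of this term for this document after projection.
            projected = sum(c * b[term_index] for c, b in zip(doc_coords, basis))
            score += weight * projected
        scores.append(score)
    return scores

# Toy term-by-document matrix (rows: terms, columns: documents). Assumed data.
terms = ["car", "auto", "engine", "flower"]
term_doc = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 3.0],
]

# Hand-picked orthonormal basis in term space (assumption, not computed here).
s = 1.0 / math.sqrt(3.0)
basis = [[s, s, s, 0.0], [0.0, 0.0, 0.0, 1.0]]

coords = project_documents(basis, term_doc)
scores = score_documents(basis, coords, {terms.index("car"): 1.0})
ranking = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
print(scores, ranking)
```

In this toy example the second document never contains "car", yet it scores the same as the first because both share a subspace direction; the third document scores zero.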

    Spatial data compression using implicit geometry
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08525835B1

    Publication date: 2013-09-03

    Application No.: US12711931

    Application date: 2010-02-24

    IPC classification: G06T15/10

    Abstract: A plurality of data from a first coordinate system is transformed into a plurality of metadata, each metadata comprising a location identifier and a value summarizing the number of data points in the first coordinate system associated with a corresponding location in a second coordinate system identified by the location identifier. A metadata is formed only when a non-zero value is assigned to a location.
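
As a rough illustration of the idea in this abstract, the sketch below bins 2D points into grid cells and keeps a (location identifier, count) record only for cells that received at least one point. The regular grid, the `cell_size` parameter and the function name `compress_points` are assumptions made for the example; the abstract does not specify the second coordinate system.

```python
from collections import Counter

def compress_points(points, cell_size):
    """Map (x, y) points to grid cells and count the points per cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    # The Counter only holds cells that received a point, so locations
    # with a zero value are never materialized.
    return sorted(counts.items())

points = [(0.2, 0.3), (0.7, 0.1), (5.5, 5.5), (9.9, 0.0)]
metadata = compress_points(points, cell_size=1.0)
print(metadata)
```

The empty cells are implied by their absence, which is where the saving comes from when the data is sparse.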

    Text summarization method and apparatus using a multidimensional subspace
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07831597B2

    Publication date: 2010-11-09

    Application No.: US11417196

    Application date: 2006-05-04

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30719 Y10S707/917

    Abstract: A text summarizer identifies relevant terms in a document, weights the terms and extracts one or more segments to produce a summary or abstract. The various terms in a particular document are weighted in relation to an existing document collection. A term weight computer computes term weights for terms in the document, and a threshold comparator compares the term weights to determine if the corresponding terms are relevant to the document collection. Next, a term weight summer adds the term weights for each occurrence of each relevant term in the various segments of the document, and a summation comparator compares the summations to identify a text summarization segment representative of the document. Optionally, relevant terms can be highlighted in the text summarization segment.
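
The pipeline in this abstract (term weights, threshold comparison, per-segment summation, comparison of sums) might be sketched as below. The weighting formula is an assumption: a smoothed inverse document frequency stands in for whatever weighting the patent uses, sentences stand in for segments, and all function names are hypothetical.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def term_weights(document_terms, collection):
    """Assumed weighting: smoothed inverse document frequency in the collection."""
    n_docs = len(collection)
    doc_sets = [set(tokenize(d)) for d in collection]
    weights = {}
    for term in set(document_terms):
        df = sum(term in s for s in doc_sets)
        weights[term] = math.log((1 + n_docs) / (1 + df))
    return weights

def summarize(document, collection, threshold):
    # Split into sentence-like segments (assumption: one segment per sentence).
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    weights = term_weights(tokenize(document), collection)
    # Threshold comparator: keep only terms whose weight exceeds the threshold.
    relevant = {t: w for t, w in weights.items() if w > threshold}
    # Term weight summer: add the weight for every occurrence in each segment.
    sums = [sum(relevant.get(t, 0.0) for t in tokenize(seg)) for seg in segments]
    # Summation comparator: pick the segment with the largest sum.
    best = max(range(len(segments)), key=sums.__getitem__)
    return segments[best], relevant

collection = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the tree",
]
document = "The cat sat on the mat. The quantum radar tracked the satellite. The dog sat."
summary, relevant = summarize(document, collection, threshold=0.5)
print(summary)
```

The returned `relevant` mapping could also drive the optional highlighting step mentioned at the end of the abstract.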

    Sub-wavelength ultrasound characterization of composite material
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07584062B1

    Publication date: 2009-09-01

    Application No.: US12143662

    Application date: 2008-06-20

    IPC classification: G01B5/02

    Abstract: An ultrasonic stimulus pulse is emitted incident to a laminar structure and recorded as pulse data. Echoes resulting from the stimulus pulse are recorded as echo data. One or more vectors are derived by way of time-shifting the recorded pulse data by respective amounts and a matrix Φ is defined including the one or more vectors. An echo vector Y is defined using the recorded echo data. A solution vector X is determined in accordance with: Y=Φ*X, typically within a predetermined tolerance. B-scan display or other analysis of one or more distinct solution vectors enables user and/or automated identification and measurement of any anomalies within the laminate material.
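
A minimal sketch of the relation Y=Φ*X from this abstract follows. Each column of Φ is the recorded pulse delayed by a different number of samples, and X is found by a plain gradient (Landweber) iteration that stops once the residual falls within a tolerance. The choice of solver is an assumption, since the abstract does not name one, and the pulse, shifts and function names are made up for illustration.

```python
def shift_matrix(pulse, n_rows, n_shifts):
    """Build Phi: column k is the recorded pulse delayed by k samples."""
    phi = [[0.0] * n_shifts for _ in range(n_rows)]
    for k in range(n_shifts):
        for i, p in enumerate(pulse):
            if k + i < n_rows:
                phi[k + i][k] = p
    return phi

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def solve_echo(phi, y, tol=1e-8, max_iter=100000):
    """Iterate x <- x + step * Phi^T (y - Phi x) until ||y - Phi x|| <= tol."""
    n_cols = len(phi[0])
    # 1 / (squared Frobenius norm) is a conservative step that guarantees convergence.
    step = 1.0 / sum(v * v for row in phi for v in row)
    x = [0.0] * n_cols
    for _ in range(max_iter):
        residual = [yi - pi for yi, pi in zip(y, matvec(phi, x))]
        if sum(r * r for r in residual) ** 0.5 <= tol:
            break
        grad = [sum(phi[i][j] * residual[i] for i in range(len(phi)))
                for j in range(n_cols)]
        x = [xj + step * gj for xj, gj in zip(x, grad)]
    return x

# Synthetic data: two reflectors at delays 1 and 4 (assumed values).
pulse = [1.0, 0.5]
phi = shift_matrix(pulse, n_rows=6, n_shifts=5)
true_x = [0.0, 2.0, 0.0, 0.0, -1.0]
echo = matvec(phi, true_x)
estimate = solve_echo(phi, echo)
print([round(v, 6) for v in estimate])
```

The non-zero entries of the solution vector mark the delays at which reflections occur, which is the information a B-scan display would then present.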

    Text summarization method and apparatus using a multidimensional subspace
    6.
    Type: Invention patent application
    Status: In force

    Publication No.: US20070118518A1

    Publication date: 2007-05-24

    Application No.: US11417196

    Application date: 2006-05-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30719 Y10S707/917

    Abstract: A text summarizer identifies relevant terms in a document, weights the terms and extracts one or more segments to produce a summary or abstract. The various terms in a particular document are weighted in relation to an existing document collection. A term weight computer computes term weights for terms in the document, and a threshold comparator compares the term weights to determine if the corresponding terms are relevant to the document collection. Next, a term weight summer adds the term weights for each occurrence of each relevant term in the various segments of the document, and a summation comparator compares the summations to identify a text summarization segment representative of the document. Optionally, relevant terms can be highlighted in the text summarization segment.

    Query-based text summarization
    7.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07752204B2

    Publication date: 2010-07-06

    Application No.: US11281499

    Application date: 2005-11-18

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30719 Y10S707/917

    Abstract: A text summarizer identifies relevant terms in a document, weights the terms and extracts one or more segments to produce a summary or abstract. The various terms in a particular document are weighted in relation to an existing document collection. A term weight computer computes term weights for terms in the document, and a threshold comparator compares the term weights to determine if the corresponding terms are relevant to the document collection. Next, a term weight summer adds the term weights for each occurrence of each relevant term in the various segments of the document, and a summation comparator compares the summations to identify a text summarization segment representative of the document. Optionally, relevant terms can be highlighted in the text summarization segment.

    Text differentiation methods, systems, and computer program products for content analysis
    8.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07403932B2

    Publication date: 2008-07-22

    Application No.: US11173600

    Application date: 2005-07-01

    IPC classification: G06N5/00

    CPC classification: G06F17/2211 G06F17/30719

    Abstract: Provided are improved methods, apparatus, and computer program products for text differentiation, which involves identifying differences between documents with similar content, not merely similar terms, and generating results. Text differentiation provides the ability to find non-similar, or different, content hidden within documents whose overall content is similar but not exactly the same. Text differentiation may be used to quickly identify key differences between similar documents.

    Streaming text data mining method & apparatus using multidimensional subspaces
    9.
    Type: Invention patent application
    Status: In force

    Publication No.: US20070083509A1

    Publication date: 2007-04-12

    Application No.: US11246195

    Application date: 2005-10-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30705 G06F17/30616

    Abstract: A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.
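
The comparator logic in this abstract could look roughly like the sketch below. It assumes the precomputed subspace is given as an orthonormal basis over the collection vocabulary and uses the cosine between a term frequency vector and its projection as the relationship value; the abstract does not say which measure is used, so that choice, the threshold of 0.8 and the function names are all assumptions.

```python
import math

def project(basis, vec):
    """Project vec onto the span of the orthonormal basis vectors."""
    coords = [sum(b * v for b, v in zip(bvec, vec)) for bvec in basis]
    return [sum(c * bvec[i] for c, bvec in zip(coords, basis))
            for i in range(len(vec))]

def relationship(basis, vec):
    """Cosine between vec and its projection (1.0: vec lies in the subspace)."""
    proj = project(basis, vec)
    norm_v = math.sqrt(sum(v * v for v in vec))
    norm_p = math.sqrt(sum(p * p for p in proj))
    # For an orthonormal basis the cosine reduces to |proj| / |vec|.
    return norm_p / norm_v if norm_v else 0.0

def handle(basis, collection, vec, threshold=0.8):
    """Add vec to the collection if it is related; report the decision."""
    if relationship(basis, vec) >= threshold:
        collection.append(vec)
        return True
    return False

# Four-term vocabulary; the subspace spans the first two terms (toy assumption).
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
collection = []
decisions = [
    handle(basis, collection, [3.0, 1.0, 0.0, 0.0]),  # lies in the subspace
    handle(basis, collection, [0.0, 0.0, 2.0, 5.0]),  # orthogonal to it
    handle(basis, collection, [1.0, 0.0, 1.0, 0.0]),  # cosine about 0.707
]
print(decisions, len(collection))
```

Unrelated documents are simply dropped here; a fuller version might store them separately, as the abstract allows.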

    Text differentiation methods, systems, and computer program products for content analysis
    10.
    Type: Invention patent application
    Status: In force

    Publication No.: US20070022072A1

    Publication date: 2007-01-25

    Application No.: US11173600

    Application date: 2005-07-01

    IPC classification: G06N5/00 G06F17/00

    CPC classification: G06F17/2211 G06F17/30719

    Abstract: Provided are improved methods, apparatus, and computer program products for text differentiation, which involves identifying differences between documents with similar content, not merely similar terms, and generating results. Text differentiation provides the ability to find non-similar, or different, content hidden within documents whose overall content is similar but not exactly the same. Text differentiation may be used to quickly identify key differences between similar documents.
