-
Publication No.: US08832549B2
Publication date: 2014-09-09
Application No.: US12479850
Filing date: 2009-06-07
CPC classification: G06F17/2294, G06F17/21, G06F17/211, G06F17/212, G06F17/218, G06F17/2217, G06F17/2247, G06F17/243, G06F17/248, G06F17/2705, G06F17/28, G06F17/30011, G06K9/00456, G06K9/00463
Abstract: Some embodiments provide a method for analyzing a document that includes a number of primitive elements. The method identifies boundaries between sets of primitive elements and identifies regions bounded by the boundaries. The method uses the identified regions to define structural elements for the document. The method defines a structured document based on the primitive elements and the structural elements.
-
Publication No.: US08549399B2
Publication date: 2013-10-01
Application No.: US13109918
Filing date: 2011-05-17
IPC classification: G06F17/27
CPC classification: G06F17/2229, G06F17/212, G06F17/2241
Abstract: For a document with content that has been structured into a set of primitive areas, a novel method for performing contiguous selection of document content across different primitive areas in the document is disclosed. The method defines a contiguous section in the ordered list by identifying the first and last primitive elements of the contiguous selection. The first primitive element is identified as the primitive element that is closest in reading flow to a start selection point on the page, while the last primitive element is identified as the primitive element that is closest in reading flow to an end selection point on the page.
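The selection step described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the element tuples `(name, cx, cy)`, the function names, and the use of plain Euclidean distance to an element's center as a stand-in for "closest in reading flow" are all assumptions made for the example.

```python
from math import hypot

def nearest_index(elements, point):
    """Index of the element whose center is nearest to `point`.

    Assumption: Euclidean distance to the center approximates
    "closest in reading flow"; the abstract does not define the metric.
    """
    px, py = point
    return min(
        range(len(elements)),
        key=lambda i: hypot(elements[i][1] - px, elements[i][2] - py),
    )

def contiguous_selection(elements, start_point, end_point):
    """Names of all elements between the two anchors, in reading order.

    `elements` is a list of (name, center_x, center_y) tuples that is
    already sorted in reading order (hypothetical input format).
    """
    i = nearest_index(elements, start_point)
    j = nearest_index(elements, end_point)
    lo, hi = sorted((i, j))  # allow the user to drag backwards
    return [name for name, _, _ in elements[lo:hi + 1]]

# Two columns: "a", "b" on the left, "c", "d" on the right.
ordered_elements = [("a", 10, 10), ("b", 10, 30), ("c", 60, 10), ("d", 60, 30)]
selection = contiguous_selection(ordered_elements, (12, 28), (58, 12))
```

Because the selection is a slice of the ordered list, a drag from the bottom of the left column to the top of the right column picks up "b" and "c" without touching "a" or "d", even though all four lie inside the same bounding rectangle.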
-
Publication No.: US20120185491A1
Publication date: 2012-07-19
Application No.: US13106806
Filing date: 2011-05-12
IPC classification: G06F17/30
CPC classification: G06F17/2241, G06F17/2745, G06F17/30
Abstract: Some embodiments provide a method for analyzing a document that includes several primitive elements. The method identifies that a set of primitive elements include an implicit list in the document based on location and appearance of the set of primitive elements. The method defines the identified implicit list as an explicit list. The method stores the explicit list as a structure associated with the document.
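As a rough illustration of turning an implicit list into an explicit one, the sketch below groups consecutive text lines that share a left edge (location) and start with a bullet or number marker (appearance). The input format `(x, text)`, the marker pattern, the tolerance, and the two-item minimum are assumptions chosen for the example; the abstract does not specify any of them.

```python
import re

# Hypothetical marker pattern: "1." / "1)" numbering or "-", "*", bullet.
MARKER = re.compile(r"^(?:\d+[.)]|[-*\u2022])\s+")

def find_implicit_lists(lines, x_tolerance=1.0):
    """Return explicit lists (lists of item strings) found in `lines`.

    `lines` is a sequence of (left_x, text) pairs in reading order.
    A run of at least two consecutive marked lines whose left edges
    agree within `x_tolerance` is treated as one list.
    """
    lists, current = [], []
    for x, text in lines:
        match = MARKER.match(text)
        if match and (not current or abs(x - current[0][0]) <= x_tolerance):
            current.append((x, text[match.end():]))
            continue
        # The run ended: keep it only if it looks like a real list.
        if len(current) >= 2:
            lists.append([item for _, item in current])
        current = [(x, text[match.end():])] if match else []
    if len(current) >= 2:
        lists.append([item for _, item in current])
    return lists

page_lines = [
    (72.0, "Ingredients"),
    (90.0, "1. flour"),
    (90.0, "2. sugar"),
    (90.5, "3. eggs"),
    (72.0, "Mix well."),
    (90.0, "- lone item"),
]
explicit_lists = find_implicit_lists(page_lines)
```

The returned lists of item strings play the role of the stored explicit structure; a single marked line on its own is deliberately not promoted to a list.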
-
Publication No.: US20120182317A1
Publication date: 2012-07-19
Application No.: US13106803
Filing date: 2011-05-12
IPC classification: G09G5/00
CPC classification: G06T3/00, G06T3/0006
Abstract: Some embodiments provide a method that defines a group of associated graphic objects for display on a display device. The method defines a set of operations to perform on the associated graphic objects in a particular order. The operations include one or more transforms applied to at least one of the graphic objects. For each particular transform applied to a set of the graphic objects, each graphic object in the set has a set of parameters indicating whether the graphic object is affected by each of a set of primitive transforms of the particular transform. The method stores the set of associated graphic objects and set of operations as a single graphic object.
-
Publication No.: US20100174978A1
Publication date: 2010-07-08
Application No.: US12479847
Filing date: 2009-06-07
IPC classification: G06F17/00
CPC classification: G06F17/2294, G06F17/21, G06F17/211, G06F17/212, G06F17/218, G06F17/2217, G06F17/2247, G06F17/243, G06F17/248, G06F17/2705, G06F17/28, G06F17/30011, G06K9/00456, G06K9/00463
Abstract: Some embodiments provide a method for analyzing an unstructured document that includes a number of words. Each word is an associated set of glyphs and each glyph has location coordinates. The method identifies clusters of words based on the location coordinates. Based on the identified clusters, the method defines a set of boundary elements for the glyphs that identify a set of borders for the glyphs. The method defines a structured document for the unstructured document based on the glyphs and the defined boundary elements. To identify clusters of words, the method orders the location coordinates and identifies several partitions of the location coordinates. Each partition specifies a particular grouping of the coordinates into subsets. For each partition, the method identifies a particular set of subsets of location values that satisfy a particular set of constraints and determines a set of subsets of location values that optimizes a particular measure.
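The clustering outline in this abstract (order the coordinates, try several partitions, filter subsets by constraints, keep the best-scoring result) can be sketched in one dimension as below. The gap-threshold partitioning, the constraints `min_size` and `max_spread`, and the coverage score are hypothetical choices made for illustration; the abstract leaves the constraints and the optimized measure unspecified.

```python
def partition_by_gap(sorted_xs, max_gap):
    """Split sorted coordinates wherever neighbours differ by more than max_gap."""
    groups = [[sorted_xs[0]]]
    for x in sorted_xs[1:]:
        if x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

def best_clusters(xs, min_size=3, max_spread=20):
    """Pick the partition whose qualifying subsets cover the most coordinates.

    Assumed constraints: a subset needs at least `min_size` members and
    may span at most `max_spread` units. Assumed measure: the number of
    coordinates that fall inside qualifying subsets.
    """
    ordered = sorted(xs)  # step 1: order the coordinates
    if not ordered:
        return []
    # Each distinct neighbour gap defines one candidate partition.
    gaps = sorted({b - a for a, b in zip(ordered, ordered[1:])})
    best, best_score = [], -1
    for gap in gaps:
        groups = partition_by_gap(ordered, gap)
        kept = [g for g in groups
                if len(g) >= min_size and g[-1] - g[0] <= max_spread]
        score = sum(len(g) for g in kept)
        if score > best_score:
            best, best_score = kept, score
    return best

# Left edges of words: two aligned columns plus one stray word.
left_edges = [100, 102, 99, 501, 500, 498, 503, 720]
clusters = best_clusters(left_edges)
```

With this input the partition at gap threshold 2 wins, yielding one cluster near x = 100 and one near x = 500, while the stray coordinate at 720 is left unclustered.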
-
Publication No.: US20100174975A1
Publication date: 2010-07-08
Application No.: US12479848
Filing date: 2009-06-07
CPC classification: G06F17/2294, G06F17/21, G06F17/211, G06F17/212, G06F17/218, G06F17/2217, G06F17/2247, G06F17/243, G06F17/248, G06F17/2705, G06F17/28, G06F17/30011, G06K9/00456, G06K9/00463
Abstract: Some embodiments provide a method for analyzing an unstructured document that includes a number of glyphs. The method identifies boundaries between sets of glyphs. The method identifies that several of the boundaries form a table. The method defines a tabular structural element based on the table. The tabular structural element includes several cells arranged in a plurality of rows and columns, each of which includes an associated set of glyphs.
-
Publication No.: US08543911B2
Publication date: 2013-09-24
Application No.: US13109921
Filing date: 2011-05-17
IPC classification: G06F17/00
CPC classification: G06F17/2229, G06F17/212, G06F17/2241
Abstract: For a page that has been decomposed into a set of primitive areas, a novel method for organizing the set of primitive areas into an ordered list is disclosed. The primitive areas in the ordered list are initially sorted using start point order relation ordering, which compares the start points of the primitive areas in the coordinate system of the page. The ordering of the primitive areas in the ordered list is then refined by using contextual order relation ordering, which compares primitive areas against each other according to coordinate systems local to the primitive areas being compared. A new ordered list is then created by transposing primitive areas that are incorrectly ordered according to contextual order relation ordering.
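The two-stage ordering described in this abstract can be sketched as a sort followed by bounded passes of adjacent transpositions. Everything below is a simplified assumption for illustration: the `Area` type, the top-to-bottom start-point key, and especially `contextually_before`, which stands in for the contextual order relation and assumes every area shares the page's orientation, so that local coordinate systems coincide with the page's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    name: str
    x: float  # left edge
    y: float  # top edge, increasing downward
    w: float
    h: float

def start_point_key(area):
    """Start-point order: top-to-bottom, then left-to-right on the page."""
    return (area.y, area.x)

def contextually_before(a, b):
    """Hypothetical contextual relation for unrotated areas.

    Areas that overlap horizontally are read top to bottom (same column);
    otherwise the area further left is read first (earlier column).
    """
    overlap = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    if overlap > 0:
        return a.y <= b.y
    return a.x < b.x

def reading_order(areas):
    ordered = sorted(areas, key=start_point_key)  # initial sort
    # Refinement: transpose adjacent pairs that violate the contextual
    # relation. Passes are bounded in case the relation is not transitive.
    for _ in range(len(ordered)):
        swapped = False
        for i in range(len(ordered) - 1):
            if not contextually_before(ordered[i], ordered[i + 1]):
                ordered[i], ordered[i + 1] = ordered[i + 1], ordered[i]
                swapped = True
        if not swapped:
            break
    return ordered

two_column_page = [
    Area("B2", 50, 20, 40, 10), Area("A1", 0, 0, 40, 10),
    Area("B1", 50, 0, 40, 10), Area("A2", 0, 20, 40, 10),
]
order = [area.name for area in reading_order(two_column_page)]
```

On this two-column page the start-point sort alone gives A1, B1, A2, B2 (row by row); the transposition pass moves A2 ahead of B1, so the left column is read in full before the right one.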
-
Publication No.: US08473467B2
Publication date: 2013-06-25
Application No.: US12479852
Filing date: 2009-06-07
IPC classification: G06F17/30
CPC classification: G06F17/2294, G06F17/21, G06F17/211, G06F17/212, G06F17/218, G06F17/2217, G06F17/2247, G06F17/243, G06F17/248, G06F17/2705, G06F17/28, G06F17/30011, G06K9/00456, G06K9/00463
Abstract: Some embodiments provide a method that receives an unstructured document including a number of primitive elements. The method identifies a default set of document reconstruction operations for reconstructing the unstructured document to define a structured document. The method performs at least one of the document reconstruction operations from the default set. Based on results of the performed document reconstruction operations, the method identifies a profile for the unstructured document. The method modifies the set of document reconstruction operations for reconstructing the unstructured document according to the identified profile.
-
Publication No.: US08442998B2
Publication date: 2013-05-14
Application No.: US13106813
Filing date: 2011-05-12
IPC classification: G06F17/00
CPC classification: G06F17/30292, G06F17/218, G06F17/2217, G06F17/2247
Abstract: Some embodiments provide a method for storing a document. The method stores a content stream representation of the document that includes an ordered stream of code representations for primitive elements of the document. Each code representation of a primitive element has an index that indicates the order in the content stream of the primitive element representation. The method stores an object representation of the document that includes a set of object nodes arranged in a tree structure. Each object node references a range of indices in the content stream.
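The dual representation in this abstract, a flat content stream plus a tree of nodes that each reference an index range, might look roughly like the sketch below. The `ObjectNode` type, its half-open `[start, end)` ranges, the use of single characters as primitive elements, and the helper names are all assumptions for illustration rather than the storage format the patent defines.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    kind: str   # e.g. "document", "paragraph", "word" (hypothetical kinds)
    start: int  # first index in the content stream
    end: int    # one past the last index (half-open range)
    children: list = field(default_factory=list)

def text_of(node, stream):
    """Resolve a node to its content by slicing the shared stream."""
    return "".join(stream[node.start:node.end])

def is_well_nested(node):
    """Check that children stay inside the parent and do not overlap."""
    cursor = node.start
    for child in node.children:
        if child.start < cursor or child.end > node.end or child.start > child.end:
            return False
        if not is_well_nested(child):
            return False
        cursor = child.end
    return True

# Content stream: one entry per primitive element, position = index.
content_stream = list("Hi all.Bye")

first_word = ObjectNode("word", 0, 2)
second_word = ObjectNode("word", 3, 6)
first_paragraph = ObjectNode("paragraph", 0, 7, [first_word, second_word])
second_paragraph = ObjectNode("paragraph", 7, 10)
document = ObjectNode("document", 0, 10, [first_paragraph, second_paragraph])
```

Keeping only index ranges in the tree means the element data is stored once, in stream order, while structure (paragraphs, words) is layered on top without copying; `text_of(first_paragraph, content_stream)` simply slices the stream.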
-
Publication No.: US08352855B2
Publication date: 2013-01-08
Application No.: US12479845
Filing date: 2009-06-07
IPC classification: G06F17/00
CPC classification: G06F17/2294, G06F17/21, G06F17/211, G06F17/212, G06F17/218, G06F17/2217, G06F17/2247, G06F17/243, G06F17/248, G06F17/2705, G06F17/28, G06F17/30011, G06K9/00456, G06K9/00463
Abstract: Some embodiments provide a method for defining a selection of text in an unstructured document that includes a number of glyphs. The method identifies associated sets of glyphs and a reading order that specifies a flow of reading through the glyphs. The method displays the document. The method receives a start point and end point for a selection of text within the displayed document. The method defines a selection of text from the start point to the end point by using the identified sets of glyphs and intended flow of reading.