Isolating desired content, metadata, or both from social media

Granted invention patent

US08239425B1 Isolating desired content, metadata, or both from social media (in force)

Patent title: Isolating desired content, metadata, or both from social media
Application number: US13036776

Filing date: 2011-02-28
Publication number: US08239425B1

Publication date: 2012-08-07
Inventors: Eric B. Bell, Shawn J. Bohn, Andrew J. Cowell, Michelle L. Gregory, Eric J. Marshall, Deborah A. Payne
Applicants: Eric B. Bell, Shawn J. Bohn, Andrew J. Cowell, Michelle L. Gregory, Eric J. Marshall, Deborah A. Payne
Applicant address: Richland, WA, US
Patentee: Battelle Memorial Institute
Current patentee: Battelle Memorial Institute
Current patentee address: Richland, WA, US
Agent: Allan C. Tuan
Main classification: G06F7/00
IPC classification: G06F7/00

Abstract:

Desired content, metadata, or both can be isolated from the full content of social media websites having content-rich pages. Achieving this can include obtaining from the content-rich pages a language-independent representation having a hierarchical structure of nodes and then generating a node representation for each node. Feature vectors for the nodes are generated and a label is assigned to each node representation according to a schema. Assignment can occur by executing a trained classification algorithm on the feature vectors. The schema has schema elements and each schema element corresponds to a label. For each schema element, all node representations having matching labels are gathered and then one node representation is elected from among those with matching labels to be assigned to a schema element field in a template. The template can be applied to extract desired content, metadata, or both according to the schema from all the content-rich pages.
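
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the hierarchical representation is an HTML element tree, uses a tag-and-class path as the node representation, and substitutes a hand-written rule function, `classify`, for the trained classification algorithm. The schema (`title`, `date`, `body`), the feature set, the confidence scores, and the sample pages are all hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
SCHEMA = ("title", "date", "body")  # hypothetical schema elements


class NodeCollector(HTMLParser):
    """Flatten a page into node representations: a tree path plus own text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements
        self.nodes = []  # every element seen, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class")
        step = f"{tag}.{cls}" if cls else tag
        parent = self.stack[-1]["path"] + "/" if self.stack else ""
        node = {"path": parent + step, "tag": tag, "text": ""}
        self.stack.append(node)
        self.nodes.append(node)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            self.stack[-1]["text"] += data.strip()


def parse(html):
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def features(node):
    """Feature vector: text length, digit ratio, heading flag."""
    text = node["text"]
    digit_ratio = sum(ch.isdigit() for ch in text) / max(len(text), 1)
    return (len(text), digit_ratio, node["tag"] in {"h1", "h2"})


def classify(vec):
    """Stand-in for a trained classifier: returns (label, confidence)."""
    length, digit_ratio, is_heading = vec
    if is_heading:
        return "title", 0.9
    if digit_ratio > 0.4:
        return "date", 0.8
    if length > 40:
        return "body", min(1.0, length / 100)
    return "other", 0.5


def build_template(html):
    """Label every node, then elect one node path per schema element."""
    labeled = [(node, *classify(features(node))) for node in parse(html)]
    template = {}
    for element in SCHEMA:
        matches = [(score, node["path"])
                   for node, label, score in labeled if label == element]
        if matches:
            template[element] = max(matches)[1]  # highest confidence wins
    return template


def apply_template(template, html):
    """Extract the schema fields from another page with the same layout."""
    by_path = {}
    for node in parse(html):
        by_path.setdefault(node["path"], node["text"])
    return {element: by_path.get(path) for element, path in template.items()}


TRAIN_PAGE = """<html><body>
<div class="nav"><a>Home</a><a>About</a></div>
<div class="post"><h1>First post</h1><span class="when">2011-02-28</span>
<p>This is the body of the first post, long enough to look like real content.</p></div>
</body></html>"""

OTHER_PAGE = """<html><body>
<div class="nav"><a>Home</a><a>About</a></div>
<div class="post"><h1>Second post</h1><span class="when">2012-08-07</span>
<p>Short reply.</p></div>
</body></html>"""

template = build_template(TRAIN_PAGE)
extracted = apply_template(template, OTHER_PAGE)
```

In this sketch the template is a mapping from schema element to node path. Once built from one page, it extracts the same fields from other pages that share the layout without running the classifier again, which is why the short body on the second page is still found.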

Published/granted documents

US20120221545A1 ISOLATING DESIRED CONTENT, METADATA, OR BOTH FROM SOCIAL MEDIA Publication/grant date: 2012-08-30

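IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits, see H03K19/00)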