Patent application
US20120209844A1 EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM
Legal status: in force
- Patent title: EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM
- Application number: US13413893
- Filing date: 2012-03-07
- Publication number: US20120209844A1
- Publication date: 2012-08-16
- Inventors: Yunyao Li, Frederick R. Reiss, David E. Simmen, Suresh Thalamati
- Applicants: Yunyao Li, Frederick R. Reiss, David E. Simmen, Suresh Thalamati
- Applicant address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee address: US NY Armonk
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A data mashup system with information extraction capabilities receives multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components. The extracted data components are tagged to generate structured data components, and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream.
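The flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how annotators are expressed, so the sketch models each one as a regular expression with a tag name, and the names `Annotator`, `REPOSITORY`, `annotate`, and `mashup` are hypothetical rather than taken from the application.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Annotator:
    """Hypothetical annotator: a tag name plus a regex for the component to extract."""
    tag: str
    pattern: str


# Hypothetical repository of stored annotators, keyed by name.
REPOSITORY: Dict[str, Annotator] = {
    "phone": Annotator("phone", r"\b\d{3}-\d{4}\b"),
    "email": Annotator("email", r"[\w.]+@[\w.]+\.\w+"),
}


def annotate(text: str, annotators: Iterable[Annotator]) -> str:
    """Replace each matched unstructured span with a tagged version of itself."""
    for a in annotators:
        # Bind the tag as a default argument so each lambda uses its own tag.
        text = re.sub(
            a.pattern,
            lambda m, t=a.tag: f"<{t}>{m.group(0)}</{t}>",
            text,
        )
    return text


def mashup(streams: Dict[str, List[str]], annotators: Iterable[Annotator]) -> List[dict]:
    """Annotate every record of every stream and combine them into one output list."""
    annotators = list(annotators)  # allow reuse across records
    output = []
    for source, records in streams.items():
        for record in records:
            output.append({"source": source, "text": annotate(record, annotators)})
    return output


if __name__ == "__main__":
    streams = {
        "feed_a": ["Call 555-1234 today"],
        "feed_b": ["Write to ann@example.com now"],
    }
    for row in mashup(streams, REPOSITORY.values()):
        print(row)
```

Annotators are applied one after another in this sketch, so a later pattern could in principle match text inside a tag inserted by an earlier one. A fuller design would collect the match spans first and rewrite the text in a single pass.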