METHOD AND DEVICE FOR RECOGNIZING STOP WORD
    3.
    Invention publication
    Status: under examination (published)

    Publication number: EP3232336A1

    Publication date: 2017-10-18

    Application number: EP15909502.5

    Filing date: 2015-12-01

    IPC classification: G06F17/30

    Abstract: The present application relates to the field of computer technologies, and in particular, to a stop word identification method used in an information retrieval system. In a stop word identification method, after a first query input by a user is acquired, a second query that belongs to the same session as the first query is acquired, and a stop word in the first query is identified according to a change-based feature of each word in the first query relative to the second query. According to the solution provided by the present application, a stop word in a query can be identified more accurately, and efficiency and precision of an information retrieval system are improved.
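    The abstract does not say how the change-based feature is computed, so the following is only a minimal sketch under stated assumptions: a word is treated as a candidate stop word when it is usually dropped in other queries of the same session that still share at least one word with the first query. The function names, the whitespace tokenization, and the `min_drop_ratio` parameter are hypothetical and are not taken from the application.

```python
from collections import Counter


def change_features(first_query, session_queries):
    """For each word of first_query, count how often it is kept or dropped
    in the other queries of the same session (assumed feature definition)."""
    words = list(dict.fromkeys(first_query.lower().split()))  # unique, ordered
    kept, dropped = Counter(), Counter()
    for other in session_queries:
        other_words = set(other.lower().split())
        # Assumption: only queries sharing a word count as reformulations.
        if not other_words & set(words):
            continue
        for word in words:
            if word in other_words:
                kept[word] += 1
            else:
                dropped[word] += 1
    return {w: {"kept": kept[w], "dropped": dropped[w]} for w in words}


def candidate_stop_words(first_query, session_queries, min_drop_ratio=0.5):
    """Return words of first_query that are dropped in at least
    min_drop_ratio of the overlapping session queries."""
    result = []
    for word, feat in change_features(first_query, session_queries).items():
        total = feat["kept"] + feat["dropped"]
        if total and feat["dropped"] / total >= min_drop_ratio:
            result.append(word)
    return result


session = ["install python package", "python package install windows"]
print(candidate_stop_words("how to install the python package", session))
```

    In this example the words "how", "to" and "the" disappear in both reformulations while the content words survive, so only those three are flagged.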


    EFFICIENT ENTITY DATA ATTRIBUTION
    4.
    Invention publication
    Status: under examination (published)

    Publication number: EP3176710A1

    Publication date: 2017-06-07

    Application number: EP16201733.9

    Filing date: 2016-12-01

    IPC classification: G06F17/30

    Abstract: Systems and methods for using disparate data sets to attribute data to an entity are disclosed. Disparate data sets can be obtained from a variety of data sources. The disclosed systems and methods can obtain a first and second data set. Trajectories can represent multiple data records in a data set associated with an entity. Trajectories from the obtained data sets can be used to associate data stored among the various data sets. The association can be based on the agreement between the trajectories. The associated data records can further be used to associate the entities related to the associated data records.
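    One way to picture trajectory agreement is sketched below. It assumes, purely for illustration, that each record is an (entity, time slot, location) tuple and that agreement is the fraction of shared time slots in which two trajectories report the same location. The record layout, the `agreement` measure, and the `threshold` parameter are assumptions; the abstract does not define them.

```python
def build_trajectories(records):
    """Group (entity_id, time_slot, location) records into one
    {time_slot: location} trajectory per entity."""
    trajectories = {}
    for entity_id, time_slot, location in records:
        trajectories.setdefault(entity_id, {})[time_slot] = location
    return trajectories


def agreement(traj_a, traj_b):
    """Fraction of shared time slots in which both trajectories report
    the same location (assumed agreement measure)."""
    shared = traj_a.keys() & traj_b.keys()
    if not shared:
        return 0.0
    return sum(traj_a[s] == traj_b[s] for s in shared) / len(shared)


def associate(first_set, second_set, threshold=0.8):
    """Link each entity of the first data set to the best-agreeing entity
    of the second data set, if the agreement reaches the threshold."""
    first = build_trajectories(first_set)
    second = build_trajectories(second_set)
    links = {}
    for entity_a, traj_a in first.items():
        best = max(second, key=lambda b: agreement(traj_a, second[b]), default=None)
        if best is not None and agreement(traj_a, second[best]) >= threshold:
            links[entity_a] = best
    return links


cards = [("card_1", 9, "cafe"), ("card_1", 12, "gym"), ("card_2", 9, "mall")]
phones = [("phone_A", 9, "cafe"), ("phone_A", 12, "gym"),
          ("phone_B", 9, "mall"), ("phone_B", 12, "office")]
print(associate(cards, phones))
```

    A real system would need a tolerance for near-matches in time and space; exact equality is used here only to keep the sketch short.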


    ASCRIBING ACTIONABLE ATTRIBUTES TO DATA THAT DESCRIBES A PERSONAL IDENTITY
    5.
    Invention publication
    Status: under examination (published)

    Publication number: EP2558988A4

    Publication date: 2016-12-21

    Application number: EP11769597

    Filing date: 2011-04-14

    IPC classification: G06Q10/00 G06F17/30

    CPC classification: G06F17/30536 G06F17/30448

    Abstract: There is provided a method that includes (a) receiving an inquiry to initiate a search for data for a specific individual, (b) determining, based on the inquiry, a strategy and flexible predictiveness equations to search a reference database, (c) searching the reference database, in accordance with the strategy, for a match to the inquiry, and (d) outputting the match. The method may also output flexible feedback related to the match that reflects the inferred quality of the match experience, which an end user can use to determine the degree to which the matched entity meets that end user's quality-based criteria. There is also provided a system that performs the method, and a storage medium that contains instructions that control a processor to perform the method.


    DATA MINING METHOD
    6.
    Invention publication
    Status: under examination (published)

    Publication number: EP3082051A1

    Publication date: 2016-10-19

    Application number: EP14869820.2

    Filing date: 2014-12-10

    IPC classification: G06F17/30

    Abstract: The present invention proposes a method for data mining, the method comprising: computing statistics for the feature vector of each target object according to the records in a target data set so as to constitute a rough data set, each feature vector including the value of at least one attribute of the corresponding target object; screening from the rough data set the feature vectors that correspond to all known target objects of the first type, and performing a filter operation on the screened feature vectors to obtain samples; and building a regression model based on the samples, and then using the regression model to determine whether each known target object of the second type potentially belongs to the first type. The disclosed method is capable of mining and classifying the target objects according to their comprehensive features.


    LIGHTWEIGHT TABLE COMPARISON
    7.
    Invention publication
    Status: under examination (published)

    Publication number: EP3070620A1

    Publication date: 2016-09-21

    Application number: EP16159212.6

    Filing date: 2016-03-08

    IPC classification: G06F17/30

    Abstract: A system, method and computer program product for enabling lightweight, high-accuracy (high-confidence) comparison of tables where one is a copy of the other, which copy may be kept synchronized by replication. The method performs database comparison using a sample-based, statistics-based, or materialized-query-table-based approach. The method first identifies a block comprising a subset of rows of data of a source database table and a corresponding block from a target database table, and obtains a statistical value associated with each block. Then the statistical values for the corresponding source and target blocks are compared, and a consistency evaluation of the source and target databases is determined based on the comparison results. Further methods enable a determination of the data as being persistent or not, in a manner that accounts for real-time data modifications to the underlying source and target database tables while identified blocks are being compared.
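    The block-level comparison can be illustrated with a short sketch. It assumes that rows are tuples whose first column is an integer primary key, that blocks are fixed key ranges, and that the per-block statistical value is a CRC-32 checksum over the rows. These choices, and the names `block_checksums`, `mismatched_blocks` and `keys_per_block`, are assumptions made for illustration and are not taken from the application. The persistence check for in-flight replication changes is not covered.

```python
import zlib


def block_checksums(rows, keys_per_block):
    """Compute one CRC-32 per block, where a block holds all rows whose
    integer primary key (first column) falls in the same key range."""
    checksums = {}
    for row in sorted(rows):
        block_id = row[0] // keys_per_block
        previous = checksums.get(block_id, 0)
        # Chain the running CRC so the value covers every row of the block.
        checksums[block_id] = zlib.crc32(repr(row).encode("utf-8"), previous)
    return checksums


def mismatched_blocks(source_rows, target_rows, keys_per_block=100):
    """Return the ids of blocks whose checksums differ between the source
    table and the target table, including blocks missing on one side."""
    source = block_checksums(source_rows, keys_per_block)
    target = block_checksums(target_rows, keys_per_block)
    return sorted(b for b in source.keys() | target.keys()
                  if source.get(b) != target.get(b))


source = [(1, "a"), (2, "b"), (101, "c"), (205, "d")]
target = [(1, "a"), (2, "b"), (101, "X"), (205, "d")]
print(mismatched_blocks(source, target))
```

    Only the block containing key 101 is reported, so a follow-up row-by-row comparison can be limited to that key range instead of scanning both tables.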


    ASSOCIATING RELATED RECORDS TO COMMON ENTITIES ACROSS MULTIPLE LISTS
    8.
    Invention publication
    Status: under examination (published)

    Publication number: EP3035214A1

    Publication date: 2016-06-22

    Application number: EP15200073.3

    Filing date: 2015-12-15

    IPC classification: G06F17/30

    Abstract: In relation to associating records across lists, wherein the lists include a plurality of records and the plurality of records is associated with a respective entity, a system and method are provided. In accordance with some embodiments, the systems and methods further comprise grouping one or more records from a first list into a first group based on fields of the records in the first list, grouping one or more records from a second list into a second group based on fields of the records in the second list, pairing a record from the first group with a record from the second group, assessing each pair of records based on an evaluation of the respective pair according to fields of the pair, and associating records from the first group and records of the second group with an entity based on the assessment.
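    The group, pair, assess, and associate steps can be sketched as follows. The sketch assumes records are dictionaries, that grouping uses exact equality on one field, and that a pair is assessed by averaging string similarity over the compared fields with Python's `difflib.SequenceMatcher`. The field names, the scoring rule, and the `threshold` value are illustrative assumptions, not details given in the abstract.

```python
from difflib import SequenceMatcher
from itertools import product


def group_by(records, field):
    """Group records that share the same value of `field`."""
    groups = {}
    for record in records:
        groups.setdefault(record[field], []).append(record)
    return groups


def pair_score(rec_a, rec_b, fields):
    """Average string similarity of the compared fields, in [0, 1]."""
    ratios = [SequenceMatcher(None, rec_a[f].lower(), rec_b[f].lower()).ratio()
              for f in fields]
    return sum(ratios) / len(ratios)


def associate_records(list1, list2, group_field, compare_fields, threshold=0.85):
    """Pair records only within matching groups and keep the pairs whose
    score reaches the threshold; each kept pair is taken as one entity."""
    groups1, groups2 = group_by(list1, group_field), group_by(list2, group_field)
    matches = []
    for key in groups1.keys() & groups2.keys():
        for rec_a, rec_b in product(groups1[key], groups2[key]):
            if pair_score(rec_a, rec_b, compare_fields) >= threshold:
                matches.append((rec_a["id"], rec_b["id"]))
    return sorted(matches)


list1 = [{"id": "a1", "name": "Acme Corp", "zip": "10001"},
         {"id": "a2", "name": "Globex", "zip": "94105"}]
list2 = [{"id": "b1", "name": "ACME Corp.", "zip": "10001"},
         {"id": "b2", "name": "Initech", "zip": "10001"},
         {"id": "b3", "name": "Globex", "zip": "60601"}]
print(associate_records(list1, list2, "zip", ["name"]))
```

    Grouping first keeps the number of compared pairs small, at the cost of missing matches whose grouping field differs, as with the two "Globex" records here.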


    SYSTEMS AND METHODS FOR ANONYMIZED USER LIST COUNTS
    10.
    Invention publication
    Status: under examination (published)

    Publication number: EP2930646A2

    Publication date: 2015-10-14

    Application number: EP15161869.1

    Filing date: 2015-03-31

    IPC classification: G06F21/62

    Abstract: A computer system includes a database configured to receive a query and to produce a list of user IDs, and an anonymization module. The anonymization module is configured to receive a list of user IDs in response to a query, the list of user IDs defining a true user count, generate a noisy user count of the list of user IDs, compare the true user count to a first threshold value stored in memory, compare the noisy user count to a second threshold value stored in memory, and output the noisy user count only if the true user count is greater than the first threshold value and the noisy user count is greater than the second threshold value.
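    The two-threshold rule can be written down directly. The sketch below assumes Gaussian noise rounded to an integer, because the abstract does not name a noise distribution. The function name, the `noise_scale` parameter, and the use of `None` to signal a suppressed output are likewise hypothetical.

```python
import random


def anonymized_count(user_ids, first_threshold, second_threshold,
                     noise_scale=2.0, rng=None):
    """Return a noisy user count, or None when either threshold check fails.

    The true count must exceed first_threshold and the noisy count must
    exceed second_threshold; otherwise nothing is output.
    """
    if rng is None:
        rng = random.Random()
    true_count = len(set(user_ids))
    # Assumed noise model: zero-mean Gaussian, rounded to a whole user.
    noisy_count = true_count + round(rng.gauss(0, noise_scale))
    if true_count > first_threshold and noisy_count > second_threshold:
        return noisy_count
    return None


many_users = [f"user_{i}" for i in range(1000)]
print(anonymized_count(many_users, first_threshold=10, second_threshold=10))
print(anonymized_count(["user_1", "user_2"], first_threshold=10, second_threshold=10))
```

    Checking the true count as well as the noisy count prevents a very small list from being reported just because the added noise happened to push it over the second threshold.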
