METHOD FOR DETERMINING INTERACTION SITES BETWEEN BIOSEQUENCES

    Publication No.: US20210304842A1

    Publication date: 2021-09-30

    Application No.: US17345699

    Filing date: 2021-06-11

    IPC classification: G16B20/30 G16B20/00 G16B40/00

    Abstract: A method and system for determining interaction sites between biosequences is described herein. A dataset of contact data for a plurality of biomolecule pairs is obtained to account for their frequency of occurrence. Statistical weights are obtained for each frequency of occurrence. A statistical vector space (SRV) is decomposed through principal component decomposition. The r-vectors of the SRV are re-projected back to a new SRV with a new set of SR coordinates. A feature vector is generated and inputted into a predictor for outputting a likelihood of an interaction site. A method and system for determining significant attribute-value associations (AVAs) from relational datasets is also described. A frequency of occurrence of attribute-value pairs and statistical weights may be obtained for each frequency of occurrence. Principal component decomposition and re-projection of AVA vectors may also be performed. The disentangled SR of AVAs could be used to identify AVAs related to subgroups/classes.
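    The pipeline in this abstract (frequency counts, statistical weights, principal component decomposition, re-projection) can be illustrated with a minimal sketch. It is not the patented implementation. The abstract does not define the statistical weight, so the adjusted standardized residual of a contingency table is assumed here, and the function names `adjusted_residuals` and `disentangle` and the toy counts are hypothetical.

```python
import numpy as np

def adjusted_residuals(counts):
    """Adjusted standardized residuals of a contingency table.

    Assumption: this stands in for the 'statistical weights' of the
    abstract, which does not give a formula.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # row totals, shape (r, 1)
    col = counts.sum(axis=0, keepdims=True)   # column totals, shape (1, c)
    expected = row @ col / n                  # expected counts under independence
    variance = expected * (1 - row / n) * (1 - col / n)
    return (counts - expected) / np.sqrt(variance)

def disentangle(sr):
    """Decompose the SR matrix and re-project its rows onto each component.

    Returns eigenvalues (descending), principal components (columns), and
    one re-projected matrix per component, in the original coordinates.
    """
    centered = sr - sr.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (sr.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project every row vector on one component, then map back to SR coordinates.
    reprojected = [np.outer(sr @ v, v) for v in eigvecs.T]
    return eigvals, eigvecs, reprojected

# Toy frequency table: rows and columns are hypothetical residue types.
counts = np.array([[30, 5, 5], [4, 25, 6], [6, 5, 28]])
sr = adjusted_residuals(counts)
eigvals, pcs, rsrv = disentangle(sr)
```

    Because the principal components form an orthonormal basis, the re-projected matrices sum back to the original SR matrix, which gives a quick sanity check on the decomposition.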

    SYSTEM AND METHOD FOR DETERMINING DATA PATTERNS USING DATA MINING

    Publication No.: US20200301949A1

    Publication date: 2020-09-24

    Application No.: US16823627

    Filing date: 2020-03-19

    IPC classification: G06F16/28 G06K9/62

    Abstract: A system and method for processing relational datasets are provided. The method may include: retrieving a relational dataset containing a plurality of entities and a plurality of attribute values; constructing an entity address table, based on the relational dataset, wherein the entity address table contains the plurality of attribute values, and each of the plurality of attribute values is associated with one or more entity addresses in the relational dataset; generating a frequency table, based on the entity address table, wherein the frequency table contains one or more cardinality values; generating an SR vector space (SRV) table comprising a plurality of SR values for a plurality of pairs of attribute values; generating principal components (PCs) and their corresponding re-projected SRVs (RSRVs) through disentangling the SRV into a plurality of disentangled spaces (DS); selecting from the plurality of DS, a subset of DS; and generating one or more patterns based on the plurality of DS.
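    The first two steps of this method, the entity address table and the frequency table, can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: entity addresses are taken to be row indices, cardinality values are taken to be sizes of address-set intersections, and the names `build_address_table` and `build_frequency_table` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_address_table(records):
    """Map each (attribute, value) pair to the set of row indices holding it."""
    table = defaultdict(set)
    for address, record in enumerate(records):
        for attribute, value in record.items():
            table[(attribute, value)].add(address)
    return dict(table)

def build_frequency_table(address_table):
    """Count co-occurrences of attribute values taken from different attributes.

    Each count is the cardinality of the intersection of two address sets.
    """
    freq = {}
    for a, b in combinations(sorted(address_table), 2):
        if a[0] != b[0]:  # skip pairs drawn from the same attribute
            freq[(a, b)] = len(address_table[a] & address_table[b])
    return freq

# Hypothetical relational dataset: one dict per entity.
records = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "round"},
    {"color": "red", "shape": "round"},
]
address_table = build_address_table(records)
frequency_table = build_frequency_table(address_table)
```

    Storing addresses instead of raw counts means any joint frequency can be obtained later by intersecting sets, without rescanning the dataset.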

    Method for determining interaction sites between biosequences

    Publication No.: US11923047B2

    Publication date: 2024-03-05

    Application No.: US17345699

    Filing date: 2021-06-11

    Abstract: A method and system for determining interaction sites between biosequences is described herein. A dataset of contact data for a plurality of biomolecule pairs is obtained to account for their frequency of occurrence. Statistical weights are obtained for each frequency of occurrence. A statistical vector space (SRV) is decomposed through principal component decomposition. The r-vectors of the SRV are re-projected back to a new SRV with a new set of SR coordinates. A feature vector is generated and inputted into a predictor for outputting a likelihood of an interaction site. A method and system for determining significant attribute-value associations (AVAs) from relational datasets is also described. A frequency of occurrence of attribute-value pairs and statistical weights may be obtained for each frequency of occurrence. Principal component decomposition and re-projection of AVA vectors may also be performed. The disentangled SR of AVAs could be used to identify AVAs related to subgroups/classes.

    Aligning and clustering sequence patterns to reveal classificatory functionality of sequences

    Publication No.: US10354745B2

    Publication date: 2019-07-16

    Application No.: US14784978

    Filing date: 2014-04-17

    Abstract: A system and method of discovering sequence patterns with variations is provided. The method includes: accessing or acquiring a data set including a family of sequences or related families of sequences; a) applying a pattern discovery process to the sequences; b) grouping and aligning the similar patterns that may have different lengths into one or more Aligned Pattern Clusters; c) discovering the co-occurrence relation between Aligned Patterns and/or Aligned Pattern Clusters to reveal the distal function between segments represented by the Aligned Pattern Clusters; and d) breaking down an Aligned Pattern Cluster into sub-clusters with a stable cluster configuration that reveals sub-clusters with distinct and shared characteristics among sub-families of the sequences.
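    Step c), the co-occurrence relation between patterns, can be illustrated with a small sketch. The abstract does not specify a co-occurrence measure, so a Jaccard score over the sequences containing each pattern is assumed here. Patterns are matched as exact substrings, which ignores the variations that Aligned Pattern Clusters are meant to capture. The function name `cooccurrence` and the sample sequences are hypothetical.

```python
from itertools import combinations

def cooccurrence(sequences, patterns):
    """Jaccard co-occurrence score for every pair of patterns.

    A pattern 'occurs' in a sequence if it is an exact substring
    (a simplification; no variation or alignment is handled).
    """
    hits = {p: {i for i, s in enumerate(sequences) if p in s} for p in patterns}
    scores = {}
    for a, b in combinations(patterns, 2):
        union = hits[a] | hits[b]
        scores[(a, b)] = len(hits[a] & hits[b]) / len(union) if union else 0.0
    return scores

# Hypothetical protein-like sequences and patterns.
sequences = ["MKTAYIAKQR", "MKTWYIAKQL", "GGTAYPPKQR", "MKTAYPPKQP"]
patterns = ["MKT", "KQ", "TAY"]
scores = cooccurrence(sequences, patterns)
```

    A high score for two patterns that sit far apart in the sequences is the kind of signal the abstract associates with a distal relation between segments.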

    ALIGNING AND CLUSTERING SEQUENCE PATTERNS TO REVEAL CLASSIFICATORY FUNCTIONALITY OF SEQUENCES

    Type: Invention application

    Status: Under examination (published)

    Publication No.: US20160070854A1

    Publication date: 2016-03-10

    Application No.: US14784978

    Filing date: 2014-04-17

    IPC classification: G06F19/22 G06F19/24

    CPC classification: G16B30/00 G16B40/00

    Abstract: A system and method of discovering sequence patterns with variations is provided. The method includes: accessing or acquiring a data set including a family of sequences or related families of sequences; a) applying a pattern discovery process to the sequences; b) grouping and aligning the similar patterns that may have different lengths into one or more Aligned Pattern Clusters; c) discovering the co-occurrence relation between Aligned Patterns and/or Aligned Pattern Clusters to reveal the distal function between segments represented by the Aligned Pattern Clusters; and d) breaking down an Aligned Pattern Cluster into sub-clusters with a stable cluster configuration that reveals sub-clusters with distinct and shared characteristics among sub-families of the sequences.