METHOD FOR DETERMINING INTERACTION SITES BETWEEN BIOSEQUENCES

    Publication No.: US20210304842A1

    Publication Date: 2021-09-30

    Application No.: US17345699

    Filing Date: 2021-06-11

    IPC Classification: G16B20/30 G16B20/00 G16B40/00

    Abstract: A method and system for determining interaction sites between biosequences is described herein. A dataset of contact data for a plurality of biomolecule pairs is obtained to account for their frequency of occurrence. Statistical weights are obtained for each frequency of occurrence. A statistical vector space (SRV) is decomposed through principal component decomposition. The r-vectors of the SRV are re-projected back to a new SRV with a new set of SR coordinates. A feature vector is generated and input into a predictor, which outputs a likelihood of an interaction site. A method and system for determining significant attribute-value associations (AVAs) from relational datasets is also described. A frequency of occurrence of attribute-value pairs, and a statistical weight for each frequency of occurrence, may be obtained. Principal component decomposition and re-projection of AVA vectors may also be performed. The disentangled SR of AVAs can be used to identify AVAs related to subgroups/classes.
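
The decomposition and re-projection steps in the abstract above can be sketched as follows. This is a minimal illustration, not the patented procedure: the abstract does not give the formula for the statistical weights, so Pearson standardized residuals are assumed here, and the function names, the 4x4 contact-count table, and the use of NumPy are all hypothetical choices made for the example. The feature-vector and predictor steps are omitted.

```python
import numpy as np

def standardized_residuals(counts):
    """Pearson standardized residuals of a contingency table.

    ASSUMPTION: the abstract only says 'statistical weights'; this
    particular residual is one plausible choice, not the stated one.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total          # counts expected under independence
    return (counts - expected) / np.sqrt(expected)

def decompose_and_reproject(srv, k):
    """Eigendecompose the covariance of the SR matrix and re-project
    every row (r-vector) onto each of the top-k principal components."""
    centered = srv - srv.mean(axis=0)
    cov = centered.T @ centered / (srv.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # keep the k largest
    pcs = eigvecs[:, order]                     # shape (n_features, k)
    # One re-projected space per PC: the rank-1 part of the data along it.
    spaces = [np.outer(centered @ pcs[:, j], pcs[:, j]) for j in range(k)]
    return eigvals[order], pcs, spaces

# Hypothetical contact counts between four residue types (made-up numbers).
counts = np.array([[30, 5, 2, 8],
                   [5, 20, 6, 1],
                   [2, 6, 25, 4],
                   [8, 1, 4, 15]])
srv = standardized_residuals(counts)
eigvals, pcs, spaces = decompose_and_reproject(srv, k=2)
print("variance captured by each PC:", eigvals)
print("re-projected SR values along PC1:\n", spaces[0])
```

Each entry of `spaces` has the same shape as the original SR matrix, so a value in it can be read as the part of an association that lies along a single principal component.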

    SYSTEM AND METHOD FOR DETERMINING DATA PATTERNS USING DATA MINING

    Publication No.: US20200301949A1

    Publication Date: 2020-09-24

    Application No.: US16823627

    Filing Date: 2020-03-19

    IPC Classification: G06F16/28 G06K9/62

    Abstract: A system and method for processing relational datasets are provided. The method may include: retrieving a relational dataset containing a plurality of entities and a plurality of attribute values; constructing an entity address table, based on the relational dataset, wherein the entity address table contains the plurality of attribute values, and each of the plurality of attribute values is associated with one or more entity addresses in the relational dataset; generating a frequency table, based on the entity address table, wherein the frequency table contains one or more cardinality values; generating an SR vector space table comprising a plurality of SR values for a plurality of pairs of attribute values; generating PCs and their corresponding RSRVs by disentangling the SRV into a plurality of disentangled spaces (DS); selecting, from the plurality of DS, a subset of DS; and generating one or more patterns based on the plurality of DS.
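
The first three table-building steps in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: entity addresses are taken to be row indices, the cardinality values are taken to be sizes of address-set intersections, and the SR value is assumed to be a Pearson standardized residual. The function names and the toy dataset are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_address_table(rows):
    """Map each (attribute, value) pair to the set of row indices
    (assumed to play the role of 'entity addresses') that hold it."""
    table = defaultdict(set)
    for addr, row in enumerate(rows):
        for attr, value in row.items():
            table[(attr, value)].add(addr)
    return dict(table)

def build_frequency_table(address_table):
    """Co-occurrence count of every pair of attribute values drawn from
    different attributes: the cardinality of their address intersection."""
    freq = {}
    for a, b in combinations(sorted(address_table), 2):
        if a[0] != b[0]:                      # skip pairs within one attribute
            freq[(a, b)] = len(address_table[a] & address_table[b])
    return freq

def build_sr_table(address_table, freq, n_entities):
    """ASSUMED SR formula: (observed - expected) / sqrt(expected)."""
    sr = {}
    for (a, b), observed in freq.items():
        expected = len(address_table[a]) * len(address_table[b]) / n_entities
        sr[(a, b)] = (observed - expected) / math.sqrt(expected)
    return sr

# Hypothetical relational dataset: four entities, two attributes.
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "green", "shape": "long"},
    {"color": "red", "shape": "long"},
]
addresses = build_address_table(rows)
freq = build_frequency_table(addresses)
sr = build_sr_table(addresses, freq, len(rows))
for pair, value in sr.items():
    print(pair, round(value, 3))
```

Storing addresses as sets means each co-occurrence count is a single set intersection, so the frequency table can be built from the address table without rescanning the dataset.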