TY - JOUR
T1 - Detecting Unbiased Associations in Large Data Sets
AU - Liu, Chuanlu
AU - Wang, Shuliang
AU - Yuan, Hanning
AU - Liu, Xiaojia
N1 - Publisher Copyright:
© 2022, Mary Ann Liebert, Inc., publishers 2022.
PY - 2022/8/1
Y1 - 2022/8/1
AB - The maximal information coefficient (MIC) detects associations between pairs of variables, including complex relationships. It estimates the correlation by optimizing the partition of each axis. However, when a relationship is affected by certain kinds of noise, MIC may overestimate its correlation value, causing a noisy relationship to be misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why MIC assigns an abnormally high correlation value to a noisy relationship. Then, WICM is presented in two core steps. The first detects potential overestimation among relationships with high MIC values, and the second rectifies the overestimation by calculating the information coefficient mean instead of selecting only the maximum element of the characteristic matrix. Finally, experiments on functional relationships and real-world data relationships show that WICM corrects the overestimation feasibly and effectively.
KW - characteristic matrix
KW - large data set
KW - maximal information coefficient (MIC)
KW - relationship overestimation
KW - unbiased associations
KW - weighted information coefficient mean (WICM)
UR - http://www.scopus.com/pages/publications/85127552380
U2 - 10.1089/big.2021.0193
DO - 10.1089/big.2021.0193
M3 - Article
C2 - 34936492
AN - SCOPUS:85127552380
SN - 2167-6461
VL - 10
SP - 337
EP - 355
JO - Big Data
JF - Big Data
IS - 4
ER -