Detecting Unbiased Associations in Large Data Sets

Chuanlu Liu, Shuliang Wang*, Hanning Yuan, Xiaojia Liu

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

7 Citations (Scopus)

Abstract

The maximal information coefficient (MIC) measures associations between pairs of variables, including complex relationships. It estimates the correlation by optimizing the partition of the axes. However, when a relationship is affected by certain kinds of noise, MIC may overestimate the correlation value, so that a noisy relationship is misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why a noisy relationship is assigned an abnormal correlation value. Then, WICM is presented in two core steps. The first detects potential overestimation among relationships with high values, and the second rectifies the overestimation by calculating the information coefficient mean instead of simply selecting the maximum element of the characteristic matrix. Finally, experiments on functional relationships and real-world data show that WICM resolves the overestimation feasibly and effectively.
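
The contrast between taking the maximum of the characteristic matrix and taking a mean over it can be illustrated with a rough sketch. This is not the authors' WICM: the abstract does not specify the weighting scheme or the overestimation-detection step, so the code below uses a plain unweighted mean. It also replaces MIC's optimized partition with fixed equal-frequency grids, and assumes the commonly used grid budget of n^0.6 cells. The function names `normalized_mi` and `characteristic_values` are hypothetical.

```python
import numpy as np


def normalized_mi(x, y, nx, ny):
    """Normalized mutual information of (x, y) on an nx-by-ny grid.

    Assumption: equal-frequency bins stand in for the optimized
    partition that MIC searches for.
    """
    x_edges = np.quantile(x, np.linspace(0, 1, nx + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, ny + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p = counts / counts.sum()              # joint distribution over cells
    px = p.sum(axis=1, keepdims=True)      # marginal of x, shape (nx, 1)
    py = p.sum(axis=0, keepdims=True)      # marginal of y, shape (1, ny)
    nonzero = p > 0
    mi = np.sum(p[nonzero] * np.log2(p[nonzero] / (px @ py)[nonzero]))
    return mi / np.log2(min(nx, ny))       # scale into [0, 1]


def characteristic_values(x, y, alpha=0.6):
    """Entries of a simplified characteristic matrix, flattened.

    Grids are limited to nx * ny <= n ** alpha cells (assumed budget).
    """
    budget = len(x) ** alpha
    values = []
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            values.append(normalized_mi(x, y, nx, ny))
    return np.array(values)


# Usage: compare the max-based score with the unweighted mean on a
# noisy linear relationship (synthetic data, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
y = x + rng.normal(scale=0.2, size=500)
values = characteristic_values(x, y)
max_score = values.max()     # MIC-like: single largest entry
mean_score = values.mean()   # mean over all entries (unweighted)
```

Because the mean aggregates every grid resolution, a single grid that happens to fit the noise cannot dominate the score the way it can under the maximum. How WICM weights the entries and decides when to apply the correction is described in the article itself.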

Original language: English
Pages (from-to): 337-355
Number of pages: 19
Journal: Big Data
Volume: 10
Issue number: 4
Publication status: Published - 1 Aug 2022
