HHG_large_sample_framework

Code base and framework for the simulations and results of the large sample HHG paper

Simulation framework for nonparametric independence testing for large sample sizes, using the ‘HHG’ package

This Git repository houses the code and framework for the simulations and results of the paper regarding the nonparametric test of independence given in the function Fast.independence.test(…). This test is computationally efficient for large sample sizes.
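For orientation, here is a minimal sketch of calling the test on simulated data. It assumes the ‘HHG’ package is installed from CRAN; the sample size, the seed, and the quadratic relationship between `x` and `y` are illustrative choices, not settings taken from the paper.

```r
# Minimal sketch: assumes the 'HHG' package is installed from CRAN.
library(HHG)

set.seed(1)            # illustrative seed, not from the paper
n <- 2000              # illustrative sample size
x <- rnorm(n)
y <- x^2 + rnorm(n)    # a dependent but non-monotone relationship

# With no null table supplied, the function builds one by permutation,
# so the call may take a few seconds.
res <- Fast.independence.test(x, y)
print(res)
```

When many tests are run at the same sample size, the package also provides `Fast.independence.test.nulltable(…)`, which builds a null table once so that it can be passed to later calls.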

Directory structure

The script files are as follows:

Other files in the directory are the simulation results and graphs.

Reproducibility

All simulation results and graphs used in producing the paper are available in this repository. The simulation flags in each file are configured so that, when the scripts are run, long simulations (taking hours or days) are loaded from the stored results rather than rerun, while all graphs and figures are fully reproduced. To rerun the simulations, change the ‘MODE_XXX’ flags at the top of each file to TRUE.
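The flag mechanism can be pictured with the sketch below. It is an illustration under stated assumptions, not the repository's code: the flag name `MODE_RUN_SIMULATION`, the helper `load_or_run`, the stand-in simulation, and the results path are all hypothetical. Unlike the behavior described above, the sketch also runs the simulation when no stored file exists, so that it works on a clean machine.

```r
# Hypothetical flag; the repository's scripts use flags of the form MODE_XXX.
MODE_RUN_SIMULATION <- FALSE

# Hypothetical location for stored results.
results_file <- file.path(tempdir(), "example_simulation_results.rds")

# Stand-in for a long-running simulation.
run_simulation <- function() {
  set.seed(1)
  replicate(5, mean(rnorm(100)))
}

# Rerun and store the results when the flag is TRUE (or when nothing has
# been stored yet); otherwise load the stored results from disk.
load_or_run <- function(run_flag, path, simulate) {
  if (run_flag || !file.exists(path)) {
    results <- simulate()
    saveRDS(results, path)
  } else {
    results <- readRDS(path)
  }
  results
}

results <- load_or_run(MODE_RUN_SIMULATION, results_file, run_simulation)
```

Plots and tables would then be built from `results` in either mode, which is why the figures can be reproduced without repeating the long computations.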

Power simulations were run on an Amazon C5.18XL machine with 72 cores. The number of cores used in each simulation is set according to the parameters of the machine it runs on. To fully reproduce the paper's results, please set the number of cores to 72. Running times were measured on a single core and may differ on a different machine.
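One way to choose a core count from the machine's parameters is sketched below, using `detectCores()` from R's base `parallel` package. The helper `choose_nr_cores` and the constant `PAPER_NR_CORES` are hypothetical names introduced for this illustration; the repository's scripts may set the value differently.

```r
library(parallel)

# Core count reported for the paper's power simulations.
PAPER_NR_CORES <- 72

# Hypothetical helper: use the requested number of cores, capped at what
# the current machine reports; fall back to 1 if detection fails.
choose_nr_cores <- function(requested, available = detectCores()) {
  if (is.na(available)) available <- 1L
  as.integer(max(1, min(requested, available)))
}

nr_cores <- choose_nr_cores(PAPER_NR_CORES)
```

On a machine with fewer than 72 cores, `nr_cores` will be smaller than the value used for the paper.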

Further Reading

  1. Heller, R., Heller, Y., and Gorfine, M. (2013). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2), 503-510.

  2. Heller, R., Heller, Y., Kaufman, S., Brill, B., and Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR, 17(29), 1-54. http://www.jmlr.org/papers/volume17/14-441/14-441.pdf

  3. Brill, B. (2016). Scalable Non-Parametric Tests of Independence (master’s thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf