A paper on Causal Data Fusion by Gu et al. accepted by NeurIPS 2023

The paper “FAST: a Fused and Accurate Shrinkage Tree for Heterogeneous Treatment Effects Estimation” has just been published online (https://openreview.net/pdf?id=wzg0BsV8rQ).

In general, causal problems can be studied through both experimental studies (also known as randomized controlled trials, RCTs) and observational studies. Experimental studies are widely regarded as the gold standard for assessing causal effects. However, large-scale RCTs can be difficult to run because of cost, time, and ethical constraints. Observational data, on the other hand, are often readily available with an adequate sample size, but the validity of observational studies typically rests on untestable assumptions (e.g. unconfoundedness).


Given the limitations of relying on a single data source, data fusion, a branch of causal inference that integrates trial and observational data, has gained popularity in the field.



This paper proposes a novel strategy for estimating the heterogeneous treatment effect (HTE), called the Fused and Accurate Shrinkage Tree (FAST). Our approach uses both trial and observational data to improve the accuracy and robustness of the estimator. The main contributions are summarized as follows:

(i) Inspired by the concept of shrinkage estimation in statistics, we develop an optimal weighting scheme and a corresponding estimator that balances the unbiased estimator based on the trial data with the potentially biased estimator based on the observational data.

(ii) Combined with tree-based techniques, we introduce a new split criterion that utilizes both trial data and observational data to more accurately estimate the treatment effect.

(iii) We confirm the consistency of our proposed tree-based estimator and demonstrate the effectiveness of our criterion in reducing prediction error through theoretical analysis.  
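
To make the shrinkage idea in contribution (i) concrete, here is a minimal sketch of the textbook way to combine an unbiased estimator with a biased one. It is not the paper's exact weighting scheme: it assumes the two estimators are independent and that the variances and the observational bias are known, and the function names (`shrinkage_weight`, `fused_estimate`, `fused_mse`) are hypothetical. Under those assumptions, the weight on the trial estimator that minimizes mean squared error is the observational MSE divided by the sum of the trial variance and the observational MSE.

```python
def shrinkage_weight(var_trial, var_obs, bias_obs):
    """Weight on the trial estimator that minimizes the MSE of the combination.

    Assumes (for illustration only) independent estimators, an unbiased trial
    estimator, and known variances and observational bias.
    """
    mse_obs = var_obs + bias_obs ** 2  # MSE of the observational estimator
    return mse_obs / (var_trial + mse_obs)


def fused_estimate(est_trial, est_obs, var_trial, var_obs, bias_obs):
    """Convex combination of the trial and observational estimates."""
    w = shrinkage_weight(var_trial, var_obs, bias_obs)
    return w * est_trial + (1.0 - w) * est_obs


def fused_mse(var_trial, var_obs, bias_obs):
    """MSE of the fused estimator at the MSE-minimizing weight."""
    w = shrinkage_weight(var_trial, var_obs, bias_obs)
    return w ** 2 * var_trial + (1.0 - w) ** 2 * (var_obs + bias_obs ** 2)


if __name__ == "__main__":
    # Noisy trial estimate, precise but biased observational estimate.
    w = shrinkage_weight(var_trial=1.0, var_obs=0.0, bias_obs=1.0)
    print("weight on trial estimator:", w)
    print("fused MSE:", fused_mse(var_trial=1.0, var_obs=0.0, bias_obs=1.0))
```

In this sketch the fused MSE is never larger than the MSE of either input estimator: a large observational bias pushes the weight toward the trial estimate, and a small bias pushes it toward the observational one. In practice the bias and variances are unknown and have to be estimated from the data.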


The co-first authors of this paper are Gu Jia (Ph.D. student at the Center for Statistical Science, Peking University) and Tang Caizhi (Ant Group). The other authors are Han Yan (Ph.D. student at the Guanghua School of Management, Peking University), Cui Qing, Li Longfei (Ant Group), and Zhou Jun (corresponding author). The research was supported by Ant Group and the National Natural Science Foundation of China (Grant Nos. 12026607, 92046021, 12071013403 and 12026607).