
Commit ddd9fea: Initial commit (0 parents)

120 files changed: +1089290 / -0 lines

.gitignore

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
# Created by .ignore support plugin (hsz.mobi): Python, Windows, MacOS


### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/


### macOS template
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk


### Windows template
# Windows thumbnail cache files
Thumbs.db
ehthumbs.db
ehthumbs_vista.db

# Dump file
*.stackdump

# Folder config file
[Dd]esktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Windows Installer files
*.cab
*.msi
*.msm
*.msp

# Windows shortcuts
*.lnk

### PyCharm
.idea/


### Custom
data/LBIDD/*.tar.gz

License.txt

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
Copyright 2018 IBM Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-----------------------------------------------------------------------------------------------------------------------
Warning! Data Use Restrictions: Read Carefully Before Using
The data provided here is derived from data collected by the National Center for Health Statistics (NCHS),
Centers for Disease Control and Prevention (CDC).
The Public Health Service Act (Section 308(d)) provides that the data collected by
the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC),
may be used only for the purpose of health statistical reporting and analysis.

Any effort to determine the identity of any reported case is prohibited by this law.

NCHS does all it can to assure that the identity of data subjects cannot be disclosed.
All direct identifiers, as well as any characteristics that might lead to identification, are omitted from the dataset.
Any intentional identification or disclosure of a person or establishment violates
the assurances of confidentiality given to the providers of the information. Therefore, users will:

1. Use the data in this dataset for statistical reporting and analysis only.
2. Make no use of the identity of any person or establishment discovered inadvertently, and advise the Director,
   NCHS, of any such discovery.
3. Not link this dataset with individually identifiable data from other NCHS or non-NCHS datasets.

By using these data you signify your agreement to comply with the above-stated statutorily based requirements.

MANIFEST.in

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
include README.md
include License.txt
include requirements.txt
include setup.py
include setup.cfg

README.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
# Causal Inference Benchmarking Framework
Framework for evaluating causal inference methods.

- [General](#general)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Usage](#usage)
- [Citing](#citing)
- [License](#license)
- [Authors](#authors)

## General
Causality-Benchmark is a library developed by IBM Research for benchmarking algorithms that
estimate causal effects.
The framework includes unlabeled data, labeled data, and code for scoring algorithm predictions.
It can benchmark predictions of both population effect size and individual effect size.

The evaluation script is not bound to the provided data and can be used on other data, as
long as some basic format requirements are met.
For more technical details about the evaluation metrics and the data, please refer to the
framework manuscript. **TODO: Link to the manuscript/technical report**

Please note that due to GitHub limitations, only a sample of the data is available in this
repository. However, you can manually access and download the entire dataset from the
[Synapse sharing platform](https://www.synapse.org/#!Synapse:syn11294478/files/).

## Getting Started
### Prerequisites
Causality-Benchmark is a Python 3.x library with some backward support for Python 2.7.x.
The code depends heavily on pandas and requires:
* pandas >= 0.20.3
* numpy >= 1.13.1
* future >= 0.16.0 (for Python 2 compatibility)

### Installation
#### Using git clone
This clones the entire repository, so you get the data and the unit tests on top of the
library's evaluation scripts. This way you can also apply your own tools to the benchmark's
data.
```bash
git clone https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework.git
cd IBM-Causal-Inference-Benchmarking-Framework
python setup.py install
```

#### Using pip
This installs only the library's evaluation scripts and includes neither the tests
nor the data. Use this option if you only want to score using the evaluation metrics.
```bash
pip install git+https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework.git
```

### Usage
#### Evaluation
The evaluation script can be used either from the command line or from inside another Python
script.
##### Command-line API
```bash
$ cd IBM-Causal-Inference-Benchmarking-Framework/evaluation
$ evaluate PATH_TO_PREDICTION_OUTPUT PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY
```
Type `evaluate -h` for the full manual.

##### Python module API
```python
from causalbenchmark.evaluate import evaluate
PATH_TO_PREDICTION_OUTPUT = "/SOME/PATH/TO/YOUR/ESTIMATES"
PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY = "/SOME/PATH/TO/GROUND/TRUTH/DATA"
scores = evaluate(PATH_TO_PREDICTION_OUTPUT, PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY)
```

##### Population vs individual prediction
By default, the scoring script evaluates the average treatment effect in the population.
To estimate individual effect size instead, add the `individual` flag:
```bash
$ evaluate PATH_TO_PREDICTION_OUTPUT PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY --i
```
```python
scores = evaluate(PATH_TO_PREDICTION_OUTPUT, PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY,
                  individual_prediction=True)
```
##### Expected Files
* The counterfactual files (holding $y^1$, $y^0$ for each individual) are expected to be a
  directory of comma-separated files whose file names correspond to the data-instances but
  carry some suffix (e.g. `"_cf.csv"`).
* Predictions of population effect size are expected to be a single comma-delimited file in
  which every row corresponds to a different data-instance.
* Predictions of individual effect size are expected to be a directory of comma-delimited
  files, each corresponding to a data-instance and each containing the estimated outcome
  under no treatment and under positive treatment.

For a full explanation, please refer to the manuscript. **TODO: link to manuscript**
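As an illustration of the population-effect prediction format described above, here is a minimal sketch that writes such a file using only Python's standard library. The column names (`instance_id`, `effect_size`) are assumptions made for the example; the authoritative schema is defined in the manuscript.

```python
import csv
import os
import tempfile

# Sketch of a population-effect prediction file: one comma-delimited file,
# one row per data-instance. Column names here are illustrative assumptions,
# not the framework's documented schema.
predictions = [
    {"instance_id": "sample_0001", "effect_size": "0.42"},
    {"instance_id": "sample_0002", "effect_size": "-0.10"},
]

out_path = os.path.join(tempfile.mkdtemp(), "population_predictions.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["instance_id", "effect_size"])
    writer.writeheader()
    writer.writerows(predictions)

# Read it back to confirm the round trip.
with open(out_path) as f:
    rows = list(csv.DictReader(f))
print(rows[0]["effect_size"])  # -> 0.42
```

A file in this shape would then be passed as `PATH_TO_PREDICTION_OUTPUT` to the evaluation script.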
#### Estimation
To avoid needlessly inflating file sizes, we supply one main covariate file and multiple
files containing simulated treatment assignments and simulated outcomes based on the main
covariate matrix.
An observed dataset, on which causal inference methods can be applied, is obtained by
combining the covariate matrix with a simulated matrix. This is done by a simple
*inner join*.
A Python generator is provided that iterates over all simulated files and combines each
with the covariate matrix into one complete observed dataset from which users can obtain
causal estimates.
```python
from causalbenchmark.utils import combine_covariates_with_observed
COVARIATE_FILE_PATH = "/SOME/MAIN/COVARIATE/FILE.csv"
FACTUAL_FILE_DIR = "/SOME/PATH/TO/DIRECTORY/WITH/FACTUAL/FILES"
for observed_dataset in combine_covariates_with_observed(COVARIATE_FILE_PATH, FACTUAL_FILE_DIR):
    causal_effect_estimations = apply_my_awesome_model(observed_dataset)
```
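The inner join described above can be sketched in plain Python. This is a toy illustration of the idea behind `combine_covariates_with_observed`, not the library's implementation; the column names, ids, and values are made up for the example.

```python
# Toy sketch of the inner join described above: the covariate matrix and a
# simulated treatment/outcome table share a sample id, and only ids present
# in both survive the join. All names and values are illustrative.
covariates = {
    "p1": {"age": 34, "weight": 70},
    "p2": {"age": 51, "weight": 82},
}
simulated = {
    "p1": {"z": 1, "y": 3.2},  # simulated treatment z and outcome y
    "p3": {"z": 0, "y": 1.1},  # p3 has no covariate row, so the join drops it
}

observed_dataset = {
    sid: {**covariates[sid], **simulated[sid]}
    for sid in covariates.keys() & simulated.keys()
}
print(sorted(observed_dataset))  # -> ['p1']
```

Each combined record carries both the covariates and the simulated treatment/outcome, which is exactly the "complete observed dataset" the generator yields per simulated file.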
## Citing
If you use the data, the evaluation metrics, or the evaluation code, please cite this
report as follows:
```
@article{causality-benchmark,
  title={......},
  author={....},
  journal={arXiv preprint arXiv:xxx.yyyyy},
  year={2018}
}
```
**TODO: complete citation info**

## License
The current content is open source under the Apache License 2.0. For the full text, see
[License.txt](License.txt).

## Authors
* bullets (link to personal github profile)
* of
* authors' (link to personal site)
* names

causalbenchmark/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# from .evaluate import evaluate as evaluate_predictions  # causalbenchmark.evaluate_predictions()
# from .utils import combine_covariates_with_observed  # causalbenchmark.combine_covariates_with_observed()

__version__ = "0.1.0"
