# Causal Inference Benchmarking Framework
Framework for evaluating causal inference methods.

 - [General](#general)
 - [Getting Started](#getting-started)
   - [Prerequisites](#prerequisites)
   - [Installation](#installation)
   - [Usage](#usage)
 - [Citing](#citing)
 - [License](#license)
 - [Authors](#authors)

## General
Causality-Benchmark is a library developed by IBM Research for benchmarking algorithms that
estimate causal effects.
The framework includes unlabeled data, labeled data, and code for scoring algorithm predictions.
It can benchmark predictions of both population effect size and individual effect size.

The evaluation script is not bound to the provided data and can be used on other data as
long as some basic format requirements are met.
For more technical details about the evaluation metrics and the data, please refer to the
framework manuscript **TODO: Link to the manuscript/technical report**

Please note that due to GitHub limitations, only a sample of the data is available in this
repository. However, you can manually access and download the entire dataset from the
[Synapse sharing platform](https://www.synapse.org/#!Synapse:syn11294478/files/).

## Getting Started
### Prerequisites
Causality-Benchmark is a Python 3.x library with some backward support for Python 2.7.x.
The code depends heavily on pandas and requires:
* pandas >= 0.20.3
* numpy >= 1.13.1
* future >= 0.16.0 (for Python 2 compatibility)

### Installation
#### Using git clone
This clones the entire repository, so you get the data and the unit tests in addition to
the library's evaluation scripts. This way you can also apply your tools to the benchmark's
data.
```bash
git clone https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework.git
cd IBM-Causal-Inference-Benchmarking-Framework
python setup.py install
```

#### Using pip
This installs only the library's evaluation scripts and includes neither the tests
nor the data. Use this option if you only want to score predictions with the evaluation metrics.
```bash
pip install git+https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework.git
```

### Usage
#### Evaluation
The evaluation script can be used either from the command line or from inside another Python
script.
##### Command-line API
```bash
$ cd IBM-Causal-Inference-Benchmarking-Framework/evaluation
$ evaluate PATH_TO_PREDICTION_OUTPUT PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY
```
Type `evaluate -h` for the full manual.

##### Python module API
```python
from causalbenchmark.evaluate import evaluate
PATH_TO_PREDICTION_OUTPUT = "/SOME/PATH/TO/YOUR/ESTIMATES"
PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY = "/SOME/PATH/TO/GROUND/TRUTH/DATA"
scores = evaluate(PATH_TO_PREDICTION_OUTPUT, PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY)
```

##### Population vs. individual prediction
The default behaviour of the scoring script is to evaluate the average treatment effect
in the population.
To estimate individual effect sizes instead, add the `individual` flag:
```bash
$ evaluate PATH_TO_PREDICTION_OUTPUT PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY --i
```
```python
scores = evaluate(PATH_TO_PREDICTION_OUTPUT, PATH_TO_COUNTERFACTUAL_FILES_DIRECTORY,
                  individual_prediction=True)
```

##### Expected Files
* The counterfactual files (holding $y^1$ and $y^0$ for each individual) are expected to be a
  directory of comma-separated files whose names correspond to the data-instance names, plus
  some suffix (e.g. `"_cf.csv"`).
* The predictions for population effect size are expected to be a single comma-delimited file
  with every row corresponding to a different data-instance.
* The predictions for individual effect size are expected to be a directory of
  comma-delimited files, each corresponding to a data-instance and containing the
  estimated outcome under no treatment and under treatment.

For a full explanation, please refer to the manuscript **TODO: link to manuscript**

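For illustration, a population-effect prediction file of the shape described above could be assembled with pandas as follows; the column names and values here are made up for the sketch and are not the framework's required schema:

```python
import pandas as pd

# One row per data-instance; column names and effect values are
# hypothetical, not the framework's required schema.
predictions = pd.DataFrame({
    "instance_id": ["sim_0001", "sim_0002", "sim_0003"],
    "effect_size": [0.42, -0.10, 0.07],
})
predictions.to_csv("population_predictions.csv", index=False)
```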
#### Estimation
To avoid needlessly inflating file sizes,
we supply one main covariate file and multiple files containing simulated treatment
assignments and simulated outcomes based on the main covariate matrix.
An observed dataset, on which causal inference methods can be applied, is obtained by
combining the covariate matrix with a simulated matrix. This is done by a simple
*inner join*.
A Python generator is provided that iterates over all simulated files and combines each with
the covariate matrix into one complete observed dataset from which users can obtain causal
estimates.
```python
from causalbenchmark.utils import combine_covariates_with_observed
COVARIATE_FILE_PATH = "/SOME/MAIN/COVARIATE/FILE.csv"
FACTUAL_FILE_DIR = "/SOME/PATH/TO/DIRECTORY/WITH/FACTUAL/FILES"
for observed_dataset in combine_covariates_with_observed(COVARIATE_FILE_PATH, FACTUAL_FILE_DIR):
    causal_effect_estimations = apply_my_awesome_model(observed_dataset)
```
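The combine step above can be sketched as a small generator; this is a hypothetical re-implementation of the inner-join idea using pandas, not the library's actual code, and the function name `iter_observed_datasets` is made up for illustration:

```python
import os
import pandas as pd

def iter_observed_datasets(covariate_csv, factual_dir):
    """Yield one observed dataset per simulated factual file by
    inner-joining it with the shared covariate matrix on the index."""
    covariates = pd.read_csv(covariate_csv, index_col=0)
    for name in sorted(os.listdir(factual_dir)):
        if not name.endswith(".csv"):
            continue
        factuals = pd.read_csv(os.path.join(factual_dir, name), index_col=0)
        # Inner join: keep only individuals present in both tables.
        yield covariates.join(factuals, how="inner")
```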
## Citing
If you use the data, the evaluation metrics, or the evaluation code, please cite this
report as follows:
```
@article{causality-benchmark,
  title={......},
  author={....},
  journal={arXiv preprint arXiv:xxx.yyyyy},
  year={2018}
}
```
**TODO: complete citation info**

## License
The content is open source under the Apache License 2.0. For the full specification, see:
[License.txt](License.txt)

## Authors
* bullets (link to personal github profile)
* of
* authors' (link to personal site)
* names