Deep learning models have been shown to outperform methods that rely on summary statistics, like the power spectrum, in extracting information from complex cosmological data sets.
However, due to differences in the subgrid physics implementation and numerical approximations across different simulation suites, models trained on data from one cosmological simulation show a drop in performance when tested on another.
Similarly, models trained on any of these simulations would likely experience a drop in performance when applied to observational data.
Training on data from two different suites of the CAMELS hydrodynamic cosmological simulations, we examine the generalization capabilities of Domain Adaptive Graph Neural Networks (DA-GNNs).
By utilizing GNNs, we capitalize on their capacity to capture structured scale-free cosmological information from galaxy distributions.
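To make the graph representation concrete, the sketch below builds a galaxy graph by linking every pair of galaxies closer than a linking radius, using the minimum-image convention for the periodic simulation box (CAMELS boxes are 25 Mpc/h on a side). The linking radius, the function names, and the choice of a radius graph are illustrative assumptions, not necessarily how this repository constructs its graphs.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def periodic_distance(p: Point, q: Point, box_size: float) -> float:
    """Euclidean distance in a periodic box (minimum-image convention)."""
    d2 = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b)
        delta = min(delta, box_size - delta)  # wrap across the box boundary
        d2 += delta ** 2
    return math.sqrt(d2)


def radius_graph(positions: Sequence[Point], r_link: float,
                 box_size: float = 25.0) -> List[Tuple[int, int]]:
    """Directed edge list linking galaxies closer than r_link (hypothetical helper)."""
    edges: List[Tuple[int, int]] = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        if periodic_distance(positions[i], positions[j], box_size) <= r_link:
            edges.append((i, j))  # store both directions so message passing
            edges.append((j, i))  # is symmetric between neighbours
    return edges
```

For example, two galaxies at x = 0.5 and x = 24.8 Mpc/h are only 0.7 Mpc/h apart once the box wraps around, so they are linked for a 1 Mpc/h radius. Because only relative distances enter, the graph is invariant to translations, one reason GNNs suit galaxy catalogs.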
Moreover, by including unsupervised domain adaptation via Maximum Mean Discrepancy (MMD), we enable our models to extract domain-invariant features.
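MMD compares two distributions through the mean embeddings of their samples in a kernel feature space: MMD² = E[k(x, x')] + E[k(y, y')] − 2 E[k(x, y)]. Below is a minimal sketch of the biased squared-MMD estimator with a Gaussian (RBF) kernel, operating on plain Python lists of latent vectors. The kernel, the bandwidth `sigma`, and the function names are assumptions for illustration, not the repository's implementation. In training, a term like this would typically be added, with some weight, to the regression loss on the labeled source suite, so that source and target latent features are pulled together.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source: Sequence[Vector], target: Sequence[Vector],
         sigma: float = 1.0) -> float:
    """Biased estimator of the squared MMD between two samples."""

    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        # Average kernel value over all pairs drawn from a and b.
        total = sum(rbf_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

For example, `mmd2(x, x)` is zero, and the value grows as the target sample is shifted away from the source sample. Minimizing it therefore encourages the encoder to map both simulation suites onto overlapping latent distributions.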
We demonstrate that DA-GNN achieves higher accuracy and robustness on cross-dataset tasks (up to
Comparison of models without (top row) and with DA (bottom row), trained on the SIMBA suite. The training graphs include 3D positions, maximum circular velocity, stellar mass, stellar radius, and stellar metallicity. From left to right: 1) a scatter plot of the Ωm predictions on the same domain (SIMBA), 2) the same on the cross domain (IllustrisTNG), and 3) an isomap showing how the GNN encodes the two datasets in the latent space (SIMBA: triangles, IllustrisTNG: circles). In the isomap without domain adaptation, ellipses highlight the regions where each distribution lies, revealing the difference between the simulation encodings that causes the substantial drop in performance on the cross-domain task.
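The node properties listed in the caption can be gathered into one feature vector per galaxy. The sketch below is illustrative only: the `Galaxy` container, the units in the comments, and the log10 scaling of the positive, wide-dynamic-range quantities are assumptions, not the repository's preprocessing.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Galaxy:
    """Hypothetical container for one galaxy's catalog entries."""
    x: float               # position, assumed Mpc/h
    y: float
    z: float
    v_max: float           # maximum circular velocity, assumed km/s
    stellar_mass: float    # assumed Msun/h
    stellar_radius: float  # assumed kpc/h
    metallicity: float     # stellar metallicity (dimensionless)


def node_features(g: Galaxy) -> List[float]:
    """Feature vector for one graph node; log10 assumes strictly positive inputs."""
    return [
        g.x, g.y, g.z,
        math.log10(g.v_max),
        math.log10(g.stellar_mass),
        math.log10(g.stellar_radius),
        math.log10(g.metallicity),
    ]
```

Log scaling is a common choice for quantities such as stellar mass that span many orders of magnitude, since it keeps the inputs to the network on comparable scales.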
This work was accepted to the Machine Learning and the Physical Sciences Workshop at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
The paper is available on arXiv and on the NeurIPS workshop website.
Building on this paper, we are making the following improvements to address its limitations:
- Enhancing Cross-Domain Accuracy: Explore more flexible approaches, such as adversarial domain adaptation (DA), instead of distance-based methods like MMD.
- Multi-source Simulation Suites: Train and test models on all four available CAMELS simulation suites. This expansion should improve cross-domain performance and make the assessments more reliable.
- Optimizing Computational Efficiency: Overcome computational and time constraints through more efficient data reading and writing, allowing a broader exploration of data and techniques.
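Adversarial DA is commonly implemented with a gradient reversal layer (GRL), as in domain-adversarial neural networks (DANN). A domain classifier tries to tell the source suite from the target suite using the latent features. The GRL sits between the feature extractor and that classifier: it is the identity in the forward pass, and in the backward pass it flips the sign of the gradient (scaled by λ), so the feature extractor learns to *confuse* the classifier. The sketch below shows only this layer's logic on plain lists. The class name and the `lambda_` parameter are illustrative, and this is not the `ADV` implementation in this repository.

```python
from typing import List, Sequence


class GradientReversal:
    """Identity forward; gradient multiplied by -lambda_ backward (hypothetical sketch)."""

    def __init__(self, lambda_: float = 1.0):
        self.lambda_ = lambda_  # strength of the adversarial signal

    def forward(self, features: Sequence[float]) -> List[float]:
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output: Sequence[float]) -> List[float]:
        # Reverse (and scale) the domain-classifier gradient before it
        # reaches the feature extractor.
        return [-self.lambda_ * g for g in grad_output]
```

Unlike MMD, which fixes the distance measure through a kernel, the adversarial setup learns the discrepancy through the domain classifier. This is why it is considered the more flexible option.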
To install all the dependencies:
sh install.sh
To download the data:
python3 src/scripts/utils/downloading_data.py
To reproduce the paper results with the pretrained model files, run:
python3 src/assessment.py
The script assessment.py accepts command-line arguments (via argparse). For example, to run it with a chosen set of arguments:
python3 src/assessment.py --simsuite SIMBA --targetsuite IllustrisTNG --domain_adapt ADV --training --n_sims 500 --seed 42 --model GAT
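For readers unfamiliar with argparse, the following is a hypothetical reconstruction of a parser that accepts the flags shown in the example command. Only the flag names come from the command above. The types, help texts, and `None` defaults are assumptions (the meaning of `--n_sims` in particular is inferred), so consult src/assessment.py for the actual definitions and defaults.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the flags in the example command."""
    parser = argparse.ArgumentParser(description="Train and/or assess DA-GNN models (sketch).")
    # Defaults are placeholders; the real script defines its own.
    parser.add_argument("--simsuite", type=str, default=None,
                        help="source (training) simulation suite, e.g. SIMBA")
    parser.add_argument("--targetsuite", type=str, default=None,
                        help="target simulation suite, e.g. IllustrisTNG")
    parser.add_argument("--domain_adapt", type=str, default=None,
                        help="domain adaptation method, e.g. ADV")
    parser.add_argument("--training", action="store_true",
                        help="retrain the models instead of using pretrained files")
    parser.add_argument("--n_sims", type=int, default=None,
                        help="number of simulations (assumed meaning)")
    parser.add_argument("--seed", type=int, default=None, help="random seed")
    parser.add_argument("--model", type=str, default=None,
                        help="GNN architecture, e.g. GAT")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Because `--training` is a boolean switch (`store_true`), omitting it falls back to the pretrained-model path, matching the two ways of invoking the script described in this README.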
To retrain the models and assess them, run:
python3 src/assessment.py --training
For details on the hyperparameter optimization runs, see the file hyperparams_optimization.py.
@ARTICLE{2023arXiv231101588R,
author = {{Roncoli}, Andrea and {{\'C}iprijanovi{\'c}}, Aleksandra and {Voetberg}, Maggie and {Villaescusa-Navarro}, Francisco and {Nord}, Brian},
title = "{Domain Adaptive Graph Neural Networks for Constraining Cosmological Parameters Across Multiple Data Sets}",
journal = {arXiv e-prints},
keywords = {Astrophysics - Cosmology and Nongalactic Astrophysics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning},
year = 2023,
month = nov,
eid = {arXiv:2311.01588},
pages = {arXiv:2311.01588},
doi = {10.48550/arXiv.2311.01588},
archivePrefix = {arXiv},
eprint = {2311.01588},
primaryClass = {astro-ph.CO},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv231101588R},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
The authors of this paper have committed themselves to performing this work in an equitable, inclusive, and just environment, and we hold ourselves accountable, believing that the best science is contingent on a good research environment. We acknowledge the Deep Skies Lab as a community of multi-domain experts and collaborators who have facilitated an environment of open discussion, idea generation, and collaboration. This community was important for the development of this project.
This manuscript has been supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics.
This work was supported by the EU Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie Grant Agreement No. 690835, 734303, 822185, 858199, 101003460.
The CAMELS project is supported by the Simons Foundation and the NSF grant AST2108078.
