yctao7/CosmoInfer_GNN
Deep learning models have been shown to outperform methods that rely on summary statistics, like the power spectrum, in extracting information from complex cosmological data sets. However, due to differences in the subgrid physics implementation and numerical approximations across different simulation suites, models trained on data from one cosmological simulation show a drop in performance when tested on another. Similarly, models trained on any of the simulations would also likely experience a drop in performance when applied to observational data. Training on data from two different suites of the CAMELS hydrodynamic cosmological simulations, we examine the generalization capabilities of Domain Adaptive Graph Neural Networks (DA-GNNs). By utilizing GNNs, we capitalize on their capacity to capture structured scale-free cosmological information from galaxy distributions. Moreover, by including unsupervised domain adaptation via Maximum Mean Discrepancy (MMD), we enable our models to extract domain-invariant features. We demonstrate that DA-GNN achieves higher accuracy and robustness on cross-dataset tasks (up to $28\%$ better relative error and up to almost an order of magnitude better $\chi^2$). Using data visualizations, we show the effects of domain adaptation on proper latent space data alignment. This shows that DA-GNNs are a promising method for extracting domain-independent cosmological information, a vital step toward robust deep learning for real cosmic survey data.
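The MMD term mentioned above penalizes the distance between the source- and target-domain feature distributions in the latent space, which is what pushes the encoder toward domain-invariant features. As a rough illustration only (not this repository's implementation), a biased squared-MMD estimate with an RBF kernel can be sketched in a few lines; the function names and the fixed kernel bandwidth `gamma` are assumptions:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared MMD between samples X (source latents)
    # and Y (target latents): E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())
```

Minimized as an auxiliary loss, this term is near zero when the two latent distributions overlap and grows as they separate.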

Comparison of models without (top row) and with DA (bottom row), trained on the SIMBA suite. Training data graphs include 3D positions, maximum circular velocity, stellar mass, stellar radius, and stellar metallicity. From left to right, we report: scatter plots of the inferred Ωm 1) on the same domain and 2) on the cross-domain task, and 3) an isomap showing how the GNN encodes the two datasets in the latent space (SIMBA - triangles, IllustrisTNG - circles). In the non-domain-adapted isomap, ellipses highlight the regions where the two distributions lie, showing the mismatch between simulation encodings that leads to a substantial drop in performance on the cross-domain task.

This work was accepted to the Machine Learning and the Physical Sciences Workshop at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).

The paper is available on arXiv and on the NeurIPS workshop website.

Building on this paper, we make the following improvements to address its limitations:

  • Enhancing Cross-Domain Accuracy: Explore more flexible approaches, such as adversarial domain adaptation (DA) techniques, instead of distance-based methods like MMD.
  • Multi-source Simulation Suites: Train and test models on all four available CAMELS simulation suites. This expansion would improve cross-domain efficacy and the reliability of assessments.
  • Optimizing Computational Efficiency: Overcome computational and time constraints through efficient file/data reading and writing, allowing broader exploration of data and techniques.
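Adversarial DA, mentioned in the first item, typically trains a domain classifier on the latent features while reversing its gradient into the encoder via a gradient reversal layer (GRL): the forward pass is the identity, and the backward pass flips and scales the gradient. A minimal conceptual sketch, not tied to this repository's code; the class name and the `lam` scaling parameter are illustrative:

```python
class GradReverse:
    """Conceptual gradient reversal layer: identity forward, flipped gradient backward."""

    def __init__(self, lam=1.0):
        # lam scales the reversed gradient (often annealed during training).
        self.lam = lam

    def forward(self, x):
        # Forward pass leaves the features unchanged.
        return x

    def backward(self, grad_out):
        # Backward pass negates and scales the incoming gradient, so the
        # encoder learns features that *confuse* the domain classifier.
        return -self.lam * grad_out
```

In an autograd framework this would be implemented as a custom function so the reversal happens automatically during backpropagation.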

Quickstart

To install all the dependencies:

sh install.sh

To download the data:

python3 src/scripts/utils/downloading_data.py

To reproduce paper results with the pretrained model files, run

python3 src/assessment.py

We have added argparse to assessment.py. To run the script with custom arguments:

python3 src/assessment.py --simsuite SIMBA --targetsuite IllustrisTNG --domain_adapt ADV --training --n_sims 500 --seed 42 --model GAT
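For readers unfamiliar with the flags above, a hypothetical sketch of what the argparse interface might look like follows; the defaults, choices, and help strings are assumptions inferred from the example command, not copied from assessment.py:

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the CLI flags shown above.
    p = argparse.ArgumentParser(description="Train and assess DA-GNN models on CAMELS suites")
    p.add_argument("--simsuite", default="IllustrisTNG", help="source simulation suite")
    p.add_argument("--targetsuite", default="SIMBA", help="target simulation suite")
    p.add_argument("--domain_adapt", default="MMD", choices=["MMD", "ADV"],
                   help="domain adaptation method")
    p.add_argument("--training", action="store_true", help="retrain instead of loading pretrained models")
    p.add_argument("--n_sims", type=int, default=1000, help="number of simulations to use")
    p.add_argument("--seed", type=int, default=42, help="random seed")
    p.add_argument("--model", default="GAT", help="GNN architecture")
    return p
```

Parsing the example command from above with this parser would populate the corresponding attributes on the returned namespace.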

To retrain the models and assess them, run

python3 src/assessment.py --training

For details on running the hyperparameter optimization, consult the file hyperparams_optimization.py.

Citation

@ARTICLE{2023arXiv231101588R,
       author = {{Roncoli}, Andrea and {{\'C}iprijanovi{\'c}}, Aleksandra and {Voetberg}, Maggie and {Villaescusa-Navarro}, Francisco and {Nord}, Brian},
        title = "{Domain Adaptive Graph Neural Networks for Constraining Cosmological Parameters Across Multiple Data Sets}",
      journal = {arXiv e-prints},
     keywords = {Astrophysics - Cosmology and Nongalactic Astrophysics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning},
         year = 2023,
        month = nov,
          eid = {arXiv:2311.01588},
        pages = {arXiv:2311.01588},
          doi = {10.48550/arXiv.2311.01588},
archivePrefix = {arXiv},
       eprint = {2311.01588},
 primaryClass = {astro-ph.CO},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv231101588R},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Acknowledgement

The authors of this paper have committed themselves to performing this work in an equitable, inclusive, and just environment, and we hold ourselves accountable, believing that the best science is contingent on a good research environment. We acknowledge the Deep Skies Lab as a community of multi-domain experts and collaborators who have facilitated an environment of open discussion, idea generation, and collaboration. This community was important for the development of this project.

This manuscript has been supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics.

This work was supported by the EU Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie Grant Agreement No. 690835, 734303, 822185, 858199, 101003460.

The CAMELS project is supported by the Simons Foundation and the NSF grant AST2108078.
