
A Fast Method to Predict Distributions of Binary Black Hole Masses Based on Gaussian Process Regression

Yuqi Yun
Michael Zevin
Laura Sampson
Prof. Vicky Kalogera



A fast and accurate method to interpolate distributions can be widely applicable.

Motivation: With more observations from LIGO in the upcoming years, we will be able to construct an observed mass distribution of black holes to compare with binary evolution simulations. This will allow us to investigate the physics of binary evolution, such as common-envelope efficiency and wind strength, or properties of the population, such as the initial mass function.

Challenge: Current computer simulations such as the Binary Stellar Evolution code (BSE) become computationally expensive when running large populations of binaries over a multi-dimensional grid of input parameters, and may accurately simulate only a limited set of input parameter combinations. Interpolation is needed to obtain distributions over the entire parameter space.

A Case Study: We developed a fast machine-learning method that combines a Gaussian Mixture Model (GMM) with Gaussian Process (GP) regression, which together can predict distributions over the entire parameter space from a limited number of simulated models. We also present a case study applying this method to predict chirp-mass distributions for binary black hole systems (BBHs) in Milky Way-like galaxies of different metallicities.

Case Study Data

The case study used 7 datasets provided by Katie Breivik. They contain the primary and secondary masses of all binary black hole systems included in simulation runs using the binary evolution code BSE. Each simulation used a different metallicity (Z = 0.0001, 0.0003, 0.001, 0.004, 0.01, 0.02, 0.03).


Quantify a Distribution: Gaussian Mixture Model

A Gaussian Mixture Model (GMM) describes a distribution as the weighted sum of a finite number of normal distributions. A GMM is an intuitive choice since the model is fully quantified by the mean, covariance, and weight of each Gaussian component in the mixture, which yields a finite set of parameters for interpolation.
The Bayesian Information Criterion (BIC) can be used to select the optimal number of components (N). For each metallicity, the N with the lowest BIC score is selected, and analysis proceeds with the largest such N among all metallicities. The method recovered the correct number of components for toy data drawn from a Gaussian mixture with known parameters. However, with chirp-mass distributions, BIC often yields N > 10 and the model becomes unnecessarily complicated. Alternatively, we can define a criterion based on the change in BIC and choose the N just before the decrease in BIC drops below a certain threshold.

The optimal number of Gaussian components selected by the absolute BIC score tends to be unnecessarily high: in this case study, N = 19. Selecting instead based on the change in BIC, the drop falls below the 0.75% threshold after N = 4.
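The change-in-BIC criterion can be sketched as follows with scikit-learn. The toy data, variable names, and loop structure are illustrative assumptions, not the case-study code; only the 0.75% threshold comes from the text.

```python
# Sketch: fit GMMs with increasing N and stop when the fractional
# drop in BIC falls below a threshold (0.75% in this case study).
# `chirp_masses` is a made-up bimodal placeholder for the simulated samples.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
chirp_masses = np.concatenate([rng.normal(8, 1, 500),
                               rng.normal(15, 2, 500)]).reshape(-1, 1)

bics = []
for n in range(1, 11):
    gmm = GaussianMixture(n_components=n, random_state=0).fit(chirp_masses)
    bics.append(gmm.bic(chirp_masses))

# Keep increasing N only while the relative BIC improvement exceeds 0.75%.
best_n = 1
for i in range(1, len(bics)):
    if (bics[i - 1] - bics[i]) / abs(bics[i - 1]) > 0.0075:
        best_n = i + 1
    else:
        break
```

Because the toy data have two well-separated modes, the criterion stops at a small N instead of chasing marginal BIC gains.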


Interpolate Parameters: Gaussian Process

Gaussian Process (GP) priors are a non-parametric method of supervised machine learning used for regression. The underlying assumption is that our data are a collection of random variables, any finite number of which have a joint Gaussian distribution. We use GPs to interpolate the mean, covariance, and weight of each Gaussian component separately, in order to predict these parameters at a new metallicity for which no simulation has been run. The data were preprocessed in two ways: metallicity is converted to its logarithm to spread out the inputs, and components with weight < 1e-20 are discarded to avoid sudden drops to zero.
GP priors (top) and GP posteriors (bottom) using rational quadratic kernel.
The strengths of GP regression include:
• The prediction at each point is a normal distribution, so one can compute confidence intervals, and future observations can be taken in the region of maximum uncertainty.
• The correlation function, or “kernel,” that measures the similarity between inputs can be customized. Here we use a weighted sum of four kernels: the squared exponential kernel, the rational quadratic kernel, and the Matérn kernel with ν = 3/2 and ν = 5/2.
• The hyperparameters in the kernel are tunable and can be optimized by maximizing the marginal likelihood.
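A minimal sketch of this interpolation using scikit-learn (the original implementation is not specified, so the library choice is an assumption). The metallicity grid matches the case study, but the component means in `y` are made-up placeholders.

```python
# Sketch: interpolate one GMM parameter (e.g. a component mean) across
# log-metallicity with a GP whose kernel is the sum of the four kernels
# named in the text. Hyperparameters are optimized by maximizing the
# marginal likelihood, which sklearn does by default during fit().
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic, Matern

Z = np.array([1e-4, 3e-4, 1e-3, 4e-3, 1e-2, 2e-2, 3e-2])
X = np.log10(Z).reshape(-1, 1)        # log-metallicity spreads the inputs
y = np.array([20.1, 18.7, 16.2, 12.5, 9.8, 8.1, 7.4])  # placeholder means

kernel = RBF() + RationalQuadratic() + Matern(nu=1.5) + Matern(nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predict at a metallicity where no simulation was run; the GP returns
# a mean and standard deviation, giving confidence intervals for free.
mean, std = gp.predict(np.log10([[2e-3]]), return_std=True)
```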


Reconstruct Distributions

We leave one distribution (at metallicity Z) out of our data set as a “test distribution,” perform GP regression (and piece-wise linear regression for comparison) over the remaining data sets, and reconstruct the test distribution using the interpolated values of each Gaussian component.
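The leave-one-out reconstruction can be sketched as below. The parameter tables are illustrative placeholders for the per-metallicity GMM fits, the single `RationalQuadratic` kernel is a simplification, and only one component's term of the mixture is shown.

```python
# Sketch: hold out one metallicity, fit a GP per GMM parameter on the
# rest, and rebuild the held-out distribution as a weighted Gaussian.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic

logZ = np.log10([1e-4, 3e-4, 1e-3, 4e-3, 1e-2, 2e-2, 3e-2]).reshape(-1, 1)
# Per-metallicity parameters of one GMM component (placeholders).
means   = np.array([20.1, 18.7, 16.2, 12.5, 9.8, 8.1, 7.4])
stds    = np.array([2.0, 1.9, 1.7, 1.4, 1.2, 1.1, 1.0])
weights = np.array([0.60, 0.58, 0.55, 0.50, 0.45, 0.42, 0.40])

held_out = 3                              # leave Z = 0.004 out as the test
train = np.delete(np.arange(len(logZ)), held_out)

def interp(values):
    """GP-interpolate one parameter to the held-out metallicity."""
    gp = GaussianProcessRegressor(kernel=RationalQuadratic(), normalize_y=True)
    gp.fit(logZ[train], values[train])
    return gp.predict(logZ[[held_out]])[0]

m, s, w = interp(means), interp(stds), interp(weights)
x = np.linspace(0, 30, 300)
pdf = w * norm.pdf(x, m, s)   # add the other components' terms for a full GMM
```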


Model Evaluation

Uncertainty in Prediction: The uncertainty of the predicted distribution is not given explicitly. However, we can draw randomly from the normally distributed predictions of the means, covariances, and weights to visualize a region of possible distributions.
Visualize the uncertainty of a predicted distribution. The black curve shows the best prediction and the gray region shows the uncertainty. The red curve represents the GMM fitting of the actual distribution.
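The sampling idea behind the gray uncertainty region can be sketched as follows. The predicted values and their standard deviations are invented numbers standing in for the GP output, and a single component is shown for brevity.

```python
# Sketch: draw the GMM parameters from the normal distributions the GP
# predicts (value, std per parameter), then take per-point percentiles
# of the resulting pdfs to form an uncertainty band.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.linspace(0, 30, 300)

# GP predictions for one component: (predicted value, predicted std).
mean_pred, mean_std   = 14.0, 0.5
sigma_pred, sigma_std = 1.5, 0.2
w_pred, w_std         = 0.5, 0.05

draws = []
for _ in range(200):
    m = rng.normal(mean_pred, mean_std)
    s = abs(rng.normal(sigma_pred, sigma_std))    # keep covariance positive
    w = np.clip(rng.normal(w_pred, w_std), 0, 1)  # keep weight in [0, 1]
    draws.append(w * norm.pdf(x, m, s))

band_lo = np.percentile(draws, 5, axis=0)    # lower edge of gray region
band_hi = np.percentile(draws, 95, axis=0)   # upper edge of gray region
```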
Method Comparison: The likelihood function gives the probability of the observed outcomes given a model and its parameters. As a measure of goodness of fit, we calculate the per-sample average log-likelihood for each of our predictions. We also use cross-validation and obtain the sum of log-likelihoods for each combination of method and number of components. The results show that the best combination in our case study is two Gaussian components and piecewise linear regression, with metallicities converted to their logarithms.

Table 1. Model performance evaluated by log-likelihood and leave-one-out cross-validation. A combination of N = 2 Gaussian components and piecewise linear regression gives the most accurate prediction.
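Scoring a predicted model can be sketched as below. Setting the mixture's attributes by hand (rather than fitting) stands in for plugging in interpolated parameters; the sample data and parameter values are placeholders, and `GaussianMixture.score` is sklearn's per-sample average log-likelihood.

```python
# Sketch: evaluate a "predicted" GMM on held-out samples via the
# per-sample average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
held_out_samples = rng.normal(14.0, 1.5, size=(1000, 1))

# Build a GMM from interpolated parameters (set by hand here as a stand-in).
pred = GaussianMixture(n_components=1)
pred.weights_ = np.array([1.0])
pred.means_ = np.array([[14.0]])
pred.covariances_ = np.array([[[1.5 ** 2]]])
pred.precisions_cholesky_ = np.linalg.cholesky(
    np.linalg.inv(pred.covariances_)).transpose(0, 2, 1)

avg_loglike = pred.score(held_out_samples)  # per-sample average log-likelihood
```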


Future Prospects

This is an ongoing project, and there are many possible improvements as well as new opportunities to apply this method.
• Modify the Gaussian Mixture Model: Instead of assuming the distribution is a Gaussian mixture, we can use Voigt profiles or gamma distributions to account for the long tails in our data histograms.
• Improve the Gaussian Process: The current GP method does not take into account that the covariance of a Gaussian component should always be non-negative and that its weight should lie in the [0, 1] interval, which has resulted in negative covariance and weight predictions. Restricting the range of the GP priors may fix the problem.
• Investigate other parameters: Instead of looking at metallicity, we can run binary stellar evolution over other parameters, which will give us a denser dataset and enable higher-dimensional analysis.

About Me


I’m Yuqi Yun, a rising junior at Duke University in Durham, NC, and I am double majoring in physics and mathematics. This website is about my summer REU project at the Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA) of Northwestern University in Evanston, IL. The REU experience will be invaluable as I pursue my career as a physicist in academia.

Email Me