
Using LaTeX and MathJax

Joram Soch edited this page Aug 26, 2020 · 7 revisions

Source code for proofs and definitions in "The Book of Statistical Proofs" uses a combination of Markdown, MathJax and LaTeX. On this page, we collect a set of rules, recommendations and suggestions for applying LaTeX markup to typeset formulas.

Basic rules

  1. Use $...$ for in-line math, e.g.
Let $X$ be an $n \times 1$ random vector.
  2. Use $$...$$ for stand-alone equations, e.g.
$$
y = Ax + b \sim \mathcal{N}(A\mu + b, A \Sigma A^\mathrm{T}) \; .
$$
  3. Use $$ \begin{split} ...&... \\ ...&... \end{split} $$ to write multi-line equations, e.g.
$$
\begin{split}
M_y(t) &= \exp \left[ t^\mathrm{T} b \right] \cdot M_x(At) \\
&= \exp \left[ t^\mathrm{T} b \right] \cdot \exp \left[ t^\mathrm{T} A \mu + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \\
&= \exp \left[ t^\mathrm{T} \left( A \mu + b \right) + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \; .
\end{split}
$$
  4. Label each stand-alone equation (including split ones) using \label{eq:XYZ}, e.g.
$$ \label{eq:mvn-pdf}
f_X(x) = \frac{1}{\sqrt{(2 \pi)^n |\Sigma|}} \cdot \exp \left[ -\frac{1}{2} (x-\mu)^\mathrm{T} \Sigma^{-1} (x-\mu) \right] \; .
$$
  5. Reference labeled equations from in-line math or other equations using \eqref{eq:XYZ}, e.g.
$$ \label{eq:y-mgf-s2}
\begin{split}
M_y(t) &\overset{\eqref{eq:y-mgf-s1}}{=} \exp \left[ t^\mathrm{T} b \right] \cdot M_x(At) \\
&\overset{\eqref{eq:mvn-mgf}}{=} \exp \left[ t^\mathrm{T} b \right] \cdot \exp \left[ t^\mathrm{T} A \mu + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \\
&= \exp \left[ t^\mathrm{T} \left( A \mu + b \right) + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \; .
\end{split}
$$

Things to avoid

  1. Do not use a vertical bar (|) in in-line math, because Markdown will interpret it as a table delimiter.

    Solution: Use \vert, \lvert, \rvert or \mid, depending on your specific formula and context.
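    For example, a conditional probability written in-line with \mid renders correctly and does not break the Markdown table parser (the sentence itself is only illustrative):

    ```latex
    % breaks the Markdown parser:       $p(x|y)$
    % renders correctly:
    The conditional density $p(x \mid y)$ integrates to one for each value of $y$.
    ```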

  2. Do not use two consecutive curly opening braces ({{) in any equation, because it will cause a build error.

    Solution: Put a space between the two braces: { {.
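    For illustration, grouping a whole denominator as a brace group inside \frac produces {{ (a contrived but typical case):

    ```latex
    % causes a build error, because \frac{...}{{...}} contains {{ :
    % f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp \left[ -\frac{(x-\mu)^2}{{2 \sigma^2}} \right]
    % builds fine with a space between the braces:
    $$
    f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp \left[ -\frac{(x-\mu)^2}{ {2 \sigma^2} } \right] \; .
    $$
    ```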

Suggested notation

Chapter I: General Theorems

  • A, B, C – arbitrary random events
  • A_1, \ldots, A_k – mutually exclusive random events
  • \bar{A}, \bar{B}, \bar{C} – complements of random events
  • X, Y, Z – scalar random variables, random vectors or random matrices
  • x, y, z – realizations or values of random variables (exception: random matrices)
  • \mathcal{X}, \mathcal{Y}, \mathcal{Z} – sets of possible values of random variables
  • x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z} – indexing all possible values
  • p(x), q(x) – probability densities or probability masses
  • \mathrm{Pr}(X=a), \mathrm{Pr}(X \in A) – specific statements about random variables
  • p(x,y) – joint probability
  • p(x|y) – conditional probability
  • f_X(x) – probability density (PDF) or probability mass function (PMF)
  • F_X(x) – cumulative distribution function (CDF)
  • Q_X(p) – quantile function (QF), a.k.a. inverse CDF
  • M_X(t) – moment-generating function (MGF)
  • \mathrm{E}(X) – expected value (mean)
  • \mathrm{Var}(X) – variance
  • \mathrm{Cov}(X,Y) – covariance
  • \mathrm{Corr}(X,Y) – correlation
  • \Sigma_{XX} – covariance matrix
  • C_{XX} – correlation matrix
  • \mu_n – n-th (central) moment
  • \mathrm{H}(X) – (Shannon) entropy
  • \mathrm{H}(X|Y) – conditional entropy
  • \mathrm{H}(X,Y) – joint entropy (of two random variables)
  • \mathrm{H}(P,Q) – cross-entropy (of two probability distributions)
  • \mathrm{h}(X) – differential entropy
  • \mathrm{h}(X|Y) – conditional differential entropy
  • \mathrm{h}(X,Y) – joint differential entropy (of two random variables)
  • \mathrm{h}(P,Q) – differential cross-entropy (of two probability distributions)
  • \mathrm{I}(X,Y) – mutual information
  • \mathrm{KL}[P||Q] – Kullback-Leibler divergence (between two probability distributions)
  • \mathrm{KL}[p(x)||q(x)] – Kullback-Leibler divergence (between two PMFs or PDFs)
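As a sketch of how these symbols combine, here is the discrete Kullback-Leibler divergence written in the suggested notation (the label eq:kl-def is only an example name):

```latex
$$ \label{eq:kl-def}
\mathrm{KL}[P||Q] = \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{q(x)} \; .
$$
```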

Chapter II: Probability Distributions

  • \lambda – hyper-parameters, parameters of a distribution
  • \mathcal{D}(\lambda) – parametrized probability distribution
  • X \sim \mathcal{D}(\lambda) – random variable following probability distribution
  • p(x|\lambda) = \mathcal{D}(x; \lambda) – PDF or PMF of probability distribution
  • \int_{-\infty}^x \mathcal{D}(z; \lambda) \, \mathrm{d}z – CDF of probability distribution
  • Y = AX + b – linear transformation of random variable
  • \mu – mean of random variable
  • \Sigma – covariance of random variable
  • \mathcal{N}(\mu, \Sigma) – multivariate normal distribution
  • \mathrm{E}(X) – expected value of random variable
  • \mathrm{median}(X) – median of random variable
  • \mathrm{mode}(X) – mode of random variable
  • \mathrm{Var}(X) – variance of random variable
  • \mathrm{Cov}(X) – covariance of random vector
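A sketch of this notation in use, combining the distribution statement and the linear transformation (consistent with the example equations above; the label eq:mvn-ltt is illustrative):

```latex
Let $X \sim \mathcal{N}(\mu, \Sigma)$ be a random vector. Then, the linearly
transformed variable $Y = AX + b$ also follows a multivariate normal distribution:
$$ \label{eq:mvn-ltt}
Y \sim \mathcal{N}(A \mu + b, A \Sigma A^\mathrm{T}) \; .
$$
```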

Chapter III: Statistical Models

  • y, Y – univariate/multivariate measured data
  • x, X – single predictor/design matrix
  • \beta, B – univariate/multivariate regression coefficients
  • \varepsilon, E – univariate/multivariate noise
  • \sigma^2, \Sigma – noise variance/measurement covariance
  • I_n – noise covariance matrix (i.i.d.)
  • V – noise covariance matrix (not i.i.d.)
  • n – number of observations
  • v – number of measurements
  • p – number of regressors
  • y_i – i-th observation
  • y_j – j-th measurement
  • y_{ij} – i-th observation of j-th measurement
  • m – generative model
  • \theta – model parameters
  • \lambda – model hyper-parameters
  • p(y|\theta,m) – likelihood function
  • \mathrm{LL}(\theta) – log-likelihood function
  • \hat{\theta} – estimated model parameters
  • \hat{y} – fitted/predicted data
  • p(\theta|m) – prior distribution
  • p(\theta|y,m) – posterior distribution
  • p(y|m) – marginal likelihood
  • \log p(y|m) – log model evidence
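These quantities are related through Bayes' theorem, which in the suggested notation reads (label name illustrative):

```latex
$$ \label{eq:bayes-th}
p(\theta|y,m) = \frac{p(y|\theta,m) \cdot p(\theta|m)}{p(y|m)} \; .
$$
```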

Chapter IV: Model Selection

  • \sigma^2 – noise variance
  • \hat{\sigma}^2 – residual variance
  • R^2 – coefficient of determination
  • R^2_\mathrm{adj} – adjusted coefficient of determination
  • \mathrm{SNR} – signal-to-noise ratio
  • y – measured data
  • m – generative model
  • f – generative model family
  • n – number of observations
  • k – number of free model parameters
  • \mathrm{MLL}(m) – maximum log-likelihood
  • \mathrm{IC}(m) – information criterion
  • p(y|m) – model evidence
  • \mathrm{LME}(m) – log model evidence
  • \mathrm{Acc}(m) – (Bayesian) model accuracy (term)
  • \mathrm{Com}(m) – (Bayesian) model complexity (penalty)
  • m \in f – indexing all models in a family
  • p(y|f) – family evidence
  • \mathrm{LFE}(f) – log family evidence
  • \mathrm{BF}_{12} – Bayes factor
  • \mathrm{LBF}_{12} – log Bayes factor
  • p(m|y) – posterior model probability
  • p(\theta|y) – marginal posterior distribution
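For example, the log Bayes factor can be expressed via the log model evidences of the two models (label name illustrative; this follows from the logarithm turning the ratio into a difference):

```latex
$$ \label{eq:lbf-lme}
\mathrm{LBF}_{12} = \log \mathrm{BF}_{12} = \log \frac{p(y|m_1)}{p(y|m_2)} = \mathrm{LME}(m_1) - \mathrm{LME}(m_2) \; .
$$
```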
