
Using LaTeX and MathJax

Joram Soch edited this page Aug 26, 2020 · 7 revisions

Source code for proofs and definitions in "The Book of Statistical Proofs" uses a combination of Markdown, MathJax and LaTeX. On this page, we collect a set of rules, recommendations and suggestions for applying LaTeX markup to typeset formulas.

Basic rules

  1. Use $...$ for in-line math, e.g.
Let $X$ be an $n \times 1$ random vector.
  2. Use $$...$$ for stand-alone equations, e.g.
$$
y = Ax + b \sim \mathcal{N}(A\mu + b, A \Sigma A^\mathrm{T}) \; .
$$
  3. Use $$ \begin{split} ...&... \\ ...&... \end{split} $$ to write multi-line equations, e.g.
$$
\begin{split}
M_y(t) &= \exp \left[ t^\mathrm{T} b \right] \cdot M_x(At) \\
&= \exp \left[ t^\mathrm{T} b \right] \cdot \exp \left[ t^\mathrm{T} A \mu + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \\
&= \exp \left[ t^\mathrm{T} \left( A \mu + b \right) + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \; .
\end{split}
$$
  4. Label each stand-alone equation (including split ones) using \label{eq:XYZ}, e.g.
$$ \label{eq:mvn-pdf}
f_X(x) = \frac{1}{\sqrt{(2 \pi)^n |\Sigma|}} \cdot \exp \left[ -\frac{1}{2} (x-\mu)^\mathrm{T} \Sigma^{-1} (x-\mu) \right] \; .
$$
  5. Reference labeled equations from in-line math or other equations using \eqref{eq:XYZ}, e.g.
$$ \label{eq:y-mgf-s2}
\begin{split}
M_y(t) &\overset{\eqref{eq:y-mgf-s1}}{=} \exp \left[ t^\mathrm{T} b \right] \cdot M_x(At) \\
&\overset{\eqref{eq:mvn-mgf}}{=} \exp \left[ t^\mathrm{T} b \right] \cdot \exp \left[ t^\mathrm{T} A \mu + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \\
&= \exp \left[ t^\mathrm{T} \left( A \mu + b \right) + \frac{1}{2} t^\mathrm{T} A \Sigma A^\mathrm{T} t \right] \; .
\end{split}
$$

Things to avoid

  1. Do not use a vertical bar (|) in in-line math, because Markdown will interpret it as a table delimiter.

    Solution: Use \vert, \lvert, \rvert or \mid, depending on your specific formula and context.
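    For example, a conditional probability written in-line with \mid renders correctly and does not break the Markdown table parser (the sentence itself is only illustrative):

    ```latex
    % breaks the Markdown parser:       $p(x|y)$
    % renders correctly:
    The conditional density $p(x \mid y)$ integrates to one for each value of $y$.
    ```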

  2. Do not use two consecutive curly opening braces ({{) in any equation, because it will cause a build error.

    Solution: Put a space between the two braces: { {.
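    For illustration, grouping a whole denominator as a brace group inside \frac produces {{ (a contrived but typical case):

    ```latex
    % causes a build error, because \frac{...}{{...}} contains {{ :
    % f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp \left[ -\frac{(x-\mu)^2}{{2 \sigma^2}} \right]
    % builds fine with a space between the braces:
    $$
    f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp \left[ -\frac{(x-\mu)^2}{ {2 \sigma^2} } \right] \; .
    $$
    ```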

Suggested notation

Chapter I: General Theorems

  • A, B, C – arbitrary random events
  • A_1, \ldots, A_k – mutually exclusive random events
  • \bar{A}, \bar{B}, \bar{C} – complements of random events
  • X, Y, Z – scalar random variables, random vectors or random matrices
  • x, y, z – realizations or values of random variables (exception: random matrices)
  • \mathcal{X}, \mathcal{Y}, \mathcal{Z} – sets of possible values of random variables
  • x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z} – indexing all possible values
  • p(x), q(x) – probability densities or probability masses
  • \mathrm{Pr}(X=a), \mathrm{Pr}(X \in A) – specific statements about random variables
  • p(x,y) – joint probability
  • p(x|y) – conditional probability
  • f_X(x) – probability density (PDF) or probability mass function (PMF)
  • F_X(x) – cumulative distribution function (CDF)
  • Q_X(p) – quantile function (QF), a.k.a. inverse CDF
  • M_X(t) – moment-generating function (MGF)
  • \mathrm{E}(X) – expected value (mean)
  • \mathrm{Var}(X) – variance
  • \mathrm{Cov}(X,Y) – covariance
  • \mathrm{Corr}(X,Y) – correlation
  • \Sigma_{XX} – covariance matrix
  • C_{XX} – correlation matrix
  • \mu_n – n-th (central) moment
  • \mathrm{H}(X) – (Shannon) entropy
  • \mathrm{H}(X|Y) – conditional entropy
  • \mathrm{H}(X,Y) – joint entropy (of two random variables)
  • \mathrm{H}(P,Q) – cross-entropy (of two probability distributions)
  • \mathrm{h}(X) – differential entropy
  • \mathrm{h}(X|Y) – conditional differential entropy
  • \mathrm{h}(X,Y) – joint differential entropy (of two random variables)
  • \mathrm{h}(P,Q) – differential cross-entropy (of two probability distributions)
  • \mathrm{I}(X,Y) – mutual information
  • \mathrm{KL}[P||Q] – Kullback-Leibler divergence (between two probability distributions)
  • \mathrm{KL}[p(x)||q(x)] – Kullback-Leibler divergence (between two PMFs or PDFs)
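As a sketch of how these symbols combine, here is the discrete Kullback-Leibler divergence written in the suggested notation (the label eq:kl-def is only an example name):

```latex
$$ \label{eq:kl-def}
\mathrm{KL}[P||Q] = \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{q(x)} \; .
$$
```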

Chapter II: Probability Distributions

  • \lambda – hyper-parameters, parameters of a distribution
  • \mathcal{D}(\lambda) – parametrized probability distribution
  • X \sim \mathcal{D}(\lambda) – random variable following probability distribution
  • p(x|\lambda) = \mathcal{D}(x; \lambda) – PDF or PMF of probability distribution
  • \int_{-\infty}^x \mathcal{D}(z; \lambda) \, \mathrm{d}z – CDF of probability distribution
  • Y = AX + b – linear transformation of random variable
  • \mu – mean of random variable
  • \Sigma – covariance of random variable
  • \mathcal{N}(\mu, \Sigma) – multivariate normal distribution
  • \mathrm{E}(X) – expected value of random variable
  • \mathrm{median}(X) – median of random variable
  • \mathrm{mode}(X) – mode of random variable
  • \mathrm{Var}(X) – variance of random variable
  • \mathrm{Cov}(X) – covariance of random vector
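A sketch of this notation in use, combining the distribution statement and the linear transformation (consistent with the example equations above; the label eq:mvn-ltt is illustrative):

```latex
Let $X \sim \mathcal{N}(\mu, \Sigma)$ be a random vector. Then, the linearly
transformed variable $Y = AX + b$ also follows a multivariate normal distribution:
$$ \label{eq:mvn-ltt}
Y \sim \mathcal{N}(A \mu + b, A \Sigma A^\mathrm{T}) \; .
$$
```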

Chapter III: Statistical Models

  • y, Y – univariate/multivariate measured data
  • x, X – single predictor/design matrix
  • \beta, B – univariate/multivariate regression coefficients
  • \varepsilon, E – univariate/multivariate noise
  • \sigma^2, \Sigma – noise variance/measurement covariance
  • I_n – noise covariance matrix (i.i.d.)
  • V – noise covariance matrix (not i.i.d.)
  • n – number of observations
  • v – number of measurements
  • p – number of regressors
  • y_i – i-th observation
  • y_j – j-th measurement
  • y_{ij} – i-th observation of j-th measurement
  • m – generative model
  • \theta – model parameters
  • \lambda – model hyper-parameters
  • p(y|\theta,m) – likelihood function
  • \mathrm{LL}(\theta) – log-likelihood function
  • \hat{\theta} – estimated model parameters
  • \hat{y} – fitted/predicted data
  • p(\theta|m) – prior distribution
  • p(\theta|y,m) – posterior distribution
  • p(y|m) – marginal likelihood
  • \log p(y|m) – log model evidence
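These quantities are related through Bayes' theorem, which in the suggested notation reads (label name illustrative):

```latex
$$ \label{eq:bayes-th}
p(\theta|y,m) = \frac{p(y|\theta,m) \cdot p(\theta|m)}{p(y|m)} \; .
$$
```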

Chapter IV: Model Selection

  • \sigma^2 – noise variance
  • \hat{\sigma}^2 – residual variance
  • R^2 – coefficient of determination
  • R^2_\mathrm{adj} – adjusted coefficient of determination
  • \mathrm{SNR} – signal-to-noise ratio
  • y – measured data
  • m – generative model
  • f – generative model family
  • n – number of observations
  • k – number of free model parameters
  • \mathrm{MLL}(m) – maximum log-likelihood
  • \mathrm{IC}(m) – information criterion
  • p(y|m) – model evidence
  • \mathrm{LME}(m) – log model evidence
  • \mathrm{Acc}(m) – (Bayesian) model accuracy (term)
  • \mathrm{Com}(m) – (Bayesian) model complexity (penalty)
  • m \in f – indexing all models in a family
  • p(y|f) – family evidence
  • \mathrm{LFE}(f) – log family evidence
  • \mathrm{BF}_{12} – Bayes factor
  • \mathrm{LBF}_{12} – log Bayes factor
  • p(m|y) – posterior model probability
  • p(\theta|y) – marginal posterior distribution
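For example, the log Bayes factor can be expressed via the log model evidences of the two models (label name illustrative; this follows from the logarithm turning the ratio into a difference):

```latex
$$ \label{eq:lbf-lme}
\mathrm{LBF}_{12} = \log \mathrm{BF}_{12} = \log \frac{p(y|m_1)}{p(y|m_2)} = \mathrm{LME}(m_1) - \mathrm{LME}(m_2) \; .
$$
```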
