Asymptotic distribution of the Pearson chi-square statistic

Image taken from ResearchGate.

Statistical Odds & Ends

I recently learned of a fairly succinct proof for the asymptotic distribution of the Pearson chi-square statistic (from Chapter 9 of Reference 1), which I share below.

First, the set-up: Assume that we have $latex n$ independent trials, and each trial ends in one of $latex J$ possible outcomes, which we label (without loss of generality) as $latex 1, 2, \dots, J$. Assume that for each trial, the probability of the outcome being $latex j$ is $latex p_j > 0$. Let $latex n_j$ denote the number of trials that result in outcome $latex j$, so that $latex \sum_{j=1}^J n_j = n$. Pearson’s $latex \chi^2$-statistic is defined as

$latex \begin{aligned} \chi^2 = \sum_{\text{cells}} \dfrac{(\text{obs} - \text{exp})^2}{\text{exp}} = \sum_{j=1}^J \dfrac{(n_j - np_j)^2}{np_j}. \end{aligned}$
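To make the definition concrete, here is a minimal Python sketch that evaluates the statistic directly from the formula. The function name `pearson_chi2` and the example counts and probabilities are made up for illustration; they do not come from the reference.

```python
def pearson_chi2(counts, probs):
    """Pearson chi-square statistic: sum over cells of (obs - exp)^2 / exp.

    counts[j] plays the role of n_j and probs[j] the role of p_j > 0.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of trials
    return sum((n_j - n * p_j) ** 2 / (n * p_j) for n_j, p_j in zip(counts, probs))


# Hypothetical data: n = 100 trials, J = 3 outcomes.
example_counts = [18, 55, 27]
example_probs = [0.25, 0.5, 0.25]
example_stat = pearson_chi2(example_counts, example_probs)
print(example_stat)  # 1.96 + 0.5 + 0.16, i.e. about 2.62
```

The expected counts here are 25, 50 and 25, so the three cells contribute 49/25, 25/50 and 4/25 respectively.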

Theorem. As $latex n \rightarrow \infty$, $latex \chi^2 \stackrel{d}{\rightarrow} \chi_{J-1}^2$, where $latex \stackrel{d}{\rightarrow}$ denotes convergence in distribution.
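As an informal sanity check of the theorem (not a proof), here is a small simulation sketch in Python using only the standard library. It assumes $latex J = 3$, so the limiting distribution is $latex \chi_2^2$, whose CDF has the closed form $latex 1 - e^{-x/2}$. The function names, the probabilities, the sample size and the number of replications are all arbitrary choices made for illustration.

```python
import math
import random


def pearson_chi2(counts, probs):
    """Sum over cells of (n_j - n p_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def simulate_chi2(n, probs, reps, seed=0):
    """Draw `reps` independent samples of n trials and return the statistics."""
    rng = random.Random(seed)
    cells = range(len(probs))
    stats = []
    for _ in range(reps):
        counts = [0] * len(probs)
        # Each trial lands in cell j with probability probs[j].
        for outcome in rng.choices(cells, weights=probs, k=n):
            counts[outcome] += 1
        stats.append(pearson_chi2(counts, probs))
    return stats


def chi2_cdf_2df(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)


if __name__ == "__main__":
    sim = simulate_chi2(n=500, probs=[0.2, 0.3, 0.5], reps=2000)
    for x in (1.0, 2.0, 4.0, 6.0):
        empirical = sum(s <= x for s in sim) / len(sim)
        print(f"x={x}: empirical CDF {empirical:.3f}, limiting CDF {chi2_cdf_2df(x):.3f}")
```

If the theorem holds, the two columns should roughly agree for large $latex n$, up to simulation noise.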

Before proving the theorem, we prove a lemma that we will…



General chi-square tests

Image taken from Lifeder.

Statistical Odds & Ends

In this previous post, I wrote about the asymptotic distribution of the Pearson $latex \chi^2$ statistic. Did you know that the Pearson $latex \chi^2$ statistic (and the related hypothesis test) is actually a special case of a general class of $latex \chi^2$ tests? In this post we describe the general $latex \chi^2$ test. The presentation follows that in Chapters 23 and 24 of Ferguson (1996) (Reference 1). I’m leaving out the proofs, which can be found in the reference.

(Warning: This post is going to be pretty abstract! Nevertheless, I think it’s worth a post since I don’t think the idea is well-known.)

Let’s define some quantities. Let $latex Z_1, Z_2, \dots \in \mathbb{R}^d$ be a sequence of random vectors whose distribution depends on a $latex k$-dimensional parameter $latex \theta$ which lies in a parameter space $latex \Theta$. $latex \Theta$ is assumed to be a non-empty open subset…
