Accompanying Textbook:
Functional Data
From Raw to Functional Data
The simplest data set encountered in functional data analysis is a sample of the form: \[ \begin{align} x_i(t_j),\quad t_j\in[a,b],\quad i=1,\dots,n,\quad j=1,\dots,J. \end{align} \] The theoretical objects we wish to study are smooth curves: \[ \begin{align} X_i(t),\quad t\in[a,b],\quad i=1,\dots,n. \end{align} \]
Basis expansions:
Typically, the first step in working with functional data is to express them by means of a basis expansion \[ X_i(t)\approx\sum_{m=1}^M c_{im} B_m(t),\quad t\in[a,b], \] where \(B_1(t),\dots,B_M(t)\) are some standard collection of basis functions like:
B-spline basis functions \(B_1(t),\dots,B_M(t)\):
library("fda")
## B-spline basis of order 3 with equally spaced breakpoints on [0,1]
bspl_bf <- create.bspline.basis(rangeval=c(0,1), norder=3, breaks=seq(0,1,len=7))
plot(bspl_bf, main="B-spline Basis Functions", xlab="[a,b]=[0,1]")
Fourier basis functions \(B_1(t),\dots,B_M(t)\):
## Fourier basis with 5 basis functions on [0,1]
fourier_bf <- create.fourier.basis(rangeval=c(0,1), nbasis=5)
plot(fourier_bf, main="Fourier Basis Functions", xlab="[a,b]=[0,1]")
Example: Berkeley growth study raw data
matplot(x=growth$age, y=growth$hgtf[,1:5], type="p", lty=1, pch=21, xlab="age", ylab="height (cm)", cex.lab=1.2,col=1:5,bg=1:5, main="5 Girls in Berkeley Growth Study")
Example: Berkeley growth study pre-processed data
SmObj <- smooth.basisPar(growth$age, y=growth$hgtf[,1:5], lambda=0.1)
plot(SmObj$fd, xlab="age", ylab="height (cm)", cex.lab=1.2,
     main="5 Girls in Berkeley Growth Study", lty=1)
## [1] "done"
Example: Berkeley growth study 1st derivative
plot(deriv(SmObj$fd, 1), xlab="age", ylab="growth rate (cm / year)", cex.lab=1.2, main="5 Girls in Berkeley Growth Study (1st derivative)", lty=1)
## [1] "done"
Example: Berkeley growth study 2nd derivative
plot(deriv(SmObj$fd, 2), xlab="age", cex.lab=1.2, ylab="growth acceleration (cm / year^2)", main="5 Girls in Berkeley Growth Study (2nd derivative)", lty=1)
## [1] "done"
Sample Mean and Covariance
Pointwise mean: \[ \bar{X}_n(t)=\frac{1}{n}\sum_{i=1}^n X_i(t) \] Pointwise standard deviation: \[ S_n(t)=\sqrt{\frac{1}{n-1}\sum_{i=1}^n\Big(X_i(t)-\bar{X}_n(t)\Big)^2} \]
Pointwise mean and standard deviation:
BM.Mean <- rowMeans(BM.mat)       # pointwise mean (time points in rows, curves in columns)
BM.SD   <- apply(BM.mat, 1, sd)   # pointwise standard deviation
Pointwise covariance function: \[ \hat{c}_n(t,s)=\frac{1}{n-1}\sum_{i=1}^n\Big(X_i(t)-\bar{X}_n(t)\Big)\Big(X_i(s)-\bar{X}_n(s)\Big) \]
Pointwise covariance function:
BM.Cov <- var(t(BM.mat))   # pointwise covariance across time points (time points in rows of BM.mat)
Principal Component Functions
Idea: Use the Estimated Functional Principal Components (EFPC's) \(\hat{v}_j\) as basis functions for \(X_i\): \[ X_i(t)\approx\bar{X}_n(t) + \sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(t) \]
Best basis property
Eigendecomposition of \(\hat{c}_n(t,s)\): \[ \begin{align} \hat{c}_n(t,s)&\approx\sum_{j=1}^p\hat{\lambda}_j\hat{v}_j(t)\hat{v}_j(s), \end{align} \] where \(\hat{\lambda}_1>\dots >\hat{\lambda}_p\) denote the estimated eigenvalues.
\[ \begin{align} X_i(t)&\approx\bar{X}_n(t) + \sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(t) \end{align} \]
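To make this concrete, here is a minimal R sketch of how the EFPC's can be computed with the fda package, reusing the smoothed growth curves SmObj from the example above; the choice nharm = 3 is arbitrary and for illustration only.
## FPCA of the smoothed growth curves from the Berkeley example above
growth_pca <- pca.fd(SmObj$fd, nharm = 3)
growth_pca$varprop                 # proportion of variance explained per component
plot(growth_pca$harmonics,         # estimated eigenfunctions v_1, v_2, v_3
     main = "Estimated Functional Principal Components")
head(growth_pca$scores)            # estimated scores xi_ij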
Case Study:
BOA Stock Returns
We study the daily cumulative log-return functions: \[R_i(t):=\log(P_i(t))-\log(P_i(0))\approx\frac{P_i(t)-P_i(0)}{P_i(0)}\]
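As a rough sketch (not the original preprocessing code), such curves could be built from an intraday price matrix and converted to a functional data object; the object name BOA_prices, the trading-day length of 6.5 hours, and the basis settings below are illustrative assumptions.
## Hypothetical matrix BOA_prices: rows = intraday time points, columns = trading days
log_BOA    <- log(BOA_prices)
log_BOA    <- sweep(log_BOA, 2, log_BOA[1, ])   # R_i(t) = log P_i(t) - log P_i(0)
log_BOA_fd <- Data2fd(argvals = seq(0, 6.5, len = nrow(log_BOA)), y = log_BOA,
                      basisobj = create.bspline.basis(c(0, 6.5), nbasis = 25))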
# Plot functional data
plot(log_BOA_fd, xlab="Trading Hours", ylab="", lwd=1, col=gray(.5),
     main="Cumulative Log-Return Functions")
lines(log_BOA_fd[1:10], lwd=1.5, lty=1)
Mathematical Framework:
1. Square Integrable Functions
What makes the space \(L^2([a,b])\) so convenient is its inner product structure, with \(\langle f,g\rangle=\int_a^b f(t)g(t)\,dt\):
Two functions \(f\) and \(g\) are orthogonal if \[\langle f,g\rangle=0\]
The norm \(||f||=\sqrt{\langle f,f\rangle}\) gives us a notion of distance between two functions: \[d(f,g)=||f-g||\]
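For intuition, the inner product and the distance can be approximated numerically on a fine grid; the two functions below (a sine and a cosine) are just a hypothetical example.
## Riemann-sum approximation of <f,g> and d(f,g) on [0,1]
t_grid <- seq(0, 1, length.out = 1001)
f  <- sin(2 * pi * t_grid)
g  <- cos(2 * pi * t_grid)
dt <- t_grid[2] - t_grid[1]
sum(f * g) * dt                 # <f,g> approximately 0: f and g are orthogonal
sqrt(sum((f - g)^2) * dt)       # d(f,g) = ||f - g||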
As we have already seen, basis expansions play an important role in methodology for functional data.
Mathematical Framework:
2. Random Functions
Let \(X\) denote a random function defined on a probability space, say \(\Omega\).
We assume that all realizations \(X(\omega)\), \(\omega\in\Omega\), are elements of the space \(L^2([a,b])\) of square integrable functions, i.e., \(||X(\omega)||=\sqrt{\int_a^b \big(X(\omega)(t)\big)^2dt} < \infty\) for all \(\omega\in\Omega\).
\(||X||\in\mathbb{R}\) is thus a real random variable.
If \(E(||X||^2)<\infty\), we say that the random function \(X\) is square integrable. Caution: the expectation here is an integral with respect to \(\omega\), not \(t\). It might be more pedagogical to say that the random function \(X\) has a finite second moment.
The population covariance function leads to Functional Principal Component Analysis (FPCA), which allows us to represent a square integrable random function \(X\) as: \[ \textstyle X(t)=\mu(t)+\sum_{j=1}^\infty\xi_j v_j(t) \]
The eigenfunctions \(v_j\) are the solutions of the eigen-equation \(\int_a^bc(t,s)v_j(s)ds=\lambda_jv_j(t)\).
\(\lambda_1\geq\lambda_2\geq\dots\) denote the eigenvalues.
The random variables \(\xi_j\) are called the scores \[ \textstyle \xi_j=\langle X-\mu,v_j\rangle=\int_a^b(X(t)-\mu(t))v_j(t)dt \]
FPCA allows us to represent a square integrable random function \(X\) as (Karhunen-Loève expansion): \[ \textstyle X(t)=\mu(t)+\sum_{j=1}^\infty\xi_j v_j(t) \]
It can be shown that: \[ E(\xi_j)=0,\quad V(\xi_j)=\lambda_j,\quad \operatorname{Cov}(\xi_j,\xi_k)=0,\,j\neq k, \] and \(E\int_a^b(X(t)-\mu(t))^2dt=E||X-\mu||^2=\sum_{j=1}^\infty\lambda_j\).
That is, \(\lambda_j\) is the variance of the random function \(X\) in the principal direction \(v_j\) and the sum of these variances is the total variance of \(X\).
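These properties can be checked empirically; the sketch below reuses the growth_pca object from the FPCA sketch above (a toy example with only 5 curves): the sample variances of the scores approximately reproduce the estimated eigenvalues.
## Sample score variances vs. estimated eigenvalues (up to 1/n vs. 1/(n-1) scaling)
round(apply(growth_pca$scores, 2, var), 3)
round(growth_pca$values[1:3], 3)
## Sum of all eigenvalues ~ total variance E||X - mu||^2
sum(growth_pca$values)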
Functional Regression Models
Scalar–on–function regression: \[Y_i=\int_a^b\beta(s)X_i(s)ds+\varepsilon_i\] Function–on–scalar regression: \[Y_i(t)=\sum_{k=1}^p\beta_k(t)x_{ik}+\varepsilon_i(t)\] Function–on–function regression: \[Y_i(t)=\int_a^b\beta(t,s)X_{i}(s)ds+\varepsilon_i(t)\]
Nonparametric scalar–on–function regression: \[Y_i=m(X_i)+\varepsilon_i,\] where \(Y_i\in\mathbb{R}\) and \(X_i\in L^2([a,b])\).
Nonparametric function–on–function regression: \[Y_i=m(X_i)+\varepsilon_i,\] where \(Y_i,\,X_i\in L^2([a,b])\).
The above models are just prototypes illustrating the general idea. The main point is that the parameters of functional regression models are infinite dimensional objects which must be estimated from a finite sample.
Difficulties in Functional Regression
Remember: The usual regression model \[ \textstyle \mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} \] Normal equations and solutions (population & sample): \[ \begin{align} \textstyle \mathbf{C}_{XY}=\mathbf{C}_{XX}\boldsymbol{\beta}\quad&\Rightarrow\quad \boldsymbol{\beta}=\mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\\ \textstyle \widehat{\mathbf{C}}_{XY}=\widehat{\mathbf{C}}_{XX}\widehat{\boldsymbol{\beta}}_n\quad&\Rightarrow\quad \widehat{\boldsymbol{\beta}}_n=\widehat{\mathbf{C}}_{XX}^{-1}\widehat{\mathbf{C}}_{XY} \end{align} \] where
\([\mathbf{C}_{XX}]_{r,\ell}=E(X_rX_\ell)\),
\([\mathbf{C}_{XY}]_r=E(X_rY)\),
\(\big[\widehat{\mathbf{C}}_{XX}\big]_{r,\ell}=\frac{1}{n}\sum_{i=1}^n X_{ir}X_{i\ell}\), and
\(\big[\widehat{\mathbf{C}}_{XY}\big]_r=\frac{1}{n}\sum_{i=1}^n X_{ir}Y_{i}\).
Normal equations and solutions (population & sample): \[ \begin{align} \textstyle \mathbf{C}_{XY}=\mathbf{C}_{XX}\boldsymbol{\beta}\quad&\Rightarrow\quad \boldsymbol{\beta}=\mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\\ \textstyle \widehat{\mathbf{C}}_{XY}=\widehat{\mathbf{C}}_{XX}\widehat{\boldsymbol{\beta}}_n\quad&\Rightarrow\quad \widehat{\boldsymbol{\beta}}_n=\widehat{\mathbf{C}}_{XX}^{-1}\widehat{\mathbf{C}}_{XY} \end{align} \]
To obtain \(\boldsymbol{\beta}\) and \(\widehat{\boldsymbol{\beta}}_n\), we need invertibility of the \(p\times p\) matrices \(\mathbf{C}_{XX}\) and \(\widehat{\mathbf{C}}_{XX}\).
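As a quick reminder of the classical case, the following sketch (with arbitrary simulated data) checks that solving the sample normal equations reproduces the lm() estimate; all names are hypothetical.
## Classical case: (X'X)^{-1} X'Y coincides with the lm() fit
set.seed(1)
n0 <- 100
X0 <- cbind(1, matrix(rnorm(n0 * 3), n0, 3))         # design matrix with intercept column
Y0 <- drop(X0 %*% c(1, 2, -1, 0.5)) + rnorm(n0)
beta_hat <- solve(t(X0) %*% X0, t(X0) %*% Y0)        # needs invertibility of X0'X0
cbind(beta_hat, coef(lm(Y0 ~ X0 - 1)))               # identical up to numerical error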
Functional regression \[ \textstyle Y_i=\int_a^b\beta(t)X_i(t)\,dt+\varepsilon_i \] Normal equations (population & sample): \[ \begin{align} \textstyle c_{XY}(t)&=\int_a^b c_{XX}(t,s)\beta(s)\,ds\\ \textstyle \hat{c}_{XY}(t)&=\int_a^b \hat{c}_{XX}(t,s)\beta(s)\,ds \end{align} \] where
\(c_{XX}(t,s)=E(X(t)X(s))\), \(c_{XY}(t)=E(X(t)Y)\),
\(\hat{c}_{XX}(t,s)=\frac{1}{n}\sum_{i=1}^n X_i(t)X_i(s)\), and
\(\hat{c}_{XY}(t)=\frac{1}{n}\sum_{i=1}^n X_i(t)Y_i\).
Functional regression \[ \textstyle Y_i=\int_a^b\beta(t)X_i(t)\,dt+\varepsilon_i \] Normal equations (sample): \[ \textstyle \hat{c}_{XY}(t)=\int_a^b \hat{c}_{XX}(t,s)\beta(s)ds \]
Problem: Unlike in the multivariate case, the normal equation cannot simply be solved by inversion. The covariance operator associated with \(c_{XX}(t,s)\) is a compact operator on the infinite dimensional space \(L^2([a,b])\) and therefore has no bounded inverse, and its empirical counterpart \(\hat{c}_{XX}\) has rank at most \(n\). The estimation problem is ill-posed and needs regularization.
Classical solutions: regularize by reducing the dimension of the problem, e.g., by expanding \(\beta\) in a deterministic basis, by adding a roughness penalty, or by regressing on functional principal components. These three approaches are described next.
Estimation Through Basis Expansion
Let us consider the scalar-on-function model \[ \textstyle Y_i=\alpha + \int_a^b \beta(s)X_i(s)ds+\varepsilon_i,\quad i=1,\dots,n \]
Simplest approach: Expand \(\beta\) using deterministic basis functions (splines, Fourier, etc.) \[ \textstyle \beta(t)=\sum_{k=1}^Kc_kB_k(t) \]
Rewriting: \[ \begin{align} \textstyle\int_a^b\beta(s)X_i(s)ds &\textstyle =\sum_{k=1}^Kc_k\underbrace{\int_a^bB_k(s)X_i(s)ds}_{=x_{ik}} %&\textstyle =\sum_{k=1}^Kc_k \end{align} \]
\[ \begin{align} \textstyle Y_i &\textstyle = \alpha + \int_a^b \beta(s)X_i(s)ds +\varepsilon_i\\ \textstyle Y_i &\textstyle = \alpha + \sum_{k=1}^Kc_k x_{ik} +\varepsilon_i\\ \textstyle \mathbf{Y} &\textstyle = \mathbf{X}\mathbf{c}+\boldsymbol{\varepsilon}, \end{align} \] where the \(i\)-th row of the design matrix \(\mathbf{X}\) is \((1,x_{i1},\dots,x_{iK})\).
The estimator of the parameter vector \(\mathbf{c}=(\alpha,c_1,\dots,c_{K})'\) \[ \textstyle \hat{\mathbf{c}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \] leads to an estimator of the parameter function \(\beta(t)\) \[ \textstyle \widehat{\beta}(t)=\sum_{k=1}^K\hat{c}_kB_k(t). \]
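A minimal sketch of this approach with the fda package, assuming a hypothetical fd object X_fd holding the predictor curves on [0,1] and a response vector Y; the choice of 10 B-spline basis functions is arbitrary.
## Scalar-on-function regression via a fixed basis expansion of beta
beta_basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 10)
x_ik <- t(inprod(beta_basis, X_fd))          # n x K matrix of int B_k(s) X_i(s) ds
Xmat <- cbind(1, x_ik)                       # leading column of ones for alpha
c_hat <- solve(t(Xmat) %*% Xmat, t(Xmat) %*% Y)
alpha_hat   <- c_hat[1]
beta_hat_fd <- fd(coef = c_hat[-1], basisobj = beta_basis)   # hat beta(t)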
Biased estimation due to the truncation error: \(\delta_K(t)\) \[ \begin{align} \textstyle\beta(t) &\textstyle =\sum_{k=1}^\infty c_k B_k(t)\\ &\textstyle =\sum_{k=1}^K c_k B_k(t) + \underbrace{\sum_{\ell=K+1}^\infty c_\ell B_\ell(t)}_{=\delta_K(t)} \end{align} \]
It can be shown that (see pp. 55-56 of the accompanying textbook): \[ \textstyle\hat{\mathbf{c}}=\mathbf{c}+(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\delta}_K+(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}, \] where \([\boldsymbol{\delta}_K]_i=\int_a^b\delta_K(s)X_i(s)ds\).
Estimation with a Roughness Penalty
\[ \begin{align} \textstyle P_{\lambda}(\alpha,\beta)= &\textstyle \sum_{i=1}^n\Big(Y_i-\alpha-\int_a^b\beta(s)X_i(s)ds\Big)^2\\ &\textstyle +\lambda\int_a^b\big[(L\beta)(s)\big]^2ds \end{align} \]
Idea: allow a generous number of basis functions \(K\) for \(\beta\) and control the roughness of the estimate through the penalty \(\lambda\int_a^b[(L\beta)(s)]^2ds\), where \(L\) is a linear differential operator (e.g., \(L=D^2\), penalizing curvature) and \(\lambda\geq 0\) steers the trade-off between goodness of fit and smoothness.
It can be shown that (see pp. 56-57 of the accompanying textbook): \[ \textstyle \hat{\mathbf{c}}_{\lambda}=(\mathbf{X}'\mathbf{X}+\lambda \mathbf{R})^{-1}\mathbf{X}'\mathbf{Y}. \]
Continued:
\[ \textstyle \hat{\mathbf{c}}_{\lambda,K}=(\mathbf{X}'\mathbf{X}+\lambda \mathbf{R})^{-1}\mathbf{X}'\mathbf{Y} \] where \[ \mathbf{R}= \left(\begin{matrix} 0 &0 &\dots &0 \\ 0 &R_{11}&\dots &R_{1K}\\ 0 &R_{21}&\dots &R_{2K}\\ \vdots &\vdots&\vdots&\vdots\\ 0 &R_{K1}&\dots &R_{KK}\\ \end{matrix}\right) \] with \(R_{rk}=\int_a^b(LB_r)(s)(LB_k)(s)ds\).
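A sketch of the penalized fit, continuing the hypothetical objects beta_basis, Xmat, and Y from the basis-expansion sketch above; the penalty operator is taken to be \(L=D^2\) and the value of \(\lambda\) is arbitrary.
## R_rk = int (L B_r)(s) (L B_k)(s) ds with L = D^2, via fda::eval.penalty
R0 <- eval.penalty(beta_basis, Lfdobj = int2Lfd(2))
R  <- rbind(0, cbind(0, R0))                          # zero row/column for the intercept
lambda <- 1e-2
c_lambda <- solve(t(Xmat) %*% Xmat + lambda * R, t(Xmat) %*% Y)
beta_hat_pen <- fd(coef = c_lambda[-1], basisobj = beta_basis)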
Regression on FPCs
\[ \begin{align} \textstyle Y_i &\textstyle = \alpha + \int_a^b \beta(s)X_i(s)ds +\varepsilon_i\\ \textstyle Y_i &\textstyle = \alpha + \int_a^b \beta(s)\big(\hat{\mu}(s)+\sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(s)\big)ds +\varepsilon_i\\ \textstyle Y_i &\textstyle =\beta_0+\sum_{j=1}^p\beta_j\hat{\xi}_{ij} +\varepsilon_i, \end{align} \] where \[\textstyle\beta_0=\alpha+\int_a^b\beta(s)\hat{\mu}(s)ds\] and \[\textstyle\beta_j=\int_a^b\beta(s)\hat{v}_j(s)ds\] are treated as unknown parameters.
The parameter vector \(\boldsymbol{\beta}=(\beta_0,\beta_1,\dots,\beta_p)'\) can be estimated by (see page 59 of the accompanying textbook) \[ \textstyle\hat{\boldsymbol{\beta}}=(\mathbb{X}'\mathbb{X})^{-1}\mathbb{X}'\mathbf{Y} \]
where \[ \mathbb{X}= \left(\begin{matrix} 1 &\hat{\xi}_{11}&\dots &\hat{\xi}_{1p}\\ \vdots&\vdots & &\vdots \\ 1 &\hat{\xi}_{n1}&\dots &\hat{\xi}_{np}\\ \end{matrix}\right) \]
The parameter vector \(\boldsymbol{\beta}=(\beta_0,\beta_1,\dots,\beta_p)'\) can be estimated by (see page 59 of the accompanying textbook) \[ \textstyle\hat{\boldsymbol{\beta}}=(\mathbb{X}'\mathbb{X})^{-1}\mathbb{X}'\mathbf{Y} \]
The estimators \(\hat\beta_0,\hat\beta_1,\dots,\hat\beta_p\) lead to the estimators \[\textstyle \hat{\beta}(t)=\sum_{j=1}^p\hat\beta_j\hat{v}_j(t)\] and \[\textstyle \hat{\alpha} =\hat\beta_0-\sum_{j=1}^p\hat\beta_j\int_a^b\hat{v}_j(s)\hat\mu(s)ds\] of our actual interest.
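In R this amounts to an ordinary least squares fit on the estimated scores; the sketch below again uses the hypothetical X_fd and Y from above, with p = 4 components chosen arbitrarily.
## Regression on FPC scores with the fda package
pca_X <- pca.fd(X_fd, nharm = 4)
fit   <- lm(Y ~ pca_X$scores)                 # hat beta_0 = coef(fit)[1], then hat beta_1, ..., hat beta_p
## hat beta(t) = sum_j hat beta_j * hat v_j(t), expressed in the basis of the harmonics
beta_fpc_fd <- fd(coef = pca_X$harmonics$coefs %*% coef(fit)[-1],
                  basisobj = pca_X$harmonics$basis)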
Functional Regression:
Hands On
set.seed(9000)
n    <- 1000
grid <- seq(0, 1, length = 101)
##
beta <- sin(grid * 2 * pi)
X <- matrix(0, nrow = n, ncol = length(grid))
for(i2 in 1:n){
  X[i2, ] <- X[i2, ] + rnorm(length(grid), 0, 1)
  X[i2, ] <- X[i2, ] + runif(1, 0, 5)
  X[i2, ] <- X[i2, ] + rnorm(1, 1, 0.2) * grid
  for(j2 in 1:10){
    e <- rnorm(2, 0, 1/j2^(2))
    X[i2, ] <- X[i2, ] + e[1] * sin((2 * pi) * grid * j2)
    X[i2, ] <- X[i2, ] + e[2] * cos((2 * pi) * grid * j2)
  }
}
## Note the Integral-Approximation using '* .01':
Y <- X %*% beta * .01 + rnorm(n, 0, .4)
matplot(x = grid, y=t(X), type = "l", lty=1)
Estimating with the roughness penalty approach and the FPCA approach:
library("refund") ## Roughness Penalty fit.RP <- pfr(Y ~ lf(X, bs = "ps", k = 50)) ## FPCA fit.FPCA <- pfr(Y ~ fpc(X, pve = 0.99))
A more complex \(\beta(t)\) function:
##
beta <- -1 * dnorm(grid, mean=.20, sd=.03) +
         3 * dnorm(grid, mean=.50, sd=.04) +
         1 * dnorm(grid, mean=.75, sd=.05)
## Note the Integral-Approximation using '* .01':
Y <- X %*% beta * .01 + rnorm(n, 0, .4)
Nonparametric Functional Regression
Nonparametric scalar–on–function regression: \[Y_i=m(X_i)+\varepsilon_i,\] where \(Y_i\in\mathbb{R}\) and \(X_i\in L^2([a,b])\).
Functional version of the Nadaraya-Watson estimator:
\[ \widehat{m}(X)=\frac{\sum_{i=1}^n K(h^{-1}d(X,X_i))Y_i}{\sum_{j=1}^nK(h^{-1}d(X,X_j))}, \] where \(d\) is a semi-metric such as, for instance, the FPCA-based semi-metric: \[ \textstyle d(X_i,X_j)=\sqrt{\sum_{k=1}^K(\hat{\xi}_{ik}-\hat{\xi}_{jk})^2} \]
suppressMessages(library("fda.usc")); data(tecator) absorp <- tecator$absorp.fdata x <- absorp[1:129,] y <- tecator$y$Fat[1:129]
res.pca1 <- fregre.np(x, y, Ker=AKer.tri, metric=semimetric.pca, q=1)
summary.fregre.fd(res.pca1, draw = FALSE)
## *** Summary Functional Non-linear Model ***
##
## -Call: fregre.np(fdataobj = x, y = y, Ker = AKer.tri, metric = semimetric.pca,
##     q = 1)
##
## -Bandwidth (h): 2.88226
## -R squared: 0.3282283
## -Residual variance: 115.5629 on 119.2519 degrees of freedom
## -Names of possible atypical curves: No atypical curves
## -Names of possible influence curves: 35 43 44 99 125