Human Growth: From data to functions

Slides: 38
Provided by: JO28

1
Human Growth: From data to functions
2
Challenges to measuring growth
  • We need repeated and regular access to subjects
    for up to 20 years.
  • Height changes over the day, and must be measured
    at a fixed time.
  • Height is measured in supine position in infancy,
    followed by standing height. The change involves
    an adjustment of about 1 cm.
  • Measurement error is about 0.5 cm in later years,
    but is rather larger in infancy.

3
Challenges to functional modeling
  • We want smooth curves that fit the data as well
    as is reasonable.
  • We will want to look at velocity and
    acceleration, so we want to differentiate twice
    and still be smooth.
  • In principle the curves should be monotone,
    i.e., have a positive derivative.

4
The monotonicity problem
  • The tibia of a newborn measured daily shows us
    that over the short term growth takes place in
    spurts.
  • This baby's tibia grows as fast as 2 mm/day!
  • How can we fit a smooth monotone function?

5
Weighted sums of basis functions
  • We need a flexible method for constructing curves
    to fit the data.
  • We begin with a set of basic functional building
    blocks fk(t), called basis functions.
  • Our fitting function x(t) is a weighted sum of
    these basis functions.
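The weighted sum can be written x(t) = c1 f1(t) + ... + cK fK(t). Here is a minimal Python sketch of that idea. The coefficient name c, the helper name basis_expansion, and the monomial basis are assumptions made only for illustration; they do not come from the slides.

```python
def basis_expansion(coefs, basis_funcs, t):
    """Evaluate x(t) = sum_k c_k * f_k(t) at a single argument value t."""
    return sum(c * f(t) for c, f in zip(coefs, basis_funcs))

# Hypothetical basis chosen only for illustration: the monomials 1, t, t^2.
monomials = [lambda t: 1.0, lambda t: t, lambda t: t * t]
coefs = [2.0, -1.0, 0.5]

x_at_3 = basis_expansion(coefs, monomials, 3.0)   # 2 - 3 + 4.5 = 3.5
```

Fitting data then amounts to choosing the weights; the basis functions themselves stay fixed.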

6
What are the main choices for basis functions?
  • A Fourier series consists of
      - a constant term,
      - a sine/cosine pair of fixed base frequency, and
      - a series of sine/cosine pairs with integer
        multiples of the base frequency.
  • Fourier series are best for periodic data.
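A small Python sketch of such a basis, assuming the ordering constant, sine, cosine, sine, cosine and no normalization (software packages may scale the functions differently). The function name fourier_basis and its arguments are hypothetical.

```python
import math

def fourier_basis(t, period, nbasis):
    """Return [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...], cut off at nbasis values."""
    omega = 2.0 * math.pi / period   # base frequency
    values = [1.0]                   # constant term
    k = 1
    while len(values) < nbasis:
        values.append(math.sin(k * omega * t))
        if len(values) < nbasis:
            values.append(math.cos(k * omega * t))
        k += 1
    return values

# Five basis functions evaluated a quarter of the way through one period.
five = fourier_basis(0.25, period=1.0, nbasis=5)
```

Every function in the list repeats after one period, which is why this basis suits periodic data.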

7
Five Fourier basis functions
8
B-splines
  • These basis functions are piecewise polynomials
    defined by a set of discrete values called knots.
  • The order of the polynomials (degree + 1)
    controls their smoothness.
  • Each basis function is nonzero only over a number
    of contiguous inter-knot intervals equal to the
    order.
  • Polynomials are a special type of B-spline, and
    are thus included within the system.
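One standard way to evaluate B-spline basis functions is the Cox-de Boor recursion. The sketch below is plain Python and is not the code used for the slides; the function name bspline_basis is hypothetical, and the end knots are assumed to be repeated so that the basis covers the whole interval.

```python
def bspline_basis(t, breaks, order):
    """Values at t of all B-spline basis functions of the given order
    (order = degree + 1) defined by the break points in `breaks`."""
    # Repeat each end point so that it appears `order` times in the knot sequence.
    knots = [breaks[0]] * (order - 1) + list(breaks) + [breaks[-1]] * (order - 1)
    # Order 1: indicator functions of the intervals between consecutive knots.
    vals = []
    for i in range(len(knots) - 1):
        inside = knots[i] <= t < knots[i + 1]
        # Close the last non-empty interval on the right.
        if t == breaks[-1] and knots[i] < knots[i + 1] == breaks[-1]:
            inside = True
        vals.append(1.0 if inside else 0.0)
    # Cox-de Boor recursion: build order m from order m - 1.
    for m in range(2, order + 1):
        new = []
        for i in range(len(knots) - m):
            left = right = 0.0
            d1 = knots[i + m - 1] - knots[i]
            d2 = knots[i + m] - knots[i + 1]
            if d1 > 0:
                left = (t - knots[i]) / d1 * vals[i]
            if d2 > 0:
                right = (knots[i + m] - t) / d2 * vals[i + 1]
            new.append(left + right)
        vals = new
    return vals

hats = bspline_basis(1.5, [0, 1, 2, 3, 4], 2)       # five order 2 functions
cubic = bspline_basis(2.5, [0, 1, 2, 3, 4, 5], 4)   # eight order 4 functions
```

With these made-up break points the function returns five values for order 2 and eight for order 4, and at any point no more than `order` of them are nonzero.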

9
When should I use B-splines?
  • B-splines are the basis of choice for most
    non-periodic data.
  • They give complete control over flexibility,
    allowing more flexibility where needed and less
    where not needed.
  • Computing with B-splines is extremely efficient.

10
Five order 2 B-spline basis functions: a basis
for polygonal lines
11
Eight order 4 B-spline basis functions: a basis
for twice-differentiable functions
12
B-splines for growth data
  • We use order 6 B-splines because we want to
    differentiate the result at least twice. Order 4
    splines look smooth, but their second derivatives
    are rough.
  • We place a knot at each of the 31 ages.
  • The total number of basis functions = order +
    number of interior knots: 6 + 29 = 35 in this case.

13
Isn't using 35 basis functions to fit 31
observations a problem?
  • Yes. We will fit each observation exactly.
  • This will ignore the fact that the measurement
    error is typically about 0.5 cm.
  • But we'll fix this up later, when we look at
    roughness penalties.

14
Okay, let's see what happens
  • These two Matlab commands define the basis and
    fit the data:
  • hgtbasis = create_bspline_basis([1,18], 35, 6, age);
  • hgtfd = data2fd(hgtfmat, age, hgtbasis);

15
Why we need to smooth
  • Noise in the data has a huge impact on derivative
    estimates.

16
Please let me smooth the data!
  • This command sets up 12 B-spline basis functions
    defined by equally spaced knots. This gives us
    about the right amount of fitting power given the
    error level.
  • hgtbasis = create_bspline_basis([1,18], 12, 6);

17
  • These velocities are much better.
  • They go negative on the right, though.

18
Let's see some accelerations
  • These acceleration curves are too unstable at the
    ends.
  • We need something better.

19
A measure of roughness
  • What do we mean by smooth?
  • A function that is smooth has limited curvature.
  • Curvature depends on the second derivative. A
    straight line is completely smooth.

20
Total curvature
  • We can measure the roughness of a function x(t)
    by integrating its squared second derivative.
  • The second derivative notation is D²x(t).
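The formula itself did not survive in this transcript. A likely form, written in LaTeX, is given below; the label PEN_2 is an assumed name, and the integral is taken over the range of observed ages.

```latex
\[
  \mathrm{PEN}_2(x) = \int \left[ D^2 x(t) \right]^2 \, dt
\]
```

A straight line has D²x(t) = 0 everywhere, so its total curvature is zero.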

21
Total curvature of acceleration
  • Since we want acceleration to be smooth, we
    measure roughness at the level of acceleration,
    that is, by the curvature of D²x(t).
22
The penalized least squares criterion
  • We strike a compromise between fitting the data
    and keeping the fit smooth.
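The criterion is missing from this transcript. A sketch of its usual form in LaTeX follows, assuming unweighted residuals, heights y_j measured at ages t_j for j = 1, ..., n, a fourth-derivative penalty, and a smoothing parameter written as λ; the label PENSSE is also an assumption.

```latex
\[
  \mathrm{PENSSE}_{\lambda}(x)
    = \sum_{j=1}^{n} \left[ y_j - x(t_j) \right]^2
    + \lambda \int \left[ D^4 x(t) \right]^2 \, dt
\]
```

The first term rewards closeness to the data and the second charges for roughness.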

23
How does this control roughness?
  • Smoothing parameter λ controls roughness.
  • When λ = 0, only fitting the data matters.
  • But as λ increases, we place more and more
    emphasis on penalizing roughness.
  • As λ → ∞, only roughness matters, and functions
    having zero roughness are used.
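The effect of the smoothing parameter can be tried out on a simplified discrete analogue. Note that this swaps the integrated derivative penalty on a spline for a squared second-difference penalty on the fitted values themselves; the data and the names smooth and roughness are made up for illustration.

```python
def smooth(y, lam):
    """Minimise sum (y_i - x_i)^2 + lam * sum (x_{i-1} - 2 x_i + x_{i+1})^2 over x."""
    n = len(y)
    # Build A = I + lam * D'D, where D takes second differences.
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * stencil[a] * stencil[b]
    # Solve A x = y by Gaussian elimination; A is symmetric positive
    # definite, so no pivoting is needed.
    rhs = list(y)
    for k in range(n):
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            rhs[i] -= f * rhs[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = rhs[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x

def roughness(x):
    """Sum of squared second differences of the fitted values."""
    return sum((x[i - 1] - 2 * x[i] + x[i + 1]) ** 2 for i in range(1, len(x) - 1))

y = [0.0, 1.2, 1.8, 3.3, 3.9, 5.1, 5.8, 7.2]   # made-up noisy, roughly linear data
rough = [roughness(smooth(y, lam)) for lam in (0.0, 1.0, 100.0, 1e6)]
```

With a zero smoothing parameter the fit reproduces the data exactly. As the parameter grows the roughness shrinks toward zero, and for this second-difference penalty the zero-roughness fits are straight lines.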

24
  • We can either smooth at the data fitting step, or
    smooth a rough function.
  • This Matlab command smooths the fit to the data
    obtained using knots at ages. The roughness of
    the fourth derivative is controlled.
  • lambda = 0.01;
  • hgtfd = smooth_fd(hgtfd, lambda, 4);

25
Accelerations using a roughness penalty
  • These accelerations are much less variable at the
    extremes.

26
The corresponding velocities
27
How did you choose λ?
  • We smooth just enough to obtain tolerable
    roughness in the estimated curves (accelerations
    in this case), but not so much as to lose
    interesting variation.
  • There are data-driven methods for choosing λ, but
    they offer only a reasonable place to begin
    exploring.
  • But smoothing inevitably involves judgment.

28
What about monotonicity?
  • The growth curves should be monotonic.
  • The velocities should be non-negative.
  • It's hard to prevent linear combinations of
    anything from breaking the rules.
  • We need an indirect approach to constructing a
    monotonic model.

29
A differential equation for monotonicity
  • Any strictly monotonic function x(t) must satisfy
    a simple linear differential equation:
    D²x(t) = w(t) Dx(t)

The reason is simple: because of strict
monotonicity, the first derivative Dx(t) will
never be 0, and the function w(t) is therefore
simply D²x(t)/Dx(t).
30
The solution of the differential equation
  • Consequently, any strictly monotonic function
    x(t) must be expressible in the form
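The formula is missing from this transcript. It can be recovered by integrating the differential equation D²x(t) = w(t) Dx(t) twice, as sketched in LaTeX below. The lower limit t_0 (the start of the observation interval) and the dummy variables u and v are assumed notation.

```latex
\begin{align*}
  D^2 x(t) &= w(t)\, Dx(t) \\
  Dx(t)    &= \beta_1 \exp\!\left( \int_{t_0}^{t} w(v)\, dv \right),
             \qquad \beta_1 = Dx(t_0) \\
  x(t)     &= \beta_0 + \beta_1 \int_{t_0}^{t}
              \exp\!\left( \int_{t_0}^{u} w(v)\, dv \right) du,
             \qquad \beta_0 = x(t_0)
\end{align*}
```

The exponential is always positive, so Dx(t) keeps the sign of β1 whatever values w(t) takes.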

This suggests that we transform the monotone
smoothing problem into one of estimating function
w(t), and constants β0 and β1.
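To see why this helps, the sketch below builds a curve from an arbitrary w(t) and two constants, assuming the form x(t) = β0 + β1 ∫ exp(∫ w) with both integrals started at the left end of the interval and approximated by the trapezoidal rule. The function name monotone_curve and the oscillating choice of w are hypothetical.

```python
import math

def monotone_curve(w, beta0, beta1, t0, t1, n=200):
    """Tabulate x(t) = beta0 + beta1 * int_{t0}^{t} exp(int_{t0}^{u} w(v) dv) du
    on n + 1 equally spaced points, using the trapezoidal rule twice."""
    h = (t1 - t0) / n
    ts = [t0 + i * h for i in range(n + 1)]
    W = 0.0                   # running inner integral of w
    e_prev = math.exp(W)      # exp(W) at the previous grid point
    outer = 0.0               # running outer integral of exp(W)
    xs = [beta0]
    for i in range(1, n + 1):
        W += 0.5 * h * (w(ts[i - 1]) + w(ts[i]))
        e = math.exp(W)
        outer += 0.5 * h * (e_prev + e)
        xs.append(beta0 + beta1 * outer)
        e_prev = e
    return ts, xs

# w is unconstrained; this made-up choice swings between -3 and 3.
ts, xs = monotone_curve(lambda t: 3.0 * math.sin(2.0 * t),
                        beta0=80.0, beta1=5.0, t0=1.0, t1=18.0)
increasing = all(b > a for a, b in zip(xs, xs[1:]))
```

However w is chosen, the resulting curve increases when β1 is positive, so w can be estimated with an ordinary basis expansion without any constraint.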
31
What we have learned
  • B-spline bases are a good choice for fitting
    non-periodic functions; Fourier series are right
    for periodic situations.
  • We can control smoothness by either using a
    restricted number of basis functions, or by
    imposing a roughness penalty.
  • Roughness penalty methods generally work better.
  • Differential equations can play a useful role
    when fitting constrained functions to data.

32
More information
  • Ramsay & Silverman (1997, 2004), Chs. 3, 4, 13
  • Ramsay & Silverman (2002), Ch. 6.
  • The long-term growth data are from the Berkeley
    growth study.
  • The infant growth data were collected by Michael
    Hermanussen.

33
Where do we go from here?
  • We need to look more systematically at how to
    smooth data.
  • This involves deciding what basis function system
    to use.
  • Splines are so important that we have to look at
    them in more detail.
  • Here's a serious problem:

34
(No Transcript)
35
What's wrong with the mean?
  • The cross-sectional mean is the heavy blue line.
  • It has less amplitude variation than any single
    curve.
  • The pubertal growth spurt for the mean lasts
    longer than that of any single curve.
  • The problem is that we are averaging over curves
    in quite different stages of growth.
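The effect is easy to reproduce with synthetic curves. In the sketch below every child has an identical growth spurt, modelled as a hypothetical Gaussian bump in velocity, and only the timing differs; none of the numbers come from the growth data.

```python
import math

def velocity(t, peak_age, height=6.0, width=1.0):
    """Hypothetical growth-spurt velocity: a Gaussian bump centred at peak_age."""
    return height * math.exp(-0.5 * ((t - peak_age) / width) ** 2)

ages = [8.0 + 0.05 * i for i in range(201)]      # 8 to 18 years
peak_ages = [11.0, 12.0, 13.0, 14.0, 15.0]       # same spurt, different timing

curves = [[velocity(t, p) for t in ages] for p in peak_ages]
mean_curve = [sum(vals) / len(curves) for vals in zip(*curves)]

def width_above(curve, level, step=0.05):
    """Length of time (years) during which the curve exceeds `level`."""
    return step * sum(1 for v in curve if v > level)

individual_peak = max(curves[0])
mean_peak = max(mean_curve)
individual_width = width_above(curves[0], 1.0)
mean_width = width_above(mean_curve, 1.0)
```

The mean curve peaks at roughly half the height of any individual curve and stays above 1 cm/year for noticeably longer, even though no single child has such a low, drawn-out spurt.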

37
Phase and Amplitude Variation
  • Functional data like growth curves often show
    variation in the timing of events, like the
    pubertal growth spurt.
  • This is called phase variation.
  • We have to find out how to separate phase from
    amplitude variation before we can do even simple
    things like compute mean curves.