Gaussian Processes for Statistical Soil Modeling of the Tropics

1
Gaussian Processes for Statistical Soil Modeling of the Tropics
  • CMU/TechBridgeWorld: Juan Pablo Gonzalez,
    Drew Bagnell
  • CIAT team: Simon Cook, Thomas Oberthur,
    Andrew Jarvis, Mauricio Rincon

2
Introduction: What is CIAT?
  • International Center for Tropical Agriculture
  • A not-for-profit organization
  • Conducts socially and environmentally progressive
    research in developing countries, aimed at:
    • reducing hunger and poverty
    • preserving natural resources
  • Works through partnerships with farmers,
    scientists, and policy makers
  • 800 staff, including 120 researchers from 37
    countries

3
Introduction: CIAT Locations
  • One of 15 Future Harvest centers, located in:
    • Cali, Colombia (headquarters)
    • Kampala, Uganda (African Regional Office)
    • Vientiane, Laos (Asian Regional Office)
    • Honduras, Ecuador, Nicaragua, Bolivia, Kenya,
      Brazil, Sri Lanka, and Thailand, among others
  • Funded by the CGIAR (Consultative Group on
    International Agricultural Research):
    • 58 countries, private foundations, and
      international organizations

CGIAR members: World Bank, FAO, Ford Foundation,
Rockefeller Foundation, Kellogg Foundation, USA,
Canada, U.K., Australia, New Zealand, Sweden,
Portugal, Norway, Denmark, Austria, Italy, India,
Pakistan, Kenya, Nigeria, Bangladesh, Belgium,
Brazil, China, Colombia, Côte d'Ivoire, Egypt,
Finland, France, Germany, Indonesia, Iran,
Ireland, Israel, Japan, Korea, Luxembourg,
Malaysia, Mexico, Morocco, The Netherlands, Peru,
The Philippines, Republic of South Africa,
Romania, Russian Federation, Spain, Switzerland,
Syrian Arab Republic, Thailand, Turkey, Uganda
4
Introduction: What is CMU?
  • Carnegie Mellon University
  • A world leader in technology development:
    • Computer Science
    • Robotics
  • Birthplace of Artificial Intelligence
  • Located in Pittsburgh, PA, USA

5
Introduction: What is TechBridgeWorld?
  • An initiative within Carnegie Mellon University
  • Mission:
    • To collaboratively design and implement creative
      technological solutions that benefit developing
      communities around the world
    • To bridge the world with technology

6
Introduction: Task at Hand
  • Input:
    • Soil scientists from CIAT
    • Computer scientists from CMU/TechBridgeWorld
    • 2,500 field samples from Honduras
  • Result:
    • Statistical soil modeling for the tropics

7
Introduction
  • Statistical soil modeling:
    • The development of statistical soil models for
      large areas based on soil samples and digital
      maps of environmental variables
    • Exploits easy-to-measure variables
    • Also known as predictive soil mapping (PSM)

8
Introduction
  • Importance:
    • To detect opportunities
      • Target soil-sensitive crops confidently in
        new areas
      • Reduce the risk of failure with new crops
    • To detect threats
      • Assess the impact of climate change
    • To understand soil interactions with land use
      • Understand local hydrology
      • Make decisions about appropriate changes in
        land use

9
Introduction
  • Why in the tropics?
    • Most developing countries are located in the
      tropics
    • Most funding for soil analysis and modeling does
      not go to the tropics
    • The tropics have climate patterns distinct from
      the rest of the globe:
      • Only a dry and a wet season (instead of four
        seasons)
      • Almost constant day length
      • Elevation is the main determinant of
        temperature

10
Introduction: Current Soil Map Coverage
Throughout the World
  • Detailed soil maps:
    • USA: complete coverage at 1:24,000 (30 m grid
      size); very extensive and expensive
    • 68% of the countries (31% by area) have complete
      coverage at 1:1,000,000 or better (1 km grid
      size)
  • Rest of the world (69% by area):
    • FAO World Map

11
Introduction: Current Soil Map Coverage
Throughout the World
  • Food and Agriculture Organization (FAO)
    Worldwide Soil Map:
    • Published in 1974
    • Worldwide coverage at 1:5,000,000 (5 km grid
      size)
    • Based on U.S. Soil Taxonomy
    • 26 classes with subcategories

NITOSOLS (N), subclass UHTa-3: soils having an
argillic B horizon with a clay distribution where
the percentage of clay does not decrease from its
maximum amount by as much as 20 percent within
150 cm of the surface; lacking plinthite within
125 cm of the surface; lacking vertic and ferric
properties. Low pH (high acidity).
12
Introduction: Current Soil Map Coverage
Throughout the World
NITOSOLS (N)
13
Previous Work: FAO Soil Map
  • Problems:
    • Made with the information and technology of the
      1960s
    • Technologies such as GPS, remote sensing, and
      GIS have changed significantly since then
    • Categorical data:
      • Most soil types explain only a small
        proportion of the actual variation in
        properties
      • Soil variation is continuous
      • Soil attributes do not cluster perfectly: a
        cut on the basis of one attribute may split
        the variance of another attribute near its
        peak
    • Dependent on subjective expert opinion
    • Dependent on the soil classification used
    • Low resolution

14
Traditional Soil Survey
  • Three steps:
    • Observe and measure ancillary data and the
      soil profile
    • Incorporate the observations into an implicit
      conceptual model
    • Apply the conceptual model to predict soil
      variation at unobserved sites
  • The conceptual model uses the factors of soil
    formation:
    • Soil is a function of climate, topography,
      organisms, parent material, and time
      (H. Jenny, 1941)
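
Jenny's relation is commonly written as the state-factor ("clorpt") equation below. The symbols are the conventional ones from the soil science literature, not notation defined elsewhere in these slides:

$$ S = f(cl,\; o,\; r,\; p,\; t) $$

Here $S$ is a soil property or class, $cl$ is climate, $o$ organisms, $r$ relief (topography), $p$ parent material, and $t$ time. Traditional survey applies $f$ implicitly through the surveyor's conceptual model; predictive soil mapping instead estimates it statistically from samples.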

15
Predictive Soil Mapping (PSM)
  • Statistical model using the factors of soil
    formation:
    • Soil is a function of climate, topography,
      organisms, parent material, and time
  • Goals:
    • Exploit relationships between environmental
      variables and soil properties to improve data
      collection efficiency
    • Produce and present data that better represent
      soil-landscape continuity
    • Explicitly incorporate expert knowledge in the
      design

16
PSM: Existing Approaches
  • Ordinary Kriging:
    • Weighted local spatial averaging
    • Spatial interpolation
    • Does not use knowledge of soil materials or
      processes
    • Requires a large number of closely spaced
      samples
  • Block Kriging, Indicator Kriging, Co-Kriging:
    • Extensions to include ancillary data
    • Difficult to extend to more than one ancillary
      variable
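
Ordinary kriging's "weighted local spatial averaging" amounts to solving a small linear system for weights that sum to one. The sketch below is illustrative only: it assumes an exponential covariance model with unit sill and range (hypothetical values; in practice they are fitted to an empirical variogram), uses made-up sample values, and returns the estimate, the kriging variance, and the weights.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=1.0):
    """Assumed exponential covariance model: C(h) = sill * exp(-h / corr_range)."""
    return sill * np.exp(-h / corr_range)

def ordinary_kriging(coords, values, target, sill=1.0, corr_range=1.0):
    """Ordinary kriging estimate at `target` from samples at `coords`.

    Solves [[C, 1], [1^T, 0]] [w; mu] = [c0; 1], so the weights sum to one
    (unbiasedness) and minimize the estimation variance under the model.
    """
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(dists, sill, corr_range)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_cov(np.linalg.norm(coords - target, axis=1), sill, corr_range)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = w @ values
    variance = sill - w @ b[:n] - mu   # ordinary kriging variance
    return estimate, variance, w

# Illustrative (made-up) topsoil pH values at the corners of a unit square
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([5.1, 5.6, 4.9, 6.0])
est, var, w = ordinary_kriging(coords, values, np.array([0.5, 0.5]))
# By symmetry every weight is 0.25 here, so est is the plain mean (about 5.4)
```

Without a nugget term the estimate reproduces the observed value exactly at a sample location, which is why kriging behaves as a spatial interpolator; co-kriging extends the same system with cross-covariances to an ancillary variable.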

17
PSM: Existing Approaches
  • Expert systems:
    • Use expert knowledge to establish rule-based
      relationships between environment and soil
      properties
    • Do not use soil data to determine
      soil-landscape relationships
  • Regression trees:
    • Decision trees with linear models in the leaves
    • Promising: good results in Australia
      (Henderson, 2004)
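
A regression tree with linear models in its leaves (a "model tree") can be illustrated with a single split. The sketch below uses hypothetical helper names: it tries every threshold on one covariate, fits a least-squares line in each leaf, and keeps the split with the lowest total squared error. Practical model-tree software grows deeper trees and prunes them; this is only the core idea.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def fit_model_stump(X, y, feature, min_leaf=5):
    """Depth-1 model tree: one split on `feature`, a linear model per leaf."""
    best = None
    for t in np.unique(X[:, feature])[:-1]:
        left = X[:, feature] <= t
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue  # skip splits that leave too few points in a leaf
        cl, cr = fit_linear(X[left], y[left]), fit_linear(X[~left], y[~left])
        sse = (np.sum((y[left] - predict_linear(cl, X[left])) ** 2)
               + np.sum((y[~left] - predict_linear(cr, X[~left])) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t, cl, cr)
    _, t, cl, cr = best  # assumes at least one valid split exists
    return t, cl, cr

def predict_model_stump(model, X, feature):
    t, cl, cr = model
    left = X[:, feature] <= t
    out = np.empty(len(X))
    out[left] = predict_linear(cl, X[left])
    out[~left] = predict_linear(cr, X[~left])
    return out

# Synthetic piecewise-linear response with a break at x = 5
X = np.linspace(0, 10, 41).reshape(-1, 1)
y = np.where(X[:, 0] <= 5, 2 * X[:, 0], 30 - 3 * X[:, 0])
model = fit_model_stump(X, y, feature=0)
```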

18
New Approach: Gaussian Processes
  • Generalization of the Gaussian distribution to an
    infinite-dimensional function space
  • Probabilistic (Bayesian) model
  • Completely determined by its mean and covariance
    functions
  • Predictions come with a mean and a variance
    (confidence intervals)
  • Non-parametric:
    • Very powerful
    • Model complexity increases with more data
  • Not new: it started as kriging and has evolved
    into a replacement for supervised neural networks
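
The mean-and-variance prediction described above can be sketched in a few lines of NumPy. This is a minimal zero-mean GP regressor with an isotropic squared-exponential kernel and fixed, hypothetical hyperparameters; it follows the standard Cholesky-based formulation and is not code from the project.

```python
import numpy as np

def sq_exp(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, Xs, lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """Zero-mean GP regression: posterior mean and variance at test inputs Xs."""
    K = sq_exp(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    Ks = sq_exp(X, Xs, lengthscale, signal_var)          # n x m cross-covariances
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = signal_var - np.sum(v**2, axis=0)              # variance of the latent function
    return mean, var

X = np.linspace(0, 5, 20)[:, None]
y = np.sin(X[:, 0])
mean, var = gp_predict(X, y, np.array([[2.5], [50.0]]))
# At 2.5 (inside the data) var is small; at 50.0 it reverts to signal_var = 1
```

Far from the training data the prediction falls back to the prior mean and variance, which is what makes the confidence intervals informative about where samples are missing.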

19
New Approach: Gaussian Processes
  • Generalization of the Gaussian distribution to an
    infinite-dimensional function space
  • Probabilistic (Bayesian) model
  • Predictions come with a mean and a variance
    (confidence intervals)
  • Non-parametric:
    • Very powerful
  • Not new: it started as kriging and has evolved
    into a replacement for supervised neural networks

20
Gaussian Processes
  • Interpolation technique equivalent to:
    • A neural network with an infinite number of
      hidden units
    • Radial basis functions with an infinite number
      of basis functions
    • Least-squares SVMs
    • Kernel ridge regression
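
The kernel ridge regression equivalence can be checked numerically in the simplest case. With a linear kernel, the GP posterior mean (computed in the n×n "dual" form) equals ordinary ridge regression (computed in the d×d "primal" form) when the ridge penalty equals the GP noise variance. The data are synthetic, and the linear kernel is chosen only because its primal form is easy to write; a radial basis kernel corresponds to the infinite-feature case listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))            # 40 samples, 3 covariates (synthetic)
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=40)
Xs = rng.normal(size=(5, 3))            # test inputs
noise_var = 0.3                          # GP noise variance = ridge penalty

# GP view (dual): prior w ~ N(0, I) gives the kernel k(x, x') = x . x'
K = X @ X.T
gp_mean = (Xs @ X.T) @ np.linalg.solve(K + noise_var * np.eye(len(X)), y)

# Ridge regression view (primal): w = (X^T X + lam I)^{-1} X^T y with lam = noise_var
w = np.linalg.solve(X.T @ X + noise_var * np.eye(3), X.T @ y)
ridge_pred = Xs @ w

print(np.allclose(gp_mean, ridge_pred))   # True
```

The same identity holds for any kernel, so the GP mean and the kernel ridge prediction coincide; what the GP adds is the predictive variance.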

21
Gaussian Processes
  • Covariance function
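
The covariance function on this slide did not survive extraction. One common choice that is consistent with the hyperparameters reported in the results (a lengthscale per input variable, a bias, and a noise level) is a squared-exponential kernel with automatic relevance determination (ARD) plus constant and noise terms. This form is an assumption, not necessarily the exact function used in the experiments:

$$ k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\!\left(-\frac{1}{2}\sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{\ell_d^2}\right) + \sigma_b^2 + \sigma_n^2\,\delta_{\mathbf{x}\mathbf{x}'} $$

Here $\sigma_f^2$ is the signal variance, $\ell_d$ the lengthscale for input $d$, $\sigma_b^2$ a bias (constant) term, and $\sigma_n^2$ the noise variance, with $\delta_{\mathbf{x}\mathbf{x}'} = 1$ when both inputs are the same sample and $0$ otherwise. A short $\ell_d$ lets the prediction change quickly along input $d$, so comparing lengthscales gives a rough indication of which variables matter.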

22
Available Data
  • 2,500 soil samples from Honduras
  • Digital maps of Honduras with:
    • Climate
      • Temperatures (max, min, average, etc.)
      • Precipitation (max, min, average, etc.)
    • Topography
      • 90 m elevation maps
    • Vegetation index
      • Measurement of vegetation cover
    • Derived variables

23
Gaussian Processes
  • Learning the hyperparameters:
    • Maximize the probability of the hyperparameters
      given the data
    • Use scaled conjugate gradient descent
    • Takes approximately 20 minutes with the current
      data set
  • Selecting variables:
    • Select the most promising variables and
      incrementally add them to the model
    • Would take 54 h for each variable selected!
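
In the standard (type-II maximum likelihood) treatment, learning the hyperparameters means maximizing the log marginal likelihood log p(y | X, θ) = −½ yᵀK⁻¹y − ½ log|K| − (n/2) log 2π, where K includes the noise variance; adding a log prior over θ turns this into the MAP objective. The sketch below evaluates it with NumPy. A coarse grid search over the lengthscale stands in for the scaled conjugate gradient optimizer named on the slide, and the kernel, data, and grid are illustrative assumptions. Each evaluation needs an O(n³) Cholesky factorization, which is why fits on the full training set are slow.

```python
import numpy as np

def log_marginal_likelihood(X, y, lengthscale, signal_var, noise_var):
    """log p(y | X, theta) for a zero-mean GP with a squared-exponential kernel."""
    n = len(y)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = signal_var * np.exp(-0.5 * d2 / lengthscale**2) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                    # O(n^3): the expensive step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha                     # data-fit term
            - np.sum(np.log(np.diag(L)))         # 0.5 * log|K|, complexity penalty
            - 0.5 * n * np.log(2 * np.pi))

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(60, 1))
y = np.sin(X[:, 0] / 2.0) + 0.1 * rng.standard_normal(60)   # synthetic data

# Coarse grid over the lengthscale (a stand-in for scaled conjugate gradient)
grid = [0.1, 0.3, 1.0, 2.0, 5.0, 20.0]
scores = [log_marginal_likelihood(X, y, ls, 1.0, 0.01) for ls in grid]
best = grid[int(np.argmax(scores))]
print(best)
```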

24
Gaussian Processes: Variable Selection
  • Greedy search on the R² of the validation set:
    • Learn parameters for all variables using 10% of
      the training set
    • Calculate R² on the validation set for all
      variables using 10% of the training set
    • Select the variable with the best R²
    • Learn parameters using 80% of the training set
      with the selected variables
    • Calculate R² with the selected variables using
      80% of the training set
    • Decide whether to continue based on the
      validation-set R² obtained with those parameters

R²: coefficient of determination, the percentage of
the variance explained by the model
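
The screening procedure above can be sketched as greedy forward selection on validation R², with R² = 1 − SS_res/SS_tot. To keep the example self-contained, ordinary least squares stands in for the GP fit (an assumption; in the slides each fit is a GP with learned hyperparameters), and `min_gain` is a hypothetical stopping threshold. Candidates are screened on a 10% subsample and the chosen set is refit on 80%, mirroring the 10/80 scheme.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def fit_predict(Xtr, ytr, Xva):
    """Stand-in model (ordinary least squares); the slides fit a GP here."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.column_stack([np.ones(len(Xva)), Xva]) @ coef

def greedy_select(Xtr, ytr, Xva, yva, screen_frac=0.1, full_frac=0.8,
                  max_vars=8, min_gain=0.01, seed=0):
    """Forward selection on validation R², screening on a small subsample."""
    rng = np.random.default_rng(seed)
    n = len(ytr)
    screen = rng.choice(n, size=int(screen_frac * n), replace=False)
    full = rng.choice(n, size=int(full_frac * n), replace=False)

    def score(cols, idx):
        """Fit on rows `idx` using columns `cols`; return validation R²."""
        return r2(yva, fit_predict(Xtr[idx][:, cols], ytr[idx], Xva[:, cols]))

    selected, best_r2 = [], -np.inf
    remaining = list(range(Xtr.shape[1]))
    while remaining and len(selected) < max_vars:
        # Cheap screening of every candidate on the small subsample
        j_best = max(remaining, key=lambda j: score(selected + [j], screen))
        # Expensive refit of the chosen set on the larger subsample
        new_r2 = score(selected + [j_best], full)
        if new_r2 - best_r2 < min_gain:          # stop when the gain is too small
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = new_r2
    return selected, best_r2

# Synthetic example: only columns 1 and 4 carry signal
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = 3 * X[:, 1] - 2 * X[:, 4] + 0.1 * rng.normal(size=400)
selected, score_r2 = greedy_select(X[:300], y[:300], X[300:], y[300:])
print(selected, round(score_r2, 3))
```

Screening on the small subsample is what makes trying every candidate affordable; only the winner pays for the expensive fit.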
25
Gaussian Processes: Variable Selection
R²: coefficient of determination, the percentage of
the variance explained by the model
26
Training Time
  • With the 10/80 approach:
    • 15 s per R² calculation at 10%
    • 50 minutes for all variables (68), with three
      length-scale priors on each
    • 20 minutes per R² calculation at 80%
    • Total: 1 h 10 min per variable; up to 9 h for 8
      variables
  • With the 25/80 approach:
    • 1 minute per R² calculation
    • Total: 3 h 30 min per variable; up to 27 h for 8
      variables
  • With the 80% approach:
    • 20 minutes per R² calculation
    • Total: 54 h per variable; up to 18 days for 8
      variables
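
The 10/80 timings can be reproduced with simple arithmetic if one assumes a single 15 s screening fit per variable and length-scale prior, followed by one 20-minute fit at 80% for the chosen variable. This decomposition is inferred from the figures rather than stated on the slide, and the function name is hypothetical.

```python
def per_variable_minutes(n_vars, n_priors, screen_s, full_min):
    """Minutes to add one variable with the 10/80 scheme (assumed decomposition).

    One screening fit per (candidate variable, length-scale prior) pair at 10%,
    then a single full fit at 80% for the chosen variable.
    """
    return n_vars * n_priors * screen_s / 60.0 + full_min

t = per_variable_minutes(n_vars=68, n_priors=3, screen_s=15, full_min=20)
print(t)   # 71.0 minutes: 51 min of screening plus 20 min for the full fit
```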

27
Results: FAO Map of Honduras
NITOSOLS (N): soils having an argillic B horizon
with a clay distribution where the percentage of
clay does not decrease from its maximum amount by
as much as 20 percent within 150 cm of the
surface; lacking plinthite within 125 cm of the
surface; lacking vertic and ferric properties.
Low pH (high acidity).
28
Results: pH in Topsoil
29
Results: pH in Topsoil, No X, Y
30
Results: Accuracy of Current Techniques
  • A soil survey is considered good if its map units
    contain the right soil more than 50% of the time
  • Most measurements vary by 20% or more between
    laboratories
  • Most quantitative prediction methods explain less
    than 10% of the variation
  • Exception: Henderson (2004) in Australia

31
Results: pH in Topsoil
  • Experiment 554: PHW1 vs. inputs, training set 82%
  • out_variable: PHW1
  • variables: 'XUTM' 'YUTM' 'P5'
  • final hyperparameters:
    • in_params: 0.1414 -1.3439 4.3123 3.5009
      -1.9544 -0.8364 -1.3607
  • Train/Test2 error:
    • Data: 0.7547/0.7567
    • Model: 0.4800/0.5590
  • Train/Test2 r2: 0.5954/0.4544
  • bias: 1.151939
  • noise: 0.260834 (std 0.51072)
  • lengthscale:
    • XUTM: 0.115770 (11067.51)
    • YUTM: 0.173696 (11198.10)

P5: Maximum temperature of warmest month
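
The reported numbers are internally consistent with reading "error" as root-mean-square error, where the "Data" row is the error of always predicting the mean (the standard deviation of the target) and r2 = 1 − (model error / data error)². This interpretation is inferred from the figures, not stated on the slides; a quick check against Experiment 554 (and the sand model, Experiment 654):

```python
def r2_from_rmse(model_rmse, data_rmse):
    """R² = 1 - SS_res/SS_tot = 1 - (RMSE_model / RMSE_data)² on the same points."""
    return 1.0 - (model_rmse / data_rmse) ** 2

# Train/Test2 errors reported for Experiment 554 (pH in topsoil)
print(round(r2_from_rmse(0.4800, 0.7547), 4))   # 0.5955 (slide: 0.5954)
print(round(r2_from_rmse(0.5590, 0.7567), 4))   # 0.4543 (slide: 0.4544)
```

The small differences in the last digit are consistent with the slide computing r2 from unrounded errors.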
32
Results: pH in Topsoil
P5: Maximum temperature of warmest month
33
Results: pH in Topsoil, Variable Selection
34
Results: pH in Topsoil
P5: Maximum temperature of warmest month
35
Results: pH in Topsoil, No X, Y
P5: Maximum temperature of warmest month
P2: Mean Diurnal Temp. Range
P16: Precipitation of wettest quarter
36
Results: pH in Topsoil, No X, Y
  • Experiment 504: PHW1 vs. inputs, training set 82%
  • out_variable: PHW1
  • variables: 'P5' 'P2' 'P16' 'XGeology_Code_SA1'
  • final hyperparameters:
    • in_params: -0.1648 -0.9890 1.6712 2.1778
      -3.1989 3.5034 -3.7036 -1.8056
  • Train/Test2 error:
    • Data: 0.7546/0.7567
    • Model: 0.5522/0.6029
  • Train/Test2 r2: 0.4645/0.3652
  • bias: 0.848064
  • noise: 0.371960 (std 0.60989)
  • lengthscale:
    • P5: 0.433610 (1.08)
    • P2: 0.336585 (0.32)

P5: Maximum temperature of warmest month
P2: Mean Diurnal Temp. Range
P16: Precipitation of wettest quarter
37
Results: pH in Topsoil, No X, Y, Variable Selection
38
Results: pH in Topsoil, No X, Y
P5: Maximum temperature of warmest month
P2: Mean Diurnal Temp. Range
P16: Precipitation of wettest quarter
39
Results: Sand in Topsoil (%)
P13: Precipitation of wettest month
P19: Precipitation of coldest quarter
P14: Precipitation of driest month
40
Results: Sand in Topsoil (%)
  • Experiment 654: SA1 vs. inputs, training set 82%
  • out_variable: SA1
  • variables: 'XUTM' 'YUTM' 'ZDEM' 'mean_ndvi'
    'intra_var' 'XGeology_Code_SA1' 'P13' 'P19' 'P14'
    'P13'
  • final hyperparameters:
    • in_params: -0.0829 5.0725 0.8620 1.8612 1.3229
      0.9330 0.0655 -3.2179 -3.2194 0.0184 0.0115
      -3.2189 -0.4414 3.8994
  • Train/Test2 error:
    • Data: 14.9129/14.4163
    • Model: 11.5649/12.6090
  • Train/Test2 r2: 0.3986/0.2350
  • bias: 0.920486
  • noise: 159.578757 (std 12.63245)
  • lengthscale:
    • XUTM: 0.649868 (62143.46)
    • YUTM: 0.394312 (25394.81)

P13: Precipitation of wettest month
P19: Precipitation of coldest quarter
P14: Precipitation of driest month
41
Results: Sand in Topsoil (%), Variable Selection
42
Results: Sand in Topsoil (%)
P13: Precipitation of wettest month
P19: Precipitation of coldest quarter
P14: Precipitation of driest month
43
Results: Sand in Topsoil (%), No X, Y
P12: Annual Precipitation
P13: Precipitation of wettest month
44
Results: Sand in Topsoil (%), No X, Y
  • Experiment 604: SA1 vs. inputs, training set 82%
  • out_variable: SA1
  • variables: 'mean_ndvi' 'XFeat_1km_9_SA1'
    'XGeology_Code_SA1' 'P12' 'intra_var' 'P13'
  • final hyperparameters:
    • in_params: 0.2806 5.2131 0.9563 -3.2208
      -3.2170 0.5258 0.0168 -3.2173 -0.3648 2.1717
  • Train/Test2 error:
    • Data: 14.8333/14.3487
    • Model: 13.4789/13.5924
  • Train/Test2 r2: 0.1743/0.1026
  • bias: 1.323985
  • noise: 183.653649 (std 13.55189)
  • lengthscale:
    • mean_ndvi: 0.619923 (10.34)
    • XFeat_1km_9_SA1: 5.004900 (4.32)

P12: Annual Precipitation
P13: Precipitation of wettest month
45
Results: Sand in Topsoil (%), No X, Y, Variable
Selection
46
Results: Sand in Topsoil (%), No X, Y
P12: Annual Precipitation
P13: Precipitation of wettest month
47
Results: Sand in Topsoil (%)
48
Results: Sand in Topsoil (%), No X, Y
49
Results: Clay in Topsoil (%)
P16: Precipitation of wettest quarter
50
Results: Clay in Topsoil (%)
  • Experiment 754: CL1 vs. inputs, training set 82%
  • out_variable: CL1
  • variables: 'XUTM' 'YUTM' 'Geology_Code' 'P16'
  • final hyperparameters:
    • in_params: -0.0301 4.7231 1.9283 1.0973
      -0.1593 0.0280 -0.7034 3.3708
  • Train/Test2 error:
    • Data: 12.1955/11.3255
    • Model: 10.4334/10.3302
  • Train/Test2 r2: 0.2681/0.1680
  • bias: 0.970348
  • noise: 112.516514 (std 10.60738)
  • lengthscale:
    • XUTM: 0.381307 (36462.41)
    • YUTM: 0.577729 (37207.37)

P16: Precipitation of wettest quarter
51
Results: Clay in Topsoil (%), Variable Selection
52
Results: Clay in Topsoil (%)
P16: Precipitation of wettest quarter
53
Results: Clay in Topsoil (%), No X, Y
P13: Precipitation of wettest month
P2: Mean Diurnal Temp. Range
P19: Precipitation of coldest quarter
P4: Temperature Seasonality
54
Results: Clay in Topsoil (%), No X, Y
  • Experiment 704: CL1 vs. inputs, training set 82%
  • out_variable: CL1
  • variables: 'P13' 'XGeology_Code_SA1' 'P2'
    'P19' 'mean_ndvi' 'P4' 'P2'
  • final hyperparameters:
    • in_params: 0.2078 4.7290 -1.0324 -1.2979
      0.2249 0.1773 -0.4783 0.9329 1.3486 0.2713 2.8471
  • Train/Test2 error:
    • Data: 12.1955/11.3255
    • Model: 10.4058/10.5010
  • Train/Test2 r2: 0.2720/0.1403
  • bias: 1.231002
  • noise: 113.184496 (std 10.63882)
  • lengthscale:
    • P13: 1.675687 (111.26)
    • XGeology_Code_SA1: 1.913570 (6.29)

P13: Precipitation of wettest month
P2: Mean Diurnal Temp. Range
P19: Precipitation of coldest quarter
P4: Temperature Seasonality
55
Results: Clay in Topsoil (%), No X, Y, Variable
Selection
56
Results: Clay in Topsoil (%), No X, Y
P13: Precipitation of wettest month
P2: Mean Diurnal Temp. Range
P19: Precipitation of coldest quarter
P4: Temperature Seasonality
57
Results: Clay in Topsoil (%)
58
Results: Clay in Topsoil (%), No X, Y
59
Prediction Time
  • 21 ms per cell with 1,700 training points
    (Pentium 4, 1.8 GHz)
  • Honduras (112,000 km²):
    • 40 minutes at 1 km
    • 3.4 days at 90 m
    • 30 days at 30 m
  • Africa (30,000,000 km²):
    • 7.2 days at 1 km
    • 2.4 years at 90 m
    • 22 years at 30 m
  • USA (9,158,000 km²):
    • 2.2 days at 1 km
  • World (148,940,000 km²):
    • 37 days at 1 km
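
These figures follow from multiplying the number of grid cells by the measured 21 ms per cell. The helper below (a hypothetical name) reproduces several of them:

```python
def prediction_time_s(area_km2, cell_m, ms_per_cell=21.0):
    """Seconds to predict every cell of a square grid covering `area_km2`."""
    cell_km2 = (cell_m / 1000.0) ** 2
    n_cells = area_km2 / cell_km2
    return n_cells * ms_per_cell / 1000.0

DAY = 86400.0
YEAR = 365 * DAY
print(prediction_time_s(112_000, 1000) / 60)       # about 39.2 minutes (slide: 40 minutes)
print(prediction_time_s(112_000, 90) / DAY)        # about 3.36 days (slide: 3.4 days)
print(prediction_time_s(30_000_000, 30) / YEAR)    # about 22.2 years (slide: 22 years)
```

The per-cell cost depends on the number of training points: once the Cholesky factor of the training covariance is computed, the predictive mean needs O(n) kernel evaluations per cell and the variance an O(n²) triangular solve, so 21 ms/cell applies to n = 1,700. Cells are independent of each other, so the work parallelizes directly across processors.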

60
Results: Impact
  • Gaussian Processes for PSM:
    • Provide quantitative predictions
    • Provide a quantitative estimate of confidence
    • Combine pedogenic factors and spatial
      interpolation
    • Allow for complete coverage
    • Enable continued improvement
    • Match or advance the state of the art in
      predictive soil mapping

61
Future Work
  • In Gaussian Processes for Predictive Soil Mapping:
    • Validate results
    • Improve existing variables
    • Find new variables to improve results
    • Compare with the leading approach, regression
      trees
    • Participate in an international workshop to
      assess the viability of worldwide coverage with
      the latest techniques

62
Future Work
  • In TechBridgeWorld's work with CIAT:
    • Computer vision for monitoring and management of
      agricultural fields and natural resources from
      low-cost flying platforms:
      • Digital elevation map generation
      • Automated image mosaicing
      • Segmentation of individual tree crowns
      • Disease monitoring and detection
    • Developing weather insurance schemes for
      smallholder farmers in developing countries
    • Species/crop distribution modeling for targeting
      conservation and identifying new opportunities
      for farmers
    • Temporal analysis of land cover data

63
Future Work: Weather Index Insurance for Small
Farmers
  • Rather than insuring against yield loss, insure
    against the weather: the most likely cause of
    yield loss is a lack or excess of rain
    • Reduces fraud
    • Reduces cost
  • Challenges:
    • Event timing is critical
    • Needs very low false-positive and
      false-negative rates
    • The impact of rainfall depends on terrain and
      soil type
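
A weather index contract pays according to a measured weather variable instead of an assessed loss. A common structure, sketched here with hypothetical parameter values, pays nothing above a rainfall trigger, pays in full at or below an exit level, and interpolates linearly in between. This illustrates the general idea only; it is not a scheme from the CIAT work.

```python
def rainfall_index_payout(rain_mm, trigger_mm, exit_mm, sum_insured):
    """Payout for a rainfall-deficit index: zero above trigger, full at or below exit."""
    if rain_mm >= trigger_mm:
        return 0.0
    if rain_mm <= exit_mm:
        return float(sum_insured)
    # Linear payout between the trigger and the exit level
    return sum_insured * (trigger_mm - rain_mm) / (trigger_mm - exit_mm)

# Hypothetical contract: pay if seasonal rain falls below 300 mm, in full below 100 mm
for rain in (350, 300, 200, 80):
    print(rain, rainfall_index_payout(rain, trigger_mm=300, exit_mm=100, sum_insured=500))
```

Because the payout depends only on measured rainfall, there is no loss claim to verify, which is where the fraud and cost reductions come from. The challenges listed above show up as basis risk: cases where the index and a farmer's actual loss disagree.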

64
Future Work: Analysis of Digital Aerial Imagery
  • Captured with a low-cost hot-air balloon or kite
  • Automatic image mosaicing
  • Generation of elevation maps from images

65
Future Work: Monitoring of Rainforest Tree Species
66
Future Work: Automatic Coastline Extraction
  • 90 m digital elevation maps are available for
    nearly the whole world from a Space Shuttle
    mission (SRTM)

67
Future Work: Temporal Analysis of Vegetation Cover
  • To monitor natural changes and human impact

68
Conclusions
  • Significant contributions can be made by applying
    computer science techniques to other fields
  • Scientists in other fields are frequently limited
    to off-the-shelf solutions
  • Working with existing groups in developing
    countries can maximize the impact of short-term
    work