Chapter-7: Case Study - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Chapter-7: Case Study

Description:

Chapter-7: Case Study www.AhsanAbdullah.com Background Data being recorded for decades by several organization, mostly never digitized and never used for decision making. – PowerPoint PPT presentation

Number of Views:202
Avg rating:3.0/5.0
Slides: 39
Provided by: aa7098
Category:
Tags: case | chapter | complex | data | olap | study

less

Transcript and Presenter's Notes

Title: Chapter-7: Case Study


1
  • Chapter-7 Case Study
  • www.AhsanAbdullah.com

2
Background
  • Data being recorded for decades by several
    organization, mostly never digitized and never
    used for decision making. Under-utilization.
  • Data is horizontally wide i.e. 100 attributes
    and vertically deep i.e. tens of thousands of
    rows.
  • Huge potential for long-term and short-term
    decision making.
  • Decision making not data driven, but based on
    expert judgment, sometimes with tragic results.

3
Area under study
  • Punjab is the bread-basket of Pakistan, and
    administratively divided into eight divisions.
  • Multan division is the cotton hub of Punjab,
    which consists of district Multan.
  • District Multan has three tehsils, which are
    furhter divided into central points or Markaz.
  • This study is about 10 tehsils of Multan using
    data for years 2001 and 2002.

4
Area under study Map
  • Key Markaz
  • 1 Bosan
  • 2 Qadirpurran
  • 3 Multan
  • Makhdum Rashid
  • Mumtazabad
  • 6 Shujabad
  • 7 Hafizwala
  • 8 Jalalpur Pirwala
  • 9 Qasba Marral

5
Major Players
  • A pest is an insect that eats the crop. There are
    two types of cotton pests
  • Sucking pests (White Fly, Jassid, Thrips etc.)
  • Boll Worms (spotted boll-worm, pink boll-worm)
  • A predator is an insect that eats the pest.
  • Such as Lady bug Beetle, Spiders, Ants etc.
  • Cotton is also effected by virus
  • Cotton Leaf Curl Virus (CLCV)

6
Economic Threshold Level (ETL_A)
  • The pest population beyond which it is cost
    effective to use pesticide.
  • Pesticide is a poison which is used to kill
    pests.
  • Note that eradicating pests is NOT feasible,
    controlling pest population is feasible.

7
Pests Predators
Pests
Predators
Source United States Department of Agriculture
(USDA)
8
ETL_A Graph
Economic Injury
ETL_A
Pest Population
Time
9
The need
  • Extensive losses to cotton crop due to pest
    attacks in 1983 resulted in the establishment of
    Directorate of Pest Warning in 1984.
  • Since 1984 scouts from the Directorate have been
    sampling fields and recording data and advising
    farmers.
  • During 2003-04 season, Boll Worm attack on the
    cotton crop resulted in a loss of nearly 0.5 M
    bales.
  • Weather not the only factor, but points to a
    multitude of factors, requiring efficient and
    effective data analysis, for better decision
    making.

10
The need IT in Agriculture
  • The volume of pest scouting data accumulated
    todate is enormous both horizontally and
    vertically.
  • A typical pest scouting sheet consists of 35
    variables or attributes.
  • Metrological data consists of 50 variables.
  • Coarse estimate of pest scouting data recorded
    for the cotton crop alone stands at 5 million
    records, and growing.
  • Tasking the human brain alone, for synthesis of
    information from this data is not only
    impractical but dangerous too.
  • Need a Data Warehouse, OLAP tools and Data Mining
    to analyze the data.

11
Agro-Informatics
  • I.T. sector is at the heart of the economic
    revival of Pakistan President of Pakistan,
    Launching of VU, Mar. 23, 2003.
  • Agriculture is the backbone of our economy, upto
    74 of the population is dependent on it.
  • IT is an enabler, and has the potential to
    benefit everyone when applied in Agriculture.
  • IT Agriculture A win-win scenario.

12
Agro-Informatics Relevant URL
To know more about Agro-Informatics,
visit www.agroict.org
13
How to go about?
  • Discussed several DWH implementation
    methodologies in lectures 32-35.
  • Will adopt a pilot project approach, because
  • A full-blown DWH requires extensive investment.
  • Show users the value of DSS.
  • Establish blue print for full-blown system.
  • Identify problem areas.
  • Reveal true data demographics.
  • Pilot projects are supposed to work with limited
    data.

14
The 12-step Approach of Shaku Atre
15
Step-1 Determine Users needs
  • A data warehouse or a data mart?
  • User not educated, think DWH is a big Excel
    sheet.
  • Some confuse DSS with Decision Making Systems.
  • DWH at Federal level.
  • Data marts at district level.
  • What are the business objectives?
  • Good governance via E-Governance.
  • Timely and data driven decision making.
  • Fore warning of farmers.
  • Timely planning for purchase/stocking of
    pesticides.

16
Step-1 Determine Users needs
  • Cost/benefit analysis, project estimation risk
    assessment
  • Cost of a prototype system fraction of the loss
    in one year.
  • Required data either not made available or
    expensive.
  • Required data in very raw format.
  • People hungry for ideas looking for a solution on
    a plate.
  • Where do I get the data from?
  • Data available at Directorate of pest warning.
  • Raw form
  • Metrological department asking too much for data
    collected using public funds..
  • Weather forecasts downloaded from web site of
    www.dawn.com

17
Steps-23 Determine DBMS Sever Hardware
Platform
  • Hardware and Software along with trained people
    was already available on Campus.
  • The pilot DWH was implemented using NCR Teradata
    Data Warehousing solution for NT running on a
    dual Intel 950 Mhz Xeon processors server and 1GB
    of RAM.
  • Total internal Hard Disk capacity of the server
    amounted to 36 GB while external RAID control
    supports 8 additional SCSIs of 18 GB each.

18
Step-4 Information Data Modeling
Dimensional Modeling
19
Step-4 Information Data Modeling
Simplified ERD
Other field inputs such as irrigation, fertilizer
etc. not included as data not available
20
Step-4 Information Data Modeling
KEY WFN White Fly Nymph WFA White Fly Adult W
White B Brown S Small Larvae L Large Larvae
(optional)
Logical Physical Design
21
Step-5 Construct Metadata Repository
  • What means what was important.
  • Who access what was not important as single user
    environment.
  • Meetings held similar to requirements definition,
    and questions transformed into SQL queries.
  • Meta data issues still cropped up, more of
    business rules.

22
Step-5 Surprise case
Sucking pests
Ball Worm Complex
SBW Spotted Ball Worm ABW Army Ball Worm PBW
Pink Ball Worm
If pest population is low, predator population
will also be low, because there will be less
food for predators to live on i.e. pests.
23
Step-6 Data Acquisition Cleansing
Hand filled pest scouting sheet
Typed pest scouting sheet
24
Step-6 Issues
  • The pest scouting sheets are larger than A4 size
    (8.5 x 11), hence the right end was cropped
    when scanned on a flat-bed A4 size scanner.
  • The right part of the scouting sheet is also the
    most troublesome, because of pesticide names for
    a single record typed on multiple lines i.e. for
    multiple farmers.
  • As a first step, OCR (Optical Character Reader)
    based image to text transformation of the pest
    scouting sheets was attempted. But it did not
    work even for relatively clean sheets with very
    high scanning resolutions.
  • Subsequently DEOs (Data Entry Operators) were
    employed to digitize the scouting sheets by
    typing.

25
Step-6 Why the issues?
  • Major issues of data cleansing had arisen due to
    data processing and handling at four levels by
    different groups of people
  • Hand recordings by the scouts at the field level.
  • Typing hand recordings into data sheets at the
    DPWQCP office.
  • Photocopying of the typed sheets by DPWQCP
    personnel.
  • Data entry or digitization by hired data entry
    operators.

26
Step-7 Transform, Transport Populate
27
Motivation For Transformation
  • Trivial queries give wrong results.
  • Static and dynamic attributes
  • Static attributes recorded repeatedly.

28
Step-7 Resolving the issue
  • Solution Individualization of cultivated fields.
  • Technique similar to BSN used to fix names.
  • Unique ID assigned to farmers.
  • BSN used again, and unique ID assigned to fields.
  • Results
  • Limitation Field individualization not perfect.
    Some cases of farmers with same geography, sowing
    date, same variety and same area. Such cases
    were dropped.

29
Step-8 Middleware Connectivity
  • Since the source data is maintained in a non
    digital format, hence connectivity with the data
    warehouse was irrelevant.
  • Once digitized, it was rather straightforward to
    load data into the warehouse.
  • Furthermore, in the foreseeable future, it was
    not anticipated that the scouting sheets were
    going to be maintained in a digitized form.

30
Step-9-11 Prototyping, Querying Reporting
  • Implemented the prototype with user involvement.
  • Applications developed
  • 10. A data mining tool was also developed based
    on an indigenous technique that used crossing
    minimization paradigm for unsupervised
    clustering.
  • 11. A low-cost OLAP tool was indigenously
    developed actually it was a Multi dimensional
    OLAP or MOLAP.
  • Use querying reporting tools
  • The following SQL query was used for validation
  • SELECT Date_of_Visit, AVG(Predators),
    AVG(Dose1Dose2Dose3Dose4)
  • FROM Scouting_Data
  • WHERE Date_of_Visit lt 12/31/2001 and predators
    gt 0
  • GROUP BY Date_of_Visit

31
Step-12 Deployment System Management
  • Since a pilot project, therefore, the
    traditional deployment methodologies and system
    management techniques were not followed to the
    word, and are not discussed here.

32
Agri-DSS usage Data Validation
  • Quality and validity of the underlying data is
    the key to meaningful and authentic analysis.
  • After ensuring a satisfactory level of data
    quality (based on cost-benefit trade-off)
    extremely important to scientifically validate
    the data that the DWH will constitute.
  • Some very natural checks were employed for this
    purpose. Relationship between the pesticide
    spraying and predator (beneficial insects)
    population is a fact well understood by
    agriculturists.
  • Predator population decreases as pesticide spray
    increases and then continually decreases till the
    end of season.

33
Agri-DSS usage Data Validation Graph
34
Agri-DSS usage Spray Dates
  • Working Behaviors at Field Level Spray dates
  • As expected, apparently spray dates and spray
    frequency for 2001 and 2002 do not displayed any
    correlation as it is dependent on pest
    populations, availability of pesticides etc.
  • For deep analysis, moving average of sprays for
    five days, and a moving correlation of sprays for
    five days were calculated and plotted.
  • For the sake of uniformity, the moving average of
    spray was normalized using the maximum spray
    frequency.

35
Agri-DSS usage Spray Dates Graph
No relationship should have existed for the two
years. But note the surprising finding that most
sprays occurring on and around 12th Aug. in BOTH
years with high correlation, appearing as a
spike. Also note the dip in sprays around 11th
Sep.! Sowing at predetermined time makes sense,
as it is under the control of the farmer, but
that is not true for spraying. Pests dont follow
calendars therefore, whenever, ETL_A is crossed
pesticides are sprayed.
36
Agri-DSS usage Explaining Findings
14th Aug. is the independence day of Pakistan and
a national holiday. In Pakistan, people are in
a habit of sandwiching gazetted holidays with
casual leaves consequently businesses are closed
for a longer period, including that of pesticide
suppliers. 14th Aug. occurred on Tue and Wed in
2001 and 2002, respectively, thus making it ideal
to stretch the weekend. During Aug/Sep.
humidity is also high, with correspondingly high
chances of pest infestations. Therefore,
apparently the farmers decided not to take any
chances, and started spraying around 11th Aug.
evidently even when it was not required. The
weather forecast for 13 Aug. 2001 and 2002 was
showers and cloudy, respectively. Therefore, most
likely the pesticide sprayed was washed-off.
Decline in sprays around 9/11 could not be
explained.
37
Agri-DSS usage Sowing Dates
The results of querying the sowing date based on
the day of the week are shown below Obs
erve least number of sowings done on Thursdays,
in each year. This finding was later confirmed by
extension personnel. Multan is famous for its
shrines. Thursdays are usually related with
religious festivals and activities, a mix of
devotion and recreation, and usually held at
shrines, hence a tendency of doing less work on
Thursdays. Similar behavior was observed for
spraying too.
38
Conclusions Lessons
  • ETL is a big issue. There are no digitized
    operational databases, data entry of these sheets
    is very expensive, slow and prone to errors.
  • Each farmer is repeatedly visited by agriculture
    extension people. This results in repetition of
    information, about land, sowing date, variety
    etc. Hence, farmer and land individualization are
    critical. Such an individualization task is hard
    to implement for multiple reasons.
  • There is a skewness in the scouting data. Public
    extension personnel (scouts) are more likely to
    visit educated or progressive farmers, as it
    makes their job of data collection easy.
    Furthermore, large land owners and influential
    farmers are also more frequently visited by the
    scouts. Thus the data does not give a true
    statistical picture of the farmer demographics.
  • All pests are not present all the time (such as
    ball worms), most of the time second spray is not
    done (or not recorded), and hence tables are
    sparse. Therefore, it is better to split tables
    so as to decrease header size and table space.
  • Unlike traditional data warehouse where the end
    users are decision makers, here the
    decision-making goes all the way down to the
    extension level. This presents a challenge to the
    analytical operations designer, as the findings
    must be fairly simple to understand and
    communicate.
Write a Comment
User Comments (0)
About PowerShow.com