Development of large-scale applications with Stata - PowerPoint PPT Presentation

1
Development of large-scale applications with Stata
  • Michael Lokshin, Sergiy Radyakin and Zurab Sajaia
  • World Bank

2
Analytical work at the World Bank
  • Each year the World Bank produces
  • 10-15 poverty assessments
  • 5-10 Labor market studies
  • 10 Education and Health assessments
  • Gender studies
  • Nutritional Studies
  • Reports on Social protection and
    Benefit-Incidence analysis, etc.
  • Most analytical work for these reports is done in
    Stata
  • The Research Department (DECRG) of the World Bank
    develops new methods and tools that are used in
    these reports and need to be made accessible to a
    wide audience of practitioners of applied
    economic analysis

3
Stata in the World Bank
  • Stata is the main statistical package used in the
    Bank
  • Hundreds of users both in the HQ and regional
    offices
  • Many users are short-term consultants with
    limited skills in Stata programming
  • Consultants are hired on a project and leave the
    Bank after the project is completed
  • Difficult to impose rules of a programming style,
    code documentation, archiving
  • Many Stata programs are lost or undocumented and
    are difficult to reuse
  • There is a need to automate the analytical work
    conducted in the Bank

4
Stata routines developed in DECRG
  • Poverty analysis toolkit
  • Growth-inequality decomposition
    (gedecomposition.ado)
  • Sectoral poverty decomposition
    (sedecomposition.ado)
  • Growth-incidence curves (gicurves.ado)
  • Stochastic dominance analysis (pov_robust.ado)
  • egen extension for inequality and poverty
    measures
  • Fast algorithm for calculation of Gini
    coefficients (fastgini.ado)
  • Applied Economic Research
  • FIML algorithm of two-equation ordered probit
    models with endogeneity
  • FIML estimation of the endogenous switching
    regression model
  • Selection models based on ordered probit
  • Semi-parametric difference-based estimation of
    partial linear regression models
  • Selecting the subset of variables that gives the
    best model fit
  • Efficient estimation of regressions based on
    pseudo-panel data
  • LOOKFOR_ALL: an extension of the Stata command
    lookfor
  • xml_tab.ado: saves the output of Stata
    estimation procedures in Microsoft Excel format
  • usespss.ado, use10.ado: read SPSS files into
    Stata; read Stata 10 files in Stata 9
  • Many other Stata routines
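
To illustrate why a Gini routine can be made fast, here is a minimal sketch in Python (used only for illustration; the toolkit itself is written in Stata). It computes an unweighted Gini coefficient with a single sort, using the rank formula G = 2·Σ(i·x_i)/(n·Σx) − (n+1)/n. This is an assumption about the general technique, not the code of fastgini.ado, and the function name `gini` is hypothetical.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values, O(n log n)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one observation and a positive total")
    # Rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i = 1..n is the rank of x_i in ascending order.
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * ranked_sum / (n * total) - (n + 1.0) / n


equal = gini([1, 1, 1, 1])        # perfect equality
one_has_all = gini([0, 0, 0, 1])  # one unit holds everything
```

The single sort replaces the pairwise comparison of all observations, which is what makes this formulation practical on large survey files.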

5
Automated Economic Analysis
  • Speed up production of basic (required) results
  • Minimize human errors
  • Free resources for more meaningful and
    interesting tasks.
  • Easily introduce new techniques and methods
  • Allow easy replication of previous results
  • Generate standard, comparable results across the
    countries/years.
  • A tool for simulations
  • A tool for sensitivity analysis and training.
  • Helpful in situations of limited data access
  • Simple checking of previous reports/results
  • Minimize training time and skills requirements

7
ADePT: Software platform for automated economic
analysis
  • ADePT User Interface
  • Version 3: Customized Stata dialogs, classes
  • Version 4: User interface in C
  • Request for computations
  • Stata Computation Kernel
  • Set of Stata and Mata routines, plug-ins
  • Output in XLS or PDF format (xml_tab.ado)
  • 100,000 lines of code
  • Multiple version support
  • Team development
8
ADePT Solutions
  • ADePT offers users a solution to a particular
    problem.
  • Modules of ADePT: a set of analytical results
    (tables, graphs) sufficient to answer
    a particular question.
  • Combination of software tools and substantive
    contributions from experts in the field.
  • Gary Fields (Cornell): Labor
  • Martin Ravallion (WB): Poverty
  • Adam Wagstaff (WB): Health
  • Two main directions of ADePT
  • Assessments of the current situation
  • Projections and simulations

9
ADePT V4.0
  • Accepts individual-level and household data in
    Stata and SPSS format. Uses Stata for
    computations.
  • Possibility of remote computing
  • No prior knowledge of Stata is required
  • Minimal data preparation
  • Extensive checks on possible problems with the
    data
  • Control for influential outliers
  • Tested on datasets from more than 50
    countries (LSMS, HBS, DHS)
  • Estimated 500 users in the WB, international
    research institutions, universities, government
    agencies.
  • Expected increase in the number of users when new
    modules are released

10
ADePT V4.0: The roadmap
  • ADePT Poverty: Public release, June 2007
  • ADePT MAPS: Public release, October 2007
  • ADePT Labor: Public release, November 2007
  • ADePT Gender: Public release, November 2008
  • ADePT Social Protection: Public release,
    June 2009
  • ADePT Education: Public release, June 2009
  • ADePT Targeting: Planned release, August 2009
  • ADePT PLINES: Development stage
  • ADePT HEALTH: Planned release, August 2009
  • ADePT Inequality: Planned release, August 2009

11
ADePT Website
  • www.worldbank.org/adept
  • Download installation and updates,
    documentation, examples.

12
Practical issues
  • Interface
  • Performance (-ftabstat2-)
  • Interaction/communication with other programs
    (IniFile.class, -smtp-)
  • Graphics (-twoway parea-, -amap-)
  • Custom file formats (-usespss-, -use10-)
  • Installation and updates (-pkg2script-)
  • Certification

13
Practical issues: Interface
  • Dialogs in Stata can be created to facilitate the
    use of custom-written commands. But they are
    oriented toward building a single command line
    with parameters and options, not a full
    application interface.
  • Some additional features were added in Stata 10
    to expand the dialog possibilities, but they are
    still very limited, and we were constrained to
    remain compatible with Stata 9.2.
  • After exhausting the standard dialog features of
    Stata, we decided to move the interface part
    into an external application written in C
    (Microsoft Visual Studio).

Released version 3.0 of ADePT used Stata dialogs
14
Practical Issues: Interface
The current version 4.0 of ADePT uses Windows Forms
for dialogs
15
Practical Issues: Performance
  • Stata's built-in routines seem to be very
    efficient, but code implemented in .ado
    files is often quite slow.
  • In particular, -tabstat- has shown inadequate
    performance for our tasks despite its simple
    nature.
  • It was rewritten as a plugin, -ftabstat2-, in C
    (Microsoft Visual Studio) and modified to suit
    our particular needs: it now returns matrices of
    means, totals, counts, and various proportions
    for each specified variable, with support for
    by()-rows and by()-cols
  • Trade-off: no MP, because plugins are (currently?)
    single-threaded.
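
The kind of two-way summary described above (counts, totals, and means per cell of by()-rows and by()-cols) can be accumulated in a single pass over the data. The following is a minimal sketch in Python, for illustration only; it is not the interface of the -ftabstat2- plugin, and the function name `cell_stats` and the field names in the sample data are hypothetical.

```python
from collections import defaultdict


def cell_stats(records, var, row_key, col_key):
    """One pass over the data: count, total and mean of `var` per cell."""
    acc = defaultdict(lambda: [0, 0.0])  # (row, col) -> [count, total]
    for rec in records:
        value = rec.get(var)
        if value is None:  # skip missing values
            continue
        cell = acc[(rec[row_key], rec[col_key])]
        cell[0] += 1
        cell[1] += value
    return {
        key: {"count": n, "total": t, "mean": t / n}
        for key, (n, t) in acc.items()
    }


# Made-up demonstration records, not actual data.
data = [
    {"region": "North", "urban": 1, "income": 10.0},
    {"region": "North", "urban": 1, "income": 30.0},
    {"region": "South", "urban": 0, "income": 5.0},
    {"region": "South", "urban": 0, "income": None},
]
stats = cell_stats(data, "income", "region", "urban")
```

Accumulating all statistics in one loop avoids re-scanning the dataset once per statistic or per group, which is one plausible reason a compiled single-pass routine outperforms interpreted code.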

16
Practical Issues: Communication
  • Interaction/communication with other programs: we
    needed to solve two problems
  • To provide an easy-to-handle job file that
    describes all the parameters and options of a
    large project (it is not possible to fit
    everything on the command line).
    Transition from txt to ini files: IniFiles.class
  • To provide communication between Stata and
    another program: while the computations are
    performed in Stata, the external interface part
    needs to be updated on the status of the
    calculations. We solved this by writing a C
    plugin, -smtp- (SendMessageToPipe), which uses
    Windows pipes for IPC
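
To make the job-file idea concrete, here is a minimal sketch in Python of an ini-style job description and a reader for it. It is for illustration only: the section names, keys, and file names are hypothetical and are not ADePT's actual job-file layout, which the slides do not show.

```python
import configparser

# Hypothetical job file: every section and key name here is an assumption.
JOB_TEXT = """
[data]
file = survey.dta
weight = hhweight

[tables]
run = poverty_profile, gini_by_region

[output]
format = xls
"""


def load_job(text):
    """Parse an ini-style job description into plain Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "data_file": parser.get("data", "file"),
        "weight": parser.get("data", "weight", fallback=None),
        "tables": [t.strip() for t in parser.get("tables", "run").split(",")],
        "format": parser.get("output", "format", fallback="xls"),
    }


job = load_job(JOB_TEXT)
```

Grouping options into named sections keeps a large project readable and lets both the interface and the computation side read the same file, which a single long command line cannot offer.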

17
Practical Issues: Graphics
  • We faced some limitations of Stata
    graphics. Some of them were circumvented with
    custom graphics commands or adaptations of
    existing commands (-twoway parea-).
  • We did not find any way to interact with the mouse
    in Stata graphics (version 9.2).
  • We decided to move our mapping program, -amap-, out
    of Stata to an external program and communicate
    with it seamlessly via ini files.

Demonstration only, not actual data
18
Practical Issues: File Formats
  • We needed support for SPSS files in
    ADePT
  • We developed the -usespss- plugin to import SPSS
    data into Stata
  • -usespss- was presented at SNASUG 2008 in Chicago
    and made available to the public immediately
    afterwards
  • We needed to give Stata 9 users the ability to
    process datasets saved in Stata 10 format.
  • We developed (using Mata) a new command, -use10-,
    for this purpose. Available at SSC.

http://repec.org/snasug08/radyakin_usespss.ppt
findit usespss
findit use10
19
Practical Issues: Installation and Updates
  • We experienced problems with installing and
    updating packages from our web site into Stata.
  • The problem was not due to Stata, but we received
    a number of very helpful responses from
    StataCorp's Tech Support team on this issue.
  • Effectively, this problem ruled out -net install-
  • We developed a tool, -pkg2script-, to create
    standalone installers from one or more Stata
    packages with the help of the NSIS installation
    system.
  • The tool works in Windows only. An empty path
    means: take the package from SSC
  • In theory, all of SSC could be packed into one
    distribution like the one shown here

20
Practical Issues: Certification
  • We faced the problem of verifying
    results. Checking the numbers by hand is slow and
    unreliable.
  • We have included a test mode in ADePT, in which it
  • is launched from an external application (tests
    manager),
  • runs the requested jobs, and
  • verifies the output against a predefined set of
    benchmarks that were themselves verified
    (confirmed by non-team members).
  • We monitor whether the test succeeds (results
    are produced), whether the results are correct,
    and how long it takes to produce them.

If the benchmark for the current test does not
exist, ADePT will generate it from the current
results and verify against this saved output
next time.
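
The test-mode workflow above (run the job, time it, compare against a saved benchmark, and create the benchmark if none exists) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ADePT's tests manager: the function name `run_certified`, the JSON benchmark format, and the tolerance are all hypothetical.

```python
import json
import os
import tempfile
import time


def run_certified(job_name, compute, benchmark_dir, tol=1e-9):
    """Run `compute`, then create or check the benchmark for `job_name`."""
    start = time.perf_counter()
    results = compute()  # expected to return a dict of name -> number
    elapsed = time.perf_counter() - start

    path = os.path.join(benchmark_dir, job_name + ".json")
    if not os.path.exists(path):
        # No benchmark yet: save the current results for the next run.
        with open(path, "w") as fh:
            json.dump(results, fh)
        return {"status": "benchmark created", "seconds": elapsed}

    with open(path) as fh:
        expected = json.load(fh)
    same_keys = set(expected) == set(results)
    ok = same_keys and all(
        abs(expected[k] - results[k]) <= tol for k in expected
    )
    return {"status": "pass" if ok else "fail", "seconds": elapsed}


# Made-up numbers for demonstration only.
with tempfile.TemporaryDirectory() as bench:
    first = run_certified("poverty", lambda: {"headcount": 0.25}, bench)
    second = run_certified("poverty", lambda: {"headcount": 0.25}, bench)
    third = run_certified("poverty", lambda: {"headcount": 0.30}, bench)
```

A benchmark created automatically in this way only records what the program produced; it still needs the independent confirmation mentioned on the slide before it can certify correctness.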
21
Practical Issues: Wishes for Stata 12
  • Access to the registry (at least read-only) to
    detect the presence of other programs, their
    versions, and location. (Currently solved with a
    plugin).
  • IPC pipes (currently solved with a plugin).
  • Preserve/restore to RAM (currently solved with a
    RAMDrive).
  • Extend plugin capabilities: allow plugins to
    execute commands, as Mata can with stata(command).
  • Support for Cyrillic/local fonts
  • Unicode??