Title: Development of large-scale applications with Stata
1Development of large-scale applications with Stata
- Michael Lokshin, Sergiy Radyakin and Zurab Sajaia
- World Bank
2Analytical work at the World Bank
- Each year the World Bank produces:
  - 10-15 poverty assessments
  - 5-10 labor market studies
  - 10 education and health assessments
  - Gender studies
  - Nutritional studies
  - Reports on social protection, benefit-incidence analysis, etc.
- Most analytical work for these reports is done in Stata
- The Research Department (DECRG) of the World Bank develops new methods and tools that are used in these reports and need to be made accessible to a wide audience of practitioners of applied economic analysis
3Stata in the World Bank
- Stata is the main statistical package used in the Bank
- Hundreds of users, both in HQ and in regional offices
- Many users are short-term consultants with limited skills in Stata programming
- Consultants are hired for a project and leave the Bank after the project is completed
- It is difficult to impose rules on programming style, code documentation, and archiving
- Many Stata programs are lost or undocumented and are difficult to reuse
- There is a need to automate the analytical work conducted in the Bank
4Stata routines developed in DECRG
- Poverty analysis toolkit
  - Growth-inequality decomposition (gedecomposition.ado)
  - Sectoral poverty decomposition (sedecomposition.ado)
  - Growth-incidence curves (gicurves.ado)
  - Stochastic dominance analysis (pov_robust.ado)
  - egen extension for inequality and poverty measures
  - Fast algorithm for calculation of Gini coefficients (fastgini.ado)
- Applied economic research
  - FIML algorithm for two-equation ordered probit models with endogeneity
  - FIML estimation of the endogenous switching regression model
  - Selection models based on ordered probit
  - Semi-parametric difference-based estimation of partial linear regression models
  - Selecting a subset of variables providing the model's best fit
  - Efficient estimation of regressions based on pseudo-panel data
- LOOKFOR_ALL: an extension of the Stata program lookfor
- xml_tab.ado: saves the output from Stata estimation procedures in Microsoft Excel
- usespss.ado, use10.ado: read SPSS files into Stata; read Stata 10 files in Stata 9
- Many other Stata routines
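To illustrate why a dedicated Gini routine can be fast, here is a minimal Python sketch of the standard sort-based formula, which needs one sort instead of comparing every pair of observations. It is an illustration only: the slides do not describe the algorithm used in fastgini.ado, and the function name `gini` and the unweighted setup are assumptions made for this example.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    Uses the rank formula G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n on the
    sorted data, so the cost is one O(n log n) sort rather than O(n^2)
    pairwise comparisons. Illustrative only; not taken from fastgini.ado.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    # Ranks start at 1 for the smallest observation.
    rank_weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


equal_incomes = gini([1, 1, 1, 1])      # perfect equality
one_has_all = gini([0, 0, 0, 1])        # maximal inequality for n = 4
```

With survey data a production routine would also need sampling weights, which this sketch leaves out.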
5Automated Economic Analysis
- Speed up production of basic (required) results
- Minimize human errors
- Free resources for more meaningful and interesting tasks
- Easily introduce new techniques and methods
- Allow easy replication of previous results
- Generate standard, comparable results across countries and years
- A tool for simulations
- A tool for sensitivity analysis and training
- Helpful in situations of limited data access
- Simple checking of previous reports/results
- Minimize training time and skill requirements
7ADePT: Software platform for automated economic analysis
- ADePT User Interface
  - Version 3: customized Stata dialogs, classes
  - Version 4: user interface in C
- Request for computations (from the user interface to the Stata computation kernel)
- Stata Computation Kernel: set of Stata and Mata routines and plug-ins
- Output in XLS or PDF format (xml_tab.ado)
- 100,000 lines of code
- Multiple version support
- Team development
8ADePT Solutions
- ADePT offers users a solution to a particular problem.
- ADePT modules: sets of analytical results (tables, graphs) sufficient to answer a particular question.
- Combination of software tools and substantive contributions from experts in the field:
  - Garry Fields (Cornell): Labor
  - Martin Ravallion (WB): Poverty
  - Adam Wagstaff (WB): Health
- Two main directions of ADePT:
  - Assessments of the current situation
  - Projections and simulations
9ADePT V4.0
- Accepts individual-level and household data in Stata and SPSS formats. Uses Stata for computations.
- Possibility of remote computing
- No prior knowledge of Stata is required
- Minimal data preparation
- Extensive checks for possible problems with the data
- Control for influential outliers
- Tested on datasets from more than 50 countries: LSMS, HBS, DHS
- Estimated 500 users in the WB, international research institutions, universities, and government agencies
- The number of users is expected to increase when new modules are released
10ADePT V4.0: The roadmap
- ADePT Poverty: Public release, June 2007
- ADePT MAPS: Public release, October 2007
- ADePT Labor: Public release, November 2007
- ADePT Gender: Public release, November 2008
- ADePT Social Protection: Public release, June 2009
- ADePT Education: Public release, June 2009
- ADePT Targeting: Planned release, August 2009
- ADePT PLINES: Development stage
- ADePT HEALTH: Planned release, August 2009
- ADePT Inequality: Planned release, August 2009
11ADePT Website
- www.worldbank.org/adept
- Download installation and updates,
documentation, examples.
12Practical issues
- Interface
- Performance (-ftabstat2-)
- Interaction/communication with other programs (IniFile.class, -smtp-)
- Graphics (-twoway parea-, -amap-)
- Custom file formats (-usespss-, -use10-)
- Installation and updates (-pkg2script-)
- Certification
13Practical Issues: Interface
- Dialogs in Stata can be created to facilitate the use of custom-written commands. But they are geared toward building a single command line with parameters and options, not a full application interface.
- Some additional features were added in Stata 10 to expand the dialog capabilities, but they are still very limited, and we were constrained to remain compatible with Stata 9.2.
- After exhausting the standard dialog features of Stata, we decided to move the interface into an external application written in C (Microsoft Visual Studio).
Released version 3.0 of ADePT used Stata dialogs
14Practical Issues: Interface
Current version 4.0 of ADePT uses Windows Forms for dialogs
15Practical Issues: Performance
- Stata's built-in routines seem to be very efficient, but code implemented in .ado files is often quite slow.
- In particular, -tabstat- has shown inadequate performance for our tasks despite its simple nature.
- It was rewritten as a plugin, -ftabstat2-, in C (Microsoft Visual Studio) and modified to suit our particular needs: it now returns matrices of means, totals, counts, and various proportions for each specified variable, with support for by()-rows and by()-cols.
- Trade-off: no MP, because plugins are (currently?) single-threaded.
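The kind of output described above (counts, totals, means, and proportions per row-by-column cell) can be accumulated in a single pass over the data. The Python sketch below shows that idea only; it is not the -ftabstat2- code, and the function name `cross_stats`, the record layout, and the variable names are hypothetical.

```python
from collections import defaultdict


def cross_stats(records, value, by_row, by_col):
    """One-pass count/total/mean/share for each (row, col) cell.

    `records` is a list of dicts; a value of None is treated as missing
    and skipped. All names here are illustrative assumptions.
    """
    acc = defaultdict(lambda: [0, 0.0])   # cell -> [count, running total]
    for rec in records:
        v = rec.get(value)
        if v is None:
            continue
        cell = acc[(rec[by_row], rec[by_col])]
        cell[0] += 1
        cell[1] += v
    grand_count = sum(n for n, _ in acc.values())
    return {
        key: {"count": n, "total": t, "mean": t / n, "share": n / grand_count}
        for key, (n, t) in acc.items()
    }


data = [
    {"region": "north", "urban": 1, "income": 10.0},
    {"region": "north", "urban": 1, "income": 30.0},
    {"region": "north", "urban": 0, "income": 5.0},
    {"region": "south", "urban": 0, "income": None},   # missing, skipped
    {"region": "south", "urban": 0, "income": 15.0},
]
table = cross_stats(data, "income", "region", "urban")
```

Each cell is updated once per observation, so the cost grows linearly with the number of rows regardless of how many cells the by() variables define.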
16Practical Issues: Communication
- Interaction/communication with other programs: we needed to solve two problems.
  - To provide an easy-to-handle job file that contains the description of all the parameters and options for a large project (it is not possible to fit everything on the command line). Transition from txt to ini-files: IniFiles.class.
  - To provide communication between Stata and another program: while the computations are performed in Stata, the external interface needs to be updated about the status of calculations. We solved this by writing a C plugin, -smtp- (SendMessageToPipe), which uses Windows pipes for IPC.
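As an illustration of the job-file idea, the Python sketch below reads an ini-style description of a job and checks that the expected sections are present. The section names, keys, and file names are invented for this example; the slides do not show the actual ADePT job-file layout or the IniFiles.class interface.

```python
import configparser

# Hypothetical job description; every section and key name is an assumption.
JOB_TEXT = """
[job]
module = poverty
dataset = survey2005.dta
output = results.xls

[variables]
welfare = pcexp
weight = hhweight
"""


def load_job(text, required=("job", "variables")):
    """Parse an ini-style job description into a dict of dicts."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    for section in required:
        if not parser.has_section(section):
            raise ValueError(f"job file is missing the [{section}] section")
    return {name: dict(parser[name]) for name in parser.sections()}


job = load_job(JOB_TEXT)
```

Keeping all parameters in named sections means the interface and the computation side only need to agree on the file layout, not on a long command line.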
17Practical Issues: Graphics
- We have faced some limitations of Stata graphics. Some of them were circumvented with custom graphics commands or adaptations of existing commands (-twoway parea-).
- We didn't find any way to interact with the mouse in Stata graphics (version 9.2).
- We decided to move our mapping program, -amap-, out of Stata into an external program that communicates with Stata seamlessly via ini-files.
Demonstration only, not actual data
18Practical Issues: File Formats
- We needed support for SPSS files in ADePT.
- We developed the -usespss- plugin to import SPSS data into Stata.
- -usespss- was presented at SNASUG 2008 in Chicago and made available to the public immediately afterwards.
- We needed to give Stata 9 users the ability to process datasets saved in Stata 10 format.
- We developed (using Mata) a new command, -use10-, for this purpose. Available at SSC.
http://repec.org/snasug08/radyakin_usespss.ppt
findit usespss
findit use10
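For context on the Stata 9 / Stata 10 problem: in the legacy binary .dta layouts the first byte of the file holds the format number (113 for files written by Stata 8 and 9, 114 for Stata 10 and 11). The Python sketch below checks that byte to decide whether a file is readable by an older release. It is not how -use10- works, which the slides do not describe; the function names are hypothetical.

```python
def dta_format(header):
    """Return the format number from the first byte of a legacy .dta header."""
    if len(header) < 1:
        raise ValueError("empty header")
    if header[:1] == b"<":
        # Later formats start with a text tag instead of a format byte.
        raise ValueError("non-legacy header is not handled by this sketch")
    return header[0]


def readable_by(header, max_format=113):
    """True if a release whose newest format is `max_format` can read the file."""
    return dta_format(header) <= max_format


def read_header(path, size=4):
    """Read the first few bytes of a file on disk."""
    with open(path, "rb") as fh:
        return fh.read(size)


stata9_file = bytes([113, 2, 1, 0])    # made-up header bytes for illustration
stata10_file = bytes([114, 2, 1, 0])
```

A file for which `readable_by` returns False is one that would need a converter or a command like -use10- before Stata 9 can open it.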
19Practical Issues: Installation and Updates
- We have experienced problems with installing and updating packages from our web site into Stata.
- The problem was not due to Stata, but we received a number of very helpful responses from StataCorp's Technical Support team on this issue.
- Effectively, this problem ruled out -net install-.
- We have developed a tool, -pkg2script-, to create autonomous installations from one or more Stata packages with the help of the NSIS installation system.
- The tool works in Windows only (empty path: take the package from SSC).
- In theory, all of SSC could be packed into one distribution like the one shown here.
20Practical Issues: Certification
- We have faced the problem of verifying results. Checking the numbers by hand is slow and unreliable.
- We have included a test mode in ADePT, in which it
  - is launched from an external application (tests manager),
  - runs the requested jobs, and
  - verifies the output against a predefined set of benchmarks, which were verified (confirmed by non-team members).
- We monitor whether the test succeeds (results are produced), whether the results are correct, and how long it takes to produce them.
If the benchmark for the current test does not exist, ADePT will generate it from the current results and verify against this saved output next time.
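The test mode described above can be sketched as follows in Python: run a job, time it, create the benchmark if none exists, and otherwise compare the new results with the saved ones. This is a minimal illustration under assumed conventions (results as a flat dict of numbers, benchmarks stored as JSON, a relative tolerance for comparison); none of these details come from ADePT itself.

```python
import json
import math
import os
import tempfile
import time


def verify(results, benchmark_path, rel_tol=1e-9):
    """Return (status, mismatched_keys); status is 'created', 'ok' or 'failed'."""
    if not os.path.exists(benchmark_path):
        # No benchmark yet: save the current results for the next run.
        with open(benchmark_path, "w") as fh:
            json.dump(results, fh, sort_keys=True)
        return "created", []
    with open(benchmark_path) as fh:
        expected = json.load(fh)
    mismatches = [
        key for key in sorted(set(expected) | set(results))
        if key not in expected or key not in results
        or not math.isclose(results[key], expected[key], rel_tol=rel_tol)
    ]
    return ("ok" if not mismatches else "failed"), mismatches


def run_test(job, benchmark_path):
    """Run `job` (a callable returning a dict), time it, and verify the output."""
    start = time.perf_counter()
    results = job()
    elapsed = time.perf_counter() - start
    status, mismatches = verify(results, benchmark_path)
    return status, mismatches, elapsed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "poverty_benchmark.json")   # hypothetical name
    first = run_test(lambda: {"headcount": 0.25, "gini": 0.4}, path)
    second = run_test(lambda: {"headcount": 0.25, "gini": 0.4}, path)
    third = run_test(lambda: {"headcount": 0.25, "gini": 0.41}, path)
```

Note that a benchmark generated automatically on the first run only guards against later changes; it still has to be confirmed independently before it can certify that the numbers are right.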
21Practical Issues: Wishes for Stata 12
- Access to the registry (at least read-only) to detect the presence of other programs, their versions, and locations (currently solved with a plugin).
- IPC pipes (currently solved with a plugin).
- Preserve/restore to RAM (currently solved with a RAMDrive).
- Extend plugin capabilities: allow plugins to execute commands the way Mata can with stata(command).
- Support for Cyrillic/local fonts
- Unicode??