Title: Country-Level Variation in Open Source Software Policy and Environment
1Country-Level Variation in Open Source Software
Policy and Environment
ICT Roundtable February 19, 2009
2Outline
- OSS background/theory/empirics
- Interviews
- Data collection
- Index construction
- Maps
- SLOSI
3OSS Background
- Free/Libre Open Source Software (FLOSS)
- Passion, noise, potential, and accomplishments
- Relatively new area for academic inquiry
- Lots of interesting questions about
- Decisions to contribute / coordinate / adopt
- Quality / Cost advantages over closed-source
- Policy levers to affect OSS development
4Understanding OSS
- Economics and business of OSS
- altruism, signaling, reputation
- alumni effects, lock-in, standards
- business models (complementarity), hierarchy
- Law and economics of OSS
- licensing and IP
- Cultural studies of OSS
- hacker, international norms
5Overview of OSS knowledge
- Beliefs, hype, arguments and prophecies
- Lots of anecdotal evidence
- Firm- or project-specific stories
- Sampling on the dependent variable
- Snapshots of participants, adopter groups
- Paucity of systematic data collection and robust
hypothesis-testing
6OSS empirics
- Empirical evidence on OSS activity limited
- Open nature limits data collection
- Informal participation, org. structure
- difficult to monitor / efforts proprietary
- No transactions
- Dynamic, disaggregated, dispersed activity
- Predictors difficult to observe
- culture, training, altruism, reputation, etc.
- IT costs for OSS vs. other (IT) costs, output
7Interviews
- Red Hat requests development of OSS
- OSS activity? OSS potential?
- Background/Context
- Interviews with OSS professionals
- Red Hat executives/developers
- International OSS experts
- - (Brazil/Latin America/India/Singapore/Germany/France)
8The goal?
- Develop an OSS Potential Index (OSPI)
- Global, country-by-country index
- akin to HDI, Index of Economic Freedom, etc.
- let policymakers and advocates (e.g., Red Hat?) point to a particular country's rank
- Compare to neighbors, link to policies
9Index Design
- Develop OSS Potential Index (OSPI)
- theoretically relevant
- consistent data available
- direct and indirect measures
- Compile the index
- Test for sensitivity to construction
- Keep OSPI construction/composition open
10Data collection
- 750 variables collected
- All publicly available data
- Issues of coverage
- Across time, across countries
- Priority is establishing the framework
- open index to entice data provision
11Index construction
- Conceptual approaches
- Activity vs. Potential
- OSPI, OSAI
- Index composed of
- Dimensions: government, business, community/education
- Indicators: transformations of variables
- Variables
12
INDEX = f(Dimension 1, ..., Dimension i, ..., Dimension I)
Dimension i = g(Indicator 1, ..., Indicator j, ..., Indicator J)
Indicator j = h(Variable j)
13
INDEX = f(Dimension 1, ..., Dimension i, ..., Dimension I)
Dimension i = g(Indicator 1, ..., Indicator j, ..., Indicator J)
Indicator j = h(Variable j)
Active INDEX = f(GA, FA, CA)
Potential INDEX = f(GP, FP, CP)
G = government
F = firms or commercial enterprises
C = community and educational system
14Index construction
- Conceptual approaches
- Activity vs. Potential
- OSPI, OSAI
- Index composed of
- Dimensions: government, business, education/community
- Indicators: transformations of variables
- Variables
- direct (related to or impacting OSS) or indirect (contextual)
- (Arbitrary) weights
15Index construction
- Index theory
- Weights/transformations/aggregations affect rankings
- - E.g., the HDI rankings shuffle if ln(GDP) is used
- - No theory for appropriate transformation, weights, in OSS index
- Every numeric variable is ratio or interval
- - ratio has natural zero (e.g., pop., Firefox installs)
- - interval does not (e.g., °F, Linux language support)
- Geometric mean of ratio variables preserves rank ordering (preferred as g function)
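The rank-preservation point can be illustrated with a small sketch. The data, country names, and the rescaling factor below are hypothetical, chosen only to show that a change of units in one ratio variable can reorder an arithmetic-mean ranking while leaving a geometric-mean ranking untouched.

```python
from math import prod

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Only meaningful for strictly positive ratio-scale values.
    return prod(values) ** (1.0 / len(values))

def ranking(scores):
    # Country names ordered from highest to lowest score.
    return sorted(scores, key=scores.get, reverse=True)

def rescale_first(data, factor):
    # Multiply the first indicator by a constant, as a change of units would.
    return {name: (x * factor, y) for name, (x, y) in data.items()}

# Hypothetical data: two ratio-scale indicators per country.
countries = {"A": (1.0, 9.0), "B": (4.0, 4.0), "C": (2.0, 5.0)}
rescaled = rescale_first(countries, 10)

am_before = ranking({k: arithmetic_mean(v) for k, v in countries.items()})
am_after = ranking({k: arithmetic_mean(v) for k, v in rescaled.items()})
gm_before = ranking({k: geometric_mean(v) for k, v in countries.items()})
gm_after = ranking({k: geometric_mean(v) for k, v in rescaled.items()})

print(am_before, am_after)  # arithmetic-mean order changes
print(gm_before, gm_after)  # geometric-mean order is unchanged
```

Rescaling one variable by a constant multiplies every country's geometric mean by the same factor, so the order cannot change; the arithmetic mean has no such property.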
16Proposed index structures
[Diagram of proposed index structures: Index A; Index B (direct, indirect); Index C; labels OSAI (GA, FA, CA) and OSPI (GP, FP, CP)]
17Index construction
- Variables classified as
- Active / Potential
- Direct / Indirect
- Long / Short
- Ratio / Interval
- Missing values create problems
- Transformations of variables to remove scale
- Z-scores used here (as h function)
- Aggregations (g function)
- Weights arbitrary? endogenous?
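A minimal sketch of the h and g steps named above, assuming z-scores for h and an arithmetic mean over whatever indicators are available for g. The variable names, the data, and the choice to average over non-missing values are all assumptions made for illustration; the slides say only that missing values create problems, not how they were handled.

```python
from statistics import mean, pstdev

def z_scores(column):
    """h: standardize one variable across countries, skipping missing (None) values."""
    observed = [v for v in column.values() if v is not None]
    mu, sigma = mean(observed), pstdev(observed)  # fails if sigma == 0
    return {c: None if v is None else (v - mu) / sigma for c, v in column.items()}

def average_available(values):
    """g: arithmetic mean over non-missing entries; None if all are missing."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None

# Hypothetical variables for one dimension; None marks a missing value.
variables = {
    "var_1": {"A": 10.0, "B": 20.0, "C": 30.0},
    "var_2": {"A": 1.0, "B": None, "C": 3.0},
}

countries = ["A", "B", "C"]
indicators = {name: z_scores(col) for name, col in variables.items()}
dimension = {
    c: average_available([ind[c] for ind in indicators.values()]) for c in countries
}
print(dimension)
```

Averaging over available indicators keeps country B in the index despite its missing value, at the cost of scoring it on less information than A and C.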
18Index construction
- Lots going on
- 2 primary indices (active and potential)
- 5 aggregation rules for f
- arithmetic mean, maximin, minimean, geometric mean, R² weights
- 2 sets of variables (long and short)
- 3 dimensions (govt, firms, community/edu)
- A total of 60 combinations
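The count of 60 is consistent with multiplying the four counts listed above (2 x 5 x 2 x 3). The sketch below enumerates that grid; treating the total as this Cartesian product is an assumption, since the slide does not spell out the arithmetic.

```python
from itertools import product

indices = ["active", "potential"]
aggregation_rules = ["arithmetic mean", "maximin", "minimean", "geometric mean", "R2 weights"]
variable_sets = ["long", "short"]
dimensions = ["government", "firms", "community/education"]

# Every (index, rule, variable set, dimension) combination.
combinations = list(product(indices, aggregation_rules, variable_sets, dimensions))
print(len(combinations))  # prints 60
```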
19
- 11 out of 57 cells lack suitable and available variables
- 2 out of 23 indicators had no available variables
- All Potential indicators have an available variable
20Summary of results
- Correlations among rank-orderings were high across different aggregation rules
- Rankings are fairly stable
- Geometric mean rankings were least correlated with other rules
- Recommended indices (a.m., g.m., su.am.) are correlated in value but do differ in ranks
- Correlated at 0.79 for OSAI, OSPI
- a.m. and g.m. correlated at 0.87 (P) and 0.67 (A)
- Coverage: a.m. has most (N>132), g.m. has least (N=51), su.am. in middle; L has double S's coverage
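One way to compare rank-orderings across aggregation rules is a Spearman rank correlation over the countries both indices cover. The slides do not say which correlation measure was used, so Spearman is an assumption here, and the scores are hypothetical. The sketch assumes no tied scores.

```python
def ranks(scores):
    """Map each country to its rank (1 = highest score); assumes no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}

def spearman(scores_x, scores_y):
    """Spearman rank correlation over countries present in both score sets."""
    common = sorted(set(scores_x) & set(scores_y))
    rx = ranks({c: scores_x[c] for c in common})
    ry = ranks({c: scores_y[c] for c in common})
    n = len(common)
    d_squared = sum((rx[c] - ry[c]) ** 2 for c in common)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical index values under two aggregation rules.
# Country E has no geometric-mean score, mirroring the coverage gap.
arithmetic = {"A": 0.9, "B": 0.5, "C": 0.4, "D": 0.1, "E": 0.7}
geometric = {"A": 0.8, "B": 0.3, "C": 0.6, "D": 0.2}

rho = spearman(arithmetic, geometric)
print(round(rho, 2))  # prints 0.8
```

Restricting to the common set matters because the rules differ in coverage; ranks computed over different country sets are not comparable.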
21Frontier analysis
- Stochastic frontier analysis uses background attributes or endowment to predict OSPI / OSAI
- Lets us inductively see if factors (e.g., income, education) predict index score
- Lets us see which countries under- and over-achieve based on their endowments
- Another index thus becomes available
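The under- and over-achiever idea can be sketched with a much simpler stand-in: an ordinary least squares line of index score on a single endowment, with residuals marking which countries sit above or below the prediction. This is not stochastic frontier analysis, which additionally separates symmetric noise from a one-sided inefficiency term; the data and the single-predictor setup are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: country -> (endowment, index score).
data = {"A": (1.0, 1.0), "B": (2.0, 3.0), "C": (3.0, 2.0), "D": (4.0, 4.0)}

names = list(data)
xs = [data[c][0] for c in names]
ys = [data[c][1] for c in names]
intercept, slope = fit_line(xs, ys)

# Positive residual: score above what the endowment predicts (over-achiever).
residuals = {c: y - (intercept + slope * x) for c, x, y in zip(names, xs, ys)}
over_achievers = [c for c in names if residuals[c] > 0]
print(over_achievers)
```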
22Maps
OSAI (arithmetic mean, long). Green indicates low, red indicates high.
23Maps
OSPI (arithmetic mean, long). Green indicates low, red indicates high.
24Maps
OSAI (geometric mean, long). Green indicates low, red indicates high.
25Maps
OSPI (geometric mean, long). Green indicates low, red indicates high.
26Maps
OSAI (weighted mean, long). Green indicates low, red indicates high.
27Maps
OSPI (weighted mean, long). Green indicates low, red indicates high.
28Maps
OSAIeff (geometric mean, long). Green indicates low, red indicates high.
29Maps
OSPIeff (geometric mean, long). Green indicates low, red indicates high.
30State-level index
- Experimented with constructing a within-US index (SLOSI)
- Take 5 indicators (per capita)
- Rank each: 1 = lowest value
- Sum the state's rank across all 5 indicators
- Rank the sum = the SLOSI
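The rank-sum steps above can be sketched as follows. The state names and indicator values are hypothetical, three indicators stand in for the five, and the sketch assumes no ties, since the slides do not say how ties were broken.

```python
def rank_ascending(values):
    """Rank entries so that 1 = lowest value; assumes no ties."""
    ordered = sorted(values, key=values.get)
    return {name: position for position, name in enumerate(ordered, start=1)}

def slosi(indicators):
    """Sum each state's rank across indicators, then rank the sums."""
    states = list(next(iter(indicators.values())))
    rank_sums = {s: 0 for s in states}
    for values in indicators.values():
        for state, r in rank_ascending(values).items():
            rank_sums[state] += r
    return rank_sums, rank_ascending(rank_sums)

# Hypothetical per capita indicator values for three states.
indicators = {
    "ind_1": {"X": 0.1, "Y": 0.5, "Z": 0.3},
    "ind_2": {"X": 2.0, "Y": 9.0, "Z": 1.0},
    "ind_3": {"X": 5.0, "Y": 7.0, "Z": 6.0},
}

rank_sums, final_rank = slosi(indicators)
print(rank_sums)   # {'X': 4, 'Y': 9, 'Z': 5}
print(final_rank)  # {'X': 1, 'Z': 2, 'Y': 3}
```

Because 1 marks the lowest value at both stages, a higher final rank means more measured OSS activity in the state.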
31State-level index
- 5 indicators
- open source hits in state website,
- Firefox hits in state website,
- Linux jobs at Monster,
- open source jobs at Monster,
- Linux user groups in state
- Also collected a statepolicy variable
- State-level policy activity concerning OSS: binary (0/1)
33State maps
34State maps
35Predictions
- Logit model predicts statepolicy using things like
- SLOSI
- Number of software companies per capita
- Not so successful.
- At best, only OSjobs is significant and positively related
- Measurement error? Endogeneity?
- OSS policy may arise when OSS is strong or because it is weak and needs policy help
36Acknowledgements
- The authors wish to acknowledge the generous support of Red Hat, Inc. in funding this research.
- The authors also wish to recognize research assistance provided by the following individuals: Art Seavey, Nathan W. Moon, Ankit Kharadi and Saswat Anand.