Title: Handling social science data: Challenges and responses
1 Handling social science data: Challenges and responses
- Paul Lambert, University of Stirling
- DAMES research Node, www.dames.org.uk
2 What is social science data?
- Example: accessing surveys via the UK Data Archive
- Shibboleth authentication
- Download and analyse in Stata, SPSS, etc.
3 Principal forms of data
- Large and complex social surveys
  - Longitudinal; cross-national; hierarchical
- Small-scale social surveys
- Administrative data (e.g. ADMIN node; ADLS; commercial data)
- Supplementary (digital) data
  - E.g. GESDE services at DAMES
- Qualitative material: audio / video / textual
17/MAR/2010
DIR workshop Handling Social Science Data
3
4 Large and complex social surveys
- several thousand variables
- tens of thousands of cases (micro-data)
- additional complex survey data features (e.g.
household clustering)
5 Complex data example: British Household Panel Survey (dataset SN 5151)
- This example shows the BHPS being analysed in Stata.
- The BHPS has re-contacted subjects annually since 1991.
- 4294 were interviewed as adults every year for 17 years.
- Analysis methods, and measurement issues over time, are challenging.
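The count of adults interviewed in every year is a "balanced panel" count. A minimal sketch of how such a count might be derived, assuming a hypothetical long-format list of (person id, wave) records. The identifiers and values are illustrative only, not BHPS data or BHPS variable names:

```python
from collections import defaultdict


def balanced_panel_ids(records, n_waves):
    """Return the ids of people observed in every wave 1..n_waves."""
    waves_seen = defaultdict(set)
    for pid, wave in records:
        waves_seen[pid].add(wave)
    required = set(range(1, n_waves + 1))
    # Keep only people whose observed waves cover all required waves.
    return sorted(pid for pid, waves in waves_seen.items() if required <= waves)


# Hypothetical toy data: (person id, wave number)
records = [
    (101, 1), (101, 2), (101, 3),
    (102, 1), (102, 3),            # missed wave 2, so not in the balanced panel
    (103, 1), (103, 2), (103, 3),
]

balanced = balanced_panel_ids(records, n_waves=3)
print(balanced)
```

In a real panel the same logic would also need to handle the "as adults" condition and any wave-specific eligibility rules, which are not modelled here.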
6 Supplementary (digital) data
- E.g. occupational information resources (OIRs): data files with information on occupations, which can be usefully linked to micro-data about occupations
- E.g. GEODE acts as a library of OIRs, www.geode.stir.ac.uk
- Such resources are often not widely known about, but have the ability to enhance analysis
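Linking an occupational information resource to micro-data amounts to a lookup on a shared occupation code. A minimal sketch, assuming a hypothetical resource that maps occupation codes to a score. The codes, scores, and variable names are invented for illustration and do not come from GEODE:

```python
# Hypothetical occupational information resource: occupation code -> score
oir = {"2311": 71.5, "5231": 38.2, "9233": 22.0}

# Hypothetical survey micro-data, one record per respondent
microdata = [
    {"pid": 1, "occ_code": "2311"},
    {"pid": 2, "occ_code": "9233"},
    {"pid": 3, "occ_code": "0000"},  # code not present in the resource
]


def link_occupations(cases, lookup, key="occ_code", new_var="occ_score"):
    """Left join: keep every case, adding the looked-up value or None."""
    linked = []
    for case in cases:
        row = dict(case)                    # copy, so the input is not modified
        row[new_var] = lookup.get(case[key])
        linked.append(row)
    return linked


linked = link_occupations(microdata, oir)

# Report unmatched cases rather than dropping them silently.
unmatched = [row["pid"] for row in linked if row["occ_score"] is None]
print(linked)
print(unmatched)
```

Keeping unmatched cases visible is a design choice: failed matches often point to coding differences between the micro-data and the resource.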
7 Example: Qualitative data used by Digital Records for e-Social Science (DReSS)
- transcribed talk
- audio / video
- digital records
- system logs
- location
[Figure panels: video, code tree, transcript, system log]
8 Three well-known challenges
- We're data rich, but analyst poor
  - UK Data Forum (2007); Wiles et al (2009)
  - Under-use of suitably complex statistical models
- Coordination and communication on data processing
  - Recodes / standardisation / harmonisation / documentation
  - Not rewarded/incentivised for researchers
- Lack of generic/accessible representation of tasks
  - Limited disciplinary/project/researcher cross-over when dealing with data
  - Specific software orientations
- These are not generally problems of scale, but of organisation
9 Managed responses?
- Data handling/analysis capacity-building
  - ESRC programmes (NCRM, RDI, RMP): training workshops/materials; P/G funds; strategic research grant investment
- Documentation/replication policies
  - Dale (2006)
- Software for data access and analysis
  - NESSTAR: UK Data Archive data/metadata browser
  - Long (2009) on the Stata software
  - Remote access to data (e.g. SDS)
10 ...train and/or constrain the analysts...
11 ...constrain the analysis...
12 Non-hierarchical responses?
- Technological collaborative services might support effective, unmanaged data access, coordination and exploitation (in principle)
- UK e-Social Science investment in data-oriented social science research support
  - NeISS; E-Stat; DAMES; Obesity e-Lab; CQeSS
13 ...some examples...
- National e-Infrastructure for Social Simulation
  - Expert-led simulation demonstrations
  - Combining data resources
  - Workflows for the simulation analysis
  - Modify and re-specify existing simulation templates
- Design a tool to specify complex statistical models in generic / visual terms
  - Multilevel models
  - Multiple data permutations and analytical alternatives
  - Ready access to a suite of complex modelling tools
14 DAMES online services for data coordination/organisation
- Tools for handling variables in social science data
  - Recoding measures; standardisation / harmonisation; linking; curating
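The variable-handling tasks listed above can be sketched in a few lines. This is an illustrative example only, not the DAMES tools themselves: the survey names, category codes, and the three-category target scheme are all hypothetical assumptions.

```python
from statistics import mean, pstdev

# Hypothetical recode tables mapping two survey-specific education codings
# onto one common three-category scheme (harmonisation).
RECODE = {
    "survey_a": {1: "low", 2: "low", 3: "medium", 4: "high"},
    "survey_b": {"none": "low", "school": "medium", "degree": "high"},
}


def harmonise(source, value):
    """Recode a survey-specific value; unknown values become None."""
    return RECODE[source].get(value)


def standardise(values):
    """Z-score standardisation: subtract the mean, divide by the SD.

    Uses the population SD; assumes the values are not all identical.
    """
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


harmonised = [
    harmonise("survey_a", 3),
    harmonise("survey_b", "degree"),
    harmonise("survey_b", "other"),   # not in the recode table
]
z = standardise([10.0, 20.0, 30.0])
print(harmonised)
print(z)
```

Holding the recode tables as data rather than as a chain of conditional statements makes them easier to document, share, and reuse across projects.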
15 GESDE: Search and browse supplementary data on occupations, educational qualifications, ethnicity
16 Data curation tool (for collecting metadata)
17 Handling data: analysis-oriented data management priorities
- Data collection or creation
- Data preservation or curation
- Data enhancement/modification
- Data analysis
- Multiple permutations of related analyses
- Documentation and replication
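The last two priorities can be combined by generating each analytical permutation from an explicit specification and recording that specification next to its result. A minimal sketch, assuming hypothetical cases and hypothetical sample-selection options:

```python
from itertools import product
from statistics import mean

# Hypothetical micro-data
cases = [
    {"age": 25, "income": 18.0, "employed": True},
    {"age": 40, "income": 30.0, "employed": True},
    {"age": 67, "income": 12.0, "employed": False},
    {"age": 52, "income": 26.0, "employed": True},
]

# Hypothetical analytical alternatives: each key is one choice to vary.
options = {
    "max_age": [65, 100],
    "employed_only": [False, True],
}


def run(spec):
    """Run one analysis and return the result together with its specification."""
    sample = [
        c for c in cases
        if c["age"] <= spec["max_age"]
        and (c["employed"] or not spec["employed_only"])
    ]
    return {
        "spec": spec,
        "n": len(sample),
        "mean_income": mean(c["income"] for c in sample),
    }


# One log entry per combination of options, in a reproducible order.
log = [run(dict(zip(options, combo))) for combo in product(*options.values())]
for entry in log:
    print(entry)
```

Because every result carries its own specification, the log doubles as documentation of which permutations were run.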
18 Ideas on the future of social science research data
- Enduring challenges of documentation for replication, and coordination
- More and more comparative analysis
- Harmonisation and standardisation
- Data linkage and data enhancement
- Models for complex multiprocess systems
- Fluency: increasing uptake by more users
19 References and Links
- ADLS: http://www.adls.ac.uk/
- ADMIN Node: http://www.ncrm.ac.uk/about/organisation/Nodes/ADMIN/
- DAMES Node: http://www.dames.org.uk/
- DReSS: http://web.mac.com/andy.crabtree/NCeSS_Digital_Records_Node/
- Secure Data Service: http://securedata.ukda.ac.uk/
- UK Data Archive: http://www.data-archive.ac.uk/
- Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
- Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
- Wiles, R., Bardsley, N., & Powell, J. L. (2009). Consultation on research needs in research methods in the UK social sciences. Southampton: University of Southampton / ESRC National Centre for Research Methods, and http://eprints.ncrm.ac.uk/810/