Title: Work Processes at the Social Science Data Archives
WORK PROCESSES AT THE SOCIAL SCIENCE DATA ARCHIVES:
COOPERATION WITH THE STATISTICAL OFFICE OF THE REPUBLIC OF SLOVENIA AND
PARTNERSHIP IN THE DwB PROJECT
- Irena Vipavc Brvar and Sebastian Kočar
- Social Science Data Archives (Arhiv družboslovnih podatkov, ADP)
- Faculty of Social Sciences, University of Ljubljana
- 19 March 2013
OVERVIEW OF THE TOPICS PRESENTED
- 1.) Summary of the ADP–SURS cooperation
- 2.) Data storage at the Social Science Data Archives
- 3.) ADP metadata (DDI standard) and Nesstar
- 4.) The DwB project and access to official statistics microdata
- 5.) Presentation of the work done in the ADP–SURS cooperation
- 6.) General discussion
Summary of the ADP–SURS Cooperation
HISTORY OF THE ADP–SURS COOPERATION
- distribution of anonymized microdata and prepared metadata on the ADP website: Labour Force Survey, Household Budget Survey, Crime Victimisation Survey, Time Use Survey, Census 2002 (sample)
- less intensive cooperation in the period 2002–2011
- partnership of both organisations in the international DwB project
DATA WITHOUT BOUNDARIES (DwB)
- an international project (European Commission FP7) with 29 partners: statistical offices, archives, research centres and universities
- growing interest among researchers in official statistics microdata, which are underused for (scientific/academic) research
- project goal: to make access easier and to enable researchers to work with official statistics data to a higher standard
- a portal with all the information researchers need
SIGNIFICANCE AND GOALS OF THE ADP–SURS COOPERATION
- to improve researchers' access to official statistics data in Slovenia
- to jointly promote and increase the use of official statistics data for scientific and academic purposes
- to achieve the goals of the DwB project more easily and to carry out the work on their respective work packages to a higher standard
- to contribute in the areas in which each organisation specialises
AREAS OF ADP–SURS COOPERATION
- preparation of microdata intended for immediate statistical analysis in the safe room and via remote access (distributed by SURS)
- extraction of metadata from the SURS metadata systems
- preparation of structured metadata (DDI standard)
- preparation of anonymized microdata for less demanding users (distributed by ADP)
- promotion of the use of official statistics microdata for research purposes
PREPARATION OF NON-ANONYMIZED (UNPROTECTED) MICRODATA
- preparation takes place in the safe room
- the SPSS software package is used
- adding labels from the questionnaire, defining missing values, logical checks, deleting redundant variables, linking the database with the code lists
- the data can be exported to various formats readable by different statistical programs
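The slide says this preparation is done in SPSS. Purely as a language-neutral illustration of the listed steps (value labels, missing-value codes, a logical check, dropping redundant variables, export), here is a minimal Python sketch on toy records. The variable names, codes, the plausible age range and the dropped variable `interviewer_id` are hypothetical, not taken from SURS files.

```python
import csv
import io

# Hypothetical codebook information (illustrative only, not SURS code lists).
VALUE_LABELS = {"sex": {1: "male", 2: "female"}}
MISSING_CODES = {"sex": {9}, "age": {999}}   # codes meaning "no answer"
DROP = {"interviewer_id"}                     # working variable that is not released
AGE_RANGE = (15, 120)                         # assumed plausibility range for the logical check


def prepare(rows):
    """Drop working variables, recode missing codes to None, attach value labels,
    and return the cleaned records plus indices of rows failing the logical check."""
    records, failed = [], []
    for i, row in enumerate(rows):
        rec = {}
        for var, raw in row.items():
            if var in DROP:
                continue
            value = int(raw)
            if value in MISSING_CODES.get(var, set()):
                rec[var] = None
            else:
                rec[var] = VALUE_LABELS.get(var, {}).get(value, value)
        age = rec.get("age")
        if age is not None and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
            failed.append(i)
        records.append(rec)
    return records, failed


def to_csv(records):
    """Export records to CSV text; None is written as an empty field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


raw = [
    {"sex": "1", "age": "34", "interviewer_id": "17"},
    {"sex": "9", "age": "999", "interviewer_id": "17"},
    {"sex": "2", "age": "12", "interviewer_id": "05"},
]
records, failed = prepare(raw)
print(failed)           # [2]: age 12 is outside the assumed range
print(to_csv(records))
```

CSV is used here only because the standard library writes it; the point is that the same cleaned records can be exported to whatever format a given statistical program reads.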
EXTRACTING METADATA FROM SURS METADATA SOURCES
- an analysis of the current state of metadata was carried out
- we found that the content is stored in various databases, applications and documents, and cannot be merged directly into a study description document without considerable effort by a larger group of people
- metadata can be extracted from the LPSRs (annual programmes of statistical surveys), standard reports, methodological explanations, reports to Eurostat and Eurostat documents
PREPARATION OF STRUCTURED METADATA
- the DDI standard is used
- a study description is prepared, divided into sections such as study content, methodology, file description and data description
- all the documentation a researcher needs is prepared, such as code lists, questionnaires, publications and files for cleaning the data file
- the metadocumentation is drawn from SURS and Eurostat reports (websites) and through cooperation with SURS departments
DISPLAY OF STRUCTURED ADP METADATA: WEBSITE
DISPLAY OF STRUCTURED ADP METADATA: NESSTAR
DISPLAY OF STRUCTURED SURS METADATA: DESCRIPTIONS
PREPARATION OF ANONYMIZED VERSIONS OF MICRODATA
- anonymization of the original data file
- selection of a subsample with the lowest risk of respondent re-identification, preserving the sample statistics of key variables and the sample structure in the subsample
- the SPSS and R software packages are used, together with the latest data protection methods
- cooperation with the General Methodology and Standards Sector
- the file will be distributed on the ADP website, intended for a wider circle of less demanding users
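The slide states the goals of the subsample (low re-identification risk, preserved key statistics and sample structure) but not the procedure ADP used with SPSS and R. A deliberately simple Python sketch of those goals, assuming a hypothetical `region` stratification variable and hypothetical quasi-identifiers: draw the same fraction from every stratum, count sample uniques as a crude risk indicator, and compare a key mean between sample and subsample. This is not the method applied by ADP or SURS.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction from each stratum so the subsample keeps the sample structure."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)
    subsample = []
    for key in sorted(strata):
        members = strata[key]
        subsample.extend(rng.sample(members, round(len(members) * fraction)))
    return subsample


def sample_uniques(records, quasi_identifiers):
    """Number of records whose quasi-identifier combination occurs only once (a crude risk indicator)."""
    combos = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1)


def mean(records, var):
    return sum(r[var] for r in records) / len(records)


# Synthetic toy data, not real survey records.
data = [
    {
        "region": "east" if i % 2 == 0 else "west",
        "age_group": ("15-34", "35-64", "65+")[i % 3],
        "sex": "m" if i % 4 < 2 else "f",
        "hours": 20 + i % 25,
    }
    for i in range(200)
]
sub = stratified_subsample(data, "region", fraction=0.5, seed=1)
print(len(sub), Counter(r["region"] for r in sub))
print("sample uniques:", sample_uniques(sub, ["region", "age_group", "sex"]))
print("mean hours: full", mean(data, "hours"), "subsample", mean(sub, "hours"))
```

Real anonymization would add techniques such as recoding or suppression of rare combinations and would measure risk with proper disclosure-risk estimators; the sketch only makes the three stated goals concrete.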
PROMOTION OF MICRODATA USE AND OF THE COOPERATION
- national conferences (Information Society 2012, Statistical Days 2012, Sociological Meeting)
- international conferences (DwB regional conference, IASSIST 2013, ESRA 2013)
- ADP and FDV (Faculty of Social Sciences) websites
- SURS website (after its update)
- mailing list: keeping researchers up to date by e-mail
- promotion within the DwB project
RESULTS OF THE COOPERATION AND FUTURE COOPERATION
- microdata and metadata prepared and distributed for the Labour Force Survey series (2001–2011)
- next: preparing data for the development of a microsimulation model, for Census 2011 and, later, for other surveys
- learning procedures for efficient work, closer cooperation with departments, developing work procedures, learning
- cooperation with other departments: a relatively small staff input for a large output (benefit to researchers)
- continuous preparation of microdata and metadata with ADP support
Data Storage at the Social Science Data Archives
Evaluation
- Self-evaluation study in 2011
- Metadata standards
- Local file system for storing weekly backups
- No user or version control
- Several applications in use
- Need to automate the capture of materials (SIP, Submission Information Package)
- Need to use persistent identifiers
- - Good practice in partner institutions (UKDA, ICPSR)
- - Up-to-date technology support: a new application should be tailor-made to address current challenges, issues and gaps
Need for a new policy
MANAGEMENT
- used for bug tracking, issue tracking and project management
Application
URN
- URN (Uniform Resource Name)
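The slide only names the URN scheme. As a hedged sketch of what such identifiers look like, the Python below builds and checks URNs against a simplified reading of the RFC 8141 syntax `urn:<NID>:<NSS>` (optional components after `?` and `#` are ignored). The namespace `example` is the one reserved for documentation; which URN namespace and naming pattern ADP actually adopted is not stated here, so the identifier parts are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified RFC 8141 form: "urn:" NID ":" NSS, without r-/q-/f-components.
# NID: 2-32 letters, digits or hyphens, not starting or ending with a hyphen.
URN_RE = re.compile(r"^urn:[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]:[^\s?#]+$", re.IGNORECASE)


def is_urn(candidate: str) -> bool:
    return URN_RE.match(candidate) is not None


def make_urn(nid: str, *parts: str) -> str:
    """Join identifier parts with ':' into a URN, percent-encoding characters outside a safe set."""
    nss = ":".join(quote(part, safe="-._~") for part in parts)
    urn = f"urn:{nid}:{nss}"
    if not is_urn(urn):
        raise ValueError(f"not a syntactically valid URN: {urn!r}")
    return urn


# Hypothetical identifier parts under the documentation namespace "example".
print(make_urn("example", "adp", "ADS11"))        # urn:example:adp:ADS11
print(make_urn("example", "labour force 2011"))   # urn:example:labour%20force%202011
print(is_urn("https://example.org/"))             # False
```

Unlike a URL, a URN names a resource independently of where it is stored, which is why it suits the persistent identifiers mentioned in the evaluation; resolving it to a location requires a separate resolver service.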
ADP Metadata (DDI Standard) and Nesstar
Metadata
- Metadata can be defined as all the information needed for informing about and processing statistical structures (Grossmann, in Vipavc and Klep, 2003).
- Users of information play an important role in designing high-quality metadata standards.
- → development of standards (DDI)
- → international exchange of study descriptions (encoded in XML)
- → possibility of data analysis
What to preserve
- the data
- the accompanying documentation
- information on sampling, ... data that could otherwise be lost
- The accompanying documentation should contain information such as:
- - the source of the data
- - the original purpose of the data collection
- - the authors and commissioners or sponsors
- - how the data were collected
- - the legal conditions for using the data
- - a description of the variables
- - how the data were combined
- - the coding scheme
- - the format in which the machine-readable data file is stored
- - the medium on which it is stored
- - ...
Study Description
The DDI 2.0 Standard
- The standard on which ADP bases its content preparation is XML DDI (the Data Documentation Initiative).
- Under this standard, a codebook consists of:
- Document Description
- Study Description
- - Title, author, production and distribution
- - Study content
- - Methodology
- - Data access
- Data Files Description
- Variable Description
- Other Documentation
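To make the codebook structure above concrete, a minimal Python sketch with the standard `xml.etree.ElementTree` module that builds an empty DDI 2.x-style codebook with those sections. The element names are assumed to follow the DDI Codebook vocabulary (`docDscr`; `stdyDscr` with `citation`, `stdyInfo`, `method`, `dataAccs`; then `fileDscr`, `dataDscr`, `otherMat`); namespace declarations and schema validation are omitted, so the output is a skeleton, not a valid ADP record.

```python
import xml.etree.ElementTree as ET


def codebook_skeleton(title: str) -> ET.Element:
    """Empty DDI 2.x-style codebook with the top-level sections of the standard."""
    cb = ET.Element("codeBook")
    ET.SubElement(cb, "docDscr")                    # Document Description
    stdy = ET.SubElement(cb, "stdyDscr")            # Study Description
    citation = ET.SubElement(stdy, "citation")      # title, author, production and distribution
    ET.SubElement(ET.SubElement(citation, "titlStmt"), "titl").text = title
    ET.SubElement(citation, "rspStmt")              # authors (AuthEnty)
    ET.SubElement(citation, "prodStmt")             # producer, prodDate, prodPlac
    ET.SubElement(citation, "distStmt")             # distributor
    ET.SubElement(stdy, "stdyInfo")                 # study content: abstract, keywords, coverage
    ET.SubElement(stdy, "method")                   # methodology: sampling, collection, weighting
    ET.SubElement(stdy, "dataAccs")                 # data access conditions
    ET.SubElement(cb, "fileDscr")                   # Data Files Description
    ET.SubElement(cb, "dataDscr")                   # Variable Description (var elements)
    ET.SubElement(cb, "otherMat")                   # Other Documentation
    return cb


cb = codebook_skeleton("Labour Force Survey 2011")
ET.indent(cb)  # pretty-printing; requires Python 3.9+
print(ET.tostring(cb, encoding="unicode"))
```

Fields such as `AuthEnty`, `producer`, `prodDate` or `respRate`, shown in the SURS examples that follow, would be filled in under these sections.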
The DDI 2.1 Standard and Its Use
(Figure panels: the full DDI schema; the schema subset used by ADP; the schema subset used by DwB WP5)
DDI 2.1: Data File Description and SURS Metadata
File description fields used by ADP:
<fileDscr URI="../podatki/ads/ads11_p1_sl_v1_r1.txt" ID="ADS11_P1_SL_V1_R1">
  <fileTxt>
    <fileName xml:lang="sl-SI" ID="F1">ADS11 - Anketa o delovni sili, 2011 dat</fileName>
    <dimensns>
      <caseQnty>61888</caseQnty>
      <varQnty>214</varQnty>
    </dimensns>
    <fileType xml:lang="sl-SI">f1</fileType>
    <filePlac>SURS</filePlac>
    <dataChck>Podatkovna datoteka, na kateri je Statistični urad Republike Slovenije že izvedel logično kontrolo.</dataChck>
    <software>SAS</software>
    <verStmt>
      <version date="2012-08-29">avg 2012</version>
      <verResp>SURS</verResp>
      <notes>Originalna ASCII delimited datoteka.</notes>
    </verStmt>
  </fileTxt>
</fileDscr>
(The Slovenian values read: file name "ADS11 - Labour Force Survey, 2011"; dataChck "Data file on which the Statistical Office of the Republic of Slovenia has already carried out a logical check."; version "Aug 2012"; notes "Original ASCII delimited file.")
Additional fields used when preparing the documentation:
<titl xml:lang="en-GB">ti</titl>
<AuthEnty affiliation="SORS">AuthEn</AuthEnty>
<producer abbr="ADP" affiliation="ULJ" xml:lang="en-GB">ADP</producer>
<prodDate date="datizd" xml:lang="en-GB">datbes</prodDate>
<prodPlac xml:lang="en-GB">Ljubljana, si</prodPlac>
<verResp>responsi, ADP; Irena Svetin, SORS; Lenart Milan Lah, SORS; Katja Rutar, SORS; Andreja Smukavec</verResp>
<notes xml:lang="en-GB">Sebastian Kočar prepared the study description with a help by SORS employees, .</notes>
<keyword xml:lang="en-GB">household structure</keyword>
<abstract source="archive" xml:lang="en-GB">Slovenian Labour Force Survey 2010 was conducted...</abstract>
<timePrd event="start" date="2011">2011</timePrd>
<collDate event="start" date="2010-01-04" xml:lang="en-GB">2010-01-04</collDate>
<geogCover>si</geogCover>
<anlyUnit>pos</anlyUnit>
<universe clusion="I" xml:lang="en-GB">The target population is the de jure population,...</universe>
<dataCollector abbr="SORS" affiliation="Government of the Republic of Slovenia" xml:lang="en-GB">SURS</dataCollector>
<sampProc xml:lang="en-GB">The labour force survey is based on the sample taken...</sampProc>
<collMode xml:lang="en-GB">colP. Only when surveying a household for the first time.</collMode>
<resInstru xml:lang="en-GB">ri</resInstru>
<collSitu>The interviewing is held by the experienced interviewers under ...</collSitu>
<actMin>Proxy interviewing is allowed to achieve high response rate, ...</actMin>
<weight xml:lang="en-GB">The data are weighted for unequal probability ...</weight>
<cleanOps xml:lang="en-GB">The data were cleaned for ...</cleanOps>
<respRate xml:lang="en-GB">79,7 - response rate of households.</respRate>
<dataDscr>
  <var>v1_8</var> <labl>Spol</labl>
  <varGrp ID="VG1F1" type="subject" var="V1 V2 V3 V4 V11 V12 V13 V14 V15 V176 V178 V183">
    <labl>Demography (household)</labl>
("Spol" = sex.)
Searching Data by Series
(Screenshots; panel labels: Variable Description, Study Description, Data Description, Other Documents)
- NESSTAR is a virtual data library that enables searching, locating, browsing and downloading a wide range of statistical and other data and metadata.
Simple data analysis with Nesstar: the SJM (Slovenian Public Opinion) example
- Analysing data requires a NESSTAR username and password.
- FILL IN THE ORDER FORM
Registration
(Screenshot showing registration steps 1–3)
- In the username, enter "AT" instead of "_at_".
- The password is valid until the end of the current academic year!
Results of two-dimensional (bivariate) tables, particularly when the number of units differs between groups, are shown as percentages within the categories of the independent variable. By convention, the independent variable (e.g. sex) is shown in the columns and the dependent variable in the rows.
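The percentage rule above can be illustrated with a short Python sketch: each cell count is divided by the total of its column, i.e. of its category of the independent variable, so every column sums to 100 % and groups of different size become comparable. The answers and group sizes are invented toy data.

```python
from collections import Counter


def column_percentages(pairs):
    """pairs: list of (independent, dependent) values.

    Returns {independent_category: {dependent_category: percent}}, with percentages
    computed within each category of the independent variable (column percentages).
    """
    col_totals = Counter(ind for ind, _ in pairs)
    table = {}
    for (ind, dep), n in Counter(pairs).items():
        table.setdefault(ind, {})[dep] = 100 * n / col_totals[ind]
    return table


# Toy data: 40 men and 90 women answering yes/no (not real survey results).
pairs = ([("male", "yes")] * 30 + [("male", "no")] * 10
         + [("female", "yes")] * 45 + [("female", "no")] * 45)
table = column_percentages(pairs)

cols = ["male", "female"]              # independent variable in the columns
print("answer", *cols, sep="\t")
for row in ["yes", "no"]:              # dependent variable in the rows
    print(row, *(f"{table[c].get(row, 0.0):.1f}" for c in cols), sep="\t")
# answer  male    female
# yes     75.0    50.0
# no      25.0    50.0
```

Raw counts (30 men vs 45 women saying "yes") would suggest that women answer "yes" more often; the column percentages show the opposite, which is why the rule matters when group sizes differ.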
Search
- Searching for a variable
- Advanced search window
- Result of a search for the word "ZRTEV"
The DwB Project and Access to Official Statistics Microdata
The DwB Project, a Short Overview
Toward a European Research Infrastructure
Introduction: Project Focus and Mechanism
- A four-year EU-funded FP7-13 project (2011–2015)
- Aims
- - Linking the capacity of the research community with the important resources of official microdata in Europe
- - Enhancing researchers' access to official microdata in Europe
- - Surveys and administrative datasets, combined files
- - Focus on confidential (highly detailed) data
- - Focus on crossing national boundaries
- Mechanism: coordination of existing infrastructures
- - CESSDA data archives and the ESS (NSIs coordinated by Eurostat, ECB)
- - Based on volunteers
Partners
Introduction: Partnership
- 29 partners
- 1/3 CESSDA archives: CNRS/RQ, GESIS, NSD, SND, FSD, DANS, UKDA, FORS, EKKE, CIS, RODA
- 1/3 NSIs and statistical departments: ONS, CBS, INSEE/GENES, SORS, IAB, SCB, DESTATIS, CSIC, CNPS-INS
- 1/3 universities: URV, UL, UPC, ULL, SOTON, UoMan, CED (IPUMS)
- MT (SME)
From the Current Situation
Context
- Access to official statistics, both anonymized and highly detailed, is still uneven in Europe, at both national and European levels
- Access to Eurostat's highly anonymized datasets is still burdensome
- An increasing level of anonymization does not meet researchers' needs
- Although crucial for comparative research, crossing borders is even harder:
- - different legal frameworks, institutional arrangements and criteria for accreditation
- - different providers (NSIs, archives)
- - different modes of access (no access, safe centres, remote execution, remote access)
- - different languages
- - different views about security, anonymization and output checking
To the DwB Project: Main Issues
- Building a central point of access: what data are available? How can they be accessed?
- Metadata standards and interoperability: NSIs tend to use SDMX as a standard for metadata exchange, while CESSDA archives use DDI as a standard for documentation
- Legal issues and accreditation: towards a European accreditation
- Servicing the use of official statistics (OS) data: provide tools (formats, routines for harmonization) and train researchers in using European microdata
- Technical, standardization and methodological issues in developing a European distributed remote access for both national and European microdata, flexible to national institutional arrangements (NSI or data archive as provider): propose and implement a test case
New Conditions to Build
- At national level
- - Strong cooperation between data archives and NSIs in some countries
- - Changes in the legal framework in several countries
- - Increasing numbers of RDCs providing on-site access, remote execution or remote access
- - Some RDCs providing access to foreign researchers
- At European level
- - The ESFRI roadmap and the CESSDA ERIC process
- - Projects and initiatives within the framework of the European Statistical System (ESSnet, WGSC)
- - Discussions about a new EC regulation on European microdata access for researchers, expected around 2012/2013
Three Blocks, Twelve Work Packages
Project Architecture
- Block 1: Access Facilities (WP3, WP4, WP9, WP10 and WP11)
- Block 2: Front Office (WP5, WP7, WP8 and WP12)
- Block 3: Enlarging Cooperation (WP6)
- WP1 (Project Management)
- WP2 (Internal and External Communication)
Block 1: Access
- Legal issues and accreditation: examine the current national situation, agree on best practices and common standards, test an accreditation pilot and suggest changes in the legal framework (WP3)
- Technical issues in developing remote access (RA) and statistical disclosure control (SDC) procedures: discuss and agree on security standards and an architecture for distributed remote access (WP3 and WP4)
- Methodological issues: anonymization and output checking (WP11)
- Implement a case study for a distributed European remote access (WP4), building on current possibilities for national microdata, conditional on a change in 2012 regarding new possibilities for Eurostat microdata
- Immediately offer transnational access through open calls for researchers to access data either on site or remotely (WP9 and WP10)
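Output checking is only named in the list above. One basic and widely used rule is a minimum frequency (threshold) rule; the Python sketch below suppresses table cells with fewer than an assumed 10 units. The threshold is an assumption (each data provider sets its own), zero cells are left unsuppressed here, and secondary suppression, which is usually needed as well, is not implemented, so this illustrates the idea rather than any WP11 method.

```python
MIN_COUNT = 10  # assumed threshold; real rules differ between data providers


def apply_threshold_rule(table, min_count=MIN_COUNT):
    """Suppress cells with 1..min_count-1 units; return the released table and suppressed cells.

    Only primary suppression: totals or other cells that would let a user
    recompute a suppressed value (secondary suppression) are not handled.
    """
    released, suppressed = {}, []
    for cell, count in table.items():
        if 0 < count < min_count:
            released[cell] = None
            suppressed.append(cell)
        else:
            released[cell] = count
    return released, suppressed


# Hypothetical frequency table: (region, sex) -> number of respondents.
table = {("east", "m"): 120, ("east", "f"): 7, ("west", "m"): 0, ("west", "f"): 95}
released, suppressed = apply_threshold_rule(table)
print(released)
print(suppressed)  # [('east', 'f')]
```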
WP3 Tasks
- Task 1
- - To devise and promulgate a fit-for-purpose standard for researcher accreditation for the use of official data that NSIs (and archives) will find credible and that addresses cross-border issues. To achieve a widely recognised standard for accreditation that will reduce administration and costs, improve efficiency and improve the confidence of custodians of official data when providing access, thus helping to remove barriers to the research use of official data.
- Task 2
- - To conduct an audit and describe the legal frameworks for research access to official data in the European Research Area (ERA). To present the results in a usable form as a contribution to effective policy planning and to legal changes where opportunities arise. To act as a definitive guide for all interested parties on the legal frameworks and accreditation processes for access to data across the ERA.
WP3 Tasks (continued)
- Task 3
- - To identify the challenges and solutions for building and operating a remote access facility in compliance with internationally recognised information security standards.
- - To analyse a set of suitable alternative organisational architectures for a sustainable cooperation model for linking data centres, and to propose a suitable organisational architecture as a credible working model and, if possible, a proven concept.
Block 2: Front Office
- Provide a single point of access within the context of the CESSDA portal and the current CESSDA ERIC process (WP12)
- Discuss standards (SDMX and DDI) and develop tools to harmonize metadata (WP7)
- Devise techniques for CESSDA to harvest NSI metadata (WP8)
- Service the use of OS microdata: improve metadata (including translation issues) and formats, and provide routines for OS and Eurostat microdata (WP5)
WP5 Tasks
- Task 1
- - Set up the framework for a permanent virtual service centre for OS microdata. This virtual centre will consist of different components, such as a user group, a blog and an email list.
- Task 2
- - Collect, structure and code information on available microdata from official statistics at the national level for each country in Europe.
- Task 3
- - Collect, structure and code information on integrated European microdata from official statistics held by Eurostat and national censuses.
- Task 4
- - Write routines, i.e. syntax for statistical analysis software, to read raw data from Eurostat's scientific use files into several statistical programs.
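One way to write such a routine is to describe the record layout once and generate reading code or syntax from it. The Python sketch below assumes a fixed-width raw file and a hypothetical four-variable layout (the actual file format and variable positions must be taken from Eurostat's documentation for each scientific use file). It reads the records in Python and renders the same layout as SPSS `DATA LIST ... FIXED` syntax; the generated text follows the usual `name start-end (A)` convention but should be checked against the SPSS syntax reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    start: int          # 1-based start position, as in printed record layouts
    width: int
    numeric: bool = True


# Hypothetical layout for illustration; not an actual Eurostat file layout.
LAYOUT = [
    Column("COUNTRY", 1, 2, numeric=False),
    Column("YEAR", 3, 4),
    Column("SEX", 7, 1),
    Column("AGE", 8, 2),
]


def read_fixed_width(lines, layout=LAYOUT):
    """Yield one dict per line; blank numeric fields become None."""
    for line in lines:
        rec = {}
        for col in layout:
            raw = line[col.start - 1:col.start - 1 + col.width].strip()
            if col.numeric:
                rec[col.name] = int(raw) if raw else None
            else:
                rec[col.name] = raw
        yield rec


def spss_data_list(layout, path):
    """Render the same layout as SPSS DATA LIST FIXED syntax."""
    specs = []
    for col in layout:
        end = col.start + col.width - 1
        cols = str(col.start) if col.width == 1 else f"{col.start}-{end}"
        specs.append(f"{col.name} {cols}" + ("" if col.numeric else " (A)"))
    return f"DATA LIST FILE='{path}' FIXED\n  /" + " ".join(specs) + "."


rows = list(read_fixed_width(["SI2011245", "SI20111"]))
print(rows)
print(spss_data_list(LAYOUT, "lfs_suf.dat"))
```

Keeping the layout as data means a generator for another package (for example Stata or SAS input statements) could be added without re-entering variable positions.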
WP5 Work
Block 3: Enlarging Cooperation
- All WPs work in cooperation and aim at identifying best practices, agreeing on standards and building on volunteers while bridging the different communities (NSIs, archives, researchers)
- Yet long-term success requires involving the whole ESS, the whole of CESSDA and the researchers, who are the final users, while also building bridges with non-European partners (WP6)
- European Data Access Forum and regional workshops on data access
- Users' conferences
- Training activities
- Staff visits to RDCs where remote access solutions exist
To Summarize
Conclusions
- A challenging project
- Need to build trust and common understanding between NSIs, archives and research communities
- Need to agree on standards, provide a model and implement a pilot
- Need to enlarge cooperation and to coordinate strongly with other initiatives and ongoing discussions
- A crucial step toward a European research infrastructure within the context of the CESSDA ERIC
- Building a single point of entry
- Paving the way for a European accreditation
- Enhancing access to anonymized official data
- Providing a flexible infrastructure for accessing confidential data
Some Major Steps in 2013–2014
Future Events 2013–2014
- Continuous calls for transnational access to highly detailed microdata: France, Germany, Netherlands, UKDA RDCs
- Support for comparative research projects that require transnational access
- A business case for a European accreditation and a distributed remote access
- Staff visits to RDCs providing remote access to confidential OS microdata
- - Directed at NSIs and data archives that have not yet implemented an RDC, and at RDCs interested in shifting to or implementing an RA solution
- - More info: http://dwbproject.org/events/visits.html
- 21–22 March 2013: Users Conference 2013 (Mannheim, Germany)
- - Working with the European Labour Force Survey (EU-LFS) and European Union Statistics on Income and Living Conditions (EU-SILC)
- - More info: http://www.gesis.org/en/events/conferences/european-user-conference-3/ or http://www.dwbproject.org/events/users_conf1.html
- 24–25 April 2013: Regional Workshop in Eastern Europe (Slovenia)
- - To foster and/or trigger cooperation between NSIs, data archives, researchers and other stakeholders in 15 countries of the region
- - More info: http://dwbproject.org/events/regional_workshop1.html
Presentation of the Work Done in the ADP–SURS Cooperation
OVERVIEW OF THE PREPARED MATERIALS AND THE WORK PROCESS
- 1.) Preparation of microdata for distribution in the safe room
- 2.) Preparation of metadata using SURS and Eurostat metadocumentation
- 3.) Micro- and metadata viewer
- 4.) Preparation of an anonymized microdata file
- 5.) Design of ADP web subpages dedicated to the DwB partnership and the ADP–SURS cooperation
Preparing microdata for distribution in the safe room
Preparing metadata using SURS and Eurostat metadocumentation
Micro- and metadata viewer
Thank you for your attention! We can now continue with the discussion.
- Irena Vipavc Brvar, Sebastian Kočar
- Social Science Data Archives (ADP)
- Faculty of Social Sciences, University of Ljubljana