Title: Costs of vocabulary mapping
http://www.willpowerinfo.co.uk
Different kinds of subject vocabularies
- Classification schemes
- Subject headings
- Thesauri
- Free text searching
- Terms and phrases extracted from text by computer
Ways of mapping
Input and output mapping
Diagram components:
- User's local thesaurus
- User
- User's free text terms
- Local preferred terms
- Mapping service thesaurus
- Information provider's preferred terms
- Information provider's thesaurus
- Information provider's database
- Information provider's preferred terms (perhaps expanded)
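The input side of this diagram can be read as a chain of lookups: the user's free text term is mapped to a local preferred term, which the mapping service then maps to the information provider's preferred term. The sketch below is a minimal illustration, assuming each vocabulary is a plain dictionary. The names `LOCAL_THESAURUS`, `MAPPING_SERVICE`, `PROVIDER_NARROWER` and `map_query`, and all the term data, are hypothetical, and reading "perhaps expanded" as "add the provider's narrower terms" is also an assumption.

```python
# Hypothetical toy vocabularies, for illustration only.
LOCAL_THESAURUS = {      # user's free text term -> local preferred term
    "kids": "children",
    "children": "children",
}
MAPPING_SERVICE = {      # local preferred term -> information provider's preferred term
    "children": "Young people",
}
PROVIDER_NARROWER = {    # assumption: "expansion" adds the provider's narrower terms
    "Young people": ["Infants", "Adolescents"],
}


def map_query(free_text_term, expand=False):
    """Map a user's free text term to the provider's preferred term(s)."""
    local = LOCAL_THESAURUS.get(free_text_term.lower())
    if local is None:
        return []  # no local preferred term: nothing to pass to the mapping service
    provider = MAPPING_SERVICE.get(local)
    if provider is None:
        return []  # the mapping service has no equivalent for this local term
    terms = [provider]
    if expand:
        terms += PROVIDER_NARROWER.get(provider, [])
    return terms


print(map_query("Kids"))               # ['Young people']
print(map_query("kids", expand=True))  # ['Young people', 'Infants', 'Adolescents']
```

Each arrow in the diagram becomes one dictionary, so every table must be built and kept up to date before any query can be mapped.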
Output mapping via remote thesaurus
Diagram components:
- Possible off-line consultation
- User's local thesaurus
- User
- User's free text terms
- Possible off-line term information
- Mapping service thesaurus
- Information provider's preferred terms
- Information provider's thesaurus
- Information provider's database
- Information provider's preferred terms (perhaps expanded)
Output mapping direct to remote database
Diagram components:
- Possible off-line consultation
- User's local thesaurus
- User
- User's free text terms
- Possible off-line term information
- Mapping service thesaurus
- Off-line provision of term information
- Information provider's preferred terms
- Information provider's thesaurus
- Information provider's database
Other projects and tools
- CARMEN
- RENARDUS
- AQUARELLE
- SIS-TMS
- UMLS
- GenThes
- CERES
- Knowledgecite Library
Personal experience
- Editing terms into a thesaurus structure consistent with AAT
  - 25 terms per hour, 150 terms per day
  - 4,500 terms take 30 days
- Assigning Dewey numbers to UNESCO terms
  - 15 terms per hour, 90 terms per day
  - 4,500 terms take 50 days
- Note: AAT has about 125,000 terms
  - At 90 terms per day, this would take about 1,400 days (6.3 years)
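These figures follow from simple rate arithmetic. The sketch below reproduces them, assuming a 6-hour productive day (implied by 25 terms per hour giving 150 terms per day) and about 220 working days per year (roughly what turns about 1,400 days into 6.3 years). Both constants are assumptions, and `days_needed` is a hypothetical helper.

```python
HOURS_PER_DAY = 6            # assumption: implied by 25/hour -> 150/day and 15/hour -> 90/day
WORKING_DAYS_PER_YEAR = 220  # assumption: roughly consistent with ~1,400 days ~ 6.3 years


def days_needed(n_terms, terms_per_hour, hours_per_day=HOURS_PER_DAY):
    """Working days needed to process n_terms at a given hourly rate."""
    terms_per_day = terms_per_hour * hours_per_day
    return n_terms / terms_per_day


editing = days_needed(4500, 25)       # editing into an AAT-consistent structure
dewey = days_needed(4500, 15)         # assigning Dewey numbers to UNESCO terms
aat_days = days_needed(125_000, 15)   # assigning Dewey numbers to all of AAT
aat_years = aat_days / WORKING_DAYS_PER_YEAR
print(editing, dewey, round(aat_days), round(aat_years, 1))
```

Changing either assumed constant shifts the year figure noticeably, so it is worth stating them alongside any estimate.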
English Heritage estimator for thesaurus construction
- Based on creating 10-20 simple terms per day
- Complex terms: 2-8 per day
- At 10 terms per day, 4,500 terms take 450 person-days
  - just over 2 person-years
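The estimator reduces to dividing the number of terms by a daily rate. A minimal sketch, using the slow end of each quoted range as the default rate and the same assumed 220 working days per year to convert person-days into person-years. The function `person_days` and its parameter names are hypothetical, and the working-year figure is an assumption, not part of the English Heritage estimator.

```python
WORKING_DAYS_PER_YEAR = 220  # assumption, not taken from the EH estimator


def person_days(simple_terms, complex_terms=0, simple_rate=10, complex_rate=2):
    """Person-days to create terms at the given per-day rates.

    Defaults are the slow ends of the EH ranges:
    10-20 simple terms per day, 2-8 complex terms per day.
    """
    return simple_terms / simple_rate + complex_terms / complex_rate


days = person_days(4500)               # 450.0 person-days
years = days / WORKING_DAYS_PER_YEAR   # just over 2 person-years
fast = person_days(4500, simple_rate=20)  # fast end of the simple-term range
print(days, round(years, 2), fast)
```

Passing the fast and slow ends of each range gives a best-case and worst-case estimate for the same vocabulary.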
Factors affecting calculation
- Number of terms
- Number of uses
- Candidate terms per year
- Number of external terms mapped per year
- Number of licences
Start-up requirements (EH)
- User assessment / market testing: 3 days
- Introduction: 10 days
- Peer review: 5-15 days
- Initial documentation: 2 days
- Promotion: 3 days
- Audit of existing usage: 0.1 day per 500 uses
- Research (e.g. reconciliation of names): 10 require 1 day per 5 terms
- Training: 1 day per 5 licences
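Start-up effort is a block of fixed tasks plus items that scale with usage, research and licences. The sketch totals them as a (low, high) range because peer review is given as 5-15 days. It is a sketch under assumptions: `startup_days` and `FIXED_TASK_DAYS` are hypothetical names, and because it is not clear from the slide how the figure 10 enters the research line, the number of terms needing research is left as a caller-supplied input charged at 1 day per 5 terms.

```python
# Fixed start-up tasks from the English Heritage estimator, as (low, high) days.
FIXED_TASK_DAYS = {
    "user assessment / market testing": (3, 3),
    "introduction": (10, 10),
    "peer review": (5, 15),
    "initial documentation": (2, 2),
    "promotion": (3, 3),
}


def startup_days(uses, research_terms, licences):
    """Return (low, high) start-up person-days.

    uses:           existing uses to audit (0.1 day per 500 uses)
    research_terms: terms needing research (1 day per 5 terms), supplied by the caller
    licences:       licences to train for (1 day per 5 licences)
    """
    # 0.1 day per 500 uses is the same as 1 day per 5,000 uses.
    variable = uses / 5000 + research_terms / 5 + licences / 5
    low = sum(lo for lo, _ in FIXED_TASK_DAYS.values()) + variable
    high = sum(hi for _, hi in FIXED_TASK_DAYS.values()) + variable
    return low, high


print(startup_days(uses=5000, research_terms=50, licences=10))  # (36.0, 46.0)
```

The fixed tasks alone come to 23-33 days before any usage, research or training is counted.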
Annual maintenance tasks
- Candidate term evaluation
  - 1 day per 5 terms received
- Mapping of existing terminology
  - 1 day per 50 terms received
- Tracking and version control
  - 1 day per 1,000 terms
- Licence management
  - 0.5 days per licence
(c) English Heritage 2001
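The annual tasks combine the factors listed earlier (terms, candidate terms, external terms mapped, licences) into one yearly figure. A minimal sketch using the English Heritage rates; `annual_maintenance_days` and the example inputs are hypothetical, and applying the tracking rate to the total number of terms in the thesaurus is an assumption.

```python
def annual_maintenance_days(total_terms, candidate_terms, external_terms, licences):
    """Yearly person-days for thesaurus maintenance, using the EH rates."""
    candidate_evaluation = candidate_terms / 5  # 1 day per 5 candidate terms received
    mapping = external_terms / 50               # 1 day per 50 external terms received
    tracking = total_terms / 1000               # assumption: 1 day per 1,000 terms held
    licence_management = licences * 0.5         # 0.5 days per licence
    return candidate_evaluation + mapping + tracking + licence_management


# Hypothetical example: 4,500 terms, 100 candidates, 500 external terms, 10 licences.
print(annual_maintenance_days(4500, 100, 500, 10))  # 39.5
```

In this example candidate evaluation dominates, because it is charged ten times more per term than mapping.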
One-to-one mapping
- Abandoned children → 305.906945 Abandoned children
- Abbreviations → 401.48 Abbreviations
- Ability → 153.9 Intelligence and aptitudes
- Ability grouping → 371.254 Homogeneous grouping
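A one-to-one mapping is a plain lookup table. The sketch stores the four entries above, keeping Dewey numbers as strings because they are notation rather than quantities (a float would, for example, drop trailing zeros). The names `ONE_TO_ONE` and `to_dewey` are hypothetical.

```python
# Source term -> (Dewey number, Dewey caption), from the examples above.
ONE_TO_ONE = {
    "Abandoned children": ("305.906945", "Abandoned children"),
    "Abbreviations": ("401.48", "Abbreviations"),
    "Ability": ("153.9", "Intelligence and aptitudes"),
    "Ability grouping": ("371.254", "Homogeneous grouping"),
}


def to_dewey(term):
    """Return the single (number, caption) pair for term, or None if unmapped."""
    return ONE_TO_ONE.get(term)


print(to_dewey("Ability grouping"))  # ('371.254', 'Homogeneous grouping')
```

Note that the target caption need not match the source term: "Ability" lands on "Intelligence and aptitudes", which is where a mapper's judgement is still needed.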
One-to-many mapping
- Abortion →
  - 179.76 Abortion (ethics)
  - 294.356976 Abortion (ethics - religion - Buddhism)
  - 304.667 Abortion (demographic effects)
  - 342.084 Abortion (law and comprehensive works)
  - 342.085 Abortion (rights of fetuses)
  - 342.0878 Abortion (rights of women)
  - 344.04192 Abortion (medical law)
  - 363.46 Abortion (social problems)
  - 363.96 Abortion (birth control)
  - 364.185 Abortion (criminal offences)
  - 615.766 Abortion (drugs causing)
  - 618.392 Abortion (spontaneous)
  - 618.88 Abortion (surgical)
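With a one-to-many mapping, a single source term yields a list of candidate numbers, and something must choose among them. The sketch stores the "Abortion" entry above and shows one hypothetical selection rule (filtering on a word in the qualifier) plus a count of candidates per Dewey main class. The names `ONE_TO_MANY`, `candidates` and `main_class_spread` are hypothetical; the rule is an illustration, not a method from the source.

```python
from collections import Counter

# Source term -> list of (Dewey number, qualifier), from the example above.
ONE_TO_MANY = {
    "Abortion": [
        ("179.76", "ethics"),
        ("294.356976", "ethics - religion - Buddhism"),
        ("304.667", "demographic effects"),
        ("342.084", "law and comprehensive works"),
        ("342.085", "rights of fetuses"),
        ("342.0878", "rights of women"),
        ("344.04192", "medical law"),
        ("363.46", "social problems"),
        ("363.96", "birth control"),
        ("364.185", "criminal offences"),
        ("615.766", "drugs causing"),
        ("618.392", "spontaneous"),
        ("618.88", "surgical"),
    ],
}


def candidates(term, context=None):
    """Dewey numbers for term, optionally keeping only qualifiers containing context."""
    entries = ONE_TO_MANY.get(term, [])
    if context is not None:
        entries = [e for e in entries if context.lower() in e[1].lower()]
    return [number for number, _ in entries]


def main_class_spread(term):
    """Count candidate numbers per Dewey main class (000, 100, ..., 900)."""
    return Counter(number[0] + "00" for number in candidates(term))


print(candidates("Abortion", "ethics"))  # ['179.76', '294.356976']
print(dict(main_class_spread("Abortion")))
```

Here the 13 candidates span four main classes, so a mapper cannot pick one without knowing the context in which the term is used.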