Title: Implementing Coding Tools for a New Classification
1Implementing Coding Tools for a New Classification
- Andrew Allen, UK Office for National Statistics
2Operation 2007 - The players
- In the UK: the Standard Industrial Classification of Economic Activities (SIC) (current version SIC 2003)
- In Europe: NACE, the Nomenclature générale des activités économiques dans les Communautés européennes (current version NACE Rev 1.1)
- In the UN: ISIC, the International Standard Industrial Classification of All Economic Activities (current version ISIC Rev 3.1)
3The UK SIC
- is a 5-digit classification system
- is required, by EU legislation, to be identical to NACE down to and including the 4-digit Class level
- contains a national 5th-digit level which does not exist in NACE
4The Results: changes in structure
                        SIC 2003   SIC 2007
NACE Classes                 514        615
NACE Classes not split       414        537
UK Sub Class splits          285        191
Total Sub Classes            699        728
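The totals in the table follow from its rows: unsplit NACE classes plus UK sub-class splits give the total number of sub-classes. A quick check of that arithmetic, using only the figures above (the dictionary keys and function names are illustrative, and reading "NACE Classes" minus "NACE Classes not split" as the number of classes that were split is an interpretation of the table):

```python
# Figures copied from the table above; the key names are illustrative.
structure = {
    "SIC 2003": {"nace_classes": 514, "not_split": 414, "uk_splits": 285, "total": 699},
    "SIC 2007": {"nace_classes": 615, "not_split": 537, "uk_splits": 191, "total": 728},
}

def classes_split(row):
    """NACE classes subdivided at the national 5th digit (assumed reading)."""
    return row["nace_classes"] - row["not_split"]

def total_matches(row):
    """Total sub-classes should equal unsplit classes plus UK sub-class splits."""
    return row["not_split"] + row["uk_splits"] == row["total"]

for version, row in structure.items():
    print(version, classes_split(row), total_matches(row))
```

On these figures, 100 NACE classes carry a UK split in SIC 2003 and 78 in SIC 2007, and both totals agree with their rows.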
5ACTR as an aid to coding
- ACTR: Automatic Coding by Text Recognition
- Developed by Statistics Canada
- ONS standard tool for coding, initially industry and occupation
- Replaces Precision Data Coder for industry coding
- Determines a code from a text description
- Extent of automation of process is controlled by parameters
6Knowledge Bases SIC2003
- ACTR relies heavily on indexes of standard descriptions
  - Business descriptions from responses to the Business Register Survey
  - Published index for the SIC2003
  - The short descriptions for each SIC2003 code
  - Standard descriptions for construction industry statistics
  - Trade code descriptions for PAYE (Pay As You Earn Tax) employers
  - Farm type descriptions
- With a total of more than 30,000 standard descriptions
7How ACTR works
- Each input description is converted to a standard form
- This is compared with the standard forms of descriptions held in the knowledge base
- The closeness is presented as a score between 0 and 10
- The system has rules to determine whether the score is sufficient to confirm a match
- Requires a score of more than 7.5 to code automatically (our setting, which may differ for other data sets)
- Lower scores are passed through interactive coding
- Coding does not depend on the order in which the knowledge bases are checked
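The steps above can be sketched in a few lines of Python. This is a toy illustration, not ACTR itself: the stop-word list, the plural-stripping rule, the alphabetical word ordering and the Dice-style score are all assumptions made for the example, and the slides do not describe ACTR's real parsing rules or scoring formula. Only the 7.5 threshold is taken from the slide.

```python
import re

# Hypothetical stop words; ACTR's real parsing rules are not given in the slides.
STOP_WORDS = {"AND", "OF", "THE", "FOR", "IN"}
AUTO_CODE_THRESHOLD = 7.5  # the setting quoted on the slide

def standard_form(description):
    """Upper-case, drop stop words, strip a simple plural 's', sort the words."""
    words = re.findall(r"[A-Z]+", description.upper())
    kept = [w for w in words if w not in STOP_WORDS]
    singular = [
        w[:-1] if w.endswith("S") and not w.endswith("SS") and len(w) > 3 else w
        for w in kept
    ]
    return sorted(set(singular))

def score(input_words, index_words):
    """Toy 0-10 similarity (Dice coefficient x 10); NOT ACTR's formula."""
    if not input_words or not index_words:
        return 0.0
    shared = len(set(input_words) & set(index_words))
    return 10.0 * 2 * shared / (len(input_words) + len(index_words))

def route(best_score):
    """Code automatically only when the score exceeds the threshold."""
    return "automatic" if best_score > AUTO_CODE_THRESHOLD else "interactive"

supplied = standard_form("Horticultural services")
index_entry = standard_form("Sales and service of horticultural machinery")
print(" ".join(supplied))
print(" ".join(index_entry))
print(route(score(supplied, index_entry)))
```

With these assumed rules the two standard forms come out as HORTICULTURAL SERVICE and HORTICULTURAL MACHINERY SALE SERVICE, and the pair is routed to interactive coding. The toy score will not reproduce ACTR's own scores.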
8Extract from Business Register Survey
Questionnaire
12ACTR Process
- Supplied text: Horticultural services
  - Standard form: HORTICULTURAL SERVICE
- Best fit index entry: Sales and service of horticultural machinery
  - Standard form: HORTICULTURAL MACHINERY SALE SERVICE
- Score is 6.911 (out of 10)
- ACTR prefers SIC 2003 code 51880 (Wholesale of agricultural machinery and accessories)
14Interactive coding
- Scores of 7.5 or below are passed to clerical staff for coding interactively
- The system presents options in descending order of score
- If none of the choices appears suitable, staff modify the description
- Once a decision is made, the person coding confirms the choice
- The index description is then held on the IDBR (Inter-Departmental Business Register).
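A minimal sketch of how candidate matches might be ordered for the person coding. The `Candidate` type and `options_for_coder` function are hypothetical names, not part of ACTR; the first candidate uses the index entry, code and score quoted in the slides, while the other two are invented placeholders included only to show the ordering.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    index_entry: str
    sic_code: str
    score: float

def options_for_coder(candidates, limit=5):
    """Return the best-scoring candidates first, up to `limit` of them."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:limit]

candidates = [
    Candidate("Hypothetical index entry A", "?????", 4.2),  # placeholder
    Candidate("Sales and service of horticultural machinery", "51880", 6.911),
    Candidate("Hypothetical index entry B", "?????", 5.8),  # placeholder
]

for option in options_for_coder(candidates):
    print(f"{option.score:6.3f}  {option.sic_code}  {option.index_entry}")
```

Capping the list with `limit` is a design choice for the sketch; the slides do not say how many options the system shows.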
15Introducing the SIC2007 (NACE Rev 2)
- New index files
  - SIC2007 headings
  - SIC2007 index
- Initially code forward from the SIC2003 using bridging codes; these are codes for each knowledge base entry that link the SIC2003 and SIC2007
- Later will change to code backwards from the SIC2007
- Eventually dual coding will cease
16Impact of ACTR on IDBR at Micro Level
- Existing SIC 2003 is 01120 (Growing of vegetables etc)
- The preferred ACTR SIC 2003 is 51880 (Wholesale of agricultural machinery and accessories)
- The SIC 2007 comes from the bridging code
  - SIC 2003: 51880
  - Bridging code: MTOLR
  - SIC 2007: 46610
- SIC 2003 code will change but only when agreed
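The bridging-code lookup can be pictured as two small tables. This is a sketch under assumptions: the dictionary layout, the field names, and the pairing of this bridging code with the standard form HORTICULTURAL MACHINERY SALE SERVICE are illustrative, not the actual knowledge-base format. Only the three codes (51880, MTOLR, 46610) come from the slide.

```python
# Each bridging code links one SIC 2003 code to one SIC 2007 code.
BRIDGES = {
    "MTOLR": {"sic2003": "51880", "sic2007": "46610"},
}

# Each knowledge-base entry carries a bridging code (layout assumed).
KNOWLEDGE_BASE = {
    "HORTICULTURAL MACHINERY SALE SERVICE": "MTOLR",
}

def dual_code(standard_form):
    """Return (SIC 2003, SIC 2007) for a matched knowledge-base entry."""
    bridge = BRIDGES[KNOWLEDGE_BASE[standard_form]]
    return bridge["sic2003"], bridge["sic2007"]

print(dual_code("HORTICULTURAL MACHINERY SALE SERVICE"))
```

Because the match is made once against the knowledge base and both codes hang off the same bridging code, a single coding pass yields codes on both classifications.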
17Conversion to SIC2007
- ACTR will deal with units that have a suitable business description
- Conversion tables will deal with
  - Units with descriptions that ACTR is unable to code (vague descriptions)
  - Units without a description
  - Units supplied through administrative sources (existing VAT traders, PAYE employers, Registered Companies)
18Creation of Conversion Tables
- Tables have been created to convert units from SIC2003 to SIC2007
  - Using ACTR bridging codes
  - Coding existing data through ACTR
  - Producing cross-tabulation of SIC2003 to SIC2007
  - Allocating on a probability basis rounded to nearest 5
- Validate relationships against the acceptable range of industries
- Best fit tables also produced for users who cannot accommodate probability based conversion
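One way the steps above could fit together, as a hedged sketch. It assumes that "rounded to nearest 5" means percentages rounded to the nearest 5 per cent, which the slides do not state. The function names are illustrative, the unit counts are invented, and `HYPO1` is a placeholder rather than a real SIC 2007 code; only the 51880 to 46610 pairing comes from the slides.

```python
from collections import Counter, defaultdict

def cross_tabulate(pairs):
    """Count units for each (SIC 2003, SIC 2007) pair produced by dual coding."""
    table = defaultdict(Counter)
    for sic2003, sic2007 in pairs:
        table[sic2003][sic2007] += 1
    return table

def round_to_5(percent):
    """Round a non-negative percentage to the nearest multiple of 5."""
    return int(percent / 5 + 0.5) * 5

def probability_table(table):
    """Rounded percentage of each SIC 2003 code allocated to each SIC 2007 code."""
    result = {}
    for sic2003, counts in table.items():
        total = sum(counts.values())
        shares = {new: round_to_5(100 * n / total) for new, n in counts.items()}
        result[sic2003] = {new: p for new, p in shares.items() if p > 0}
    return result

def best_fit_table(table):
    """Single most frequent SIC 2007 code for each SIC 2003 code."""
    return {old: counts.most_common(1)[0][0] for old, counts in table.items()}

# Invented counts: 17 units bridge to 46610, 3 to a placeholder code.
pairs = [("51880", "46610")] * 17 + [("51880", "HYPO1")] * 3
table = cross_tabulate(pairs)
print(probability_table(table))
print(best_fit_table(table))
```

Independently rounded shares need not sum to 100, and very small shares round to zero and drop out. The slides do not say how such cases were handled, so the sketch leaves them as they fall.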
19Coding process
20Impact on the IDBR at the Macro Level
- Impact on SIC 2003 is only on those reporting units that have business descriptions for local units, where ACTR can code.
  - ACTR codes: 620,000
  - ACTR does not code: 210,000
  - No business description: 340,000
  - Administrative data only: 1,660,000
  - Total local units: 2,830,000
- SIC 2007 comes from the bridging codes only where ACTR codes; otherwise SIC 2007 comes from conversion from SIC 2003
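The rule above, applied to the quoted counts, can be sketched as follows. The function names and the grouping of categories are illustrative; the figures are those on the slide.

```python
# Local unit counts from the slide.
COUNTS = {
    "ACTR codes": 620_000,
    "ACTR does not code": 210_000,
    "No business description": 340_000,
    "Administrative data only": 1_660_000,
}
TOTAL_LOCAL_UNITS = 2_830_000

def sic2007_source(actr_coded):
    """Where a unit's SIC 2007 comes from, following the rule on the slide."""
    return "bridging code" if actr_coded else "conversion from SIC 2003"

def units_by_source(counts):
    """Total local units whose SIC 2007 arrives by each route."""
    totals = {}
    for category, n in counts.items():
        source = sic2007_source(category == "ACTR codes")
        totals[source] = totals.get(source, 0) + n
    return totals

by_source = units_by_source(COUNTS)
print(by_source)
print(sum(COUNTS.values()) == TOTAL_LOCAL_UNITS)
```

On these figures roughly 22% of local units (620,000 of 2,830,000) receive their SIC 2007 directly through a bridging code; the remaining 2,210,000 rely on the conversion tables.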
21SIC 2003
A AGRICULTURE, HUNTING AND FORESTRY
B FISHING
C MINING AND QUARRYING
D MANUFACTURING
E ELECTRICITY, GAS AND WATER SUPPLY
F CONSTRUCTION
G WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES
H HOTELS AND RESTAURANTS
I TRANSPORT, STORAGE AND COMMUNICATION
J FINANCIAL INTERMEDIATION
K REAL ESTATE, RENTING AND BUSINESS ACTIVITIES
L PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL
M EDUCATION
N HEALTH AND SOCIAL WORK
O OTHER COMMUNITY, SOCIAL AND PERSONAL SERVICE ACTIVITIES
P PRIVATE HOUSEHOLDS EMPLOYING STAFF AND UNDIFFERENTIATED
Q EXTRA-TERRITORIAL ORGANISATION AND BODIES
22Impact at SIC 2003 broad industry level (provisional counts)
Section       Starting stock     In    Out   Net Change
A and B              167,000    0.5    0.6         -0.1
C, D and E           180,000    5.9    5.2          0.7
F                    260,000    1.4    0.9          0.5
G                    530,000    2.4    2.5         -0.1
H                    188,000    2.3    1.6          0.7
I                    116,000    2.7    2.4          0.3
J                     58,000    6.5    3.3          3.2
K                    872,000    1.2    1.3         -0.1
L                     29,000   10.4   11.1         -0.7
M, N and O           432,000    2.9    3.8         -0.9
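The Net Change column can be checked as In minus Out for each row. The sketch below does that with the figures from the table; the slide does not state the units of the In, Out and Net Change columns, so the values are treated as plain numbers, and the variable names are illustrative.

```python
# (section, starting stock, in, out, net change) copied from the table.
ROWS = [
    ("A and B", 167_000, 0.5, 0.6, -0.1),
    ("C, D and E", 180_000, 5.9, 5.2, 0.7),
    ("F", 260_000, 1.4, 0.9, 0.5),
    ("G", 530_000, 2.4, 2.5, -0.1),
    ("H", 188_000, 2.3, 1.6, 0.7),
    ("I", 116_000, 2.7, 2.4, 0.3),
    ("J", 58_000, 6.5, 3.3, 3.2),
    ("K", 872_000, 1.2, 1.3, -0.1),
    ("L", 29_000, 10.4, 11.1, -0.7),
    ("M, N and O", 432_000, 2.9, 3.8, -0.9),
]

def net_change(row):
    """In minus Out, rounded to one decimal place as in the table."""
    _, _, moved_in, moved_out, _ = row
    return round(moved_in - moved_out, 1)

mismatches = [row[0] for row in ROWS if net_change(row) != row[4]]
total_stock = sum(row[1] for row in ROWS)
print(mismatches, total_stock)
```

Every row is consistent, and the starting stocks sum to 2,832,000, in line with the rounded total of 2,830,000 local units quoted for the register.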
23SIC 2007
A Agriculture, Forestry And Fishing
B Mining And Quarrying
C Manufacturing
D Electricity, Gas, Steam And Air Conditioning Supply
E Water Supply; Sewerage, Waste Management And Remediation Activities
F Construction
G Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles
H Transportation And Storage
I Accommodation And Food Service Activities
J Information And Communication
K Financial And Insurance Activities
L Real Estate Activities
M Professional, Scientific And Technical Activities
N Administrative And Support Service Activities
O Public Administration And Defence; Compulsory Social Security
P Education
Q Human Health And Social Work Activities
R Arts, Entertainment And Recreation
S Other Service Activities
T Activities Of Households
U Activities Of Extraterritorial Organisations And Bodies
24Correspondence between SIC 2003 and SIC 2007 for
local units coded by ACTR
25Implementation timetable
December 2006    NACE published
January 2007     SIC 2007 is published on NS website
February 2007    Development and tuning of data coder (ACTR); first release on 2007 basis, subject to revision
June 2007        Re-coding using ACTR
August 2007      New release of ACTR, using SIC 2007 index
November 2007    SIC 2007 Index published (consistent with ACTR August 2007)
January 2008     SIC 2007 fully implemented on the Register
2008 ????        ACTR SIC 2003 overwrites historic SIC 2003
26Conclusions
- The ACTR tool delivers considerable savings in terms of cost and burden on businesses compared to traditional survey approaches.
- The knowledge base is portable (i.e. independent of the coding engine), so it can be shared with any interested parties, e.g. administrative data suppliers, to increase the consistency of coding.
- The use of bridging codes permits simultaneous coding to multiple classification systems, essential if periods of dual-coding are required.
- The knowledge base approach can help to inform the development of future versions of a classification, by providing a reference frame of business activity descriptions.