Title: ASN.1 Specification
1. ASN.1 Specification to Compiled Code
- March 17, 2000
- Hogue Lab
- Samuel Lunenfeld Research Institute
- Mt. Sinai Hospital
2. ASN.1?
- ASN.1 = Abstract Syntax Notation One
- Internationally standardized data specification
language which can be used to build complex data
types from simpler ones in a hierarchical manner
- Born in the early 1980s from Xerox technology
- Easy for a computer to use, read and write
- Used in many commercial sectors (telephone
systems, air traffic, building and machine
control, toll highways, smart cards, security)
- Currently growing more than ever
- Visit http://WWW.OSS.COM
- Used by NCBI to define/store data in GenBank,
PubMed, Taxon, MMDB and everything else
3. Why NCBI chose ASN.1
- Architecture independent (interoperable)
- Easy to specify data structure and encoding
- Many tools that support it
- Syntax checking (formal language)
- Data sharing
- Automated code generation
- At the time, ASN.1 was as mature as XML will be
in one or two years from now.
- Efficient automatic binary encoding
- (The main reason that it is not more widely known
is that it is not free.)
4. ASN.1 Structure
- Type references, identifiers, values
- Type reference: object name, starts with an
UPPER CASE letter
- Identifier: field name, starts with a lower
case letter
- Values: only found in the value notation
- Two types of ASN.1: the specification and the
value notation
- Comments start with --
5. An ASN.1 Specification
    -- Revision 1.0
    -- Demo Spec
    Demo-module DEFINITIONS ::=       -- Module-name
    BEGIN

    EXPORTS My-type ,                 -- My-type can be used by other modules
            Another ;

    IMPORTS Foreign-type FROM Other-module ;   -- can import types

    -- we define an object called My-type
    My-type ::= SEQUENCE {            -- My-type is a Type Reference
        first INTEGER ,               -- first is an identifier
        second INTEGER DEFAULT 2 ,    -- second defaults to 2
        third VisibleString OPTIONAL  -- third is an optional string
    }                                 -- end of object definition

    Another ::= Foreign-type          -- can reference other defined types

    END                               -- end of module, END required
6. ASN.1 Value Notation
- Two types of encoding: ASCII (text) and binary
- Always use binary encoding for production code!
- Example text encoding:
    My-type ::= { first 42 }
7. Defining repeated lists of types
Specification:
    My-type-set ::= SEQUENCE OF My-type
Value notation:
    My-type-set ::= {                  -- start SEQUENCE OF
        {                              -- a My-type
            first 42 } ,
        {                              -- another My-type
            first 27 ,
            second 22 ,
            third "Everything set here" }
    }                                  -- end of SEQUENCE OF
8. NCBI Primitive Data Types
- BOOLEAN (TRUE or FALSE)
- INTEGER
- OCTET STRING (string of bytes)
- NULL
- REAL (although not really - make your own REAL
type using an integer mantissa and exponent)
- VisibleString (printable characters)
- StringStore (NCBI made this one up for sequence)
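
The "make your own REAL" advice can be sketched in C as follows. This is only an illustration: the `MyReal` struct, the function name and the choice of base 10 are assumptions, not something defined by the slides or by the NCBI Toolkit.

```c
#include <assert.h>

/* Hypothetical home-made real number: value = mantissa * 10^exponent.
   Base 10 is an assumption made for this sketch. */
typedef struct {
    long mantissa;
    int  exponent;
} MyReal;

/* Convert the mantissa/exponent pair to a C double. */
double my_real_to_double(MyReal r) {
    double v = (double)r.mantissa;
    int e = r.exponent;
    for (; e > 0; e--) v *= 10.0;   /* positive exponent: scale up */
    for (; e < 0; e++) v /= 10.0;   /* negative exponent: scale down */
    return v;
}
```

In an ASN.1 spec such a type would simply be a SEQUENCE of two INTEGER fields, so it goes through the parser like any other object.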
9. SEQUENCE, SET, OF
- SEQUENCE - a series of named types - in order
- SEQUENCE OF - repeating series of a single
type - in order
- SET - like SEQUENCE, but order does not matter
(Never use this - especially in secure applications)
- SET OF - like SEQUENCE OF, but order does not
matter (Don't use this either)
10. CHOICE
CHOICE defines a set of alternate types.
Specification:
    Accession-Number ::= CHOICE {
        gi INTEGER ,
        swiss-prot VisibleString }
Text value notation:
    Accession-Number ::= gi 413223
or
    Accession-Number ::= swiss-prot "P22518"
Note: no { } around choice values
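
One way to picture a CHOICE in C is a tagged union: a discriminator that says which alternative is present, plus storage for that alternative. The sketch below is an illustration only. The type and function names are hypothetical, and this is not the structure asntool generates for a CHOICE.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Which alternative of the CHOICE is present (hypothetical names). */
typedef enum { ACC_GI = 1, ACC_SWISS_PROT = 2 } AccessionKind;

typedef struct {
    AccessionKind kind;
    union {
        long        gi;          /* gi INTEGER */
        const char *swiss_prot;  /* swiss-prot VisibleString */
    } value;
} AccessionNumber;

/* Print the value in text value notation. Returns what snprintf
   returns, or -1 if the discriminator is not a known alternative. */
int accession_to_text(const AccessionNumber *acc, char *buf, size_t len) {
    switch (acc->kind) {
    case ACC_GI:
        return snprintf(buf, len, "Accession-Number ::= gi %ld",
                        acc->value.gi);
    case ACC_SWISS_PROT:
        return snprintf(buf, len, "Accession-Number ::= swiss-prot \"%s\"",
                        acc->value.swiss_prot);
    }
    return -1;
}
```

Only one member of the union is valid at a time, which mirrors the rule that a CHOICE value carries exactly one of its alternatives.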
11. ENUMERATED
- ENUMERATED: a named set of integer values.
The parser will check that names are valid. Ideal
for controlled vocabulary lists that won't change.
- Can also use INTEGER in the same way. Names
won't be type checked. Ideal for controlled
vocabulary lists that will be expanded over time.
Specification:
    Sex ::= ENUMERATED {
        female (1) ,
        male (2) ,
        other (255) }
Example:
    Sex ::= other
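
The name checking described for ENUMERATED can be sketched in C with an enum and a lookup table. This is a minimal illustration under assumed names (`Sex`, `sex_from_name`); it is not the code the NCBI parser or asntool uses.

```c
#include <assert.h>
#include <string.h>

/* C mirror of the Sex ENUMERATED type (hypothetical names). */
typedef enum { SEX_FEMALE = 1, SEX_MALE = 2, SEX_OTHER = 255 } Sex;

/* Look up a name as it appears in text value notation.
   Returns 1 and stores the value on success, 0 for an unknown name. */
int sex_from_name(const char *name, Sex *out) {
    static const struct { const char *name; Sex value; } table[] = {
        { "female", SEX_FEMALE },
        { "male",   SEX_MALE   },
        { "other",  SEX_OTHER  }
    };
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(name, table[i].name) == 0) {
            *out = table[i].value;
            return 1;
        }
    }
    return 0;  /* name is not part of the controlled vocabulary */
}
```

Rejecting unknown names is what makes ENUMERATED suit a fixed vocabulary; a plain INTEGER with named values would accept any number.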
12. Modifiers
- OPTIONAL marks a value as optional. Can be
added to any type.
- DEFAULT specifies a default value (can be any
value). May be used with all types except
OCTET STRING and NULL.
- These add semantic value to your specification
and will be validated by the parsers.
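
To make the two modifiers concrete, here is a hand-written C sketch of the My-type SEQUENCE from the earlier specification slide. It assumes an absent OPTIONAL string is represented by a NULL pointer and that DEFAULT is applied at initialization; the struct and function names are hypothetical and this is not asntool output.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written mirror of My-type (hypothetical layout). */
typedef struct {
    long        first;   /* first INTEGER: required */
    long        second;  /* second INTEGER DEFAULT 2 */
    const char *third;   /* third VisibleString OPTIONAL: NULL = absent */
} MyType;

/* Fill in the required field and apply DEFAULT / OPTIONAL semantics. */
void my_type_init(MyType *m, long first) {
    m->first  = first;
    m->second = 2;      /* the DEFAULT value from the spec */
    m->third  = NULL;   /* OPTIONAL field starts out absent */
}

/* A writer may leave 'second' out of the encoding when it still
   holds the default value. */
int my_type_second_is_default(const MyType *m) {
    return m->second == 2;
}
```

With this layout the value `{ first 42 }` corresponds to `my_type_init(&m, 42)`: `second` reads back as 2 and `third` is absent.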
13. ASN.1 Development Process
14. Using ASNTOOL
- Combination of two older programs - Asncode and
Asntool
- Lower case parameters are for the
validator/converter and upper case parameters are
for the code generator
- Can be used to validate any ASN.1 (given the
spec), convert between binary and ASCII value
notation, and generate C code
- C code: for every data type (object) in your
spec, asntool will generate a C structure that
mirrors the data type in the spec, plus New, Free,
AsnRead and AsnWrite functions for that type.
- You never have to write a parser when using
ASN.1!!!
15. ASNTOOL Command Line
    -m  ASN.1 Module File                      File In
    -f  ASN.1 Module File                      File Out   Optional
    -v  Print Value File                       File In    Optional
    -p  Print Value File                       File Out   Optional
    -d  Binary Value File (type required)      File In    Optional
    -t  Binary Value Type                      String     Optional
    -e  Binary Value File                      File Out   Optional
    -o  Header File                            File Out   Optional
    -b  Buffer Size                            Integer    Optional
    -w  Word length maximum for defines        Integer    Optional
    -G  Generate object loader .c and .h files
    -M  ASN.1 module filenames
    -B  Base for filename, without extensions, for generated objects
    -I  In generated .c, add include to this filename   String   Optional
NOTE: run asntool to see more options
16. ASNTOOL is Run Twice To Generate Code!
1. asntool -m tindex.asn -o tindex.h -w 120 -b10000
- use tindex.asn to create a tindex.h file that
contains a representation of the ASN.1 spec in C
(the parse tree). This is what the parser uses to
read/write and validate your ASN.1.
2. asntool -m tindex.asn -Gt -M general.asn -B objtindex -w100 -b10000 -I objgen.h
- use tindex.asn to create objtindex.c/h files
containing your object loaders. You use the
functions in objtindex.h.
- If you import types from other ASN.1 modules, you
must put them in a comma-separated list after -M.
- -I can only handle one include. If you need to
include more, put them into one include file and
include that one.
17. Example Specification
18. Generates this code
This is a part of objtindex.h (implemented in
objtindex.c)
19. Asnlib
- Asnlib is the NCBI Toolkit library that supports
all ASN.1 functions. You need to know something
about it to use your newly generated code in your
application.
- When reading and writing ASN.1, you must open and
close an ASN.1 stream. This stream can write to a
file stream, memory or bytestore.
- asn.h contains all the functions you need.
20. Sample Code (Write)
21. Sample Code (Read)
22. Other Functions
- You can do the same thing but with memory or
bytestores. See the asnio.c section of asn.h:
    AsnIoMemOpen and AsnIoMemClose
    AsnIoBSOpen and AsnIoBSClose
- Generic functions allow you to copy and compare
any two ASN.1 data types that you have:
    Pointer AsnIoMemCopy(Pointer from, AsnReadFunc readfunc, AsnWriteFunc writefunc)
    Boolean AsnIoMemComp(Pointer a, Pointer b, AsnWriteFunc writefunc)
23. Key Points
- ASN.1 makes writing a program with a data
structure much faster (RAD - rapid application
development). This is true even if you never
read or write your data structure to disk.
- ASN.1 is good for storing data in a database in a
platform independent manner. (We store ASN.1
objects as binary objects in a database, usually
indexed by one primary key, e.g. gi.)
- TIP: As for any software development task, spend
at least 20% of your development time in the
initial design phase. THIS CAN NOT BE MORE
IMPORTANT!
24. ASN.1 Advanced
- You can use the ASN.1 parser at its lowest
public interface level if you wish. This is how
NCBI built their object loaders before they had a
code generator.
- The parser is very similar to the XML parsers
that are getting popular these days.
- Functions are available to read/write tag (or
id) - value pairs from a binary or text encoding
of ASN.1.
- How to use them is presented quite well in the
1994 Toolkit documentation in the AsnLib chapter.
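
To give a feel for what reading tag-value pairs from a binary encoding involves, here is a small stand-alone C sketch that walks one tag-length-value triple in a BER byte stream. It is not the AsnLib interface (see the Toolkit documentation for that). The `Tlv` struct and `tlv_read` function are hypothetical, and the sketch assumes single-octet tags and short-form lengths only.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded tag-length-value triple (hypothetical type). */
typedef struct {
    unsigned char        tag;     /* identifier octet */
    size_t               length;  /* number of content octets */
    const unsigned char *value;   /* points into the input buffer */
} Tlv;

/* Read one triple from buf. Handles only single-octet tags and
   short-form lengths (under 128). Returns the number of bytes
   consumed, or 0 if the input is truncated or uses an unsupported form. */
size_t tlv_read(const unsigned char *buf, size_t len, Tlv *out) {
    if (len < 2) return 0;
    if ((buf[0] & 0x1F) == 0x1F) return 0;   /* multi-octet tag */
    if (buf[1] & 0x80) return 0;             /* long or indefinite length */
    if ((size_t)buf[1] > len - 2) return 0;  /* content is truncated */
    out->tag    = buf[0];
    out->length = buf[1];
    out->value  = buf + 2;
    return 2 + out->length;
}
```

Calling `tlv_read` repeatedly, advancing by its return value each time, steps through consecutive values in a buffer much as an event-style XML parser steps through elements.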
25. Binary Encoding
- There are many different binary encodings for
ASN.1, including BER (Basic Encoding Rules) and
PER (Packed Encoding Rules).
- These are NOT compression schemes, although BER
data takes up about 50% less space on average.
PER works even better.
- NCBI Toolkit supports only BER.
- In BER, tags/values, etc. are represented as
binary objects. So a number, instead of being
written out as ASCII characters, will be written
as a binary number. This saves space. (Note: XML
currently only writes out as ASCII and is
therefore inefficient.)
- Encoding as binary takes fewer CPU cycles than
encoding as text!
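
As a concrete illustration of a number being written in binary rather than as ASCII digits, the following C sketch encodes an INTEGER the way BER does: a tag octet (0x02), a length octet, then the value in minimal big-endian two's complement. The function name is hypothetical and this is not Toolkit code.

```c
#include <assert.h>
#include <stddef.h>

/* Encode v as a BER INTEGER: tag 0x02, short-form length, minimal
   two's-complement content octets. 'out' must have room for
   2 + sizeof(long) bytes. Returns the total number of bytes written. */
size_t ber_encode_integer(long v, unsigned char *out) {
    unsigned char content[sizeof(long)];
    size_t n = sizeof(long), i, start = 0;

    /* Lay the value out as big-endian two's-complement octets. */
    for (i = 0; i < n; i++)
        content[i] = (unsigned char)(((unsigned long)v >> (8 * (n - 1 - i))) & 0xFF);

    /* Drop leading octets that carry no information: 0x00 before an
       octet with the top bit clear, 0xFF before one with it set. */
    while (start + 1 < n &&
           ((content[start] == 0x00 && (content[start + 1] & 0x80) == 0) ||
            (content[start] == 0xFF && (content[start + 1] & 0x80) != 0)))
        start++;

    out[0] = 0x02;                        /* universal tag: INTEGER */
    out[1] = (unsigned char)(n - start);  /* short-form length */
    for (i = start; i < n; i++)
        out[2 + (i - start)] = content[i];
    return 2 + (n - start);
}
```

For example, the gi 413223 becomes the five octets 02 03 06 4E 27, including tag and length, where the text form needs six digit characters for the number alone.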
26. The Future of ASN.1 at the NCBI
- While the Toolkit is being rewritten in C++, the
code generator is being rewritten to generate C++
code.
- They are also planning to implement an ASN.1 to
XML converter.
- The W3C is working on a binary XML encoding for
use where ASN.1 is being used today
(http://www.w3.org/TR/wbxml/).
- Only time will tell whether ASN.1 is replaced by
XML. XML still has quite a long way to go before
it matures.