Title: Data Types
1Data Types
- Primitive Data Types
- User-Defined Ordinal Types
- Array Types
- Record Types
- Union Types
- Set Types
- Pointer Types
2Evolution of Data Types
- FORTRAN I (1956)
- - INTEGER, REAL, arrays
- Ada (1983)
- - The user can create a unique type for every category of variables in the problem space and have the system enforce the types
- Abstract Data Type
- - The use of a type is separated from the representation of, and the operations on, values of that type
- Def: A descriptor is the collection of the attributes of a variable
3Design Issues for all data types
- What is the syntax of references to variables?
- What operations are defined and how are they specified?
- Def: A descriptor is the collection of the attributes of a variable
- In an implementation, a descriptor is a collection of the memory cells that store the variable's attributes.
- We will discuss the design issues of each specific type separately.
4Primitive Data Types
- Data types that are not defined in terms of other types are called primitive types
- Some of these types are merely reflections of the hardware (integers); others require only a little non-hardware support for their implementation.
- Integer
- - Almost always an exact reflection of the hardware, so the mapping is trivial
- - There may be as many as eight different integer types in a language
- Integers (positive or negative) are represented as a string of bits
- Most computers now use two's complement notation to store negative integers (see any assembly language programming book)
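The two's complement rule mentioned above can be tried out directly. The following is a minimal C++ sketch, assuming an 8-bit integer for readability; the helper names bits8 and negate8 are made up for this illustration.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Returns the 8-bit pattern that stores the given value, as text.
std::string bits8(std::int8_t value) {
    return std::bitset<8>(static_cast<std::uint8_t>(value)).to_string();
}

// Negates a value the two's complement way: invert every bit, then add 1.
std::int8_t negate8(std::int8_t value) {
    std::uint8_t raw = static_cast<std::uint8_t>(value);
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(~raw + 1u));
}
```

For example, bits8(5) gives 00000101 and bits8(-5) gives 11111011, which is the inverted pattern plus one.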
5Floating Point
- Model real numbers, but only as approximations
- Languages for scientific use support at least two floating-point types, sometimes more
- Usually exactly like the hardware, but not always; some languages allow accuracy specifications in code.
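A short C++ sketch of what "only as approximations" means in practice. It assumes IEEE double arithmetic, which is what most current hardware provides; the function names and the default tolerance are choices made for this illustration.

```cpp
#include <cmath>

// Exact comparison: true only if the two stored bit patterns denote the same value.
bool exactly_equal(double a, double b) {
    return a == b;
}

// Tolerant comparison: true when a and b differ by no more than the tolerance.
bool nearly_equal(double a, double b, double tolerance = 1e-9) {
    return std::fabs(a - b) <= tolerance;
}
```

With these helpers, exactly_equal(0.1 + 0.2, 0.3) is false because none of the three constants is stored exactly, while nearly_equal(0.1 + 0.2, 0.3) is true. The float and double types are the two floating-point types C++ inherits from the hardware.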
6IEEE floating-point formats
(a) Single precision, (b) Double
precision
7Decimal
- - For business applications (money)
- - Store a fixed number of decimal digits (coded)
- - Advantage: accuracy
- - Disadvantages: limited range, wastes memory
- Primary data types for business data processing (COBOL)
- They are stored very much like character strings, using binary codes for the decimal digits (BCD)
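To make the BCD idea concrete, here is a small C++ sketch of packed BCD, in which each byte holds two decimal digits. The packing layout (high nibble first, leading zero for odd lengths) and the function names are assumptions of this example, not something the slides specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs a string of decimal digits into BCD, two digits per byte.
// An odd-length input is padded with a leading '0'.
std::vector<std::uint8_t> to_packed_bcd(const std::string& digits) {
    std::string padded = (digits.size() % 2 == 0) ? digits : "0" + digits;
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < padded.size(); i += 2) {
        std::uint8_t high = static_cast<std::uint8_t>(padded[i] - '0');
        std::uint8_t low = static_cast<std::uint8_t>(padded[i + 1] - '0');
        bytes.push_back(static_cast<std::uint8_t>((high << 4) | low));
    }
    return bytes;
}

// Unpacks BCD bytes back into a string of decimal digits.
std::string from_packed_bcd(const std::vector<std::uint8_t>& bytes) {
    std::string digits;
    for (std::uint8_t b : bytes) {
        digits += static_cast<char>('0' + (b >> 4));
        digits += static_cast<char>('0' + (b & 0x0F));
    }
    return digits;
}
```

Each nibble can hold 16 values but only 10 are used, which is the memory waste listed among the disadvantages.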
8Boolean
- First introduced in ALGOL 60
- Most general-purpose languages have one
- Except __
- Could be implemented as bits, but often as bytes
- - Advantage: readability
- Character Type
- Character data are stored in computers as numeric codings.
- ASCII: 128 different characters (0 .. 127)
- A 16-bit character set named Unicode was developed to include characters from most of the world's languages
9Character String Types
- A character string type is one in which the values consist of sequences of characters
- Design issues
- 1. Is it a primitive type or just a special kind of array?
- 2. Is the length of objects static or dynamic?
- Operations
- - Assignment
- - Comparison (=, >, etc.)
- - Catenation
- - Substring reference
- - Pattern matching
10Examples
- Pascal
- - Not primitive; assignment and comparison only (of packed arrays)
- FORTRAN 77, FORTRAN 90, SNOBOL4 and BASIC
- - Somewhat primitive
- - Assignment, comparison, catenation, substring reference
- - FORTRAN and SNOBOL4 have an intrinsic for pattern matching
- C, C++, Ada
- - Not primitive
- - C and C++: strcpy, strcat, strlen and strcmp are commonly used string manipulation library functions (string.h)
- - Ada: N := N1 & N2 (catenation), N(2..4) (substring reference)
- Java
- - Strings are supported as a primitive type by the String class (constant strings) and the StringBuffer class (changeable strings)
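The four string.h functions named above can be seen working together in the following C++ sketch. The 64-byte buffer and the function names are assumptions of this example; the length check is there because strcpy and strcat do no bounds checking of their own.

```cpp
#include <cstring>
#include <string>

// Catenates two C strings in a fixed-size character array.
// Returns an empty string if the result would not fit.
std::string join_with_c_library(const char* first, const char* second) {
    char buffer[64];
    if (std::strlen(first) + std::strlen(second) >= sizeof buffer) {
        return std::string();
    }
    std::strcpy(buffer, first);    // assignment
    std::strcat(buffer, second);   // catenation
    return std::string(buffer);
}

// Comparison: strcmp returns 0 when the two strings hold the same characters.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Because the string is just an array of characters, every operation is a library call rather than a language operator, which is what "not primitive" means here.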
11String Length Options
- There are several design choices regarding the length of string values
- 1. Static length strings (the length is specified at declaration time)
- - FORTRAN 90, Ada, COBOL, Pascal
- - CHARACTER (LEN = 15) NAME (FORTRAN 90)
- - Static length strings are always full
- 2. Limited dynamic length strings can store any number of characters between 0 and a maximum (specified by the variable's declaration)
- - C and C++
- - The actual length is indicated by a null character
- 3. Dynamic length strings
- - SNOBOL4, Perl, JavaScript
- - Provide maximum flexibility, but require the overhead of dynamic storage allocation and deallocation
12Evaluation
- - Aid to writability
- - As a primitive type with static length, they are inexpensive to provide--why not have them?
- - Dynamic length is nice, but is it worth the expense?
- Implementation
- - Static length - compile-time descriptor
- - Limited dynamic length - may need a run-time descriptor for length (but not in C and C++)
- - Dynamic length - needs a run-time descriptor; allocation/deallocation is the biggest implementation problem
13Implementation Static length
- - Static length - compile-time descriptor: name of type, length (in characters) and address of first character
- - Limited dynamic length - may need a run-time descriptor for length (but not in C and C++)
14Limited dynamic strings
- Limited dynamic strings require a run-time descriptor to store both the maximum length and the current length
- Dynamic length strings need a run-time descriptor (only the current length); allocation/deallocation is the biggest implementation problem. There are two possible approaches to this:
- - linked list, or
- - adjacent storage
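A run-time descriptor for a limited dynamic length string could look roughly like the C++ sketch below. The struct layout, the 32-character storage limit and the function names are all hypothetical; they only illustrate that both the maximum and the current length must be kept at run time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical run-time descriptor plus storage for a limited dynamic string.
struct LimitedString {
    std::size_t max_length;      // fixed when the variable is declared
    std::size_t current_length;  // changes as values are assigned
    char data[32];               // this sketch supports max_length <= 32
};

// Creates an empty string with the given maximum (capped at the storage size).
LimitedString make_limited(std::size_t max_length) {
    LimitedString s = {};
    s.max_length = (max_length <= sizeof s.data) ? max_length : sizeof s.data;
    s.current_length = 0;
    return s;
}

// Assigns text, truncating it to the declared maximum.
// Returns the number of characters actually stored.
std::size_t assign(LimitedString& s, const std::string& text) {
    s.current_length = (text.size() < s.max_length) ? text.size() : s.max_length;
    for (std::size_t i = 0; i < s.current_length; ++i) {
        s.data[i] = text[i];
    }
    return s.current_length;
}
```

C and C++ avoid the current-length field by ending the characters with a null character instead.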
15Ordinal Types ( user defined )
- An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers.
- In many languages, users can define two kinds of ordinal types: enumeration and subrange.
- Enumeration Types
- - An enumeration type is one in which the user enumerates all of the possible values, which are symbolic constants
- Design Issue
- - Should a symbolic constant be allowed to be in more than one type definition?
16Examples
- Pascal
- - Constants cannot be reused; they can be used for array subscripts, for variables and case selectors; NO input or output; can be compared
- Ada
- - Constants can be reused (overloaded literals); disambiguate with context or type_name'(one of them); can be used as in Pascal; CAN be input and output
- C and C++
- - Like Pascal, except they can be input and output as integers
- Java (prior to version 5.0) does not include an enumeration type, but one can be implemented as a class.
17Evaluation
- Enumeration types provide advantages in both readability and reliability
- Aid to readability
- - e.g. no need to code a color as a number
- Aid to reliability
- - e.g. the compiler can check operations and ranges of values
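Both advantages show up in a small C++ sketch. It uses a scoped enumeration (enum class, C++11), which is stricter than the plain C enum; the Color type and name_of function are invented for this example.

```cpp
#include <string>

// Readability: colors are named, not coded as numbers.
enum class Color { Red, Green, Blue };

// The compiler knows every legal value of Color.
std::string name_of(Color c) {
    switch (c) {
        case Color::Red:   return "red";
        case Color::Green: return "green";
        case Color::Blue:  return "blue";
    }
    return "unknown";
}

// Reliability: the following lines would be rejected at compile time.
//   Color c = 2;                  // no implicit conversion from int
//   Color d = Color::Red + 1;     // no arithmetic on enumeration values
```

The underlying integer is still reachable with an explicit cast, for example static_cast<int>(Color::Blue).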
18Subrange Type
- Subrange type - an ordered contiguous subsequence of an ordinal type
- Examples
- Pascal - Subrange types behave as their parent types; can be used as for variables and array indices
- - e.g. type pos = 0 .. MAXINT;
- Ada - Subtypes are not new types, just constrained existing types (so they are compatible); can be used as in Pascal, plus case constants
- - e.g. subtype INDEX is INTEGER range 1 .. 100;
- All of the operations defined for the parent type are also defined for the subtype, except assignment of values out of range.
19Implementation of user-defined ordinal types
- Enumeration types are implemented by associating a nonnegative integer value with each symbolic constant in that type (typically, the first is 0, the second is 1, and so on)
- Subrange types are the parent types with code inserted (by the compiler) to restrict assignments to subrange variables
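C++ has no subrange types, so the check that an Ada or Pascal compiler inserts has to be written by hand. The sketch below imitates the Ada subtype INDEX (range 1 .. 100) with a hypothetical class named Index; reporting a violation with an exception is a choice made for this example.

```cpp
#include <stdexcept>

// Imitates: subtype INDEX is INTEGER range 1 .. 100;
class Index {
public:
    explicit Index(int v) : value_(checked(v)) {}

    // Every assignment goes through the range check, which is the code
    // a compiler would insert for a subrange variable.
    Index& operator=(int v) {
        value_ = checked(v);
        return *this;
    }

    int get() const { return value_; }

private:
    static int checked(int v) {
        if (v < 1 || v > 100) {
            throw std::out_of_range("value outside 1 .. 100");
        }
        return v;
    }

    int value_;
};
```

Apart from the check, an Index is stored exactly like its parent type, a plain int.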
20Arrays
- An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
- A specific element of an array is identified by
- - i) the aggregate name
- - ii) the index (subscript): its position relative to the first element
21Design Issues
- 1. What types are legal for subscripts?
- 2. Are subscripting expressions in element references range checked?
- 3. When are subscript ranges bound?
- 4. When does allocation take place?
- 5. What is the maximum number of subscripts?
- 6. Can array objects be initialized?
- 7. Are any kind of slices allowed?
22Arrays and Indexes
- Indexing is a mapping from indices to elements
- - map(array_name, index_value_list) -> an element
- Syntax
- - FORTRAN, PL/I, Ada use parentheses
- - Most others use brackets
- (Example) simple_array.C (Show me the output)
    #include <iostream>

    int main() {
        char a[8] = "abcdefg";
        std::cout << a[0] << *a;          // a[0] and *a name the same element
        std::cout << a[1] << *(a + 1);    // a[1] and *(a + 1) name the same element
    }
23Subscript Types
- FORTRAN, C, C++, Java - int only
- Pascal - any ordinal type (int, boolean, char, enum)
- Ada - int or enum (includes boolean and char)
- In some languages the lower bound of the subscript range is fixed
- - C, C++, Java - 0
- - FORTRAN - 1
- In most other languages it needs to be specified by the programmer
24Array Categories
- Four categories of arrays (based on subscript binding and binding to storage)
- 1. Static - range of subscripts and storage bindings are static
- - e.g. FORTRAN 77, some arrays in Ada
- - Advantage: execution efficiency (no allocation or deallocation)
- 2. Fixed stack-dynamic - range of subscripts is statically bound, but storage is bound at elaboration time
- - e.g. Pascal locals and C locals that are not static
- - Advantage: space efficiency
- (Example) In a C function,
    void func() {
        int a[5];
        ...
    }
25- 3. Stack-dynamic - range and storage are dynamic, but once the subscript ranges are bound and the storage is allocated they are fixed from then on for the variable's lifetime
- - e.g. Ada declare blocks
    declare
        STUFF : array (1..N) of FLOAT;
    begin
        ...
    end;
- - Advantage: flexibility - size need not be known until the array is about to be used
26- 4. Heap-dynamic - the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array's lifetime
- - e.g. (FORTRAN 90)
    INTEGER, ALLOCATABLE, ARRAY (:,:) :: MAT
    ! Declares MAT to be a dynamic 2-dim array
    ALLOCATE (MAT (10, NUMBER_OF_COLS))
    ! Allocates MAT to have 10 rows and NUMBER_OF_COLS columns
    DEALLOCATE (MAT)
    ! Deallocates MAT's storage
27Example
- (Example) program heap_array.C
    #include <iostream>

    int main() {
        char *array;
        int N;
        std::cout << "Type an array size\n";
        std::cin >> N;
        array = new char[N];    // size is chosen at run time
        delete [] array;        // storage from new[] is released with delete []
    }
- - In APL and Perl, arrays grow and shrink as needed
- - In Java, all arrays are objects (heap-dynamic)
28Number of subscripts
- - FORTRAN I allowed up to three
- - FORTRAN 77 allows up to seven
- - C, C++, and Java allow just one, but elements can be arrays
- - Others - no limit
- Array Initialization
- - Usually just a list of values that are put in the array in the order in which the array elements are stored in memory
29Examples
- 1. FORTRAN - uses the DATA statement, or put the values in / ... / on the declaration
- 2. C and C++ - put the values in braces; can let the compiler count them
- - e.g. int stuff [] = {2, 4, 6, 8};
- 3. Ada - positions for the values can be specified
- - e.g.
    SCORE : array (1..14, 1..2) :=
        (1 => (24, 10), 2 => (10, 7),
         3 => (12, 30), others => (0, 0));
- 4. Pascal and Modula-2 do not allow array initialization in the declaration section of a program.
30Array Operations
- 1. APL - many (arrays and their operations are the heart of APL)
- - four basic operations for single-dimensioned arrays and matrices - see book (p. 240)
- 2. Ada
- - assignment; RHS can be an aggregate constant or an array name
- - catenation; for all single-dimensioned arrays
- - relational operators (= and /= only)
- 3. FORTRAN 90
- - intrinsics (subprograms) for a wide variety of array operations (e.g., matrix multiplication, vector dot product)
- FORTRAN 77 - no array operations
31Slices
- A slice is some substructure of an array (not a new data type); it is nothing more than a referencing mechanism
- 1. FORTRAN 90
    INTEGER MAT (1:3, 1:3)
    MAT(1:3, 2)   - the second column
    MAT(2:3, 1:3) - the second and third rows
- 2. Ada - single-dimensioned arrays only
    LIST(4..10)
32Implementation of Arrays
- The access function maps subscript expressions to an address in the array
- Row major (by rows) or column major (by columns) order
- Column major order is used in FORTRAN, but most other languages use row major order.
- Location of the [i, j] element in a matrix (row major order, with the size of a row measured in the same units as the element size):
    Location(a[i,j]) = address of a[1,1] + (i - 1) * (size of row) + (j - 1) * (element size)
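The access function can be written out as ordinary code. The C++ sketch below computes the byte offset from the first element for 1-based subscripts, in both storage orders; the function and parameter names are chosen for this example.

```cpp
#include <cstddef>

// Row major order: offset of a[i,j] from a[1,1], in bytes.
// (i - 1) full rows come first, then (j - 1) elements of row i.
std::size_t row_major_offset(std::size_t i, std::size_t j,
                             std::size_t columns, std::size_t element_size) {
    std::size_t row_size = columns * element_size;
    return (i - 1) * row_size + (j - 1) * element_size;
}

// Column major order (as in FORTRAN): offset of a[i,j] from a[1,1], in bytes.
// (j - 1) full columns come first, then (i - 1) elements of column j.
std::size_t column_major_offset(std::size_t i, std::size_t j,
                                std::size_t rows, std::size_t element_size) {
    std::size_t column_size = rows * element_size;
    return (j - 1) * column_size + (i - 1) * element_size;
}
```

For a 3 by 4 matrix of 4-byte elements, row_major_offset(2, 3, 4, 4) is 24: one full row of 16 bytes plus two elements of 4 bytes.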
33Compiler time descriptor
- The compile-time descriptor for a single-dimensioned array is shown below
- This information is required to construct the access function. If run-time checking of index ranges is not done and the attributes are static, then only the access function is needed during execution; no run-time descriptors are needed.
34Associative Arrays
- An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys.
- Each element of an associative array is in fact a pair of entities, a key and a value
- Design Issues
- 1. What is the form of references to elements?
- 2. Is the size static or dynamic?
- Associative arrays are supported by the standard class library of Java.
- But the main language that supports associative arrays is Perl.
35Structure and Operations in Perl (hashes)
- In Perl, associative arrays are often called hashes.
- - Names begin with %
- - Literals are delimited by parentheses
- e.g.,
    %hi_temps = ("Monday" => 77,
                 "Tuesday" => 79,);
- - Subscripting is done using braces and keys
- e.g.,
    $hi_temps{"Wednesday"} = 83;
- - Elements can be removed with delete
- e.g.,
    delete $hi_temps{"Tuesday"};
    %hi_temps = ();    # empties the entire hash
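For comparison, the same hash operations can be sketched in C++ with the standard library's std::map. This is a swapped-in stand-in, not a Perl hash: std::map keeps its keys sorted, and std::unordered_map would be the closer match to an unordered collection. The function name build_hi_temps is invented for this example.

```cpp
#include <map>
#include <string>

// Repeats the Perl hi_temps operations on a std::map.
std::map<std::string, int> build_hi_temps() {
    // Literal: a list of key/value pairs.
    std::map<std::string, int> hi_temps = {{"Monday", 77}, {"Tuesday", 79}};

    // Subscripting with a key inserts a new element or updates an old one.
    hi_temps["Wednesday"] = 83;

    // Removing one element, like Perl's delete.
    hi_temps.erase("Tuesday");

    return hi_temps;
}
```

Calling clear() on the result empties the whole map, which corresponds to assigning the empty list to the hash in Perl. The size is dynamic in both languages.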
36Records
- A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names.
- First introduced in COBOL (1960); since then almost all languages have supported them (except pre-90 FORTRANs). In object-oriented languages, the class construct supports records.
- Design issues that are specific to records
- 1. What is the form of references?
- 2. What unit operations are defined?
37Record Field References
- 1. COBOL
- - field_name OF record_name_1 OF ... OF record_name_n
- 2. Others (dot notation)
- - record_name_1.record_name_2. ... .record_name_n.field_name
- Fully qualified references must include all record names
- Elliptical references allow leaving out record names as long as the reference is unambiguous (COBOL and PL/I)
- Pascal and Modula-2 provide a with clause to abbreviate references
- In C and C++, individual fields of structures are accessed by the member selection operators . and ->.
38Example
- structure type in C and C++, program simple_struct.C
    #include <iostream>

    struct student {
        int ssn;
        char grade;
    };

    int main() {
        struct student john;
        struct student *p_john;
        john.grade = 'A';
        p_john = &john;                              // p_john points at john
        std::cout << p_john->grade << std::endl;     // member selection through a pointer
    }
- einstein> g++ simple_struct.C
- einstein> a.out
- A
39Record Operations
- 1. Assignment
- - Pascal, Ada, and C allow it if the types are identical
- - In Ada, the RHS can be an aggregate constant
- 2. Initialization
- - Allowed in Ada, using an aggregate constant
- 3. Comparison
- - In Ada, = and /=; one operand can be an aggregate constant
- 4. MOVE CORRESPONDING
- - In COBOL - it moves all fields in the source record to fields with the same names in the destination record
- Comparing records and arrays
- 1. Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static)
- 2. Dynamic subscripts could be used with record field access, but it would disallow type checking and it would be much slower
40Implementation of Record Type
- The fields of a record are stored in adjacent memory locations. An offset address, relative to the beginning of the record, is associated with each field. Field accesses are handled using these offsets. The compile-time descriptor for a record is shown on the left side of the slide. There is no need for run-time descriptors.
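The offset scheme can be observed from C++ with the standard offsetof macro. The sketch below reads a field by adding its offset to the record's base address, which is roughly what compiled field access does; the Student struct mirrors the earlier example and grade_via_offset is a name made up here.

```cpp
#include <cstddef>

struct Student {
    int ssn;
    char grade;
};

// Reads the grade field as "base address + offset of the field".
char grade_via_offset(const Student& s) {
    const char* base = reinterpret_cast<const char*>(&s);
    return *(base + offsetof(Student, grade));
}
```

The offsets are constants known at compile time, so nothing about the record layout needs to be stored at run time. The exact offset of grade depends on the compiler's alignment rules.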
41Unions
- A union is a type whose variables are allowed to store values of different types at different times during execution
- Design issues for unions
- 1. What kind of type checking, if any, must be done?
- 2. Should unions be integrated with records?
- Examples
- 1. FORTRAN - with EQUIVALENCE; in C and C++ the union construct is used (free unions)
- 2. Algol 68 - discriminated unions
- - Use a hidden tag to maintain the current type
- - Tag is implicitly set by assignment
- - References are legal only in conformity clauses (see book example p. 231)
- - This run-time type selection is a safe method of accessing union objects
42- Problem with Pascal's design: type checking is ineffective
- Reasons
- - a. The user can create inconsistent unions (because the tag can be individually assigned)
- - b. The tag is optional!
- - Now, only the declaration and the second and last assignments are required to cause trouble
- 4. Ada - discriminated unions
- - Reasons they are safer than Pascal and Modula-2
- - a. The tag must be present
- - b. It is impossible for the user to create an inconsistent union (because the tag cannot be assigned by itself--all assignments to the union must include the tag value)
43Example Pascal record variant
    program main;
    type
        node = record
            case tag : boolean of
                true  : (count : integer; sum : real);
                false : (total : real)
        end;
    var a : node;
    begin
        a.tag := true;
        a.count := 777;
        a.sum := 1.5;
        { a.total belongs to the false variant, yet the reference is not rejected }
        writeln(a.count, a.sum, a.total)
    end.
44Example (Free union C)
    #include <iostream>

    union Value {
        char cval;
        float fval;
    };

    int main() {
        Value val;
        val.cval = 'a';
        std::cout << val.cval << std::endl;
        val.fval = 1.2;
        std::cout << val.fval << std::endl;
        std::cout << val.cval << std::endl;   // no type check: reads a byte of the float
    }
- einstein> g++ union.C
- classes> a.out
45Implementation of Union Type
- Discriminated union
- - The tag entry is associated with a case table, each of whose entries points to a descriptor of a particular variant.
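A discriminated union in the Ada spirit can be approximated in C++ by hiding a free union behind a class that always sets the tag together with the value. The class Value and its member names below are hypothetical; throwing an exception on a mismatched reference is a choice made for this sketch.

```cpp
#include <stdexcept>

class Value {
public:
    enum Tag { CharTag, FloatTag };

    // The tag is always set together with the value, never by itself.
    explicit Value(char c) : tag_(CharTag) { data_.cval = c; }
    explicit Value(float f) : tag_(FloatTag) { data_.fval = f; }

    Tag tag() const { return tag_; }

    // Every reference checks the tag first.
    char as_char() const {
        if (tag_ != CharTag) throw std::logic_error("union does not hold a char");
        return data_.cval;
    }
    float as_float() const {
        if (tag_ != FloatTag) throw std::logic_error("union does not hold a float");
        return data_.fval;
    }

private:
    Tag tag_;
    union { char cval; float fval; } data_;
};
```

Assigning a whole Value to another replaces the tag and the data together, so the inconsistent state that Pascal allows cannot be created through this interface.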
46Sets
- A set is a type whose variables can store unordered collections of distinct values from some ordinal type, called the base type
- Design Issue
- - What is the maximum number of elements in any set base type?
- Examples
- 1. Pascal
- - No maximum size in the language definition (not portable, poor writability if max is too small)
- - Operations: union (+), intersection (*), difference (-), =, <>, superset (>=)
- 2. Modula-2 and Modula-3
- - Additional operations: INCL, EXCL, / (symmetric set difference: elements in one but not both operands)
- 3. Ada - does not include sets, but defines in as a set membership operator for all enumeration types
- 4. Java includes a class for set operations
47- Evaluation
- - If a language does not have sets, they must be simulated, either with enumerated types or with arrays
- - Arrays are more flexible than sets, but have much slower operations
- Implementation
- - Usually stored as bit strings, with logical operations used for the set operations
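The bit-string implementation is easy to sketch in C++: one bit per value of the base type, and one logical instruction per set operation. The Color base type, the ColorSet name and the helper functions are assumptions of this example.

```cpp
#include <cstdint>

// Hypothetical base type with four values; each value owns one bit.
enum Color { Red, Green, Blue, Yellow };

typedef std::uint32_t ColorSet;

ColorSet singleton(Color c) { return 1u << c; }

ColorSet set_union(ColorSet a, ColorSet b) { return a | b; }          // logical OR
ColorSet set_intersection(ColorSet a, ColorSet b) { return a & b; }   // logical AND
ColorSet set_difference(ColorSet a, ColorSet b) { return a & ~b; }    // AND with complement

bool contains(ColorSet s, Color c) { return (s & singleton(c)) != 0; }
```

With a 32-bit word the base type can have at most 32 values, which is one way the maximum set size mentioned as a design issue comes about.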
48Pointers
- A pointer type is a type in which the range of values consists of memory addresses and a special value, nil (or null). The value nil indicates that a pointer cannot currently be used to reference another object. In C and C++, the value 0 is used as nil.
- Uses
- 1. Addressing flexibility (indirect addressing)
- 2. Dynamic storage management (access to heap)
- Design Issues
- 1. What is the scope and lifetime of pointer variables?
- 2. What is the lifetime of heap-dynamic variables?
- 3. Are pointers restricted to pointing at a particular type?
- 4. Are pointers used for dynamic storage management, indirect addressing, or both?
- 5. Should a language support pointer types, reference types, or both?
49Fundamental Pointer Operations
- 1. Assignment: sets a pointer variable to the address of some object.
- 2. References (explicit versus implicit dereferencing): obtaining the value of the memory cell whose address is in the memory cell to which the pointer variable is bound. In C and C++, dereferencing is specified by prefixing an identifier of a pointer type with the dereferencing operator (*).
50Example
- (Example in C++)
    int j;
    int *ptr;    // pointer to integer variables
    ...
    j = *ptr;    // j receives the value ptr points to
- Address-of operator (&): produces the address of an object.
51Problems with pointers
- 1. Dangling pointers (dangerous)
- - A pointer points to a heap-dynamic variable that has been deallocated
- - Creating one
- - a. Allocate a heap-dynamic variable and set a pointer to point at it
- - b. Set a second pointer to the value of the first pointer
- - c. Deallocate the heap-dynamic variable, using the first pointer
- (Example 1) dangling.C
    #include <iostream>

    int main() {
        int *x, *y;
        x = new int;
        *x = 777;
        delete x;                       // x is now a dangling pointer
        y = new int;
        *y = 999;
        std::cout << *x << std::endl;   // undefined: x's cell may have been reused for y
    }
52Example (C)
- (Example 2)
    int *x;
    ...
    { int y = 1; x = &y; }
    ...
    cout << *x << endl;    // We don't know what's in *x
53Lost Heap-Dynamic Variables
- - A heap-dynamic variable that is no longer referenced by any program pointer (wasteful)
- - Creating one
- - a. Pointer p1 is set to point to a newly created heap-dynamic variable
- - b. p1 is later set to point to another newly created heap-dynamic variable
- - The process of losing heap-dynamic variables is called memory leakage
54Examples
- 1. Pascal - used for dynamic storage management only
- - Explicit dereferencing
- - Dangling pointers are possible (dispose)
- - Dangling objects are also possible
- 2. Ada - a little better than Pascal and Modula-2
- - Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the pointer's scope
- - All pointers are initialized to null
- - Similar dangling object problem (but rarely happens)
- 3. C and C++ - used for dynamic storage management and addressing
- - Explicit dereferencing and address-of operator
- - Can do address arithmetic in restricted forms
- - Domain type need not be fixed (void *)
- - void * - can point to any type and can be type checked (cannot be dereferenced)
55Reference Types
- C++ has a special kind of pointer, the reference type, which is a constant pointer that is implicitly dereferenced
- - Used for formal parameters in function definitions
- - Advantages of both pass-by-reference and pass-by-value (details will come later)
    float stuff[100];
    float *p;
    p = stuff;
- - *(p+5) is equivalent to stuff[5] and p[5]
- - *(p+i) is equivalent to stuff[i] and p[i]
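Both points on this slide can be checked with a few lines of C++. The sketch below compares the three notations for array access and contrasts a reference parameter with a pointer parameter; the function names are invented for this example.

```cpp
#include <cstddef>

// True when *(p + i), p[i] and stuff[i] all name the same element.
bool notations_agree(std::size_t i) {
    float stuff[100];
    for (std::size_t k = 0; k < 100; ++k) {
        stuff[k] = static_cast<float>(k);
    }
    float* p = stuff;
    return i < 100 && *(p + i) == stuff[i] && p[i] == stuff[i];
}

// Reference parameter: dereferencing is implicit.
void add_one(int& target) { target += 1; }

// Pointer parameter: dereferencing is explicit.
void add_one_ptr(int* target) { *target += 1; }
```

A caller writes add_one(n) for the reference version but add_one_ptr(&n) for the pointer version; both change the caller's variable.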
56Java
- Java - Only references
- - No pointer arithmetic
- - Can only point at objects (which are all on the heap)
- - No explicit deallocator (garbage collection is used)
- - Means there can be no dangling references
- - Dereferencing is always implicit
57Implementation of Pointer and Reference Types
- In most larger computers, pointers and references are single values stored in either two- or four-byte memory cells, depending on the size of the address space of the machine.
- Microcomputers (those based on Intel microprocessors, which use two-part addresses: segment and offset): pointers and references are implemented as pairs of 16-bit words, one for each part.
58SOLUTIONS FOR THE GARBAGE PROBLEM
- Reference counters (eager approach): each memory cell is associated with a counter that stores the number of pointers that currently point to the cell. When a pointer is disconnected from a cell, the counter is decremented by 1; if the reference counter reaches 0, meaning the cell has become garbage, the cell is returned to the list of available space.
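The C++ standard library's std::shared_ptr is a reference-counted pointer, so it can be used to watch the counter at work. This is a library-level illustration rather than a language-level collector, and the function name below is made up for the example.

```cpp
#include <memory>

// Walks one cell through its life: two pointers attach, then both detach.
// Returns true if the cell is reclaimed exactly when the count reaches 0.
bool cell_reclaimed_when_count_hits_zero() {
    std::shared_ptr<int> first = std::make_shared<int>(777);
    std::weak_ptr<int> observer = first;    // watches the cell without being counted

    std::shared_ptr<int> second = first;    // counter is now 2
    long count_with_two = first.use_count();

    first.reset();                          // counter drops to 1
    bool alive_with_one = !observer.expired();

    second.reset();                         // counter drops to 0: cell is reclaimed
    bool gone_with_zero = observer.expired();

    return count_with_two == 2 && alive_with_one && gone_with_zero;
}
```

The cost of the eager approach is visible here: every pointer assignment has to update a counter.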
59Garbage collection
- (lazy approach) Waits until all available cells have been allocated. A garbage collection process starts by setting the indicators of all the cells to indicate they are garbage. Then every pointer in the program is traced, and all reachable cells are marked as not being garbage.
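The marking process described above can be sketched over a toy heap in C++. The Cell layout, the use of vector indices as pointers and the function name collect are assumptions of this example; a real collector works on raw memory cells.

```cpp
#include <cstddef>
#include <vector>

// Toy heap cell: an indicator bit plus the cells this cell points to.
struct Cell {
    bool marked;                     // true = known not to be garbage
    std::vector<std::size_t> refs;   // indices of the cells pointed to
};

// Marks every cell reachable from the program's pointers (the roots)
// and returns the indices of the cells that are garbage.
std::vector<std::size_t> collect(std::vector<Cell>& heap,
                                 const std::vector<std::size_t>& roots) {
    // Step 1: set every indicator to "garbage".
    for (Cell& c : heap) {
        c.marked = false;
    }
    // Step 2: trace every pointer and mark what can be reached.
    std::vector<std::size_t> pending(roots);
    while (!pending.empty()) {
        std::size_t i = pending.back();
        pending.pop_back();
        if (heap[i].marked) continue;
        heap[i].marked = true;
        for (std::size_t r : heap[i].refs) {
            pending.push_back(r);
        }
    }
    // Step 3: whatever is still unmarked can be reclaimed.
    std::vector<std::size_t> garbage;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (!heap[i].marked) garbage.push_back(i);
    }
    return garbage;
}
```

Two cells that point only at each other are found to be garbage by this process, a case that reference counters cannot detect.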
60Solutions to the Dangling Pointer Problem
- Two related solutions have been implemented. The first uses extra heap cells, called tombstones (Lomet 1975)
61Locks and Keys approach
- In this case pointers are represented as a pair (key, address).
- Heap-dynamic variables are represented as the storage for the variable plus a header cell that stores an integer lock value.
- When a heap-dynamic variable is allocated, a lock value is created and placed both in the lock cell of the variable and in the key cell of the pointer specified in the call to new.
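A rough C++ sketch of the locks-and-keys idea follows. The slide stops at allocation; the deallocation step here (overwriting the lock with a value that is never issued as a key, assumed to be 0) and the check on every dereference are assumptions added to show how a dangling pointer would be caught. All names are hypothetical.

```cpp
#include <stdexcept>

// Heap-dynamic variable: a header cell with the lock, plus the storage.
struct HeapCell {
    int lock;
    int value;
};

// Pointer represented as a (key, address) pair.
struct KeyedPointer {
    int key;
    HeapCell* address;
};

// Allocation: the same lock value goes into the cell and into the pointer.
KeyedPointer allocate(HeapCell& cell, int lock, int value) {
    cell.lock = lock;
    cell.value = value;
    KeyedPointer p = { lock, &cell };
    return p;
}

// Deallocation (assumed): the lock is cleared to a value no key can have.
void deallocate(HeapCell& cell) {
    cell.lock = 0;
}

// Every dereference compares the pointer's key with the cell's lock.
int dereference(const KeyedPointer& p) {
    if (p.key != p.address->lock) {
        throw std::runtime_error("dangling pointer: key does not match lock");
    }
    return p.address->value;
}
```

Copies of a pointer carry the same key, so after deallocation every copy fails the comparison.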
62HW 6
- 1. (Review Questions) Answer all the questions (1 - 23) on pp. 212-213 of your textbook
- 2. Do all listed problems
- - 5, 9, 11, 13 and 14
- - On pages 213-217 (5th Edition of textbook)
- Assigned 02/26/02
- Due 03/05/02
- Please send the solutions via email to gmelikian@wpo.nccu.edu and hand in a hard copy by the beginning of class