Chapter 6 Data Types presentation

Provided by: david2548


1
Chapter 6: Data Types

CS 350 Programming Language Design
Indiana University Purdue University Fort Wayne

2
Chapter 6 Topics

Introduction
Primitive data types
Character string types
User-defined ordinal types
Array types
Associative arrays
Record types
Union types
Pointer and reference types

3
Introduction

A data type defines a collection of data objects
and a set of predefined operations on those
objects
Evolution of data types
FORTRAN I (1957)
Just types for INTEGER, REAL, arrays
Ada (1983)
The programmer can create a user-defined type for
every category of variables in the problem space
and have the system enforce the types
A descriptor is the collection of the attributes
of a variable

4
Primitive data types

Primitive types are not defined in terms of other
types
Integer
Almost always an exact reflection of the hardware
There may be as many as eight different integer
types in a language
Floating Point
Model real numbers, but only as approximations
Issues are precision and range
Languages for scientific use support at least two
floating-point types, sometimes more
Usually exactly like the hardware, but not always
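The integer and floating-point points above can be shown in a minimal C++ sketch. The function names are illustrative. The precision values in the comments assume IEEE 754 `float` and `double`, which is typical but not required by the language.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Fixed-width integer types correspond directly to hardware sizes.
static_assert(sizeof(std::int8_t) == 1 && sizeof(std::int64_t) == 8,
              "fixed-width integers have exact sizes");

// 0.1 and 0.2 have no exact binary representation, so their sum
// is only an approximation of 0.3.
bool sumIsExactlyPointThree() {
    double sum = 0.1 + 0.2;
    return sum == 0.3;
}

// Comparing against a small tolerance is the usual workaround.
bool sumIsCloseToPointThree() {
    double sum = 0.1 + 0.2;
    return std::fabs(sum - 0.3) < 1e-12;
}

// Precision: guaranteed decimal digits for float vs. double.
int floatDigits()  { return std::numeric_limits<float>::digits10; }   // typically 6
int doubleDigits() { return std::numeric_limits<double>::digits10; }  // typically 15
```

On a typical IEEE 754 platform, the stored sum is approximately 0.30000000000000004. That is why the tolerance comparison succeeds where exact equality fails.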

5
Primitive data types

Decimal
For business applications (dollars and cents)
Store a fixed number of decimal digits (BCD)
Advantage is accuracy
Disadvantages: limited range and wasted memory
Boolean
Could be implemented as bits, but typically one
byte per Boolean
Advantage is readability
Character
Stored as numeric codings (e.g., ASCII, Unicode)
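Here is a sketch of packed BCD (two decimal digits per byte) that makes the decimal trade-off concrete. The helper names are hypothetical and sign handling is omitted. Real decimal hardware and COBOL formats differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Packed BCD: each decimal digit takes 4 bits, two digits per byte,
// most significant digit first. An odd digit count is padded with a
// leading zero nibble. Sign handling is omitted in this sketch.
std::vector<std::uint8_t> toPackedBcd(unsigned long value, int digits) {
    if (digits % 2 != 0) ++digits;                 // pad to an even nibble count
    std::vector<std::uint8_t> bytes(digits / 2);   // zero-initialized
    for (int i = digits - 1; i >= 0; --i) {        // fill from the last digit
        auto digit = static_cast<std::uint8_t>(value % 10);
        value /= 10;
        if (i % 2 == 0)
            bytes[i / 2] |= static_cast<std::uint8_t>(digit << 4);  // high nibble
        else
            bytes[i / 2] |= digit;                                   // low nibble
    }
    return bytes;
}

// Decode by reading the high nibble, then the low nibble, of each byte.
unsigned long fromPackedBcd(const std::vector<std::uint8_t>& bytes) {
    unsigned long value = 0;
    for (std::uint8_t b : bytes) {
        value = value * 10 + (b >> 4);
        value = value * 10 + (b & 0x0F);
    }
    return value;
}
```

With four digits, `toPackedBcd(1234, 4)` yields the bytes 0x12 and 0x34. Two bytes of BCD hold at most 9999, while the same 16 bits in binary hold up to 65535. That gap is the memory cost noted above.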

6
Character string types

Values are sequences of characters
Design issue
Is the string type primitive or an array of
characters?
Providing strings as a primitive type is not costly
and is more convenient for the programmer
Pascal, C, and C++ strings are arrays of
characters
Fortran 90, Ada, and Basic are closer to
primitive with intrinsic string operations
Typical operations
Assignment
Comparison (=, >, etc.)
Catenation
Substring reference
Pattern matching
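The typical operations listed above map onto `std::string` in C++, where strings are a library class rather than a primitive or a bare character array. This is a brief sketch, and the wrapper functions are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

std::string assignAndCatenate() {
    std::string s = "data";          // assignment
    std::string t = s + " types";    // catenation
    return t;
}

bool lessThan(const std::string& a, const std::string& b) {
    return a < b;                    // lexicographic comparison
}

std::string substring(const std::string& s, std::size_t pos, std::size_t len) {
    return s.substr(pos, len);       // substring (returns a copy)
}

bool containsNumber(const std::string& s) {
    return std::regex_search(s, std::regex("[0-9]+"));  // pattern matching
}
```

Note that `substr` returns a copy. `std::string_view` (C++17) is the closer analogue of a substring reference.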

7
Character string types

Design issue
Should strings have static or dynamic length?
Ada has the following string types
String: static length
Bounded_String: limited dynamic length, up to a
maximum length
Unbounded_String: unlimited dynamic length
It is common for a language to have the first two
Implementation issues
Static length only requires a compile-time
descriptor
Limited dynamic length may need a run-time
descriptor for length
C and C++ avoid this descriptor by terminating
strings with the null character
Dynamic length needs a run-time descriptor
Allocation / deallocation is the biggest
implementation problem

8
Character string descriptors
Compile-time descriptor for static strings
Run-time descriptor for limited dynamic strings
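The following C++ sketch contrasts two implementation strategies. The first is C-style null termination, where the length is found by scanning. The second is a run-time descriptor for a limited dynamic-length string. The `LimitedDynamicString` struct and its field names are hypothetical, modeled on the descriptor described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// C-style string: no length descriptor; the length is found at run
// time by scanning for the terminating null character.
std::size_t cStringLength(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hypothetical run-time descriptor for a limited dynamic-length string.
struct LimitedDynamicString {
    std::size_t maxLength;      // fixed when the variable is created
    std::size_t currentLength;  // changes as values are assigned
    char* address;              // storage for maxLength characters

    void assign(const std::string& value) {
        if (value.size() > maxLength)
            throw std::length_error("value exceeds maximum length");
        value.copy(address, value.size());
        currentLength = value.size();
    }
};
```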
9
Ordinal types

An ordinal type is one in which the range of possible
values can be easily associated with a subset of the
positive integers
Examples of typical predefined ordinal types
Integer
Character
Boolean
We will consider the following user-defined
ordinal types
Enumeration type
Subrange type

10
Enumeration type

An enumeration type is one in which the user
enumerates all of the possible values
Values are symbolic constants (identifiers)
Example (Ada)

   type Days is (Sunday, Monday, Tuesday, Wednesday,
                 Thursday, Friday, Saturday);
   for today in Tuesday .. Thursday loop
      ...
   end loop;

11
Enumeration type

Design Issues
What operations are allowed for enumeration types?
Ada has attribute operations
Days'First gives the first day
Days'Last gives the last day
Days'Pos( today ) gives the Integer position in
the enum list
Days'Val( 3 ) gives the enum value associated
with position 3
Days'Pred( today ) gives the predecessor of today
Days'Succ( today ) gives the successor of today
Should comparison operations (=, <, <=, etc.) be
allowed?
Should a symbolic constant be allowed to be in
more than one type definition (overloading)?
Is coercion performed to or from enumeration
values?
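Ada's attributes can be approximated in C++ with a scoped enumeration and a few helper functions. This is an illustrative sketch, not Ada semantics. The names `first`, `last`, `pos`, `val`, `succ`, and `pred` are hypothetical. The range check is written by hand here, whereas Ada raises `Constraint_Error` automatically.

```cpp
#include <cassert>
#include <stdexcept>

enum class Days { Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday };

constexpr Days first() { return Days::Sunday; }    // like Days'First
constexpr Days last()  { return Days::Saturday; }  // like Days'Last

// Position in the enumeration list; Ada also numbers positions from 0.
constexpr int pos(Days d) { return static_cast<int>(d); }  // like Days'Pos

// Value at a position, with the range check Ada would perform.
Days val(int p) {                                          // like Days'Val
    if (p < pos(first()) || p > pos(last()))
        throw std::out_of_range("no Days value at this position");
    return static_cast<Days>(p);
}

Days succ(Days d) { return val(pos(d) + 1); }  // like Days'Succ; fails for Saturday
Days pred(Days d) { return val(pos(d) - 1); }  // like Days'Pred; fails for Sunday
```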

12
Enumeration choices

Pascal
Cannot overload enumeration constants
Enums can be used for array subscripts and case
selectors
Enums can be compared
No operations for input or output
C and C++
Can be used like Pascal, but . . .
Coerced to int, as in today++ (legal in C) or int n = today;
Operations for input and output as integers
Ada
Can be used as in Pascal, but . . .
Enums may be overloaded
Context must make the intended type clear, or special
notation (a qualified expression such as Days'( Monday )) must be used
No coercion and allowed ranges are checked
Operations exist for input and output of
enumeration values in text form
C#
No coercion and allowed ranges are checked
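The coercion question can be checked at compile time in C++ (C++17 for `std::is_convertible_v`). A plain `enum` converts implicitly to `int`, as described for C and C++ above. A scoped `enum class` refuses implicit conversions, which is closer to the Ada row. Unlike Ada, however, C++ does not range-check the value produced by an explicit cast. The type and function names are illustrative.

```cpp
#include <cassert>
#include <type_traits>

// Unscoped (C-style) enumeration: values convert implicitly to int.
enum Color { Red, Green, Blue };

// Scoped enumeration (C++11): no implicit conversion in either direction.
enum class Shade { Light, Dark };

static_assert(std::is_convertible_v<Color, int>, "unscoped enums coerce to int");
static_assert(!std::is_convertible_v<Shade, int>, "scoped enums do not");
static_assert(!std::is_convertible_v<int, Color>, "int does not coerce to an enum");

int colorCode(Color c) {
    int n = c;                   // implicit coercion
    return n;
}

int shadeCode(Shade s) {
    return static_cast<int>(s);  // an explicit cast is required
}
```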

13
Enumeration type

Evaluation
Aid to readability
Names are easily recognized whereas coded values
are not
E.g. no need to code a color as a number
Aid to reliability
Compiler can check
Operations on enums
E.g. don't allow colors to be added
Ranges of allowed values
E.g. Ada detects the error in
day := Days'Succ( Saturday )
Implementation
Enumeration types are implemented as integers

14
Subrange type

The subrange type is an ordered contiguous
subsequence of an ordinal type
Examples (Ada)

   subtype Positive is Integer range 1 .. Integer'Last;
   subtype Natural is Integer range 0 .. Integer'Last;
   subtype Index is Integer range -100 .. 100;
   for next in Index loop
      ...
   end loop;

   type Days is (Sunday, Monday, Tuesday, Wednesday,
                 Thursday, Friday, Saturday);
   subtype Weekdays is Days range Monday .. Friday;
   for today in Weekdays loop
      ...
   end loop;

15
Subrange type

Evaluation
Aid to readability
E.g. Can distinguish between a weekday and a
day
Reliability
Restricted ranges aid error detection
E.g. Saturday is not a valid weekday
Implementation
Subrange types are just the parent types with
check code (inserted by the compiler) to restrict
assignments to subrange values
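The implementation note above (a parent type plus inserted check code) can be sketched in C++ as a class template. `Subrange` is a hypothetical name. Ada's compiler inserts the checks automatically and raises `Constraint_Error`; this sketch writes the check by hand and throws `std::out_of_range`.

```cpp
#include <cassert>
#include <stdexcept>

// An int plus check code that restricts assignments to [Lo, Hi].
template <int Lo, int Hi>
class Subrange {
public:
    explicit Subrange(int v) : value_(check(v)) {}
    Subrange& operator=(int v) { value_ = check(v); return *this; }
    int value() const { return value_; }

private:
    static int check(int v) {
        if (v < Lo || v > Hi)
            throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

// Analogue of: subtype Index is Integer range -100 .. 100;
using Index = Subrange<-100, 100>;
```

Because the check runs before the stored value changes, a failed assignment leaves the variable holding its previous, valid value.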

16
Arrays

An array is an aggregate of indexed data elements
of the same type
Two types involved
Element type
Index type
Each individual element is identified by an index
to its position in the aggregate
Design Issues
What types are legal for subscripts?
Are subscripting expressions in element
references range checked?
When does binding occur for subscript ranges?
When does allocation take place?
What is the maximum number of subscripts?
Can array objects be initialized?
Are any kind of slices allowed?

17
Arrays

Index syntax
FORTRAN, PL/I, Ada use parentheses
Ada intentionally uses parentheses to make an
array reference look like a function call
n := a( 23 );
Most other languages use brackets
Indexing is a storage mapping from the array
indices to elements
This mapping requires a run-time calculation to
reference memory

18
Array storage mapping example

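Storage mapping for 2-dim array b
Row-wise allocation is used
Access code for b[i, j] requires 2 adds
and 2 multiplies
w is the size of each cell in bytes

[Figure: array b with rows i = 0 .. 6 and columns j = 0 .. 4; here w = 4 and each row has 5 columns]

$$
\begin{aligned}
\mathrm{loc}(b[i,j]) &= \mathrm{loc}(b) + w\,\big((\text{elements in previous rows}) + (\text{previous elements in row } i)\big)\\
&= \mathrm{loc}(b) + w\,(i \cdot \text{columns} + j)\\
&= \mathrm{loc}(b) + 4\,(5i + j)
\end{aligned}
$$
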
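The row-major formula loc(b) + w(i · columns + j) can be checked against the addresses a C++ compiler actually uses. This sketch assumes a 7 × 5 array of `std::int32_t`, so w = 4 and columns = 5 as in the example. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRows = 7;     // i = 0 .. 6
constexpr std::size_t kColumns = 5;  // j = 0 .. 4

// Row-major mapping: skip i full rows, then j elements within row i,
// scaled by the element size w in bytes.
std::uintptr_t loc(std::uintptr_t base, std::size_t w, std::size_t i, std::size_t j) {
    return base + w * (i * kColumns + j);
}

// Compare the formula with the address the compiler computes for b[i][j].
bool formulaMatchesCompiler(std::size_t i, std::size_t j) {
    static std::int32_t b[kRows][kColumns];
    auto base = reinterpret_cast<std::uintptr_t>(&b[0][0]);
    auto actual = reinterpret_cast<std::uintptr_t>(&b[i][j]);
    return actual == loc(base, sizeof(std::int32_t), i, j);
}
```

For example, with a base address of 1000, element b[2][3] lies at 1000 + 4(5·2 + 3) = 1052.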
19
Arrays

Subscript types
FORTRAN, C, and Java
Integer only
Pascal and Ada
Any ordinal type
Integer, Boolean, Character, enum
Range checking
Java, ML, C# check the range of all subscripts
C, C++, Perl, Fortran do not
Ada checks by default but this can be disabled by
a compiler Pragma
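C++'s `std::vector` offers both range-checking policies from the list above, which makes the difference easy to see. This is a minimal sketch, and the wrapper function is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// operator[] performs no range check: an out-of-range index is undefined
// behavior, as with C arrays. at() checks every subscript and throws
// std::out_of_range.
bool checkedAccessThrows(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```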

20
Array binding and allocation

We consider the following categories of arrays
Static array
Fixed stack-dynamic array
Stack-dynamic array
Fixed heap-dynamic array
Heap-dynamic array
These are based on when the subscript ranges are
bound and when storage is allocated

21
Array binding and allocation

Static arrays
Range of subscripts and storage bindings are
static
e.g. FORTRAN 77, some arrays in Ada, C/C++ static
arrays
Advantage
Execution efficiency
No run-time overhead for allocation or
deallocation
Fixed stack-dynamic arrays
The range of subscripts is statically bound
Storage is bound at elaboration time
e.g. most local variable arrays
Advantage: space efficiency

22
Array binding and allocation

Stack-dynamic arrays
The index range and storage allocation are
dynamic, but fixed from then on for the
variable's lifetime
Advantage flexibility
Size need not be known until the array is about
to be used
E.g. Ada declare blocks

   n := <expression>;
   declare
      a : array (1 .. n) of Float;
   begin ... end;

23
Array binding and allocation

Fixed heap-dynamic arrays
Like stack-dynamic arrays except . . .
Storage allocated on the heap
The index range and storage allocation are
bound at program request rather than at
subprogram elaboration
E.g. all Java arrays
Heap-dynamic arrays
The subscript range and storage bindings are
dynamic and may subsequently be changed
Supported by Smalltalk (e.g. OrderedCollection),
APL, Perl, JavaScript, FORTRAN 90, and the C#
ArrayList class
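Most of these categories can be mapped onto C++ language features, as the sketch below shows. The mapping is approximate and the names are illustrative. Standard C++ has no stack-dynamic arrays with a run-time size (C99 variable-length arrays are the closest C relative), so that category is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Static: subscript range and storage are bound before run time.
static int staticArray[10];

// Fixed stack-dynamic: range fixed at compile time; storage is
// allocated on the stack each time the function is entered.
int fixedStackDynamicSum() {
    int local[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int x : local) sum += x;
    return sum;
}

// Fixed heap-dynamic: size chosen at run time by program request,
// then fixed for the array's lifetime (like a Java array).
std::unique_ptr<int[]> makeFixedHeapDynamic(std::size_t n) {
    return std::make_unique<int[]>(n);  // heap storage, zero-initialized
}

// Heap-dynamic: the size may change after creation.
std::vector<int> growHeapDynamic(int count) {
    std::vector<int> v;        // starts empty
    for (int k = 0; k < count; ++k)
        v.push_back(k);        // storage is reallocated as needed
    return v;
}
```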

24
Arrays

Number of subscripts
FORTRAN I allowed up to three
FORTRAN 77 allows up to seven
Other languages - no limit
Array initialization
Some languages permit initialization of arrays
Examples: Fortran, C, and Ada aggregates

Fortran:
   Integer List( 3 )
   Data List / 21, 67, 9 /
C:
   int list[] = {21, 67, 9};
Ada aggregates:
   list : array( 1 .. 3 ) of Integer := ( 21, 67, 9 );
   list : array( 1 .. 100 ) of Integer :=
      ( 10 => 21, 20 => 67, 30 => 9, others => 0 );
   list : array( 1 .. 10, 1 .. 3 ) of Integer :=
      ( 1 => (1, 2, 3), 10 => (4, 5, 6), others => (0, 0, 0) );

25
Array operations

An array operation operates on an array or a part
of an array as a unit
Ada operations
Assignment
Catenation (&, one-dimensional arrays only)
Equality (=) and inequality (/=)
APL
Most powerful array-processing language ever
devised
Many array operations

26
Slices

A slice is some substructure of an array
It is nothing more than a referencing mechanism
Slices are only useful in languages that have
array operations
Fortran slices at right
Ada slices below

   a : array (1 .. 100) of Float;
   a( 1 .. 50 )      -- the first 50 elements
   a( 51 .. 100 )    -- the last 50 elements

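C++'s `std::valarray` supports both whole-array operations and slices, so it can illustrate the last two slides. The function names in this sketch are illustrative. Note that `std::slice(start, size, stride)` uses 0-based indices, unlike the 1-based Ada ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Whole-array operation: element k of the result is a[k] + b[k].
std::valarray<float> addArrays(const std::valarray<float>& a,
                               const std::valarray<float>& b) {
    return a + b;
}

// Slice: the second half of the array, like the Ada slice a( 51 .. 100 ).
std::valarray<float> secondHalf(const std::valarray<float>& a) {
    std::size_t half = a.size() / 2;
    return a[std::slice(half, a.size() - half, 1)];
}

// Assigning through a slice updates only the selected elements.
void zeroFirstHalf(std::valarray<float>& a) {
    a[std::slice(0, a.size() / 2, 1)] = 0.0f;
}
```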
27
Associative arrays

An associative array is an unordered collection
of data elements that are indexed by an equal
number of values called keys
Also called a . . .
Map
Key-value table
Dictionary
Perl example
An associative array is called a hash in Perl
Names begin with %
Aggregate literals are delimited by parentheses
E.g. %temps = ("Monday" => 77, "Tuesday" => 79);
Subscripting is done using braces and keys
E.g. $temps{"Wednesday"} = 83;
Elements can be removed with delete
E.g. delete $temps{"Tuesday"};
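The same three Perl operations are sketched below with C++'s `std::unordered_map`, a hash-based associative array that, like a Perl hash, keeps no particular order. The `Temps` alias and `makeTemps` function are illustrative.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Temps = std::unordered_map<std::string, int>;

Temps makeTemps() {
    Temps temps{{"Monday", 77}, {"Tuesday", 79}};  // aggregate literal
    temps["Wednesday"] = 83;                        // subscripting by key
    temps.erase("Tuesday");                         // like Perl's delete
    return temps;
}
```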

28
Records

A record is an aggregate of named data elements of
possibly diverse types
A compile-time descriptor for a record is at
right
The offset is from the record base address
Design Issues
What is the form of references?
What unit operations are defined?

[Figure: a compile-time descriptor for a record]
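Fixed field offsets can be inspected in C++ with `offsetof`. The `Employee` record below is hypothetical. Actual offsets depend on the platform's padding rules, so the sketch checks only what the language guarantees for such a struct: the first field is at offset 0 and later fields have larger offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Employee {
    char initial;
    int id;
    double salary;
};

// Offsets are compile-time constants, as in a compile-time descriptor.
constexpr std::size_t idOffset = offsetof(Employee, id);
constexpr std::size_t salaryOffset = offsetof(Employee, salary);
static_assert(offsetof(Employee, initial) == 0, "first field is at the base address");
static_assert(idOffset < salaryOffset, "fields follow declaration order");

// Field access = base address + fixed offset; no run-time index arithmetic.
int readId(const Employee& e) {
    int value;
    std::memcpy(&value, reinterpret_cast<const char*>(&e) + idOffset, sizeof value);
    return value;
}
```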
29
Records

Called the struct data type in C, C++, and C#
A class defines a record in Java and Smalltalk
Record declarations
COBOL uses level numbers to show nested records
Other languages use a recursive definition
Field references
COBOL
<fieldName> OF <recordName2> OF <recordName1>
Other languages use dot notation
<recordName1>.<recordName2>.<fieldName>

30
Records

Fully qualified field references must include all
nested record names
Elliptical references allow leaving out record
names as long as the reference is unambiguous
Pascal provides a with clause to abbreviate
references

31
Record Operations

Assignment
Allowed in Pascal, Ada, and C if the types are
identical
In Ada, the RHS can be a record aggregate
constant
COBOL uses MOVE CORRESPONDING
Moves all fields in the source record to fields
with the same names in the destination record
Initialization
Allowed in Ada, using an aggregate constant
In Java, done by the constructor
Comparison
Ada has tests for equality (=) and inequality (/=)

32
Arrays vs. records

Access to array elements with computed subscripts is
slower than access to record fields
Each record field is accessed with a fixed offset
from the record base address
Array subscripts require run-time calculation

33
Union types

A union is a type whose variables are allowed to
store different type values at different times
during execution
Design issue for unions
How should type checking be done?
Examples
Fortran has EQUIVALENCE
No type checking
C and C++ have free unions
Not part of structs
Complete freedom from type checking
Pascal embeds unions in records
Design leads to ineffective type checking

34
Discriminated unions

Algol 68 and Ada use discriminated unions
This provides secure type checking
Ada
Ada embeds discriminated unions in records
One record field is called a discriminant or tag
The discriminant in the example on the
following slide is Form

35
Ada example
   type Shape is ( Circle, Triangle, Rectangle );
   type Colors is ( Red, Green, Blue );
   type Figure( Form : Shape := Circle ) is record
      Filled : Boolean;
      Color  : Colors;
      case Form is
         when Circle =>
            Diameter : Float;
         when Triangle =>
            LeftSide  : Integer;
            RightSide : Integer;
            Angle     : Float;
         when Rectangle =>
            Height : Integer;
            Width  : Integer;
      end case;
   end record;

The discriminant field Form may not be changed in
isolation
It may only be changed by assigning to the entire
record
This prevents the record fields from becoming
inconsistent

36
Ada example

Assignment using a record aggregate
Layout of record fields
Fields Diameter, LeftSide, RightSide, Angle,
Height and Width share the same bytes

   Fig : Figure;
   Fig := ( Filled => True, Color => Blue, Form => Rectangle,
            Height => 12, Width => 3 );

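C++17's `std::variant` is a library analogue of the Ada discriminated union: it keeps a hidden tag (`index()`) and checks it on access. The sketch below translates the `Figure` example using illustrative C++ names. Unlike Ada, the tag is not a named field. Replacing the active alternative changes tag and data together, much like Ada's rule that the discriminant changes only through whole-record assignment.

```cpp
#include <cassert>
#include <variant>

enum class Colors { Red, Green, Blue };

struct Circle    { float diameter; };
struct Triangle  { int leftSide; int rightSide; float angle; };
struct Rectangle { int height; int width; };

struct Figure {
    bool filled;
    Colors color;
    std::variant<Circle, Triangle, Rectangle> form;  // tag + shared storage
};

// Reading a field of the wrong variant is detected at run time.
bool wrongVariantDetected(const Figure& f) {
    try {
        (void)std::get<Circle>(f.form).diameter;
        return false;
    } catch (const std::bad_variant_access&) {
        return true;
    }
}

int area(const Figure& f) {
    if (const auto* r = std::get_if<Rectangle>(&f.form))
        return r->height * r->width;
    return 0;  // only rectangles are handled in this sketch
}
```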
37
Pointer types

Pointer type values consist of memory addresses
and the special value nil (or null)
Pointers are used for
Indirect addressing
Management of heap-dynamic variables
These are anonymous variables

38
Pointer operations

Assignment operation
Sets a pointer to a useful address
Dereferencing operation
Interprets the pointer variable as representing
the object at the memory address contained in the
pointer variable
Thus, it applies one level of indirect addressing
Deallocation
Returns the heap-dynamic storage referred to by a
pointer to the system for reallocation

39
Problems with pointers

Dangling pointers
A dangling pointer refers to a heap-dynamic
variable that has been deallocated
To create a dangling pointer in Pascal with
explicit deallocation . . .
Allocate a heap-dynamic variable pointed to by p
Make an alias for the pointer: q := p
Explicitly deallocate the heap-dynamic variable
dispose( p )
Now q contains a dangling pointer

40
Problems with pointers

Lost heap-dynamic variables
A lost heap-dynamic variable is no longer
referenced by any program pointer and is
inaccessible
To create a lost heap-dynamic variable . . .
Allocate a heap-dynamic variable pointed to by p
Replace the pointer in p by a reference to some
other heap-dynamic variable: p := q
Now the first heap-dynamic variable is
inaccessible
The process of losing heap-dynamic variables is
called memory leakage

41
Pointers in C and C++

Pointers in C and C++ are similar to addresses in
assembly language
Pointers may point virtually anywhere in memory
Pointer arithmetic is possible
Programmer is responsible for avoiding problems
of dangling pointers and lost heap-dynamic
variables

42
Pointers in C and C++

Dereferencing is explicitly specified with the *
operator
C++ reference type variables are constant pointers
specified with the & operator
References are always implicitly
dereferenced
Used for parameter passing
pass-by-reference

   int count;       /* defines count as an int variable */
   int *ptr;        /* defines ptr as a pointer to an int variable */
   int sum = 0;
   ptr = &sum;      /* & operator produces the address of sum */
   count = *ptr;    /* * operator dereferences ptr and produces the value in sum */
   ptr = ptr + 3;   /* increments the address in ptr by 12 (with 4-byte ints) */
   int &ref = sum;  /* ref is a constant pointer that creates an alias for sum */
   ref = 23;        /* assigns 23 to sum (implicitly dereferenced) */
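Pass-by-reference is where implicit dereferencing pays off. Below is a minimal C++ comparison of a reference-parameter swap and a pointer-parameter swap. Both function names are illustrative.

```cpp
#include <cassert>

// Reference parameters are implicitly dereferenced: a and b are
// aliases for the caller's variables, so no * is needed.
void swapValues(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

// The pointer version needs & at the call site and * at every use.
void swapViaPointers(int* a, int* b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}
```

A call reads `swapValues(x, y)` in the first case and `swapViaPointers(&x, &y)` in the second.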
43
Pointers in Ada

Called access types
Used only for heap-dynamic variables
No pointer arithmetic
All access variables are initialized to null
This also provides reliability
Heap-dynamic variables may (implementation
option) be implicitly deallocated at the end of
the scope of a pointer type
Partially alleviates the problem of lost
heap-dynamic variables
Has an explicit deallocator, Unchecked_Deallocation
Dangling pointer problem is possible

44
Pointers in Java

These are called reference types
Refer to heap-dynamic objects exclusively
No pointer arithmetic
All reference variables are initialized to null
No explicit deallocation
This prevents the dangling pointer problem
All objects are implicitly deallocated by garbage
collection
Garbage collection prevents the lost heap-dynamic
variable problem
Reference variables are implicitly dereferenced
whenever the dot notation is used, as in p.link

45
Dangling pointer problem

Without garbage collection, dangling pointers can
be detected using . . .
Tombstones
Locks and keys

46
Tombstones

Tombstone
An extra heap cell that is a pointer to the
heap-dynamic variable
The actual pointer variable points only at a
tombstone
When a heap-dynamic variable is deallocated, the
tombstone remains but is set to null
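Below is a sketch of tombstones in C++, using a hypothetical `Tombstone` struct and helper functions `allocate`, `deallocate`, and `deref`. Following the description above, the tombstone itself is never freed, which is the space cost of the technique.

```cpp
#include <cassert>
#include <stdexcept>

// Program pointers point only at a tombstone, never at the variable.
template <typename T>
struct Tombstone {
    T* target;  // null once the variable has been deallocated
};

template <typename T>
Tombstone<T>* allocate(const T& value) {
    return new Tombstone<T>{new T(value)};
}

// Free the variable but keep the tombstone, now set to null.
template <typename T>
void deallocate(Tombstone<T>* t) {
    delete t->target;
    t->target = nullptr;
}

// Every dereference goes through the tombstone, so a dangling
// pointer is detected instead of reading freed memory.
template <typename T>
T& deref(Tombstone<T>* t) {
    if (t->target == nullptr)
        throw std::logic_error("dangling pointer detected");
    return *t->target;
}
```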

47
Locks and keys

The locks-and-keys technique represents pointer
values as a key-address pair
Each heap-dynamic variable is represented as
storage for the data plus a cell for the lock
When a heap-dynamic variable is allocated, a lock
value is created and a copy is placed in both . .
.
A lock cell within the heap-dynamic variable
The key cell of the pointer
When a heap-dynamic variable is deallocated, its
lock value is cleared
Every dereference must compare the key value in
the pointer to the lock in the heap-dynamic
variable
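Below is a sketch of locks and keys in C++. It uses a hypothetical `LockedHeap` in which each cell stores a lock next to its data and each pointer carries a key. To keep the example free of undefined behavior, deallocated cells are only marked and never reused.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Cell {
    int lock;   // 0 means "deallocated"
    int data;
};

// A pointer is a (key, address) pair; the address indexes the heap.
struct KeyedPointer {
    int key;
    std::size_t address;
};

class LockedHeap {
public:
    KeyedPointer allocate(int value) {
        int lock = nextLock_++;                       // fresh lock value
        cells_.push_back(Cell{lock, value});          // copy in the variable
        return KeyedPointer{lock, cells_.size() - 1}; // and in the pointer
    }

    void deallocate(const KeyedPointer& p) {
        cells_.at(p.address).lock = 0;                // clear the lock
    }

    // Every dereference compares the pointer's key with the cell's lock.
    int& deref(const KeyedPointer& p) {
        Cell& c = cells_.at(p.address);
        if (c.lock != p.key)
            throw std::logic_error("dangling pointer: key does not match lock");
        return c.data;
    }

private:
    std::vector<Cell> cells_;  // cells are not reused in this sketch
    int nextLock_ = 1;         // 0 is reserved for "no lock"
};
```

If cells were reused, the new variable would receive a fresh lock value, so an old key would still fail the comparison.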

48
Heap management

Takes deallocation of heap-dynamic variables out
of the hands of programmers
Two popular solutions
Reference counters
Incremental: storage is reclaimed as soon as cells
become inaccessible
Garbage collection
Occurs when available heap space runs out

49
Reference counters

The reference counter solution maintains a
counter in every cell
The counter stores the number of pointers
currently pointing at the cell
Whenever a pointer is changed . . .
The counter in the old target is decremented
The counter in the new target is incremented
When a counter decrements to zero, the
heap-dynamic variable is returned to the list of
available space
Disadvantages
Space required by the reference counters
Time overhead
Complications for cells connected in a cycle (their counts never drop to zero)
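C++'s `std::shared_ptr` is a widely used reference-counting scheme, and `use_count()` exposes the counter. The sketch below shows the count changing on copy and illustrates the circular-structure problem. The `Node` type and function names are illustrative. The cycle is broken by hand so the example itself does not leak.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;
};

// Copying a shared_ptr increments the target's counter.
long countAfterCopy() {
    auto p = std::make_shared<Node>();
    auto q = p;                 // counter incremented
    return p.use_count();       // 2
}

// In a cycle, each node is kept alive by the other's pointer.
long cycleCount() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;                          // cycle: a -> b -> a
    long countWithCycle = a.use_count();  // 2: the local 'a' and b->next
    b->next.reset();                      // break the cycle so both are freed
    return countWithCycle;
}
```

Without the `reset()` call, both counts would fall only to 1 when `a` and `b` go out of scope, and neither node would ever be freed.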

50
Garbage collection

When heap storage is exhausted, perform garbage
collection as follows
Every heap cell has an extra bit used by the
garbage collection algorithm
All bits are initially cleared (assumed to be
garbage)
Starting with all program pointers, recursively
follow all pointers and mark any heap-dynamic
variable that can be reached
All unmarked variables are then returned to the
list of available heap cells
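Below is a compact mark-and-sweep sketch in C++ that follows the steps above. The heap model is a hypothetical simplification: a vector of `HeapCell` records, each holding a mark bit and its outgoing pointers as indices. Real collectors traverse actual memory and usually avoid deep recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HeapCell {
    bool marked = false;
    bool onFreeList = false;
    std::vector<std::size_t> pointers;  // indices of cells this cell points to
};

// Mark phase: recursively mark every cell reachable from 'cell'.
void mark(std::vector<HeapCell>& heap, std::size_t cell) {
    if (heap[cell].marked) return;  // already visited (handles cycles)
    heap[cell].marked = true;
    for (std::size_t target : heap[cell].pointers)
        mark(heap, target);
}

// Returns the indices newly placed on the list of available cells.
std::vector<std::size_t> collect(std::vector<HeapCell>& heap,
                                 const std::vector<std::size_t>& roots) {
    for (HeapCell& c : heap) c.marked = false;  // all assumed garbage
    for (std::size_t r : roots) mark(heap, r);

    std::vector<std::size_t> available;         // sweep phase
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (!heap[i].marked && !heap[i].onFreeList) {
            heap[i].onFreeList = true;
            heap[i].pointers.clear();
            available.push_back(i);
        }
    }
    return available;
}
```

Two cells that point only at each other, but are unreachable from any root, stay unmarked and are swept. Reference counting alone would never reclaim them.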