Title: Chapter 6 Data Types
1Chapter 6 Data Types
- CS 350 Programming Language Design
- Indiana University Purdue University Fort Wayne
2Chapter 6 Topics
- Introduction
- Primitive data types
- Character string types
- User-defined ordinal types
- Array types
- Associative arrays
- Record types
- Union types
- Pointer and reference types
3Introduction
- A data type defines a collection of data objects
and a set of predefined operations on those
objects
- Evolution of data types
- FORTRAN I (1957)
- Just types for INTEGER, REAL, arrays
- Ada (1983)
- The programmer can create a user-defined type for
every category of variables in the problem space
and have the system enforce the types
- A descriptor is the collection of the attributes
of a variable
4Primitive data types
- Primitive types are not defined in terms of other
types
- Integer
- Almost always an exact reflection of the hardware
- There may be as many as eight different integer
types in a language
- Floating Point
- Model real numbers, but only as approximations
- Issues are precision and range
- Languages for scientific use support at least two
floating-point types, sometimes more
- Usually exactly like the hardware, but not always
5Primitive data types
- Decimal
- For business applications (dollars and cents)
- Store a fixed number of decimal digits (BCD)
- Advantage is accuracy
- Disadvantages: limited range and wasted memory
- Boolean
- Could be implemented as bits, but typically one
byte per Boolean
- Advantage is readability
- Character
- Stored as numeric codings (e.g., ASCII, Unicode)
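The accuracy trade-off between floating-point and decimal types can be illustrated with a minimal Python sketch. Python's decimal module is a software decimal type rather than hardware BCD, so it stands in here only as an analogy; the variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point only approximates 0.10, so repeated addition drifts.
float_total = sum([0.10] * 3)

# A decimal type stores decimal digits exactly, which suits dollars and cents.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

float_is_exact = (float_total == 0.30)
decimal_is_exact = (decimal_total == Decimal("0.30"))
```

Here float_is_exact is False while decimal_is_exact is True, which is the accuracy advantage named on the slide.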
6Character string types
- Values are sequences of characters
- Design issue
- Is the string type primitive or an array of
characters?
- Making the string type primitive costs little
and is more convenient
- Pascal, C, and C++ strings are arrays of
characters
- Fortran 90, Ada, and Basic are closer to
primitive, with intrinsic string operations
- Typical operations
- Assignment
- Comparison (=, >, etc.)
- Catenation
- Substring reference
- Pattern matching
7Character string types
- Design issue
- Should strings have static or dynamic length?
- Ada has the following string types
- String: static length
- Bounded_String: limited dynamic length, up to a
maximum length
- Unbounded_String: unlimited dynamic length
- It is common for a language to have the first two
- Implementation issues
- Static length only requires a compile-time
descriptor
- Limited dynamic length may need a run-time
descriptor for length
- Instead, C and C++ terminate strings with the
null char
- Dynamic length needs a run-time descriptor
- Allocation / deallocation is the biggest
implementation problem
8Character string descriptors
Compile-time descriptor for static strings
Run-time descriptor for limited dynamic strings
9Ordinal types
- An ordinal type is one in which the range of possible
values can be easily associated with a subset of
the positive integers
- Examples of typical predefined ordinal types
- Integer
- Character
- Boolean
- We will consider the following user-defined
ordinal types
- Enumeration type
- Subrange type
10Enumeration type
- An enumeration type is one in which the user
enumerates all of the possible values
- Values are symbolic constants (identifiers)
- Example (Ada)
type Days is (Sunday, Monday, Tuesday, Wednesday,
              Thursday, Friday, Saturday);
for today in Tuesday .. Thursday loop
   . . .
end loop;
11Enumeration type
- Design Issues
- What operations are allowed for enumeration types?
- Ada has attribute operations
- Days'First gives the first day
- Days'Last gives the last day
- Days'Pos( today ) gives the Integer position in
the enum list
- Days'Val( 3 ) gives the enum value associated
with position 3
- Days'Pred( today ) gives the predecessor of today
- Days'Succ( today ) gives the successor of today
- Should comparison operations (=, <, <=, etc.) be
allowed?
- Should a symbolic constant be allowed to be in
more than one type definition (overloading)?
- Is coercion performed to or from enumeration
values?
12Enumeration choices
- Pascal
- Cannot overload enumeration constants
- Enums can be used for array subscripts and case
selectors
- Enums can be compared
- No operations for input or output
- C and C++
- Can be used like Pascal, but . . .
- Coerced to integers, as in today++ or as in int n = today
- Operations for input and output as integers
- Ada
- Can be used as in Pascal, but . . .
- Enums may be overloaded
- Context must make use clear or use special
notation
- No coercion and allowed ranges are checked
- Operations exist for input and output of
enumeration values in text form
- C#
- No coercion and allowed ranges are checked
13Enumeration type
- Evaluation
- Aid to readability
- Names are easily recognized whereas coded values
are not
- E.g. no need to code a color as a number
- Aid to reliability
- Compiler can check
- Operations on enums
- E.g. don't allow colors to be added
- Ranges of allowed values
- E.g. Ada detects the error in day := Days'Succ(
Saturday )
- Implementation
- Enumeration types are implemented as integers
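The reliability points above can be sketched in Python's enum module. This is only an analogy: Python rejects the invalid operation at run time, whereas the slide describes compile-time checking, and the Color type and try_add helper are hypothetical names introduced for illustration.

```python
from enum import Enum

class Color(Enum):
    RED = 0
    GREEN = 1
    BLUE = 2

def try_add(a, b):
    """Return a + b, or None if the operation is rejected."""
    try:
        return a + b
    except TypeError:
        return None

added = try_add(Color.RED, Color.GREEN)   # rejected: colors cannot be added
by_position = Color(2)                    # similar in spirit to Ada's 'Val
position = Color.BLUE.value               # similar in spirit to Ada's 'Pos
```

The enumeration values are still stored as integers underneath, but the type keeps them from being used as plain numbers.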
14Subrange type
- The subrange type is an ordered contiguous
subsequence of an ordinal type
- Examples (Ada)
subtype Positive is Integer range 1 .. Integer'Last;
subtype Natural is Integer range 0 .. Integer'Last;
subtype Index is Integer range -100 .. 100;
for next in Index loop
   . . .
end loop;
type Days is (Sunday, Monday, Tuesday, Wednesday,
              Thursday, Friday, Saturday);
subtype Weekdays is Days range Monday .. Friday;
for today in Weekdays loop
   . . .
end loop;
15Subrange type
- Evaluation
- Aid to readability
- E.g. can distinguish between a weekday and a
day
- Reliability
- Restricted ranges aid error detection
- E.g. Saturday is not a valid weekday
- Implementation
- Subrange types are just the parent types with
check code (inserted by the compiler) to restrict
assignments to subrange values
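The check code mentioned above might look roughly like the following Python sketch. The check_weekday function is a hypothetical stand-in for what a compiler would insert before each assignment to a Weekdays variable; it is not how any particular Ada compiler implements the check.

```python
from enum import IntEnum

class Days(IntEnum):
    SUNDAY = 0
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6

def check_weekday(value):
    """Range check guarding an assignment to a Weekdays variable."""
    if not (Days.MONDAY <= value <= Days.FRIDAY):
        raise ValueError(f"{value.name} is not in Monday .. Friday")
    return value

today = check_weekday(Days.TUESDAY)       # accepted

try:
    check_weekday(Days.SATURDAY)          # Saturday is not a valid weekday
    saturday_rejected = False
except ValueError:
    saturday_rejected = True
```

The variable itself is still stored as its parent type; only the guard distinguishes a Weekdays value from a Days value.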
16Arrays
- An array is an aggregate of indexed data elements
of the same type
- Two types involved
- Element type
- Index type
- Each individual element is identified by an index
to its position in the aggregate
- Design Issues
- What types are legal for subscripts?
- Are subscripting expressions in element
references range checked?
- When does binding occur for subscript ranges?
- When does allocation take place?
- What is the maximum number of subscripts?
- Can array objects be initialized?
- Is any kind of slice allowed?
17Arrays
- Index syntax
- FORTRAN, PL/I, Ada use parentheses
- Ada intentionally uses parentheses to make an
array reference look like a function call
- n := a( 23 )
- Most other languages use brackets
- Indexing is a storage mapping from the array
indices to elements
- This mapping requires a run-time calculation to
reference memory
18Array storage mapping example
- Storage mapping for 2-dim array b
- Row-wise allocation is used
- Access code for b[i, j] requires 2 adds
and 2 multiplies
- w is the size of each cell in bytes
(Figure: array b with column index j = 0 .. 4 and row index i = 0 .. 6)
loc( b[i, j] )
  = loc( b ) + w * ( (# elements in previous rows)
                     + (# previous elements in row i) )
  = loc( b ) + w * ( i * (# columns) + j )
  = loc( b ) + 4 * ( 5i + j )
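The row-wise storage mapping can be checked with a short Python sketch. The function name, the default of 5 columns and 4-byte cells (taken from the example on this slide), and the base address 1000 are all illustrative assumptions.

```python
def element_address(base, i, j, num_columns=5, cell_size=4):
    """Row-major address of b[i, j]: base + w * (i * columns + j)."""
    return base + cell_size * (i * num_columns + j)

base = 1000  # hypothetical base address of b

addr_first = element_address(base, 0, 0)
addr_b_2_3 = element_address(base, 2, 3)   # 1000 + 4 * (5 * 2 + 3)
```

The expression uses two multiplications and two additions, matching the access cost stated on the slide.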
19Arrays
- Subscript types
- FORTRAN, C, and Java
- Integer only
- Pascal and Ada
- Any ordinal type
- Integer, Boolean, Character, enum
- Range checking
- Java, ML, C# check the range of all subscripts
- C, C++, Perl, Fortran do not
- Ada checks by default, but this can be disabled by
a compiler pragma
20Array binding and allocation
- We consider the following categories of arrays
- Static array
- Fixed stack-dynamic array
- Stack-dynamic array
- Fixed heap-dynamic array
- Heap-dynamic array
- These are based on when the subscript ranges are
bound and when storage is allocated
21Array binding and allocation
- Static arrays
- Range of subscripts and storage bindings are
static
- e.g. FORTRAN 77, some arrays in Ada, C/C++ static
arrays
- Advantage
- Execution efficiency
- No run-time overhead for allocation or
deallocation
- Fixed stack-dynamic arrays
- The range of subscripts is statically bound
- Storage is bound at elaboration time
- e.g. most local variable arrays
- Advantage: space efficiency
22Array binding and allocation
- Stack-dynamic arrays
- The index range and storage allocation are
dynamic, but fixed from then on for the
variable's lifetime
- Advantage: flexibility
- Size need not be known until the array is about
to be used
- E.g. Ada declare blocks
n := <expression>;
declare
   a : array (1..n) of Float;
begin
   . . .
end;
23Array binding and allocation
- Fixed heap-dynamic arrays
- Like stack-dynamic arrays except . . .
- Storage allocated on the heap
- The index range and storage allocation are
initiated by program request rather than
subprogram elaboration
- E.g. all Java arrays
- Heap-dynamic arrays
- The subscript range and storage bindings are
dynamic and may subsequently be changed
- Supported by Smalltalk (e.g. OrderedCollection),
APL, Perl, JavaScript, FORTRAN 90, and the C#
ArrayList class
24Arrays
- Number of subscripts
- FORTRAN I allowed up to three
- FORTRAN 77 allows up to seven
- Other languages - no limit
- Array initialization
- Some languages permit initialization of arrays
- Fortran
Integer List( 3 )
Data List / 21, 67, 9 /
- C
int list[] = { 21, 67, 9 };
- Ada aggregates
list : array( 1 .. 3 ) of Integer := ( 21, 67, 9 );
list : array( 1 .. 100 ) of Integer :=
   ( 10 => 21, 20 => 67, 30 => 9, others => 0 );
list : array( 1..10, 1..3 ) of Integer :=
   ( 1 => (1,2,3), 10 => (4,5,6), others => (0,0,0) );
25Array operations
- An array operation operates on an array or a part
of an array as a unit
- Ada operations
- Assignment
- Catenation (1-dim only)
- Equality (=) and inequality (/=)
- APL
- Most powerful array-processing language ever
devised
- Many array operations
26Slices
- A slice is some substructure of an array
- It is nothing more than a referencing mechanism
- Slices are only useful in languages that have
array operations
- Fortran slices (shown in a figure on the original slide)
- Ada slices
a : array (1..100) of Float;
a( 1..50 )
a( 51..100 )
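The idea that a slice is only a referencing mechanism can be sketched in Python with a memoryview over an array, since slicing a memoryview refers to the original storage instead of copying it. Python indices start at 0, so the Ada slice a( 1..50 ) corresponds to view[0:50] here; the variable names are illustrative.

```python
from array import array

# 100 floats holding 1.0 .. 100.0, standing in for a : array (1..100) of Float
a = array("d", [float(i) for i in range(1, 101)])
view = memoryview(a)

first_half = view[0:50]     # refers to the first 50 elements, no copy made
second_half = view[50:100]  # refers to the last 50 elements

# Writing through the slice changes the underlying array.
first_half[0] = -1.0
changed_value = a[0]
```

An ordinary Python list slice would copy the elements instead, which is a different design choice from the one described on the slide.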
27Associative arrays
- An associative array is an unordered collection
of data elements that are indexed by an equal
number of values called keys
- Also called a . . .
- Map
- Key-value table
- Dictionary
- Perl example
- An associative array is called a hash in Perl
- Names begin with %
- Aggregate literals are delimited by parentheses
- E.g. %temps = ("Monday" => 77, "Tuesday" => 79,);
- Subscripting is done using braces and keys
- E.g. $temps{"Wednesday"} = 83;
- Elements can be removed with delete
- E.g. delete $temps{"Tuesday"};
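For comparison, the same three operations can be sketched with a Python dictionary, which plays the role of the Perl hash; the variable names mirror the Perl example.

```python
# Aggregate literal: keys map to values.
temps = {"Monday": 77, "Tuesday": 79}

# Subscripting with a key adds or updates an element.
temps["Wednesday"] = 83

# Elements are removed with del.
del temps["Tuesday"]

remaining_days = sorted(temps)
```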
28Records
- A record is an aggregate of named data elements of
possibly diverse types
- A compile-time descriptor for a record is shown in
the figure
- The offset is from the record base address
- Design Issues
- What is the form of references?
- What unit operations are defined?
(Figure: a compile-time descriptor for a record)
29Records
- Called the struct data type in C, C++, and C#
- A class defines a record in Java and Smalltalk
- Record declarations
- COBOL uses level numbers to show nested records
- Other languages use a recursive definition
- Field references
- COBOL
- <fieldName> OF <recordName2> OF <recordName1>
- Other languages use dot notation
- <recordName1>.<recordName2>.<fieldName>
30Records
- Fully qualified field references must include all
nested record names
- Elliptical references allow leaving out record
names as long as the reference is unambiguous
- Pascal provides a with clause to abbreviate
references
31Record Operations
- Assignment
- Allowed in Pascal, Ada, and C if the types are
identical
- In Ada, the RHS can be a record aggregate
constant
- COBOL uses MOVE CORRESPONDING
- Moves all fields in the source record to fields
with the same names in the destination record
- Initialization
- Allowed in Ada, using an aggregate constant
- In Java, done by the constructor
- Comparison
- Ada has tests for equality (=) and inequality (/=)
32Arrays vs. records
- Access to array elements is much slower than
access to record fields
- Each record field is accessed with a fixed offset
from the record base address
- Array subscripts require run-time calculation
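The difference between fixed field offsets and computed element offsets can be sketched with Python's ctypes module. The Employee record and its fields are hypothetical, and the exact offset of hourly_rate depends on the platform's alignment rules, so no specific value is claimed for it.

```python
import ctypes

class Employee(ctypes.Structure):
    # Hypothetical record used only for illustration.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("hourly_rate", ctypes.c_double),
    ]

# Field offsets are fixed once the record type is defined.
id_offset = Employee.id.offset
rate_offset = Employee.hourly_rate.offset

def array_element_offset(index, element_size=ctypes.sizeof(ctypes.c_int32)):
    """Element offsets depend on a subscript that may be known only at run time."""
    return index * element_size

third_element_offset = array_element_offset(3)
```

A field reference can therefore be compiled to a constant displacement, while a subscript with a variable index needs a multiplication each time it is evaluated.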
33Union types
- A union is a type whose variables are allowed to
store different type values at different times
during execution
- Design issue for unions
- How should type checking be done?
- Examples
- Fortran has EQUIVALENCE
- No type checking
- C and C++ have free unions
- Not part of structs
- Complete freedom from type checking
- Pascal embeds unions in records
- Design leads to ineffective type checking
34Discriminated unions
- Algol 68 and Ada use discriminated unions
- This provides secure type checking
- Ada
- Ada embeds discriminated unions in records
- One record field is called a discriminant or tag
- The discriminant in the example on the
following slide is Form
35Ada example
type Shape is ( Circle, Triangle, Rectangle );
type Colors is ( Red, Green, Blue );
type Figure( Form : Shape ) is record
   Filled : Boolean;
   Color : Colors;
   case Form is
      when Circle =>
         Diameter : Float;
      when Triangle =>
         LeftSide : Integer;
         RightSide : Integer;
         Angle : Float;
      when Rectangle =>
         Height : Integer;
         Width : Integer;
   end case;
end record;
- The discriminant field Form may not be changed in
isolation
- It may only be changed by assigning to the entire
record
- This prevents the record fields from becoming
inconsistent
36Ada example
- Assignment using a record aggregate
- Layout of record fields
- Fields Diameter, LeftSide, RightSide, Angle,
Height and Width share the same bytes
Fig : Figure;
Fig := ( Filled => true, Color => Blue,
         Form => Rectangle, Height => 12, Width => 3 );
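The checking that a discriminated union makes possible can be sketched in Python. This is a loose analogy, not Ada semantics: the Figure class below is a hypothetical tagged record whose tag is consulted on every variant-field access, and it does not model the shared storage.

```python
class Figure:
    """Tagged union sketch: the tag (form) decides which variant fields exist."""

    _VARIANT_FIELDS = {
        "Circle": {"diameter"},
        "Triangle": {"left_side", "right_side", "angle"},
        "Rectangle": {"height", "width"},
    }

    def __init__(self, form, filled, color, **variant):
        # The whole record is built at once, so tag and fields stay consistent.
        if set(variant) != self._VARIANT_FIELDS[form]:
            raise TypeError(f"wrong fields for {form}")
        self.form, self.filled, self.color = form, filled, color
        self._variant = dict(variant)

    def get(self, field):
        # Every access is checked against the current tag.
        if field not in self._VARIANT_FIELDS[self.form]:
            raise TypeError(f"{self.form} has no field {field}")
        return self._variant[field]

fig = Figure("Rectangle", filled=True, color="Blue", height=12, width=3)
height = fig.get("height")

try:
    fig.get("diameter")          # a Rectangle has no Diameter
    bad_access_rejected = False
except TypeError:
    bad_access_rejected = True
```

A free union, by contrast, would return whatever bytes happen to be stored, with no check at all.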
37Pointer types
- Pointer type values consist of memory addresses
and the special value nil (or null)
- Pointers are used for
- Indirect addressing
- Management of heap-dynamic variables
- These are anonymous variables
38Pointer operations
- Assignment operation
- Sets a pointer to a useful address
- Dereferencing operation
- Interprets the pointer variable as representing
the object at the memory address contained in the
pointer variable
- Thus, it applies one level of indirect addressing
- Deallocation
- Returns the heap-dynamic storage referred to by a
pointer to the system for reallocation
39Problems with pointers
- Dangling pointers
- A dangling pointer refers to a heap-dynamic
variable that has been deallocated
- To create a dangling pointer in Pascal with
explicit deallocation . . .
- Allocate a heap-dynamic variable pointed to by p
- Make an alias for the pointer: q := p
- Explicitly deallocate the heap-dynamic variable:
dispose( p )
- Now q contains a dangling pointer
40Problems with pointers
- Lost heap-dynamic variables
- A lost heap-dynamic variable is no longer
referenced by any program pointer and is
inaccessible
- To create a lost heap-dynamic variable . . .
- Allocate a heap-dynamic variable pointed to by p
- Replace the pointer in p by a reference to some
other heap-dynamic variable: p := q
- Now the first heap-dynamic variable is
inaccessible
- The process of losing heap-dynamic variables is
called memory leakage
41Pointers in C and C++
- Pointers in C and C++ are similar to addresses in
assembly language
- Pointers may point virtually anywhere in memory
- Pointer arithmetic is possible
- Programmer is responsible for avoiding problems
of dangling pointers and lost heap-dynamic
variables
42Pointers in C and C++
- Dereferencing is explicitly specified with the *
operator
- In C++, reference type variables are constant
pointers specified with the & operator
- Reference pointers are always implicitly
dereferenced
- Used for parameter passing
- pass-by-reference
int count;       /* defines count as an int variable */
int *ptr;        /* defines ptr as a pointer to an int variable */
int sum;
ptr = &sum;      /* & operator produces the address of sum */
count = *ptr;    /* * operator dereferences ptr and produces the value in sum */
ptr = ptr + 3;   /* increments address in ptr by 12 (assuming 4-byte ints) */
int &ref = sum;  /* ref is a constant pointer that creates an alias for sum */
ref = 23;        /* assigns 23 to sum (implicitly dereferenced) */
43Pointers in Ada
- Called access types
- Used only for heap-dynamic variables
- No pointer arithmetic
- All access variables are initialized to null
- This also provides reliability
- Heap-dynamic variables may (implementation
option) be implicitly deallocated at the end of
the scope of a pointer type
- Partially alleviates the problem of lost
heap-dynamic variables
- Has an explicit deallocator: Unchecked_Deallocation
- Dangling pointer problem is possible
44Pointers in Java
- These are called reference types
- Refer to heap-dynamic objects exclusively
- No pointer arithmetic
- All reference variables are initialized to null
- No explicit deallocation
- This prevents the dangling pointer problem
- All objects are implicitly deallocated by garbage
collection
- Garbage collection prevents the lost heap-dynamic
variable problem
- Reference variables are implicitly dereferenced
whenever the dot notation is used, as in p.link
45Dangling pointer problem
- Without garbage collection, dangling pointers can
be detected using . . .
- Tombstones
- Locks and keys
46Tombstones
- Tombstone
- An extra heap cell that is a pointer to the
heap-dynamic variable
- The actual pointer variable points only at a
tombstone
- When a heap-dynamic variable is deallocated, the
tombstone remains but is set to null
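The tombstone idea can be simulated in a few lines of Python. This is a sketch under stated assumptions: a one-element list stands in for a heap cell, and the names Tombstone, allocate, dereference, deallocate, and DanglingPointerError are hypothetical.

```python
class DanglingPointerError(Exception):
    pass

class Tombstone:
    """Extra cell that holds the only direct reference to the heap variable."""
    def __init__(self, target):
        self.target = target

def allocate(value):
    # Program pointers receive the tombstone, never the heap cell itself.
    return Tombstone([value])

def dereference(pointer):
    if pointer.target is None:
        raise DanglingPointerError("heap-dynamic variable was deallocated")
    return pointer.target[0]

def deallocate(pointer):
    pointer.target = None      # the tombstone remains, set to null

p = allocate(42)
q = p                          # alias: both pointers refer to the same tombstone
value_before = dereference(q)
deallocate(p)

try:
    dereference(q)
    dangling_detected = False
except DanglingPointerError:
    dangling_detected = True
```

Because every pointer goes through the tombstone, the alias q sees the deallocation too, at the cost of an extra level of indirection on each access.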
47Locks and keys
- The locks-and-keys technique represents pointer
values as a key-address pair
- Each heap-dynamic variable is represented as
storage for the data plus a cell for the lock
- When a heap-dynamic variable is allocated, a lock
value is created and a copy is placed in both . . .
- A lock cell within the heap-dynamic variable
- The key cell of the pointer
- When a heap-dynamic variable is deallocated, its
lock value is cleared
- Every dereference must compare the key value in
the pointer to the lock in the heap-dynamic
variable
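A minimal Python simulation of locks and keys follows. The class and function names are hypothetical, and using a counter for lock values with 0 meaning "cleared" is an assumption made for this sketch.

```python
import itertools

_lock_values = itertools.count(1)   # hypothetical source of fresh lock values

class HeapVariable:
    def __init__(self, value):
        self.lock = next(_lock_values)   # lock cell stored with the data
        self.value = value

class Pointer:
    """A pointer value is a (key, address) pair."""
    def __init__(self, key, address):
        self.key = key
        self.address = address

def allocate(value):
    cell = HeapVariable(value)
    return Pointer(cell.lock, cell)      # the key is a copy of the lock

def deallocate(pointer):
    pointer.address.lock = 0             # clear the lock; 0 is never issued as a key

def dereference(pointer):
    if pointer.key != pointer.address.lock:
        raise RuntimeError("dangling pointer: key does not match lock")
    return pointer.address.value

p = allocate("data")
q = Pointer(p.key, p.address)            # copying a pointer copies its key
value_before = dereference(q)
deallocate(p)

try:
    dereference(q)
    dangling_detected = False
except RuntimeError:
    dangling_detected = True
```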
48Heap management
- Takes deallocation of heap-dynamic variables out
of the hands of programmers
- Two popular solutions
- Reference counters
- Incremental and done when inaccessible cells are
created
- Garbage collection
- Occurs when available heap space runs out
49Reference counters
- The reference counter solution maintains a
counter in every cell
- The counter stores the number of pointers
currently pointing at the cell
- Whenever a pointer is changed . . .
- The counter in the old target is decremented
- The counter in the new target is incremented
- When a counter decrements to zero, the
heap-dynamic variable is returned to the list of
available space
- Disadvantages
- Space required by the reference counters
- Time overhead
- Complications for cells in circular linked lists
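The counter updates and the circular-list complication can be simulated as follows. This is a sketch, assuming each cell has at most one outgoing pointer; Cell, assign, and free_list are hypothetical names.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.ref_count = 0
        self.next = None        # one outgoing pointer, as in a linked list

free_list = []                  # names of cells returned to available space

def assign(old_target, new_target):
    """Model changing a pointer from old_target to new_target."""
    if new_target is not None:
        new_target.ref_count += 1
    if old_target is not None:
        old_target.ref_count -= 1
        if old_target.ref_count == 0:
            free_list.append(old_target.name)
            assign(old_target.next, None)   # release what the dead cell pointed at
    return new_target

# A simple cell is reclaimed as soon as its last pointer goes away.
a = Cell("a")
p = assign(None, a)
p = assign(p, None)

# Two cells that point at each other never reach a count of zero.
x, y = Cell("x"), Cell("y")
root = assign(None, x)
x.next = assign(x.next, y)
y.next = assign(y.next, x)
root = assign(root, None)       # x and y are now unreachable but not freed
```

After the last line, x and y each still have a count of 1 because of the cycle, so neither is ever returned to the free list.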
50Garbage collection
- When heap storage is exhausted, perform garbage
collection as follows
- Every heap cell has an extra bit used by the
garbage collection algorithm
- All bits are initially cleared (assumed to be
garbage)
- Starting with all program pointers, recursively
follow all pointers and mark any heap-dynamic
variable that can be reached
- All unmarked variables are then returned to the
list of available heap cells
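The marking procedure described above can be sketched as a small mark-and-sweep simulation in Python. The Cell class, the list standing in for the heap, and the roots argument standing in for the program pointers are assumptions made for illustration.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.pointers = []      # outgoing pointers to other heap cells

def mark(cell):
    """Recursively mark every cell reachable from this one."""
    if cell.marked:
        return
    cell.marked = True
    for target in cell.pointers:
        mark(target)

def collect(heap, roots):
    """Return (surviving cells, names of reclaimed cells)."""
    for cell in heap:           # clear all bits: everything is assumed garbage
        cell.marked = False
    for root in roots:          # mark everything reachable from program pointers
        mark(root)
    live = [c for c in heap if c.marked]
    reclaimed = [c.name for c in heap if not c.marked]
    return live, reclaimed

a, b, c, d = (Cell(n) for n in "abcd")
a.pointers.append(b)
c.pointers.append(d)            # c and d form an unreachable cycle
d.pointers.append(c)

live, reclaimed = collect([a, b, c, d], roots=[a])
live_names = [cell.name for cell in live]
```

Unlike reference counting, the unreachable cycle formed by c and d is reclaimed here, because reachability is decided from the program pointers and not from per-cell counts.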
51Garbage collection
- Disadvantage
- When you need it most, it works the worst
- You need it most when there is very little actual
garbage left in the heap
- The garbage collection algorithm is very time
consuming in this situation