Chapter 6 Data Types

Provided by: david2548
1
Chapter 6 Data Types
  • CS 350 Programming Language Design
  • Indiana University Purdue University Fort Wayne

2
Chapter 6 Topics
  • Introduction
  • Primitive data types
  • Character string types
  • User-defined ordinal types
  • Array types
  • Associative arrays
  • Record types
  • Union types
  • Pointer and reference types

3
Introduction
  • A data type defines a collection of data objects
    and a set of predefined operations on those
    objects
  • Evolution of data types
  • FORTRAN I (1957)
  • Just types for INTEGER, REAL, arrays
  • Ada (1983)
  • Programmer able to create a user-defined type for
    every category of variables in the problem space
    and have the system enforce the types
  • A descriptor is the collection of the attributes
    of a variable

4
Primitive data types
  • Primitive types are not defined in terms of other
    types
  • Integer
  • Almost always an exact reflection of the hardware
  • There may be as many as eight different integer
    types in a language
  • Floating Point
  • Model real numbers, but only as approximations
  • Issues are precision and range
  • Languages for scientific use support at least two
    floating-point types, sometimes more
  • Usually exactly like the hardware, but not always

5
Primitive data types
  • Decimal
  • For business applications (dollars and cents)
  • Store a fixed number of decimal digits (BCD)
  • Advantage is accuracy
  • Disadvantages: limited range and wasted memory
  • Boolean
  • Could be implemented as bits, but typically one
    byte per Boolean
  • Advantage is readability
  • Character
  • Stored as numeric codings (e.g., ASCII, Unicode)

6
Character string types
  • Values are sequences of characters
  • Design issue
  • Is the string type primitive or an array of
    characters?
  • It is not costly and more convenient to have the
    string type be primitive
  • Pascal, C, and C++ strings are arrays of
    characters
  • Fortran 90, Ada, and Basic are closer to
    primitive with intrinsic string operations
  • Typical operations
  • Assignment
  • Comparison (=, >, etc.)
  • Catenation
  • Substring reference
  • Pattern matching

7
Character string types
  • Design issue
  • Should strings have static or dynamic length?
  • Ada has the following string types
  • String: static length
  • Bounded_String: limited dynamic length, up to a
    maximum length
  • Unbounded_String: unlimited dynamic length
  • It is common for a language to have the first two
  • Implementation issues
  • Static length only requires a compile-time
    descriptor
  • Limited dynamic length may need a run-time
    descriptor for length
  • Instead, C and C++ terminate strings with the
    null char
  • Dynamic length needs a run-time descriptor
  • Allocation / deallocation is the biggest
    implementation problem
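
The difference between a run-time length descriptor and a null terminator can be sketched in C. This is only an illustration: the BoundedString type, the BOUNDED_MAX limit, and the bounded_assign function are hypothetical names, not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNDED_MAX 16  /* hypothetical maximum length, fixed at compile time */

/* Limited dynamic string: the maximum is static, while the current
   length lives in a run-time descriptor field next to the characters. */
typedef struct {
    size_t length;             /* current length, 0 .. BOUNDED_MAX */
    char   data[BOUNDED_MAX];  /* characters; no terminator is stored */
} BoundedString;

/* Copy a null-terminated C string into a bounded string.
   Returns 0 on success, or -1 (leaving dst unchanged) if src is too long. */
int bounded_assign(BoundedString *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);    /* a C string finds its length by scanning for '\0' */
    if (n > BOUNDED_MAX) {
        return -1;
    }
    memcpy(dst->data, src, n);
    dst->length = n;           /* the descriptor now records the length directly */
    return 0;
}
```

With the descriptor, reading the length is a single field access; with the null-terminated form, it costs a scan of the whole string.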

8
Character string descriptors
[Figure: compile-time descriptor for static strings]
[Figure: run-time descriptor for limited dynamic strings]
9
Ordinal types
  • An ordinal type is one in which the range of
    possible values can be easily associated with a
    subset of the positive integers
  • Examples of typical predefined ordinal types
  • Integer
  • Character
  • Boolean
  • We will consider the following user-defined
    ordinal types
  • Enumeration type
  • Subrange type

10
Enumeration type
  • An enumeration type is one in which the user
    enumerates all of the possible values
  • Values are symbolic constants (identifiers)
  • Example (Ada)

type Days is (Sunday, Monday, Tuesday, Wednesday,
              Thursday, Friday, Saturday);

for today in Tuesday .. Thursday loop
   ...
end loop;
11
Enumeration type
  • Design Issues
  • What operations are allowed for enumeration types
  • Ada has attribute operations
  • Days'First gives the first day
  • Days'Last gives the last day
  • Days'Pos( today ) gives the Integer position in
    the enum list
  • Days'Val( 3 ) gives the enum value associated
    with position 3
  • Days'Pred( today ) gives the predecessor of today
  • Days'Succ( today ) gives the successor of today
  • Should comparison operations (=, <, <=, etc.) be
    allowed?
  • Should a symbolic constant be allowed to be in
    more than one type definition (overloading)?
  • Is coercion performed to or from enumeration
    values?

12
Enumeration choices
  • Pascal
  • Cannot overload enumeration constants
  • Enums can be used for array subscripts and case
    selectors
  • Enums can be compared
  • No operations for input or output
  • C and C++
  • Can be used like Pascal, but . . .
  • Coerced, as in today++ or as in int n = today;
  • Operations for input and output as integers
  • Ada
  • Can be used as in Pascal, but . . .
  • Enums may be overloaded
  • Context must make use clear or use special
    notation
  • No coercion and allowed ranges are checked
  • Operations exist for input and output of
    enumeration values in text form
  • C#
  • No coercion and allowed ranges are checked

13
Enumeration type
  • Evaluation
  • Aid to readability
  • Names are easily recognized whereas coded values
    are not
  • E.g. no need to code a color as a number
  • Aid to reliability
  • Compiler can check
  • Operations on enums
  • E.g. don't allow colors to be added
  • Ranges of allowed values
  • E.g. Ada detects the error in day := Days'Succ(
    Saturday )
  • Implementation
  • Enumeration types are implemented as integers
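
A short C sketch of the points above: enumeration constants are integers, C coerces them silently, and any range check has to be written by hand. The days_succ function is a hypothetical stand-in for Ada's Days'Succ, not a library routine.

```c
#include <assert.h>
#include <stddef.h>

/* C enumeration constants are integers, numbered from 0 by default. */
enum Days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

/* Hand-written analogue of Ada's Days'Succ. C performs no range check
   on enumeration values, so the check is explicit here.
   Returns 0 and stores the successor in *next, or returns -1 if day
   has no successor (where Ada would report an error). */
int days_succ(enum Days day, enum Days *next)
{
    assert(next != NULL);
    if ((int)day < SUNDAY || (int)day >= SATURDAY) {
        return -1;
    }
    *next = (enum Days)(day + 1);   /* relies on the integer representation */
    return 0;
}
```

Because the values are plain integers, an assignment such as int n = TUESDAY; compiles without complaint in C, which is exactly the coercion that Ada and C# refuse.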

14
Subrange type
  • The subrange type is an ordered contiguous
    subsequence of an ordinal type
  • Examples (Ada)

subtype Positive is Integer range 1 .. Integer'Last;
subtype Natural is Integer range 0 .. Integer'Last;
subtype Index is Integer range -100 .. 100;

for next in Index loop
   ...
end loop;

type Days is (Sunday, Monday, Tuesday, Wednesday,
              Thursday, Friday, Saturday);
subtype Weekdays is Days range Monday .. Friday;

for today in Weekdays loop
   ...
end loop;
15
Subrange type
  • Evaluation
  • Aid to readability
  • E.g. Can distinguish between a weekday and a
    day
  • Reliability
  • Restricted ranges aid error detection
  • E.g. Saturday is not a valid weekday
  • Implementation
  • Subrange types are just the parent types with
    check code (inserted by the compiler) to restrict
    assignments to subrange values
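
The check code mentioned above can be pictured with a small C sketch. It assumes a subrange equivalent to the Ada subtype Index (-100 .. 100); the names INDEX_FIRST, INDEX_LAST, and index_assign are hypothetical, and a real Ada compiler would raise Constraint_Error rather than return a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bounds of a hypothetical subrange, mirroring
   "subtype Index is Integer range -100 .. 100". */
enum { INDEX_FIRST = -100, INDEX_LAST = 100 };

/* The variable itself is an ordinary int (the parent type); only the
   assignment is guarded. Returns false and leaves *target unchanged
   when value lies outside the subrange. */
bool index_assign(int *target, int value)
{
    assert(target != NULL);
    if (value < INDEX_FIRST || value > INDEX_LAST) {
        return false;
    }
    *target = value;
    return true;
}
```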

16
Arrays
  • An array is an aggregate of indexed data elements
    of the same type
  • Two types involved
  • Element type
  • Index type
  • Each individual element is identified by an index
    to its position in the aggregate
  • Design Issues
  • What types are legal for subscripts?
  • Are subscripting expressions in element
    references range checked?
  • When does binding occur for subscript ranges?
  • When does allocation take place?
  • What is the maximum number of subscripts?
  • Can array objects be initialized?
  • Are any kind of slices allowed?

17
Arrays
  • Index syntax
  • FORTRAN, PL/I, Ada use parentheses
  • Ada intentionally uses parentheses to make an
    array reference look like a function call
  • n := a( 23 )
  • Most other languages use brackets
  • Indexing is a storage mapping from the array
    indices to elements
  • This mapping requires a run-time calculation to
    reference memory

18
Array storage mapping example
  • Storage mapping for 2-dim array b
  • Row-wise allocation is used
  • Access code for b[ i, j ] requires 2 adds
    and 2 multiplies
  • w is the size of each cell in bytes
  • [Figure: array b with row index i = 0 .. 6 and
    column index j = 0 .. 4]
  • loc( b[ i, j ] )
    = loc( b ) + w * ( (# elements in previous rows)
                       + (# previous elements in row i) )
    = loc( b ) + w * ( i * (# columns) + j )
    = loc( b ) + 4 * ( 5i + j )
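
The row-wise mapping above can be checked with a small C sketch. The function name row_major_offset is hypothetical; it computes only the offset part, w * ( i * (# columns) + j ), which is then added to loc( b ).

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [i][j] from the base address of a row-major
   array with zero-based indices. One multiply and one add locate the
   element within the array; the second multiply scales by the cell
   size w. Adding the result to the base address is the second add. */
size_t row_major_offset(size_t i, size_t j, size_t columns, size_t w)
{
    assert(j < columns);
    return w * (i * columns + j);
}
```

For the slide's array (5 columns, 4-byte cells), element [2][3] lies 4 * (5 * 2 + 3) = 52 bytes past loc( b ). C arrays use the same row-major layout, so the result matches the address arithmetic the compiler generates for a declaration such as int b[7][5].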
19
Arrays
  • Subscript types
  • FORTRAN, C, and Java
  • Integer only
  • Pascal and Ada
  • Any ordinal type
  • Integer, Boolean, Character, enum
  • Range checking
  • Java, ML, C# check the range of all subscripts
  • C, C++, Perl, Fortran do not
  • Ada checks by default but this can be disabled by
    a compiler pragma

20
Array binding and allocation
  • We consider the following categories of arrays
  • Static array
  • Fixed stack-dynamic array
  • Stack-dynamic array
  • Fixed heap-dynamic array
  • Heap-dynamic array
  • These are based on when the subscript ranges are
    bound and when storage is allocated

21
Array binding and allocation
  • Static arrays
  • Range of subscripts and storage bindings are
    static
  • e.g. FORTRAN 77, some arrays in Ada, C/C++ static
    arrays
  • Advantage
  • Execution efficiency
  • No run-time overhead for allocation or
    deallocation
  • Fixed stack-dynamic arrays
  • The range of subscripts is statically bound
  • Storage is bound at elaboration time
  • e.g. most local variable arrays
  • Advantage: space efficiency

22
Array binding and allocation
  • Stack-dynamic arrays
  • The index range and storage allocation are
    dynamic, but fixed from then on for the
    variable's lifetime
  • Advantage: flexibility
  • Size need not be known until the array is about
    to be used
  • E.g. Ada declare blocks

n := <expression>;
declare
   a : array (1..n) of Float;
begin
   ...
end;
23
Array binding and allocation
  • Fixed heap-dynamic arrays
  • Like stack-dynamic arrays except . . .
  • Storage allocated on the heap
  • The index range and storage allocation are
    initiated by program request rather than
    subprogram elaboration
  • E.g. all Java arrays
  • Heap-dynamic arrays
  • The subscript range and storage bindings are
    dynamic and may subsequently be changed
  • Supported by Smalltalk (e.g. OrderedCollection),
    APL, Perl, JavaScript, Fortran 90, and the C#
    ArrayList class
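
C has no built-in heap-dynamic array, but the idea can be sketched with realloc. The IntList type and its functions are hypothetical names for this illustration, and the doubling growth policy is just one common choice.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable array of int: both the subscript range (length) and the
   storage binding (items) can change during execution. */
typedef struct {
    int   *items;
    size_t length;    /* number of elements in use */
    size_t capacity;  /* number of elements allocated */
} IntList;

void intlist_init(IntList *l)
{
    assert(l != NULL);
    l->items = NULL;
    l->length = 0;
    l->capacity = 0;
}

/* Append one value, growing the storage when it is full.
   Returns 0 on success, -1 if memory could not be obtained. */
int intlist_append(IntList *l, int value)
{
    assert(l != NULL);
    if (l->length == l->capacity) {
        size_t new_cap = l->capacity ? l->capacity * 2 : 4;
        int *p = realloc(l->items, new_cap * sizeof *p);
        if (p == NULL) {
            return -1;            /* old storage is still valid */
        }
        l->items = p;             /* storage may have moved */
        l->capacity = new_cap;
    }
    l->items[l->length++] = value;
    return 0;
}

void intlist_free(IntList *l)
{
    assert(l != NULL);
    free(l->items);
    intlist_init(l);
}
```

The price of the flexibility is visible in intlist_append: growth can copy the whole array to a new location, so addresses of elements are not stable across appends.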

24
Arrays
  • Number of subscripts
  • FORTRAN I allowed up to three
  • FORTRAN 77 allows up to seven
  • Other languages - no limit
  • Array initialization
  • Some languages permit initialization of arrays
  • Fortran
      Integer List( 3 )
      Data List / 21, 67, 9 /
  • C
      int list[] = { 21, 67, 9 };
  • Ada aggregates
      list : array( 1 .. 3 ) of Integer := ( 21, 67, 9 );
      list : array( 1 .. 100 ) of Integer :=
         ( 10 => 21, 20 => 67, 30 => 9, others => 0 );
      list : array( 1..10, 1..3 ) of Integer :=
         ( 1 => (1,2,3), 10 => (4,5,6), others => (0,0,0) );
25
Array operations
  • An array operation operates on an array or a part
    of an array as a unit
  • Ada operations
  • Assignment
  • Catenation (1-dim only)
  • Equality (=) and inequality (/=)
  • APL
  • Most powerful array-processing language ever
    devised
  • Many array operations

26
Slices
  • A slice is some substructure of an array
  • It is nothing more than a referencing mechanism
  • Slices are only useful in languages that have
    array operations
  • Fortran slices at right
  • Ada slices below

a : array (1..100) of Float;
a( 1..50 ) := a( 51..100 );
27
Associative arrays
  • An associative array is an unordered collection
    of data elements that are indexed by an equal
    number of values called keys
  • Also called a . . .
  • Map
  • Key-value table
  • Dictionary
  • Perl example
  • An associative array is called a hash in Perl
  • Names begin with %
  • Aggregate literals are delimited by parentheses
  • E.g. %temps = ("Monday" => 77, "Tuesday" => 79, ...);
  • Subscripting is done using braces and keys
  • E.g. $temps{"Wednesday"} = 83;
  • Elements can be removed with delete
  • E.g. delete $temps{"Tuesday"};

28
Records
  • A record is an aggregate of named data elements of
    possibly diverse types
  • A compile-time descriptor for a record is at
    right
  • The offset is from the record base address
  • Design Issues
  • What is the form of references?
  • What unit operations are defined?

a compile-time descriptor for a record
29
Records
  • Called the struct data type in C, C++, and C#
  • A class defines a record in Java and Smalltalk
  • Record declarations
  • COBOL uses level numbers to show nested records
  • Other languages use a recursive definition
  • Field references
  • COBOL
  • <fieldName> OF <recordName2> OF <recordName1>
  • Other languages use dot notation
  • <recordName1>.<recordName2>.<fieldName>

30
Records
  • Fully qualified field references must include all
    nested record names
  • Elliptical references allow leaving out record
    names as long as the reference is unambiguous
  • Pascal provides a with clause to abbreviate
    references

31
Record Operations
  • Assignment
  • Allowed in Pascal, Ada, and C if the types are
    identical
  • In Ada, the RHS can be a record aggregate
    constant
  • COBOL uses MOVE CORRESPONDING
  • Moves all fields in the source record to fields
    with the same names in the destination record
  • Initialization
  • Allowed in Ada, using an aggregate constant
  • In Java, done by the constructor
  • Comparison
  • Ada has tests for equality (=) and inequality (/=)

32
Arrays vs. records
  • Access to array elements is much slower than
    access to record fields
  • Each record field is accessed with a fixed offset
    from the record base address
  • Array subscripts require run-time calculation

33
Union types
  • A union is a type whose variables are allowed to
    store different type values at different times
    during execution
  • Design issue for unions
  • How should type checking be done?
  • Examples
  • Fortran has EQUIVALENCE
  • No type checking
  • C and C++ have free unions
  • Not part of structs
  • Complete freedom from type checking
  • Pascal embeds unions in records
  • Design leads to ineffective type checking

34
Discriminated unions
  • Algol 68 and Ada use discriminated unions
  • This provides secure type checking
  • Ada
  • Ada embeds discriminated unions in records
  • One record field is called a discriminant or tag
  • The discriminant in the example on the
    following slide is Form

35
Ada example
type Shape is ( Circle, Triangle, Rectangle );
type Colors is ( Red, Green, Blue );
type Figure( Form : Shape ) is record
   Filled : Boolean;
   Color  : Colors;
   case Form is
      when Circle =>
         Diameter : Float;
      when Triangle =>
         LeftSide  : Integer;
         RightSide : Integer;
         Angle     : Float;
      when Rectangle =>
         Height : Integer;
         Width  : Integer;
   end case;
end record;
  • The discriminant field Form may not be changed in
    isolation
  • It may only be changed by assigning to the entire
    record
  • This prevents the record fields from becoming
    inconsistent
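
For contrast, here is a rough C sketch of the same idea. C unions are free unions, so the tag field and every check on it must be written by hand; nothing in the language enforces them. The layout is a cut-down version of the Ada Figure record (the Color field is omitted), and make_rectangle, make_circle, and figure_height are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { CIRCLE, TRIANGLE, RECTANGLE } Shape;

typedef struct {
    Shape form;    /* discriminant (tag), maintained by convention only */
    bool  filled;
    union {        /* the variants share the same bytes */
        struct { float diameter; } circle;
        struct { int left_side, right_side; float angle; } triangle;
        struct { int height, width; } rectangle;
    } u;
} Figure;

/* Build the whole record at once so tag and variant always agree,
   mimicking Ada's rule that the discriminant changes only by
   assigning to the entire record. */
Figure make_rectangle(bool filled, int height, int width)
{
    Figure f;
    f.form = RECTANGLE;
    f.filled = filled;
    f.u.rectangle.height = height;
    f.u.rectangle.width = width;
    return f;
}

Figure make_circle(bool filled, float diameter)
{
    Figure f;
    f.form = CIRCLE;
    f.filled = filled;
    f.u.circle.diameter = diameter;
    return f;
}

/* Checked access: refuses to read Height unless the tag says the
   record currently holds a rectangle. */
bool figure_height(const Figure *f, int *out)
{
    assert(f != NULL && out != NULL);
    if (f->form != RECTANGLE) {
        return false;
    }
    *out = f->u.rectangle.height;
    return true;
}
```

Nothing stops other C code from writing f.form directly or reading f.u.circle.diameter from a rectangle, which is the lack of type checking that discriminated unions in Ada are designed to close.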

36
Ada example
  • Assignment using a record aggregate
  • Layout of record fields
  • Fields Diameter, LeftSide, RightSide, Angle,
    Height and Width share the same bytes

Fig : Figure;
Fig := ( Filled => true, Color => Blue,
         Form => Rectangle, Height => 12, Width => 3 );
37
Pointer types
  • Pointer type values consist of memory addresses
    and the special value nil (or null)
  • Pointers are used for
  • Indirect addressing
  • Management of heap-dynamic variables
  • These are anonymous variables

38
Pointer operations
  • Assignment operation
  • Sets a pointer to a useful address
  • Dereferencing operation
  • Interprets the pointer variable as representing
    the object at the memory address contained in the
    pointer variable
  • Thus, it applies one level of indirect addressing
  • Deallocation
  • Returns the heap-dynamic storage referred to by a
    pointer to the system for reallocation

39
Problems with pointers
  • Dangling pointers
  • A dangling pointer refers to a heap-dynamic
    variable that has been deallocated
  • To create a dangling pointer in Pascal with
    explicit deallocation . . .
  • Allocate a heap-dynamic variable pointed to by p
  • Make an alias for the pointer: q := p
  • Explicitly deallocate the heap-dynamic variable
    dispose( p )
  • Now q contains a dangling pointer

40
Problems with pointers
  • Lost heap-dynamic variables
  • A lost heap-dynamic variable is no longer
    referenced by any program pointer and is
    inaccessible
  • To create a lost heap-dynamic variable . . .
  • Allocate a heap-dynamic variable pointed to by p
  • Replace the pointer in p by a reference to some
    other heap-dynamic variable: p := q
  • Now the first heap-dynamic variable is
    inaccessible
  • The process of losing heap-dynamic variables is
    called memory leakage

41
Pointers in C and C++
  • Pointers in C and C++ are similar to addresses in
    assembly language
  • Pointers may point virtually anywhere in memory
  • Pointer arithmetic is possible
  • Programmer is responsible for avoiding problems
    of dangling pointers and lost heap-dynamic
    variables

42
Pointers in C and C++
  • Dereferencing is explicitly specified with the *
    operator
  • In C++, reference type variables are constant
    pointers specified with the & operator
  • Reference pointers are always implicitly
    dereferenced
  • Used for parameter passing
  • pass-by-reference

int count;        /* defines count as an int variable */
int *ptr;         /* defines ptr as a pointer to an int variable */
int sum;
ptr = &sum;       /* & operator produces the address of sum */
count = *ptr;     /* * operator dereferences ptr and produces the
                     value in sum */
ptr = ptr + 3;    /* increments address in ptr by 12
                     (assuming 4-byte ints) */
int &ref = sum;   /* ref is a constant pointer that creates an
                     alias for sum (C++ only) */
ref = 23;         /* assigns 23 to sum (implicitly dereferenced) */
43
Pointers in Ada
  • Called access types
  • Used only for heap-dynamic variables
  • No pointer arithmetic
  • All access variables are initialized to null
  • This also provides reliability
  • Heap-dynamic variables may (implementation
    option) be implicitly deallocated at the end of
    the scope of a pointer type
  • Partially alleviates the problem of lost
    heap-dynamic variables
  • Has an explicit deallocator:
    Unchecked_Deallocation
  • Dangling pointer problem is possible

44
Pointers in Java
  • These are called reference types
  • Refer to heap-dynamic objects exclusively
  • No pointer arithmetic
  • All reference variables are initialized to null
  • No explicit deallocation
  • This prevents the dangling pointer problem
  • All objects are implicitly deallocated by garbage
    collection
  • Garbage collection prevents the lost heap-dynamic
    variable problem
  • Reference variables are implicitly dereferenced
    whenever the dot notation is used, as in p.link

45
Dangling pointer problem
  • Without garbage collection, dangling pointers can
    be detected using . . .
  • Tombstones
  • Locks and keys

46
Tombstones
  • Tombstone
  • An extra heap cell that is a pointer to the
    heap-dynamic variable
  • The actual pointer variable points only at a
    tombstone
  • When a heap-dynamic variable is deallocated, the
    tombstone remains but is set to null
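
A minimal C sketch of the tombstone idea, under the assumption that every program pointer goes through the extra cell. The names Tombstone, IntPtr, ts_new, ts_dispose, and ts_deref are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* The tombstone is an extra heap cell holding the only direct
   pointer to the heap-dynamic variable. */
typedef struct {
    int *target;   /* NULL once the variable has been deallocated */
} Tombstone;

/* Program-level "pointers" hold the address of a tombstone. */
typedef Tombstone *IntPtr;

/* Allocate a heap-dynamic int together with its tombstone. */
IntPtr ts_new(int value)
{
    Tombstone *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->target = malloc(sizeof *t->target);
    if (t->target == NULL) {
        free(t);
        return NULL;
    }
    *t->target = value;
    return t;
}

/* Deallocate the variable; the tombstone remains and is set to NULL. */
void ts_dispose(IntPtr p)
{
    assert(p != NULL);
    free(p->target);
    p->target = NULL;
}

/* Every dereference pays one extra level of indirection.
   Returns NULL when the variable no longer exists. */
int *ts_deref(IntPtr p)
{
    assert(p != NULL);
    return p->target;
}
```

Because aliases copy the tombstone's address rather than the variable's, clearing one tombstone makes every alias detectably dead. The cost is the extra indirection on each access and the fact that tombstones themselves are never reclaimed in the basic scheme.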

47
Locks and keys
  • The locks-and-keys technique represents pointer
    values as a key-address pair
  • Each heap-dynamic variable is represented as
    storage for the data plus a cell for the lock
  • When a heap-dynamic variable is allocated, a lock
    value is created and a copy is placed in both . .
    .
  • A lock cell within the heap-dynamic variable
  • The key cell of the pointer
  • When a heap-dynamic variable is deallocated, its
    lock value is cleared
  • Every dereference must compare the key value in
    the pointer to the lock in the heap-dynamic
    variable
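
The scheme can be sketched in C with a small fixed pool standing in for the heap, so that a cleared lock cell stays readable after deallocation. Everything here (POOL_SIZE, Cell, KeyedPtr, and the lk_ functions) is a hypothetical illustration, not a real allocator.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical tiny heap */

/* Heap-dynamic variable: data plus a lock cell (0 means "free"). */
typedef struct {
    unsigned lock;
    int      value;
} Cell;

/* Pointer value: a key-address pair. */
typedef struct {
    unsigned key;
    Cell    *addr;
} KeyedPtr;

static Cell     pool[POOL_SIZE];
static unsigned next_lock = 1;   /* 0 is reserved for cleared locks */

/* Allocate a cell, create a fresh lock value, and copy it into both
   the cell's lock and the pointer's key. */
KeyedPtr lk_new(int value)
{
    KeyedPtr p = { 0, NULL };
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].lock == 0) {
            assert(next_lock != 0);       /* sketch ignores counter wrap-around */
            pool[i].lock  = next_lock++;
            pool[i].value = value;
            p.key  = pool[i].lock;
            p.addr = &pool[i];
            return p;
        }
    }
    return p;                             /* pool exhausted: null pointer */
}

/* Deallocation clears the lock, so every outstanding key stops matching. */
void lk_dispose(KeyedPtr p)
{
    if (p.addr != NULL && p.addr->lock == p.key) {
        p.addr->lock = 0;
    }
}

/* Every dereference compares key and lock; a mismatch means the
   pointer is dangling and NULL is returned instead. */
int *lk_deref(KeyedPtr p)
{
    if (p.addr == NULL || p.key == 0 || p.addr->lock != p.key) {
        return NULL;
    }
    return &p.addr->value;
}
```

Even when a freed cell is reused, the new allocation receives a different lock value, so a stale alias still fails the comparison.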

48
Heap management
  • Takes deallocation of heap-dynamic variables out
    of the hands of programmers
  • Two popular solutions
  • Reference counters
  • Incremental and done when inaccessible cells are
    created
  • Garbage collection
  • Occurs when available heap space runs out

49
Reference counters
  • The reference counter solution maintains a
    counter in every cell
  • The counter stores the number of pointers
    currently pointing at the cell
  • Whenever a pointer is changed . . .
  • The counter in the old target is decremented
  • The counter in the new target is incremented
  • When a counter decrements to zero, the
    heap-dynamic variable is returned to the list of
    available space
  • Disadvantages
  • Space required by the reference counters
  • Time overhead
  • Complications for cells in circular linked lists
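
The counter bookkeeping described above might look as follows in C. This is a sketch with hypothetical names (RcCell, rc_new, rc_assign, live_cells); it handles single cells only and does not address the circular-list problem.

```c
#include <assert.h>
#include <stdlib.h>

/* Heap cell with an embedded reference counter. */
typedef struct {
    int count;   /* number of pointers currently pointing at the cell */
    int value;
} RcCell;

static int live_cells = 0;   /* bookkeeping for this sketch only */

/* Allocate a cell; the returned pointer is its first reference. */
RcCell *rc_new(int value)
{
    RcCell *c = malloc(sizeof *c);
    if (c == NULL) {
        return NULL;
    }
    c->count = 1;
    c->value = value;
    live_cells++;
    return c;
}

/* Drop one reference; reclaim the cell when the count reaches zero. */
static void rc_release(RcCell *c)
{
    if (c != NULL && --c->count == 0) {
        free(c);
        live_cells--;
    }
}

/* Pointer assignment "*slot = target" with counter maintenance.
   The new target is incremented before the old one is decremented,
   so assigning a pointer to itself is safe. */
void rc_assign(RcCell **slot, RcCell *target)
{
    assert(slot != NULL);
    if (target != NULL) {
        target->count++;
    }
    rc_release(*slot);
    *slot = target;
}
```

Every pointer assignment now costs an increment, a decrement, and a test, which is the time overhead listed among the disadvantages. Two cells that point at each other would keep each other's count above zero forever, which is the circular-list complication.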

50
Garbage collection
  • When heap storage is exhausted, perform garbage
    collection as follows
  • Every heap cell has an extra bit used by the
    garbage collection algorithm
  • All bits are initially cleared (assumed to be
    garbage)
  • Starting with all program pointers, recursively
    follow all pointers and mark any heap-dynamic
    variable that can be reached
  • All unmarked variables are then returned to the
    list of available heap cells
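
The mark and sweep steps above can be sketched in C over a tiny fixed heap whose cells refer to each other by index. The names (HeapCell, gc_alloc, collect) and the two-link cell layout are assumptions made for the illustration; a real collector must also find the program pointers (roots) itself and avoid unbounded recursion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CELLS 8   /* hypothetical tiny heap */
#define NIL (-1)       /* null link */

typedef struct {
    bool in_use;
    bool marked;       /* the extra bit used by the collector */
    int  link[2];      /* indices of other cells, or NIL */
} HeapCell;

static HeapCell heap[HEAP_CELLS];

/* Take the first free cell; returns NIL when the heap is exhausted. */
int gc_alloc(int link0, int link1)
{
    for (int i = 0; i < HEAP_CELLS; i++) {
        if (!heap[i].in_use) {
            heap[i].in_use = true;
            heap[i].link[0] = link0;
            heap[i].link[1] = link1;
            return i;
        }
    }
    return NIL;
}

/* Mark phase: recursively follow links from cell i. */
static void mark(int i)
{
    if (i == NIL || heap[i].marked) {
        return;        /* the marked test also stops cycles */
    }
    assert(i >= 0 && i < HEAP_CELLS);
    heap[i].marked = true;
    mark(heap[i].link[0]);
    mark(heap[i].link[1]);
}

/* Clear all bits, mark everything reachable from the roots, then
   sweep unmarked cells back to the free list.
   Returns the number of cells reclaimed. */
int collect(const int *roots, size_t nroots)
{
    int reclaimed = 0;
    for (int i = 0; i < HEAP_CELLS; i++) {
        heap[i].marked = false;
    }
    for (size_t r = 0; r < nroots; r++) {
        mark(roots[r]);
    }
    for (int i = 0; i < HEAP_CELLS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = false;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Unlike reference counting, this approach reclaims unreachable cells that point at each other, because reachability is decided from the program pointers rather than from per-cell counts.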

51
Garbage collection
  • Disadvantage
  • When you need it most, it works the worst
  • You need it most when there is very little actual
    garbage left in the heap
  • The garbage collection algorithm is very time
    consuming in this situation