Title: Data Storage and Processing Methods
1- Data Storage and Processing Methods
2bits combine to form characters characters
combine to form fields fields combine to form
records records combine to form files files
combine to form the data base
3(No Transcript)
4A. Traditional Data File Organization 1. Two
ways an application maintains its own
files applications share certain
files "integrated system" Methods of file
organization Sequential Direct Access Indexed
Sequential (Next slide) Deficiencies - Data
redundancy, Inefficient use of
storage Possibility of conflicting
data Difficulty of system change (in integrated
systems)
5(No Transcript)
6(No Transcript)
7(No Transcript)
8(No Transcript)
9Data Base Management Organization 1. Characterist
ics a. Data is independent of applications
systems using it. Illustration 10-7 b. Provides
a schema and several subschema Illustration
10-11 b. Uses a data dictionary Illustration
10-12 1. Provides a query language 2. Data
Structures - how the DBMS software views the
data Illustration 10-13 a. Tree b. Network C.
Relational
10(No Transcript)
11The Database Administrator Systems analysts and
database analysts are becoming increasingly
involved in database design. However, in most
cases, they rarely make the final
recommendations. Who manages the database
environment? A database administrator (DBA)
oversees a staff of database specialists. These
specialists make the final recommendations on all
database designs because they have the global,
application- independent view that many systems
analysts lack. DBAs also load and maintain
databases, establish security controls, perform
backup and recovery, and maintain the DBMS
software. In addition, they plan and control
database definition to minimize redundancy and
keep track of where all data is stored and how
various systems use that data. In smaller shops,
a systems analyst may perform some or most of
these duties. In some organizations, the role of
the DBA is split between two administrators. The
database administrator manages the technical
environment of database. The data administrator
manages the data itself, keeping track of where
all data is stored and what programs and
end-users require access to what data.
12(No Transcript)
13(No Transcript)
14Query 2 What are the invoice numbers of all
sales made to D. Ainge, and who completed these
sales? This query would be written in SQL as
follows SELECT Sales Invoice , Salesperson,
Customer Name FROM Invoice, Customer WHERE Invoice
.Customer Customer. Customer AND
Customer Name 'D. Ainge' Query 3 How many
televisions were sold in October? This query
would be written in SOL as follows SELECT Sum
(quantity) FROM Line Item, Invoice,
Inventory WHERE Line Item.Item Inventory.Item
AND Description 'Television' AND Line
Item.Sales Invoice Invoice.Sales Invoice AND
Date BETWEEN 10/01/96 and 10/31/96
15(No Transcript)
16LOGICAL AND PHYSICAL VIEWS OF DATA. A database,
unlike a file, is intended to be shared by many
users. It is clear that the users all see the
data in different ways. We refer to the way a
user pictures and describes the data as a user
view. However, the problem is that different
users have different user views. These views are
examined by the systems analyst, and an overall
logical model of the database needs to be
developed. Finally, the logical model of the
database must be transformed into a corresponding
physical database design. Physical design is
involved with how data are stored and related, as
well as how they are accessed.
17Approaches to Data Processing A.Manual
Systems Illustration 10-15 B. Batch Processing
Systems Illustration 10-16 C. Batch
Systems with On-Line Inquiry Illustration 10-17
D.On-Line Real Time Systems Illustration 10-18
18(No Transcript)
19(No Transcript)
20(No Transcript)
21(No Transcript)
22Chapter 11
23I. Types of Files A. Classified by
Technology 1. Manual files 2. Computer files
24B. Classified by Content 1. Master File -
contains relatively permanent records 2. Transact
ion File records removed from file after
posting 3. Backup File duplicate copy of another
file 4. Archive File - kept for many years as a
record for historical or reference purposes
5. Scratch File a file used temporarily by
computer programs
25(No Transcript)
26(No Transcript)
27(No Transcript)
28Master files. Master files contain records for a
group of entities. The attributes may be updated
often, but the records themselves are relatively
permanent. These files tend to have large records
containing all the information about a data
entity. Each record usually contains a primary
key and several secondary keys. Often, master
files are stored as indexed or indexed sequential
files. While the analyst is free to arrange the
data elements within a master file in any order,
a standard arrangement is usually to place the
primary key field first, followed by descriptive
elements and, finally, by elements that reflect
the business and change frequently with business
activities. This procedure allows analysts or
other operational people to identify records
easily when a file is listed with a print
utility. Descriptive information is data that
does not change with business events, such as an
item description, customer name, address, or
employee department. These elements are usually
changed by maintenance programs using direct
access methods. Generally, these elements contain
alternate keys or indexes, and the data are in
display format. Business information elements
are those that periodically change with business
events, such as year-to-year gross pay, grade
point average, customer account balance, and the
customer date- of-last-purchase. These elements
are changed by update programs, which usually
read both files and sequentially match records
for efficiency. The record elements are modified
only when the data is in error by correcting
programs using random update methods. Often the
dollar amount fields are in a compressed data
format called packed decimal to save room on
files and to speed up program execution time.
29Transaction files. A transaction file is used to
enter changes that update the master file and
produce reports. Suppose a newspaper subscriber
master file needs to be updated the transaction
file would contain the subscriber number, a
transaction code such as E for extending the
subscription, C for canceling the subscription,
or A for address change. Then only information
relevant to the updating needs to be entered that
is, the length of renewal if E, and the address
if A. No additional information would be needed
if the subscription was canceled. The rest of the
information already exists in the master file. As
a result, transaction files are usually kept to a
minimum length. Transaction files may contain
several different types of records, such as the
three used for updating the newspaper
subscription master, with a code on the
transaction file indicating the type of
transaction.
30File Operations A. File Update - a change to
the balance data in a master file Illustration
11-7 B. File Maintenance - a change to reference
data in a master file C. Sort - re-sequencing
the records in a data file Illustration
11-8 D. Merge - combines two files into a third
file of the same sequence Illustration 11-9
31(No Transcript)
32(No Transcript)
33(No Transcript)
34SEQUENTIAL ORGANIZATION. When records are
physically in order in a file, the file is said
to be a sequential file. When a sequential file
is updated. it is necessary to go through the
entire file. Since records cannot be inserted in
the middle of the file, a sequential file is
usually copied over during the updating
process. Sequential master files are used when
the hardware requires it (remember that a
magnetic tape is a sequential device) or when the
normal access requires that most of the records
be accessed. In other words, when we need to read
or update only a few records, it is inefficient
to use a sequential structure, but when many
records need to be read or modified, sequential
organization would make sense. Sequential
organization is normally used for all types of
files except master files.
35(No Transcript)
36File Operations A. File Update - a change to
the balance data in a master file Illustration
11-7 B. File Maintenance - a change to reference
data in a master file C. Sort - re-sequencing
the records in a data file Illustration
11-8 D. Merge - combines two files into a third
file of the same sequence Illustration 11-9
37(No Transcript)
38(No Transcript)
39(No Transcript)
40HASHED FILE ORGANIZATION. Direct-access devices
also permit access to a given record by going
directly to its address. Since it is not feasible
to reserve a physical address for each possible
record, a method called hashing is used. Hashing
is the process of calculating an address from the
record key. Suppose that there were 500
employees in an organization and we wanted to use
the social security number as a key. It would be
inefficient to reserve 999,999,999 addresses, one
for each social security number. Therefore, we
could take the social security number and use it
to derive the address of the record. There are
many hashing techniques. A common one is to
divide the original number by a prime number that
approximates the storage locations and then to
use the remainder as the address, as follows
Begin with the social security number
053-46-8942. Then divide by 509, yielding 105047.
Note that 105047 multiplied by 509 does not equal
the original number it equals 53468923 instead.
The difference between the original number,
53468942, and the dividend, 53468923, is the
remainder, and it equals 19. The storage location
of the record for an employee whose social
security number is 053-46-8942 would thus be
19. A problem arises, however, when a person
with a different social security number (say,
472-38-4086) has the same remainder. When this
occurs, the second person's record has to be
placed in a special overflow area. (See Hashing
algorithm on page 343 of your text)
41INDEXED-SEQUENTIAL ORGANIZATION. A widely used
method of file organization is called
indexed-sequential organization, or ISAM (for
indexed sequential access method). In an ISAM
file, the records are arranged in blocks. The
records within blocks are stored in order
physically, but the blocks of records may be in
any order. Therefore, an index is needed to
locate the blocks of records. A newer
organizational format used for mainframe
computers is the VSAM (virtual storage access
method), a more modern and efficient method for
handling indexed-sequential files.
Indexed-sequential files allow programs to read
records directly (that is, randomly) without
reading other records in the file. Records
written using an indexed-sequential method are
placed in sequence within the file. When
organized in this way, records may be deleted or
rewritten without reading other records as well.
When a record is rewritten, it is physically
placed on the disk in the same location from
which the original record was obtained.
42(No Transcript)
43Audit trails should be established for all files.
An audit trail makes a permanent record of any
update -an added record, deleted record, or
changed record -that is made by a program. If
such changes are processed only in batches, the
program needs to print a report of the updates.
Today, on-line processing of updates is more
common. On-line programs need to post updates to
a special audit file that can be printed at
convenient times. Many file management systems
automatically create audit trails, relieving the
analyst and programmer of that responsibility.