Title: Module 5.1.1
1Module 5.1.1
- Data, Information, Knowledge and Processing
2Definition of Data
- Data is raw facts and figures
- Any alphanumeric character
- Data consists of raw values that, on their own,
have no meaning
3Examples of Data
- 301083
- FB6RT78
- 43AB34YT
- Each of the above pieces of data has no meaning
4Definition of Information
- Information is processed data that is given
meaning by its context
- It is data that has been processed into a form
that is useful
- If a book company recorded each sale, the facts
recorded are data.
- If they combined them into a monthly sales
figure, that is information.
5Formula for Information
Information = Data + Context + Meaning
6Example of Information
- Data: 02221732
- This has no meaning or context
- Context: it is a US date
- This allows us to recognise it as 22nd February
1732
- It still has no meaning and is therefore not
information
- Meaning: the birthday of George Washington
- This gives us all the elements required for
information
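The George Washington example can be sketched in a few lines of Python. This is only an illustration: the variable names and the dictionary layout are assumptions, not part of the module. The format string supplies the context (US order, MMDDYYYY) and the label supplies the meaning.

```python
from datetime import datetime

raw = "02221732"  # data: a bare string of digits with no meaning

# Context (assumed): the digits are a US-style date, month first (MMDDYYYY).
as_date = datetime.strptime(raw, "%m%d%Y").date()

# Meaning: supplied by us, it cannot be read from the digits themselves.
information = {"date": as_date, "meaning": "Birthday of George Washington"}

print(as_date.year, as_date.month, as_date.day)  # 1732 2 22
```

Without the format string the digits could just as easily be a part number, which is why data alone is not information.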
7Range of Definitions
- Meaning extracted by humans
- Message
- Semantic
- Syntactic
- Representation methods
8Representation Methods
- Graphically
- Numbers are easier to visualise in graphical
format
- Symbols
- Language independent
- Universal recognition
- Some symbols may have different meanings so care
is needed
- Some symbols are recognised but their meaning
is not so well known.
9Semantic
- Covers the meaning of the data: the way that
meaning is attached to a statement.
- Take the following statement
- "Fruit flies like a banana"
- Do small insects prefer a banana
- or does fruit glide through the air in a way
similar to a banana?
- Semantic representation of data is of importance
when attempting to code data
10Syntactic
- Concerning the Rules of the Data
- Data: 10/12/90
- Rules: dd/mm/yy
- Gives the date of 10th December 90
- Rules: mm/dd/yy
- Gives the date of 12th October 90
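A short Python sketch of the same point: the data stays fixed and only the rules (the format string) change. The variable names are illustrative. Reading the two-digit year 90 as 1990 is the convention of Python's `%y` directive, not something the slide states.

```python
from datetime import datetime

data = "10/12/90"

# Same data, two different sets of syntactic rules.
uk = datetime.strptime(data, "%d/%m/%y").date()  # rules: dd/mm/yy
us = datetime.strptime(data, "%m/%d/%y").date()  # rules: mm/dd/yy

print(uk.isoformat())  # 1990-12-10 (10th December)
print(us.isoformat())  # 1990-10-12 (12th October)
```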
11Semantic and Syntactic
- Remember
- Syntactic = Rules
- Semantic = Meaning
- Learn some examples
12Definition of Knowledge
- Knowledge is the result of interpreting
information
- "We need more tins of spaghetti hoops" might be
the knowledge acquired from interpreting the
information given in the stock report.
- We use knowledge to build up sets of rules
- E.g. It is hotter in July, therefore we will sell
more ice cream, so we need to increase the order
for ice cream in June.
13Difference Between Information and Knowledge
- Information is based on facts
- Knowledge is based on rules, and these rules are
based on probabilities, not certainties
- Double clicking an icon in Windows will open an
application
- This is knowledge, not information, as it is not
a certainty.
- Icons are pictures
- This is information, not knowledge
14Concepts and Understanding
- Microsoft Windows 98 is an operating system
- This is information.
- To be knowledge you would need to have an
understanding of what an operating system is.
- The concept of an operating system gives you an
understanding of what is meant by the statement.
15Data Types
- Boolean
- Can hold one of two values true/false, 1/0
- Integer
- Holds whole numbers only
- Real
- Holds decimal numbers
- Text/String
- Holds any alphanumeric character, can include
numbers and symbols
16Examples of Data Types
- Boolean
- Are you married?
- Integer
- For storing school years, e.g. 7,8,9,10,11,12,13
- Real
- For storing currency, e.g. 10.56 (it cannot store
the currency symbol)
- Text/String
- For storing any text, e.g. name, address,
telephone number, postcode
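The four data types map directly onto Python's built-in types. The sketch below uses made-up values. Storing currency as a real follows the slide, although binary floats cannot represent every decimal amount exactly, so real systems often use a decimal type instead.

```python
married: bool = False       # Boolean: one of two values
school_year: int = 9        # Integer: whole numbers only
price: float = 10.56        # Real: decimal number, no currency symbol
postcode: str = "AB1 2CD"   # Text/String: letters, digits and symbols

# A telephone number is text: it is never used in arithmetic, and
# storing it as an integer would lose the leading zero.
telephone: str = "01234 567890"  # made-up number
as_integer = int(telephone.replace(" ", ""))
print(as_integer)  # 1234567890
```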
17Sources of Data
- Gathered from original source
- From Indirect source
- Data passed on
- Data purchased
- By-product of processing an original data set
- Archives
18Gathered from an original source
- Collected as part of a transaction in a shop
- E.g. credit card number
- Collected in a survey
- e.g. Recorded on an OMR form
- Recorded in an interview
- Collected using sensors
- E.g. weather station
- "From an original source" means there is no
third party between the source of the data and
the collection device/person
19Indirect Source
- Data used for a purpose different to that for
which it was originally collected
- E.g. a credit card firm uses data about each
transaction to bill the customer. If it then
used the data to find out about their spending
habits to send them focused adverts, then this is
using the data from an indirect data source.
- Data Passed on/Purchased
- These are methods of acquiring the data, and the
data will then be used in a way different to
that originally intended.
20By-Product of Processing
- Data produced by the processing of source data
- The source data from a supermarket might be the
number of cans of spaghetti hoops at the
beginning of the month and the number at the end.
- The result of processing is the number sold
during the month
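As a worked example of the arithmetic, with made-up stock figures. It assumes there were no deliveries during the month and that sales are the only way stock left the shelf.

```python
opening_stock = 120  # cans at the beginning of the month (made-up figure)
closing_stock = 45   # cans at the end of the month (made-up figure)

# By-product of processing: a new value derived from the two source values.
units_sold = opening_stock - closing_stock
print(units_sold)  # 75
```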
21Archive
- Data which is not used frequently and has been
placed in an archive
- E.g. Pupils who have left school are archived
- Any information that needs to be kept but is not
used frequently
- Bills (utility gas, telephone)
- Past employees
22Effect of Quality of Data Source on Information
Produced
- Unreliable Questionnaires
- If the wrong individual has been asked then the
data will be accurate but cannot be relied upon,
e.g. asking a five year old their views on
washing liquid.
- Incomplete Data
- Goods can leave a store in many different ways;
the main one is by sales, which are recorded by
bar code readers. If the management relied only
on this data then the information produced
would be inaccurate. Goods could also be stolen
or damaged, for example.
23Effect of Quality of Data Source on Information
Produced
- GIGO
- Garbage In, Garbage Out
- If the data source is corrupt, then the resulting
information produced will be corrupt
24Effect of Quality of Data Source on Information
Produced
- Factors affecting the quality of the data source
include
- Relevance
- If the information is not relevant
- Age
- If the information is out of date
- Completeness
- If some of the information is missing
- Presentation
- If the information cannot be found because of the
way it has been presented
- Level of Detail
- Too much detail or too little both have an
effect
25Coding of Data
- This is changing the original data into a
shortened version in order to store it in the
computer.
- Storing days of the week as Mo, Tu, We etc, or
months of the year as Jan, Feb, Mar
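A minimal sketch of coding and decoding days of the week. The slide only gives Mo, Tu and We; the remaining codes and the function names are assumptions made for illustration.

```python
# Code table: full value -> shortened code (Th to Su are assumed).
DAY_CODES = {
    "Monday": "Mo", "Tuesday": "Tu", "Wednesday": "We", "Thursday": "Th",
    "Friday": "Fr", "Saturday": "Sa", "Sunday": "Su",
}
# Reverse table so that stored codes can be turned back into full names.
DAY_NAMES = {code: name for name, code in DAY_CODES.items()}


def encode_day(name):
    """Return the two-letter code stored in place of the full day name."""
    return DAY_CODES[name]


def decode_day(code):
    """Return the full day name for a stored code."""
    return DAY_NAMES[code]


print(encode_day("Tuesday"))  # Tu
print(decode_day("We"))       # Wednesday
```

Anyone reading the stored data needs the code table, otherwise "Tu" cannot be interpreted.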
26Problems of Coding Data
- Precision of data coarsened
- E.g. Light Blue coded as Blue
- The user needs to know the codes utilised
- If the user is not aware of the codes then they
cannot interpret the data
- Coding of value judgements
- E.g. "Did you like the film?" to be coded as a
judgement of 1-4. This will be coded differently
by different people and makes comparisons
difficult.
27Benefits of Coding Data
- Less storage space required
- If Tue is stored instead of Tuesday then not as
much storage space is required
- Comparisons are shorter and can therefore be made
quicker, thus speeding up searches
- As less data is being stored it is faster to
search and to make comparisons between pieces of
data
- A limited number of codes exists, aiding in
validation of input
- With a limited number of codes it is easier to
match them against rules and make sure that only
codes that exist are entered
- Codes can be easier to remember
- Short codes can be easier to remember than full
names
28Testing
- Every system must be reliable and the data it
produces trusted
- This is done through testing
- Testing gives the users and management confidence
that the system works
29Purpose of Test Data
- Normal
- To test the system works under normal conditions
with normal data
- Extreme
- This tests with accurate data but at the lower
and upper extremes of the range of data required
- Erroneous
- This tests with incorrect data
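The three kinds of test data can be illustrated against a simple range check. The function below is hypothetical: it assumes a field holding a school year that must be a whole number from 7 to 13, borrowing the range used in the data types example.

```python
def is_valid_school_year(value):
    """Hypothetical check: a whole number from 7 to 13 inclusive."""
    is_whole_number = isinstance(value, int) and not isinstance(value, bool)
    return is_whole_number and 7 <= value <= 13


# One set of test data for each purpose named above.
test_data = {
    "normal": [9, 10],                # typical values inside the range
    "extreme": [7, 13],               # lowest and highest acceptable values
    "erroneous": [6, 14, "ten", 9.5], # out of range or the wrong type
}

for kind, values in test_data.items():
    for value in values:
        print(kind, repr(value), is_valid_school_year(value))
```

Normal and extreme values should all be accepted and every erroneous value rejected; any other result points to a fault in the check.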
30Importance of Testing
- To test the system under all conditions
- To emulate users and their actions and ensure the
system continues to work
- To cover all potential actions and entries into
the system
- To give users confidence in the system
- To allow the system to be signed off and payment
received
31Importance of Test Plans
- To ensure all avenues are covered and none
forgotten
- To document the data used
- To list the actions taken
- To list the start point of any testing
- To enable all tests to be reproduced
- To list expected results
- To be able to tie expected results to actual
results
32Factors affecting quality of information
- Accuracy
- Relevance
- Age
- Completeness
- Presentation
- Level of Detail
33Verification
- Ensuring the source data is the same as the
object data
- In other words, the contents of the piece of
paper in your hand are the same as the contents
entered into the computer.
- Three methods of verification
- Computer verification
- You enter the data in twice and the computer
checks the entries.
- Manual verification
- You enter the data in and check manually from the
screen against the source.
- Lookup verification
- Having part of the data and retrieving the
rest/checking the rest by looking up the data on
a list
- Postcode: enter postcode to get street
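Computer (double entry) and lookup verification can be sketched as below. The function names, postcodes and street names are all made up for illustration; a real system would look postcodes up in a proper address file.

```python
def double_entry_matches(first_entry, second_entry):
    """Computer verification: the two typed entries must be identical."""
    return first_entry.strip() == second_entry.strip()


# Made-up lookup list: postcode -> streets at that postcode.
STREETS_BY_POSTCODE = {
    "AB1 2CD": ["High Street"],
    "EF3 4GH": ["Mill Lane", "Mill Close"],
}


def lookup_streets(postcode):
    """Lookup verification: retrieve the rest of the data from a list."""
    return STREETS_BY_POSTCODE.get(postcode.strip().upper(), [])


print(double_entry_matches("301083", "301083"))  # True
print(double_entry_matches("301083", "310083"))  # False: transposed digits
print(lookup_streets("ab1 2cd"))                 # ['High Street']
```

An unknown postcode returns an empty list and "EF3 4GH" returns two streets, so the person entering the data still has to resolve both cases.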
34Verification (cont.)
- Designed to trap transcription errors
- Problems
- Manual
- Blurred eyes
- Computer
- The same error may have been made in both entries
and therefore it is not picked up
- Lookup
- List may be incomplete or contain incorrect data
- May be multiple values returned in lookup, e.g. a
postcode returns more than one address
35Validation
- Making sure that the data value entered is
sensible and reasonable
- Types of Validation
- Field Presence Check
- Field Length Check
- Range Check
- Format/Picture Check
- Check Digit
36Types of Validation
- Field Presence Check
- Makes sure data has been entered into a field
- Called a required field in MS Access
- Field Length Check
- Checks the number of characters entered (minimum
and maximum)
- Range Check
- To check that the value entered is within a
pre-determined range.
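These three checks are simple enough to sketch as small Python functions. The function names and the example limits are assumptions chosen for illustration.

```python
def presence_check(value):
    """Field presence check: something must have been entered."""
    return value is not None and str(value).strip() != ""


def length_check(value, minimum, maximum):
    """Field length check: character count within the given limits."""
    return minimum <= len(value) <= maximum


def range_check(value, low, high):
    """Range check: value lies within a pre-determined range (inclusive)."""
    return low <= value <= high


print(presence_check("   "))        # False: only spaces were entered
print(length_check("Tue", 3, 3))    # True: exactly three characters
print(range_check(14, 7, 13))       # False: outside the range 7 to 13
```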
37Types of Validation (cont.)
- Format/Picture Check
- Makes sure that the data entered follows a known
pattern (e.g. Postcodes, National Insurance
Numbers)
- Check Digit
- Allows a number to be self checking: the
computer applies a set of rules which determines
if the numbers entered are valid (e.g. ISBN).
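Both checks can be sketched in Python. The National Insurance pattern below is a deliberately simplified assumption (two letters, six digits, one letter from A to D); the real rules exclude some letter combinations. The check digit function uses the 10-digit ISBN rule: multiply the digits by the weights 10 down to 1, and the total must divide exactly by 11, with X standing for 10 in the last position.

```python
import re

# Simplified, assumed pattern for a National Insurance number.
NI_PATTERN = re.compile(r"[A-Z]{2}[0-9]{6}[A-D]")


def format_check(value):
    """Format/picture check: the whole value must fit the pattern."""
    return NI_PATTERN.fullmatch(value) is not None


def isbn10_is_valid(isbn):
    """Check digit: weighted sum of a 10-digit ISBN must divide by 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            digit = 10            # X is only allowed as the check digit
        elif char in "0123456789":
            digit = int(char)
        else:
            return False
        total += digit * (10 - position)  # weights 10, 9, ..., 1
    return total % 11 == 0


print(format_check("AB123456C"))         # True
print(isbn10_is_valid("0-306-40615-2"))  # True
print(isbn10_is_valid("0-306-40615-3"))  # False: last digit mistyped
```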
38Validation and Verification
- Can only ensure that the data is reasonable
- Cannot guarantee accuracy
- If the source data is wrong then the data entered
into the system will also be incorrect
- Verification makes sure the source is the same as
the object
- Validation makes sure the data is within
acceptable boundaries
- Neither ensures accuracy of data
39Costs of Producing Information
- Information costs money to produce.
- Hardware
- To collect, analyse and output the data
- Storage space to hold the data
- Purchasing of equipment and updating the
equipment
- Software
- Required to analyse the data and to report on the
results
- Software licences
- Maintenance agreements
- Manpower
- People employed to collect or enter the data
- Maintenance of hardware and software
40Costs of Producing Information
- Additional Factors
- Training of staff
- User manuals
- Consumables
- Paper
- Toner cartridges
41Information as a Commodity
- Information is used for a variety of purposes
- Decision Making
- Planning
- Control
- Recording Transactions
- Measuring Performance
- Intended use affects its value.
- Costs must be balanced against the benefits
- The greater the benefit, the higher the cost you
will be prepared to pay
42Rule Based Systems
- Humans interpret information to gain knowledge
- This knowledge is used as the basis for making
decisions
- An expert/rule based system is used to support
the decision making process
43Rule Based System - Definition
- A rule based system is a computer program that
attempts to solve a problem in the same way as a
human expert
- It has three components
- Knowledge base
- Inference engine
- User interface
44Rule Based Systems (cont.)
- They appear intelligent but are not
- Some expert systems are heuristic
- This means that they can increase the rule base
and knowledge base through experience, just as a
human expert does.
- The knowledge base consists of If..Then rules
- E.g. IF it is raining THEN I need to take an
umbrella with me
- Has a set of solutions
- Uses questions to narrow down the possible
answers until only left with one
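A toy sketch of a knowledge base of If..Then rules and a very small inference engine. The rules, names and the forward-chaining approach are assumptions for illustration; real expert systems are far larger and usually reach their facts by asking the user questions.

```python
# Knowledge base: each rule is (set of IF conditions, THEN conclusion).
RULES = [
    ({"it is raining"}, "take an umbrella"),
    ({"it is July"}, "ice cream sales will rise"),
    ({"ice cream sales will rise"}, "increase the ice cream order"),
]


def infer(facts):
    """Inference engine: apply rules until nothing new can be concluded."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)  # only the newly derived conclusions


# "User interface": here simply a call with the facts the user supplied.
print(infer({"it is raining"}))  # {'take an umbrella'}
```

Supplying "it is July" fires the second rule, whose conclusion then fires the third, so one fact leads to two conclusions.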
45Reporting
- Business applications which produce standard
reports would take the data and present it in a
format which is readily understood
- Take the total number of beans left in stock at
the end of a month and present a graph of stock
levels for the whole year.
- A rule based system would take the data and
analyse it to make deductions
- Based on the data the system could recommend
amounts to order to ensure there is not a surplus
or a deficit of stock.
46Reporting
- Looking at the difference between
- Standard report
- For example, queries created in a database and
then a report is created based on the results of
the queries
- Static
- Rule based system
- Can make recommendations based on data,
extrapolate and look at trends to give probable
outcomes
47ICT Terms
- Input
- Taking data external to the current system and
entering it into the system.
- Processing
- Manipulating the data into information, usually
into a form understandable by the user: doing
something with the data
- Output
- Taking data within the system and presenting it
to the user, or in a format specified by the user
(e.g. on disk, screen, paper, etc.)
48ICT Terms (cont.)
- Storage
- Holding either the input or the results of
processing for use at a later date.
- Feedback
- Where the output of the system influences the
input.
- There is a continuous loop of input resulting in
output which in turn affects the subsequent input.
49ICT Structure Diagram
50ICT Feedback Example
- Setting
- School registers via OMR sheets
- Input
- Taking of register in the morning: is the pupil
present or absent?
- Processing
- Input of register details into the system and
collating present attendance with the pupil's
record of attendance
- Output
- At the end of the week an absence list of
unaccounted absences for that pupil is produced;
at the end of each month a record of the pupil's
attendance is produced.
51ICT Feedback Example (cont.)
- Storage
- The storage of the attendance data on the pupil
during their school career
- Feedback
- Filling in the absence list with reasons, which
is re-input into the system
- The next week's absence list should be shorter
with fewer entries, with the absences with
reasons removed.
- Negative feedback is feedback that moves the
system towards a stable state.
- In the above example, stable is where none of the
absences are unaccounted for and the feedback
moves towards this state.
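The feedback loop in the register example can be sketched as follows. The record layout, pupil labels and reasons are made up for illustration; the point is only that last week's output, once filled in, becomes part of the next input and shrinks the next output.

```python
def weekly_absence_list(absences):
    """Output: the absences that still have no recorded reason."""
    return [a for a in absences if a["reason"] is None]


# Stored attendance data (made-up records).
absences = [
    {"pupil": "A", "day": "Mon", "reason": None},
    {"pupil": "B", "day": "Tue", "reason": None},
    {"pupil": "C", "day": "Thu", "reason": None},
]

week1 = weekly_absence_list(absences)

# Feedback: reasons written on the printed list are re-input.
reasons_returned = {"A": "dentist", "C": "illness"}
for absence in absences:
    if absence["pupil"] in reasons_returned:
        absence["reason"] = reasons_returned[absence["pupil"]]

week2 = weekly_absence_list(absences)
print(len(week1), len(week2))  # 3 1
```

The system is stable once the list is empty, which is the state this feedback pushes towards.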