Title: Lecture 8 Steganography
1. Lecture 8: Steganography
CSCE 590 Forensic Computing
- References
- Disappearing Cryptography by Peter Wayner
- URL
- Digital Watermarking by Cox, Miller and Bloom
- URL
June 12, 2003
2. Overview
- Steganography Overview
- Good Reasons for Secrecy
- Encryption: Public Key, Private Key
- PGP
- Information Theory
- How is steganography used?
- RSA ?
- Error Correction / Error Detection
- Hiding Information in Parity Bits
- Hiding Information in Texts
- Compression
- Basic Mimicry
- Using Grammars
- Hiding in the Noise
3. Overview (continued)
- Anonymous remailers
- Toolkits for steganography
- Watermarks
4. Steganography Introduction
- Etymology (origins of the terminology)
- Greek words
- Steganos, which means "covered"
- Graphia, which means "writing"
- Covered writing, or concealed communication
- Hiding messages in pictures and in other messages
5. Good Reasons for Secrecy
- So you can seek counseling about deeply personal problems such as suicidal thoughts.
- So you can explore job possibilities without revealing where you currently work and potentially losing your job.
- So you can turn a person in to the authorities anonymously because you fear recriminations.
- So you can protect your personal information from being exploited.
- So the police can communicate with undercover agents infiltrating gangs.
6. Encryption: Public Key, Private Key
- Classical cryptography
- Caesar cipher: rotate the alphabet
- Substitutions and transpositions
- Private-key (secret-key) cryptography
- Well-known algorithm; the output varies with the key
- Same key used for encryption and decryption
- If the key is lost, all is lost
- Public-key cryptography
- User A's public key is available to all and is used to encrypt messages to be sent to A.
- A uses A's private key to decrypt.
- What does steganography offer above this?
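A minimal Python sketch of the Caesar cipher from this slide, assuming the shift amount plays the role of the secret key (the function name `caesar` is hypothetical). It shows the private-key idea in miniature: the same key, applied in reverse, decrypts. Note that the ciphertext is visibly scrambled, so an observer knows a secret is being sent; hiding that fact is what steganography adds.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Rotate each letter by `shift` places; non-letters pass through unchanged."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

secret = caesar("Attack at dawn", 3)   # the key is 3
print(secret)                          # Dwwdfn dw gdzq
print(caesar(secret, -3))              # Attack at dawn (same key, reversed)
```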
7. Pretty Good Privacy (PGP)
8. How is steganography used?
- Document Authentication
- Extra information is added to a document to verify that it is authentic, the real thing.
- Strong Watermarks
- Creators of digital works of art would like to be able to stamp their movie or audio files for authentication.
- They want the watermark to be non-reproducible
- and to be non-removable by compression, cropping, etc.
- Extra information, e.g., a "view once" restriction on downloaded video
- Document Tracking: modification history
- Private Communication: e.g., a message for terrorists broadcast by placing it in a picture on eBay
9. How is steganography accomplished?
- Use the noise
- Spread the information out
- Adopt a statistical profile
- Replace randomness
- Change the order
- Split information
- Hide the source
10. Error Correction / Error Detection
- Extra information is added for the detection and correction of errors
- Parity bits
- Hamming codes: single-error correcting; with one extra overall parity bit, also double-error detecting
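A minimal Python sketch of both ideas on this slide. The slide only says "Hamming codes", so the specific Hamming(7,4) code and its textbook bit layout (parity bits at positions 1, 2 and 4) are assumptions here, as are all function names. The decoder's syndrome spells out the position of a single flipped bit, which is what makes correction possible.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total number of 1s even."""
    return sum(bits) % 2

def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = (d1 + d2 + d4) % 2      # checks positions 1, 3, 5, 7
    p2 = (d1 + d3 + d4) % 2      # checks positions 2, 3, 6, 7
    p3 = (d2 + d3 + d4) % 2      # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Fix at most one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = (c[0] + c[2] + c[4] + c[6]) % 2
    s2 = (c[1] + c[2] + c[5] + c[6]) % 2
    s3 = (c[3] + c[4] + c[5] + c[6]) % 2
    pos = s1 + 2 * s2 + 4 * s3   # the syndrome is the bad bit's position
    if pos:
        c[pos - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], pos

print(parity_bit([1, 0, 0, 0, 0, 0, 1]))  # 0 (already an even number of 1s)
word = hamming74_encode([1, 0, 1, 1])     # [0, 1, 1, 0, 0, 1, 1]
word[4] ^= 1                              # simulate an error at position 5
print(hamming74_decode(word))             # ([1, 0, 1, 1], 5)
```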
11. Information Theory
- Claude Shannon founded the field of information theory.
- Example
- Consider a sequence of 8-bit bytes: how much information is transmitted?
- What if the sequence is a, b, a, a, b, just a's and b's?
- We could send it with the bits 01001: 0 for a and 1 for b.
- So how much information is really being transmitted?
- Entropy: the measure of the amount of information
- If the information stream consists of characters $x_0, x_1, \ldots, x_n$ occurring with probabilities $P(x_i)$, then the entropy is
- $H = \sum_i P(x_i)\log_2\frac{1}{P(x_i)}$ bits
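The entropy formula above translates directly into a few lines of Python. This is a sketch (the function name `entropy` is hypothetical) that takes a list of probabilities, skips zero-probability symbols since they contribute nothing, and reproduces the values on the example slides that follow.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: the sum of p * log2(1/p).
    Symbols with p == 0 are skipped, since they contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1 / 256] * 256))   # 8.0  (256 equally likely bytes)
print(entropy([0.5, 0.5]))        # 1.0  (only a and b, equally likely)
print(entropy([0.25, 0.75]))      # about 0.811
```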
12. Entropy Example
- Data stream consists of 8-bit ASCII characters with each character equally likely
- $P(x) = 1/256$ for each character $x$
- $H = \sum_i P(x_i)\log_2\frac{1}{P(x_i)} = \sum_{i=1}^{256}\frac{1}{256}\log_2 256 = 256\cdot\frac{1}{256}\cdot 8 = 8$ bits
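13. Entropy Example
- Data stream consists of 8-bit ASCII characters with $P(a) = P(b) = \frac{1}{2}$
- and $P(x) = 0$ for any $x$ other than a and b (these terms contribute 0)
- $H = \sum_i P(x_i)\log_2\frac{1}{P(x_i)} = \frac{1}{2}\log_2 2 + \frac{1}{2}\log_2 2 = 2\cdot\frac{1}{2}\cdot 1 = 1$ bit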
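14. Entropy Example
- 8-bit ASCII characters with $P(a) = \frac{1}{4}$, $P(b) = \frac{3}{4}$
- and $P(x) = 0$ for any $x$ other than a and b
- $H = \frac{1}{4}\log_2\frac{1}{1/4} + \frac{3}{4}\log_2\frac{1}{3/4} = \frac{1}{4}\cdot 2 + \frac{3}{4}\log_2 1.333 \approx 0.5 + 0.75(0.415) \approx 0.811$ bits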
15. Entropy Example
- 8-bit ASCII characters in which 7 bits encode the character and 1 bit is a parity bit
- What is the entropy of a byte, assuming all characters are equally likely?
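One way to check the question on this slide by brute force, assuming 7 data bits with an even-parity bit appended as the low bit (where the parity bit sits does not change the answer) and all 128 characters equally likely. Because the parity bit is fully determined by the other seven, only 128 distinct bytes ever occur. The helper name is hypothetical.

```python
from collections import Counter
from math import log2

def with_even_parity(c):
    """Append an even-parity bit to a 7-bit character code, giving an 8-bit byte."""
    parity = bin(c).count("1") % 2
    return (c << 1) | parity

# Every 7-bit character occurs once, i.e. all are equally likely.
counts = Counter(with_even_parity(c) for c in range(128))
n = sum(counts.values())
h = sum((k / n) * log2(n / k) for k in counts.values())
print(len(counts), h)   # 128 distinct bytes, 7 bits of entropy
```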
16. Hiding Information in the Order of Things
- The purpose of a list is to convey information.
- But the order of the list might be used to convey other information.
- As a trivial example, if we need a yes/no answer to a question, only one bit needs to be transferred.
- We can send it by transferring a shopping list:
- In alphabetical order could convey YES
- Not in alphabetical order could convey NO
- But we can do better than this!
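Why we can do better: n distinct items can be arranged in n! orders, so picking one particular order can carry up to log2(n!) bits instead of one. A small sketch (the function name is hypothetical); the two-item case recovers the single yes/no bit.

```python
from math import factorial, log2

def list_capacity_bits(n):
    """Bits carried by choosing one of the n! orderings of n distinct items."""
    return log2(factorial(n))

for n in (2, 5, 10, 20):
    print(n, "items:", round(list_capacity_bits(n), 2), "bits")
```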
17. Kinds of Lists
- Shopping lists
- Favorites lists
- GIF colors (the order of the colors in the palette)
18. Flexibase Notation
- Decimal: digit $j$ corresponds to $10^j$ (numbering from the right, starting with zero)
- Flexibase: digit $j$ (numbering from the right again) is in the range 0 to $j+1$
- 0th digit can be 0 or 1
- 1st digit can be 0, 1, or 2
- 2nd digit can be 0, 1, 2, or 3
- Etc.
- Conversion from flexibase to decimal
- $\text{value} = d_0\cdot 1! + d_1\cdot 2! + d_2\cdot 3! + \cdots = \sum_j d_j\,(j+1)!$, where $d_j$ is digit $j$
- Example: 53311 → ?
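The conversion rule above, sketched in Python (function names are hypothetical). Digits are given most significant first, as written on the slide, and each digit is range-checked against the 0 to j+1 rule. Going the other way is ordinary mixed-radix division: digit j has j+2 possible values.

```python
from math import factorial

def flexibase_to_int(digits):
    """digits: most significant first, e.g. [5, 3, 3, 1, 1] for 53311.
    Digit j (from the right, starting at 0) must lie in 0..j+1 and is weighted by (j+1)!."""
    total = 0
    for j, d in enumerate(reversed(digits)):
        if not 0 <= d <= j + 1:
            raise ValueError(f"digit {d} out of range at position {j}")
        total += d * factorial(j + 1)
    return total

def int_to_flexibase(value, length):
    """Inverse: produce `length` flexibase digits, most significant first."""
    digits = []
    for j in range(length):
        value, d = divmod(value, j + 2)   # digit j has j+2 possible values
        digits.append(d)
    if value:
        raise ValueError("value too large for this many digits")
    return digits[::-1]

print(flexibase_to_int([5, 3, 3, 1, 1]))   # 693 = 5*5! + 3*4! + 3*3! + 1*2! + 1*1!
print(int_to_flexibase(693, 5))            # [5, 3, 3, 1, 1]
```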
19. From a list to a number
- Let's begin with the master list, say five recording artists in alphabetical order:
- Abba
- Barry Manilow
- Cher
- Donny and Marie
- Eagles
20. From a list to a number
- One ordering of favorites is
- Eagles
- Abba
- Barry Manilow
- Donny and Marie
- Cher
- How do we get this to be a number?
21. Conversion from List to Flexibase
- Start with an alphabetical master list. Number the items beginning with zero instead of 1. That means Abba gets 0, Barry gets 1, Cher gets 2, etc.
- Take the first item from your arbitrary list and find it in the alphabetical master list. In this case, my first choice, the Eagles, is in position 4, at the bottom of the alphabetical master list. This will be the leftmost digit in our notational scheme.
- Delete the Eagles from the list.
- Now find the second band from my list, Abba, in the alphabetical list. It's first, which means it comes with the digit 0. This will be the next digit in the value, which now looks like 40.
- Delete Abba. The alphabetical list now looks like:
- Barry Manilow
- Cher
- Donny and Marie
- The third item on my list, Barry Manilow, is now at the top of the alphabetical list. That means it has a value of 0. After deleting it, the number now looks like 400.
- The fourth item, Donny and Marie, is second on the remaining list (Cher, Donny and Marie), meaning it has value 1.
- The final name, Cher, is ignored, since only one choice is left.
- This algorithm produces the flexibase value 4001.
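The procedure above, written as a short Python sketch (function names are hypothetical). It sorts the master list so it is alphabetical, records each chosen item's position in whatever remains, and deletes it; the inverse pops items back out by position. Run on the slide's favorites, it yields the digits 4, 0, 0, 1.

```python
def list_to_flexibase(ordering, master):
    """For each item except the last, record its 0-based position in what
    remains of the alphabetical master list, then delete it."""
    remaining = sorted(master)          # alphabetical master list
    digits = []
    for item in ordering[:-1]:          # the last item has only one choice
        idx = remaining.index(item)
        digits.append(idx)
        del remaining[idx]
    return digits

def flexibase_to_list(digits, master):
    """Inverse: rebuild the ordering by picking items out of the master list."""
    remaining = sorted(master)
    ordering = [remaining.pop(d) for d in digits]
    return ordering + remaining         # exactly one item is left over

master = ["Abba", "Barry Manilow", "Cher", "Donny and Marie", "Eagles"]
favorites = ["Eagles", "Abba", "Barry Manilow", "Donny and Marie", "Cher"]
digits = list_to_flexibase(favorites, master)
print(digits)                             # [4, 0, 0, 1]
print(flexibase_to_list(digits, master))  # the favorites order again
```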
22. Final Touches
- Flexibase → decimal → binary
- Encode a message: binary → decimal → flexibase → order of list
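The whole pipeline on this slide, end to end, as a self-contained Python sketch. The grocery items are hypothetical, and so is the assumption that the receiver knows the message length in bytes (leading zero bytes would otherwise be lost). The message bytes are read as one integer, split into flexibase digits, and the digits choose items from the alphabetical master list; decoding reverses each step.

```python
from math import factorial

def encode(message: bytes, items):
    """message -> integer -> flexibase digits -> ordering of the items."""
    value = int.from_bytes(message, "big")
    if value >= factorial(len(items)):
        raise ValueError("list too short to hold this message")
    digits = []
    for j in range(len(items) - 1):       # least significant digit first
        value, d = divmod(value, j + 2)   # digit j ranges over 0..j+1
        digits.append(d)
    remaining = sorted(items)             # alphabetical master list
    ordering = [remaining.pop(d) for d in reversed(digits)]
    return ordering + remaining           # one item is left over

def decode(ordering, length):
    """ordering -> flexibase digits -> integer -> `length` message bytes."""
    remaining = sorted(ordering)
    digits = []
    for item in ordering[:-1]:
        d = remaining.index(item)
        digits.append(d)
        del remaining[d]
    value = sum(d * factorial(j + 1) for j, d in enumerate(reversed(digits)))
    return value.to_bytes(length, "big")

groceries = ["apples", "bread", "butter", "eggs", "flour", "milk", "rice", "tea"]
shopping_list = encode(b"Hi", groceries)
print(shopping_list)
print(decode(shopping_list, 2))   # b'Hi'
```

Eight items allow 8! = 40320 orderings, a little over 15 bits, so this particular two-byte message fits but not every two-byte message would; longer messages need longer lists.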
23. Hiding Information in Text
- http://www.wayner.org/texts/mimic/
- Mimicry Applet by PETER WAYNER
- This applet shows how data can be mutated into innocent-sounding plaintext with the push of a button. In this case, the destination is the voiceover from a hypothetical baseball game between two teams named the Blogs and the Whappers.
- The information is encoded by choosing the words, the players and the action in the game. In some cases, one message will lead to a string of home runs and in other cases a different message will strike out three players in a row. See the FAQ for more information.
24. Mimicry Applet by PETER WAYNER: Demo
- Applet Layout
- Message Box
- Push for Mimicry button
- Mimic text
- push to Remove Mimicry button
- Unscramble text box
- Operation
- Enter secret message
- Push for mimicry button
- Read the story and send it. To recover the message, cut and paste the story into the Unscramble text box.
- Push the Remove Mimicry button
25. Mimicry Applet by PETER WAYNER: Demo
- Secret message: Test Wednesday
- Story
- Well Bob, Welcome to yet another game between the
Whappers and the Blogs here in scenic downtown
Blovonia . I think it is fair to say that there
is plenty of Blog Fever brewing in the stands as
the hometown comes out to root for its favorites.
It's a fine day for a game. Another new inning .
Ain't life great, Bob ? Nobody out yet . Now, Sal
Sauvignon swings the baseball bat to stretch and
enters the batter's box . The pitchers is winding
up to throw. No wood on that one . He's winding
up . What a toaster . No good. Definitely a ball
. Checks first base . Nothing. Winds up and
pitches a curve ball . He bounces one of the
ground into the first-baseman's glove . The
Whappers have only one out . Now, Sal Sauvignon
swings the baseball bat to stretch and enters the
batter's box . Here we go. Checks first base .
Nothing. Winds up and pitches a bouncing -
26. How does the mimicry work?
- The mimic computation starts with a collection of words and a set of rules for joining the words together. These are often called a "grammar".
- For instance, it might include words like "Hello", "My name is ", "How are you?", "Larry", "Moe", "Curly". The information would be encoded by stringing the words together in a phrase.
- For instance, "Hello, my name is Larry." would hide the message 1, while "Hello, my name is Curly." would hide the message 3. The choice of the name hid the information.
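The Larry/Moe/Curly example above as a tiny Python sketch (names `NAMES`, `hide` and `reveal` are hypothetical). The sentence template stays fixed; only the choice of name carries the message, and the receiver recovers it by checking which name appears.

```python
NAMES = ["Larry", "Moe", "Curly"]   # messages 1, 2, 3, as on the slide

def hide(message):
    """Hide a number from 1 to 3 by choosing which name fills the slot."""
    return f"Hello, my name is {NAMES[message - 1]}."

def reveal(sentence):
    """Recover the number from whichever name appears in the sentence."""
    for number, name in enumerate(NAMES, start=1):
        if name in sentence:
            return number
    raise ValueError("no known name found")

print(hide(3))                              # Hello, my name is Curly.
print(reveal("Hello, my name is Larry."))   # 1
```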
27. How is the message turned into choices?
- 26 letters, plus space? Punctuation? Digits?
- How many bits are needed to represent each character?
- Then develop an encoding
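A sketch of one possible answer, assuming an alphabet of 26 letters plus space (punctuation and digits left out) with space as code 0 and a to z as 1 to 26; the names `ALPHABET` and `to_bits` are hypothetical. Twenty-seven symbols need ceil(log2 27) = 5 bits each, and this mapping reproduces the "Help" encoding on the Pulling it all together slide.

```python
from math import ceil, log2

ALPHABET = " abcdefghijklmnopqrstuvwxyz"       # space = 0, a = 1, ..., z = 26 (assumed)
bits_needed = ceil(log2(len(ALPHABET)))        # 27 symbols -> 5 bits

def to_bits(text):
    """Map each character to its index in ALPHABET, written as a fixed-width group."""
    return " ".join(format(ALPHABET.index(ch), f"0{bits_needed}b") for ch in text.lower())

print(bits_needed)       # 5
print(to_bits("Help"))   # 01000 00101 01100 10000
```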
28. Grammar
- WeatherComment
- Hmm . Do you think it will rain ? /.1/
- What are the chances of rain today ? /.1/
- Nice weather as long as it doesn't rain . /.1/
- Well, if rain breaks out it will certainly change things . /.1/
- You can really tell the mettle of a manager when rain is threatened . /.1//
- Announcer Bob /.1/ Ted /.1/ Mike /.1/ Rich /.
- DumbComment
- Some kind of Ballplayer, huh ? /.1/
- These guys came to play ball ./.1/
- What a game so far today ./.1/
- How about those players ./.1/
- Got to love baseball ./.1/
- Hey, they're playing the organ ./.1//
29. Pulling it all together (Text Mimicry)
- Encode message in binary.
- Help → 8-5-12-16
- 01000 00101 01100 10000
- Generate choices in story to match bit stream
- 1st choice of two → 0
- 2nd choice of two → 1
- 1st choice of four → 00
- 2nd choice of four → 01
- 3rd choice of four → 10
- 4th choice of four → 11
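A minimal sketch of the choice-to-bits mapping above, using a made-up toy grammar rather than Wayner's actual baseball grammar (the phrases in `SLOTS` and all function names are hypothetical). A slot with two phrases consumes one bit and a slot with four consumes two, exactly as in the table. It assumes both sides share the grammar and the message length, and that no phrase in a slot is a prefix of another, so the receiver can tell which phrase was used.

```python
# Hypothetical toy grammar: each slot offers 2 or 4 interchangeable phrases.
SLOTS = [
    ["Well Bob,", "Hey Ted,"],
    ["nice day for a game.", "looks like rain.", "great crowd today.", "hot one out here."],
    ["Here's the pitch.", "He winds up."],
    ["Strike!", "Ball.", "Foul ball.", "Base hit!"],
]

def bits_per(slot):
    """A slot with 2**k phrases carries k bits (sizes must be powers of two)."""
    return len(slot).bit_length() - 1

def hide_bits(bits):
    """Cycle through the slots, letting the next k bits pick each phrase."""
    phrases, pos = [], 0
    while pos < len(bits):
        for slot in SLOTS:
            k = bits_per(slot)
            chunk = bits[pos:pos + k].ljust(k, "0")   # pad the final chunk with 0s
            phrases.append(slot[int(chunk, 2)])
            pos += k
            if pos >= len(bits):
                break
    return " ".join(phrases)

def recover_bits(text, n_bits):
    """Walk the same slots, match which phrase was used, and read its index back."""
    bits, rest = "", text
    while len(bits) < n_bits:
        for slot in SLOTS:
            k = bits_per(slot)
            idx = next(i for i, p in enumerate(slot) if rest.startswith(p))
            bits += format(idx, f"0{k}b")
            rest = rest[len(slot[idx]):].lstrip()
            if len(bits) >= n_bits:
                break
    return bits[:n_bits]

secret = "01000001010110010000"   # "Help" as 5-bit codes 8-5-12-16
story = hide_bits(secret)
print(story)
print(recover_bits(story, len(secret)) == secret)   # True
```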
30. Data Compression
- If the entropy of a file (its total information content in bits) is less than its size in bits, then we can compress it without losing information.
- Of course, we need to be able to uncompress it when we need the data.
- Lossless compression
- Loses no data
- Uncompression is perfect
- Used for text and medical images
- Lossy compression
- Loses information
- Usually done with images and sound
- The goal is to compress, but not affect perception
- If you can't see the difference, what's the big deal!
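A quick illustration of lossless compression using Python's standard `zlib` module (DEFLATE, which combines LZ77-style dictionary matching with Huffman coding). The two inputs are made up for the demo: a highly repetitive, low-entropy string shrinks dramatically, pseudo-random bytes (close to 8 bits of entropy per byte) do not shrink, and in both cases decompression restores the data exactly.

```python
import random
import zlib

low_entropy = b"ab" * 500                                       # 1000 very predictable bytes
rng = random.Random(0)
high_entropy = bytes(rng.getrandbits(8) for _ in range(1000))   # 1000 pseudo-random bytes

for name, data in [("low entropy", low_entropy), ("high entropy", high_entropy)]:
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data    # lossless: the round trip is exact
    print(name, len(data), "->", len(packed), "bytes")
```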
31. Approaches to compression
- Probabilistic methods
- More frequently occurring letters are represented by shorter strings
- Morse code: e is a single dot, while p, which occurs less frequently, is represented by dot-dash-dash-dot
- Huffman codes
- Dictionary methods
- Compile a list of the most frequently occurring words or collections of bytes in a file and then number them
- Lempel-Ziv compression
- Wave methods
- Compress the wave expansion: JPEG, JPEG2000, MPEG
- Fractal methods
32. Huffman Codes
- Wayner, Tables 5.1 and 5.2, pp. 72-73
- Huffman tree: Figure 5.2, p. 76
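A compact Python sketch of Huffman coding, independent of the tables in Wayner's book (the function name and the sample word are hypothetical). It uses the standard `heapq` module to repeatedly merge the two least frequent subtrees, prefixing 0 to codes on one side and 1 on the other, so frequent symbols end up with short codes and no code is a prefix of another.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {symbol: bit string} built by repeatedly merging the two
    least frequent subtrees (frequent symbols end up with short codes)."""
    freq = Counter(text)
    if len(freq) == 1:                       # one distinct symbol: give it "0"
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, tie_breaker, {symbol: partial code})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits instead of", 8 * len(text))   # 23 bits instead of 88
```

The integer tie-breaker in each heap entry keeps `heapq` from ever comparing two dictionaries when weights are equal.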
33. Steganography in images
- Pixel: picture element
- Each pixel has a red-green-blue (RGB) intensity triple
- Greyscale: intensity 0 to 255
- 0 = 0000 0000 → black
- 255 = 1111 1111 → white
- So for any pixel, drop the least significant bit(s) and replace them with coded information
- 237 = 1110 1101 → 1110 110x, where x is the bit we are encoding
- Replace least significant bits in images with encoded information.
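A minimal sketch of least-significant-bit embedding on a flat list of 8-bit greyscale values; the pixel values, message bits and function names are all made up for illustration, and a real tool would read and write an actual image file. Clearing bit 0 and setting it to the message bit changes each pixel by at most 1, which is why the change is hard to see; 237 becomes 236 exactly as in the 1110 110x example above.

```python
def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of each 8-bit pixel with one message bit."""
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0b11111110) | bit   # clear bit 0, then set it
    return stego

def extract_lsb(pixels, n_bits):
    """Read the message back from the low bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [237, 120, 64, 255, 3, 18, 200, 99]   # hypothetical greyscale pixel values
message = [0, 1, 1, 0, 1, 0, 0, 1]
stego = embed_lsb(cover, message)
print(stego)                   # [236, 121, 65, 254, 3, 18, 200, 99]
print(extract_lsb(stego, 8))   # [0, 1, 1, 0, 1, 0, 0, 1]
```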