1
Digital Forensics
  • Dr. Bhavani Thuraisingham
  • The University of Texas at Dallas
  • Lecture 27
  • Evidence Correlation
  • October 31, 2007

2
Outline
  • Review of Lecture 26
  • Discussion of the papers on Evidence Correlation

3
Review of Lecture 26
  • FORZA - Digital forensics investigation framework that incorporate legal issues
  • http://dfrws.org/2006/proceedings/4-Ieong.pdf
  • A cyber forensics ontology: Creating a new approach to studying cyber forensics
  • http://dfrws.org/2006/proceedings/5-Brinson.pdf
  • Arriving at an anti-forensics consensus: Examining how to define and control the anti-forensics problem
  • http://dfrws.org/2006/proceedings/6-Harris.pdf

4
Papers to discuss
  • Forensic feature extraction and cross-drive analysis
  • http://dfrws.org/2006/proceedings/10-Garfinkel.pdf
  • md5bloom: Forensic file system hashing revisited (OPTIONAL)
  • http://dfrws.org/2006/proceedings/11-Roussev.pdf
  • Identifying almost identical files using context triggered piecewise hashing (OPTIONAL)
  • http://dfrws.org/2006/proceedings/12-Kornblum.pdf
  • A correlation method for establishing provenance of timestamps in digital evidence
  • http://dfrws.org/2006/proceedings/13-20Schatz.pdf

5
Abstract of Paper 1
  • This paper introduces Forensic Feature Extraction (FFE) and Cross-Drive Analysis (CDA), two new approaches for analyzing large data sets of disk images and other forensic data. FFE uses a variety of lexigraphic techniques for extracting information from bulk data; CDA uses statistical techniques for correlating this information within a single disk image and across multiple disk images. An architecture for these techniques is presented that consists of five discrete steps: imaging, feature extraction, first-order cross-drive analysis, cross-drive correlation, and report generation. CDA was used to analyze 750 images of drives acquired on the secondary market; it automatically identified drives containing a high concentration of confidential financial records as well as clusters of drives that came from the same organization. FFE and CDA are promising techniques for prioritizing work and automatically identifying members of social networks under investigation. The authors believe it is likely to have other uses as well.

6
Outline
  • Introduction
  • Forensics Feature Extraction
  • Single Drive Analysis
  • Cross drive analysis
  • Implementation
  • Directions

7
Introduction: Why?
  • Improper prioritization. In these days of cheap storage and fast computers, the critical resource to be optimized is the attention of the examiner or analyst. Today, work is not prioritized based on the information that the drive contains.
  • Lost opportunities for data correlation. Because each drive is examined independently, there is no opportunity to automatically "connect the dots" on a large case involving multiple storage devices.
  • Improper emphasis on document recovery. Because today's forensic tools are based on document recovery, they have taught examiners, analysts, and customers to be primarily concerned with obtaining documents.

8
Feature Extraction
  • An email address extractor, which can recognize
    RFC822- style email addresses.
  • An email Message-ID extractor.
  • An email Subject extractor.
  • A Date extractor, which can extract date and
    time stamps in a variety of formats.
  • A cookie extractor, which can identify cookies
    from the Set-Cookie header in web page cache
    files.
  • A US social security number extractor, which identifies the patterns ###-##-#### and ######### when preceded with the letters SSN and an optional colon (see the regex sketch after this list).
  • A Credit card number extractor.
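  • A minimal sketch of how such extractors can be written as regular expressions run over bulk data (Python). The patterns and function below are simplified stand-ins for illustration, not the paper's actual extractors.

    import re

    # Simplified patterns, assumed for illustration; the real extractors
    # handle many more encodings and edge cases.
    EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
    SSN_RE = re.compile(rb"SSN:?\s*(\d{3}-\d{2}-\d{4}|\d{9})")

    def extract_features(bulk_data: bytes) -> dict:
        """Scan raw bytes (e.g. strings extracted from a drive image) for features."""
        return {
            "email": EMAIL_RE.findall(bulk_data),
            "ssn": SSN_RE.findall(bulk_data),
        }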

9
Single Drive analysis
  • Extracted features can be used to speed initial
    analysis and answer specific questions about a
    drive image.
  • The authors have successfully used extracted features for drive image attribution and to build a tool that scans disks to report the likely existence of information that should have been destroyed under the Fair and Accurate Credit Transactions Act.
  • Drive attribution: an analyst might encounter a hard drive and wish to determine to whom that drive previously belonged. For example, the drive might have been purchased on eBay and the analyst might be attempting to return it to its previous owner.
  • A powerful technique for making this determination is to create a histogram of the email addresses on the drive (as returned by the email address feature extractor), as sketched below.
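  • A minimal sketch of that histogram step (Python), assuming the email feature extractor has already produced a list of addresses; the function name is illustrative.

    from collections import Counter

    def email_histogram(addresses, top=10):
        """Rank email addresses by frequency; the most frequent addresses
        typically belong to the drive's previous primary user."""
        return Counter(a.lower() for a in addresses).most_common(top)

    # Example with hypothetical data:
    # email_histogram(["alice@example.com", "alice@example.com", "bob@example.org"])
    # -> [("alice@example.com", 2), ("bob@example.org", 1)]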

10
Cross drive analysis (CDA)
  • Cross-drive analysis is the term coined to describe forensic analysis of a data set that spans multiple drives.
  • The fundamental theory of cross-drive analysis is that data gleaned from multiple drives can improve the forensic analysis of a drive in question, both in the case when the multiple drives are related to the drive in question and in the case when they are not.
  • There are two forms of CDA: first-order, in which the results of a feature extractor are compared across multiple drives, an O(n) operation; and second-order, where the results are correlated across drives, an O(n²) operation (see the sketch below).
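  • A minimal sketch of the second-order correlation (Python), assuming each drive's extracted features have been collected into a set; the names are illustrative, not the paper's implementation. Drive pairs with high scores likely came from the same organization or social network.

    from itertools import combinations

    def second_order_cda(features_by_drive):
        """Score every pair of drives by the number of extracted features
        (e.g. email addresses) they share: the O(n^2) correlation step."""
        scores = {}
        for d1, d2 in combinations(sorted(features_by_drive), 2):
            scores[(d1, d2)] = len(features_by_drive[d1] & features_by_drive[d2])
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)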

11
Implementation
  • 1. Disks collected are imaged into a single AFF file. (AFF is the Advanced Forensic Format, a file format for disk images that contains all of the data accession information, such as the drive's manufacturer and serial number, as well as the disk contents.)
  • 2. The afxml program is used to extract drive
    metadata from the AFF file and build an entry in
    the SQL database.
  • 3. Strings are extracted with an AFF-aware
    program in three passes, one for 8-bit
    characters, one for 16-bit characters in lsb
    format, and one for 16-bit characters in msb
    format.
  • 4. Feature extractors run over the string files
    and write their results to feature files.
  • 5. Extracted features from newly-ingested drives are run against a watch list; hits are reported to the human operator (a minimal sketch follows this list).
  • 6. The feature files are read by indexers, which
    build indexes in the SQL server of the identified
    features.
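  • A minimal sketch of steps 4-5 (Python), assuming each feature extractor writes one feature per line to a plain-text feature file; the file layout and names are assumptions for illustration.

    from pathlib import Path

    def load_features(path):
        """Read a feature file: one extracted feature per line."""
        return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}

    def watch_list_hits(feature_file, watch_list_file):
        """Report features from a newly ingested drive that appear on the watch list."""
        return sorted(load_features(feature_file) & load_features(watch_list_file))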

12
Implementation
  • 7. A multi-drive correlation is run to see if the
    newly accessioned drive contained features in
    common with any drives that are on a drive watch
    list.
  • 8. A user interface allows multiple analysts to
    simultaneously interact with the database, to
    schedule new correlations to be run in a batch
    mode, or to view individual sectors or recovered
    files from the drive images that are stored on
    the file server.

13
Directions
  • Improve feature extraction
  • Improve the algorithms
  • Develop end-to-end systems

14
Abstract of Paper 2
  • Hashing is a fundamental tool in digital forensic analysis, used both to ensure data integrity and to efficiently identify known data objects. The authors' objective is to leverage advanced hashing techniques in order to improve the efficiency and scalability of digital forensic analysis. They explore the use of Bloom filters as a means to efficiently aggregate and search hashing information. They present md5bloom, a Bloom filter manipulation tool that can be incorporated into forensic practice, along with example uses and experimental results.

15
Outline
  • Introduction
  • Bloom filter
  • Applications
  • Directions

16
Introduction
  • The goal is to pick from a set of forensic images
    the one(s) that are most like (or perhaps most
    unlike) a particular target.
  • This problem comes up in a number of different
    variations, such as comparing the target with
    previous/related cases, or determining the
    relationships among targets in a larger
    investigation.
  • The goal is to get a high-level picture that will
    guide the following in-depth inquiry.
  • Already existing problems of scale in digital forensic tools are further multiplied by the number of targets, which explains why in other forensic areas comparison with other cases is routine and massive, whereas in digital forensics it is the exception.
  • An example is object versioning detection: the need to detect a particular version of an object and not just the target object.

17
Introduction
  • What needs to be addressed is a way to store a set of hashes representing the different components of a composite object, as opposed to a single hash.
  • For example, hashing the individual routines of
    libraries or executables would enable
    fine-grained detection of changes (e.g. only a
    fraction of the code changes from version to
    version).
  • The problem is that storing more hashes presents
    a scalability problem even for targets of modest
    sizes.
  • Therefore, authors propose the use of Bloom
    filters as an efficient way to store and query
    large sets of hashes.

18
Bloom Filters
  • A Bloom filter B is a representation of a set S = {s1, ..., sn} of n elements from a universe (of possible values) U. The filter consists of an array of m bits, initially all set to 0.
  • The ratio r = m/n is a key design element and is usually fixed for a particular application.
  • To represent the set elements, the filter uses k independent hash functions h1, ..., hk, each with range {0, ..., m-1}. All hash functions are assumed to be independent and to map elements from U uniformly over the range of the function.
  • md5bloom: the authors have a prototype stream-oriented Bloom filter implementation called md5bloom (a minimal sketch of the underlying data structure follows).
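  • A minimal sketch of a Bloom filter keyed by MD5 (Python), in the spirit of md5bloom; the parameters and the way k positions are derived from one digest are assumptions for illustration, not the tool's actual design. Queries can return false positives but never false negatives, which is why the ratio r = m/n and the choice of k are the key design parameters.

    import hashlib

    class BloomFilter:
        def __init__(self, m_bits=2**20, k=4):
            self.m = m_bits                     # filter size in bits
            self.k = k                          # number of hash functions (at most 4 here)
            self.bits = bytearray(m_bits // 8)

        def _positions(self, item: bytes):
            # Derive k bit positions by slicing one MD5 digest into 4-byte chunks
            # (an illustrative choice; md5bloom's exact scheme may differ).
            digest = hashlib.md5(item).digest()
            for i in range(self.k):
                yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

        def add(self, item: bytes):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item: bytes):
            return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))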

19
Application of Bloom Filter in Security
  • Spafford (1992) was one of the first to use Bloom filters to support computer security.
  • The OPUS system by Spafford uses a Bloom filter
    which efficiently encodes a wordlist containing
    poor password choices to help users choose strong
    passwords.
  • Bellovin and Cheswick present a scheme for
    selectively sharing data while maintaining
    privacy. Through the use of encrypted Bloom
    filters, they allow parties to perform searches
against each other's document sets without
    revealing the specific details of the queries.
    The system supports query restrictions to limit
    the set of allowed queries.
  • Aguilera et al. discuss the use of Bloom filters
    to enhance security in a network-attached disks
    (NADs) infrastructure.
  • The authors use Bloom filtering to detect hash tampering.

20
Directions
  • Cryptography is a key application for detecting evidence tampering
  • Bloom filters are one technique for detecting hash tampering
  • Need to compare different cryptographic algorithms
  • The relationship to correlation needs to be determined

21
Abstract of Paper 3
  • Homologous files share identical sets of bits in
    the same order. Because such files are not
    completely identical, traditional techniques such
    as cryptographic hashing cannot be used to
    identify them. This paper introduces a new
    technique for constructing hash signatures by
    combining a number of traditional hashes whose
    boundaries are determined by the context of the
    input. These signatures can be used to identify
    modified versions of known files even if data has
    been inserted, modified, or deleted in the new
    files. The description of this method is followed
    by a brief analysis of its performance and some
    sample applications to computer forensics.

22
Outline
  • Introduction
  • Piecewise hashing
  • Spamsum algorithms
  • Directions

23
Introduction
  • This paper describes a method for using a context
    triggered rolling hash in combination with a
    traditional hashing algorithm to identify known
    files that have had data inserted, modified, or
    deleted.
  • First, they examine how cryptographic hashes are
    currently used by forensic examiners to identify
    known files and what weaknesses exist with such
    hashes.
  • Next, the concept of piecewise hashing is
    introduced.
  • Finally, a rolling hash algorithm is described that produces a pseudo-random output based only on the current context of the input.
  • By using the rolling hash to set the boundaries
    for the traditional piecewise hashes, authors
    create a Context Triggered Piecewise Hash (CTPH).

24
Piecewise hashing
  • Piecewise hashing uses an arbitrary hashing algorithm to create many checksums for a file instead of just one. Rather than generating a single hash for the entire file, a hash is generated for many discrete fixed-size segments of the file. For example, one hash is generated for the first 512 bytes of input, another hash for the next 512 bytes, and so on.
  • A rolling hash algorithm produces a pseudo-random
    value based only on the current context of the
    input. The rolling hash works by maintaining a
    state based solely on the last few bytes from the
    input. Each byte is added to the state as it is
    processed and removed from the state after a set
    number of other bytes have been processed.
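  • A minimal sketch combining the two ideas (Python), with a simplified additive rolling hash and MD5 segment hashes; the window size, trigger value, and truncation are assumptions for illustration, not Kornblum's exact CTPH algorithm. Because segment boundaries depend only on local context, an insertion or deletion changes only nearby segment hashes and leaves most of the signature intact.

    import hashlib

    WINDOW = 7      # rolling-hash window size (illustrative)
    TRIGGER = 128   # boundary trigger (illustrative; spamsum derives a block size from the file length)

    def rolling_values(data: bytes):
        """Yield a simple rolling hash over the last WINDOW bytes of input."""
        window, h = [], 0
        for b in data:
            window.append(b)
            h += b
            if len(window) > WINDOW:
                h -= window.pop(0)
            yield h

    def ctph(data: bytes):
        """Context-triggered piecewise hashing: start a new segment whenever
        the rolling hash hits the trigger, then hash each segment conventionally."""
        pieces, start = [], 0
        for i, h in enumerate(rolling_values(data)):
            if h % TRIGGER == TRIGGER - 1:
                pieces.append(hashlib.md5(data[start:i + 1]).hexdigest()[:12])
                start = i + 1
        pieces.append(hashlib.md5(data[start:]).hexdigest()[:12])
        return pieces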

25
Spamsum
  • Spamsum, an email spam detection tool written by Dr. Andrew Tridgell, can identify emails that are similar but not identical to samples of known spam. The spamsum algorithm was in turn based upon the rsync checksum, also by Dr. Tridgell.
  • The spamsum algorithm uses FNV hashes for the traditional hashes, which produce a 32-bit output for any input. In spamsum, Dr. Tridgell further reduced the FNV hash by recording only a base64 encoding of the six least significant bits (LS6B) of each hash value.
  • The algorithm for the rolling hash was inspired by the Adler32 checksum.
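  • A minimal sketch of that reduction (Python), using the 32-bit FNV-1a variant as an assumed stand-in for spamsum's FNV hash; each piecewise hash contributes one base64 character to the final signature.

    import string

    BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

    def fnv1a_32(data: bytes) -> int:
        """32-bit FNV-1a hash (offset basis 0x811C9DC5, prime 0x01000193)."""
        h = 0x811C9DC5
        for b in data:
            h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
        return h

    def ls6b_char(segment: bytes) -> str:
        """Encode the six least significant bits of the segment's hash as one
        base64 character, mirroring spamsum's per-segment reduction."""
        return BASE64_ALPHABET[fnv1a_32(segment) & 0x3F]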

26
Directions
  • Many applications in altered document matching and partial file matching
  • Improvement to hash algorithms
  • Performance studies

27
Abstract of Paper 4
  • Establishing the time at which a particular event
    happened is a fundamental concern when relating
    cause and effect in any forensic investigation.
    Reliance on computer generated timestamps for
    correlating events is complicated by uncertainty
    as to clock skew and drift, environmental factors
    such as location and local time zone offsets, as
    well as human factors such as clock tampering.
Establishing that a particular computer's temporal behavior was consistent during its operation remains a challenge. The contributions of this paper are both a description of assumptions commonly made regarding the behavior of clocks in computers, and empirical results demonstrating that real world behavior diverges from the idealized or assumed behavior. The authors
    present an approach for inferring the temporal
    behavior of a particular computer over a range of
    time by correlating commonly available local
    machine timestamps with another source of
    timestamps. We show that a general
    characterization of the passage of time may be
    inferred from an analysis of commonly available
    browser records.

28
Outline
  • Introduction
  • Factors to consider
  • Drifting clocks
  • Identifying computer timescales by correlation
    with corroborating sources
  • Directions

29
Introduction
  • Timestamps are increasingly used to relate events
    which happen in the digital realm to each other
    and to events which happen in the physical realm,
    helping to establish cause and effect.
  • A difficulty with timestamps is how to interpret and relate the timestamps generated by separate computer clocks when they are not synchronized.
  • Current approaches to inferring the real-world interpretation of timestamps assume idealized models of computer clocks.
  • There is uncertainty about the behavior of the suspect's computer clock before seizure.
  • The authors explore two themes related to this uncertainty.
  • They investigate whether it is reasonable to assume uniform behavior of computer clocks over time, and test these assumptions by attempting to characterize how computer clocks behave in the wild.
  • They investigate the feasibility of automatically identifying the local time on a computer by correlating timestamps embedded in digital evidence with corroborative time sources.

30
Factors
  • Computer timekeeping
  • Real-time synchronization
  • Factors affecting timekeeping accuracy
  • Clock configuration
  • Tampering
  • Synchronization protocol
  • Misinterpretation
  • Usage of timestamps in forensics

31
Drifting clocks behavior
  • The authors enumerate the main factors influencing the temporal behavior of a computer's clock, and then attempt to experimentally validate whether one can make informed assumptions about such behavior.
  • They do this by empirically studying the temporal behavior of a network of computers found in the wild.
  • The subject of the case study is a network of machines in active use by a small business. The network is a Windows 2000 domain consisting of one Windows 2000 server and a mix of Windows XP and 2000 workstations.
  • The goal is to observe the temporal behavior. In order to observe this behavior, the authors constructed a simple service that logs both the system time of a host computer and the civil time for the location (a minimal sketch follows).
  • The program samples both sources of time and logs the results to a file. The logging program was deployed on all workstations and the server.
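  • A minimal sketch of such a logging service (Python), assuming an SNTP query to a public server (pool.ntp.org) stands in for the civil-time reference; the paper does not specify this implementation.

    import csv
    import socket
    import struct
    import time

    NTP_DELTA = 2208988800  # seconds between the NTP epoch (1900) and the Unix epoch (1970)

    def reference_time(server="pool.ntp.org"):
        """Fetch a reference (civil) timestamp via a minimal SNTP query."""
        packet = b"\x1b" + 47 * b"\x00"   # LI=0, VN=3, Mode=3 (client request)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(5)
            s.sendto(packet, (server, 123))
            data, _ = s.recvfrom(48)
        secs = struct.unpack("!I", data[40:44])[0]   # transmit timestamp, integer seconds
        return secs - NTP_DELTA

    def log_clock_sample(path="clock_log.csv"):
        """Append one (system time, reference time) sample to the log file."""
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([time.time(), reference_time()])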

32
Correlation
  • An automated approach correlates time-stamped events found on a suspect computer with time-stamped events from a more reliable, corroborating source.
  • Web browser records are increasingly employed as
    evidence in investigations, and are a rich source
    of time stamped data.
  • The techniques implemented are a clickstream correlation algorithm and a non-cached correlation algorithm (a simplified offset-estimation sketch follows).
  • The authors compare the results of both algorithms.
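  • A minimal, simplified sketch of the underlying idea (Python), assuming events on the suspect machine have already been matched to corroborating events (e.g. cached pages matched to server timestamps); this is not the paper's clickstream or non-cached algorithm.

    import statistics

    def estimate_clock_offset(matched_events):
        """Estimate the suspect clock's offset as the median difference between
        local and corroborating timestamps for matched events (in seconds)."""
        offsets = [local - reference for local, reference in matched_events]
        return statistics.median(offsets)

    # Example with hypothetical data: a clock running roughly one hour fast.
    # estimate_clock_offset([(7200.0, 3600.0), (7305.0, 3704.0), (7450.0, 3851.0)])
    # -> 3600.0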

33
Directions
  • Need to determine whether the conditions and the
    assumptions of the experiments are realistic
  • What are the most appropriate correlation
    algorithms?
  • Need to integrate with clock synchronization
    algorithms