Dyninst Object Serialization/Deserialization II
University of Maryland

1
Dyninst Object Serialization/Deserialization II
  • Binary for performance, XML for interoperability

2
Review: Why XML Serialization?
  • Create standardized representations for
    • Basic symbol table information
    • Abstract program objects
      • Functions, loops, blocks
    • More complex binary analyses
      • CFG, data slicing, etc.
  • Exports Dyninst's expertise for easy use by
    • Other tools
    • Interfacing with the textual world
      • Parse-able snapshots of programs
      • Cross-platform aggregation of results
  • Allows Dyninst to use output from other tools in
    its own analyses
    • Other tools may perform different and/or richer
      analyses that would be valuable for Dyninst

3
Review: Why Binary Serialization?
  • Large binaries
    • We've had reports of existing Dyninst analyses
      taking a prohibitively long time for large
      binaries (100s of MB)
    • E.g., full CFG analysis of large statically linked
      scientific simulations
  • More complex analyses are in the works
    • Dyninst continues to offer newer and more
      expensive-to-compute features
      • Control flow graphs
      • Data slicing
      • Stripped binary analysis
  • Complex tools that use these analyses may find
    them cost-prohibitive
    • If they have to be re-performed every time the
      tool is run
  • Why not just save them?

4
Serialization policy
  • Binary serialization should be transparent
    • User-controlled on/off switch (environment variable)
  • Granularity
    • One binary cache file per library / executable
    • Per logical sub-library of Dyninst
  • Checksum-based cache invalidation
    • Rebuild the cache for a given binary when the binary
      changes
    • Example: libc is large and expensive to fully
      analyze, but it seldom changes
  • Two-phase strategy for by-default vs. on-demand state
    • (1) Bulk serialization of Dyninst's
      existing-by-default internal state
      • Straightforward structured I/O
    • (2) Incremental serialization of Dyninst's
      existing-on-demand internal state
      • Somewhat trickier
      • No specific orderings allowed
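The on/off switch and checksum-based invalidation can be sketched in a few lines of C++. This is a minimal illustration under assumptions: the environment variable name `DYNINST_CACHE_EXAMPLE` is made up (the slides only say "environment variable"), 64-bit FNV-1a stands in for whatever checksum Dyninst actually uses, and `CacheHeader` / `mustRebuild` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical variable name: the slides only say the on/off switch is an
// environment variable.
const char *const kCacheSwitchVar = "DYNINST_CACHE_EXAMPLE";

// Serialization stays off unless the user explicitly sets the switch to "on".
bool serializationEnabled() {
    const char *value = std::getenv(kCacheSwitchVar);
    return value != nullptr && std::string(value) == "on";
}

// 64-bit FNV-1a over the binary's bytes, standing in for whatever checksum a
// real implementation would use.
std::uint64_t checksum(const std::vector<unsigned char> &bytes) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char b : bytes) {
        hash ^= b;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Hypothetical cache-file header: the checksum of the binary the cache was
// built from.
struct CacheHeader {
    std::uint64_t sourceChecksum;
};

// Reuse the cache only if the checksum still matches (i.e. the binary is
// unchanged, modulo hash collisions); otherwise redo the analysis.
bool mustRebuild(const CacheHeader &header,
                 const std::vector<unsigned char> &binaryBytes) {
    return header.sourceChecksum != checksum(binaryBytes);
}
```

A tool would checksum the binary at startup, compare against the value stored in that binary's cache file, and rebuild only on a mismatch; for a rarely changing library such as libc, the stored analysis is reused on almost every run.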

5
Phase 1: Bulk Structured I/O
  • Multiple types of serialization can share the
    same infrastructure
    • Binary and XML
  • Leverage C++ and the Dyninst class hierarchy
  • Keep the serialization/deserialization process as
    extensible as possible
    • Add new types of output down the road?
  • Desired behavior
    • serialize(filename, HierarchyRootNode,
      Translator)
      • Serialize hierarchy into <filename>
    • Traverse hierarchy in a (somewhat) generic manner
    • Translator uses overloaded virtual translation
      functions that can be specialized as needed
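The `serialize(filename, HierarchyRootNode, Translator)` shape can be sketched in C++ as below. Everything here (`TranslatorBase`, `Serializable`, `TraceTranslator`, `ToySymtab`) is a hypothetical illustration of the idea, not Dyninst's actual interface: each node describes its fields through overloaded virtual `out_val` calls, so the driver never needs to know which output format is in use.

```cpp
#include <string>
#include <vector>

// Illustrative interface only (names are assumptions, not Dyninst's API):
// one overloaded virtual out_val per primitive type, specialized per format.
class TranslatorBase {
public:
    virtual ~TranslatorBase() = default;
    virtual void open(const std::string &filename) = 0;
    virtual void close() = 0;
    virtual void out_val(const std::string &name, const std::string &v) = 0;
    virtual void out_val(const std::string &name, bool v) = 0;
};

// Every node in the class hierarchy describes its own fields to a translator.
class Serializable {
public:
    virtual ~Serializable() = default;
    virtual void serialize(TranslatorBase &t) const = 0;
};

// serialize(filename, HierarchyRootNode, Translator): the traversal is the
// same for every output format; only the translator object changes.
void serialize(const std::string &filename, const Serializable &root,
               TranslatorBase &translator) {
    translator.open(filename);
    root.serialize(translator);
    translator.close();
}

// A trivial translator that records calls instead of writing a file,
// handy for checking the traversal order.
class TraceTranslator : public TranslatorBase {
public:
    std::vector<std::string> log;
    void open(const std::string &f) override { log.push_back("open " + f); }
    void close() override { log.push_back("close"); }
    void out_val(const std::string &n, const std::string &v) override {
        log.push_back(n + "=" + v);
    }
    void out_val(const std::string &n, bool v) override {
        log.push_back(n + "=" + (v ? "y" : "n"));
    }
};

// Hypothetical root node with two scalar fields.
class ToySymtab : public Serializable {
public:
    std::string name;
    bool isAOut = false;
    void serialize(TranslatorBase &t) const override {
        t.out_val("name", name);
        t.out_val("isAOut", isAOut);
    }
};
```

One C++ pitfall with this overload set: passing a string literal directly, as in `t.out_val("name", "nm")`, selects the `bool` overload, because pointer-to-bool is a standard conversion while `const char *` to `std::string` is user-defined. Passing `std::string` values, as `ToySymtab` does, avoids it.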

6
Phase 1 Example Using SymtabAPI
[Figure: a Symtab object with child symbols func1, func2, ..., funcN and var1]

7
Phase 1 Example Using SymtabAPI
Serialize( symtab, toXML, f.xml )
Translator toXML
  • open (f.xml)
  • Start_symtab(f)
f.xml: <Symtab>
  • Open file
  • Write XML preamble

8
Phase 1 Example Using SymtabAPI
Serialize( symtab, toXML, f.xml )
Translator toXML
  • open (f.xml)
  • Start_symtab(f)
  • Out_val(fname)
  • Out_val(is_a_out)
f.xml: <Symtab> <name> nm </name> <isAOut> y </isAOut>
  • Write out object fields (scalar)
  • Translator can output all relevant types

9
Phase 1 Example Using SymtabAPI
Serialize( symtab, toXML, f.xml )
Translator toXML
  • open (f.xml)
  • Start_symtab(f)
  • Out_val(fname)
  • Out_val(is_a_out)
  • Out_vector(syms)
    • Foreach (syms): out_val(sym)
f.xml: <Symtab> <name> nm </name> <isAOut> y </isAOut>
  <SymbolList> <nsyms> N+1 </nsyms>
  <Symbol> <name> f1 </name> </Symbol>
  <Symbol> <name> v1 </name> </Symbol> </SymbolList>
  • Write out object fields (vector)
  • Helper functions take care of container classes

10
Phase 1 Example Using SymtabAPI
Serialize( symtab, toXML, f.xml )
Translator toXML
  • open (f.xml)
  • Start_symtab(f)
  • Out_val(fname)
  • Out_val(is_a_out)
  • Out_vector(syms)
    • Foreach (syms): out_val(sym)
  • End_symtab(f)
  • Close(f)
f.xml: <Symtab> <name> nm </name> <isAOut> y </isAOut>
  <SymbolList> <nsyms> N+1 </nsyms>
  <Symbol> <name> f1 </name> </Symbol>
  <Symbol> <name> v1 </name> </Symbol> </SymbolList> </Symtab>
  • Finish up, close file

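The XML build-up shown in the preceding SymtabAPI example slides can be reproduced with a small C++ sketch. `Symtab`, `Symbol`, and `XMLTranslator` are simplified, hypothetical stand-ins (not SymtabAPI's real classes); output goes to a string rather than `f.xml`, and values are not XML-escaped, which a real translator would have to do. Each call is commented with the slide step it mirrors.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for SymtabAPI objects.
struct Symbol {
    std::string name;
};
struct Symtab {
    std::string name;
    bool isAOut;
    std::vector<Symbol> syms;
};

// Writes to a string stream instead of a file; values are not escaped.
class XMLTranslator {
    std::ostringstream out_;
public:
    void preamble() { out_ << "<?xml version=\"1.0\"?>"; }
    void start(const std::string &tag) { out_ << "<" << tag << ">"; }
    void end(const std::string &tag) { out_ << "</" << tag << ">"; }
    void out_val(const std::string &tag, const std::string &v) {
        start(tag);
        out_ << v;
        end(tag);
    }
    void out_val(const std::string &tag, std::size_t v) {
        start(tag);
        out_ << v;
        end(tag);
    }
    void out_val(const std::string &tag, bool v) {
        out_val(tag, std::string(v ? "y" : "n"));
    }
    std::string str() const { return out_.str(); }
};

void serialize(const Symtab &st, XMLTranslator &t) {
    t.preamble();                          // open (f.xml): XML preamble
    t.start("Symtab");                     // Start_symtab(f)
    t.out_val("name", st.name);            // Out_val(fname)
    t.out_val("isAOut", st.isAOut);        // Out_val(is_a_out)
    t.start("SymbolList");                 // Out_vector(syms)
    t.out_val("nsyms", st.syms.size());
    for (const Symbol &s : st.syms) {      // Foreach (syms): out_val(sym)
        t.start("Symbol");
        t.out_val("name", s.name);
        t.end("Symbol");
    }
    t.end("SymbolList");
    t.end("Symtab");                       // End_symtab(f)
}
```

With a symbol table named `nm` holding symbols `f1` and `v1`, this produces the same nesting as the slide's `f.xml`, just without whitespace between tags.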
11
Phase 1 Example With Binary Output
Translator toXML
  • open (f.xml)
  • Start_symtab(f)
  • Out_val(fname)
  • Out_val(is_a_out)
  • Out_vector(syms)
    • Foreach (syms): out_val(sym)
  • End_symtab(f)
  • Close(f)
Translator toBin
  • open (f.bin)
  • Start_symtab(f)
  • Out_val(fname)
  • Out_val(is_a_out)
  • Out_vector(syms)
    • Foreach (syms): out_val(sym)
  • End_symtab(f)
  • Close(f)
The translator sequence is identical (at the highest
structural level).

12
Phase 1 Example With Binary Output
TranslatorBase
  • virtual out_val(name)
Translator toXML
  • open (f.xml)
  • Start_symtab(f)
  • Out_val(fname)
  • Output: <name> nameValue </name>
Translator toBin
  • open (f.bin)
  • Start_symtab(f)
  • Out_val(fname)
  • Output: 0x18 size 0xa3 data 0x11 0x37 . . .
Lowest-level data type outputs are specialized
per output format.
Higher-level outputs are generalized by default and
specialized as needed.

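A C++ sketch of this split, with assumed names and an assumed binary encoding (4-byte little-endian integers, length-prefixed strings): the primitive `out_val` overloads are pure virtual and specialized per format, while `out_vector` is written once in the base class in terms of those primitives. The encoding is illustrative only; it is not the layout behind the `0x18 size 0xa3 ...` bytes on the slide.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative design (names are assumptions): primitives are pure virtual
// and specialized per format; higher-level helpers live in the base.
class TranslatorBase {
public:
    virtual ~TranslatorBase() = default;
    virtual void out_val(const std::string &tag, std::uint32_t v) = 0;
    virtual void out_val(const std::string &tag, const std::string &v) = 0;

    // Generic container helper: element count first, then each element.
    // Virtual, so a format can still override it if it needs to.
    virtual void out_vector(const std::string &tag,
                            const std::vector<std::string> &items) {
        out_val(tag + "_count", static_cast<std::uint32_t>(items.size()));
        for (const std::string &item : items) out_val(tag, item);
    }
};

// Binary format: tags are dropped; integers are 4 bytes little-endian and
// strings are length-prefixed (an assumed encoding).
class BinTranslator : public TranslatorBase {
public:
    std::vector<unsigned char> bytes;
    void out_val(const std::string &, std::uint32_t v) override {
        for (int i = 0; i < 4; ++i)
            bytes.push_back(static_cast<unsigned char>(v >> (8 * i)));
    }
    void out_val(const std::string &tag, const std::string &v) override {
        out_val(tag, static_cast<std::uint32_t>(v.size()));
        bytes.insert(bytes.end(), v.begin(), v.end());
    }
};

// XML format: same calls, different rendering of the primitives.
class XMLTranslator : public TranslatorBase {
public:
    std::string text;
    void out_val(const std::string &tag, std::uint32_t v) override {
        text += "<" + tag + ">" + std::to_string(v) + "</" + tag + ">";
    }
    void out_val(const std::string &tag, const std::string &v) override {
        text += "<" + tag + ">" + v + "</" + tag + ">";
    }
};
```

Neither derived class re-implements `out_vector`; the binary and XML outputs differ only because the primitives they dispatch to differ, which is the "generalized by default, specialized as needed" rule.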
13
Speedup from Bulk Structured I/O
  • Results for SymtabAPI
    • Not exactly a real-world problem
    • Wanted to verify scaling characteristics under a
      controlled situation
  • Computer-generated programs
    • with identical characteristics
    • except for their symbols
  • Expect greater time savings with more complex
    analyses

14
Phase 2: On-Demand Analyses
  • Dyninst generates much of its internal state
    on demand, at the request of the API user
  • Phase 1 serialization is better suited to a known,
    fixed set of internal state
    • Existing by default
    • Still useful, but needs augmentation
  • Structural solution to on-demand data
    • Ideally want an automatic solution
      • Do an analysis, then
      • Serialization should happen transparently
  • Along come annotations
    • One idea for representing optional data
    • Perfect fit for the representation of on-demand
      analyses

15
Annotations Overview
  • Basic concepts
    • Create a runtime relationship between objects
      • "belonging to", "has a", "attribute of"
    • Annotations are typed and can be referenced by
      name
      • Annotate object A with B, as a "foo"
      • Get all "foo" from A
  • Scaling considerations
    • Must work at different extremes of scale
      • E.g., annotate Instruction with Register
        • High-density small data applied to many
          objects
      • E.g., annotate Function with Slicing
        • Low-density large data applied to fewer
          objects
    • Careful consideration of space/time requirements
      • Zero space requirement if no annotations
      • Fast, easy access to annotations

16
Annotations Details
  • Public interface
    • Annotate<class T, class S>(T *, S *, name_t)
      • Annotate a T with an S, using <name>
    • Vector<S> getAnnotations(T *, name_t)
      • Get all Ss attributed to T under <name>
  • Implementation
    • Template class wrappers provide type safety
    • Use one large static map
      • Map Annotatee <--> Map AnnotationID <-->
        vector<Annotation>
    • Auxiliary mapping for AnnotationName <-> ID
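A minimal C++ sketch of this design, assuming pointer-typed annotations and `std::string` names in place of `name_t`: one static map from annotatee to (annotation ID → vector of annotations), plus a name-to-ID table. Annotations are stored type-erased as `void *` and recovered only through the templated accessors. This is an illustration of the described structure, not Dyninst's implementation.

```cpp
#include <map>
#include <string>
#include <vector>

// Auxiliary mapping: annotation name <-> numeric ID. Looking up an unknown
// name assigns it a fresh ID.
class AnnotationNames {
public:
    static int idFor(const std::string &name) {
        std::map<std::string, int> &ids = table();
        auto it = ids.find(name);
        if (it != ids.end()) return it->second;
        int id = static_cast<int>(ids.size());
        ids.emplace(name, id);
        return id;
    }
private:
    static std::map<std::string, int> &table() {
        static std::map<std::string, int> t;
        return t;
    }
};

// One large static map: annotatee -> (annotation ID -> annotations).
// Objects carry no annotation storage themselves.
using AnnotationMap =
    std::map<const void *, std::map<int, std::vector<void *>>>;

inline AnnotationMap &annotationStore() {
    static AnnotationMap store;
    return store;
}

// Type-safe wrappers. This sketch assumes each name is always used with the
// same annotation type S; mixing types under one name would be unsafe.
template <class T, class S>
void Annotate(T *annotatee, S *annotation, const std::string &name) {
    annotationStore()[annotatee][AnnotationNames::idFor(name)].push_back(
        annotation);
}

template <class S, class T>
std::vector<S *> getAnnotations(T *annotatee, const std::string &name) {
    std::vector<S *> result;
    auto obj = annotationStore().find(annotatee);
    if (obj == annotationStore().end()) return result;  // never annotated
    auto slot = obj->second.find(AnnotationNames::idFor(name));
    if (slot == obj->second.end()) return result;
    for (void *p : slot->second) result.push_back(static_cast<S *>(p));
    return result;
}
```

Unannotated objects never appear in the map, so they pay no space cost, matching the "zero space requirement if no annotations" goal; the trade-off is a map lookup on every access.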

17
Serializing Annotations
  • Basic parameters
    • Not all annotations will be serialized
      • Does not make sense for all cases
    • Use a boolean template parameter to control the
      annotation serialization policy
      • E.g., class A : public Annotatable<B, true>
  • Serialization is structural
    • Performed when the annotation is added
  • Serialization parameters for an annotation
    • Just enough information to reconstruct
      • Annotatee ID
        • The "this" pointer suffices
      • Annotation name
        • Annotation type is determined by name
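The policy switch can be sketched with a `bool` template parameter on a two-parameter `Annotatable<Anno, bool>`, matching the `Annotatable<B, true>` form above. Assumptions: the stream returned by `annotationOutput()` stands in for `f.bin`, the text record layout is invented for readability, and storing the annotation itself in the annotation map is omitted.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Stand-in for the binary cache file; a real tool would append to f.bin.
inline std::ostringstream &annotationOutput() {
    static std::ostringstream out;
    return out;
}

// The bool template parameter fixes, per annotated class, whether adding an
// annotation also serializes it. Serialize is a compile-time constant, so
// the compiler can drop the branch entirely when it is false.
template <class Anno, bool Serialize>
class Annotatable {
public:
    void addAnnotation(const Anno &a, const std::string &name) {
        // (Storing the annotation in the static annotation map is omitted.)
        if (Serialize) {
            // Just enough to reconstruct: the annotation name (which
            // determines its type), the annotatee ID ("this"), the value.
            annotationOutput() << name << ' '
                               << reinterpret_cast<std::uintptr_t>(this)
                               << ' ' << a << '\n';
        }
    }
};

// Hypothetical annotated classes.
class Module : public Annotatable<std::string, true> {};    // serialized
class Scratch : public Annotatable<std::string, false> {};  // never written
```

With plain `if`, the annotation type must be streamable even when serialization is off; under C++17, `if constexpr (Serialize)` would lift that requirement for non-serialized annotation types.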

18
Example: Serialize Line Information
class Module : public
    Annotatable<LineInformation, line_info, true>
Line Information
  • Part of SymtabAPI
  • Belongs to class Module
  • Exists only on demand

19
Example: Serialize Line Information
class Module : public
    Annotatable<LineInformation, line_info, true>
addAnnotation(LineInfo *)
  • Marks an entry in the static annotation map

20
Example: Serialize Line Information
class Module : public
    Annotatable<LineInformation, line_info, true>
anno->serialize(LineInfo *)
Translator toBin
  • append (f.bin)
  • Start_annotation(f)
  • Out_val(an_type)
  • Out_val(par_id)
f.bin: <Annotation> <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  • First output annotation information
    • Just enough for full reconstruction
      • Annotation type
      • ID of parent

21
Example: Serialize Line Information
class Module : public
    Annotatable<LineInformation, line_info, true>
anno->serialize(LineInfo *)
Translator toBin
  • append (f.bin)
  • Start_annotation(f)
  • Out_val(an_type)
  • Out_val(par_id)
  • Out (line_info)
    • Foreach (tuple): out (tuple)
f.bin: <Annotation> <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation> <num_entries> num </num_entries>
  <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
  <Tuple> </Tuple>
  </LineInformation> </Annotation>
  • Finally, translate LineInformation
    • Using the ordinary hierarchical I/O translation
      routine

22
Deserializing Annotations
  • Basic parameters
    • Need to construct a new object given
      • Annotatee ID
        • Build a working map between serialized annotatee
          IDs and rebuilt Annotatable objects
      • Annotation type
        • Maintain a static map between annotation type and
          deserialization function
  • Deserialization sequence
    • Read annotation type
    • Read annotatee ID
    • Look up/call constructor for annotation type
    • Deserialize annotation object
    • Look up annotatee and re-annotate

23
Example: Deserialize Line Information
Annotation fromBin
  • Read annotation type
  • Read annotatee ID
f.bin: <Annotation> <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation> <num_entries> num </num_entries>
  <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
  <Tuple> </Tuple>
  </LineInformation> </Annotation>
  • Read the annotation's meta-information
  • Annotatee ID used to determine the parent object

24
Example: Deserialize Line Information
Annotation fromBin
  • Read annotation type
  • Read annotatee ID
  • Look up function F to deserialize the type
  • F()
f.bin: <Annotation> <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation> <num_entries> num </num_entries>
  <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
  <Tuple> </Tuple>
  </LineInformation> </Annotation>
F: LineInfo fromBin
class LineInfo: vector<tuple>
  • Construct LineInfo
  • Read num_entries
  • Foreach (entry): read (tuple)
  • Use the annotation type to determine which function
    to call for deserialization
    • Requires a static deserialization function map
  • Use the function to recreate LineInfo

25
Example: Deserialize Line Information
Annotation fromBin
  • Read annotation type
  • Read annotatee ID
  • Look up function F to deserialize the type
  • F()
f.bin: <Annotation> <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation> <num_entries> num </num_entries>
  <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
  <Tuple> </Tuple>
  </LineInformation> </Annotation>
F: LineInfo fromBin
  • Construct LineInfo
  • Read num_entries
  • Foreach (entry): read (tuple)
  • Annotate (Module, LineInformation)
  • Use the annotatee ID to determine the parent module
  • Re-annotate the module with LineInformation

26
Recap
  • Upcoming serialization / deserialization features
    will
    • Improve tool performance, especially for
      • Large binaries
      • Repeated expensive analyses
    • Allow for easier interoperability with other
      tools via an XML interface
      • XML spec resembles the Dyninst class hierarchy
  • Annotations provide a great abstraction for
    on-demand data
    • Internal use allows for generic
      serialize-on-the-fly
      • Hmm... any use for on-demand XML serialization?
    • External API provides users a way to attach
      arbitrary information to Dyninst class instances
    • Other uses still pending
      • Still flexible until other uses are resolved

27
Questions?