Title: Dyninst Object Serialization/Deserialization II
1. Dyninst Object Serialization/Deserialization II
- Binary for performance, XML for interoperability
2. Review: Why XML Serialization?
- Create standardized representations for
- Basic symbol table information
- Abstract program objects
- Functions, loops, blocks.
- More complex binary analyses
- CFG, Data Slicing, etc
- Exports Dyninst's expertise for easy use by
- Other tools
- Interfacing the textual world
- Parse-able snapshots of programs
- Cross-platform aggregation of results
- Allows Dyninst to use output from other tools in its own analyses
- Other tools may perform different and/or richer analysis that would be valuable for Dyninst
3. Review: Why Binary Serialization?
- Large Binaries
- We've had reports of existing Dyninst analyses taking a prohibitively long time for large binaries (100s of MB)
- E.g., full CFG analysis of large statically linked scientific simulations
- More complex analyses are in the works
- Dyninst continues to offer newer and more expensive-to-compute features
- Control Flow Graphs
- Data Slicing
- Stripped binary analysis
- Complex tools that use these analyses may find them cost-prohibitive
- If they have to be re-performed every time the tool is run
- Why not just save them?
4. Serialization Policy
- Binary serialization should be transparent
- User-controlled on/off switch: environment variable
- Granularity
- One binary cache file per library / executable
- Per logical sub-library of Dyninst
- Checksum-based cache invalidation
- Rebuild cache for a given binary when the binary changes
- Example: libc is large and expensive to fully analyze, but it seldom changes
- Two-phase strategy for by-default vs. on-demand
- (1) Bulk serialization of Dyninst's existing-by-default internal state
- Straightforward structured I/O
- (2) Incremental serialization of Dyninst's existing-on-demand internal state
- Somewhat trickier
- No specific orderings allowed
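The checksum-based invalidation policy above could be sketched roughly as follows. This is a minimal illustration, not Dyninst's implementation: the DYNINST_CACHE variable name, the FNV-1a checksum, and the cache layout (an 8-byte checksum header followed by the payload) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// FNV-1a 64-bit hash, used here only as a stand-in checksum.
std::uint64_t checksum(const std::string& bytes) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical on/off switch: caching is on unless DYNINST_CACHE is "0".
bool cachingEnabled() {
    const char* v = std::getenv("DYNINST_CACHE");
    return v == nullptr || std::string(v) != "0";
}

// Write a cache: checksum of the analyzed binary first, then the payload.
// The header is written in host byte order, so this layout is not portable.
void writeCache(std::ostream& cache, const std::string& binaryBytes,
                const std::string& payload) {
    assert(cache.good() && "cache stream must be writable");
    const std::uint64_t sum = checksum(binaryBytes);
    cache.write(reinterpret_cast<const char*>(&sum), sizeof sum);
    cache.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Returns true and fills `payload` only if the cache was built from the
// current contents of the binary; otherwise the caller should rebuild.
bool readCache(std::istream& cache, const std::string& binaryBytes,
               std::string& payload) {
    std::uint64_t stored = 0;
    if (!cache.read(reinterpret_cast<char*>(&stored), sizeof stored)) return false;
    if (stored != checksum(binaryBytes)) return false;  // binary changed
    payload.assign(std::istreambuf_iterator<char>(cache),
                   std::istreambuf_iterator<char>());
    return true;
}
```

A caller would consult cachingEnabled() first, try readCache() on the per-binary cache file, and fall back to a full analysis followed by writeCache() when the read fails. A seldom-changing library such as libc would then be analyzed once and reloaded afterwards.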
5. Phase 1: Bulk Structured I/O
- Multiple types of serialization can share the same infrastructure
- Binary and XML
- Leverage C++ and the Dyninst class hierarchy
- Keep serialization/deserialization process as extensible as possible
- Add new types of output down the road?
- Desired behavior
- serialize(filename, HierarchyRootNode, Translator)
- Serialize hierarchy into <filename>
- Traverse hierarchy in a (somewhat) generic manner
- Translator uses overloaded virtual translation functions that can be specialized as needed
6. Phase 1 Example: Using SymtabAPI
[Diagram: Symtab containing symbols func1, func2, ..., funcN, var1]
7. Phase 1 Example: Using SymtabAPI
Serialize( symtab, toXML, f.xml )
[Diagram: Symtab containing symbols func1, func2, ..., funcN, var1]
Translator: toXML
- open (f.xml)
- Start_symtab(f)
Output: f.xml
<Symtab>
- Open file
- Write XML preamble
8. Phase 1 Example: Using SymtabAPI
Serialize( symtab, toXML, f.xml )
[Diagram: Symtab containing symbols func1, func2, ..., funcN, var1]
Translator: toXML
- open (f.xml)
- Start_symtab(f)
- Out_val(fname)
- Out_val(is_a_out)
Output: f.xml
<Symtab>
  <name> nm </name>
  <isAOut> y </isAOut>
- Write out object fields (scalar)
- Translator can output all relevant types
9. Phase 1 Example: Using SymtabAPI
Serialize( symtab, toXML, f.xml )
[Diagram: Symtab containing symbols func1, func2, ..., funcN, var1]
Translator: toXML
- open (f.xml)
- Start_symtab(f)
- Out_val(fname)
- Out_val(is_a_out)
- Out_vector(syms)
- Foreach (syms): out_val(sym)
Output: f.xml
<Symtab>
  <name> nm </name>
  <isAOut> y </isAOut>
  <SymbolList>
    <nsyms> N+1 </nsyms>
    <Symbol> <name> f1 </name> </Symbol>
    <Symbol> <name> v1 </name> </Symbol>
  </SymbolList>
- Write out object fields (vector)
- Helper functions take care of container classes
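One way the container helper on this slide might look is sketched below. It is only an illustration: the ToXML class, the Symbol struct, and the exact signatures are hypothetical stand-ins, not the SymtabAPI types. The point is that a single templated out_vector can write the list wrapper and element count, then defer to whichever out_val overload matches the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol table entry.
struct Symbol {
    std::string name;
};

// Hypothetical XML translator writing to any output stream.
class ToXML {
public:
    explicit ToXML(std::ostream& out) : out_(out) { assert(out_.good()); }

    void start(const std::string& tag) { out_ << '<' << tag << '>'; }
    void end(const std::string& tag) { out_ << "</" << tag << '>'; }

    // Scalar fields: one overload per supported type.
    void out_val(const std::string& tag, const std::string& v) {
        start(tag); out_ << v; end(tag);
    }
    void out_val(const std::string& tag, std::size_t v) {
        start(tag); out_ << v; end(tag);
    }

    // Composite element: written in terms of the scalar overloads.
    void out_val(const Symbol& s) {
        start("Symbol");
        out_val("name", s.name);
        end("Symbol");
    }

    // Container helper: list wrapper, element count, then each element.
    template <class T>
    void out_vector(const std::string& listTag, const std::string& countTag,
                    const std::vector<T>& items) {
        start(listTag);
        out_val(countTag, items.size());
        for (const T& item : items) out_val(item);
        end(listTag);
    }

private:
    std::ostream& out_;
};
```

Supporting a new element type then only requires adding an out_val overload for it; out_vector itself stays unchanged.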
10. Phase 1 Example: Using SymtabAPI
Serialize( symtab, toXML, f.xml )
[Diagram: Symtab containing symbols func1, func2, ..., funcN, var1]
Translator: toXML
- open (f.xml)
- Start_symtab(f)
- Out_val(fname)
- Out_val(is_a_out)
- Out_vector(syms)
- Foreach (syms): out_val(sym)
Output: f.xml
<Symtab>
  <name> nm </name>
  <isAOut> y </isAOut>
  <SymbolList>
    <nsyms> N+1 </nsyms>
    <Symbol> <name> f1 </name> </Symbol>
    <Symbol> <name> v1 </name> </Symbol>
  </SymbolList>
</Symtab>
11. Phase 1 Example: With Binary Output
Translator: toXML
- open (f.xml)
- Start_symtab(f)
- Out_val(fname)
- Out_val(is_a_out)
- Out_vector(syms)
- Foreach (syms): out_val(sym)
Translator: toBin
- open (f.bin)
- Start_symtab(f)
- Out_val(fname)
- Out_val(is_a_out)
- Out_vector(syms)
- Foreach (syms): out_val(sym)
Translator sequence is identical (at the highest structural level)
12. Phase 1 Example: With Binary Output
TranslatorBase
- Virtual out_val(name)
Translator: toXML
- open (f.xml)
- Start_symtab(f)
- Out_val(fname)
- Output: <name> nameValue </name>
Translator: toBin
- open (f.bin)
- Start_symtab(f)
- Out_val(fname)
- Output: 0x18 size 0xa3 data 0x11 0x37 . .
Lowest-level data type outputs are specialized per output format
Higher-level outputs are generalized by default, specialized as needed
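The split between specialized low-level outputs and generalized high-level outputs can be illustrated with a small sketch. Everything here is an assumption for illustration: the class shapes, the little-endian length-prefixed binary encoding, and the serializeSymtab helper are hypothetical, and only the names TranslatorBase, toXML, toBin, and out_val echo the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

class TranslatorBase {
public:
    virtual ~TranslatorBase() = default;
    // Lowest-level outputs: each format must specialize these.
    virtual void start(const std::string& tag) = 0;
    virtual void end(const std::string& tag) = 0;
    virtual void out_val(const std::string& tag, const std::string& v) = 0;
    virtual void out_val(const std::string& tag, std::uint32_t v) = 0;
    // Higher-level output: generic by default, overridable if needed.
    virtual void out_vector(const std::string& tag,
                            const std::vector<std::string>& items) {
        assert(items.size() <= UINT32_MAX);
        start(tag);
        out_val("count", static_cast<std::uint32_t>(items.size()));
        for (const std::string& s : items) out_val("item", s);
        end(tag);
    }
};

class ToXML : public TranslatorBase {
public:
    void start(const std::string& tag) override { text_ << '<' << tag << '>'; }
    void end(const std::string& tag) override { text_ << "</" << tag << '>'; }
    void out_val(const std::string& tag, const std::string& v) override {
        start(tag); text_ << v; end(tag);
    }
    void out_val(const std::string& tag, std::uint32_t v) override {
        start(tag); text_ << v; end(tag);
    }
    std::string text() const { return text_.str(); }
private:
    std::ostringstream text_;
};

class ToBin : public TranslatorBase {
public:
    std::string bytes;
    // Tags carry no information in this binary layout.
    void start(const std::string&) override {}
    void end(const std::string&) override {}
    // Strings are written as a 4-byte size followed by the data.
    void out_val(const std::string&, const std::string& v) override {
        out_val("", static_cast<std::uint32_t>(v.size()));
        bytes += v;
    }
    // Integers are written as 4 little-endian bytes.
    void out_val(const std::string&, std::uint32_t v) override {
        for (int i = 0; i < 4; ++i)
            bytes.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }
};

// The high-level sequence is written once and shared by both formats.
void serializeSymtab(TranslatorBase& t, const std::string& fname, bool isAOut,
                     const std::vector<std::string>& syms) {
    t.start("Symtab");
    t.out_val("name", fname);
    t.out_val("isAOut", static_cast<std::uint32_t>(isAOut ? 1 : 0));
    t.out_vector("SymbolList", syms);
    t.end("Symtab");
}
```

Passing a ToXML or a ToBin to serializeSymtab runs the same call sequence; only the virtual out_val, start, and end bodies differ, which is where a further output format would plug in.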
13. Speedup from Bulk Structured I/O
- Not exactly a real-world problem
- Wanted to verify scaling characteristics under a controlled situation
- Computer-generated programs
- with identical characteristics
- except symbols
- Expect greater time savings with more complex analyses
14. Phase 2: On-Demand Analyses
- Dyninst generates much of its internal state on demand, as the API user requests it
- Phase 1 serialization is better suited to a known, fixed set of internal state
- existing by-default
- Still useful, but needs augmentation
- Structural solution to on-demand data
- Ideally want an automatic solution
- Do an analysis, then
- Serialization should happen transparently
- Along comes Annotations
- One idea: representing optional data
- Perfect fit for the representation of on-demand analyses
15. Annotations: Overview
- Basic Concepts
- Create a runtime relationship between objects
- "belonging to", "has a", "attribute of"
- Annotations are typed and can be referenced by name
- Annotate object A with B, as a foo
- Get all foo from A
- Scaling Considerations
- Must work at different extremes of scale
- E.g., annotate Instruction with Register
- High density: small data applied to many objects
- E.g., annotate Function with Slicing
- Low density: large data applied to fewer objects
- Careful consideration of space/time requirements
- Zero space requirement if no annotations
- Fast, easy access to annotations
16. Annotations: Details
- Public Interface
- Annotate<class T, class S>(T , S, name_t)
- Annotate a T with an S, using <name>
- Vector<S> getAnnotations(T , name_t)
- Get all S's attributed to T under <name>
- Implementation
- Template class wrappers provide type-safety
- Use one large static map
- Map Annotatee <--> Map AnnotationID <--> vector<Annotation>
- Auxiliary mapping for AnnotationName <-> ID
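The implementation outline above (one static map from annotatee to annotation ID to a vector of annotations, plus a name-to-ID table) might look roughly like the following. This is a sketch under assumptions: the lowercase annotate function, pointer-based signatures, std::string names, and the type check via std::type_index are illustrative choices, not the actual Dyninst interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

using AnnotationID = int;

// Auxiliary mapping AnnotationName -> ID. It also remembers the type first
// used with each name, so a name cannot later be reused with another type.
inline AnnotationID idForName(const std::string& name, std::type_index type) {
    static std::map<std::string, std::pair<AnnotationID, std::type_index>> ids;
    auto it = ids.find(name);
    if (it == ids.end())
        it = ids.emplace(name, std::make_pair(
                 static_cast<AnnotationID>(ids.size()), type)).first;
    assert(it->second.second == type && "name reused with a different type");
    return it->second.first;
}

// One large static map: annotatee -> annotation ID -> annotations.
// Objects that are never annotated occupy no space in it.
using Store = std::map<const void*, std::map<AnnotationID, std::vector<void*>>>;
inline Store& store() {
    static Store s;
    return s;
}

// Annotate a T with an S under `name`.
template <class T, class S>
void annotate(T* annotatee, S* annotation, const std::string& name) {
    assert(annotatee != nullptr && annotation != nullptr);
    store()[annotatee][idForName(name, typeid(S))].push_back(annotation);
}

// Get all S's attributed to `annotatee` under `name`.
template <class S, class T>
std::vector<S*> getAnnotations(T* annotatee, const std::string& name) {
    std::vector<S*> result;
    auto obj = store().find(annotatee);
    if (obj == store().end()) return result;
    auto slot = obj->second.find(idForName(name, typeid(S)));
    if (slot == obj->second.end()) return result;
    for (void* p : slot->second) result.push_back(static_cast<S*>(p));
    return result;
}
```

The template wrappers are what keep the untyped void* storage safe: the cast back to S* is only reached after the name has been checked against the type it was registered with. A std::map keeps the sketch short; a hash table would be the more likely choice where lookup speed matters.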
17. Serializing Annotations
- Basic Parameters
- Not all Annotations will be serialized
- Does not make sense for all cases
- Use boolean template parameter to control Annotation serialization policy
- E.g., class A : public Annotatable<B, true>
- Serialization is structural
- Performed when annotation is added
- Serialization parameters for annotation
- Just enough information to reconstruct
- Annotatee ID
- The this pointer suffices
- Annotation Name
- Annotation Type is determined by Name
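A boolean template parameter controlling serialization could work along these lines. The sketch is hypothetical throughout: the name-tag structs, the AnnotationRecord type, the emitted() log standing in for the output file, and the per-object storage (used only to keep the example short) are assumptions, not the Dyninst design.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One emitted record: just enough to rebuild the relationship later.
struct AnnotationRecord {
    std::uintptr_t annotateeId;  // derived from the annotatee's this pointer
    std::string annotationName;  // the name also determines the type
};

// Stand-in for the serialization output (hypothetical).
inline std::vector<AnnotationRecord>& emitted() {
    static std::vector<AnnotationRecord> records;
    return records;
}

// Hypothetical name tags: each supplies the annotation name as a string.
struct line_info { static const char* name() { return "line_info"; } };
struct scratch   { static const char* name() { return "scratch"; } };

template <class T, class Name, bool Serialize>
class Annotatable {
public:
    void addAnnotation(T* annotation) {
        assert(annotation != nullptr);
        annotations_.push_back(annotation);
        // Serialization is structural: it happens as the annotation is added,
        // and only for classes that opted in with Serialize == true.
        if (Serialize)
            emitted().push_back(
                {reinterpret_cast<std::uintptr_t>(this), Name::name()});
    }
    const std::vector<T*>& annotations() const { return annotations_; }

private:
    std::vector<T*> annotations_;
};

struct LineInformation {};
struct ScratchData {};

// Line information is worth caching; scratch data is not.
class Module : public Annotatable<LineInformation, line_info, true> {};
class Worklist : public Annotatable<ScratchData, scratch, false> {};
```

Because the flag is a compile-time constant, classes that opt out pay nothing for the check, and the choice is visible in the class declaration.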
18. Example: Serialize Line Information
class Module : public Annotatable<LineInformation, line_info, true>
Line Information
- Part of SymtabAPI
- Belongs to class Module
- Exists only on-demand
19. Example: Serialize Line Information
class Module : public Annotatable<LineInformation, line_info, true>
addAnnotation(LineInfo )
- Marks entry in static annotation map
20. Example: Serialize Line Information
class Module : public Annotatable<LineInformation, line_info, true>
anno->serialize(LineInfo )
Translator: toBin
- append (f.bin)
- Start_annotation(f)
- Out_val(an_type)
- Out_val(par_id)
Output: f.bin
<Annotation>
  <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
- First, output Annotation Information
- Just enough for full reconstruction
- Annotation Type
- ID of Parent
21. Example: Serialize Line Information
class Module : public Annotatable<LineInformation, line_info, true>
anno->serialize(LineInfo )
Translator: toBin
- append (f.bin)
- Start_annotation(f)
- Out_val(an_type)
- Out_val(par_id)
- Out (line_info)
- Foreach (tuple): out (tuple)
Output: f.bin
<Annotation>
  <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation>
    <num_entries> num </num_entries>
    <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
    <Tuple> </Tuple>
  </LineInformation>
</Annotation>
- Finally, translate LineInformation
- Using ordinary hierarchical I/O translation routine
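The record layout on this slide (annotation type, parent ID, then the translated payload) can be sketched as below. To keep the example readable it writes whitespace-separated text tokens rather than real binary; that encoding, the LineTuple field types, and the serializeAnnotation name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical shape of one line-information entry.
struct LineTuple {
    std::string file;
    unsigned line;
    std::uint64_t offset;
};
using LineInformation = std::vector<LineTuple>;

// Append one annotation record: header first, then the payload.
void serializeAnnotation(std::ostream& out, const std::string& annoType,
                         std::uint64_t annotateeId,
                         const LineInformation& info) {
    // Tokens are space-separated in this sketch, so the type name must not
    // itself contain a space.
    assert(annoType.find(' ') == std::string::npos);
    out << annoType << ' ' << annotateeId << ' ';  // Out_val(an_type), Out_val(par_id)
    out << info.size() << ' ';                     // num_entries
    for (const LineTuple& t : info)                // Foreach (tuple): out (tuple)
        out << t.file << ' ' << t.line << ' ' << t.offset << ' ';
}
```

The header carries only what a reader needs to pick a deserializer and find the parent object; everything after it is ordinary hierarchical output of the annotation itself.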
22. Deserializing Annotations
- Basic Parameters
- Need to construct new object given
- Annotatee ID
- Build a working map between serialized Annotatee IDs and rebuilt Annotatable objects
- Annotation Type
- Maintain static map between Annotation Type and deserialization function
- Deserialization sequence
- Read Annotation Type
- Read Annotatee ID
- Lookup/call constructor for Annotation Type
- Deserialize Annotation Object
- Lookup Annotatee and re-annotate
23. Example: Deserialize Line Information
Annotation: fromBin
- Read annotation type
- Read annotatee ID
Input: f.bin
<Annotation>
  <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation>
    <num_entries> num </num_entries>
    <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
    <Tuple> </Tuple>
  </LineInformation>
</Annotation>
- Read Annotation's meta information
- Annotatee ID used to determine parent object
24. Example: Deserialize Line Information
Annotation: fromBin
- Read annotation type
- Read annotatee ID
- Look up function to deserialize type: F
- F()
F: LineInfo fromBin
class LineInfo : vector<tuple>
- Construct LineInfo
- Read num_entries
- Foreach (entry): read (tuple)
Input: f.bin
<Annotation>
  <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation>
    <num_entries> num </num_entries>
    <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
    <Tuple> </Tuple>
  </LineInformation>
</Annotation>
- Use Annotation Type to determine the function to call for deserialization
- Requires static deserialization function map
- Use the function to recreate Line Info
25. Example: Deserialize Line Information
Annotation: fromBin
- Read annotation type
- Read annotatee ID
- Look up function to deserialize type: F
- F()
F: LineInfo fromBin
- Construct LineInfo
- Read num_entries
- Foreach (entry): read (tuple)
- Annotate (Module, LineInformation)
Input: f.bin
<Annotation>
  <AnnoType> an_type </AnnoType>
  <Annotatee ID> par_id </Annotatee ID>
  <LineInformation>
    <num_entries> num </num_entries>
    <Tuple> <file> f1 </file> <line> ln </line> <offset> off </offset> </Tuple>
    <Tuple> </Tuple>
  </LineInformation>
</Annotation>
- Use Annotatee ID to determine parent module
- Re-annotate module with LineInformation
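The deserialization sequence (read type, read annotatee ID, look up the function for that type, rebuild the object, re-annotate the parent) might be sketched as follows. As assumptions for the example, records are whitespace-separated text tokens rather than real binary, and the Module struct, the AnnotateeMap, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct LineTuple {
    std::string file;
    unsigned line;
    std::uint64_t offset;
};
using LineInformation = std::vector<LineTuple>;

// Hypothetical annotatee: holds rebuilt annotations by type name.
struct Module {
    std::map<std::string, std::shared_ptr<void>> annotations;
};

// Working map: serialized annotatee ID -> rebuilt object.
using AnnotateeMap = std::map<std::uint64_t, Module*>;
using Deserializer = std::function<std::shared_ptr<void>(std::istream&)>;

// F for "line_info": construct, read num_entries, then read each tuple.
std::shared_ptr<void> readLineInfo(std::istream& in) {
    std::size_t n = 0;
    if (!(in >> n)) return nullptr;
    auto info = std::make_shared<LineInformation>();
    for (std::size_t i = 0; i < n; ++i) {
        LineTuple t{};
        if (!(in >> t.file >> t.line >> t.offset)) return nullptr;
        info->push_back(t);
    }
    return info;
}

// Static map: annotation type -> deserialization function.
const std::map<std::string, Deserializer>& deserializers() {
    static const std::map<std::string, Deserializer> table = {
        {"line_info", readLineInfo}};
    return table;
}

bool deserializeAnnotation(std::istream& in, const AnnotateeMap& objects) {
    std::string type;
    std::uint64_t id = 0;
    if (!(in >> type >> id)) return false;           // read type and ID
    auto fn = deserializers().find(type);            // look up F
    auto obj = objects.find(id);                     // look up parent
    if (fn == deserializers().end() || obj == objects.end()) return false;
    assert(obj->second != nullptr);
    std::shared_ptr<void> rebuilt = fn->second(in);  // F()
    if (!rebuilt) return false;
    obj->second->annotations[type] = rebuilt;        // re-annotate
    return true;
}
```

Registering a new annotation type then means adding one entry to the static table; the generic reader never needs to know the payload layout.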
26. Recap
- Upcoming serialization / deserialization features will
- Improve tool performance, esp. for
- Large binaries
- Repeated expensive analyses
- Allow for easier interoperability with other tools via an XML interface
- XML spec resembles Dyninst class hierarchy
- Annotations provide a great abstraction for on-demand data
- Internal use allows for generic serialize-on-the-fly
- Hmm... any use for on-demand XML serialization?
- External API provides users a way to attach arbitrary information to Dyninst class instances
- Other uses still pending
- Still flexible until other uses are resolved
27. Questions?