Title: Developing high performance applications with .NET Compact Framework
1 Developing high performance applications with .NET Compact Framework
- Deepak Gulati
- ISV Developer Evangelist
- Microsoft
2 [Platform diagram with columns: Native, Managed, Server Side]
- Hardware/Drivers: OEM/IHV supplied BSP (ARM, SH4, MIPS); OEM hardware and standard drivers; standard PC hardware and drivers
- Device Building Tools: Platform Builder; Windows Embedded Studio; Windows XP DDK
- Data: EDB; SQL Server 2005 Mobile Edition; SQL Server 2005 Express Edition (lightweight relational); SQL Server 2005
- Programming Model: Win32; MFC 8.0, ATL 8.0; .NET Compact Framework; .NET Framework; ASP.NET Mobile Controls; ASP.NET
- Multimedia: Windows Media; DirectX
- Location Services: MapPoint
- Development Tools: Visual Studio 2005
- Communications & Messaging: Exchange Server; Live Communications Server; Speech Server; Internet Security and Acceleration Server
- Management Tools: Device Update Agent; Image Update; Software Update Services; Systems Management Server; Microsoft Operations Manager
3 Measuring Performance: Overview
- The basic technique:
  - Find the start time
  - Find the end time
  - Calculate the delta
4 Measuring Performance: Overview
- Start and end times can be measured in several ways:
  - GetTickCount, a Win32 API function
  - Environment.TickCount, its managed-code equivalent
  - Both return a 32-bit count of the milliseconds that have passed since the device was booted
  - System.DateTime can also be used: subtracting the start value from the end value gives a System.TimeSpan
5 Measuring Performance: Overview
- These techniques have problems:
  - On a device that has been on for a long time, TickCount wraps around: Environment.TickCount goes negative after about 24.9 days (Int32.MaxValue milliseconds)
  - They are poor for measuring short operations; results can vary by up to 500 ms
  - System.DateTime also suffers from accuracy issues
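You can sidestep the wraparound problem for intervals shorter than about 24.9 days. Subtract the two readings with unchecked 32-bit arithmetic, which wraps the same way the counter does. Below is a minimal sketch; the `TickMath` name is hypothetical. It does nothing for TickCount's coarse resolution.

```csharp
using System;

public static class TickMath
{
    // Elapsed milliseconds between two Environment.TickCount readings.
    // The subtraction is unchecked, so it wraps modulo 2^32: the result is
    // still correct if TickCount jumped from Int32.MaxValue to Int32.MinValue
    // in between, provided the real interval is under about 24.9 days.
    public static int Elapsed(int startTicks, int endTicks)
    {
        return unchecked(endTicks - startTicks);
    }
}
```

Typical use: `int start = Environment.TickCount;`, then run the operation, then `TickMath.Elapsed(start, Environment.TickCount)`.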
6 Measuring Performance: Overview
- QueryPerformanceCounter/QueryPerformanceFrequency to the rescue!
  - High-resolution timer with an OEM-specific implementation
  - Falls back to GetTickCount if no high-resolution timer is available
7 Measuring Performance: Overview
- There is no managed implementation of QueryPerformanceCounter or QueryPerformanceFrequency
- P/Invoke QueryPerformanceFrequency to get the counter frequency of the device in ticks per second; divide by 1000 to get ticks per millisecond
- P/Invoke QueryPerformanceCounter before your call, make your call, then P/Invoke QueryPerformanceCounter again
- (End - Start) / (ticks per millisecond) gives the time taken by your call in ms
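The steps above can be sketched as a small P/Invoke wrapper. This is an illustration under stated assumptions, not a tested Compact Framework implementation:

- The `coredll.dll` import and the `HiResTimer` name are assumptions.
- The elapsed-time helper divides in floating point rather than first dividing the frequency by 1000 in integers, which would truncate.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch of a QueryPerformanceCounter-based timer for .NET Compact Framework.
// Assumption: on Windows CE both functions are exported by coredll.dll
// (on desktop Windows they live in kernel32.dll).
public static class HiResTimer
{
    [DllImport("coredll.dll")]
    private static extern bool QueryPerformanceCounter(out long count);

    [DllImport("coredll.dll")]
    private static extern bool QueryPerformanceFrequency(out long frequency);

    // Counter ticks per second. Assumed constant while the process runs,
    // so query it once and reuse it.
    public static long GetFrequency()
    {
        long frequency;
        QueryPerformanceFrequency(out frequency);
        return frequency;
    }

    // Current raw counter value.
    public static long GetTimestamp()
    {
        long count;
        QueryPerformanceCounter(out count);
        return count;
    }

    // (End - Start) / (ticks per ms), computed in floating point so that
    // dividing the frequency by 1000 cannot truncate the tick rate.
    public static double ElapsedMilliseconds(long start, long end, long frequency)
    {
        return (end - start) * 1000.0 / frequency;
    }
}
```

Typical use:

1. Read `GetFrequency()` once.
2. Take `GetTimestamp()` before and after the call being measured.
3. Pass both readings and the frequency to `ElapsedMilliseconds`.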
8 Demo
- Using QueryPerformanceCounter
9 Measuring Performance: Overview
- Micro-benchmarks versus scenarios
- Benchmarking tips
  - Start from a known state
  - Ensure nothing else is running
  - Measure multiple times and take the average
  - Run each test in its own AppDomain / process
  - Log results at the end
  - Understand JIT-time versus run-time cost
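This sketch covers two of the tips: "measure multiple times, take the average" and "understand JIT-time versus run-time cost". One untimed warm-up call absorbs the JIT cost, and the remaining calls are averaged.

It uses `System.Diagnostics.Stopwatch` and the `Action` delegate from desktop .NET. These may not exist on the Compact Framework version targeted here; the QueryPerformanceCounter approach from slide 7 could stand in. `Bench` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Calls 'action' once untimed so the one-time JIT compilation cost is
    // excluded, then times 'runs' further calls and returns the mean in ms.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0)
            throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up run: pays the JIT cost; its time is discarded

        double totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }
}
```

Results are returned rather than printed inside the timed loop, in line with "log results at the end".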
10 .NET Compact Framework Performance: v1 vs. v2
[Chart comparing v1 and v2 results; panels labeled "Bigger is better" and "Smaller is better"]
11 Measuring Performance: Performance Counters
- Sometimes an application runs slowly even though the code looks fine
- The .NET Compact Framework can be made to report performance statistics
  - .stat file (formerly mscoree.stat)
  - http://msdn.microsoft.com/library/en-us/dnnetcomp/html/netcfperf.asp
- Registry
  - HKLM\SOFTWARE\Microsoft\.NETCompactFramework\PerfMonitor, value Counters (DWORD) = 1
- What does .stat tell you?
  - Working set and performance statistics
  - More counters added in v2:
    - Generics usage
    - COM interop usage
    - Number of boxed value types
    - Threading and timers
    - GUI objects
    - Network activity (socket bytes sent/received)
12 Demo
- Enabling .NET Compact Framework Performance Statistics
13 .stat
counter                                  total       last datum  n           mean        min         max
Total Program Run Time (ms)              55937
App Domains Created                      18
App Domains Unloaded                     18
Assemblies Loaded                        323
Classes Loaded                           18852
Methods Loaded                           37353
Closed Types Loaded                      730
Closed Types Loaded per Definition       730         8           385         1           1           8
Open Types Loaded                        78
Closed Methods Loaded                    46
Closed Methods Loaded per Definition     46          1           40          1           1           2
Open Methods Loaded                      0
Threads in Thread Pool                   -           0           6           1           0           3
Pending Timers                           -           0           93          0           0           1
Scheduled Timers                         46
Timers Delayed by Thread Pool Limit      0
Work Items Queued                        46
Uncontested Monitor.Enter Calls          57240
Contested Monitor.Enter Calls            0
Peak Bytes Allocated (native + managed)  4024363
Managed Objects Allocated                1015100
Managed Bytes Allocated                  37291444    28          1015100     36          8           55588
Managed String Objects Allocated         112108
Bytes of String Objects Allocated        4596658
Garbage Collections (GC)                 33
Bytes Collected By GC                    25573036    41592       33          774940      41592       1096328
Managed Bytes In Use After GC            -           23528       33          259414      23176       924612
Total Bytes In Use After GC              -           3091342     33          2954574     1833928     3988607
GC Compactions                           17
Code Pitchings                           6
Calls to GC.Collect                      0
GC Latency Time (ms)                     279         16          33          8           0           31
Pinned Objects                           156
Objects Moved by Compactor               73760
Objects Not Moved by Compactor           11811
Objects Finalized                        6383
Boxed Value Types                        350829
Process Heap                             -           1626        430814      511970      952         962130
Short Term Heap                          -           0           178228      718         0           21532
JIT Heap                                 -           0           88135       357796      0           651663
App Domain Heap                          -           0           741720      647240      0           833370
GC Heap                                  -           0           376         855105      0           2097152
Native Bytes Jitted                      7202214     152         26910       267         80          5448
Methods Jitted                           26910
Bytes Pitched                            1673873     0           7047        237         0           5448
Highlighted counters: Peak Bytes Allocated (native + managed), JIT Heap, App Domain Heap, GC Heap, Garbage Collections (GC), GC Latency Time (ms), Boxed Value Types, Managed String Objects Allocated
14 .NET Compact Framework: How Are We Different?
- Portable JIT compiler
  - Fast code generation, less optimized
  - May pitch JIT-compiled code under memory pressure
  - No NGen: no install-time or persisted native code
- Interpreted virtual calls (no v-tables)
- Simple mark-and-sweep GC, non-generational
15 Common Language Runtime: Execution Engine
- Call path
  - Managed calls are more expensive than native calls
    - Instance call: 2-3X the cost of a native function call
    - Virtual call: 1.4X the cost of a managed instance call
    - Platform invoke: 5X the cost of a managed instance call (marshaling one int parameter)
  - Properties are calls
- JIT compilers
  - All platforms have the same optimizing JIT compiler architecture in v2
  - Optimizations
    - Method inlining for simple methods
    - Variable enregistration
16 Common Language Runtime: Call path (sample)
    public class Shape
    {
        protected int m_volume;
        public virtual int Volume
        {
            get { return m_volume; }
        }
    }

    public class CubeShape : Shape
    {
        public CubeShape(int vol)
        {
            m_volume = vol;
        }
    }

Non-virtual alternative:
    public class Shape
    {
        protected int m_volume;
        public int Volume { get { return m_volume; } }
    }
17 Common Language Runtime: Call path (sample)
    public class MyCollection
    {
        private const int m_capacity = 10000;
        private Shape[] storage = new Shape[m_capacity]; // assumed filled with shapes before Sort()

        public void Sort()
        {
            Shape tmp;
            for (int i = 0; i < m_capacity - 1; i++)
            {
                for (int j = 0; j < m_capacity - 1 - i; j++)
                {
                    if (storage[j + 1].Volume < storage[j].Volume)
                    {
                        tmp = storage[j];
                        storage[j] = storage[j + 1];
                        storage[j + 1] = tmp;
                    }
                }
            }
        }
    }

In IL, each .Volume access is:
    callvirt instance int32 Shape::get_Volume()
18 Common Language Runtime: Call path (sample)
- Same Shape/CubeShape classes, timing MyCollection.Sort():
  - Volume declared virtual (public virtual int Volume): 57 sec
  - Volume declared non-virtual (public int Volume): 39 sec
19 Common Language Runtime: Garbage Collector
- What triggers a GC?
  - Memory allocation failure
  - 1 MB of GC objects allocated (v2)
  - Application going to the background
  - GC.Collect() (avoid "helping" the GC!)
- What happens at GC time?
  - All threads are frozen at a safe point
  - All live objects are found and marked
    - An object is live if it is reachable from a root location
  - Unmarked objects are freed, or added to the finalizer queue if they have a finalizer
  - Finalizers are run on a separate thread
  - GC pools are compacted if required (less than 750K of free space)
  - Free memory is returned to the operating system
- In general, if you don't allocate objects, a GC won't occur
  - Beware of side effects of calls that may allocate objects
- http://blogs.msdn.com/stevenpr/archive/2004/07/26/197254.aspx
20 Common Language Runtime: Garbage Collector
[Chart: GC latency per collection]
21 Common Language Runtime: Garbage Collector
[Chart: allocation rate]
22 Common Language Runtime: Garbage Collector
[Chart: allocation throughput]
23 Common Language Runtime: Where Does Garbage Come From?
- Unnecessary string copies
  - Strings are immutable
  - String manipulations (Concat(), etc.) cause copies
  - Use StringBuilder

    String result = "";
    for (int i = 0; i < count; i++)
    {
        result += ".NET Compact Framework";
        result += " Rocks!";
    }

    StringBuilder result = new StringBuilder();
    for (int i = 0; i < count; i++)
    {
        result.Append(".NET Compact Framework");
        result.Append(" Rocks!");
    }
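A little arithmetic shows why concatenation in a loop is so expensive. Each `+=` allocates a brand-new string holding all the text so far, so total allocation grows with the square of the iteration count. The sketch below (hypothetical helper names) adds up the sizes of every intermediate string, assuming 2 bytes per char and ignoring object headers.

For an example, take ".NET Compact Framework" (22 chars), " Rocks!" (7 chars), and 10,000 iterations. The iteration count is an assumption, since the loop bound is not shown. The sketch then predicts about 5.8 billion bytes, while the finished string is only 580,000 bytes. That is the same order as the Bytes of String Objects Allocated counter on slide 24.

```csharp
using System;

public static class ConcatCost
{
    // Estimated bytes of string data allocated by
    //     result = "";
    //     repeat 'iterations' times: result += a; result += b;
    // where a and b have lengths aLength and bLength.
    public static long EstimateConcatBytes(int aLength, int bLength, int iterations)
    {
        long currentLength = 0;
        long totalChars = 0;
        for (int i = 0; i < iterations; i++)
        {
            currentLength += aLength;    // new string created by result += a
            totalChars += currentLength;
            currentLength += bLength;    // new string created by result += b
            totalChars += currentLength;
        }
        return totalChars * 2;           // UTF-16: 2 bytes per char
    }

    // Size of the finished text only: roughly what a pre-sized
    // StringBuilder has to hold.
    public static long FinalTextBytes(int aLength, int bLength, int iterations)
    {
        return (long)(aLength + bLength) * iterations * 2;
    }
}
```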
24 .stat
Run time: 173 sec
counter                                  total       last datum  n           mean        min         max
Total Program Run Time (ms)              11843
App Domains Created                      1
App Domains Unloaded                     1
Assemblies Loaded                        2
Classes Loaded                           175
Methods Loaded                           198
Closed Types Loaded                      0
Closed Types Loaded per Definition       0           0           0           0           0           0
Open Types Loaded                        0
Closed Methods Loaded                    0
Closed Methods Loaded per Definition     0           0           0           0           0           0
Open Methods Loaded                      0
Threads in Thread Pool                   -           0           2           0           0           1
Pending Timers                           -           0           2           0           0           1
Scheduled Timers                         1
Timers Delayed by Thread Pool Limit      0
Work Items Queued                        1
Uncontested Monitor.Enter Calls          2
Contested Monitor.Enter Calls            0
Peak Bytes Allocated (native + managed)  3326004
Managed Objects Allocated                60266
Managed Bytes Allocated                  5801679432  28          60266       96267       8           580020
Managed String Objects Allocated         20041
Bytes of String Objects Allocated        5800480578
Garbage Collections (GC)                 4912
Bytes Collected By GC                    5918699036  1160076     4912        1204946     597824      1572512
Managed Bytes In Use After GC            -           580752      4912        381831      8364        580752
Total Bytes In Use After GC              -           1810560     4912        1611885     1097856     1810560
GC Compactions                           0
Code Pitchings                           0
Calls to GC.Collect                      0
GC Latency Time (ms)                     686         0           4912        0           0           16
Pinned Objects                           0
Objects Moved by Compactor               0
Objects Not Moved by Compactor           0
Objects Finalized                        1
Boxed Value Types                        3
Process Heap                             -           278         235         2352        68          8733
Short Term Heap                          -           0           278         986         0           10424
JIT Heap                                 -           0           360         12103       0           24444
App Domain Heap                          -           0           1341        46799       0           64562
GC Heap                                  -           0           35524       2095727     0           3276800
Native Bytes Jitted                      22427       140         98          228         68          1367
Methods Jitted                           98
Bytes Pitched                            0           0           0           0           0           0
Methods Pitched                          0
Method Pitch Latency Time (ms)           0           0           0           0           0           0
Exceptions Thrown                        0
Platform Invoke Calls                    0
Code under test: String concatenation in a loop
    result += ".NET Compact Framework";
    result += " Rocks!";
Highlighted results:
- Managed String Objects Allocated: 20040
- Garbage Collections (GC): 4912
- Bytes of String Objects Allocated: 5,800,480,574
- Bytes Collected By GC: 5,918,699,036
- GC latency: 107128 ms
25 .stat
Run time: 0.1 sec
Code under test: StringBuilder in a loop
    result.Append(".NET Compact Framework");
    result.Append(" Rocks!");
Highlighted results:
- Managed String Objects Allocated: 56
- Bytes of String Objects Allocated: 2097718
- Garbage Collections (GC): 2
- Bytes Collected By GC: 1081620
- GC latency: 21 ms
26 Last Notes on StringBuilder
- Remember: it's all about reducing memory traffic
- If you roughly know the expected length of your final string, allocate that much beforehand (StringBuilder constructor)
- Getting the string out of a StringBuilder doesn't cause a new allocation; the existing buffer is converted into a string
- http://weblogs.asp.net/ricom/archive/2003/12/02/40778.aspx
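Here is the pre-sizing advice in practice, as a minimal sketch with a hypothetical helper name. When the final length can be computed, pass it to the StringBuilder constructor so the buffer is allocated once and never has to grow.

```csharp
using System;
using System.Text;

public static class TextBuilder
{
    // Builds (first + second) repeated 'count' times. The exact final length
    // is known up front, so the builder is created with that capacity and its
    // internal buffer is never reallocated and copied while appending.
    public static string Repeat(string first, string second, int count)
    {
        int finalLength = (first.Length + second.Length) * count;
        StringBuilder sb = new StringBuilder(finalLength);
        for (int i = 0; i < count; i++)
        {
            sb.Append(first);
            sb.Append(second);
        }
        return sb.ToString();
    }
}
```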
27 Common Language Runtime: Where Does Garbage Come From?
- Unnecessary boxing
  - Value types are allocated on the stack (fast to allocate)
  - Boxing causes a heap allocation and a copy
- Use strongly typed arrays and collections
  - (the non-generic framework collections are NOT strongly typed)

    // simplified view of System.Collections.Hashtable internals
    class Hashtable
    {
        struct bucket
        {
            Object key;
            Object val;
        }
        bucket[] buckets;
        public Object this[Object key] { get { /* ... */ } set { /* ... */ } }
    }
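To make the boxing cost concrete, the hypothetical sketch below sums the same integers twice. The first pass goes through a non-generic ArrayList, where every Add boxes and every read unboxes. The second uses a strongly typed int[], which stores the values inline.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // ArrayList stores object references: each Add(i) boxes the int
    // (a heap allocation plus a copy) and each read needs an unboxing cast.
    public static long SumWithArrayList(int count)
    {
        ArrayList list = new ArrayList(count);
        for (int i = 0; i < count; i++)
            list.Add(i);                  // boxes i
        long sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += (int)list[i];          // unboxes
        return sum;
    }

    // A strongly typed int[] holds the values directly: no boxing at all.
    public static long SumWithArray(int count)
    {
        int[] values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = i;
        long sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }
}
```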
28 Demo
29 Common Language Runtime: Generics
- Fully specialized implementation in .NET Compact Framework v2
- Pros
  - Strongly typed
  - No unnecessary boxing and type casts
  - Specialized code is more efficient than shared code
- Cons
  - Internal execution engine data structures and JIT-compiled code aren't shared: each distinct List<T> instantiation gets its own copy
- http://blogs.msdn.com/romanbat/archive/2005/01/06/348114.aspx
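A generic method constrained to IComparable<T> is one way to get the "no unnecessary boxing" benefit listed above. For T = int, CompareTo is called with an int, whereas the non-generic IComparable.CompareTo(object) would box its argument. The `GenericMax` name is hypothetical.

```csharp
using System;

public static class GenericMax
{
    // With T : IComparable<T>, CompareTo takes a T, so for value types such
    // as int the comparison runs without boxing either operand.
    public static T Max<T>(T[] items) where T : IComparable<T>
    {
        if (items == null || items.Length == 0)
            throw new ArgumentException("items must be non-empty");
        T best = items[0];
        for (int i = 1; i < items.Length; i++)
        {
            if (items[i].CompareTo(best) > 0)
                best = items[i];
        }
        return best;
    }
}
```

Each instantiation, such as Max<int> and Max<string>, gets its own specialized code. That is exactly the trade-off in the Pros and Cons above.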
30 Common Language Runtime: Finalization and Dispose
- Cost of finalizers
  - Non-deterministic cleanup
  - Extends the lifetime of the object
- In general, rely on the GC for automatic memory cleanup
- The exceptions to the rule
  - If your object contains an unmanaged resource that the GC is unaware of, you need to implement a finalizer
  - Also implement the Dispose pattern to release the unmanaged resource in a deterministic manner
  - The Dispose method should suppress finalization
- If the object you are using implements Dispose, call it when you are done with the object
  - This assumes there is an unmanaged resource somewhere in the object chain
31 Common Language Runtime: Sample Code
Finalization and Dispose
    class SerialPort : IDisposable
    {
        // SerialOpen/SerialClose are P/Invoke declarations (not shown)
        IntPtr SerialPortHandle;

        public SerialPort(String name)
        {
            // Platform invoke to native code to open serial port
            SerialPortHandle = SerialOpen(name);
        }

        ~SerialPort()
        {
            // Platform invoke to native code to close serial port
            SerialClose(SerialPortHandle);
        }

        public void Dispose()
        {
            // Platform invoke to native code to close serial port
            SerialClose(SerialPortHandle);
            GC.SuppressFinalize(this);
        }
    }
32 Common Language Runtime: Sample Code
Finalization and Dispose
    class SerialTrace : IDisposable
    {
        SerialPort serialPort;

        public SerialTrace(String portName)
        {
            serialPort = new SerialPort(portName);
        }

        // No finalizer: SerialTrace owns no unmanaged resource directly
        public void Dispose()
        {
            serialPort.Dispose();
        }
    }
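The SerialPort sample on slide 31 has one gap: calling Dispose twice would call SerialClose twice on the same handle.

The sketch below makes the release idempotent by clearing the handle after closing it, and keeps the finalizer only as a safety net. It uses a hypothetical `PortHandle` class with a call-counting stand-in instead of a real P/Invoke close.

```csharp
using System;

// Hypothetical stand-in for the slide's SerialPort. 'NativeClose' plays the
// role of a P/Invoke'd close function; it only counts calls so the behavior
// can be observed without real hardware.
public sealed class PortHandle : IDisposable
{
    public static int CloseCalls;

    private static void NativeClose(IntPtr handle)
    {
        CloseCalls++;
    }

    private IntPtr handle;

    public PortHandle(IntPtr handle)
    {
        this.handle = handle;
    }

    // Safety net: runs only if the caller never called Dispose.
    ~PortHandle()
    {
        Release();
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // cleanup done; skip the finalizer
    }

    // Closes the handle at most once, so repeated Dispose calls are harmless.
    private void Release()
    {
        if (handle != IntPtr.Zero)
        {
            NativeClose(handle);
            handle = IntPtr.Zero;
        }
    }
}
```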
33 Common Language Runtime: Exceptions
- Exceptions are cheap, until you throw them
- Throw exceptions only in exceptional circumstances
- Do not use exceptions for normal flow control
- Use performance counters to track the number of exceptions thrown
- Replace On Error/Goto with Try/Catch/Finally in Microsoft Visual Basic .NET
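A common case of exceptions used for normal flow control is parsing input with int.Parse inside a try/catch. The sketch below uses int.TryParse instead, which reports failure through its return value. TryParse is available in desktop .NET 2.0 and later, so check that your Compact Framework version provides it. `SafeParse` is a hypothetical name.

```csharp
using System;

public static class SafeParse
{
    // Bad input is an expected case here, not an exceptional one, so
    // TryParse signals failure via its return value instead of int.Parse
    // throwing a FormatException that the caller would have to catch.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }
}
```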
34 Common Language Runtime: Reflection
- Reflection can be expensive
- Reflection performance cost
  - Type comparisons (for example, typeof())
  - Member enumerations (for example, Type.GetFields())
  - Member access (for example, Type.InvokeMember())
  - Think 10-100x slower
- Working set cost
  - Runtime data structures
  - Think 100 bytes per loaded type, 80 bytes per loaded method
- Be aware of APIs that use reflection as a side effect
  - Override
    - Object.ToString()
    - GetHashCode() and Equals() (for value types)
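Overriding the APIs listed above for a value type might look like this hypothetical `GridPoint` struct. Equals, GetHashCode, and ToString are written by hand, so the default ValueType implementations (which, per this slide, may use reflection) are never called. The typed Equals(GridPoint) overload also avoids boxing the argument.

```csharp
using System;

public struct GridPoint
{
    public readonly int X;
    public readonly int Y;

    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Strongly typed overload: comparing two GridPoints boxes nothing.
    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    // Replaces ValueType.Equals(object).
    public override bool Equals(object obj)
    {
        return obj is GridPoint && Equals((GridPoint)obj);
    }

    // Replaces ValueType.GetHashCode; must agree with Equals.
    public override int GetHashCode()
    {
        return unchecked(X * 397) ^ Y;
    }

    public override string ToString()
    {
        return "(" + X + ", " + Y + ")";
    }
}
```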
35 Common Language Runtime: Building a Cost Model for Managed Math
- Math performance
  - 32-bit integers: similar to native math
  - 64-bit integers: 5-10X the cost of native math
  - Floating point: similar to native math
    - ARM processors do not have an FPU
36 .NET Compact Framework
[Architecture diagram; column categories: Redist, Globalization, GUI, Net, I/O, Crypto]
- FX: System.Globalization, System.Cryptography, System.IO.Ports, Microsoft.VisualBasic, System.WebServices, DirectX.DirectD3DM, System.Reflection, Microsoft.Win32.Registry, System.Data, System, System.Net.Http, Windows.Forms, System.IO.File, System.Xml, mscorlib, System.Net.Sockets, System.Drawing
- Redist: MSI Setup (ActiveSync), Per Device CAB Install (SMS, etc.), Visual Studio
- CLR: JIT Compiler, GC, Debugger, CalendarData, Debug Engine, ClassLoader, AssemblyCache, CultureData, ICorDbg, NativeInterop, App Domain Loader, Host
- Windows CE: Process Loader, Memory and Threading, NTLM, Common Controls, File I/O, Sorting, Crypto API, Managed Loader, File Mapping, Cert/Security Verification, SSL, GDI/GWES, Registry, Encodings, Sockets, Casing, D3DM
37 Base Class Library: Collections
- Pre-size collection classes appropriately
  - Resizing creates unnecessary copies
- Beware of foreach overhead; use the indexer when available

    ArrayList al = new ArrayList(string_array);
    foreach (MyType mt in al)
    {
        // do something
    }

  will be compiled into

    callvirt instance class IEnumerator ArrayList::GetEnumerator()
    ...
    callvirt instance object IEnumerator::get_Current()
    ...
    callvirt instance bool IEnumerator::MoveNext()
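The two collection tips, pre-sizing and indexing instead of foreach, are sketched below with hypothetical helper names.

```csharp
using System;
using System.Collections;

public static class CollectionTips
{
    // Creates the ArrayList with its final capacity, so Add never has to
    // allocate a larger internal array and copy the old contents across.
    public static ArrayList CopyPresized(string[] items)
    {
        ArrayList list = new ArrayList(items.Length);
        for (int i = 0; i < items.Length; i++)
            list.Add(items[i]);
        return list;
    }

    // Indexer loop: one get_Item call per element instead of the
    // GetEnumerator / MoveNext / get_Current calls that foreach generates.
    public static int TotalLength(ArrayList list)
    {
        int total = 0;
        int count = list.Count;   // properties are calls: read Count once
        for (int i = 0; i < count; i++)
            total += ((string)list[i]).Length;
        return total;
    }
}
```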
38 Windows Forms: Best Practices
- Load and cache Forms in the background
- Populate data separately from Form.Show()
  - Pre-populate data, or
  - Load data asynchronously with respect to Form.Show()
- Use BeginUpdate/EndUpdate when available
  - e.g. ListView, TreeView
- Use SuspendLayout/ResumeLayout when repositioning controls
- Keep event handling code tight
  - Process bigger operations asynchronously
  - Blocking in event handlers will affect UI responsiveness
- Form load performance
  - Reduce the number of method calls during initialization
39 Graphics and Games: Best Practices
- Compose to off-screen buffers to minimize direct-to-screen blitting
  - Approximately 50% faster
- Avoid transparent blitting in areas that require performance
  - Approximately 1/3 the speed of normal blitting
- Consider using pre-rendered images versus System.Drawing rendering primitives
  - Need to measure on a case-by-case basis
40 XML: Best Practices for Managing Large XML Data Files
- Use XmlTextReader/XmlTextWriter
  - Smaller memory footprint than using XmlDocument
  - XmlTextReader is a pull-model parser that only reads a window of the data
  - XmlDocument builds a generic, untyped object model using a tree
    - Types are stored as strings
  - OK to use with smaller documents (64K XML: 0.25 s)
- Optimize the structure of the XML document
  - Use elements to group
    - Allows use of Skip() in XmlReader
  - Use attributes to reduce size; processing attribute-centric documents is faster
  - Keep it short! (attribute and element names)
  - Avoid gratuitous use of white space
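Here is a sketch of the "use elements to group, so XmlReader can Skip()" advice, combined with the IgnoreWhitespace setting from slide 41. The `item` element name, the skipped group, and the `XmlScan` helper are assumptions chosen for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlScan
{
    // Counts <item> elements with a forward-only XmlReader. Whenever the
    // reader sits on an element named 'skipElement', Skip() jumps past that
    // element's whole subtree instead of reading it node by node.
    public static int CountItems(string xml, string skipElement)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;   // no whitespace-only nodes to step over

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == skipElement)
                {
                    reader.Skip();          // now on the node after the subtree
                }
                else
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                        count++;
                    reader.Read();
                }
            }
        }
        return count;
    }
}
```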
41 XML: Creating an Optimized Reader/Writer
- In v2, use the XmlReader/XmlWriter factory classes to create an optimized reader or writer
- Applying the proper XmlReaderSettings can improve performance

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    XmlReader reader = XmlReader.Create("my.xml", settings);

- Up to 30% performance increase when IgnoreWhitespace = true is specified (depends on document format)
42 Demo
- XmlDocument vs. XmlTextReader
43 XML: Reading Local Data with DataSet
- DataSet is a database-independent container of relational data
  - Allows you to work with XML
- ReadXml allows you to load XML data into a DataSet
  - Simple to use, but performs badly, especially with large XML files
  - If you must use DataSet.ReadXml, make sure that you supply the schema first
- Use XmlReader wherever possible for traversing your data
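There are two ways to supply the schema before calling ReadXml: define the tables and typed columns in code, or load an XSD with ReadXmlSchema. The hypothetical sketch below takes the first route.

The XmlReadMode.IgnoreSchema argument tells ReadXml to load the data into the existing schema instead of inferring one; inferred columns would all be strings. The `Catalog`/`Product` layout is an assumption.

```csharp
using System;
using System.Data;
using System.IO;

public static class CatalogLoader
{
    // Builds the schema up front, then reads the XML into it without
    // any schema inference pass.
    public static DataSet Load(TextReader xml)
    {
        DataSet ds = new DataSet("Catalog");          // matches the root element
        DataTable products = ds.Tables.Add("Product");
        products.Columns.Add("Name", typeof(string));
        products.Columns.Add("Price", typeof(int));   // typed, not inferred as string

        ds.ReadXml(xml, XmlReadMode.IgnoreSchema);
        return ds;
    }
}
```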
44 Demo
- DataSet and .NET Compact Framework
45 Non-XML Local Data: Reading Files Locally
- You may need to read a text file stored locally on the device
- StreamReader and FileStream classes are typically employed
- For large files (over 100 K), FileStream outperforms StreamReader
  - StreamReader specifically looks for line breaks; FileStream does not
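When the data does not need to be handled as lines of text, reading raw blocks with FileStream avoids StreamReader's character decoding and line-break handling. Below is a minimal sketch with a hypothetical helper that counts occurrences of one byte value. It assumes an encoding such as ASCII or UTF-8, where a single byte like '\n' is unambiguous.

```csharp
using System;
using System.IO;

public static class FileScan
{
    // Reads the file in 8 KB blocks with FileStream and counts how often
    // 'value' occurs. No characters are decoded and no strings are created.
    public static int CountByte(string path, byte value)
    {
        int count = 0;
        byte[] buffer = new byte[8192];
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == value)
                        count++;
                }
            }
        }
        return count;
    }
}
```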
46 Web Services: Where Is the Bottleneck?
- Are you network bound or CPU bound?
  - Use the perf counters for socket bytes sent/received: do you come close to the network capacity?
- If you are network bound, work on reducing the size of the message
  - Create a canned message and send it over HTTP; compare its performance with the web service
- If you are CPU bound, optimize the serialization scheme for speed
- http://blogs.msdn.com/mikezintel/archive/2005/03/30/403941.aspx
47 Moving Forward
- More tools
  - Live Remote Performance Counters (new in v2)
  - Under construction
    - Allocation profiler (CLR profiler)
    - Call profiler
- Working set improvements
- More speed
48 Summary
- Make performance a requirement and measure
- Understand the APIs
- Isolate exactly what is being measured
- Repeat tests several times and ignore the first run, which is affected by JIT compilation
- Track the results for later comparison and review
- Ensure you compare apples to apples
  - Use real code when possible
- Test multiple designs and strategies
  - Understand the differences or variation
- Avoid unnecessary object allocations and copies due to
  - String manipulations
  - Boxing
  - Collections that are not pre-sized
- Performance FAQ
  - http://blogs.msdn.com/netcfteam/archive/2005/05/04/414820.aspx