Title: Intel C++ Compiler and VTune Analyzer
Intel C++ Compiler and VTune Analyzer for Intel XScale Microprocessors
- ??????? ?????????
- ??????? ?? ???????????? ???????????
- ISDEF 2004, 17 September 2004
Intel XScale processor roadmap
[Figure: Bulverde applications processors by segment (Performance, Premium, Value, Budget) for PDAs and phones]
Tools for Intel XScale microprocessors
- Application development
- Microsoft eMbedded Visual C++ 3.0 or 4.0 with the Pocket PC (PPC) or Smartphone SDK
- Intel XScale Technology Tool chain
- Performance improvement
- Compilers and assemblers for Intel XScale
- Intel Integrated Performance Primitives (Intel IPP)
- Intel Graphics Performance Primitives (Intel GPP)
- Performance analysis/tuning
- Intel VTune Performance Analyzer for Intel XScale microprocessors (host: Windows 2000/XP)
ARM architecture versions
[Diagram: ARM architecture versions v4, v4T, v5TE, v5TEJ and v6 with example cores SA110, ARM7TDMI-S, ARM922T, ARM940T, ARM946E-S, ARM966E-S, ARM7EJ-S and ARM1026EJ-S. Legend: T = Thumb instructions, E = Enhanced DSP instructions.]
Other names and brands may be claimed as the
property of others.
Intel XScale Architecture Performance Features
- 7-stage pipeline
- Branch Target Buffer (BTB)
- Write Buffer
- Fill Buffer
- Pend Buffer
- Large Caches
- Performance Monitoring
- Intel Media Processing Technology (iMPT), coprocessor 0 (CP0)
- Intel Wireless MMX technology (CP0, CP1)
7-stage pipeline
[Diagram: Instruction Fetch 1 (Branch Target Buffer), Instruction Fetch 2 (Instruction Cache), Instruction Decode, Register File / Barrel Shift, then three parallel paths: Arithmetic Execution, State Execution, Integer Writeback; Data Access 1, Data Access 2 (Data Cache), Data Writeback; Multiply Accumulator 0-3, MAC Writeback; with bypasses between stages.]
Organization of the PXA255 data cache
- 32 KB data cache
- 32 bytes per cache line
- 32 sets × 32 ways
- Lines are 32-byte aligned
- Cache size affects application scalability
- Round-robin replacement
Optimization tip: multiple threads can thrash each other's use of the cache.
XScale vs. IA-32: what's different?
Feature | Pentium 4 processor (Prescott) | XScale (Bulverde)
--- | --- | ---
Architecture | 25-stage, out-of-order (OOO) | 7-stage, in-order
Frequency | 3.2 GHz | 400 MHz (520 MHz)
L1 data cache | 8 KB (16 KB) | 32 KB + 2 KB mini-data cache
L1 instruction cache | 12K uops | 32 KB
L1 cache control | prefetch | prefetch, lock, line flush
L2 cache | 512 KB (1 MB) | -
BTB control | - | lock, flush
SIMD instructions | MMX, SSE, SSE2 (PNI) | 5 MPT (43 Wireless MMX)
Intel Media Processing Technology
- Multiply/accumulate with 40-bit result
- Additional instructions to mix and match the top and bottom halves of input registers
- Implemented in coprocessor 0
Intel Wireless MMX
WADDBUSNE wr0, wr1, wr2
Intel C++ Compiler for Intel XScale processors
- ????? ????????
- Intel C++ Compiler 1.2.8
- Intel Assembler 1.2.8
- Supported platforms
- Microsoft eMbedded Visual Tools, version 3.0
- Microsoft eMbedded Visual C++, version 4.0 (Service Pack 1 or Service Pack 2 required)
- Platform Builder for Microsoft Windows CE, version 3.0
- Platform Builder for Microsoft Windows CE .NET, version 4.1
- Platform Builder for Microsoft Windows CE .NET, version 4.2
- SDKs
- Pocket PC 2002
- Pocket PC 2003
- Smartphone 2002
- Smartphone 2003
Intel C++ Compiler for Intel XScale processors
- Integrates into the Microsoft eVC 3.0/4.2 IDE as a plug-in
- ??????????? ????? ??????????? ????? ?? ??? ? ??????????? Microsoft
- ? ??????????? ??????? ?????????? ??? ? ?????? ??????????????????
- ???????????? ????????????? ??????????? ??? ????????????? Intel Performance Libraries (IPP, GPP)
Using the Intel C++ compiler
- ????? ????????? ?????????? ??? ????? ?????? ? ???? Tools
- ????? ??????????? ????????? ??????? ?????????????? ????? ???????????.
- ???????? ? ????????? ????????????? ??????????? Intel C++
- ???????????? ?? ?????? ? ?????? ???????????? ???????????.
Compiler options
- /Od: no optimization
- /O1: optimize for speed, disabling some optimizations that increase code size
- /O2: optimize for speed
- /O3: enables high-level optimizations
- /Ox: maximum optimization (??????? ?????? ????? ??????????)
- /Qip, /Qipo: interprocedural optimization (within a single file and across files, respectively)
Performance tests
Application | Bottleneck | MS 3.0 | MS 4.0 | Intel Compiler
--- | --- | --- | --- | ---
Prime | Memory | 7.9909 | 7.2283 | 7.3918
Mortgage Calculator | Floating-point arithmetic | 11.5017 | 11.5008 | 2.8332
GapiDraw (squares) | Memory writes | 1.6394 | 1.5969 | 1.507
Digital Persona | CPU-intensive | 0.8857 | 0.7493 | 0.6799
Speechworks | CPU-intensive | no data | 3.1244 | 2.80245
VTune: performance analysis for Intel XScale microprocessors
- ?????? ??????? ?? ?????? ?????????????? ????????? ???????, ?????????? ????? ??????? (??????? ????) ???????? ????????????? ? ???????? (Time Based Sampling)
- ?????? ??????? ?? ?????? ?????????????? ?????????? ???????. (Event Based Sampling)
- Uses the PMU, which is present only in Intel processors
Performance monitoring support
- ?????????? ???????? ???????? ?????????? ??????????????????.
- ?????? ????????? ??????????????????
Event number | Event definition
--- | ---
0x0 | Instruction cache miss
0x1 | Instruction cache cannot deliver an instruction
0x2 | Data dependency stall
0x3 | Instruction TLB miss
0x4 | Data TLB miss
0x5 | Branch instruction executed
0x6 | Branch mispredicted
0x7 | Instruction executed
0x8 | Stall due to D-cache buffer full (counted every cycle the condition is present)
0x9 | Contiguous sequence of event 0x8
0xa | Data cache access
0xb | Data cache miss
0xc | Data cache write-back
0xd | Software changed the PC
???????????? VTune
- ????????? ?????????????????? ??? ??????????????
- ????????? ?????? ?????????? ??????????
- ?????????? ?? ?????? ????????? ???? (?? ?????? ?????????? ????? ????????????????) ??? ??????????? ??????? ?????
- ???? ???????????? ?? ????????? ?????????????????? ????
Preparing code for analysis
- In the menu, choose Project -> Settings
- Select the C/C++ tab
- In the Debug Info field, select Program Database
Preparing code for analysis
- In the menu, choose Project -> Settings
- Select the Link tab
- Check Generate debug info
How the VTune Data Collector works
- Data Collector ???????????? ??? ????????? ?????????? ????????? ? PMU ? ??????? ??????????
- ?????????? ???????????, ??????????? PMU ??? ????? ??????
- ISR ???????? ??????? ?????????, ?????????? ? ??????, ????????, ???????
[Diagram: the Data Collector, the application under test (built with debug info) and the sampling ISR (BSP); output files: Sample File (.rsf, PMU interrupt sample data) and Module File (.rmf, list of modules and locations)]
??? ???????? WinCE ????????? ??????????? build
???????????? ???????
Top 5 tips
- Optimize cache usage!
- A cache miss costs about 150 cycles
- Use preload()
- O?????????? writes, ???????????? reads
- Advanced mini cache, XScale ????????? ???????????? ?????? ? ????
- ?????? ??????????? ??????????? ????
- Use DSP instructions, iMPT, WMMX
- ????????????? ?????????
- ????????????? ?????
- Use the Intel compiler!
- ?????e run-time ??????????
- ???????????? ? ???????????
- Turn on optimization!
- Use VTune for performance analysis ? ?????? ????? ????!
- Use the optimized IPP/GPP libraries
- 0. ????????? ?????? ????? ??? ??? ????????
???????????
Summary
- ??????????? ARM ??????? ? ?????? 32 Bit RISC CPU ? ??????????????? L1 cache ? ??????? ?????????? ???????
- ??????????? Intel XScale ???????? ??????????? ??? ???????????? ARM
- ?????????????? ?????? ??????????????????
- ??????????? ????? ??????????? ??? ????????? ?????????????????? ????????
- ?????????? ???????? ??? Intel XScale ??????? ?????????? ???????? ??? IA32
- MS Embedded Visual Studio, Intel Compiler, Intel VTune Performance Analyzer
Thank you for your attention!