Title: Cache Coherence Simulation using GEMS
Cache Coherence
- Caches are essential for high performance
- A multiprocessor has many caches to keep consistent
- Cache coherence protocols
- Dependent on architecture and applications
- Correctness can be difficult to validate
- Simulation is invaluable
Cache Coherence Simulators
- LIMES
- RSIM
- M5
- ccSIM
- TLA/TLC
GEMS Overview
- Fully functional (full-system) simulation
- Timing focus: Simics handles functionality
- Ruby: memory system simulator
- Cache coherence protocol
- Interconnection network
- Memory architecture
- Opal: out-of-order execution simulator
SLICC
- Specification Language for Implementing Cache Coherence
- Protocol specified using
- States
- Events
- Actions
- Transitions
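To make the four ingredients concrete, here is a minimal Python sketch of a protocol expressed as a transition table. It is not SLICC syntax. It models a simplified MSI cache controller with atomic transactions (an assumption: the MOSI_SMP_bcast protocol used in this study also has an Owned state and transient states). The request names GETS (shared) and GETX (exclusive) are borrowed from GEMS naming; the action functions are hypothetical and only record what a real controller would do.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"
    M = "Modified"


class Event(Enum):
    LOAD = "Load"              # local processor reads the block
    STORE = "Store"            # local processor writes the block
    OTHER_GETS = "Other_GETS"  # another cache requests a shared copy
    OTHER_GETX = "Other_GETX"  # another cache requests an exclusive copy


# Actions: hypothetical side effects, recorded in a trace list.
def issue_gets(trace):
    trace.append("issue GETS")


def issue_getx(trace):
    trace.append("issue GETX")


def send_data(trace):
    trace.append("send data to requestor")


def hit(trace):
    trace.append("hit")


# Transitions: (current state, event) -> (next state, list of actions).
TRANSITIONS = {
    (State.I, Event.LOAD): (State.S, [issue_gets]),
    (State.I, Event.STORE): (State.M, [issue_getx]),
    (State.I, Event.OTHER_GETS): (State.I, []),
    (State.I, Event.OTHER_GETX): (State.I, []),
    (State.S, Event.LOAD): (State.S, [hit]),
    (State.S, Event.STORE): (State.M, [issue_getx]),
    (State.S, Event.OTHER_GETS): (State.S, []),
    (State.S, Event.OTHER_GETX): (State.I, []),        # invalidate our copy
    (State.M, Event.LOAD): (State.M, [hit]),
    (State.M, Event.STORE): (State.M, [hit]),
    (State.M, Event.OTHER_GETS): (State.S, [send_data]),  # downgrade
    (State.M, Event.OTHER_GETX): (State.I, [send_data]),  # hand off ownership
}


def step(state, event, trace):
    """Apply one transition; an unlisted (state, event) pair is a protocol error."""
    try:
        next_state, actions = TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {state.name} on {event.value}")
    for action in actions:
        action(trace)
    return next_state


if __name__ == "__main__":
    trace = []
    s = State.I
    for e in (Event.LOAD, Event.STORE, Event.OTHER_GETS):
        s = step(s, e, trace)
    print(s, trace)
```

Keeping the table as data means every (state, event) pair can be enumerated and checked for a missing transition, which is one reason a table-driven specification helps when validating a protocol.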
SLICC Documentation
Installation (Simics)
- Assessing the Host Machine
- Acquiring a Simics License
- Downloading Simics
- Follow Simics Installation Instructions
- Test Simics
Preparing Simics
- Install Solaris
- Edit the hardware configuration
- Create a CD-ROM image if you want to import pre-compiled files
- Load the CD-ROM image file into the hardware configuration
- Start up the new hardware
- Save a checkpoint
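The CD-ROM image step can be scripted on the host. The sketch below assumes `mkisofs` (or `genisoimage`, which accepts the same options) is installed on the host; `-R` adds Rock Ridge extensions so long file names and permissions survive on the Solaris guest, and `-o` names the output file. The helper names are hypothetical, and attaching the image to the simulated machine is left to the Simics documentation.

```python
import subprocess
from pathlib import Path


def cdrom_image_command(src_dir, iso_path, tool="mkisofs"):
    """Build the host command that packs src_dir into an ISO-9660 image."""
    # -R: Rock Ridge extensions (long names, Unix permissions); -o: output file
    return [tool, "-R", "-o", str(iso_path), str(src_dir)]


def make_cdrom_image(src_dir, iso_path, tool="mkisofs"):
    """Run the command; raises CalledProcessError if the tool fails."""
    subprocess.run(cdrom_image_command(src_dir, iso_path, tool), check=True)
    return Path(iso_path)


if __name__ == "__main__":
    print(" ".join(cdrom_image_command("precompiled", "precompiled.iso")))
```

Building the argument list separately from running it makes the command easy to inspect before touching the host file system.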
Installation (GEMS)
- Download and install GEMS
- Copy Simics into the GEMS directory
- Compile Ruby, Opal, and a cache coherence protocol
- Start up Simics
- The GEMS documentation is excellent at describing how to start Simics with the newly compiled cache coherence protocol
- http://www.cs.wisc.edu/gems/doc/wiki/moin.cgi
- Load the checkpoint
- Initialize Ruby (and optionally Opal)
Pitfalls
- Getting files into and out of Simics
- Setting Ruby parameters properly
- Running Simics over X Windows
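One way to catch Ruby parameter mistakes is to check derived values before a long run. The configuration printed for this study lists several parameters both as a size and as a bit count (for example g_MEMORY_SIZE_BYTES 1073741824 with g_MEMORY_SIZE_BITS 30), and reports cache geometry derived from L1_CACHE_NUM_SETS_BITS, L2_CACHE_NUM_SETS_BITS, the associativity, and the 64-byte block size. The sketch below, with hypothetical helper names, checks that each size equals 2 raised to its bit count and recomputes cache capacity. Whether Ruby performs such checks itself is not covered here.

```python
def check_bits(params):
    """Return the names whose size is not 2**bits."""
    return [name for name, (size, bits) in params.items() if size != 1 << bits]


def cache_geometry(num_sets_bits, assoc, block_bytes):
    """Return (num_sets, way_bytes, total_bytes) for a set-associative cache."""
    num_sets = 1 << num_sets_bits
    # Capacity of one way; the run's stats report this value as cache_set_size_bytes.
    way_bytes = num_sets * block_bytes
    return num_sets, way_bytes, way_bytes * assoc


# (size, bits) pairs copied from the configuration of the tested run.
RUN = {
    "g_MEMORY_SIZE": (1073741824, 30),
    "g_DATA_BLOCK": (64, 6),
    "g_PAGE_SIZE": (4096, 12),
    "g_NUM_PROCESSORS": (8, 3),
    "g_NUM_L2_BANKS": (8, 3),
    "g_NUM_MEMORIES": (8, 3),
    "g_MEMORY_MODULE_BLOCKS": (2097152, 21),
}

if __name__ == "__main__":
    print("inconsistent:", check_bits(RUN))
    print("L1:", cache_geometry(8, 4, 64))   # L1_CACHE_NUM_SETS_BITS 8, 4-way
    print("L2:", cache_geometry(16, 4, 64))  # L2_CACHE_NUM_SETS_BITS 16, 4-way
```

For this run the L1 works out to 256 sets and 64 KB, and the L2 to 65,536 sets and 16 MB, matching the cache sizes the simulator printed.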
Tested Simulation
- Heat distribution problem built on Pthreads
- The operating system distributed the 8 threads across 8 different processors
- Each thread exchanged data with the others through shared memory
- Barriers were also used to synchronize the threads' sharing
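The structure of the tested workload can be sketched as a barrier-synchronized Jacobi iteration. The original program used Pthreads; this is a Python analog using `threading.Barrier`, and the 1-D grid, grid size, step count, and boundary temperatures are all assumptions. Each thread updates its own slice of a shared array, and the barrier's action swaps the read and write buffers once every thread has finished a step. Because of Python's global interpreter lock this shows the sharing and synchronization pattern, not parallel speedup.

```python
import threading


def heat_distribution(n_cells=66, n_steps=200, left=100.0, right=0.0, num_threads=8):
    """1-D Jacobi heat diffusion; each thread updates a contiguous slice."""
    grids = [[0.0] * n_cells, [0.0] * n_cells]  # [read buffer, write buffer]
    for g in grids:
        g[0], g[-1] = left, right               # fixed boundary temperatures

    def swap():
        # Runs in exactly one thread after all threads reach the barrier.
        grids.reverse()

    barrier = threading.Barrier(num_threads, action=swap)
    interior = n_cells - 2
    chunk = -(-interior // num_threads)         # ceiling division

    def worker(tid):
        lo = 1 + tid * chunk
        hi = min(lo + chunk, n_cells - 1)
        for _ in range(n_steps):
            src, dst = grids[0], grids[1]       # shared memory
            for i in range(lo, hi):
                dst[i] = 0.5 * (src[i - 1] + src[i + 1])
            barrier.wait()                      # nobody starts step k+1 early

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return grids[0]


if __name__ == "__main__":
    print([round(v, 2) for v in heat_distribution(n_cells=10, n_steps=500)])
```

With enough steps the temperatures approach a straight line between the two boundary values. The barrier is what makes the shared-memory exchange safe: without it, a fast thread could read a neighbor's cell before that neighbor had written it.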
Simulation Results?
(Screenshots of the Ruby statistics output appeared here; key values recovered from them are listed below.)
- Configuration: protocol MOSI_SMP_bcast on Simics 2.0.28; 8 processors, one per chip; 8 L2 banks and 8 memory modules
- Caches: 64 KB 4-way L1I and L1D (256 sets each); 16 MB 4-way L2 (65,536 sets); 64-byte blocks
- Memory and network: 1 GB of memory, MEMORY_LATENCY and DIRECTORY_LATENCY of 80; HIERARCHICAL_SWITCH network with link latency 14 and endpoint bandwidth 10000
- Run time: 8,368 s of host time (about 2.3 hours) for 26,375,999 Ruby cycles; run dated Apr/20/2005
- Instructions executed: 374,981,341 (aggregate cycles per instruction 0.562716)
- L1D misses: 28,732 (51.3% loads, 43.1% stores, 5.6% atomics); L1I misses: 21,542
- L2 misses: 28,512 (0.076 misses per thousand instructions)
- Average miss latency: 113.155 cycles over 50,274 misses (loads 122.583, stores 177.71, atomics 103.586, instruction fetches 70.3483; maximum 610)
- Load imbalance: processor 6 executed only 15,010,842 instructions (CPI 1.757) and caused 18,105 of the 28,512 L2 misses; the other processors executed 48.8 to 52.2 million instructions each (CPI 0.506 to 0.540)
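The headline statistics of the run can be cross-checked from the raw counts. The sketch below uses numbers taken from the run's output: per-type miss counts and average latencies (loads 14,744 at 122.583 cycles, stores 12,371 at 177.71, atomics 1,617 at 103.586, instruction fetches 21,542 at 70.3483), 28,512 L2 misses, 374,981,341 instructions, 26,375,999 Ruby cycles, and 8 processors. Treating the aggregate CPI as (Ruby cycles × processors) ÷ instructions is an inference; it reproduces the reported 0.562716 but is not documented in the output itself.

```python
# Counts copied from the run's statistics output.
MISS_LATENCY = {          # request type: (miss count, average latency in cycles)
    "LD": (14744, 122.583),
    "ST": (12371, 177.71),
    "ATOMIC": (1617, 103.586),
    "IFETCH": (21542, 70.3483),
}
L2_MISSES = 28512
INSTRUCTIONS = 374981341
RUBY_CYCLES = 26375999
NUM_PROCESSORS = 8


def weighted_mean(pairs):
    """Return (total count, count-weighted average) for (count, average) pairs."""
    pairs = list(pairs)
    total = sum(count for count, _ in pairs)
    return total, sum(count * avg for count, avg in pairs) / total


total_misses, mean_latency = weighted_mean(MISS_LATENCY.values())
l2_mpki = 1000 * L2_MISSES / INSTRUCTIONS                  # misses per 1000 instructions
aggregate_cpi = RUBY_CYCLES * NUM_PROCESSORS / INSTRUCTIONS

if __name__ == "__main__":
    print(total_misses, round(mean_latency, 3), round(l2_mpki, 5), round(aggregate_cpi, 6))
```

The per-type averages recombine to the overall 113.155-cycle miss latency, and the miss count of 50,274 equals the L1D and L1I misses combined, which is a quick sanity check that the per-type histograms cover every demand miss.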
Weaknesses
- Requires a highly capable host machine
- No modeling of bus-based architectures
- No simple way to disable performance statistics
Conclusion
- Complexity of cache coherence protocols
- Excellent interface for testing new protocols
- GEMS is useful when left running for days or weeks, simulating a real operating system environment
- GEMS is not useful for a quick comparison of coherence protocols