Title: Welcome to the World of Cache
The Hidden Agenda
- a) Basics of Cache
- 1) Memory Cache
- 2) Where the cache files are created
- 3) Naming Conventions
- 4) Cache Calculations
- b) Advanced Cache
- 1) Lookup Cache
- 2) Aggregator Cache
- 3) Joiner Cache
- 4) Ranker Cache
Let's Get to the Basics
- Cache is a combination of:
- Index cache: the server stores key values or condition values used to look up values quickly.
- Data cache: the server stores output values.
- Caching Storage Overview
- For index caches:
- a) Aggregators store group-by values from the Group-By ports.
- b) Rankers store Group-By values.
- c) Joiners store index values for the master source (the join condition columns).
- d) Lookups store lookup condition information.
- For data caches:
- a) Aggregators store aggregate data based on the Group-By ports (variable ports, output ports, and non-group-by ports).
- b) Rankers store ranking data based on the Group-By ports (output rows other than the ranked column).
- c) Joiners store the master source rows (output columns not in the join condition).
- d) Lookups store lookup data that is not stored in the index cache.
Memory Cache
- The server creates a memory cache based on the size specified in the session properties. You can set this size manually, based on the cache calculations covered later.
- By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and 2,000,000 bytes to the data cache for each transformation instance.
- If the PowerCenter Server cannot allocate the configured amount of cache memory, it cannot initialize the session, and the session fails.
- If the PowerCenter Server requires more memory than the configured cache size, it pages to disk. Because paging to disk can slow session performance, try to configure index and data cache sizes large enough to hold the data in memory.
Where Are the Cache Files Created?
- By default, the PowerCenter Server creates the index and data cache files in the PowerCenter Server variable directory, PMCacheDir.
- If you do not define PMCacheDir, the PowerCenter Server saves the files in the PMCache directory specified in the UNIX configuration file or in the cache directory in the Windows registry. If the PowerCenter Server on UNIX does not find a directory there, it creates the index and data files in the installation directory. If the PowerCenter Server on Windows does not find a directory there, it creates the files in the system directory.
- If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and data files. When creating these files, the PowerCenter Server appends a number to the end of the filename, such as PMAGG.idx1 and PMAGG.idx2. The number of index and data files is limited only by the amount of disk space available in the cache directory.
- The cache files still exist after the session completes in three cases:
- a) The session performs incremental aggregation.
- b) You configure the Lookup transformation to use a persistent cache.
- c) The session does not complete successfully.
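The directory fallback order described above can be sketched as a small function. The function and parameter names are hypothetical, not PowerCenter settings, and the sketch assumes that "does not find a directory there" covers both a missing entry and a path that does not exist.

```python
import os
from typing import Optional


def resolve_cache_dir(pm_cache_dir: Optional[str],
                      configured_dir: Optional[str],
                      install_dir: str,
                      system_dir: str,
                      on_unix: bool) -> str:
    """Return the directory for index/data cache files (hypothetical helper)."""
    # 1. The PMCacheDir server variable is used whenever it is defined.
    if pm_cache_dir:
        return pm_cache_dir
    # 2. Otherwise, the PMCache directory from the UNIX configuration file
    #    or the cache directory from the Windows registry, if it exists.
    if configured_dir and os.path.isdir(configured_dir):
        return configured_dir
    # 3. Fallback: installation directory on UNIX, system directory on Windows.
    return install_dir if on_unix else system_dir


print(resolve_cache_dir(None, None, "/opt/informatica", r"C:\Windows\system32", on_unix=True))
```

With neither PMCacheDir nor a configured directory, the UNIX case falls through to the installation directory (`/opt/informatica` in this made-up call).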
Naming Convention Followed by the Informatica Server
- [<Name Prefix> | <Prefix><session ID>_<transformation ID>]_<partition index>.<suffix><overflow index>
- For example, PMLKUP8_4_2.idx:
- PMLKUP: the transformation type (Lookup)
- 8: the session ID
- 4: the transformation ID
- 2: the partition index
- idx: the file suffix (index cache file)
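To make the pattern concrete, here is a minimal parser for unnamed cache file names such as PMLKUP8_4_2.idx. The regular expression, `CacheFileName`, and `parse_cache_file_name` are hypothetical illustrations of the convention above; named (persistent) caches that use a `<Name Prefix>` are not handled.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for unnamed cache files such as PMLKUP8_4_2.idx or PMLKUP8_4_2.idx2
CACHE_FILE = re.compile(
    r"^(?P<prefix>PM[A-Z]+)(?P<session_id>\d+)"
    r"_(?P<transformation_id>\d+)_(?P<partition_index>\d+)"
    r"\.(?P<suffix>[a-z]+)(?P<overflow_index>\d*)$"
)


class CacheFileName(NamedTuple):
    prefix: str
    session_id: int
    transformation_id: int
    partition_index: int
    suffix: str
    overflow_index: Optional[int]  # None unless the file was split, e.g. .idx1, .idx2


def parse_cache_file_name(name: str) -> CacheFileName:
    match = CACHE_FILE.match(name)
    if match is None:
        raise ValueError(f"unrecognised cache file name: {name}")
    overflow = match.group("overflow_index")
    return CacheFileName(
        prefix=match.group("prefix"),
        session_id=int(match.group("session_id")),
        transformation_id=int(match.group("transformation_id")),
        partition_index=int(match.group("partition_index")),
        suffix=match.group("suffix"),
        overflow_index=int(overflow) if overflow else None,
    )


info = parse_cache_file_name("PMLKUP8_4_2.idx")
print(info.prefix, info.session_id, info.transformation_id, info.partition_index)  # PMLKUP 8 4 2
```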
Cache Calculations
- Aggregator
- Index size $= [(\sum \text{column sizes of group-by ports}) + 17] \times \text{number of groups}$
- Data size $= [(\sum \text{column sizes of output ports}) + 7] \times \text{number of groups}$
- Rank
- Index size $= [(\sum \text{column sizes of group-by ports}) + 17] \times \text{number of groups}$
- Data size $= [\text{number of ranks} \times ((\sum \text{column sizes of output ports}) + 10) + 20] \times \text{number of groups}$
- Joiner
- Index size $= [(\sum \text{master column sizes in join condition}) + 16] \times \text{rows in master table}$
- Data size $= [(\sum \text{master column sizes not in join condition but on output ports}) + 8] \times \text{rows in master table}$
- Lookup
- Index size $= \text{rows in lookup table} \times [(\sum \text{column sizes}) + 16] \times 2$
- Data size $= \text{rows in lookup table} \times [(\sum \text{column sizes}) + 8]$
Lookup Caches Overview
- The PowerCenter Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation.
- It allocates memory for the cache based on the amount you configure in the transformation or session properties.
- The PowerCenter Server stores condition values in the index cache and output values in the data cache.
- The PowerCenter Server queries the cache for each row that enters the transformation.
- The PowerCenter Server also creates cache files by default in the PMCacheDir directory.
- If the data does not fit in the memory cache, the PowerCenter Server stores the overflow values in the cache files. When the session completes, the PowerCenter Server releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache.
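The build-once, query-per-row behavior can be modeled in a few lines. `CachedLookup` is a hypothetical toy class, not an Informatica API: a dictionary stands in for the index and data caches, overflow to cache files is not modeled, and keeping the first row for a duplicate key is an assumption of this sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

LookupRow = Tuple[Hashable, Tuple]  # (condition value, output values)


class CachedLookup:
    """Toy model of a cached Lookup transformation (hypothetical, not an Informatica API)."""

    def __init__(self, read_lookup_source: Callable[[], Iterable[LookupRow]]):
        self._read_lookup_source = read_lookup_source
        # condition value -> output values (index cache -> data cache, conceptually)
        self._cache: Optional[Dict[Hashable, Tuple]] = None

    def lookup(self, condition_value: Hashable) -> Optional[Tuple]:
        if self._cache is None:
            # Build the cache once, when the first row arrives.
            self._cache = {}
            for key, outputs in self._read_lookup_source():
                self._cache.setdefault(key, outputs)  # keep the first row per key
        # Every row, including the first, is answered from the cache.
        return self._cache.get(condition_value)
```

However many rows are looked up, `read_lookup_source` runs only once, which is why the cache must be sized before the session starts.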
Types of Lookup Cache
- When configuring a lookup cache, you can specify any of the following options:
- Persistent cache. You can save the lookup cache files and reuse them the next time the PowerCenter Server processes a Lookup transformation configured to use the cache.
- Recache from source. If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
- Static cache. You can configure a static, or read-only, cache for any lookup source. By default, the PowerCenter Server creates a static cache. It caches the lookup file or table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the PowerCenter Server returns a value from the lookup cache. The PowerCenter Server does not update the cache while it processes the Lookup transformation.
- Dynamic cache. If you want to cache the target table and insert new rows or update existing rows in the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The PowerCenter Server dynamically inserts or updates data in the lookup cache and passes data to the target table. You cannot use a dynamic cache with a flat file lookup.
- For example, suppose your lookup table is also your target table. When you configure the Lookup transformation with a dynamic cache, it looks up each incoming row. If there is no match, it inserts the row into both the target and the lookup cache (hence the name dynamic cache: the cache builds up as you go along). If there is a match, it updates the row in the cache and the target. A static cache, on the other hand, is never updated during the lookup.
- Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
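The dynamic-cache example can be sketched with two dictionaries standing in for the lookup cache and the target table. `dynamic_lookup` is a hypothetical, simplified model: it always updates on a match, whereas a real session applies further update rules.

```python
from typing import Dict, List, Tuple


def dynamic_lookup(rows: List[Tuple[int, str]],
                   cache: Dict[int, str],
                   target: Dict[int, str]) -> List[str]:
    """Toy dynamic lookup where the lookup table is also the target (hypothetical)."""
    actions: List[str] = []
    for key, value in rows:
        if key in cache:
            actions.append("update")  # match: update the row in the cache and the target
        else:
            actions.append("insert")  # no match: insert the row into the cache and the target
        cache[key] = value            # the cache builds up as rows pass through
        target[key] = value
    return actions


cache = {1: "a"}
target = {1: "a"}
print(dynamic_lookup([(1, "b"), (2, "c"), (2, "d")], cache, target))  # ['update', 'insert', 'update']
```

The third row finds key 2 only because the second row inserted it into the cache; with a static cache that row would still see no match.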
Calculating the Lookup Index Cache
- The lookup index cache holds data for the columns used in the lookup condition.
- The formula for the minimum lookup index cache size differs from the formula for the maximum size.
- For best session performance, specify the maximum lookup index cache size.
- Calculating the Minimum Lookup Index Cache
- $200 \times [(\sum \text{column size}) + 16]$
- where the columns are the columns in the lookup condition.
- The minimum size for a lookup index cache is independent of the number of source rows.
- Calculating the Maximum Lookup Index Cache
- $\text{rows in lookup table} \times [(\sum \text{column size}) + 16] \times 2$
- where the columns are the columns in the lookup condition.
Difference between Static and Dynamic Cache
- Static cache
- You cannot insert or update the cache.
- The Informatica server returns a value from the lookup table or cache when the condition is true. When the condition is not true, the Informatica server returns the default value for connected transformations and NULL for unconnected transformations.
- You can use a relational or flat file lookup.
- Dynamic cache
- You can insert rows into the cache as you pass them to the target.
- The Informatica server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or the target table. You can pass these rows to the target table.
- You can use a relational lookup only.
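The static-cache return rule (cached value on a match, otherwise the default value for a connected lookup or NULL for an unconnected one) can be sketched as a small hypothetical helper; `None` stands in for NULL, and `default_value` stands in for the output port's default value.

```python
from typing import Any, Dict, Hashable


def static_lookup(cache: Dict[Hashable, Any], key: Hashable,
                  connected: bool, default_value: Any = None) -> Any:
    """Static (read-only) lookup result (hypothetical helper)."""
    if key in cache:
        return cache[key]  # condition true: value from the cache
    # Condition false: connected lookups return the default value,
    # unconnected lookups return NULL (None here). The cache is never modified.
    return default_value if connected else None


promos = {1001: "SPRING10"}
print(static_lookup(promos, 1001, connected=True))                   # SPRING10
print(static_lookup(promos, 2002, connected=True, default_value=""))  # prints an empty line
print(static_lookup(promos, 2002, connected=False))                  # None
```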
Example
- The Lookup transformation, LKP_PROMOS, looks up values based on ITEM_ID. It uses the following lookup condition:
- ITEM_ID = IN_ITEM_ID1
- Column in lookup condition: ITEM_ID; datatype: integer; column size: 16
- The lookup condition uses one column, ITEM_ID, and the table contains 60,000 rows.
- Use the following calculation to determine the minimum index cache requirements:
- $200 \times (16 + 16) = 6{,}400$
- Use the following calculation to determine the maximum index cache requirements:
- $60{,}000 \times (16 + 16) \times 2 = 3{,}840{,}000$
- Therefore, this Lookup transformation requires an index cache size between 6,400 and 3,840,000 bytes.
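The minimum and maximum lookup index formulas translate directly into two functions. The function names are hypothetical; the inputs are the LKP_PROMOS figures from the example.

```python
from typing import Sequence


def lookup_index_cache_min(condition_column_sizes: Sequence[int]) -> int:
    """Minimum lookup index cache in bytes: 200 x (sum of condition column sizes + 16)."""
    return 200 * (sum(condition_column_sizes) + 16)


def lookup_index_cache_max(lookup_rows: int, condition_column_sizes: Sequence[int]) -> int:
    """Maximum lookup index cache in bytes: rows x (sum of condition column sizes + 16) x 2."""
    return lookup_rows * (sum(condition_column_sizes) + 16) * 2


# LKP_PROMOS: one condition column (ITEM_ID, size 16) and 60,000 lookup rows.
print(lookup_index_cache_min([16]))          # 6400
print(lookup_index_cache_max(60_000, [16]))  # 3840000
```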
Calculating the Lookup Data Cache
- In a connected transformation, the data cache contains data for the connected output ports, not including ports used in the lookup condition. In an unconnected transformation, the data cache contains data from the return port.
- 1) PROMOTION_ID: connected output port not in the lookup condition; integer; column size 16
- 2) DISCOUNT: connected output port not in the lookup condition; decimal; column size 16
- The lookup table has 60,000 rows.
- Use the following calculation to determine the minimum data cache requirements:
- $60{,}000 \times (32 + 8) = 2{,}400{,}000$
- This Lookup transformation requires a data cache size of 2,400,000 bytes.
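The same arithmetic as a function, using the lookup data formula from the Cache Calculations slide (rows × (Σ column size + 8)). The function name is hypothetical.

```python
from typing import Sequence


def lookup_data_cache(lookup_rows: int, output_column_sizes: Sequence[int]) -> int:
    """Lookup data cache in bytes: rows x (sum of output column sizes + 8)."""
    return lookup_rows * (sum(output_column_sizes) + 8)


# PROMOTION_ID (16) and DISCOUNT (16) over 60,000 lookup rows.
print(lookup_data_cache(60_000, [16, 16]))  # 2400000
```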
Aggregator Cache
- When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.
- If you use incremental aggregation, the PowerCenter Server saves the cache files in the cache file directory.
- Note: The PowerCenter Server uses memory to process an Aggregator transformation with sorted ports, but it does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.
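A conceptual way to see why sorted ports avoid the cache: with unsorted input every group must be held until the last row is read, while sorted input lets each group be finished as soon as its key changes. This is an illustration in plain Python with hypothetical function names, not a description of PowerCenter internals.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Tuple


def aggregate_unsorted(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    # Every group stays in memory until the last input row has been read.
    totals: Dict[str, float] = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def aggregate_sorted(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    # Input sorted by the group key: each group is complete when the key changes,
    # so only the current group needs to be held at a time.
    return [(key, sum(amount for _, amount in group))
            for key, group in groupby(rows, key=lambda row: row[0])]


print(aggregate_unsorted([("a", 1), ("b", 2), ("a", 3)]))  # {'a': 4, 'b': 2}
print(aggregate_sorted([("a", 1), ("a", 3), ("b", 2)]))    # [('a', 4), ('b', 2)]
```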
Configuring the Session for Incremental Aggregation
- Use the following guidelines when you configure the session for incremental aggregation:
- Verify the location where you want to store the aggregate files. Configure the session to write file names in the session log.
- If you want the PowerCenter Server to write the incremental aggregation cache file names in the session log, configure the session with Verbose Init tracing.
- Verify the incremental aggregation settings in the session properties. You can configure the session for incremental aggregation in the Performance settings on the Properties tab.
- You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating that the PowerCenter Server overwrites the existing cache, and a reminder to clear this option after running the session.
To configure a session for incremental aggregation
Calculating the Aggregator Index Cache
- The index cache holds group information from the group-by ports.
- $\text{groups} \times [(\sum \text{column size}) + 17]$
- where the columns are the group-by columns.
- As per the example:
- STORE_ID: integer, column size 6
- ITEM: string, column size 18
- Therefore, the total column size is $18 + 6 = 24$.
- Assuming there are 72,000 groups:
- The minimum index cache calculation is $72{,}000 \times (24 + 17) = 2{,}952{,}000$
- The maximum index cache calculation is double that amount: $2{,}952{,}000 \times 2 = 5{,}904{,}000$
- Therefore, this Aggregator transformation requires an index cache size between 2,952,000 and 5,904,000 bytes.
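A sketch of the aggregator index calculation that returns both bounds, treating 72,000 as the number of groups, as the slide's arithmetic does. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def aggregator_index_cache(groups: int, group_by_sizes: Sequence[int]) -> Tuple[int, int]:
    """(minimum, maximum) aggregator index cache in bytes; the maximum is double the minimum."""
    minimum = groups * (sum(group_by_sizes) + 17)
    return minimum, 2 * minimum


# STORE_ID (6) and ITEM (18) with 72,000 groups.
print(aggregator_index_cache(72_000, [6, 18]))  # (2952000, 5904000)
```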
Calculating the Aggregator Data Cache
- The data cache holds row data for variable ports and connected output ports. As a result, the data cache is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. Use the following information to calculate the minimum aggregate data cache size:
- $\text{groups} \times [(\sum \text{column size}) + 7]$
- where the column sizes come from:
- a) Non-group-by input/output ports.
- b) Local variable ports.
- c) Ports containing an aggregate function (multiply by three).
- In the example:
- ORDER_ID: integer, column size 6
- SALES_PER_STORE_ITEMS: decimal, column size 30
- Total: 36
- The total number of groups, as calculated for the index cache size, is 72,000. Use the following calculation to determine the minimum data cache requirements:
- $72{,}000 \times (36 + 7) = 3{,}096{,}000$
- Therefore, this Aggregator transformation requires a data cache size of 3,096,000 bytes.
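The data cache formula as a function. The function name is hypothetical, and it assumes the caller passes sizes that already apply the "multiply by three" rule for aggregate-function ports, as the example's column sizes are taken as given.

```python
from typing import Sequence


def aggregator_data_cache(groups: int, column_sizes: Sequence[int]) -> int:
    """Minimum aggregator data cache in bytes: groups x (sum of column sizes + 7).

    Assumes aggregate-function port sizes have already been multiplied by three.
    """
    return groups * (sum(column_sizes) + 7)


# ORDER_ID (6) and SALES_PER_STORE_ITEMS (30) with 72,000 groups.
print(aggregator_data_cache(72_000, [6, 30]))  # 3096000
```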
Joiner Cache
- When a session uses a Joiner transformation, the PowerCenter Server first reads the data from the master source and builds the index and data caches from the master rows. After building the caches, the PowerCenter Server performs the join based on the detail source data and the cached data.
- The server creates the index cache as it reads the master source into the data cache. The server uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
- The PowerCenter Server caches all master rows with a unique key in the index cache, and all master rows in the data cache.
- For instance:
- Index cache. The PowerCenter Server caches 100 master rows with unique keys.
- Data cache. The PowerCenter Server caches the master rows that correspond to the 100 rows in the index cache. The number of rows it stores in the data cache depends on the data. For example, if every master row contains a unique key, the PowerCenter Server stores 100 rows in the data cache. However, if the master data contains multiple rows with the same key, the PowerCenter Server stores more than 100 rows in the data cache.
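The build-then-probe pattern can be sketched as a hash join. In this hypothetical model, the dictionary keys play the role of the index cache (one entry per unique key) and the lists of master rows play the role of the data cache; rows are simplified to (join key, payload) pairs.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, str]  # (join key, payload): simplified, hypothetical row shape


def build_master_cache(master: Iterable[Row]) -> DefaultDict[Hashable, List[str]]:
    # Keys act as the index cache (unique join keys);
    # the lists act as the data cache (every master row).
    cache: DefaultDict[Hashable, List[str]] = defaultdict(list)
    for key, payload in master:
        cache[key].append(payload)
    return cache


def cached_join(master: Iterable[Row], detail: Iterable[Row]) -> List[Tuple[Hashable, str, str]]:
    cache = build_master_cache(master)  # read the whole master source first
    joined = []
    for key, detail_payload in detail:  # then stream the detail source
        for master_payload in cache.get(key, []):  # test the join condition
            joined.append((key, master_payload, detail_payload))
    return joined


master_rows = [(1, "a"), (1, "b"), (2, "c")]
print(cached_join(master_rows, [(1, "x"), (3, "y")]))  # [(1, 'a', 'x'), (1, 'b', 'x')]
```

Here the master source has 2 unique keys but 3 rows, mirroring the point above that the data cache can hold more rows than the index cache.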
Joiner Index Cache Calculation
- The index cache holds rows from the master source that are in the join condition.
- $\text{master rows} \times [(\sum \text{column size}) + 16]$
- where the columns are the master columns in the join condition.
- In the example, the transformation joins the sources ORDERS and PRODUCTS on ITEM_NO.
- ITEM_NO: decimal(10), column size 16
- PRODUCTS is the master source and has 90,000 rows. Use the following calculation to determine the minimum index cache requirements:
- $90{,}000 \times (16 + 16) = 2{,}880{,}000$
- Double the size to determine the maximum index cache requirements:
- $2{,}880{,}000 \times 2 = 5{,}760{,}000$
- Therefore, this Joiner transformation requires an index cache size between 2,880,000 and 5,760,000 bytes.
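The joiner index calculation as a function, using the +16 overhead that the worked example applies. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def joiner_index_cache(master_rows: int, join_column_sizes: Sequence[int]) -> Tuple[int, int]:
    """(minimum, maximum) joiner index cache in bytes; the maximum is double the minimum."""
    minimum = master_rows * (sum(join_column_sizes) + 16)
    return minimum, 2 * minimum


# ITEM_NO (16) as the join column; PRODUCTS master source with 90,000 rows.
print(joiner_index_cache(90_000, [16]))  # (2880000, 5760000)
```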
Joiner Data Cache Calculation
- The data cache holds rows from the master source until the PowerCenter Server joins the data.
- $\text{master rows} \times [(\sum \text{column size}) + 8]$
- where the columns are master columns not in the join condition and used for output.
- In the example, the following figure shows the connected output ports for JNR_ORDERS_PRODUCTS:
- ITEM_NAME: string, column size 32
- PRODUCT CATEGORY: decimal, column size 30
- Total column size: 62
- The master source has 90,000 rows.
- Use the following calculation to determine the minimum data cache requirements:
- $90{,}000 \times (62 + 8) = 6{,}300{,}000$
- This Joiner transformation requires a data cache size of 6,300,000 bytes.
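The joiner data calculation as a function (hypothetical name), fed with the two output columns from the example.

```python
from typing import Sequence


def joiner_data_cache(master_rows: int, output_column_sizes: Sequence[int]) -> int:
    """Minimum joiner data cache in bytes: master rows x (sum of output column sizes + 8)."""
    return master_rows * (sum(output_column_sizes) + 8)


# ITEM_NAME (32) and PRODUCT CATEGORY (30) over 90,000 master rows.
print(joiner_data_cache(90_000, [32, 30]))  # 6300000
```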
Rank Caches
- When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row.
- For example, you configure a Rank transformation to find the top three sales. The PowerCenter Server reads the following input data:
- SALES
- 10,000
- 12,210
- 5,000
- 2,455
- 6,324
- The PowerCenter Server caches the first three rows (10,000, 12,210, and 5,000). When the PowerCenter Server reads the next row (2,455), it compares it to the cached values. Because the row ranks lower than all the cached rows, it discards the row with 2,455. The next row (6,324), however, ranks higher than one of the cached rows (5,000). Therefore, the PowerCenter Server replaces that cached row with the higher-ranked input row.
- If the Rank transformation is configured to rank across multiple groups, the PowerCenter Server ranks incrementally for each group it finds.
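The replace-the-lowest behavior can be sketched with a fixed-size min-heap, where the heap's smallest entry is the lowest-ranked cached row. The heap is this sketch's choice of data structure, not necessarily how PowerCenter stores its rank cache; ranking across groups would keep one such cache per group.

```python
import heapq
from typing import Iterable, List


def top_n(values: Iterable[float], n: int) -> List[float]:
    cache: List[float] = []  # min-heap: cache[0] is the lowest-ranked cached row
    for value in values:
        if len(cache) < n:
            heapq.heappush(cache, value)     # fill the cache with the first n rows
        elif value > cache[0]:
            heapq.heapreplace(cache, value)  # out-ranks the lowest cached row: replace it
        # otherwise the row is discarded
    return sorted(cache, reverse=True)


print(top_n([10_000, 12_210, 5_000, 2_455, 6_324], 3))  # [12210, 10000, 6324]
```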
Calculating the Rank Index Cache
- The index cache holds group information from the group-by ports. Use the following information to calculate the minimum rank index cache size:
- Rank Index Calculation
- $\text{groups} \times [(\sum \text{column size}) + 17]$
- where the columns are the group-by columns.
- PRODUCT_CATEGORY: string(21), column size 24
- There are 10,000 product categories, so the total number of groups is 10,000. Use the following calculation to determine the minimum index cache requirements:
- $10{,}000 \times (24 + 17) = 410{,}000$
- Double the size to determine the maximum index cache requirements:
- $410{,}000 \times 2 = 820{,}000$
- Therefore, this Rank transformation requires an index cache size between 410,000 and 820,000 bytes.
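The rank index calculation as a function (hypothetical name) returning both bounds for the PRODUCT_CATEGORY example.

```python
from typing import Sequence, Tuple


def rank_index_cache(groups: int, group_by_sizes: Sequence[int]) -> Tuple[int, int]:
    """(minimum, maximum) rank index cache in bytes; the maximum is double the minimum."""
    minimum = groups * (sum(group_by_sizes) + 17)
    return minimum, 2 * minimum


# PRODUCT_CATEGORY (24) with 10,000 groups.
print(rank_index_cache(10_000, [24]))  # (410000, 820000)
```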
Calculating the Rank Data Cache
- The data cache size is proportional to the number of ranks. It holds row data until the PowerCenter Server completes the ranking and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. Use the following information to calculate the minimum rank data cache size:
- $\text{groups} \times [(\text{ranks} \times ((\sum \text{column size}) + 10)) + 20]$
- ITEM_NO: decimal(10), column size 10
- ITEM_NAME: string(23), column size 26
- PRICE: decimal(14), column size 10
- Total column size: 46
- RNK_TOPTEN ranks by price, and the total number of ranks is 10. The number of groups is 10,000.
- Use the following calculation to determine the minimum data cache requirements:
- $10{,}000 \times [(10 \times (46 + 10)) + 20] = 5{,}800{,}000$
- This Rank transformation requires a data cache size of 5,800,000 bytes.
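The rank data formula as a function (hypothetical name). Note that the number of ranks multiplies the per-row size, which is why the data cache grows with the number of ranks.

```python
from typing import Sequence


def rank_data_cache(groups: int, ranks: int, column_sizes: Sequence[int]) -> int:
    """Minimum rank data cache in bytes: groups x [(ranks x (sum of column sizes + 10)) + 20]."""
    return groups * (ranks * (sum(column_sizes) + 10) + 20)


# RNK_TOPTEN: ITEM_NO (10), ITEM_NAME (26), PRICE (10); 10 ranks, 10,000 groups.
print(rank_data_cache(10_000, 10, [10, 26, 10]))  # 5800000
```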