Welcome to the World of Cache

Provided by: Dab93

1
Welcome to the World of Cache
2
The Hidden agenda
  • a) Basics of Cache
  • 1) Memory Cache
  • 2) Where the cache files are created
  • 3) Naming Conventions
  • 4) Cache Calculations
  • b) Advanced Cache
  • 1) Look up Cache
  • 2) Aggregator Cache
  • 3) Joiner Cache
  • 4) Ranker Cache

3
Let's get to the Basics
  • A cache is a combination of:
  • Index cache: the server stores key values or
    condition values, which it uses to index values
    at a faster rate.
  • Data cache: the server stores output values.
  • Caching Storage Overview
  • For index caches:
  • a) Aggregators store group-by values from the
    group-by ports.
  • b) Rankers store group-by values.
  • c) Joiners store index values for the master
    source (the join condition columns).
  • d) Lookups store lookup condition information.
  • For data caches:
  • a) Aggregators store aggregate data based on the
    group-by ports (variable ports, output ports,
    non-group-by ports).
  • b) Rankers store ranking data based on the
    group-by ports (output row data other than the
    ranked column).
  • c) Joiners store master source rows (output
    columns not in the join condition).
  • d) Lookups store lookup data that is not stored
    in the index cache.

4
Memory Cache
  • The server creates a memory cache based on the
    size specified in the session properties. You can
    set this size manually, based on the cache
    calculations for each transformation.
  • By default, the PowerCenter Server allocates
    1,000,000 bytes to the index cache and 2,000,000
    bytes to the data cache for each transformation
    instance.
  • If the PowerCenter Server cannot allocate the
    configured amount of cache memory, it cannot
    initialize the session and the session fails.
  • If the PowerCenter Server requires more memory
    than the configured cache size, it pages to
    disk. Since paging to disk can slow session
    performance, try to configure the index and data
    cache sizes so that the data fits in memory.

5
Where are the Cache Files Created?
  • The PowerCenter Server creates the index and data
    cache files by default in the PowerCenter Server
    variable directory, PMCacheDir.
  • If you do not define PMCacheDir, the PowerCenter
    Server saves the files in the PMCache directory
    specified in the UNIX configuration file or the
    cache directory in the Windows registry. If the
    UNIX PowerCenter Server does not find a directory
    there, it creates the index and data files in the
    installation directory. If the PowerCenter Server
    on Windows does not find a directory there, it
    creates the files in the system directory.
  • If a cache file handles more than 2 GB of data,
    the PowerCenter Server creates multiple index and
    data files. When creating these files, the
    PowerCenter Server appends a number to the end of
    the filename, such as PMAGG.idx1 and
    PMAGG.idx2. The number of index and data files
    is limited only by the amount of disk space
    available in the cache directory.
  • Three cases in which the cache files remain
    after the session completes:
  • a) The session performs incremental aggregation.
  • b) You configure the Lookup transformation to use
    a persistent cache.
  • c) The session does not complete successfully.

6
Naming convention followed by the Informatica
Server
  • [<Name Prefix> | <Prefix><session ID>_<transformation ID>]_[partition index].<suffix>[overflow index]
  • For example, in PMLKUP8_4_2.idx:
  • PMLKUP -> the transformation type (Lookup)
  • 8 -> the session ID
  • 4 -> the transformation ID
  • 2 -> the partition index
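
As a rough illustration of the naming convention above, the sketch below splits a cache file name into its parts. The function name, the regular expression, and the ".dat" suffix for data files are assumptions made for this example; only the PMLKUP8_4_2.idx pattern and the trailing overflow number (as in PMAGG.idx1) come from the slides.

```python
import re

# Hypothetical pattern for names such as PMLKUP8_4_2.idx:
# <prefix><session ID>_<transformation ID>_<partition index>.<suffix>[overflow index]
# The "dat" suffix for data cache files is an assumption of this sketch.
CACHE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z]+)(?P<session_id>\d+)_(?P<transformation_id>\d+)"
    r"_(?P<partition_index>\d+)\.(?P<suffix>idx|dat)(?P<overflow_index>\d*)$"
)


def parse_cache_file_name(name):
    """Split a cache file name into its components (illustrative only)."""
    match = CACHE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised cache file name: {name!r}")
    parts = match.groupdict()
    for key in ("session_id", "transformation_id", "partition_index"):
        parts[key] = int(parts[key])
    # The overflow index is only present when the cache spills into extra files.
    overflow = parts["overflow_index"]
    parts["overflow_index"] = int(overflow) if overflow else None
    return parts


if __name__ == "__main__":
    print(parse_cache_file_name("PMLKUP8_4_2.idx"))
```

Names that use a custom name prefix instead of the prefix, session ID and transformation ID are not handled by this sketch.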

8
Cache Calculations
  • Aggregator
  • Index size = (sum of column sizes in group-by
    ports + 17) × number of groups
  • Data size = (sum of column sizes of output ports
    + 7) × number of groups
  • Rank
  • Index size = (sum of column sizes in group-by
    ports + 17) × number of groups
  • Data size = number of groups × [(number of ranks
    × (sum of column sizes of output ports + 10))
    + 20]
  • Joiner
  • Index size = (sum of master column sizes in the
    join condition + 16) × number of rows in master
    table
  • Data size = (sum of master column sizes not in
    the join condition but on output ports + 8) ×
    number of rows in master table
  • Lookup
  • Index size = number of rows in lookup table ×
    [(Σ column size) + 16] × 2
  • Data size = number of rows in lookup table ×
    [(Σ column size) + 8]
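
The four pairs of formulas on this slide can be sketched as small functions. This is an illustration rather than Informatica code: the function and parameter names are hypothetical, and the rank data formula follows the fuller version, with a number of ranks, given on the Rank Data Cache slide.

```python
def aggregator_cache(group_by_sizes, output_sizes, groups):
    """Return (index_bytes, data_bytes) for an Aggregator."""
    index = (sum(group_by_sizes) + 17) * groups
    data = (sum(output_sizes) + 7) * groups
    return index, data


def rank_cache(group_by_sizes, output_sizes, groups, ranks):
    """Return (index_bytes, data_bytes) for a Rank transformation."""
    index = (sum(group_by_sizes) + 17) * groups
    data = groups * (ranks * (sum(output_sizes) + 10) + 20)
    return index, data


def joiner_cache(condition_sizes, output_sizes, master_rows):
    """Return (index_bytes, data_bytes) for a Joiner."""
    index = (sum(condition_sizes) + 16) * master_rows
    data = (sum(output_sizes) + 8) * master_rows
    return index, data


def lookup_cache(condition_sizes, output_sizes, lookup_rows):
    """Return (max_index_bytes, data_bytes) for a Lookup."""
    index = lookup_rows * (sum(condition_sizes) + 16) * 2
    data = lookup_rows * (sum(output_sizes) + 8)
    return index, data


if __name__ == "__main__":
    # Figures taken from the Aggregator example later in the deck.
    print(aggregator_cache([6, 18], [6, 30], 72_000))
```

Each function takes a list of column sizes in bytes, so the same helpers can be checked against every worked example in the later slides.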

10
Lookup Caches Overview
  • The PowerCenter Server builds a cache in memory
    when it processes the first row of data in a
    cached Lookup transformation
  • It allocates memory for the cache based on the
    amount you configure in the transformation or
    session properties.
  • The PowerCenter Server stores condition values in
    the index cache and output values in the data
    cache
  • The PowerCenter Server queries the cache for each
    row that enters the transformation.
  • The PowerCenter Server also creates cache files
    by default in the PMCacheDir
  • If the data does not fit in the memory cache, the
    PowerCenter Server stores the overflow values in
    the cache files. When the session completes, the
    PowerCenter Server releases cache memory and
    deletes the cache files unless you configure the
    Lookup transformation to use a persistent cache.

11
Types of Lookup Cache
  • When configuring a lookup cache, you can specify
    any of the following options
  • Persistent cache. You can save the lookup cache
    files and reuse them the next time the
    PowerCenter Server processes a Lookup
    transformation configured to use the cache
  • Recache from source. If the persistent cache is
    not synchronized with the lookup table, you can
    configure the Lookup transformation to rebuild
    the lookup cache.
  • Static cache. You can configure a static, or
    read-only, cache for any lookup source. By
    default, the PowerCenter Server creates a static
    cache. It caches the lookup file or table and
    looks up values in the cache for each row that
    comes into the transformation. When the lookup
    condition is true, the PowerCenter Server returns
    a value from the lookup cache. The PowerCenter
    Server does not update the cache while it
    processes the Lookup transformation.
  • Dynamic cache. If you want to cache the target
    table and insert new rows or update existing rows
    in the cache and the target, you can create a
    Lookup transformation to use a dynamic cache. The
    PowerCenter Server dynamically inserts or updates
    data in the lookup cache and passes data to the
    target table. You cannot use a dynamic cache with
    a flat file lookup.
  • For example, suppose your lookup table is also
    your target table. With a dynamic cache, the
    Lookup transformation looks up each value. If
    there is no match, it inserts the row into both
    the target and the lookup cache (hence "dynamic":
    the cache builds up as you go along). If there is
    a match, it updates the row in the target. A
    static cache, by contrast, is not updated when
    you do a lookup.
  • Shared cache. You can share the lookup cache
    between multiple transformations. You can share
    an unnamed cache between transformations in the
    same mapping. You can share a named cache between
    transformations in the same or different
    mappings.
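
The difference between a static and a dynamic lookup cache can be sketched with a toy in-memory class. This is only a model of the behaviour described above, not how the PowerCenter Server implements it; the class, its method, and the action labels are hypothetical.

```python
class LookupCache:
    """Toy lookup cache keyed on the lookup condition value."""

    def __init__(self, rows, dynamic=False):
        # rows stands in for the lookup (or target) table read at build time.
        self._rows = dict(rows)
        self.dynamic = dynamic

    def lookup(self, key, new_values=None):
        """Return (values, action); action is match, update, insert or miss."""
        if key in self._rows:
            changed = new_values is not None and new_values != self._rows[key]
            if self.dynamic and changed:
                self._rows[key] = new_values  # dynamic: keep cache in step with target
                return new_values, "update"
            return self._rows[key], "match"
        if self.dynamic and new_values is not None:
            self._rows[key] = new_values      # dynamic: cache grows as rows arrive
            return new_values, "insert"
        return None, "miss"                   # static: cache is read-only

    def __len__(self):
        return len(self._rows)


if __name__ == "__main__":
    static = LookupCache({1: "A"})
    dynamic = LookupCache({1: "A"}, dynamic=True)
    print(static.lookup(2, "B"), len(static))
    print(dynamic.lookup(2, "B"), len(dynamic))
```

In the static case the cache size never changes during the run; in the dynamic case a row that misses is added, so a later row with the same key finds it.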

15
Calculating the Lookup Index Cache
  • The lookup index cache holds data for the columns
    used in the lookup condition.
  • The formula for calculating the minimum lookup
    index cache size differs from the formula for
    the maximum size.
  • For best session performance, specify the maximum
    lookup index cache size.
  • Calculating the Minimum Lookup Index Cache
  • 200 × [(Σ column size) + 16]
  • Columns -> columns in the lookup condition.
  • The minimum size for a lookup index cache is
    independent of the number of source rows.
  • Calculating the Maximum Lookup Index Cache
  • (number of rows in lookup table) × [(Σ column
    size) + 16] × 2
  • Columns -> columns in the lookup condition.

16
Difference between Static and Dynamic Cache
  • Static cache
  • You cannot insert or update the cache.
  • The Informatica Server returns a value from the
    lookup table or cache when the condition is
    true. When the condition is not true, the
    Informatica Server returns the default value for
    connected transformations and NULL for
    unconnected transformations.
  • You can use a relational or flat file lookup.
  • Dynamic cache
  • You can insert rows into the cache as you pass
    rows to the target.
  • The Informatica Server inserts rows into the
    cache when the condition is false. This indicates
    that the row is not in the cache or target table.
    You can pass these rows to the target table.
  • You can use a relational lookup only.

18
  • Example
  • The Lookup transformation, LKP_PROMOS, looks up
    values based on the ITEM_ID. It uses the
    following lookup condition:
  • ITEM_ID = IN_ITEM_ID1
  • ITEM_ID (column in lookup condition): Integer ->
    column size 16
  • The lookup condition uses one column, ITEM_ID,
    and the table contains 60,000 rows.
  • Use the following calculation to determine the
    minimum index cache requirements:
  • 200 × (16 + 16) = 6,400
  • Use the following calculation to determine the
    maximum index cache requirements:
  • 60,000 × (16 + 16) × 2 = 3,840,000
  • Therefore, this Lookup transformation requires an
    index cache size between 6,400 and 3,840,000
    bytes.
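
The minimum and maximum for the LKP_PROMOS example can be reproduced with a short function. The function name is hypothetical; the constants 200, 16 and 2 are the ones used in the calculation above.

```python
def lookup_index_cache_range(condition_column_sizes, lookup_rows):
    """Return (minimum_bytes, maximum_bytes) for a lookup index cache."""
    per_row = sum(condition_column_sizes) + 16
    minimum = 200 * per_row              # independent of the number of rows
    maximum = lookup_rows * per_row * 2  # scales with the lookup table
    return minimum, maximum


if __name__ == "__main__":
    # ITEM_ID has a column size of 16; the lookup table has 60,000 rows.
    print(lookup_index_cache_range([16], 60_000))  # (6400, 3840000)
```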

19
Calculating the Lookup Data Cache
  • In a connected transformation, the data cache
    contains data for the connected output ports, not
    including ports used in the lookup condition. In
    an unconnected transformation, the data cache
    contains data from the return port.
  • 1) PROMOTION_ID: connected output port not in
    lookup condition, Integer -> 16
  • 2) DISCOUNT: connected output port not in lookup
    condition, Decimal -> 16
  • The lookup table has 60,000 rows.
  • Use the following calculation to determine the
    minimum data cache requirements:
  • 60,000 × (32 + 8) = 2,400,000
  • This Lookup transformation requires a data cache
    size of 2,400,000 bytes.

21
Aggregator Cache
  • When the PowerCenter Server runs a session with
    an Aggregator transformation, it stores data in
    memory until it completes the aggregation.
  • If you use incremental aggregation, the
    PowerCenter Server saves the cache files in the
    cache file directory.
  • Note The PowerCenter Server uses memory to
    process an Aggregator transformation with sorted
    ports. It does not use cache memory. You do not
    need to configure cache memory for Aggregator
    transformations that use sorted ports.

22
Configuring the Session for Incremental
Aggregation
  • Use the following guidelines when you configure
    the session for incremental aggregation
  • Verify the location where you want to store the
    aggregate files. Configure the session to write
    file names in the session log.
  • If you want the PowerCenter Server to write the
    incremental aggregation cache file names in the
    session log, configure the session with Verbose
    Init tracing.
  • Verify the incremental aggregation settings in
    the session properties. You can configure the
    session for incremental aggregation in the
    Performance settings on the Properties tab.
  • You can also configure the session to
    reinitialize the aggregate cache. If you choose
    to reinitialize the cache, the Workflow Manager
    displays a warning indicating that the
    PowerCenter Server will overwrite the existing
    cache, and a reminder to clear this option after
    running the session.

24
Calculating the Aggregator Index Cache
  • The index cache holds group information from the
    group by ports.
  • (number of groups) × [(Σ column size) + 17]
  • Columns -> group-by columns
  • In the example:
  • STORE_ID: Integer -> size 6
  • ITEM: String -> size 18
  • Therefore the total column size is 18 + 6 = 24.
  • Assume there are 72,000 input rows, which gives
    at most 72,000 groups.
  • The minimum index cache calculation is:
  • 72,000 × (24 + 17) = 2,952,000
  • The maximum index cache calculation is double
    that amount:
  • 2,952,000 × 2 = 5,904,000
  • Therefore, this Aggregator transformation
    requires an index cache size between 2,952,000
    and 5,904,000 bytes.

26
Calculating the Aggregator Data Cache
  • The data cache holds row data for variable ports
    and connected output ports. As a result, the data
    cache is generally larger than the index cache.
    To reduce the data cache size, connect only the
    necessary input/output ports to subsequent
    transformations. Use the following information to
    calculate the minimum aggregate data cache size:
  • (number of groups) × [(Σ column size) + 7]
  • Column size -> a) Non-group-by input/output ports.
  • b) Local variable ports.
  • c) Ports containing aggregate functions (multiply
    by three).
  • In the example:
  • ORDER_ID: Integer -> 6
  • SALES_PER_STORE_ITEMS: Decimal -> 30
  • Total: 36
  • The total number of groups as calculated for the
    index cache size is 72,000. Use the following
    calculation to determine the minimum data cache
    requirements:
  • 72,000 × (36 + 7) = 3,096,000
  • Therefore, this Aggregator transformation
    requires a data cache size of 3,096,000 bytes.
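
The "multiply by three" rule for ports containing aggregate functions can be made explicit in code. The sketch below is a guess at how the example's column sizes are put together: the slide lists 30 for SALES_PER_STORE_ITEMS, and reading that as a base size of 10 multiplied by three is an assumption. The Port class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    base_size: int
    has_aggregate_function: bool = False

    @property
    def cache_size(self):
        # Ports containing an aggregate function count three times.
        return self.base_size * 3 if self.has_aggregate_function else self.base_size


def aggregator_data_cache(ports, groups):
    """Minimum data cache in bytes: groups x (sum of column sizes + 7)."""
    return groups * (sum(port.cache_size for port in ports) + 7)


if __name__ == "__main__":
    ports = [
        Port("ORDER_ID", 6),
        # Base size 10 is assumed; the slide only gives the figure 30.
        Port("SALES_PER_STORE_ITEMS", 10, has_aggregate_function=True),
    ]
    print(aggregator_data_cache(ports, 72_000))  # 3096000
```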

28
Joiner Cache
  • When it uses a joiner cache, the Informatica
    Server first reads the data from the master
    source and builds the index and data caches from
    the master rows. After building the caches, the
    PowerCenter Server performs the join based on
    the detail source data and the cache data.
  • The server creates the index cache as it reads
    the master source into the data cache. The server
    uses the index cache to test the join condition.
    When it finds a match, it retrieves row values
    from the data cache.
  • The PowerCenter Server caches all master rows
    with a unique key in the index cache, and all
    master rows in the data cache.
  • For instance,
  • Index cache. The PowerCenter Server caches
    100 master rows with unique keys.
  • Data cache. The PowerCenter Server caches the
    master rows in the data cache that correspond to
    the 100 rows in the index cache. The number of
    rows it stores in the data cache depends on the
    data. For example, if every master row contains a
    unique key, the PowerCenter Server stores 100
    rows in the data cache. However, if the master
    data contains multiple rows with the same key,
    the PowerCenter Server stores more than 100 rows
    in the data cache.
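
The relationship between the two joiner caches can be sketched as follows. This is a simplified model of the behaviour described above, using a set for the index cache and a dictionary of lists for the data cache; the function names and row layout are hypothetical and say nothing about the server's actual data structures.

```python
from collections import defaultdict


def build_joiner_cache(master_rows, key):
    """Read the master source once and build both caches."""
    data_cache = defaultdict(list)
    for row in master_rows:
        data_cache[row[key]].append(row)  # every master row is kept
    index_cache = set(data_cache)         # one entry per unique key
    return index_cache, data_cache


def join(detail_rows, index_cache, data_cache, key):
    """Test each detail row against the index cache, then fetch master rows."""
    for detail in detail_rows:
        if detail[key] in index_cache:
            for master in data_cache[detail[key]]:
                yield {**master, **detail}


if __name__ == "__main__":
    masters = [
        {"ITEM_NO": 1, "ITEM_NAME": "a"},
        {"ITEM_NO": 1, "ITEM_NAME": "b"},
        {"ITEM_NO": 2, "ITEM_NAME": "c"},
    ]
    index_cache, data_cache = build_joiner_cache(masters, "ITEM_NO")
    print(len(index_cache), sum(len(rows) for rows in data_cache.values()))
```

With duplicate keys in the master data, the index cache holds fewer entries than the data cache holds rows, which matches the 100-row example above.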

29
Joiner Index Cache Calculation
  • The index cache holds rows from the master source
    that are in the join condition.
  • (number of master rows) × [(Σ column size) + 16]
  • Column size -> master columns in the join
    condition.
  • In the example, the Joiner transformation joins
    the sources ORDERS and PRODUCTS on ITEM_NO.
  • ITEM_NO: Decimal(10) -> 16
  • PRODUCTS is the master source and has 90,000
    rows. Use the following calculation to determine
    the minimum index cache requirements:
  • 90,000 × (16 + 16) = 2,880,000
  • Double the size to determine the maximum index
    cache requirements:
  • 2,880,000 × 2 = 5,760,000
  • Therefore, this Joiner transformation requires an
    index cache size between 2,880,000 and 5,760,000
    bytes.

31
Joiner Data Cache Calculation
  • The data cache holds rows from the master source
    until the PowerCenter Server joins the data.
  • (number of master rows) × [(Σ column size) + 8]
  • Columns -> master columns not in the join
    condition and used for output.
  • In the example, JNR_ORDERS_PRODUCTS has the
    following connected output ports:
  • ITEM_NAME: String -> 32
  • PRODUCT_CATEGORY: Decimal -> 30
  • Total column size: 62
  • The master source has 90,000 rows.
  • Use the following calculation to determine the
    minimum data cache requirements:
  • 90,000 × (62 + 8) = 6,300,000
  • This Joiner transformation requires a data cache
    size of 6,300,000 bytes.

33
Rank Caches
  • When the PowerCenter Server runs a session with a
    Rank transformation, it compares an input row
    with rows in the data cache. If the input row
    out-ranks a stored row, the PowerCenter Server
    replaces the stored row with the input row.
  • For example, you configure a Rank transformation
    to find the top three sales. The PowerCenter
    Server reads the following input data
  • SALES
  • 10,000
  • 12,210
  • 5,000
  • 2,455
  • 6,324
  • The PowerCenter Server caches the first three
    rows (10,000, 12,210, and 5,000). When the
    PowerCenter Server reads the next row (2,455) it
    compares it to the cache values. Since the row is
    lower in rank than the cached rows, it discards
    the row with 2,455. The next row (6,324),
    however, is higher in rank than one of the cached
    rows. Therefore, the PowerCenter Server replaces
    the cached row with the higher-ranked input row.
  • If the Rank transformation is configured to rank
    across multiple groups, the PowerCenter Server
    ranks incrementally for each group it finds.
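
The top-three example can be traced with a small sketch. Using a min-heap for the cached rows is a choice made for this illustration; the slides do not say how the PowerCenter Server stores or compares cached rows, and the function name is hypothetical.

```python
import heapq


def top_n(values, n):
    """Keep only the n highest values seen so far, as a rank cache would."""
    cache = []  # min-heap: cache[0] is the lowest-ranked cached row
    for value in values:
        if len(cache) < n:
            heapq.heappush(cache, value)     # cache not yet full
        elif value > cache[0]:
            heapq.heapreplace(cache, value)  # out-ranks a stored row
        # otherwise the input row is discarded
    return sorted(cache, reverse=True)


if __name__ == "__main__":
    sales = [10_000, 12_210, 5_000, 2_455, 6_324]
    print(top_n(sales, 3))  # [12210, 10000, 6324]
```

Here 2,455 is discarded because it is lower than every cached value, and 6,324 replaces 5,000. Ranking across groups would keep one such cache per group.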

34
Calculating the Rank Index Cache
  • The index cache holds group information from the
    group-by ports. Use the following information to
    calculate the minimum rank index cache size:
  • Rank Index Calculation
  • (number of groups) × [(Σ column size) + 17]
  • Columns -> group-by columns.
  • PRODUCT_CATEGORY: String(21) -> column size 24
  • There are 10,000 product categories, so the total
    number of groups is 10,000. Use the following
    calculation to determine the minimum index cache
    requirements:
  • 10,000 × (24 + 17) = 410,000
  • Double the size to determine the maximum index
    cache requirements:
  • 410,000 × 2 = 820,000
  • Therefore, this Rank transformation requires an
    index cache size between 410,000 and 820,000
    bytes.

36
Calculating the Rank Data Cache
  • The data cache size is proportional to the number
    of ranks. It holds row data until the PowerCenter
    Server completes the ranking and is generally
    larger than the index cache. To reduce the data
    cache size, connect only the necessary
    input/output ports to subsequent transformations.
    Use the following information to calculate the
    minimum rank data cache size:
  • (number of groups) × [((number of ranks) ×
    ((Σ column size) + 10)) + 20]
  • ITEM_NO: Decimal(10) -> 10
  • ITEM_NAME: String(23) -> 26
  • PRICE: Decimal(14) -> 10
  • Total column size: 46
  • RNK_TOPTEN ranks by price, and the total number
    of ranks is 10. The number of groups is 10,000.
  • Use the following calculation to determine the
    minimum data cache requirements:
  • 10,000 × [(10 × (46 + 10)) + 20] = 5,800,000
  • This Rank transformation requires a data cache
    size of 5,800,000 bytes.
