session properties -> general tab -> treat input link as 'AND'/'OR'
session partition with aggregator transformation
preview data
********************************************************
High availability. You can use the high availability option to eliminate single points of failure in the PowerCenter environment and reduce service interruptions in the event of failure. High availability provides resilience, failover, and recovery for services.
infacmd
infasetup
pmrep
Versioned Objects
Workflow recovery
Custom transformation. The Custom transformation has the following enhancements:
Procedures with thread-specific operations.
Java transformation.
***********************************************************
Error codes
BR - errors related to the reader process, including ERP, flat file and relational sources
CMN - errors related to databases, memory allocation, lookup, joiner and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in aggregator, lookup, joiner and rank
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer
***********************************************************
Netezza and Oracle - LIMIT and ROWNUM (both restrict the number of rows a query returns)
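For example, to restrict a query to the first 10 rows (illustrative queries against a hypothetical CUSTOMERS table):
Oracle:  SELECT * FROM CUSTOMERS WHERE ROWNUM <= 10
Netezza: SELECT * FROM CUSTOMERS LIMIT 10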
************************************************************
Aggregator - Active & Connected
Source Qualifier - Active & Connected
Filter - Active & Connected
Expression - Passive & Connected
Joiner - Active & Connected
Lookup - Passive & Connected/Unconnected
HTTP - Passive & Connected
Normalizer - Active & Connected
Rank - Active & Connected
Router - Active & Connected
Sequence Generator - Passive & Connected
Sorter - Active & Connected
Stored Procedure - Passive & Connected/Unconnected
Union - Active & Connected
*************************************************************************8
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.
Lookup transformation:
Be sure to delete unwanted columns from the lookup, as they greatly increase the size of the lookup cache (see the example after this block).
If the Lookup transformation comes right after the Source Qualifier and there is no active transformation in between, you can instead handle the lookup in the Source Qualifier SQL override.
If the cache you assigned for the lookup is not sufficient to hold the data or index of the lookup, whatever data does not fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter does not find the data you are looking up in the cache, it swaps data from the file into the cache and keeps doing this until it finds the data. Being an I/O operation, this is quite expensive; increase the cache so that the whole data set resides in memory.
Sequential and concurrent caches: PowerCenter 8.x gives us the option to build the caches of the lookups either concurrently or sequentially, depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.
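For example, a lookup SQL override that caches only the two columns actually needed might look like this (table and column names are hypothetical):
SELECT CUST_ID, CUST_NAME FROM CUSTOMER_DIM
Caching only these columns keeps the lookup index and data caches small.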
[HINTS]
Difference b/w Aggregator and Expression transformation? An Expression transformation permits you to perform calculations on a row-by-row basis only. In an Aggregator you can perform calculations on groups.
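For example (port names are hypothetical): an Expression transformation can compute TOTAL_PRICE = QTY * PRICE for every row, while an Aggregator with DEPT_ID as a group by port can compute SUM(SALARY) once per department.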
HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or post data to the target or a downstream transformation in the mapping.
***********************************************************************888
****************************************************************************
Q. What types of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository : A repository that functions individually, unrelated to any other repository.
Global Repository : A centralized repository in a domain. It can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
Local Repository : A repository within a domain that is not the global repository. A local repository can connect to the global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository : Either a local or a global repository that enables version control. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test and deploy metadata into the production environment.
Q. What is a code page?
A. A code page contains the encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way the application stores, receives, and sends character data.
Q. Which all databases PowerCenter Server on Windows can connect to?
A. PowerCenter Server on Windows can connect to following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which all databases PowerCenter Server on UNIX can connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Informatica Mapping Designer
Q. How to execute PL/SQL script from Informatica mapping?
A. The Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. The PL/SQL procedure name is specified in the SP transformation. Whenever the session is executed, the session calls the PL/SQL procedure.
Q. How can you define a transformation? What are different types of transformati
ons available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes d
ata. The Designer provides a set of transformations that perform specific functi
ons. For example, an Aggregator transformation performs calculations on groups o
f data. Below are the various transformations available in Informatica:
Aggregator
Application Source Qualifier
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Output
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a
relational or flat file source when it runs a session. When a relational or a f
lat file source definition is added to a mapping, it is connected to a Source Qu
alifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query by changing the default settings of the transformation properties. The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query.
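For example, for a hypothetical EMPLOYEES source with three connected ports, the default query might be:
SELECT EMPLOYEES.EMP_ID, EMPLOYEES.EMP_NAME, EMPLOYEES.DEPT_ID FROM EMPLOYEES
A valid override keeps the same ports in the same order and only adds filters or joins, e.g.:
SELECT EMPLOYEES.EMP_ID, EMPLOYEES.EMP_NAME, EMPLOYEES.DEPT_ID FROM EMPLOYEES WHERE EMPLOYEES.DEPT_ID = 10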
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such
as averages and sums. Unlike Expression Transformation, the Aggregator transform
ation can only be used to perform calculations on groups. The Expression transfo
rmation permits calculations on a row-by-row basis only.
The Aggregator transformation contains group by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping Aggregate Transformation, the ses
sion option for Incremental Aggregation can be enabled. When PowerCenter perform
s incremental aggregation, it passes new source data through the mapping and use
s historical cache data to perform new aggregation calculations incrementally.
Q. How Union Transformation is used?
A. The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.
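For comparison, the equivalent SQL pattern (hypothetical tables with matching column lists):
SELECT CUST_ID, AMOUNT FROM SALES_US
UNION ALL
SELECT CUST_ID, AMOUNT FROM SALES_EU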
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file source
s.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational tab
le, view or synonym. It compares lookup transformation ports (input ports) to th
e source column values based on the lookup condition. Later returned values can
be passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is the difference between a connected look up and unconnected look up?
A. A connected lookup takes input values directly from other transformations in the pipeline.
An unconnected lookup does not take inputs directly from any other transformation; it can be used in any transformation (such as an Expression) and invoked as a function using the :LKP expression. So, an unconnected lookup can be called multiple times in a mapping.
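For example, an output port expression in an Expression transformation might call an unconnected lookup (lookup and port names are hypothetical):
IIF( ISNULL(CUST_NAME), :LKP.LKP_CUSTOMER(CUST_ID), CUST_NAME )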
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The ma
pplet contains set of transformations and it allows us to reuse that transformat
ion logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. A reusable transformation is stored as metadata separate from any mapping that uses it. Whenever changes are made to a reusable transformation, all the mappings where the transformation is used may be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default, every row is marked to be inserted into the target table. If a row has to be updated or inserted based on some logic, the Update Strategy transformation is used. The condition can be specified in the Update Strategy to mark the processed row for update or insert (see the example after the list below).
Following options are available for update strategy :
DD_INSERT : If this is used the Update Strategy flags the row for insertion. Equ
ivalent numeric value of DD_INSERT is 0.
DD_UPDATE : If this is used the Update Strategy flags the row for update. Equiva
lent numeric value of DD_UPDATE is 1.
DD_DELETE : If this is used the Update Strategy flags the row for deletion. Equi
valent numeric value of DD_DELETE is 2.
DD_REJECT : If this is used the Update Strategy flags the row for rejection. Equ
ivalent numeric value of DD_REJECT is 3.
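For example, the Update Strategy expression below (assuming a hypothetical TGT_CUST_ID port returned by a lookup on the target) inserts rows that do not yet exist and updates the rest:
IIF( ISNULL(TGT_CUST_ID), DD_INSERT, DD_UPDATE )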
*******************************************************8
SESSION LOGS
Information that reside in a session log:
- Allocation of system shared memory
- Execution of Pre-session commands/ Post-session commands
- Session Initialization
- Creation of SQL commands for reader/writer threads
- Start/End timings for target loading
- Error encountered during session
- Load summary of Reader/Writer/ DTM statistics
Other Information
- By default, the server generates log files based on the server code page.
Thread Identifier
Ex: CMN_1039
Reader and writer thread codes have 3 digits and transformation codes have 4 digits.
The number following a thread name indicates the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/ Rank boundary number
Log File Codes
Error Codes Description
BR - Related to reader process, including ERP, relational and flat file.
CMN - Related to database, memory allocation
DBGR - Related to debugger
EP- External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer
Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected
Statistics details
(a) Requested rows shows the no of rows the writer actually received for the
specified operation
(b) Applied rows shows the number of rows the writer successfully applied to
the target (Without Error)
(c) Rejected rows show the no of rows the writer could not apply to the targe
t
(d) Affected rows shows the no of rows affected by the specified operation
Detailed transformation statistics
The server reports the following details for each transformation in the mapping
(a) Name of Transformation
(b) No of I/P rows and name of the Input source
(c) No of O/P rows and name of the output target
(d) No of rows dropped
Tracing Levels
Normal - Initialization and status information, Errors encountered, Transf
ormation errors, rows skipped, summarize session details (Not at the level of in
dividual rows)
Terse - Initialization information as well as error messages, and notifica
tion of rejected data
Verbose Init - Addition to normal tracing, Names of Index, Data files used
and detailed transformation statistics.
Verbose Data - Addition to Verbose Init, Each row that passes in to mappin
g detailed transformation statistics.
NOTE
When you enter tracing level in the session property sheet, you override tracing
levels configured for transformations in the mapping.
MULTIPLE SERVERS
With Power Center, we can register and run multiple servers against a local or g
lobal repository. Hence you can distribute the repository session load across av
ailable servers to improve overall performance. (You can use only one Power Mart
server in a local repository)
Issues in Server Organization
- Moving target database into the appropriate server machine may improve effi
ciency
- All Sessions/Batches using data from other sessions/batches need to use the
same server and be incorporated into the same batch.
   - Servers with different speeds/sizes can be used for handling the most complicated sessions.
Session/Batch Behavior
   - By default, every session/batch runs on its associated Informatica server, as selected in the property sheet.
   - In batches that contain sessions assigned to various servers, the server setting of the outermost batch takes precedence.
Session Failures and Recovering Sessions
Two types of errors occur in the server:
- Non-Fatal
- Fatal
(a) Non-Fatal Errors
It is an error that does not force the session to stop on its first occurrence.
Establish the error threshold in the session property sheet with the stop on opt
ion. When you enable this option, the server counts Non-Fatal errors that occur
in the reader, writer and transformations.
Reader errors can include alignment errors while running a session in Unicode mo
de.
Writer errors can include key constraint violations, loading NULL into the NOT-N
ULL field and database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL input.
(b) Fatal Errors
This occurs when the server can not access the source, target or repository. Thi
s can include loss of connection or target database errors, such as lack of data
base space to load data.
If the session uses normalizer (or) sequence generator transformations, the serv
er can not update the sequence values in the repository, and a fatal error occur
s.
(c) Others
Use of the ABORT function in mapping logic, to abort a session when the server encounters a transformation error.
Stopping the server using pmcmd (or) the Server Manager.
Performing Recovery
   - When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The server then reads all sources again and starts processing from the next row ID.
   - By default, Perform Recovery is disabled in the setup. Hence it won't make entries in the OPB_SRVR_RECOVERY table.
   - The recovery session moves through the states of a normal session: scheduled, waiting to run, initializing, running, completed and failed. If the initial recovery fails, you can run recovery as many times as needed.
   - The normal reject loading process can also be done in the session recovery process.
   - The performance of recovery might be low if:
     o the mapping contains mapping variables
     o the commit interval is high
Unrecoverable Sessions
Under certain circumstances, when a session does not complete, you need to trunc
ate the target and run the session from the beginning.
Commit Intervals
A commit interval is the interval at which the server commits data to relational
targets during a session.
(a) Target based commit
- Server commits data based on the no of target rows and the key constraints
on the target table. The commit point also depends on the buffer block size and
the commit interval.
- During a session, the server continues to fill the writer buffer, after it
reaches the commit interval. When the buffer block is full, the Informatica serv
er issues a commit command. As a result, the amount of data committed at the com
mit point generally exceeds the commit interval.
   - The server commits data to each target based on primary-foreign key constraints.
(b) Source based commit
- Server commits data based on the number of source rows. The commit point is
the commit interval you configure in the session properties.
- During a session, the server commits data to the target based on the number
of rows from an active source in a single pipeline. The rows are referred to as
source rows.
- A pipeline consists of a source qualifier and all the transformations and t
argets that receive data from source qualifier.
- Although the Filter, Router and Update Strategy transformations are active
transformations, the server does not use them as active sources in a source base
d commit session.
- When a server runs a session, it identifies the active source for each pipe
line in the mapping. The server generates a commit row from the active source at
every commit interval.
- When each target in the pipeline receives the commit rows the server perfor
ms the commit.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and re-load it to relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.)
Each time you run a session, the server appends rejected data to the reject file.
Locating the BadFiles
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file fo
r each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us in finding the reason for rejecting, there are two main things.
(a) Row indicator
The row indicator tells what the writer was instructed to do with the row of data.
Row Indicator - Meaning - Rejected By
0 - Insert - Writer or target
1 - Update - Writer or target
2 - Delete - Writer or target
3 - Reject - Writer
If the row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
(b) Column indicator
A column indicator follows the first column of data, and another follows each subsequent column. Column indicators appear after every column of data and define the type of the data preceding them.
Column Indicator - Meaning - Writer Treats As
D - Valid data - Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O - Overflow - Bad data.
N - Null - Bad data.
T - Truncated - Bad data.
NOTE
NULL columns appear in the reject file with commas marking their column.
Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data
.
Keep in mind that correcting the reject file does not necessarily correct the so
urce of the reject.
Correct the mapping and target database to eliminate some of the rejected data w
hen you run the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may contain misleading column indicators.
For example, a series of 'N' indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero.
However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.
Why can the writer reject?
   - Data overflowed column constraints
   - An update strategy expression
Why can the target database reject?
   - Data contains a NULL column
   - Database errors, such as key violations
Steps for loading reject file:
- After correcting the rejected data, rename the rejected file to reject_file
.in
   - The reject loader uses the data movement mode configured for the server. It also uses the code page of the server/OS. Hence do not change these in the middle of reject loading.
   - Use the reject loader utility:
     pmrejldr pmserver.cfg [folder name] [session name]
Other points
The server does not perform the following options when using the reject loader:
(a) Source based commit
(b) Constraint based loading
(c) Truncate target table
(d) FTP targets
(e) External loading
Multiple reject loaders
You can run the session several times and correct rejected data from the several sessions at once. You can correct and load all of the reject files at once, or work on one or two reject files, load them, and work on the others at a later time.
External Loading
You can configure a session to use Sybase IQ, Teradata and Oracle external loade
rs to load session target files into the respective databases.
The external loader option can increase session performance, since these databases can load information directly from files faster than they can run SQL commands to insert the same data into the database.
Method:
When a session uses an external loader, the session creates a control file and a target flat file. The control file contains information about the target flat file, such as data format and loading instructions for the external loader. The control file has an extension of *.ctl and you can view the file in $PMTargetFileDir.
For using an External Loader:
The following must be done:
- configure an external loader connection in the server manager
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in session proper
ty sheet.
Issues with External Loader:
- Disable constraints
- Performance issues
o Increase commit intervals
o Turn off database logging
- Code page requirements
   - The server can use multiple external loaders within one session (e.g., a session with two target files, one using the Oracle external loader and another using the Sybase external loader).
Other Information:
- The External Loader performance depends upon the platform of the server
- The server loads data at different stages of the session
   - The server writes external loader initialization and completion messages in the session log. However, details about external loader performance are written to the external loader log, which is stored in the same target file directory.
   - If the session contains errors, the server continues the external loader process. If the session fails, the server loads partial target data using the external loader.
   - The external loader creates a reject file for data rejected by the database. The reject file has an extension of *.ldr.
   - The external loader saves the reject file in the target file directory.
   - You can load corrected data from that file using the database reject loader, not through the Informatica reject load utility (this applies to external loader reject files only).
Configuring EL in session
- In the server manager, open the session property sheet
- Select File target, and then click flat file options
Caches
   - The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a mapping.
   - The server stores key values in index caches and output values in data caches; if the server requires more memory, it stores overflow values in cache files.
   - When the session completes, the server releases cache memory and, in most circumstances, deletes the cache files.
Cache storage overflow:
Transformation - Index cache - Data cache
Aggregator - stores group values as configured in the group by ports - stores calculations based on the group by ports
Rank - stores group values as configured in the group by ports - stores ranking information based on the group by ports
Joiner - stores index values for the master source table as configured in the join condition - stores master source rows
Lookup - stores lookup condition information - stores lookup data that is not stored in the index cache
Determining cache requirements
To calculate the cache size, you need to consider column and row requirements as
well as processing overhead.
- server requires processing overhead to cache data and index information.
Column overhead includes a null indicator, and row overhead can include row to k
ey information.
Steps:
   - First, add the total column size in the cache to the row overhead.
   - Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache requirement.
   - For the maximum requirement, multiply the minimum requirement by 2.
Location:
   - By default, the server stores the index and data files in the directory $PMCacheDir.
   - The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find multiple index and data files in the directory; the server appends a number to the end of the filename (PMAGG*.idx1, PMAGG*.idx2, etc.).
Aggregator Caches
   - When the server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.
   - When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes data from one partition to another based on the group key values of the transformation.
   - The server uses memory to process an Aggregator transformation with sorted ports; it does not use cache memory. You do not need to configure cache memory for Aggregators that use sorted ports.
Index cache:
#Groups * ((sum of column sizes) + 7)
Aggregator data cache:
#Groups * ((sum of column sizes) + 7)
Rank Cache
   - When the server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the server replaces the stored row with the input row.
   - If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for each group it finds.
Index cache:
#Groups * ((sum of column sizes) + 7)
Rank data cache:
#Groups * [(#Ranks * ((sum of column sizes) + 10)) + 20]
Joiner Cache
   - When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds memory caches based on the master rows.
   - After building these caches, the server reads rows from the detail source and performs the joins.
   - The server creates the index cache as it reads the master source into the data cache. The server uses the index cache to test the join condition; when it finds a match, it retrieves row values from the data cache.
   - To improve joiner performance, the server aligns all data for the joiner cache on an eight-byte boundary.
Index cache:
#Master rows * ((sum of column sizes) + 16)
Joiner data cache:
#Master rows * ((sum of column sizes) + 8)
Lookup Cache
   - When the server runs a Lookup transformation, it builds a cache in memory when it processes the first row of data in the transformation.
   - The server builds the cache and queries it for each row that enters the transformation.
   - If you partition the source pipeline, the server allocates the configured amount of memory for each partition. If two Lookup transformations share the cache, the server does not allocate additional memory for the second Lookup transformation.
   - The server creates index and data cache files in the lookup cache directory and uses the server code page to create the files.
Index cache:
#Rows in lookup table * ((sum of column sizes) + 16)
Lookup data cache:
#Rows in lookup table * ((sum of column sizes) + 8)
Transformations
A transformation is a repository object that generates, modifies or passes data.
(a) Active transformation:
   a. Can change the number of rows that pass through it (Filter, Normalizer, Rank, ...)
(b) Passive transformation:
   a. Does not change the number of rows that pass through it (Expression, Lookup, ...)
NOTE:
   - Transformations can be connected to the data flow, or they can be unconnected.
   - An unconnected transformation is not connected to other transformations in the mapping. It is called from within another transformation and returns a value to that transformation.
Reusable Transformations:
When you add a reusable transformation to a mapping, the definition of the transformation exists outside the mapping, while an instance appears within the mapping.
All the changes you make to the transformation are immediately reflected in its instances.
You can create a reusable transformation by two methods:
(a) Designing it in the Transformation Developer
(b) Promoting a standard transformation
Changes such as expression edits are reflected in the mappings; changes such as port name edits are not.
Handling High-Precision Data:
   - The server processes decimal values as doubles or decimals.
   - When you create a session, you choose to enable the Decimal datatype or let the server process the data as double (precision of 15).
Example:
   - You may have a mapping with a decimal (20,0) port that passes through the value 40012030304957666903.
If you enable decimal arithmetic, the server passes the number as is. If you do not enable decimal arithmetic, the server passes 4.00120303049577 x 10^19.
If you want to process a decimal value with a precision greater than 28 digits, the server automatically treats it as a double value.
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet.
You can create a non-reusable instance of a reusable transformation.
Mapplet Objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation
A mapplet won't support:
   - Joiner
   - Normalizer
   - Pre/Post session stored procedures
   - Target definitions
   - XML source definitions
Types of Mapplets:
(a) Active Mapplets - Contains one or more active transformations
(b) Passive Mapplets - Contains only passive transformation
Copied mapplets are not an instance of original mapplets. If you make changes to
the original, the copy does not inherit your changes
You can use a single mapplet, even more than once on a mapping.
Ports
Default value for I/P port - NULL
Default value for O/P port - ERROR
Default value for variables - Does not support default values
Session Parameters
Session parameters represent values you might want to change between sessions, such as a DB connection or a source file.
We can use a session parameter in the session property sheet, then define the parameter in a session parameter file.
The user-defined session parameters are:
(a) DB Connection
(b) Source File directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameter to make sessions more flexible. For example, you have the
same type of transactional data written to two different databases, and you use
the database connections TransDB1 and TransDB2 to connect to the databases. You
want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database
connection parameter, like $DBConnectionSource, and use it as the source databa
se connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to
TransDB1 and run the session. After it completes set the value to TransDB2 and
run the session again.
NOTE:
You can use several parameter together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to initialize the session.
Session Parameter File
   - A parameter file is created with a text editor.
   - In it, we specify the folder and session name, then list the parameters and variables used in the session and assign each a value.
   - Save the parameter file in any directory local to the server.
   - We can define the following values in a parameter file:
     o Mapping parameters
     o Mapping variables
     o Session parameters
   - You can include parameter and variable information for more than one session in a single parameter file by creating separate sections for each session within the parameter file.
   - You can override the parameter file for sessions contained in a batch by using a batch parameter file. A batch parameter file has the same format as a session parameter file.
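A minimal parameter file sketch for the $DBConnectionSource example above (folder, session and parameter names are hypothetical, and the exact section-header format can vary by PowerCenter version):
[MyFolder.s_m_load_trans]
$DBConnectionSource=TransDB1
$$Region=US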
Locale
Informatica server can transform character data in two modes
(a) ASCII
a. Default one
   b. Passes 7-bit, US-ASCII character data
(b) UNICODE
   a. Passes multibyte character data
   b. It uses up to 2 bytes for each character to move data and performs additional checks at the session level to ensure data integrity.
Code pages contain the encoding to specify characters in a set of one or more languages. We can select a code page based on the type of character data in the mappings.
Compatibility between code pages is essential for accurate data movement.
The various code page components are
- Operating system Locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page
Locale
(a) System Locale - System Default
(b) User locale - setting for date, time, display
(c) Input locale
Mapping Parameter and Variables
These represent values in mappings/mapplets.
If we declare mapping parameters and variables in a mapping, you can reuse a map
ping by altering the parameter and variable values of the mappings in the sessio
n.
This can reduce the overhead of creating multiple mappings when only certain att
ributes of mapping needs to be changed.
Use a mapping parameter when you want to use the same value each time you run the session.
Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The server saves the value of a mapping variable to the repository at the end of each successful run and uses that value the next time you run the session.
Mapping objects:
Source, Target, Transformation, Cubes, Dimension
Debugger
We can run the Debugger in two situations
(a) Before Session: After saving mapping, we can run some initial tests.
(b) After Session: real Debugging process
Metadata Reporter:
- Web based application that allows to run reports against repository metadat
a
- Reports including executed sessions, lookup table dependencies, mappings an
d source/target schemas.
Repository
Types of Repository
(a) Global Repository
   a. This is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These may include operational or application source definitions, reusable transformations, mapplets and mappings.
(b) Local Repository
   a. A local repository is within a domain and is not the global repository. Use the local repository for development.
(c) Standard Repository
   a. A repository that functions individually, unrelated and unconnected to other repositories.
NOTE:
   - Once you create a global repository, you cannot change it to a local repository.
   - However, you can promote a local repository to a global repository.
Batches
- Provide a way to group sessions for either serial or parallel execution by
server
- Batches
o Sequential (Runs session one after another)
o Concurrent (Runs sessions at same time)
Nesting Batches
Each batch can contain any number of session/batches. We can nest batches severa
l levels deep, defining batches within batches
Nested batches are useful when you want to control a complex series of sessions
that must run sequentially or concurrently
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by default. However, we can configure a batched session to run on its own schedule by selecting the 'Use Absolute Time Session' option.
Server Behavior
The server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if 'Previous completes' and that previous session fails.
Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a sequential batch, so that the Informatica server can run them in consecutive order.
There are two ways of running sessions under this category:
(a) Run the session only if the previous session completes successfully
(b) Always run the session (this is the default)
Concurrent Batch
In this mode, the server starts all of the sessions within the batch, at same ti
me
Concurrent batches take advantage of the resource of the Informatica server, red
ucing the time it takes to run the session separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from
running those batches in a particular order, just like sessions, place them int
o a sequential batch.
Server Concepts
The Informatica server used three system resources
(a) CPU
(b) Shared Memory
(c) Buffer Memory
Informatica server uses shared memory, buffer memory and cache memory for sessio
n information and to move data between session threads.
LM Shared Memory
The Load Manager uses both process and shared memory. The LM keeps the list of sessions and batches, and the schedule queue, in process memory.
Once a session starts, the LM uses shared memory to store session details for the duration of the session run or session schedule. This shared memory appears as the configurable parameter (LMSharedMemory), and the server allots 2,000,000 bytes by default.
This allows you to schedule or run approximately 10 sessions at one time.
DTM Buffer Memory
The DTM process allocates buffer memory to the session based on the DTM buffer pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session.
DTM divides memory into buffer blocks as configured in the buffer block size set
tings. (Default: 64,000 bytes per block)
Running a Session
The following tasks are being done during a session
1. LM locks the session and read session properties
2. LM reads parameter file
3. LM expands server/session variables and parameters
4. LM verifies permission and privileges
5. LM validates source and target code page
6. LM creates session log file
7. LM creates DTM process
8. DTM process allocates DTM process memory
9. DTM initializes the session and fetches mapping
10. DTM executes pre-session commands and procedures
11. DTM creates reader, writer, transformation threads for each pipeline
12. DTM executes post-session commands and procedures
13. DTM writes historical incremental aggregation/lookup to repository
14. LM sends post-session emails
Stopping and aborting a session
- If the session you want to stop is a part of batch, you must stop the batch
- If the batch is part of nested batch, stop the outermost batch
- When you issue the stop command, the server stops reading data. It continue
s processing and writing data and committing data to targets
- If the server cannot finish processing and committing data, you can issue t
he ABORT command. It is similar to stop command, except it has a 60 second timeo
ut. If the server cannot finish processing and committing data within 60 seconds
, it kills the DTM process and terminates the session.
Recovery:
- After a session being stopped/aborted, the session results can be recovered
. When the recovery is performed, the session continues from the point at which
it stopped.
- If you do not recover the session, the server runs the entire session the n
ext time.
- Hence, after stopping/aborting, you may need to manually delete targets bef
ore the session runs again.
NOTE:
ABORT command and ABORT function, both are different.
When can a Session Fail
- Server cannot allocate enough system resources
- Session exceeds the maximum no of sessions the server can run concurrently
- Server cannot obtain an execute lock for the session (the session is alread
y locked)
- Server unable to execute post-session shell commands or post-load stored pr
ocedures
- Server encounters database errors
- Server encounter Transformation row errors (Ex: NULL value in non-null fiel
ds)
- Network related errors
When Pre/Post Shell Commands are useful
- To delete a reject file
- To archive target files before session begins
Session Performance
- Minimum log (Terse)
- Partitioning source data.
- Performing ETL for each partition, in parallel. (For this, multiple CPUs ar
e needed)
- Adding indexes.
- Changing commit Level.
- Using Filter trans to remove unwanted data movement.
- Increasing buffer memory, when large volume of data.
- Multiple lookups can reduce the performance. Verify the largest lookup tabl
e and tune the expressions.
   - At the session level, the causes are small cache size, low buffer memory and small commit interval.
   - At the system level:
     o Windows NT/2000: use the Task Manager.
     o UNIX: vmstat, iostat.
Hierarchy of optimization
- Target.
- Source.
- Mapping
- Session.
- System.
Optimizing Target Databases:
- Drop indexes /constraints
- Increase checkpoint intervals.
- Use bulk loading /external loading.
- Turn off recovery.
- Increase database network packet size.
Source level
   - Optimize the query (for example, the GROUP BY and ORDER BY clauses).
- Use conditional filters.
- Connect to RDBMS using IPC protocol.
Mapping
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations/ expressions.
Session:
- concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging area.
- Tune session parameters.
System:
   - Improve network speed.
   - Use multiple Informatica servers on separate systems.
- Reduce paging.
Session Process
The Informatica server uses both process memory and system shared memory to perform the ETL process.
It runs as a daemon on UNIX and as a service on Windows NT.
The following processes are used to run a session:
(a) Load Manager process: starts a session and creates the DTM process, which creates the session.
(b) DTM process: creates threads to initialize the session; reads, writes and transforms data; handles pre/post session operations.
Load manager processes:
- manages session/batch scheduling.
- Locks session.
- Reads parameter file.
- Expands server/session variables, parameters .
- Verifies permissions/privileges.
- Creates session log file.
DTM process:
The primary purpose of the DTM is to create and manage threads that carry out th
e session tasks.
The DTM allocates process memory for the session and divides it into buffers; this is known as buffer memory. The default memory allocation is 12,000,000 bytes. The DTM creates the main thread, called the master thread, which manages all other threads.
Various thread functions:
Master thread - handles stop and abort requests from the Load Manager.
Mapping thread - one thread for each session; fetches session and mapping information, compiles the mapping, cleans up after execution.
Reader thread - one thread for each partition; relational sources use relational threads and flat files use file threads.
Writer thread - one thread for each partition; writes to the target.
Transformation thread - one or more transformation threads for each partition.
Note:
When you run a session, the threads for a partitioned source execute concurr
ently. The threads use buffers to move/transform data.
*************************************************************************8
What are the output files that the Informatica server creates during a session run?
Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.
Performance detail file: This file contains information known as session performance details, which helps you determine where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completed successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If a session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
******************************************************************************
Add an Expression transformation after the Source Qualifier and before the target. If the source or target definition changes, reconnecting ports is much easier.
Denormalization using aggregator
LAST function in aggregator
Returns the last row in the selected port. Optionally, you can apply a filter to
limit the rows the Integration Service reads. You can nest only one other aggre
gate function within LAST.
LAST(AMOUNT, MONTH)
MQ Source qualifier transformation
***********************************************
A test load takes the number of rows from the Source Qualifier specified in the test-rows setting and shows which target instance each row would go to; nothing is loaded to the target.
It does not work for flat file targets.
***********************************************
EVENTWAIT - EVENTRAISE tasks
First create an event: right-click the workflow, choose Edit -> Events tab -> create an event.
Place an Event-Raise task in the workflow space and give the created event name in the "User Defined Event" attribute (Properties tab).
Place an Event-Wait task in the workflow space; its Events tab has 2 options:
predefined - this is a file watch
user-defined - this is an event created on the workflow's Events tab; choose the user-defined event to watch and give the event created above
When the Event-Raise task executes, it raises the event, which triggers the Event-Wait task to continue.
***********************************************
Timer Task
You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task.
Two options:
Absolute time - give the exact time at which to start the next task, or refer to a datetime variable
Relative time - give the hours, minutes, seconds:
   - from the start time of this task
   - from the start time of the parent workflow/worklet
   - from the start time of the top-level workflow
***********************************************
Decision Task
Use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions.
The same result can also be achieved without the Decision task, using link conditions alone.
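For example, a link leaving a Decision task named Decision_CheckStatus (hypothetical name) could use the predefined variable in its condition:
$Decision_CheckStatus.condition = TRUE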
***********************************************
Assignment task
You can assign a value to a user-defined workflow variable with the Assignment t
ask.
**********************************************
Control Task
Options: Fail Me, Fail Parent, Stop Parent, Abort Parent, Fail Top-Level Workflow, Stop Top-Level Workflow, Abort Top-Level Workflow.
**********************************************
DECODE( ITEM_ID, 10, 'Flashlight',
14, 'Regulator',
20, 'Knife',
40, 'Tank',
'NONE' )
DECODE ( CONST_NAME,
'Five', 5,
'Pythagoras', '1.414213562',
'Archimedes', '3.141592654',
'Pi', 3.141592654 )
**********************************************
If you use an output default value other than ERROR, the default value overrides
the ERROR function in an expression. For example, you use the ERROR function in
an expression, and you assign the default value, 1234, to the output port. Each t
ime the Integration Service encounters the ERROR function in the expression, it
overrides the error with the value 1234 and passes 1234 to the next transformation.
It does not skip the row, and it does not log an error in the session log.
IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )
SALARY RETURN VALUE
10000 10000
-15000 'Error. Negative salary found. Row skipped.'
NULL NULL
150000 150000
1005 1005
**********************************************
Returns e raised to the specified power (exponent), where e=2.71828183. For exam
ple, EXP(2) returns 7.38905609893065.
**********************************************
Returns the first value found within a port or group. Optionally, you can apply
a filter to limit the rows the Integration Service reads. You can nest only one
other aggregate function within FIRST.
FIRST( value [, filter_condition ] )
FIRST( ITEM_NAME, ITEM_PRICE > 10 )
LAST(ITEM_NAME, ITEM_PRICE > 10)
LPAD( PART_NUM, 6, '0')
RTRIM( LAST_NAME, 'S.')
*********************************************
Returns the future value of an investment, where you make periodic, constant pay
ments and the investment earns a constant interest rate.
Syntax
FV( rate, terms, payment [, present value, type] )
FV(0.0075, 12, -250, -2000, TRUE)
*********************************************
GET_DATE_PART( DATE_SHIPPED, 'HH12' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'MON' )
*********************************************
GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )
*********************************************
INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' )
1 if the input value matches string1, 2 if the input value matches string2, and
so on.
0 if the input value is not found.
NULL if the input is a null value.
*********************************************
INITCAP( string )
*********************************************
INSTR( COMPANY, 'a', 1, 2 )
*********************************************
Is_Date() returns 0 or 1
*********************************************
Mapping -> Debugger -> use the existing session instance -> click Next Instance
[shows one row at a time as it moves through the mapping; displays the next row]
*********************************************
Is_Number()
Is_spaces() Returns whether a string value consists entirely of spaces.
LAST_DAY() Returns the date of the last day of the month for each date in a port
.
********************************************
MAX( ITEM_NAME, MANUFACTURER_ID='104' ) - The MAX function uses the same sort or
der that the Sorter transformation uses. However, the MAX function is case sensi
tive, and the Sorter transformation may not be case sensitive.
You can also use MAX to return the latest date or the largest numeric value in a
port or group.
*******************************************
MD5 - Message-Digest algorithm 5
METAPHONE - Encodes string values. You can specify the length of the string that
you want to encode.
*******************************************
The following expression returns the average order for a Stabilizing Vest, based
on the first five rows in the Sales port, and thereafter, returns the average f
or the last five rows read:
MOVINGAVG( SALES, 5 )
MOVINGSUM( SALES, 5 )
*******************************************
POWER( NUMBERS, EXPONENT )
*******************************************
RAND (1) Returns a random number between 0 and 1. This is useful for probabilit
y scenarios.
*******************************************
REG_EXTRACT
REPLACESTR ( CaseFlag, InputString, OldString1, [OldString2, ... OldStringN,] Ne
wString )
REVERSE( string )
RPAD( ITEM_NAME, 16, '.')
TO_FLOAT() - 0 if the value in the port is blank or a non-numeric character.
******************************************
What are global and local shortcuts?
A shortcut to an object in a shared folder of the global repository is a global shortcut; a shortcut to an object in a shared folder within the same (local) repository is a local shortcut. For reference, the repository types:
Standalone Repository : A repository that functions individually, unrelated to any other repository.
Global Repository : A centralized repository in a domain. It can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
Local Repository : A repository within a domain that is not the global repository. A local repository can connect to the global repository using global shortcuts and can use objects in its shared folders.
**************************************************
When bulk loading, the Integration Service bypasses the database log, which spee
ds performance. Without writing to the database log, however, the target databas
e cannot perform rollback. As a result, you may not be able to perform recovery.
When you use bulk loading, weigh the importance of improved session performance
against the ability to recover an incomplete session.
*************************************************
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_A + VAL_B ,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_A + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N',
VAL_A ,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_B + VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_B ,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N',
0.0
))))))))
This expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons.
If you take advantage of the IIF function, you can rewrite that expression as:
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
*************************************************
******************************************************
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Transformation
5. Session
6. grid deployments
7. Powercenter components
8. System
******************************************
Run test sessions. You can configure a test session to read from a flat file s
ource or to write to a flat file target to identify source and target bottleneck
s.
Analyze performance details. Analyze performance details, such as performance
counters, to determine where session performance decreases.
Analyze thread statistics. Analyze thread statistics to determine the optimal
number of partition points.
Monitor system performance. You can use system monitoring tools to view the pe
rcentage of CPU use, I/O waits, and paging to identify system bottlenecks. You c
an also use the Workflow Monitor to view system resource usage.
******************************************
If the reader or writer thread is 100% busy, consider using string datatypes in
the source or target ports. Non-string ports require more processing.
If a transformation thread is 100% busy, consider adding a partition point in
the segment. When you add partition points to the mapping, the Integration Servi
ce increases the number of transformation threads it uses for the session. Howev
er, if the machine is already running at or near full capacity, do not add more
threads.
If one transformation requires more processing time than the others, consider
adding a pass-through partition point to the transformation.
*****************************************
Complete the following tasks to eliminate target bottlenecks:
Have the database administrator optimize database performance by optimizing th
e query.
Increase the database network packet size.
Configure index and key constraints (for example, drop indexes and key constraints before the load and rebuild them afterward).
****************************************
Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a sour
ce database. Inefficient query or small database network packet sizes can cause
source bottlenecks.
Identifying Source Bottlenecks
You can read the thread statistics in the session log to determine if the source
is the bottleneck. When the Integration Service spends more time on the reader
thread than the transformation or writer threads, you have a source bottleneck.
If the session reads from a relational source, use the following methods to iden
tify source bottlenecks:
Filter transformation
Read test mapping
Database query
If the session reads from a flat file source, you probably do not have a source
bottleneck.
Using a Filter Transformation
You can use a Filter transformation in the mapping to measure the time it takes
to read source data.
Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.
Using a Read Test Mapping
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformations from the mapping.
To create a read test mapping, complete the following steps:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any cus
tom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
Run a session against the read test mapping. If the session performance is simil
ar to the original session, you have a source bottleneck.
Using a Database Query
To identify source bottlenecks, execute the read query directly against the sour
ce database.
Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can send the result of the query to a file. On UNIX, you can send the result of the query to /dev/null.
Measure the query execution time and the time it takes for the query to return t
he first row.
Eliminating Source Bottlenecks
Complete the following tasks to eliminate source bottlenecks:
Set the number of bytes the Integration Service reads per line if the Integrat
ion Service reads from a flat file source.
Have the database administrator optimize database performance by optimizing th
e query.
Increase the database network packet size.
Configure index and key constraints.
If there is a long delay between the two time measurements in a database query
, you can use an optimizer hint.
******************************************
A simple source filter on the source database can sometimes negatively impact performance because of the lack of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance.
*****************************************************
Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you may hav
e a mapping bottleneck.
Generally, you reduce the number of transformations in the mapping and delete un
necessary links between transformations to optimize the mapping. Configure the m
apping with the least number of transformations and expressions to do the most a
mount of work possible. Delete unnecessary links between transformations to mini
mize the amount of data moved.
****************************************************
Optimizing the Line Sequential Buffer Length
If the session reads from a flat file source, you can improve session performanc
e by setting the number of bytes the Integration Service reads per line. By defa
ult, the Integration Service reads 1024 bytes per line. If each line in the sour
ce file is less than the default setting, you can decrease the line sequential b
uffer length in the session properties.
***************************************************
Single-pass reading allows you to populate multiple targets with one source qual
ifier. Consider using single-pass reading if you have multiple sessions that use
the same sources. You can combine the transformation logic for each mapping in
one mapping and use one source qualifier for each source.
***************************************************
You can optimize performance for pass-through mappings. To pass directly from so
urce to target without any other transformations, connect the Source Qualifier t
ransformation directly to the target. If you use the Getting Started Wizard to c
reate a pass-through mapping, the wizard creates an Expression transformation be
tween the Source Qualifier transformation and the target.
***************************************************
Use integer values in place of other datatypes when performing comparisons using
Lookup and Filter transformations. For example, many databases store U.S. ZIP c
ode information as a Char or Varchar datatype. If you convert the zip code data
to an Integer datatype, the lookup database stores the zip code 94303-1234 as 94
3031234. This helps increase the speed of the lookup comparisons based on zip co
de.
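A minimal sketch of one way to get integers into the comparison, assuming a hypothetical customer_dim lookup table with a CHAR zip_code column: strip the hyphen and convert the value in the lookup SQL override (Oracle-style functions shown); the same conversion could instead be done in an Expression transformation before the lookup.
SELECT customer_id,
       TO_NUMBER(REPLACE(zip_code, '-', '')) AS zip_code_num
FROM   customer_dim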
**************************************************
Optimizing aggregator
Grouping By Simple Columns - use integer values
Filtering Data Before You Aggregate
Limiting Port Connections - Limit the number of connected input/output or output
ports to reduce the amount of data the Aggregator transformation stores in the
data cache.
Using Sorted Input
Using Incremental Aggregation
If you can capture changes from the source that affect less than half the target
, you can use incremental aggregation to optimize the performance of Aggregator
transformations.
When you use incremental aggregation, you apply captured changes in the source t
o aggregate calculations in a session. The Integration Service updates the targe
t incrementally, rather than processing the entire source and recalculating the
same calculations every time you run the session.
You can increase the index and data cache sizes to hold all data in memory witho
ut paging to disk.
************************************************
Optimizing Joiner
Designate the master source as the source with fewer duplicate key values.
Designate the master source as the source with fewer rows.
Perform joins in a database when possible - create a pre-session stored procedure to join the tables in a database, or use the Source Qualifier transformation to perform the join (a sketch of the Source Qualifier option follows this list).
Join sorted data when possible - this minimizes disk input and output.
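A minimal sketch of the Source Qualifier option, assuming two hypothetical source tables ORDERS and CUSTOMERS: write the join as a SQL override or user-defined join so the database performs it instead of a Joiner transformation.
SELECT o.order_id, o.order_amount, c.customer_name
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id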
************************************************
Optimizing Lookup Transformations
Native Drivers
Caching Lookup Tables
If a mapping contains Lookup transformations, you might want to enable lookup ca
ching. When you enable caching, the Integration Service caches the lookup table
and queries the lookup cache during the session. When this option is not enabled
, the Integration Service queries the lookup table on a row-by-row basis.
The result of the Lookup query and processing is the same, whether or not you ca
che the lookup table. However, using a lookup cache can increase session perform
ance for smaller lookup tables. In general, you want to cache lookup tables that
need less than 300 MB.
Use the following types of caches to increase performance:
Shared cache. You can share the lookup cache between multiple transformations.
You can share an unnamed cache between transformations in the same mapping. You
can share a named cache between transformations in the same or different mappin
gs.
Persistent cache. To save and reuse the cache files, you can configure the tra
nsformation to use a persistent cache. Use this feature when you know the lookup
table does not change between session runs. Using a persistent cache can improv
e performance because the Integration Service builds the memory cache from the c
ache files instead of from the database.
Enabling Concurrent Caches - default Auto
***********************************************
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property.
The Number of Cached Values property determines the number of values the Integration Service caches at one time. Make sure the Number of Cached Values is not too small; consider configuring it to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use a cache are faster than those that require one.
**********************************************
Allocating Memory
For optimal performance, configure the Sorter cache size with a value less than
or equal to the amount of available physical RAM on the Integration Service node
. Allocate at least 16 MB of physical memory to sort data using the Sorter trans
formation. The Sorter cache size is set to 16,777,216 bytes by default. If the I
ntegration Service cannot allocate enough memory to sort data, it fails the sess
ion.
If the amount of incoming data is greater than the amount of Sorter cache size,
the Integration Service temporarily stores data in the Sorter transformation wor
k directory. The Integration Service requires disk space of at least twice the a
mount of incoming data when storing data in the work directory. If the amount of
incoming data is significantly greater than the Sorter cache size, the Integrat
ion Service may require much more than twice the amount of disk space available
to the work directory.
Use the following formula to determine the size of incoming data:
# input rows * ([Sum(column size)] + 16)
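For example (hypothetical numbers): 1,000,000 input rows with a total column size of 100 bytes per row need roughly 1,000,000 * (100 + 16) = 116,000,000 bytes, or about 110 MB, of Sorter cache to sort entirely in memory; with the 16 MB default, most of that data spills to the work directory instead.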
*********************************************
Use the Select Distinct option for the Source Qualifier transformation if you wa
nt the Integration Service to select unique values from a source. Use Select Dis
tinct option to filter unnecessary data earlier in the data flow. This can impro
ve performance.
********************************************
Transformation errors occur when the Integration Service encounters conversion e
rrors, conflicting mapping logic, and any condition set up as an error, such as
null input. Check the session log to see where the transformation errors occur.
If the errors center around particular transformations, evaluate those transform
ation constraints.
********************************************
Tools -> options -> general tab
Reload Tables/Mappings when Opening a Folder
Ask Whether to Reload the Tables/Mappings
Group Source by Database
********************************************
Target file having datetime as header -
set the header option in the session properties to Command and give a direct command or batch script
********************************************
Join condition in SQL override - {} is not needed when there is only one join condition
Lookup SQL override - the column order in the override versus the port order does not affect the result
Update employee set employeename = case when empid = 1 then 'RK'
when empid = 2 then 'Valluri'
else null
end
Incremental aggregation: When you enable both truncate target tables and increm
ental aggregation in the session properties, the Workflow Manager issues a warni
ng that you cannot enable truncate target tables and incremental aggregation in
the same session.
Test load: When you enable both truncate target tables and test load, the Integr
ation Service disables the truncate table function, runs a test load session, an
d writes the following message to the session log:
**********************************************
Java Transformation Examples
// When the associated org refno is null, emit Parent, Child and Self rows,
// each carrying the identifying refno as the associated refno.
if (isNull("LegacyAssociatedOrgRefno"))
{
    IdentifyingType = "Parent";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
    IdentifyingType = "Child";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
    IdentifyingType = "Self";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
}
// When the associated org refno is null and the org type is STHA,
// emit a row with a default refno of -1 at level 0.
if (isNull("LegacyAssociatedOrgRefno") && HOTYPE.equals("STHA"))
{
    LegacyAssociatedOrgRefno = -1;
    Level = 0;
    generateRow();
}
Swapping columns example:
IdentifyingType = "Parent";
generateRow();
IdentifyingType = "Child";
// swap the associated and identifying refnos before generating the child row
int x = LegacyAssociatedOrgRefno;
int y = LegacyIdentifyingRefno;
LegacyIdentifyingRefno = x;
LegacyAssociatedOrgRefno = y;
generateRow();
***************************************************
The Integration Service runs sessions and workflows. When you configure the Integration Service, you can specify where you want it to run.
Dynamic flat file creation depending on the levels (which vary with each trust):
Sorter - to sort the data by level
Transaction Control
FileName port in the target instance
use a TRANSACTION CONTROL transformation, for example:
IIF(Out_Level_old3 <> Out_Level_new3, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
Generating Flat File Targets By Transaction
You can generate a separate output file each time the Integration Service starts
a new transaction. You can dynamically name each target flat file.
To generate a separate output file for each transaction, add a FileName port to
the flat file target definition. When you connect the FileName port in the mappi
ng, the Integration Service creates a separate target file at each commit point.
The Integration Service uses the FileName port value from the first row in each
transaction to name the output file.
**************************************************
UNIX
man
ls -ltr
wc [-l -w]
mv filename
diff
cp
rm
gzip
gunzip
find - to find a file anywhere in the system
mkdir
cd
pwd
grep - looks for a string in a file
ftp
chmod 4+2+1 (r+w+x) ex: chmod 755
cat f1 f2 >f3
^-c (ctrl-c) to kill a running process
^-d (ctrl-d) to signal end of input / close the current shell
What is the difference between the cmp and diff commands?
cmp - compares two files byte by byte and displays the first mismatch
diff - tells the changes to be made to make the files identical
tar - tape archiver. tar takes a bunch of files and bundles them into one .tar file; the files are often compressed with the gzip algorithm and use the .tar.gz extension. To create a tar: tar -cf archive.tar /directory. To extract the archive to the current directory: tar -xf archive.tar. To use gzip, just add a z to the options. To create a tar.gz: tar -czf archive.tar.gz /dir. To extract it: tar -xzf archive.tar.gz
head
tail
head -100 file | tail -1 (prints the 100th line of the file)
IIF(NOT ISNULL(In_UniqueLegacyTreatmentFunctionIdentifier) AND v_TreatmentFnType
Code_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)')
IIF(ISNULL(v_UniqueLegacyTreatmentFunctionIdentifier_lkp), NULL, IIF( v_Treatmen
tFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)'))
IIF(ISNULL(v_UniqueLegacySpecialtyIdentifier_lkp), NULL, IIF( v_SpecTypeCode_dom
= 'CC_CPSPE', NULL, 'B61413(v_SpecTypeCode)'))
IIF(ISNULL(v_UniqueLegacyCurrentSpecialtyIdentifier_lkp), NULL, IIF(v_CurrentSpe
cTypeCode_dom = 'CC_CPSPE', NULL, 'B61414(v_CurrentSpecTypeCode)'))
*************************************************
DW
**************************************************************
Data Warehouse
A data warehouse is an architecture for organizing information systems. It is a process for building decision support systems and a knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.
Bill Inmon: "A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."
Ralph Kimball: "A copy of transaction data specifically structured for query and analysis."
Operational Data Store(ODS)
An operational data store is an integrated, subject-oriented, volatile (including updates and deletions), current-valued structure designed to serve operational users as they do high-performance integrated processing.
OLTP (Online Transaction Processing)
OLTP is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transaction processing. OLTP systems are optimized for data entry operations, e.g. Order Entry, Banking, CRM and ERP applications.
Data Warehouse vs Operational/Transactional:
Subject oriented | Application oriented
Summarized, refined | Detailed
Represents value over time | Accurate as of the moment
Supports managerial needs | Supports day-to-day needs
Read-only data | Can be updated
Batch processing | Real-time transactions
Completely different life cycle | Software Development Life Cycle
Analysis driven | Transaction driven
Dimensional model | Entity Relationship Diagram
Large amount of data | Small amount of data
Relaxed availability | High availability
Flexible structure | Static structure
Business Intelligence (BI)
Business Intelligence is a set of business processes for collecting and analyzing business information. BI functions include trend analysis, aggregation of data, drilling down to finer levels of detail, slice-and-dice, and data rotation for comparative viewing.
OLAP (On-Line Analytical Processing) - querying and presenting data from a data warehouse as multiple dimensions.
ROLAP (Relational OLAP) - applications and user interfaces that retrieve data from an RDBMS and present it as a dimensional model.
MOLAP (Multidimensional OLAP) - applications, user interfaces and database technologies that store the data in a dimensional model.
DOLAP (Desktop OLAP) - designed for low-end, single users; data is stored/downloaded on the desktop.
HOLAP (Hybrid OLAP) - a combination of the above OLAP methodologies.
Data warehouse architecture approaches:
Top-Down (Bill Inmon) - emphasizes the data warehouse. Design: enterprise-based normalized model; marts use a subject-oriented dimensional model. Architecture: multi-tier, comprised of a staging area and dependent data marts. Data: the DW holds atomic-level data; the marts hold summary data.
Bottom-Up (Ralph Kimball) - emphasizes data marts. Design: dimensional model of the data marts, consisting of star schemas. Architecture: staging area and data marts. Data: the marts contain both atomic and summary data.
Hybrid (many practitioners) - emphasizes both the DW and the data marts. Design: starts with enterprise and local models; one or more star schemas. Architecture: high-level normalized enterprise model; initial marts. Data: populates the marts with atomic and summary data via a non-persistent staging area.
Federated (Doug Hackney) - emphasizes integrating heterogeneous BI environments. Design: an architecture of architectures; shares dimensions, facts, rules and definitions across organizations. Architecture: accepts the reality of change in organizations and systems. Data: uses whatever means possible to integrate business needs.
*****************************************************************
E. F. Codd's (the father of the relational database) 12 rules for OLAP
Multidimensional conceptual view. This supports EIS "slice-and-dice" operations
and is usually required in financial modeling.
Transparency. OLAP systems should be part of an open system that supports hetero
geneous data sources. Furthermore, the end user should not have to be concerned
about the details of data access or conversions.
Accessibility. The OLAP system should present the user with a single logical schema of the data.
Consistent reporting performance. Performance should not degrade as the number o
f dimensions in the model increases.
Client/server architecture. Requirement for open, modular systems.
Generic dimensionality. Not limited to 3-D and not biased toward any particular
dimension. A function applied to one dimension should also be able to be applied
to another.
Dynamic sparse-matrix handling. Related both to the idea of nulls in relational
databases and to the notion of compressing large files, a sparse matrix is one i
n which not every cell contains data. OLAP systems should accommodate varying st
orage and data-handling options.
Multiuser support. OLAP systems, like EISes, need to support multiple concurrent
users, including their individual views or slices of a common database.
Unrestricted cross-dimensional operations. Similar to rule 6; all dimensions are
created equal, and operations across data dimensions do not restrict relationsh
ips between cells.
Intuitive data manipulation. Ideally, users shouldn't have to use menus or perfo
rm complex multiple-step operations when an intuitive drag-and-drop action will
do.
Flexible reporting. Save a tree. Users should be able to print just what they ne
ed, and any changes to the underlying financial model should be automatically re
flected in reports.
Unlimited dimensional and aggregation levels. A serious tool should support at l
east 15, and preferably 20, dimensions.
********************************************************************
Dimensional data model is most often used in data warehousing systems. This is d
ifferent from the 3rd normal form, commonly used for transactional (OLTP) type s
ystems. As you can imagine, the same data would then be stored differently in a
dimensional model than in a 3rd normal form model.
Dimension: A category of information. For example, the time dimension. eg: demog
raphy, date, product, customer, HR etc
Attribute: A unique level within a dimension. For example, Month is an attribute
in the Time Dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year -> Quarter -> Month -> Day.
Dimensional Data Model: Dimensional data model is commonly used in data warehous
ing systems. This section describes this modeling technique, and the two common
schema types, star schema and snowflake schema.
Fact Table: A fact table is a table that contains the measures of interest. For
example, sales amount would be such a measure. This measure is stored in the fac
t table with the appropriate granularity. For example, it can be sales amount by
store by day. In this case, the fact table would contain three columns: A date
column, a store column, and a sales amount column.
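A minimal SQL sketch of that sales-by-store-by-day fact table (hypothetical names, generic syntax):
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL,   -- foreign key to the date dimension
    store_key    INTEGER NOT NULL,   -- foreign key to the store dimension
    sales_amount DECIMAL(12,2)       -- the measure, at store-by-day grain
)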
Attributes are the non-key columns in the lookup tables.
Lookup Table: The lookup table provides the detailed information about the attri
butes. For example, the lookup table for the Quarter attribute would include a l
ist of all of the quarters available in the data warehouse. Each row (each quart
er) may have several fields, one for the unique ID that identifies the quarter,
and one or more additional fields that specifies how that particular quarter is
represented on a report (for example, first quarter of 2001 may be represented a
s "Q1 2001" or "2001 Q1").
STAR SCHEMA
All measures in the fact table have the same level of granularity.
A complex star schema can have more than one fact table.
A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of dimension tables. Ea
ch dimension table is joined to the fact table using a primary key to foreign ke
y join, but the dimension tables are not joined to each other.
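A minimal sketch of such a star query, reusing the hypothetical sales_fact table above together with hypothetical date_dim and store_dim dimension tables: the fact table joins to each dimension on its key, and the dimensions are not joined to each other.
SELECT d.year, s.region, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim  d ON d.date_key  = f.date_key
JOIN   store_dim s ON s.store_key = f.store_key
GROUP BY d.year, s.region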
The main advantages of star schemas are that they:
Provide a direct and intuitive mapping between the business entities being analy
zed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contains dimension tables.
The snowflake schema is an extension of the star schema, where each point of the
star explodes into more points. In a star schema, each dimension is represented
by a single dimensional table, whereas in a snowflake schema, that dimensional
table is normalized into multiple lookup tables, each representing a level in th
e dimensional hierarchy.
For example, the Time dimension can consist of 2 different hierarchies:
1. Year -> Month -> Day
2. Week -> Day
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data is grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. A sketch of that location/city normalization follows.
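A minimal sketch of that normalization (hypothetical names, generic syntax):
-- Star schema: one denormalized location dimension
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    city_name    VARCHAR(50),
    region       VARCHAR(50)
);
-- Snowflake schema: the same data split into a location table and a city table
CREATE TABLE city (
    city_key  INTEGER PRIMARY KEY,
    city_name VARCHAR(50),
    region    VARCHAR(50)
);
CREATE TABLE location_dim_snowflake (
    location_key INTEGER PRIMARY KEY,
    city_key     INTEGER REFERENCES city (city_key)
);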
Fact Constellation Schema
This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. The fact table is split only when we want to focus on aggregation over a few facts and dimensions.
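A minimal sketch of one such aggregate fact table (hypothetical names), built from the day-level sales_fact above and sharing its dimensions, which is the pattern a fact constellation describes:
CREATE TABLE monthly_sales_fact AS
SELECT d.year, d.month, f.store_key,
       SUM(f.sales_amount) AS sales_amount
FROM   sales_fact f
JOIN   date_dim d ON d.date_key = f.date_key
GROUP BY d.year, d.month, f.store_key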
********************************************************************