Monday, 15 July 2019

Restart DataStage Server version 11.7

Step 1. Stop the existing services.
           a. su - dsadm
           b. cd $DSHOME
           c. . ./dsenv
           d. bin/uv -admin -stop
           e. su - root
           f. cd ...../ASBNode/bin
           g. ./NodeAgents.sh stop
           h. cd ..../ASBServer/bin
           i. ./MetadataServer.sh stop
                       if ps ax | grep -v grep | grep server1 > /dev/null; then
                           pkill -9 server1
                       else
                           echo "WAS already down"
                       fi



Step 2. Start the DataStage server.
         a. su - db2inst
         b. db2start
         c. su - root
         d. cd ..../ASBServer/bin
         e. ./MetadataServer.sh run
         f. cd ...../ASBNode/bin
         g. ./NodeAgents.sh start
         h. su - dsadm
         i. cd $DSHOME
         j. . ./dsenv
         k. bin/uv -admin -start

# Services start on the ports on which the product was installed. Check the netstat output as well if required.
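The two steps above can be sketched as one shell script. This is a minimal sketch, not the official procedure: /opt/IBM/InformationServer is only the default install path (it stands in for the elided directories in the steps), and the DRY_RUN guard is an illustrative device that prints the privileged commands instead of running them.

```shell
#!/bin/sh
# Sketch of the stop/start sequence above. IS_HOME defaults to the standard
# install path; adjust for your server. With DRY_RUN=1 (the default here)
# the script only prints the privileged commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
IS_HOME=${IS_HOME:-/opt/IBM/InformationServer}
ENGINE=$IS_HOME/Server/DSEngine

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_services() {
    # Step 1: stop the engine, the node agents, and the metadata server.
    run su - dsadm -c ". $ENGINE/dsenv; $ENGINE/bin/uv -admin -stop"
    run "$IS_HOME/ASBNode/bin/NodeAgents.sh" stop
    run "$IS_HOME/ASBServer/bin/MetadataServer.sh" stop
    # Kill WebSphere only if its server1 process survived the stop script.
    if ps ax | grep -v grep | grep server1 > /dev/null; then
        run pkill -9 server1
    else
        echo "WAS already down"
    fi
}

start_services() {
    # Step 2: bring everything back in the reverse order.
    run su - db2inst -c db2start
    run "$IS_HOME/ASBServer/bin/MetadataServer.sh" run
    run "$IS_HOME/ASBNode/bin/NodeAgents.sh" start
    run su - dsadm -c ". $ENGINE/dsenv; $ENGINE/bin/uv -admin -start"
}

stop_services
start_services
```

Run it once with the default DRY_RUN=1 to review the commands, then rerun as root with DRY_RUN=0.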
     

Monday, 27 January 2014

To Get a JobInfo/JobStatus/JobReport Of DataStage Through Unix


  • To see the list of all projects present on the server:

          dsjob -lprojects


  • To see the list of jobs in a particular project:

          dsjob -ljobs project_name


  • To see the list of all the stages used in a job of a particular project:

          dsjob -lstages project_name job_name


  • To see the list of all the links used in a job of a particular project:

          dsjob -llinks project_name job_name


  • To see the list of all the parameters used in a job of a particular project:

          dsjob -lparams project_name job_name


  • To see basic information about a job of a particular project:

          dsjob -jobinfo project_name job_name


  • To see a detailed report of job information for a job of a particular project:

          dsjob -report project_name job_name BASIC|DETAIL|XML


  • To see the latest 5 fatal errors from the log of the job that was just run:

          dsjob -logsum -type FATAL -max 5 project_name job_name
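The two list commands above can be combined into a small loop that prints every job in every project. A sketch only: it assumes dsenv has been sourced so that dsjob is on the PATH, and the DSJOB variable is an override hook added here for illustration, not a dsjob feature.

```shell
# Sketch: enumerate every project/job pair with dsjob.
# DSJOB lets you point at a specific dsjob binary (illustrative assumption).
DSJOB=${DSJOB:-dsjob}

list_all_jobs() {
    "$DSJOB" -lprojects | while read -r project; do
        "$DSJOB" -ljobs "$project" | while read -r job; do
            echo "$project/$job"
        done
    done
}
```

Usage: source dsenv first (`. $DSHOME/dsenv`), then call `list_all_jobs`.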

To Run and Stop a Job Of DataStage Through Unix

1. To run a DataStage job:
       dsjob -run project_name job_name

2. If you have parameters or parameter-set values to set for the job, then the Unix command is:
 dsjob -run -param variable_name="VALUE" -param psParameterSet="vsValueSet" project_name job_name

3. To stop a DataStage job:
 dsjob -stop project_name job_name
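A hedged wrapper around `dsjob -run`: with the -jobstatus option, dsjob waits for the job to finish and its exit code reflects the job's finishing status (1 = ran OK, 2 = finished with warnings). The DSJOB variable is an override hook added here for illustration.

```shell
DSJOB=${DSJOB:-dsjob}

# Run a job, wait for it to finish, and translate the finishing status.
run_and_check() {
    project=$1; job=$2
    "$DSJOB" -run -jobstatus "$project" "$job"
    rc=$?
    # 1 = finished OK, 2 = finished with warnings; anything else is a failure.
    if [ "$rc" -eq 1 ] || [ "$rc" -eq 2 ]; then
        echo "$job finished (status $rc)"
    else
        echo "$job failed (status $rc)" >&2
        return 1
    fi
}
```

Usage: `run_and_check project_name job_name`, e.g. from a scheduler script that should abort on a fatal run.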

Unix Commands to Restart (Stop and Start) the DataStage Server:

1. Log in to the engine tier computer as the IBM InfoSphere DataStage administrator (usually dsadm).
2. Change to the engine directory and set the environment:
cd $DSHOME
. ./dsenv
$DSHOME is the engine directory. The default directory is
/opt/IBM/InformationServer/Server/DSEngine.
3. Stop the engine:
./bin/uv -admin -stop
The command displays a message when the engine stops.
4. Restart the engine:
./bin/uv -admin -start
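To confirm the engine came back, you can check for the dsrpc listener in the netstat output. A sketch: 31538 is only the default dsrpc port, and the NETSTAT variable is an override hook added here for illustration.

```shell
NETSTAT=${NETSTAT:-netstat}  # override hook for testing; normally just netstat

# Return 0 if something is listening on the given port (default: dsrpc 31538).
engine_is_listening() {
    port=${1:-31538}
    "$NETSTAT" -an 2>/dev/null | grep -q "[.:]$port .*LISTEN"
}
```

Usage: `engine_is_listening && echo "engine up"` after step 4.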

Friday, 17 May 2013

DataStage Routine to Count Records in a Particular Link, or to Compare Record Counts Across Links in a DataStage Job

DataStage routine to validate that the number of input records matches the total number of output records.
The example below works for a single input source with an output link, an ignore link and a reject link as targets. You can handle any other number of links in a job by slightly modifying the routine.

$INCLUDE DSINCLUDE JOBCONTROL.H

* Arg1 format: "JobName;InpStg,InpLnk,OpStg,OpLnk,OpStg2,OpLnk2,RjctStg,RjctLnk"
FLAG=0
RjctRowCount=0
InpRowCnt=0
OpRowCnt=0
OpRowCnt2=0

JobName1=Field(Arg1,';',1)
StgLnkName1=Field(Arg1,';',2)

Call DSLogInfo('Checking Records in Job ':JobName1,'RtnStgChkLnkRec')

RegStageCnt=Arg2

JobHandle1=DSAttachJob(JobName1,DSJ.ERRFATAL)

InpStgName=Field(StgLnkName1,',',1)
InpStgLnkName=Field(StgLnkName1,',',2)
InpRowCnt=DSGetLinkInfo(JobHandle1,InpStgName,InpStgLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of input records is ':InpRowCnt,'RtnStgRecCnt')

OpStgName=Field(StgLnkName1,',',3)
OpLnkName=Field(StgLnkName1,',',4)

If OpStgName<>'' Then
   OpRowCnt=DSGetLinkInfo(JobHandle1,OpStgName,OpLnkName,DSJ.LINKROWCOUNT)
   Call DSLogInfo('Number of records in output link is ':OpRowCnt,'RtnStgRecCnt')
End

OpStgName2=Field(StgLnkName1,',',5)
OpLnkName2=Field(StgLnkName1,',',6)

If OpStgName2<>'' Then
   OpRowCnt2=DSGetLinkInfo(JobHandle1,OpStgName2,OpLnkName2,DSJ.LINKROWCOUNT)
   Call DSLogInfo('Number of records in output2 link is ':OpRowCnt2,'RtnStgRecCnt')
End

RjctStgName=Field(StgLnkName1,',',7)
RjctLnkName=Field(StgLnkName1,',',8)

If RjctStgName<>'' Then
   RjctRowCount=DSGetLinkInfo(JobHandle1,RjctStgName,RjctLnkName,DSJ.LINKROWCOUNT)
   Call DSLogInfo('Number of records in reject link is ':RjctRowCount,'RtnStgRecCnt')
End

If INT(InpRowCnt)=INT(OpRowCnt)+INT(RjctRowCount)+INT(OpRowCnt2) Then
   Call DSLogInfo('Input row count equals the sum of output and reject row counts','RtnStgRecCnt')
End Else
   Call DSLogInfo('There is data loss in the job','RtnStgRecCnt')
   Call DSLogInfo('Aborting routine','RtnStgRecCnt')
   FLAG=FLAG+1
   GoTo ExitingRtn
End

ExitingRtn:
ErrCode=DSDetachJob(JobHandle1)
Ans=FLAG

{The return value is 0 when there is no data loss in the job; in case of data loss the return value is non-zero (1).}




Thursday, 17 January 2013

ETL THROUGH UNIX

3. DB2 "reason code 3" (SQL0668N): the table is in Load Pending state, typically after a failed load, so it cannot be accessed. How to fix it?
Solution:-

To take the table out of Load Pending state, run the command below on the DB2 server:

"load from /dev/null of del modified by usedefaults terminate into schema.table nonrecoverable"

NB: don't forget to use nonrecoverable in your command, as it keeps the transaction log smaller. If you have any other lock situation on your table, post your query and I will give you a solution.
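Wrapped as a small shell helper (a sketch: MYDB and schema.table below are placeholders, and you need the db2 CLI on the PATH plus an ID with LOAD authority):

```shell
# Reset a table out of Load Pending state with a LOAD ... TERMINATE.
# Run from a shell where the db2 CLI is available (e.g. the instance owner).
reset_load_pending() {
    db=$1; table=$2
    db2 connect to "$db" &&
    db2 "load from /dev/null of del modified by usedefaults terminate into $table nonrecoverable"
    db2 terminate
}
```

Usage: `reset_load_pending MYDB SCHEMA.TABLE` (both names are hypothetical examples).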

4. How to see the processes run by a particular user?

 Solution:

 Run the command below on the Unix platform:

$ ps -ef | grep username

NB: this shows the processes run by the given user, including the slave processes created by DataStage. If a DataStage job gets locked, you may need to kill these slave processes.


5. How to print an entire row of a '|'-delimited file having a specific value in its 2nd column?

 Solution:

awk -F'|' '{if($2=="Specific Value")print $0;}' filename

6. A '|'-delimited file has 3 columns. How to print them in reverse order?

 Solution:

awk -F'|' '{ print $3, $2, $1 }' filename

7. Print the rows containing a given value.

 Solution:

awk '/desired value/' filename

8. How to replace a specific value with some particular value in a file in Unix?

 Solution:

sed 's/find what/replace with/g' filename

9. Search for a particular value in a file.

 Solution:

grep "value" filename

10. Search for an exact (whole-word) match in a file.

 Solution:

grep -w "value" filename

11. Print the lines that do not contain a specified value.

 Solution:

grep -v "except value" filename

12. How to list the files in the current directory?

 Solution:

ls -ltr

13. How to list all the processes run by a specific user?

 Solution:

ps -ef | grep username

14. How to see the current user?

 Solution:

whoami

15. How to see the current working directory?

 Solution:

pwd



ETL THROUGH DB2

1. What does a typical data-warehousing SQL tuning scenario look like?


 A SQL query was taking a long time to respond (more than 1 hour).
The SQL joins 6 tables, primarily through a single foreign-key column that
 all the tables share; as a typical RDBMS query there is nothing obviously wrong with it.
The catch: for 3 of the tables that column is the full unique key, but for the other 3 tables the column
 is only part of a composite unique key. So when joining all 6 tables, we also need to join on
 the remaining unique-key columns for the latter 3 tables. With that change the query response time becomes quite acceptable (around 5 seconds).


ETL THROUGH DATASTAGE

1. Verb "UNLOCK" is not in your VOC

Solution:-


First log in to Administrator and go to the Command window of the specific project. Then follow the process below.

Run "SET.FILE UV VOC UV.VOC" and click Execute, then run "COPY FROM UV.VOC TO VOC UNLOCK", and after that run "UNLOCK USER number ALL".

NB: From next time onward you don't need to run the SET.FILE/COPY commands again. You may need an admin ID for this.
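The same fix-up can also be driven from the engine shell instead of the Administrator client. A sketch only: MYPROJECT is a placeholder, dssh paths follow the default layout, and the DSSH variable is an override hook added here for illustration.

```shell
# Run the VOC fix-up from the engine tier. Assumes dsenv has been sourced
# so $DSHOME is set; dssh is the engine's command shell.
DSSH=${DSSH:-$DSHOME/bin/dssh}

fix_unlock_verb() {
    project=$1
    "$DSSH" <<EOF
LOGTO $project
SET.FILE UV VOC UV.VOC
COPY FROM UV.VOC TO VOC UNLOCK
EOF
}
```

Usage: `fix_unlock_verb MYPROJECT`, then UNLOCK is available as a verb in that project.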


2. DataStage routine to validate that the number of input records matches the total number of output records (single input source with output, ignore and reject links as targets).

Solution:-

This is the same routine as in the post of Friday, 17 May 2013 above; see there for the full code.




Tuesday, 6 March 2012

DATASTAGE development

Dear friends, I am currently working as an ETL expert in data-warehousing technology. If you are facing any issue in ETL or in developing a job in DataStage/IBM InfoSphere, then please post your query and I will give you a solution.

Saturday, 9 July 2011

DataStage

It integrates data on demand with a high-performance parallel framework, extended metadata management, and enterprise connectivity.