DataStage Routine to validate number of records as input with total number of output records.
Below
example is work for single input sources with target as output
link,ignore link and reject link. You can also manage multiple number of
link in a job by little bit modifying below routine.
$INCLUDE DSINCLUDE JOBCONTROL.H
FLAG=0
RjctRowCount=0
InpRowCnt=0
OpRowCnt=0
OpRowCnt2=0
JobName1=Field(Arg1,';',1)
StgLnkName1=Field(Arg1,';',2)
Call DSLogInfo('Checking Records in Job': JobName1,RtnStgChkLnkRec)
RegStageCnt=Arg2
JobHandle1=DSAttachJob(JobName1,DSJ.ERRFATAL)
InpStgName=Field(StgLnkName1,',',1)
InpStgLnkName=Field(StgLnkName1,',',2)
InpRowCnt=DSGetLinkInfo(JobHandle1,InpStgName,InpStgLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Input Records is ': InpRowCnt, RtnStgRecCnt)
OpStgName=Field(StgLnkName1,',',3)
OpLnkName=Field(StgLnkName1,',',4)
If OpStgName<>''
Then
OpRowCnt=DSGetLinkInfo(JobHandle1,OpStgName,OpLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Records in output link is ' : OpRowCnt,RtnStgRecCnt)
End
OpStgName2=Field(StgLnkName1,',',5)
OpLnkName2=Field(StgLnkName1,',',6)
If OpStgName2<>''
Then
OpRowCnt2=DSGetLinkInfo(JobHandle1,OpStgName2,OpLnkName2,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Records in output2 link is ' : OpRowCnt2,RtnStgRecCnt)
End
RjctStgName=Field(StgLnkName1,',',7)
RjctLnkName=Field(StgLnkName1,',',8)
If RjctStgName<>''
Then
RjctRowCount=DSGetLinkInfo(JobHandle1,RjctStgName,RjctLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of records in reject link is ' : RjctRowCount, RtnStgRecCnt)
End
If INT(InpRowCnt)=INT(OpRowCnt)+INT(RjctRowCount)+INT(OpRowCnt2)
Then
Call DSLogInfo('Input Row count is exactly as sum of Op and Reject row counts', RtnStgRecCnt)
FLAG=FLAG
End
Else
Call DSLogInfo('There is data loss in between job',RtnStgRecCnt)
Call DSLogInfo('Aborting Routines',RtnStgRecCnt)
FLAG=FLAG+1
GOTO ExitingRtn
End
ExitingRtn:
Ans=FLAG
{Return value is 0 for No Data Lose in job and in case of data lose return value should be non-zero(1)}
Friday, 17 May 2013
Thursday, 17 January 2013
ETL THROUGH UNIX
3.DB2 "reason code 3" table is in deadlock situation?
Solution:-
To free table from deadlock you need to apply below command in db2 server.
"load from /dev/null of del modified by usedefaults terminate into schema.table nonrecoverable"
NB: don't forget to use nonrecoverable in your command as it helps your table having lesser transaction log.If you have any other deadlock situation in your table then post you query,will give you solution.
4. How see processes run by particular user?
Solutions:
run below command in unix platform...
$ps -ef|grep usename
NB:will show you ran by current user. slave process created by datastage. If datastage job got locked than need to kill these slave processes
5.How to print a entire row in a '|' delimited file having a specific value in it's 2nd column.
Solutions:
awk -F'|' '{if($2=="Spwcific Value")print $0;}' filename
6.A '|' delimited file having 3 columns.How to print them in reverse order?
Solutions:
awk -F'|' '{ print $3, $2, $1 }' filename
7.Print row having similar value.
Solutions:
awk '/desired value/' filename
8.How to replace a specific value with some particular value in a file in unix.
Solutions:
sed 's/find what/replace with/g' filename
9.Search for a particular value in a file
Solutions:
grep "value" filename
10.Search for a exact value in a file
Solutions:
grep -w "value" filename
11.Search for a value except the specified value.
Solutions:
grep -v "except value" filename
12.How to list a file in current directory
Solutions:
ls -ltr
13.How list all the processes running by specific user
Solutions:
ps -ef|grep user name
14.How to see the current user
Solutions:
whoami
15.How see my current working path
Solutions:
pwd
Solution:-
To free table from deadlock you need to apply below command in db2 server.
"load from /dev/null of del modified by usedefaults terminate into schema.table nonrecoverable"
NB: don't forget to use nonrecoverable in your command as it helps your table having lesser transaction log.If you have any other deadlock situation in your table then post you query,will give you solution.
4. How see processes run by particular user?
Solutions:
run below command in unix platform...
$ps -ef|grep usename
NB:will show you ran by current user. slave process created by datastage. If datastage job got locked than need to kill these slave processes
5.How to print a entire row in a '|' delimited file having a specific value in it's 2nd column.
Solutions:
awk -F'|' '{if($2=="Spwcific Value")print $0;}' filename
6.A '|' delimited file having 3 columns.How to print them in reverse order?
Solutions:
awk -F'|' '{ print $3, $2, $1 }' filename
7.Print row having similar value.
Solutions:
awk '/desired value/' filename
8.How to replace a specific value with some particular value in a file in unix.
Solutions:
sed 's/find what/replace with/g' filename
9.Search for a particular value in a file
Solutions:
grep "value" filename
10.Search for a exact value in a file
Solutions:
grep -w "value" filename
11.Search for a value except the specified value.
Solutions:
grep -v "except value" filename
12.How to list a file in current directory
Solutions:
ls -ltr
13.How list all the processes running by specific user
Solutions:
ps -ef|grep user name
14.How to see the current user
Solutions:
whoami
15.How see my current working path
Solutions:
pwd
ETL THROUGH DB2
1.A typical sql of datawarehousing is actually stands for?
A SQL query is taking more response time to reply(more then 1hrs).
The sql is joning 6 tables. Primarily it is joining through single foreign
key which all the tables have. But as a typical rdbms query ,there is nothing wrong with this.
But scenario is for 3 tables that column is a unique key and for other three table thata column
is a part of unique key.So while we try to join all this 6 tables,we need to join with all
the remaining unique key for later 3 table also. Thus query response time is quite acceptable( around 5sec).
A SQL query is taking more response time to reply(more then 1hrs).
The sql is joning 6 tables. Primarily it is joining through single foreign
key which all the tables have. But as a typical rdbms query ,there is nothing wrong with this.
But scenario is for 3 tables that column is a unique key and for other three table thata column
is a part of unique key.So while we try to join all this 6 tables,we need to join with all
the remaining unique key for later 3 table also. Thus query response time is quite acceptable( around 5sec).
ETL THROUGH DATASTAGE
1.Verb "UNLOCK" is not in your VOC
Solution:-
First log in into administrator and go to execute command option of specific project. Than follow below process.
Write “SET.FILE UV VOC UV.VOC” click in execute then write “COPY FROM UV.VOC TO VOC UNLOCK” and after you have to write “UNLOCK USER number ALL”
NB:Next time onward you don’t need to write the command. May be you may need admin ID for this.
2. DataStage Routine to validate number of records as input with total number of output records.
Below example is work for single input sources with target as output link,ignore link and reject link. You can also manage multiple number of link in a job by little bit modifying below routine.
Solution:-
$INCLUDE DSINCLUDE JOBCONTROL.H
FLAG=0
RjctRowCount=0
InpRowCnt=0
OpRowCnt=0
OpRowCnt2=0
JobName1=Field(Arg1,';',1)
StgLnkName1=Field(Arg1,';',2)
Call DSLogInfo('Checking Records in Job': JobName1,RtnStgChkLnkRec)
RegStageCnt=Arg2
JobHandle1=DSAttachJob(JobName1,DSJ.ERRFATAL)
InpStgName=Field(StgLnkName1,',',1)
InpStgLnkName=Field(StgLnkName1,',',2)
InpRowCnt=DSGetLinkInfo(JobHandle1,InpStgName,InpStgLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Input Records is ': InpRowCnt, RtnStgRecCnt)
OpStgName=Field(StgLnkName1,',',3)
OpLnkName=Field(StgLnkName1,',',4)
If OpStgName<>''
Then
OpRowCnt=DSGetLinkInfo(JobHandle1,OpStgName,OpLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Records in output link is ' : OpRowCnt,RtnStgRecCnt)
End
OpStgName2=Field(StgLnkName1,',',5)
OpLnkName2=Field(StgLnkName1,',',6)
If OpStgName2<>''
Then
OpRowCnt2=DSGetLinkInfo(JobHandle1,OpStgName2,OpLnkName2,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Records in output2 link is ' : OpRowCnt2,RtnStgRecCnt)
End
RjctStgName=Field(StgLnkName1,',',7)
RjctLnkName=Field(StgLnkName1,',',8)
If RjctStgName<>''
Then
RjctRowCount=DSGetLinkInfo(JobHandle1,RjctStgName,RjctLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of records in reject link is ' : RjctRowCount, RtnStgRecCnt)
End
If INT(InpRowCnt)=INT(OpRowCnt)+INT(RjctRowCount)+INT(OpRowCnt2)
Then
Call DSLogInfo('Input Row count is exactly as sum of Op and Reject row counts', RtnStgRecCnt)
FLAG=FLAG
End
Else
Call DSLogInfo('There is data loss in between job',RtnStgRecCnt)
Call DSLogInfo('Aborting Routines',RtnStgRecCnt)
FLAG=FLAG+1
GOTO ExitingRtn
End
ExitingRtn:
Ans=FLAG
{Return value is 0 for No Data Lose in job and in case of data lose return value should be non-zero(1)}
Solution:-
First log in into administrator and go to execute command option of specific project. Than follow below process.
Write “SET.FILE UV VOC UV.VOC” click in execute then write “COPY FROM UV.VOC TO VOC UNLOCK” and after you have to write “UNLOCK USER number ALL”
NB:Next time onward you don’t need to write the command. May be you may need admin ID for this.
2. DataStage Routine to validate number of records as input with total number of output records.
Below example is work for single input sources with target as output link,ignore link and reject link. You can also manage multiple number of link in a job by little bit modifying below routine.
Solution:-
$INCLUDE DSINCLUDE JOBCONTROL.H
FLAG=0
RjctRowCount=0
InpRowCnt=0
OpRowCnt=0
OpRowCnt2=0
JobName1=Field(Arg1,';',1)
StgLnkName1=Field(Arg1,';',2)
Call DSLogInfo('Checking Records in Job': JobName1,RtnStgChkLnkRec)
RegStageCnt=Arg2
JobHandle1=DSAttachJob(JobName1,DSJ.ERRFATAL)
InpStgName=Field(StgLnkName1,',',1)
InpStgLnkName=Field(StgLnkName1,',',2)
InpRowCnt=DSGetLinkInfo(JobHandle1,InpStgName,InpStgLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Input Records is ': InpRowCnt, RtnStgRecCnt)
OpStgName=Field(StgLnkName1,',',3)
OpLnkName=Field(StgLnkName1,',',4)
If OpStgName<>''
Then
OpRowCnt=DSGetLinkInfo(JobHandle1,OpStgName,OpLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Records in output link is ' : OpRowCnt,RtnStgRecCnt)
End
OpStgName2=Field(StgLnkName1,',',5)
OpLnkName2=Field(StgLnkName1,',',6)
If OpStgName2<>''
Then
OpRowCnt2=DSGetLinkInfo(JobHandle1,OpStgName2,OpLnkName2,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of Records in output2 link is ' : OpRowCnt2,RtnStgRecCnt)
End
RjctStgName=Field(StgLnkName1,',',7)
RjctLnkName=Field(StgLnkName1,',',8)
If RjctStgName<>''
Then
RjctRowCount=DSGetLinkInfo(JobHandle1,RjctStgName,RjctLnkName,DSJ.LINKROWCOUNT)
Call DSLogInfo('Number of records in reject link is ' : RjctRowCount, RtnStgRecCnt)
End
If INT(InpRowCnt)=INT(OpRowCnt)+INT(RjctRowCount)+INT(OpRowCnt2)
Then
Call DSLogInfo('Input Row count is exactly as sum of Op and Reject row counts', RtnStgRecCnt)
FLAG=FLAG
End
Else
Call DSLogInfo('There is data loss in between job',RtnStgRecCnt)
Call DSLogInfo('Aborting Routines',RtnStgRecCnt)
FLAG=FLAG+1
GOTO ExitingRtn
End
ExitingRtn:
Ans=FLAG
{Return value is 0 for No Data Lose in job and in case of data lose return value should be non-zero(1)}
Subscribe to:
Comments (Atom)