Dear Readers, welcome to these DataStage interview questions. They have been designed specially to get you acquainted with the nature of questions you may encounter during a job interview for DataStage. These questions are very important for campus placement tests and job interviews. In my experience, good interviewers hardly plan to ask any particular question during an interview; model questions like these are asked in the online technical tests and interviews of many IT companies.
- DataStage supports parallel processing, which Informatica does not.
- Links are objects in DataStage; in Informatica, it is port-to-port connectivity.
- In Informatica it is easy to implement Slowly Changing Dimensions, which is a little complex in DataStage.
- DataStage does not support complete error handling.
- Client components: Administrator, Director, Manager, and Designer.
- Server components: Repository, Server, and Plug-ins.
System variables comprise a set of read-only variables that are used to get system information; they can be accessed from a transformer or a routine, and they all start with an @.
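For example, a transformer might use system variables in output-column derivations like this (a minimal sketch in DataStage BASIC; the link and column names are hypothetical):

    * Stamp each output row with its row number and the run date
    OutLink.ROW_ID derivation:    @INROWNUM
    OutLink.LOAD_DATE derivation: Oconv(@DATE, "D-YMD[4,2,2]")

Since @DATE returns the date in internal format, Oconv is used here to render it in an external format.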
Many new stages were introduced compared to DataStage version 7.0. In server jobs we have the Stored Procedure stage, CDC stage, and Command stage, and a generate-report option was added on the File tab. In job sequences, many activities such as Start Loop, End Loop, Terminate Loop, and User Variables were introduced. In parallel jobs, the Surrogate Key stage and Stored Procedure stage were introduced.
- Design jobs for extraction, transformation, and loading (ETL).
- An ideal tool for data integration projects such as data warehouses, data marts, and system migrations.
- Import, export, create, and manage metadata for use within jobs.
- Schedule, run, and monitor jobs, all within DataStage.
- Administer your DataStage development and execution environments.
- Merge is a stage that is available in both parallel and server jobs.
- The Merge stage is used to join two tables (server/parallel) or two tables/datasets (parallel).
- Merge requires that the master table/dataset and the update table/dataset be sorted.
- Merge is performed on a key field, and the key field is mandatory in the master and update datasets/tables.
A sequencer allows you to synchronize the control flow of multiple activities in a job sequence. It can have multiple input triggers as well as multiple output triggers.
Version Control stores different versions of DataStage jobs, runs different versions of the same job, reverts to a previous version of a job, and views version histories.
Passive stages are used for data extraction and loading, while active stages are used to implement and process business rules.
DataStage has the following features to aid the design and processing required to build a data warehouse:
- Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
- Extracts data from any number or type of database.
- Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
- Aggregates data. You can modify the SQL SELECT statements used to extract data.
- Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend the functionality by defining your own transforms.
- Loads the data warehouse.
An operational data source usually contains records of individual transactions such as product sales. If the user of a data warehouse only needs a summed total, you can reduce records to a more manageable number by aggregating the data.
If we use an IPC stage between source and target, one process will handle the communication from the Sequential File stage to the IPC stage, and the other will handle the communication from the IPC stage to the ODBC stage. As soon as the Sequential File stage has opened its output link, the IPC stage can start passing data to the ODBC stage.
An intermediate processing variable that retains its value during a read and does not pass the value into the target column.
There are two types of containers, namely:
- Local Container
- Shared Container
DataStage stores its repository in an IBM UniVerse database.
These are temporary variables created in a transformer for calculations.
- Using the routines KeyMgtGetNextVal and KeyMgtGetNextValConn, as sketched below.
- They can also be generated by an Oracle sequence.
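As a sketch, a surrogate key could be assigned in a transformer output-column derivation by calling the routine directly (this assumes the routine takes the name of the key sequence as its argument; the link, column, and sequence names here are hypothetical):

    * Returns the next value of the named key sequence
    OutLink.CUST_SK derivation: KeyMgtGetNextVal("CustomerKey")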
Server jobs work in a sequential way, while parallel jobs work in a parallel fashion (the Parallel Extender works on the principle of pipeline and partition parallelism) for input/output processing.
The Account option creates the hashed file in the current project itself, whereas with the Directory option we have to specify the path of the directory.
- Server Jobs
- Parallel Jobs
- Mainframe Jobs
Routines are functions which we develop in BASIC code for required tasks that DataStage does not fully support out of the box (complex ones).
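A minimal sketch of such a server routine body (DataStage BASIC; the transform logic is only an illustration):

    * Transform function: return the input argument trimmed and upper-cased
    * "MCU" is the mask-character-uppercase conversion code
    Ans = Oconv(Trim(Arg1), "MCU")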
These parameters are used to provide administrative access and to change run-time values of a job. Go to Edit > Job Parameters; on the Parameters tab we can define the name, prompt, type, and value.
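Once defined, a parameter is referenced between # characters in stage properties, for example (the parameter and path names here are hypothetical):

    File name: #TargetDir#/customers_#RunDate#.txt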
A hashed file stores data based on a hash algorithm and a key value, while a sequential file is just a file with no key column. A hashed file can be used as a reference for a lookup; a sequential file cannot.
A design interface used to create DataStage applications (known as jobs). Each job specifies the data sources, the transforms required, and the destination of the data. Jobs are compiled to create executables that are scheduled by the Director and run by the Server.
DataStage contains two groups of components:
Client components:
- DataStage Designer
- DataStage Director
- DataStage Manager
- DataStage Administrator
Server components:
- Repository
- DataStage Server
- DataStage Package Installer
In the server canvas we can improve performance in two ways:
- First, we can increase memory by enabling inter-process row buffering in the job properties.
- Second, by inserting an IPC stage we break a process into two processes. We can use this stage to connect two passive stages or two active stages.
This stage is used to bulk load the Oracle target database.
DataStage understands the architecture of the system through this file (APT_CONFIG_FILE). For example, this file contains information such as node names and disk storage details.
APT_CONFIG is just an environment variable used to identify the *.apt file. Don't confuse it with the *.apt file itself, which holds the node information and the configuration of the SMP/MPP server.
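A minimal single-node configuration file might look like this (the host name and paths are hypothetical):

    {
      node "node0"
      {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/datasets" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
      }
    }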
These are called plug-in stages. Orabulk is used when we have bulk data in Oracle; for databases other than Oracle we go for the BCP stage.
A fact table consists of measurements of business requirements and foreign keys of dimension tables, as per business rules.
The Merge stage combines a sorted master data set with one or more sorted update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record.
Using the DataStage Director we can schedule or monitor the job.
Containers are reusable sets of stages.
There are two types of containers:
- Local Container
- Shared Container
A local container is local to the particular job in which we developed it.
A shared container can be used in other jobs as well.
Using the DataStage Director and the DataStage Administrator client.
A hashed file is used for lookups while populating a fact table; basically it keeps the data in a buffer and uses a hashing algorithm, which improves the performance of the job. It uses the GENERAL or SEQ.NUM. algorithm.
- A data warehouse is a repository (centralized as well as distributed) of data, able to answer any ad hoc, analytical, historical, or complex queries.
- Metadata is data about data.
- The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element.
- Metadata is stored in a data dictionary and repository.
For columns - we can directly map the same source column to two different targets.
For rows - we have to put some constraint (condition) on the links.
There are three types of views in the DataStage Director:
Job view:
- Dates when jobs were compiled.
Status view:
- Status of a job's last run.
Log view:
- Warning messages, event messages, and program-generated messages.
Sequencers are job control programs that execute other jobs with preset job parameters.
- Validating
- Scheduling
- Executing
- Monitoring jobs (server jobs)
It uses the GENERAL or SEQ.NUM. algorithm.
A folder contains repository objects such as sources, targets, mappings, and transformations, which help to logically organize our data warehouse.
- Active stage: one in which transformation, aggregation, etc. are done. Examples: Transformer, Aggregator.
- Passive stage: one in which the read/write process is done. Examples: ODBC, Sequential File, Hashed File.
It is a Java engine running in the background.
- Server Job
- Parallel Job
- Sequencer Job
- Container Job
Using DataStage Manager.
This stage is used to bulk load the Oracle target database.
These are date functions, which we use to convert dates from internal format to external format and back:
- Iconv - external to internal
- Oconv - internal to external
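For example, in a derivation (these are standard date conversion codes):

    * "2024-01-15" (external) -> internal day number -> "01/15/2024" (external)
    InternalDate = Iconv("2024-01-15", "D-YMD[4,2,2]")
    DisplayDate = Oconv(InternalDate, "D/MDY[2,2,4]")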
- Oracle Bulk
- BCP Load
- A data warehouse is a decision-support database for organizational needs. It is a subject-oriented, non-volatile, integrated, time-variant collection of data.
- ODS (Operational Data Store) is an integrated collection of related information; it contains at most 90 days of information.
- ODS, the operational data store, is part of the transactional database. It keeps integrated data from different transactional databases and allows common operations across the organization. E.g., a banking transaction.
- In simple terms, ODS is dynamic data.
Using Pivot Stage.
Basically, an environment variable is a predefined variable that we can use while creating a DS job. We can set it either at the project level or at the job level. Once we set a specific variable, that variable will be available in the project/job.
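For example, after an environment variable has been added to a job as a parameter, it is referenced with the #$...# notation in stage properties (the variable and path below are hypothetical):

    File name: #$SOURCE_DIR#/input.txt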
- Stage variable - an intermediate processing variable that retains its value during a read and does not pass the value into the target column.
- Derivation - an expression that specifies the value to be passed on to the target column.
- Constraint - a condition that is either true or false and that specifies the flow of data along a link.
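Putting the three together in a transformer (a sketch; the link and column names are hypothetical):

    * Stage variable: compute a line total once and reuse it
    svLineTotal = InLink.QTY * InLink.UNIT_PRICE
    * Derivation on an output column:
    OutLink.TOTAL_AMT derivation: svLineTotal
    * Constraint on the output link: pass only rows with a positive total
    svLineTotal > 0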
If you mean the Oracle Call Interface (OCI), it is a set of low-level APIs used to interact with Oracle databases. It allows one to perform operations such as logon, execute, and parse using a C or C++ program.