Monday, February 16, 2015

Data Integration



 

Data integration means combining data that resides in different sources and giving users a unified view of that data.
Say you want to build a report on student attendance and send a letter to each student's home. You will get the name, address and roll number of each student from a database. You will have to search the class records to get the attendance of each student. If you want remarks from each teacher on each student, you have to meet every teacher of the class. So there are many sources of data here. You have to collect the data and combine it in such a way that you can prepare the report. This is a simple example of data integration.
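To make the idea concrete, here is a minimal Java sketch of the attendance report. It assumes every source can be looked up by roll number; the class name AttendanceReport, the method integrate and all sample values are made up for illustration and do not come from any real system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AttendanceReport {
    // Combine three separate sources, all keyed by roll number,
    // into one report line per student.
    static Map<Integer, String> integrate(Map<Integer, String> namesFromDatabase,
                                          Map<Integer, Integer> attendanceFromClassRecords,
                                          Map<Integer, String> remarksFromTeachers) {
        Map<Integer, String> report = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> student : namesFromDatabase.entrySet()) {
            int roll = student.getKey();
            // A source may have no entry for a student, so fall back to a default.
            int daysPresent = attendanceFromClassRecords.getOrDefault(roll, 0);
            String remark = remarksFromTeachers.getOrDefault(roll, "no remark");
            report.put(roll, student.getValue() + ", present " + daysPresent + " days, " + remark);
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the database, class records and teachers.
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(1, "Ajijul");
        names.put(2, "Anirban");
        Map<Integer, Integer> attendance = Map.of(1, 42, 2, 38);
        Map<Integer, String> remarks = Map.of(1, "attentive");
        integrate(names, attendance, remarks).values().forEach(System.out::println);
    }
}
```

The roll number acts as the common key that lets the three sources be joined. In real projects, finding such a key is often the hardest part.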
Data integration is used on a very large scale in many fields, such as commerce, education and health.
Many tools help to integrate data, for example Talend Open Studio.

Talend Open Studio for Data Integration






What is Talend?
According to Rafael Herrera, Head of BI International at Groupon:
“Talend is cost-effective, easy to use, readily adaptable and extremely versatile. With the help of the graphical user interface we can easily and quickly link up a large number of source systems using the standard connectors!”

   Talend is an open source software vendor that provides data integration, data migration, data management, enterprise application integration and big data software and services.

   If you are not familiar with data integration, read the Data Integration post first.

What is Open Studio for Data Integration?

   It is an open source application. It has a graphical development environment for designing and developing data integration jobs. It has many utility functions that help you integrate data with an application faster and more easily. For example, exporting data from an Excel or delimited file to a database becomes easier with this application.

   The following example will introduce you to the tool.

Example 1 (Inserting CSV file data into database) :

Prerequisite: Install Talend Open Studio for Data Integration. If you have not installed it yet, please read the Talend Open Studio for Data Integration Installation Guide.

Step 1 : Start the Talend Open Studio for Data Integration.

Step 2: Create a CSV file in your file system. Say you have created Student.csv and the contents of the file are as below

                                        Roll,Name,Age,Address
                                        1,Ajijul,20,Kolkata
                                        2,Anirban,19,Mumbai
                                        3,Ankita,20,Delhi
                                        4,Monojit,22,Chennai
                                        5,Saumajeet,21,Kolkata
Step 3: Create a new job

[Screenshot: New Job window]
  • Right-click on the Job Designs node in the Repository tab.
  • Click on Create Job.
  • An overlay window with several fields will appear. Fill them in. Refer to the screenshot.
  • Then click on Finish.
Step 4: Now find the Palette tab on the right side of your IDE.
Step 5: Under the File/Input directory you will find tFileInputDelimited. Drag it to the work area. Refer to the screenshot.

[Screenshot: adding the tFileInputDelimited component]

Step 6: Under the Processing directory you will find tMap. Drag it to the work area.
Step 7: Under the File/Output directory you will find tFileOutputDelimited. Drag it to the work area.

Now the work area will look like this:

[Screenshot: components in the work area]
Step 8: It is a good idea to rename the components in your work area so the job is easier to read. Click on the name of a component, pause for about a second, and click again (this is not a double-click), then type a meaningful name. Here I am renaming tFileInputDelimited_2 to StudentInput, tMap to StudentMap and tFileOutputDelimited_1 to StudentOutput. Refer to the screenshot.

[Screenshot: renamed components]

Note: You can see a yellow caution symbol on each component. You can hover over it to get more details. Let's leave that for later. In the editor you will also see two tabs, Designer and Code. You are currently on the Designer tab, and you can switch between the two. The Code tab shows the Java code that is generated in the background from your design. This Java code is read-only: you can't modify it, but it helps you debug your design and fix issues. You will pick this up gradually.

Step 9: Now click on the StudentInput component. Under the Component tab you can change the input file path and the other parameters if you want. Refer to the screenshot.

[Screenshot: StudentInput parameters]

Here '\n' is the row separator and ',' is the field separator. Header set to 1 means the file has one header row, which is skipped when the data is read. If the file has n header rows, enter n there. You can change any of these settings to match your input file format.
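If you are wondering what these three settings actually do, the following plain Java sketch may help. It is not the code Talend generates; DelimitedReader and parse are hypothetical names, and the sketch ignores quoting and escaping, which real CSV files often need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DelimitedReader {
    // Split the content into rows and fields, skipping the first headerRows rows.
    static List<String[]> parse(String content, String rowSeparator,
                                String fieldSeparator, int headerRows) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote makes the separators literal text instead of regex syntax.
        String[] lines = content.split(Pattern.quote(rowSeparator));
        for (int i = headerRows; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // ignore blank lines
            }
            // The -1 limit keeps empty fields at the end of a row.
            rows.add(lines[i].split(Pattern.quote(fieldSeparator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "Roll,Name,Age,Address\n1,Ajijul,20,Kolkata\n2,Anirban,19,Mumbai\n";
        for (String[] row : parse(content, "\n", ",", 1)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

With headerRows set to 1 the line Roll,Name,Age,Address is skipped and only the data rows are returned.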


Step 10: Now you have to map StudentInput to StudentOutput using StudentMap. Right-click on the StudentInput component. Click on Row in the menu, then click on Main in the submenu. A link with a plug icon will appear; drop it onto StudentMap.

[Screenshot: mapping StudentInput]


Then right-click on the StudentMap component and click on Row in the menu. Click on the *New output* (Main) submenu item; another link will appear. Drop it onto StudentOutput. Give the output a name, say result. Refer to the screenshot.

[Screenshot: StudentMap output]

 
Rename the input link for ease of use. After that it will look like this -

Step 11: Now click on the StudentMap component. Under the Component tab select Basic settings. Then click on the button next to Map Editor. This will open a window like this:

[Screenshot: StudentMap editor]


Step 12: Add columns in the schema editor for the input and for result using the '+' button. You can delete columns using the '–' button. Refer to the screenshot.

Step 13: Now you have to map the input columns to the output columns. If you have given them the same names, click on Auto map. This automatically maps input fields to output fields. Otherwise you can drag input fields onto output fields. Then click OK. Refer to the screenshot.

Step 14: Now click on StudentOutput. You can modify its path just as you did for StudentInput.

Now your job is ready. You can execute it.

Executing Job:

Step 1: Click on the Run Tab.

Step 2: Under the Basic Run menu click on the Run button. The job will run and your output file will be generated. Refer to the screenshot.

[Screenshot: running the Talend job]
Your output file, StudentOut.csv, should contain:

                                                      1,Ajijul,20,Kolkata,AjijulKolkata
                                                      2,Anirban,19,Mumbai,AnirbanMumbai
                                                      3,Ankita,20,Delhi,AnkitaDelhi
                                                      4,Monojit,22,Chennai,MonojitChennai
                                                      5,Saumajeet,21,Kolkata,SaumajeetKolkata
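
If you want to see in code what the job boils down to, here is a rough plain Java equivalent. It is only a sketch, not the code shown in the Code tab. The fifth column in the output above looks like Name and Address joined together; the steps in this post do not show how that column was defined, so the concatenation in mapRow is an assumption. StudentJobSketch, mapRow and run are hypothetical names, and the sketch works on in-memory lines instead of real files to stay short.

```java
import java.util.ArrayList;
import java.util.List;

class StudentJobSketch {
    // Plays the role of StudentMap: copy the four input columns and
    // append one extra column (ASSUMED to be Name followed by Address).
    static String mapRow(String inputRow) {
        String[] fields = inputRow.split(",", -1);
        String name = fields[1];
        String address = fields[3];
        return String.join(",", fields[0], name, fields[2], address, name + address);
    }

    // Plays the role of StudentInput -> StudentMap -> StudentOutput:
    // skip the header rows, map every remaining row, collect the result.
    static List<String> run(List<String> inputLines, int headerRows) {
        List<String> outputLines = new ArrayList<>();
        for (int i = headerRows; i < inputLines.size(); i++) {
            outputLines.add(mapRow(inputLines.get(i)));
        }
        return outputLines;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "Roll,Name,Age,Address",
                "1,Ajijul,20,Kolkata",
                "2,Anirban,19,Mumbai");
        run(input, 1).forEach(System.out::println);
    }
}
```

A file-based version would read the lines of Student.csv and write the mapped lines to StudentOut.csv, but the three roles stay the same.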

_______________________________________________________________________________________________

    Hope you enjoyed learning. Any feedback will be helpful.

Wednesday, February 4, 2015

Stack


Concept Of Stack

Introduction

   Let us learn the concept of the Stack data structure in Computer Science. After that we will discuss the implementation of a Stack in Java and the many problems that can be solved using it. Finally we will discuss interview questions on Stack that will help you whenever you need them.
    Let us take a scenario. You and some of your closest friends are hosting a party and you have invited all your friends to it. Each of the hosts has been given a task, such as welcoming guests or serving drinks and food.
    You are told to serve plates to everyone's table, and being an intelligent boy / girl you do it efficiently. Once you have served plates to all the guests, you have just performed Stack operations. Have you got the point? If yes, congratulations. If not, no problem, I will explain.
    Say you carry the plates in your hand and your hand can hold up to 10 plates at a time. You pick one plate from the top of the pile of 10 plates in your hand and place it in front of a guest, and so on. When you run out of plates you take another pile of 10 and keep serving the guests until everyone has one. Notice that the plate you placed on your hand first (i.e. the bottom plate of the pile) is picked last, and the plate you placed last is on top of the pile, so it is picked first. So if you call the pile a Stack, no one is going to object, because that is correct.

Stack is an important data structure in Computer Science.

Properties

     So we can summarize some of its properties from the above discussion.
1. It's a linear data structure.
2. We can add an element to the top of the stack. This operation is called PUSH.

    Stack A (initially empty and size 4)
    Now perform push operation on A :       

PUSH Operation Image


3. We can remove an element from the top of the stack. This operation is called POP.
    Stack A (contains 4 elements 1,2,3&4)

     Now perform POP operation on A

         POP Operation Image

4. The element that is PUSHED first is POPPED last, and the element that is PUSHED last is POPPED first. This property is called Last In First Out (LIFO).
5. In an array-based implementation you have to specify the size of the stack when you declare it. In that form it's a static data structure.
6. If you try to PUSH more elements than the stack can hold (i.e. more than its size), the situation that arises is called STACK OVERFLOW.

Stack Overflow Image
        
7. If you try to POP an element when the stack is empty, the situation that arises is called STACK UNDERFLOW.
   
Stack Underflow Image
                 
8. Stack is a recursive data structure. If it's not empty then it consists of a Stack Top and the rest, which is itself a Stack.
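
The properties above can be sketched in Java as a small fixed-size stack of integers. This is only a minimal illustration, not the full implementation promised in the introduction; the class name ArrayStack and the choice to report overflow and underflow with IllegalStateException are my own assumptions.

```java
class ArrayStack {
    private final int[] items; // fixed size, decided when the stack is created
    private int top = -1;      // index of the top element, -1 when the stack is empty

    ArrayStack(int size) {
        items = new int[size];
    }

    boolean isEmpty() {
        return top == -1;
    }

    boolean isFull() {
        return top == items.length - 1;
    }

    // PUSH: add an element on top of the stack.
    void push(int value) {
        if (isFull()) {
            throw new IllegalStateException("STACK OVERFLOW");
        }
        items[++top] = value;
    }

    // POP: remove and return the element on top of the stack.
    int pop() {
        if (isEmpty()) {
            throw new IllegalStateException("STACK UNDERFLOW");
        }
        return items[top--];
    }

    public static void main(String[] args) {
        ArrayStack a = new ArrayStack(4);
        for (int i = 1; i <= 4; i++) {
            a.push(i);
        }
        while (!a.isEmpty()) {
            System.out.println(a.pop()); // prints 4, 3, 2, 1 (LIFO)
        }
    }
}
```

Pushing a fifth element onto this stack of size 4 raises the overflow error, and popping from it when it is empty raises the underflow error.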

Applications

    You must know the applications of Stack in Computer Science. Some innovative solutions have been possible because of Stack's LIFO property.
1. It's used to validate mathematical expressions. A compiler's syntax check for matching braces is implemented using a stack.
2. In editors, undo operations are handled using a stack.
3. If you are solving a problem using a backtracking mechanism then you can use a stack.
4. Recursion uses a stack.
5. Space for parameters and local variables is created internally using a stack.
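
As a taste of the first application, here is a minimal sketch of a brace matching check. It uses java.util.ArrayDeque as the stack; BraceChecker and isBalanced are hypothetical names, and a real compiler does far more than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BraceChecker {
    // Returns true when every opening bracket is closed by the matching
    // bracket in the right order. All other characters are ignored.
    static boolean isBalanced(String text) {
        Deque<Character> open = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                open.push(c); // remember the most recent opening bracket
            } else if (c == ')' || c == ']' || c == '}') {
                if (open.isEmpty()) {
                    return false; // closing bracket with nothing to close
                }
                char last = open.pop();
                boolean matches = (last == '(' && c == ')')
                        || (last == '[' && c == ']')
                        || (last == '{' && c == '}');
                if (!matches) {
                    return false;
                }
            }
        }
        return open.isEmpty(); // anything left over was never closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("{a[b(c)]}")); // true
        System.out.println(isBalanced("(]"));        // false
    }
}
```

LIFO is exactly what is needed here: the bracket opened last must be the one closed first.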



       

Monday, February 2, 2015

Linked List

Basic Concept of Singly Linked List
    Let's start with a game. You have been given the name of a person who lives at a particular address, but you don't know the person. There are several other people on the way, each of whom knows only the address of the next person. So how will you reach the person whose name has been given to you (provided none of your helpers is a liar)? It's not a big deal. You start your journey and ask the first person for his name. If it matches the given name, you have found him; otherwise you ask for the address of the next person, and so on. Eventually you will reach the person, if the person really exists. If you got this far, you have just learned the basic concept of a linked list.
    Yes, you are right. A linked list is nothing but a sequence of nodes, each of which contains a value and the address of the next node. The final node contains a value and a NULL address that points to nothing.
    In the picture above the 1st node contains Data = A and Address of next node (i.e. the node that contains item B) = X
    2nd node contains Data = B and Address of next node (i.e. the node that contains item C) = Y
    3rd node contains Data = C and Address of next node (i.e. the node that contains item D) = Z
    4th node contains Data = D and Address = NULL, which points to nothing. It's the end of the linked list.
    So, this is one of the important data structures in computer science.
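
    The four nodes and the search game can be sketched in Java like this. It is a minimal illustration: ListNode and find are names I made up, and a Java reference plays the role of the address.

```java
class ListNode {
    final String data;
    final ListNode next; // reference to the next node; null plays the role of NULL

    ListNode(String data, ListNode next) {
        this.data = data;
        this.next = next;
    }

    // Walk from the head one node at a time, like asking each person on the way.
    // Returns the 0-based position of the target, or -1 if it does not exist.
    static int find(ListNode head, String target) {
        int position = 0;
        for (ListNode current = head; current != null; current = current.next) {
            if (current.data.equals(target)) {
                return position;
            }
            position++;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Build A -> B -> C -> D -> NULL, starting from the last node.
        ListNode d = new ListNode("D", null);
        ListNode c = new ListNode("C", d);
        ListNode b = new ListNode("B", c);
        ListNode a = new ListNode("A", b);
        System.out.println(find(a, "C")); // 2
        System.out.println(find(a, "E")); // -1
    }
}
```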
    Now what properties have we got so far from the description of the linked list?
    Please validate these points:
    1. It's linear (i.e. one dimensional).
    2. Each node contains data and the address of the next node, which has the same structure. The last node holds NULL instead of an address.
    3. We can't access any node directly. We have to pass through all the preceding nodes, i.e. sequential access is needed.
    Now, you need a place on the street for each person in our game. The same applies here: you need primary memory addresses to hold the data and address in each node. We can add or remove as many people as we want at random places on the street during the game to test your patience; we don't need to worry about specifying the number of people in advance. Likewise, at run time the linked list allocates memory for new nodes as they are created. So more features come from this discussion.
    4. A linked list is a dynamic data structure that allocates the memory it needs at run time.
    5. Inserting and deleting nodes is easy, because only the addresses in the neighbouring nodes have to be updated.
    6. Memory allocation is non-contiguous.
    Now what if I tell you to go back to your starting position after you have found the given person? Difficult! Yes, a little bit. So add another point based on the discussion so far.
    7. Traversing backward is difficult in a linked list.
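
    Points 4, 5 and 7 can be seen in a small sketch of a singly linked list of integers. SinglyLinkedList, addFirst and remove are hypothetical names chosen for this example, and it is only one of several possible ways to write such a list.

```java
class SinglyLinkedList {
    private static class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head; // null while the list is empty

    // Insertion: memory for the new node is allocated at run time,
    // and only one address (head) has to be updated.
    void addFirst(int value) {
        Node node = new Node(value);
        node.next = head;
        head = node;
    }

    // Deletion: find the node, then make the previous node skip over it.
    boolean remove(int value) {
        Node previous = null;
        for (Node current = head; current != null; previous = current, current = current.next) {
            if (current.value == value) {
                if (previous == null) {
                    head = current.next;
                } else {
                    previous.next = current.next;
                }
                return true;
            }
        }
        return false;
    }

    @Override
    public String toString() {
        StringBuilder text = new StringBuilder();
        for (Node current = head; current != null; current = current.next) {
            text.append(current.value).append(" -> ");
        }
        return text.append("NULL").toString();
    }

    public static void main(String[] args) {
        SinglyLinkedList list = new SinglyLinkedList();
        list.addFirst(1);
        list.addFirst(2);
        list.addFirst(3);
        System.out.println(list); // 3 -> 2 -> 1 -> NULL
        list.remove(2);
        System.out.println(list); // 3 -> 1 -> NULL
    }
}
```

    Notice that a node has no address of the node before it. That is why remove has to carry previous along while walking forward, and why going backward means starting again from the head.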