Process of Data Extraction in Data Science






Data Extraction is the process of turning data into a simpler form that is more efficient to work with. It involves two steps:

1 Data Pre-Processing: In the first step, a large data set is reduced to a basic form that contains one observation per row and one variable per column. A data set in this form is easy to understand and efficient to manipulate.

2 Data Manipulation: In the next step, the pre-processed data set is modified into a simpler and more efficient form. The manipulated data is then ready for data visualization. This step also involves modifying the data in terms of the available set of variables.
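The two steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the sample data, the column names (city, year, sales) and the function names preprocess and manipulate are all made up for the example and do not come from any particular tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: one row per city, one column per year (wide form).
RAW_CSV = """city,2021,2022
Pune,10,14
Delhi,20,26
"""


def preprocess(raw_text):
    """Step 1: reshape to one observation per row, one variable per column."""
    tidy = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        city = row.pop("city")
        # Every remaining column is a year; each one becomes its own row.
        for year, value in row.items():
            tidy.append({"city": city, "year": int(year), "sales": int(value)})
    return tidy


def manipulate(tidy_rows):
    """Step 2: summarise the tidy rows, here as total sales per city."""
    totals = defaultdict(int)
    for obs in tidy_rows:
        totals[obs["city"]] += obs["sales"]
    return dict(totals)


tidy_rows = preprocess(RAW_CSV)
totals_by_city = manipulate(tidy_rows)
print(tidy_rows)
print(totals_by_city)
```

After pre-processing, each row holds exactly one city, one year and one sales figure, so the summary in the second step reduces to a simple loop. The per-city totals are the kind of result that could then be passed to a plotting library.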

Before Data Extraction, the raw data is collected from a source. Data obtained this way is usually not yet in a state fit for pre-processing or analysis. Such data is referred to as source data.

In Data Science, Data Extraction deals with two basic types of data sets:

1 Structured Data Set: In this case the entire process of data extraction is generally performed within the source system.

2 Unstructured Data Set: Here a large part of the job is preparing the data, putting it into a particular form so that it can then be extracted easily.
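The difference between the two cases can be illustrated with a small Python sketch. It assumes an in-memory SQLite database as a stand-in for a structured source system and a short block of free text as the unstructured source. The table name orders, the sample text and the regular expression are hypothetical and chosen only for the example.

```python
import re
import sqlite3

# Structured source: the extraction is a query that runs inside the
# source system itself (here an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (3, 4.25)],
)
large_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 5 ORDER BY id"
).fetchall()
conn.close()

# Unstructured source: free text has to be given a shape first.
# Both the text and the pattern are made up for illustration.
LOG_TEXT = """order 1 paid 9.5 USD
note: customer called twice
order 2 paid 20.0 USD
"""
ORDER_PATTERN = re.compile(r"order (\d+) paid ([\d.]+) USD")


def extract_orders(text):
    """Pull (id, amount) pairs out of free text, skipping unrelated lines."""
    return [
        (int(m.group(1)), float(m.group(2)))
        for m in ORDER_PATTERN.finditer(text)
    ]


parsed_orders = extract_orders(LOG_TEXT)
print(large_orders)
print(parsed_orders)
```

In the structured case the source system does the filtering and returns rows that are already in tabular form. In the unstructured case most of the effort goes into describing the shape of the data (the pattern) before anything can be extracted.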
