Data wrangling is the process of identifying, structuring, cleaning, and transforming raw data into a usable format.
What are the steps in the data wrangling process?
a. Data identification (discovery): Based on the project requirements, the relevant raw data is identified within the larger pool of available data and separated out for further processing.
b. Structuring data: The identified data may still be in a raw format, so it needs to be organized into a readable structure, for example by splitting fields on a delimiter to produce a CSV file or by converting it to Excel format for easier processing.
c. Cleaning: Structured data often contains anomalies or irrelevant values, which need to be fixed before the data is processed.
Data that is not required should be removed from the dataset to reduce complexity and processing time during analysis.
d. Data validation: The data needs to be verified and validated to ensure that it meets the business objective and contains all the details required for processing.
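
The steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed method: the sample data, the pipe delimiter, the column names, and the helper functions structure, clean, and validate are all hypothetical, and the REQUIRED tuple stands in for whatever the identification step decides is relevant to the project.

```python
import csv
import io

# Hypothetical raw export: pipe-delimited, with stray whitespace, a missing
# amount, a duplicate row, and a column ("internal_note") that is not needed.
RAW = """order_id|customer|amount|internal_note
1001| Alice |25.50|checked
1002|Bob||pending
1003|Carol|40.00|checked
1003|Carol|40.00|checked
"""

# Assumed outcome of the identification step: the fields the project needs.
REQUIRED = ("order_id", "customer", "amount")


def structure(raw_text, delimiter="|"):
    """Structuring: parse delimited text into a list of dict rows."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    return [dict(row) for row in reader]


def clean(rows, keep=REQUIRED):
    """Cleaning: keep only needed columns, trim whitespace, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        slim = tuple((row.get(col) or "").strip() for col in keep)
        if slim in seen:
            continue  # exact duplicate of a row already kept
        seen.add(slim)
        cleaned.append(dict(zip(keep, slim)))
    return cleaned


def validate(rows, required=REQUIRED):
    """Validation: separate complete rows from rows missing a required value."""
    valid, rejected = [], []
    for row in rows:
        if all(row[col] for col in required):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


rows = structure(RAW)
cleaned = clean(rows)
valid, rejected = validate(cleaned)
print(len(rows), len(cleaned), len(valid), len(rejected))
```

Keeping each step in its own function makes it possible to inspect the data between stages, for instance to review the rejected rows before deciding whether to repair or discard them.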