Exploratory data analysis (EDA) is one of the first and most important steps in the data analysis workflow. In the early stages, a data set is examined for its distribution, outliers and anomalies.
Data is typically summarized, plotted and transformed while making as few assumptions as possible about its underlying structure. Most EDA techniques are graphical, and languages such as R and Python offer mature plotting libraries that make complex data sets easier to visualize and act on.
The objectives of EDA can be summarized as follows:
Maximize insight into the data structure
Visualize potential relationships between response and predictor variables
Detect outliers and anomalies
Develop statistical models
Extract and create relevant variables
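
Several of the objectives above can be illustrated with a minimal sketch in Python that uses only the standard library. The data values and the names `predictor`, `response`, `summarize`, `iqr_outliers` and `pearson` are hypothetical and chosen purely for illustration. The 1.5 × IQR cutoff (Tukey's rule) is one common convention for flagging outliers, not the only one. In practice a library such as pandas or matplotlib would usually take the place of these hand-written helpers.

```python
import math
import statistics

# Hypothetical sample: one predictor, one response with a suspicious last value.
predictor = [1, 2, 3, 4, 5, 6, 7, 8]
response = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 40.0]


def summarize(values):
    """Return a five-number summary plus the mean (insight into structure)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Derived variable (feature creation): response per unit of predictor.
ratio = [y / x for x, y in zip(predictor, response)]

print(summarize(response))
print(iqr_outliers(response))
print(pearson(predictor, response))
```

Comparing the correlation with and without the flagged point is a quick way to see how much a single outlier can influence an apparent relationship.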