Analysis using Plotly, Seaborn and Folium
What Is Exploratory Data Analysis, and Why Use It?
Data analysts and data scientists use exploratory data analysis (EDA) to investigate datasets and summarize their key features, often with data visualization methods. It makes a dataset easier to understand and helps in preparing the data for further use.
EDA is used to see what data can reveal beyond formal modelling or hypothesis testing, and it provides a better understanding of the variables in a dataset and the relationships between them. Originally developed by the American mathematician John Tukey in the 1970s, EDA techniques are still widely used in the data discovery process today. EDA can help us deliver better business results by improving our existing knowledge and by surfacing new insights that we might not have been aware of.
Tools Used
opendatasets (Jovian library to download a Kaggle dataset)
Data cleaning:
Pandas
NumPy
Data Visualization
Matplotlib
Seaborn
Plotly
Heatmap
About the Project
In this project, we analyse global cargo data. The selected dataset covers import and export volumes for 5,000 commodities across most countries on Earth over the last 30 years. Personally, I find commodities quite interesting because they help to analyze not only the income of a country but also the international behavior and relations of countries.
Steps followed
Step 1: Selecting a real-world dataset:
- We will download our dataset from Kaggle using the opendatasets library created by Jovian, which imports datasets directly from the Kaggle website:
  import opendatasets as od
  dataset = 'https://www.kaggle.com/unitednations/global-commodity-trade-statistics'
  od.download(dataset)
Step 2: Performing data preparation & cleaning
- We will load the dataset into a DataFrame using Pandas, explore the different columns and ranges of values, handle missing values and incorrect data types, and make our data ready to use for our analysis.
Step 3: Performing exploratory analysis and visualization, and asking interesting questions
- We will compute the mean, sum, range, and other interesting statistics for numeric columns, explore the distributions of numeric columns using histograms etc., make a note of interesting insights from the exploratory analysis, ask interesting questions about our dataset, and look for their answers by visualizing our data.
Step 4: Summarizing inferences & writing a conclusion
- We will write a summary of what we've learnt from our analysis, share ideas for future work on this data, and share links to resources we found useful during our analysis.
How to Run the Code
Option 1: Running using free online resources (1-click, recommended)
The easiest way to start executing the code is to click the Run button at the top of this page and select “Run on Colab”. You can also select “Run on Binder” or “Run on Kaggle”, but you'll need to create an account on Google Colab or Kaggle to use these platforms. Of these options, Colab provides the most memory, which this project needs in order to run.
Option 2: Running on your computer locally
To run the code on your computer locally, you’ll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.
Jupyter Notebooks: This is a Jupyter notebook — a document made of cells. Each cell can contain code written in Python or explanations in plain English. You can execute code cells and view the results, e.g., numbers, messages, graphs, tables, files, etc., instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don’t be afraid to mess around with the code & break things — you’ll learn a lot by encountering and fixing errors. You can use the “Kernel > Restart & Clear Output” menu option to clear all outputs and start again from the top.
Downloading the Dataset
Step 1: We will download the dataset from Kaggle (https://www.kaggle.com/) using the opendatasets library created by Jovian. So, let's begin by downloading the data and listing the files within the dataset: https://www.kaggle.com/unitednations/global-commodity-trade-statistics
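Listing the downloaded files could look roughly like the sketch below. The folder name is an assumption (opendatasets usually saves a dataset into a directory named after the last part of its URL), and list_dataset_files is a helper name made up for this illustration, so adjust both to your setup.

```python
import os

def list_dataset_files(data_dir):
    """Return the sorted file names found in the downloaded dataset folder."""
    return sorted(os.listdir(data_dir))

# Assumed download location (hypothetical path; change it to match your setup).
data_dir = "./global-commodity-trade-statistics"
if os.path.isdir(data_dir):
    print(list_dataset_files(data_dir))
```

The isdir check simply keeps the cell from failing when the download has not been run yet.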
Data preparation and Cleaning
Data cleaning is the process of making sure that the data we use for our analysis is completely ready: it has no duplicates or missing values, it is in the right format, and it is not corrupted.
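As a minimal sketch of what such checks can look like in Pandas, here is a tiny made-up sample that borrows the dataset's column names. The values and the clean-up policy (drop exact duplicates, then drop rows with a missing weight) are assumptions for illustration, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample using the dataset's column names (values are illustrative).
df = pd.DataFrame({
    "country_or_area": ["India", "India", "China", "Japan"],
    "year": [2000, 2000, 2000, 2000],
    "flow": ["Export", "Export", "Import", "Import"],
    "trade_usd": [100.0, 100.0, 250.0, 80.0],
    "weight_kg": [10.0, 10.0, np.nan, 5.0],
})

print(df.duplicated().sum())  # number of fully duplicated rows
print(df.isna().sum())        # missing values per column

# One possible policy: drop exact duplicates, then drop rows missing a weight.
clean = df.drop_duplicates().dropna(subset=["weight_kg"]).reset_index(drop=True)
print(clean)
```

Whether to drop or fill missing values depends on the question being asked; dropping is just the simplest choice to show.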
As we can see, we have 10 lakh (1 million) rows and 10 columns to work with. Of course, we cannot work with all of this as it is, so let us start cleaning the data, i.e. selecting the required data and putting it in a form suited to our analysis. Before that, we shall look at a sample of the data and the data types, and use the describe method to see summary statistics of the dataset.
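These first-look steps can be sketched as follows. The small DataFrame is made up for illustration and only borrows a few of the dataset's column names.

```python
import pandas as pd

# Made-up rows standing in for the real trade data.
df = pd.DataFrame({
    "country_or_area": ["India", "China", "Japan"],
    "year": [2000, 2000, 2015],
    "trade_usd": [1.5e6, 3.2e6, 2.1e6],
    "weight_kg": [1200.0, 5400.0, 800.0],
})

print(df.sample(2, random_state=0))  # a random sample of rows
print(df.dtypes)                     # the data type of each column
summary = df.describe()              # count, mean, std, min, quartiles, max
print(summary)
```

By default, describe only summarizes the numeric columns, which is why country_or_area does not appear in its output.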
As we can see, the float64 and int64 data types are used by default, though the values are not that large and could be stored in 32-bit data types as well.
We can convert the 64-bit data types to 32-bit ones to reduce the memory the dataset occupies and to speed up processing.
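A rough sketch of the conversion, using an in-memory CSV in place of the real file. The column selection and the sample values are assumptions for illustration; the two options shown are converting after loading and asking read_csv for the smaller types up front.

```python
import io
import pandas as pd

# Stand-in for the real CSV file (made-up values).
csv_text = """country_or_area,year,trade_usd,quantity
India,2000,1500000.5,120
China,2000,3200000.0,540
Japan,2015,2100000.25,80
"""

# Default read: numeric columns come back as 64-bit types.
df64 = pd.read_csv(io.StringIO(csv_text))

# Option A: convert after loading.
dtypes = {"year": "int32", "trade_usd": "float32", "quantity": "int32"}
df32 = df64.astype(dtypes)

# Option B: request the smaller types while reading, and cap the row count.
df_read = pd.read_csv(io.StringIO(csv_text), dtype=dtypes, nrows=1_000_000)

numeric = ["year", "trade_usd", "quantity"]
print(df64[numeric].memory_usage(index=False).sum())  # bytes with 64-bit types
print(df32[numeric].memory_usage(index=False).sum())  # bytes with 32-bit types
```

One caveat worth checking on the real data: float32 keeps only about seven significant digits and int32 tops out at roughly 2.1 billion, so very large trade values may be rounded or may not fit.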
Now we have finally read 10 lakh rows from the ‘trade_data_csv’ with our selected data types. But we might still have some duplicated and missing values. We have handled these in the embedded notebook, which you can view by clicking on “View File”. Let's jump directly to the Analysis and Visualization section. Before that, let's describe every column in our dataset.
Description of the columns
country_or_area: country name of the record
year: year in which the trade took place
comm_code: the Harmonized System (HS) commodity code referred to
commodity: description of a particular commodity code
flow: flow of trade, i.e. export, import, others
trade_usd: value of the trade in USD
weight_kg: weight of the commodity in kilograms
quantity_name: description of the quantity measurement type given the type of item (e.g. number of items, weight, etc.)
quantity: count of the quantity of a given item based on the quantity name
category: category to identify the commodity
Exploratory Analysis and Visualization
Here, let us try to understand the dataset better through visualization, which will also help us answer some interesting questions and deliver meaningful insights into the data.
Let us check whether there is any correlation between the numerical columns.
As the plot shows, there is no correlation between the numerical columns in our dataset.
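A correlation plot of this kind could be produced along the lines of the sketch below. The helper names and the sample values are made up for illustration, and the Seaborn heatmap is one reasonable way to draw it rather than necessarily the exact plot used in the notebook.

```python
import pandas as pd

def numeric_correlation(df):
    """Pearson correlation matrix of the numeric columns only."""
    return df.select_dtypes(include="number").corr()

def plot_correlation(corr):
    """Draw the matrix as an annotated heatmap (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation between numeric columns")
    plt.show()

# Made-up sample rows using the dataset's column names.
df = pd.DataFrame({
    "country_or_area": ["India", "China", "Japan", "India"],
    "trade_usd": [10.0, 20.0, 30.0, 40.0],
    "weight_kg": [1.0, 2.0, 3.0, 4.0],
    "quantity": [4.0, 1.0, 3.0, 2.0],
})
corr = numeric_correlation(df)
print(corr)
# plot_correlation(corr)  # uncomment to render the heatmap
```

Values close to 0 in the matrix indicate little linear relationship, while values near 1 or -1 indicate a strong one.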
Let us see how much data has been gathered over the years, so that we get an idea of the timeline and the quantity of data.
It looks like the dataset holds more records for 2000 to 2015. Probably the internet revolution made relevant data easier to obtain. The drop at the end may be due to incomplete or manual data entry and may have nothing to do with actual cargo transportation.
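Counting the records per year takes only a couple of lines. This is a sketch on made-up data, and records_per_year is a name chosen for this illustration.

```python
import pandas as pd

def records_per_year(df):
    """Count how many rows the dataset holds for each year, in year order."""
    return df["year"].value_counts().sort_index()

# Made-up sample: one record in 1990, two in 2000, three in 2015.
df = pd.DataFrame({"year": [1990, 2000, 2000, 2015, 2015, 2015]})
counts = records_per_year(df)
print(counts)
# counts.plot(kind="bar")  # needs matplotlib; shows the volume of records over time
```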
Let's create a Choropleth map to better visualize the relative responses from various countries.
With the above map, we can clearly see how well each country is represented in the global cargo data. It also suggests that the USA handles its data collection very well compared to other countries.
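A map like this could be built roughly as follows. This sketch assumes Plotly Express and matching on country names; the notebook may well have used a different library or settings. The helper names and sample rows are made up for illustration.

```python
import pandas as pd

def records_per_country(df):
    """Number of rows per country, as a two-column DataFrame."""
    counts = df["country_or_area"].value_counts()
    return counts.rename_axis("country").reset_index(name="records")

def plot_choropleth(country_counts):
    """Shade each country by its record count (needs plotly)."""
    import plotly.express as px
    fig = px.choropleth(
        country_counts,
        locations="country",
        locationmode="country names",  # match rows to the map by country name
        color="records",
        title="Number of records per country",
    )
    fig.show()

# Made-up sample rows.
df = pd.DataFrame({"country_or_area": ["India", "India", "China", "Japan"]})
country_counts = records_per_country(df)
print(country_counts)
# plot_choropleth(country_counts)  # uncomment to render the map
```

Names that Plotly does not recognize are simply left unshaded, so some entries in country_or_area may need renaming before they show up on the map.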
So, with all these insights about the data, let us move on to the question-and-answer section.
Asking and Answering Questions
1. What do India's imports and exports look like according to this data?
So, we can say that over the last 30 years India's imports and exports have increased significantly. Although they dipped in 2009, possibly due to the global market crash, they recovered quickly. Another important thing to notice is that India's export value has been increasing year by year.
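The yearly totals behind such a chart can be sketched like this. The flow labels "Import" and "Export" and the sample values are assumptions for illustration, and yearly_trade_by_flow is a made-up helper name.

```python
import pandas as pd

def yearly_trade_by_flow(df, country):
    """Total trade_usd per year for one country, with one column per flow."""
    rows = df[df["country_or_area"] == country]
    return rows.pivot_table(index="year", columns="flow",
                            values="trade_usd", aggfunc="sum", fill_value=0)

# Made-up sample rows.
df = pd.DataFrame({
    "country_or_area": ["India", "India", "India", "India", "China"],
    "year": [2000, 2000, 2015, 2015, 2000],
    "flow": ["Import", "Import", "Import", "Export", "Export"],
    "trade_usd": [10.0, 20.0, 50.0, 40.0, 99.0],
})
india = yearly_trade_by_flow(df, "India")
print(india)
# india.plot()  # needs matplotlib; one line per flow over the years
```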
2. How does the value of India's trade compare with total world trade over these years, according to the data available now?
Looking at the above graph, we can say that even though India's presence in global cargo trade is increasing, it is still very small compared to the entire world.
3. What is India's share of world trade, in percent?
It appears that India's share of the market was increasing from around 1990 to 1995 and then dropped suddenly. The reason could be the Indo-Pak war. India then tried to recover but faced another setback from the 2009 recession, after which it recovered quickly.
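The percentage here is simply India's yearly trade value divided by the world's yearly total, times 100. A minimal sketch on made-up numbers follows; country_share_percent is a name chosen for this illustration.

```python
import pandas as pd

def country_share_percent(df, country):
    """Yearly share (in %) of one country's trade_usd in the total of all rows."""
    world = df.groupby("year")["trade_usd"].sum()
    one = df[df["country_or_area"] == country].groupby("year")["trade_usd"].sum()
    # Years in which the country has no rows come out as NaN; treat them as 0%.
    return (one / world * 100).fillna(0)

# Made-up sample rows.
df = pd.DataFrame({
    "country_or_area": ["India", "China", "India", "China", "Japan"],
    "year": [2000, 2000, 2015, 2015, 2015],
    "trade_usd": [10.0, 90.0, 20.0, 60.0, 20.0],
})
share = country_share_percent(df, "India")
print(share)
```

If the data contains aggregate rows (for example regional groups or an all-commodities total), they would need to be excluded first, otherwise the world total is counted more than once.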
4. Which are the top 15 countries in trading when we consider volume?
As you can see, these are the top 15 countries in trading when we use volume as the criterion.
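A ranking like this can be sketched as below. It assumes that "volume" means the summed trade_usd column, which is a guess (weight_kg would be another candidate), and top_countries is a made-up helper name.

```python
import pandas as pd

def top_countries(df, n=15):
    """The n countries with the largest total trade_usd, largest first."""
    return df.groupby("country_or_area")["trade_usd"].sum().nlargest(n)

# Made-up sample rows.
df = pd.DataFrame({
    "country_or_area": ["India", "India", "China", "Japan"],
    "trade_usd": [10.0, 20.0, 100.0, 50.0],
})
top = top_countries(df, n=2)
print(top)
```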
5. What is the position of India in trading in 2000?
As you can see, India's share of global trading in 2000 is only about 1.01% of the global market. Europe, China, and Japan have the biggest shares.
6. What is the position of India in trading in 2015?
As we can see, India's share of global cargo trade is 1.56% as of 2015. But it seems that some major changes have happened. Let us compare the last two plots to get some interesting insights.
India's trading volume was only 1.01% of total world trade volume in 2000 and increased to 1.56% by 2015. One eye-catching detail is that China's share more than doubled over the same period, from 3.65% in 2000 to 7.9% in 2015, whereas Japan's share fell from 4.36% to 2.79%.
7. What are the top 10 commodities in Indian import trade (USD), 2000 vs 2014?
As the graph shows, from 2000 to 2015 India's top import commodities are concentrated on different oils such as palm oil, crude oil, sunflower oil, etc.
8. What are the top 10 commodities in Indian export trade (USD), 2000 vs 2016?
It seems that, from 2000 to 2016, India's top export commodities are concentrated on raw coffee and tea, groundnuts, cashews and other spice items.
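Both commodity rankings follow the same pattern: filter by country, flow and year, then sum trade_usd per commodity. A sketch on made-up rows is shown below; the flow labels and the helper name top_commodities are assumptions for illustration.

```python
import pandas as pd

def top_commodities(df, country, flow, year, n=10):
    """The n commodities with the largest trade_usd for one country, flow and year."""
    mask = (
        (df["country_or_area"] == country)
        & (df["flow"] == flow)
        & (df["year"] == year)
    )
    return df[mask].groupby("commodity")["trade_usd"].sum().nlargest(n)

# Made-up sample rows.
df = pd.DataFrame({
    "country_or_area": ["India"] * 5,
    "year": [2000, 2000, 2000, 2000, 2015],
    "flow": ["Import", "Import", "Import", "Export", "Import"],
    "commodity": ["Palm oil", "Palm oil", "Crude oil", "Tea", "Sunflower oil"],
    "trade_usd": [30.0, 20.0, 40.0, 99.0, 70.0],
})
imports_2000 = top_commodities(df, "India", "Import", 2000, n=2)
print(imports_2000)
```

Calling the function once per year (and once per flow) gives the pairs of rankings that the side-by-side charts compare.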
Inference and conclusion
Here is the conclusion that we could draw about the Global Cargo Market from our Analysis:
1. We learned about India's share of the global market and how that share has risen.
2. Judging by the value of trade, India is far behind other countries in global cargo trading. In other words, India still has a long way to go if it wants to achieve dominance in the global cargo market.
3. Despite facing many difficulties, India recovered quickly, and it is growing at an excellent rate in global cargo trading.
4. India's trading volume was only 1.01% of total world trade volume in 2000 and increased to 1.56% by 2015. One eye-catching detail is that China's share more than doubled over the same period, from 3.65% in 2000 to 7.9%.
5. From 2000 to 2015, India's top import commodities are concentrated on different oils such as palm oil, crude oil, etc., and from 2000 to 2016 its top export commodities are concentrated on spice items.
6. Even with the increase in its exports, India must grow faster than countries like China in order to catch up in the world cargo trading market, and it should use the opportunities available.
Future Work
In future, I would like to improve this project further by taking the following actions on this dataset:
- Analysing more and different columns of the dataset to derive further results.
- Asking more questions about specific commodities.
- Visualizing answers to some more questions.
- Using the volume of a specific commodity traded by a specific country to identify the major global distributors of that commodity.
Coffee:
Meanwhile, if you find this blog insightful, you can buy me a coffee!! 🤗🤗 at : https://www.buymeacoffee.com/hebbaraditya
References
[1] Aakash N S. Analyzing Tabular Data with Pandas. https://jovian.ai/aakashns/python-pandas-data-analysis
[2] Matplotlib Documentation. https://matplotlib.org
[3] Stack Overflow. https://stackoverflow.com
[4] Folium Documentation. http://python-visualization.github.io/folium/
[5] Aakash N S. Data Visualization using Python Matplotlib and Seaborn. https://jovian.ai/aakashns/python-matplotlib-data-visualization
[6] Aakash N S. Advanced Data Analysis Techniques with Python & Pandas. https://jovian.ai/aakashns/advanced-data-analysis-pandas
[7] Aakash N S. Interactive Visualization with Plotly, 2021. https://jovian.ai/aakashns/interactive-visualization-plotly
[8] Aakash N S. plotly-line-chart, 2021. https://jovian.ai/aakashns/plotly-line-chart
[9] Plotly Documentation. https://plotly.com/python/