Wednesday, April 15, 2020

Coronavirus Data Analysis with Python the Easy Way




Hi there, I hope you are all safe and doing something productive during these quarantine days.
Today I am back with one of the most interesting topics of 2020: data analysis.
We'll analyze coronavirus cases around the world.


Important Note                                                                          


In this tutorial I am using Jupyter Notebook on my laptop. If your laptop isn't powerful enough, or you don't know how to set up Jupyter Notebook, no worries, I have already taken care of you: just visit the link below and a Jupyter notebook hosted on Google Cloud will open in front of you.

I'll leave exploring the different features of this notebook to you; tutorials on YouTube can speed up the learning process.


Important packages to install                                      


OK guys, let's get back to work. The Python packages below are essential for this tutorial. If you are using Jupyter Notebook offline, install all of the packages listed below; if you are using it online from the link above, just install packages 2 and 3. How? Just copy and paste the lines below into the shell one by one...

1. pip install pandas
2. pip install geopandas
3. pip install descartes
4. pip install matplotlib
5. pip install requests
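If you want to confirm that the installs worked before continuing, one possible check is sketched below. It uses only the standard library, and the helper name missing_packages is made up for this example.

```python
from importlib.util import find_spec

# Import names of the packages this tutorial relies on.
REQUIRED = ["pandas", "geopandas", "descartes", "matplotlib", "requests"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All packages are available.")
```

Anything it lists can be installed with the matching pip line above.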

Done with this part, let's move ahead...



Initialize packages                                                           


OK, now let's import all the packages we installed. Just type...

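import pandas as pd                # tables (data frames)
import geopandas as god            # geographic data frames ("gpd" is the usual alias; "god" is kept because the code below uses it)
import descartes                   # needed by older geopandas versions to draw polygons
import matplotlib.pyplot as plt    # plotting
import requests                    # HTTP requests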

If any error occurs, please let me know in the comment section with a proper screenshot, or ask on Stack Overflow.

If there are no errors, great job. Let's begin the real game...



Initializing the variables                                                  


Now, let's fetch the COVID-19 data first.
To do that, we'll initialize three variables, namely confirmed_cases_url, recovered_cases_url and death_cases_url, pointing to the fresh, regularly updated CSV files in the Johns Hopkins University GitHub repository...

confirmed_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

recovered_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"

death_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"


Reading with pandas                                                        


Now we will read the CSV files at those URLs with the pd.read_csv() function.
Here we create three data frames, namely df_confirmed, df_recovered and df_deaths...

df_confirmed = pd.read_csv(confirmed_cases_url)           
df_recovered = pd.read_csv(recovered_cases_url)             
df_deaths = pd.read_csv(death_cases_url)                          
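If you are offline, or just want to see what read_csv does with this layout, here is a small sketch that reads a CSV from a string instead of a URL. The rows are made up (they are not real figures) and the names sample_csv and sample_df exist only in this example.

```python
import io

import pandas as pd

# Made-up rows in the same layout as the Johns Hopkins files (not real figures).
sample_csv = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Alpha,10.0,20.0,0,1
North,Beta,-5.0,30.0,2,3
"""

# read_csv accepts a URL, a file path or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(sample_csv))
print(sample_df)
```

Notice that the empty Province/State field comes back as NaN, which is why you will see NaN in that column of the real data too.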


Viewing the data                                                               


After doing this much work, I am pretty sure you are interested in seeing how the data is stored inside these data frames, so let's check it out.

In pandas, we have three ways to see the data:

1. head() →  to see the first 5 rows of the table in the data frame we write...

 df_confirmed.head()                                                      


output:

coronavirus data


2. tail() →  similarly, to see the last five rows we write...

df_confirmed.tail()                                                          

output:
corona virus data


3. name →  to see the entire data frame, just write its name...

df_confirmed                                                                   

output:
coronavirus data


Task: Try the above three things with df_recovered and df_deaths.
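Both head() and tail() also take the number of rows you want. A tiny sketch on a made-up data frame (demo is a hypothetical name, and the numbers are not real cases):

```python
import pandas as pd

# A small made-up table with ten rows.
demo = pd.DataFrame({
    "day": list(range(1, 11)),
    "cases": [0, 1, 1, 2, 3, 5, 8, 13, 21, 34],
})

first_three = demo.head(3)   # first 3 rows instead of the default 5
last_two = demo.tail(2)      # last 2 rows instead of the default 5

print(first_three)
print(last_two)
```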


Additional stuff...                                                                 


OK, let's explore some more cool pandas features commonly used in data analysis.

1. Suppose someone asks you for the dimensions of a particular data frame, which is nothing but the number of rows and columns. We use the shape attribute for that (it is an attribute, not a function, so there are no parentheses), and we can easily find it by writing...

df_confirmed.shape                                                       

output:
            (264, 88)                                                              

2. If you are just interested in finding out which columns (headings) a data frame has, just write...

df_confirmed.columns                                                          

output:
covd19 data
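These two can be combined, for example to count the rows and to pick out the date columns. A sketch on a made-up frame with the same layout (wide, id_columns and date_columns are names chosen for this example):

```python
import pandas as pd

# Made-up frame in the same layout as df_confirmed (not real figures).
wide = pd.DataFrame({
    "Province/State": [None, "North"],
    "Country/Region": ["Alpha", "Beta"],
    "Lat": [10.0, -5.0],
    "Long": [20.0, 30.0],
    "1/22/20": [0, 2],
    "1/23/20": [1, 3],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = wide.shape

# Everything that is not one of the four descriptive columns is a date.
id_columns = ["Province/State", "Country/Region", "Lat", "Long"]
date_columns = [col for col in wide.columns if col not in id_columns]

print(n_rows, n_cols)
print(date_columns)
```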


Melting the columns                                                            


Before we move ahead, let's first see what melting actually is. When an ice cube melts, the water drops fall vertically to the ground. In a similar fashion, when we take values that are spread across many columns of a table and stack them vertically as rows, we call it melting. Clear now?

OK, now let's see how our table looks by typing...

df_confirmed                                                                   

output:



In the above image, you can clearly see that we have four main columns in green (Province/State, Country/Region, Lat and Long), and the red ones are just dates. Now the question is: can't we make a fifth column that contains all these dates, so that data visualization becomes much easier? The answer is yes. Just type...

confirm_df = df_confirmed.melt(id_vars=['Province/State','Country/Region','Lat','Long'])            

output:
covid cases

Whoa! We just did it. Now we have all our dates in a single column, with the corresponding values next to them...
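To see exactly what melt does, here is the same call on a tiny made-up table with two locations and two dates (the numbers are not real figures, and wide and melted are names used only here):

```python
import pandas as pd

# Two made-up locations, two date columns.
wide = pd.DataFrame({
    "Country/Region": ["Alpha", "Beta"],
    "Lat": [10.0, -5.0],
    "Long": [20.0, 30.0],
    "1/22/20": [0, 2],
    "1/23/20": [1, 3],
})

# id_vars stay as they are; every other column is stacked into rows.
melted = wide.melt(id_vars=["Country/Region", "Lat", "Long"])
print(melted)
```

Two rows and two date columns become four rows: one per location per date. The old column headings land in a column called variable and the numbers in a column called value.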


Renaming                                                                            


Now we have successfully melted our data, but the names of the fifth and sixth columns are not looking so good to me. Let's change variable to Date and value to Confirmed.
We can do it by just typing...

confirm_df.rename(columns={'variable':'Date','value':'Confirmed'},inplace=True)
confirm_df.tail()

output:
corona19 cases

Take a glance at the columns: we have just changed variable to Date and value to Confirmed.
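As an optional variation, melt can name the two new columns itself through its var_name and value_name parameters, so no separate rename is needed. The Date column still holds text such as 1/22/20, so the sketch below also converts it to real dates. The data is made up and the date format is an assumption based on how the column headings look.

```python
import pandas as pd

# Made-up wide table (not real figures).
wide = pd.DataFrame({
    "Country/Region": ["Alpha", "Beta"],
    "1/22/20": [0, 2],
    "1/23/20": [1, 3],
})

# Melt and name the new columns in one step.
long_df = wide.melt(
    id_vars=["Country/Region"],
    var_name="Date",
    value_name="Confirmed",
)

# Turn the month/day/two-digit-year strings into real dates (assumed format).
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
print(long_df)
```

With real dates you can sort by time or pick the latest day with ordinary comparisons.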


Joining                                                                               


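Instead of calling the confirmed, deaths and recovered data frames again and again, wouldn't it be good if we could club them together in a single data frame and call that to see the entire data? Yes, it is possible.

In the code below we join all three data frames together in a final_df variable. This assumes you have already built recovered_df and deaths_df from df_recovered and df_deaths in the same way as confirm_df (melt, then rename value to Recovered and Deaths respectively). Keep in mind that join matches rows by index, so the three frames need to have the same rows in the same order. To do this just type...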

final_df = confirm_df.join(recovered_df["Recovered"]).join(deaths_df["Deaths"])               

output:
corona pandemic data

See the columns: we have successfully combined all three CSV files.
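If the frames do not line up row for row, one alternative (a sketch, not the method used above) is to merge on the columns that identify a row instead of relying on position. The data below is made up, and the recovered table deliberately has its rows in a different order with one row missing.

```python
import pandas as pd

# Made-up long-format tables (not real figures).
confirmed = pd.DataFrame({
    "Country/Region": ["Alpha", "Beta", "Alpha", "Beta"],
    "Date": ["1/22/20", "1/22/20", "1/23/20", "1/23/20"],
    "Confirmed": [0, 2, 1, 3],
})
recovered = pd.DataFrame({
    "Country/Region": ["Beta", "Alpha", "Beta"],
    "Date": ["1/22/20", "1/22/20", "1/23/20"],
    "Recovered": [1, 0, 2],
})

# Match rows by location and date; how="left" keeps every confirmed row.
keys = ["Country/Region", "Date"]
merged = confirmed.merge(recovered, on=keys, how="left")
print(merged)
```

Rows without a match (Alpha on 1/23/20 here) get NaN in the Recovered column instead of silently picking up another location's number.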



Plotting                                                                                


Until now we have only seen our data in table format. Now it's time to see it visually by just typing...

gdf01 = god.GeoDataFrame(final_df, geometry=god.points_from_xy(final_df['Long'], final_df['Lat']))
gdf01.plot(figsize=(20, 10))

output:
covid19 cases of world map


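This plot is not very appealing. Since we have COVID-19 data for the whole world, let's try to plot it on a world map.

First, load the world map by typing the following (the bundled naturalearth_lowres dataset used here was removed in geopandas 1.0, so this line needs an older geopandas version)...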

world = god.read_file(god.datasets.get_path('naturalearth_lowres'))      
ax = world.plot(figsize=(20,10))                                                            
ax.axis('off')                                                                                        

output:
covid19 worldmap

Now we'll plot the points on the world map. We can do that by typing...


fig, ax = plt.subplots(figsize=(50, 20))
gdf01.plot(cmap='Purples', ax=ax)
world.geometry.boundary.plot(color=None, edgecolor='k', linewidth=1, ax=ax)
ax.axis('off')

output...
covid-19-cases-in-world
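Since final_df has one row per location per date, every location is drawn many times on top of itself. One possible refinement, sketched here with pandas only and made-up numbers, is to keep just the most recent date before building the GeoDataFrame. The names long_df and latest are used only in this example.

```python
import pandas as pd

# Made-up long-format data: two locations, two dates (not real figures).
long_df = pd.DataFrame({
    "Country/Region": ["Alpha", "Beta", "Alpha", "Beta"],
    "Lat": [10.0, -5.0, 10.0, -5.0],
    "Long": [20.0, 30.0, 20.0, 30.0],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"]),
    "Confirmed": [0, 2, 1, 3],
})

# Keep only the rows that belong to the most recent date.
latest = long_df[long_df["Date"] == long_df["Date"].max()]
print(latest)
```

A frame like latest has one point per location, so it can be passed to the GeoDataFrame step above in place of final_df.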


Finally, we did it together! This is my first practical blog post on data visualization.
To get the full code, visit...

GitHub:- https://github.com/Lalitsingh5522/Covid19

Let me know if you want more tutorials like this, or on a topic of your choice...😊




Thanks!!! for Visiting Asaanhai or lucky5522  😀😀














