World Happiness Report

Part I. Describe The Data

Context

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

Content

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.

The data was downloaded from Kaggle and covers 2015, 2016 and 2017. I will try to identify the happiest and unhappiest countries and the reasons behind their scores.

1.Country

Name of the country

2.Happiness Rank

Rank of the country based on the Happiness Score.

3.Happiness Score

A metric measured in 2016 by asking the sampled people the question: "How would you rate your happiness on a scale of 0 to 10 where 10 is the happiest"

4.Whisker High

Upper Confidence Interval of the Happiness Score

5.Whisker Low

Lower Confidence Interval of the Happiness Score

6.Economy (GDP per Capita)

The extent to which GDP contributes to the calculation of the Happiness Score.

7.Family

The extent to which Family contributes to the calculation of the Happiness Score

8.Health (Life Expectancy)

The extent to which Life expectancy contributed to the calculation of the Happiness Score

9.Freedom

The extent to which Freedom contributed to the calculation of the Happiness Score

10.Trust (Government Corruption)

The extent to which Perception of Corruption contributes to Happiness Score

11.Generosity

The extent to which Generosity contributed to the calculation of the Happiness Score

12.Dystopia Residual

The extent to which Dystopia Residual contributed to the calculation of the Happiness Score.
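
The column descriptions above imply that the six factor columns plus the Dystopia Residual add up to the Happiness Score. The following is a minimal sketch of that check, using two rows copied from the 2017 table in this notebook. The frame name `sample` and the helper columns `Reconstructed` and `Gap` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Two rows copied from the 2017 table shown in this notebook.
sample = pd.DataFrame({
    "Country": ["Norway", "Denmark"],
    "Happiness.Score": [7.537, 7.522],
    "Economy..GDP.per.Capita.": [1.616463, 1.482383],
    "Family": [1.533524, 1.551122],
    "Health..Life.Expectancy.": [0.796667, 0.792566],
    "Freedom": [0.635423, 0.626007],
    "Generosity": [0.362012, 0.355280],
    "Trust..Government.Corruption.": [0.315964, 0.400770],
    "Dystopia.Residual": [2.277027, 2.313707],
})

# Every column that is a component of the score.
parts = [
    "Economy..GDP.per.Capita.", "Family", "Health..Life.Expectancy.",
    "Freedom", "Generosity", "Trust..Government.Corruption.",
    "Dystopia.Residual",
]

# Sum the components row by row and compare with the published score.
sample["Reconstructed"] = sample[parts].sum(axis=1)
sample["Gap"] = (sample["Reconstructed"] - sample["Happiness.Score"]).abs()
print(sample[["Country", "Happiness.Score", "Reconstructed", "Gap"]])
```

For these two rows the gap is only rounding error, which is consistent with reading each factor column as a share of the score rather than as a raw measurement.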

In [14]:
#We will load necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import random
import seaborn as sns

#scatter_matrix lives in pandas.plotting; the old pandas.tools.plotting path was removed from pandas
from pandas.plotting import scatter_matrix

#!pip install plotly

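import plotly
#Credentials are only needed for online Chart Studio plots; the maps below use plotly.offline.
#In plotly 4+ this helper moved to the separate chart_studio package (chart_studio.tools.set_credentials_file).
#plotly.tools.set_credentials_file(username='meryemdikmen', api_key='HkPHBVsn5LcaL3ogcTm2')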
In [15]:
#Import the data and check the variables
df5 = pd.read_csv("C:/Users/merye/Anaconda3/Datasets/World_Happiness/2015.csv", sep =',') 
df6= pd.read_csv("C:/Users/merye/Anaconda3/Datasets/World_Happiness/2016.csv", sep =',')
df7= pd.read_csv("C:/Users/merye/Anaconda3/Datasets/World_Happiness/2017.csv", sep =',')
#frames = [df5, df6, df7]
#df = pd.concat(frames)
df7.head() #Show the first 5 rows of the data
Out[15]:
Country Happiness.Rank Happiness.Score Whisker.high Whisker.low Economy..GDP.per.Capita. Family Health..Life.Expectancy. Freedom Generosity Trust..Government.Corruption. Dystopia.Residual
0 Norway 1 7.537 7.594445 7.479556 1.616463 1.533524 0.796667 0.635423 0.362012 0.315964 2.277027
1 Denmark 2 7.522 7.581728 7.462272 1.482383 1.551122 0.792566 0.626007 0.355280 0.400770 2.313707
2 Iceland 3 7.504 7.622030 7.385970 1.480633 1.610574 0.833552 0.627163 0.475540 0.153527 2.322715
3 Switzerland 4 7.494 7.561772 7.426227 1.564980 1.516912 0.858131 0.620071 0.290549 0.367007 2.276716
4 Finland 5 7.469 7.527542 7.410458 1.443572 1.540247 0.809158 0.617951 0.245483 0.382612 2.430182

The table above shows the five happiest countries by rank. The highest Happiness Score is 7.537 out of 10 (Norway).

In [16]:
#Check last 5 values of the data 
df7.tail()
Out[16]:
Country Happiness.Rank Happiness.Score Whisker.high Whisker.low Economy..GDP.per.Capita. Family Health..Life.Expectancy. Freedom Generosity Trust..Government.Corruption. Dystopia.Residual
150 Rwanda 151 3.471 3.543030 3.398970 0.368746 0.945707 0.326425 0.581844 0.252756 0.455220 0.540061
151 Syria 152 3.462 3.663669 3.260331 0.777153 0.396103 0.500533 0.081539 0.493664 0.151347 1.061574
152 Tanzania 153 3.349 3.461430 3.236570 0.511136 1.041990 0.364509 0.390018 0.354256 0.066035 0.621130
153 Burundi 154 2.905 3.074690 2.735310 0.091623 0.629794 0.151611 0.059901 0.204435 0.084148 1.683024
154 Central African Republic 155 2.693 2.864884 2.521116 0.000000 0.000000 0.018773 0.270842 0.280876 0.056565 2.066005

The table above shows the last five countries by rank. Of the 155 countries, the Central African Republic has the lowest Happiness Score, 2.693.
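
`head()` and `tail()` only show the extremes because the file happens to be sorted by rank. As a small sketch that does not rely on the row order, `nlargest` and `nsmallest` can pick the extremes directly. The rows below are copied from the tables above and deliberately shuffled; the names `sample`, `happiest` and `unhappiest` are illustrative.

```python
import pandas as pd

# A few rows from the 2017 table, deliberately out of order.
sample = pd.DataFrame({
    "Country": ["Burundi", "Norway", "Central African Republic",
                "Denmark", "Iceland"],
    "Happiness.Score": [2.905, 7.537, 2.693, 7.522, 7.504],
})

# Select the extremes by value instead of by position in the file.
happiest = sample.nlargest(2, "Happiness.Score")
unhappiest = sample.nsmallest(2, "Happiness.Score")

print(happiest)
print(unhappiest)
```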

In [17]:
#Checking the data types for each column 
df7.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 155 entries, 0 to 154
Data columns (total 12 columns):
Country                          155 non-null object
Happiness.Rank                   155 non-null int64
Happiness.Score                  155 non-null float64
Whisker.high                     155 non-null float64
Whisker.low                      155 non-null float64
Economy..GDP.per.Capita.         155 non-null float64
Family                           155 non-null float64
Health..Life.Expectancy.         155 non-null float64
Freedom                          155 non-null float64
Generosity                       155 non-null float64
Trust..Government.Corruption.    155 non-null float64
Dystopia.Residual                155 non-null float64
dtypes: float64(10), int64(1), object(1)
memory usage: 14.6+ KB

Most columns are floats; only Country is an object (string) column and Happiness.Rank is an integer.

In [18]:
#The column names are long, so I will shorten some of them
df=df7.rename(columns = {'Happiness.Rank':'Happ.Rank', 'Happiness.Score':'Happ.Score', 'Economy..GDP.per.Capita.':'GDP', 
                     'Health..Life.Expectancy.':'Life.Expect','Trust..Government.Corruption.':'Trust.to.Gov', 'Dystopia.Residual':'Dystop.Res'})
df.head()
Out[18]:
Country Happ.Rank Happ.Score Whisker.high Whisker.low GDP Family Life.Expect Freedom Generosity Trust.to.Gov Dystop.Res
0 Norway 1 7.537 7.594445 7.479556 1.616463 1.533524 0.796667 0.635423 0.362012 0.315964 2.277027
1 Denmark 2 7.522 7.581728 7.462272 1.482383 1.551122 0.792566 0.626007 0.355280 0.400770 2.313707
2 Iceland 3 7.504 7.622030 7.385970 1.480633 1.610574 0.833552 0.627163 0.475540 0.153527 2.322715
3 Switzerland 4 7.494 7.561772 7.426227 1.564980 1.516912 0.858131 0.620071 0.290549 0.367007 2.276716
4 Finland 5 7.469 7.527542 7.410458 1.443572 1.540247 0.809158 0.617951 0.245483 0.382612 2.430182
In [19]:
#Checking if we have any NA value
df.isnull().values.any()
Out[19]:
False

Our 2017 data has no null values.

In [20]:
#Statistical values for all numeric variables as count, max, mean and quantiles
df.describe()
Out[20]:
Happ.Rank Happ.Score Whisker.high Whisker.low GDP Family Life.Expect Freedom Generosity Trust.to.Gov Dystop.Res
count 155.000000 155.000000 155.000000 155.000000 155.000000 155.000000 155.000000 155.000000 155.000000 155.000000 155.000000
mean 78.000000 5.354019 5.452326 5.255713 0.984718 1.188898 0.551341 0.408786 0.246883 0.123120 1.850238
std 44.888751 1.131230 1.118542 1.145030 0.420793 0.287263 0.237073 0.149997 0.134780 0.101661 0.500028
min 1.000000 2.693000 2.864884 2.521116 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.377914
25% 39.500000 4.505500 4.608172 4.374955 0.663371 1.042635 0.369866 0.303677 0.154106 0.057271 1.591291
50% 78.000000 5.279000 5.370032 5.193152 1.064578 1.253918 0.606042 0.437454 0.231538 0.089848 1.832910
75% 116.500000 6.101500 6.194600 6.006527 1.318027 1.414316 0.723008 0.516561 0.323762 0.153296 2.144654
max 155.000000 7.537000 7.622030 7.479556 1.870766 1.610574 0.949492 0.658249 0.838075 0.464308 3.117485

The describe function shows summary statistics for each numeric column: count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum.

Part II. Visualization

In [21]:
#Let's check all columns by pair plot 
import seaborn as sns; sns.set(style="ticks", color_codes=True)
g = sns.pairplot(df, kind="reg")

Scatterplots show possible associations or relationships between two variables. I wanted to see whether each pair of variables is positively or negatively related: upward-sloping lines indicate a positive relationship and downward-sloping lines a negative one. The plots above show both weaker and stronger relationships. To quantify the strength of a linear (straight-line) relationship, we will use a correlation analysis.

In [22]:
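#Check the correlation between the numeric columns
#(Country is text, so it is excluded; recent pandas versions raise an error otherwise)
corrmat = df.select_dtypes(include='number').corr()
sns.heatmap(corrmat, vmax=.8, square=True)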
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x185e80b88d0>

The two graphs above show how each pair of variables is correlated; we will concentrate on the highly correlated variables. Happiness Rank and Happiness Score are negatively correlated: as the Happiness Score increases, the rank number decreases (1 is the top, 155 is the last rank). We will therefore analyze how the Happiness Score relates to GDP, Family, Life Expectancy, Freedom and Trust (Government Corruption), as these values are highly correlated with each other. GDP appears to be the main factor, and it is strongly related to the others, such as Family, Life Expectancy and Freedom.
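
To read the strength of each relationship as a number rather than a color, the correlation of every column with the Happiness Score can be sorted. This is a sketch on only ten rows (the top five and bottom five countries from the tables above), so the values it prints are illustrative and will differ from those of the full 155-country data. The names `sample` and `corr_with_score` are assumptions for this example.

```python
import pandas as pd

# Top five and bottom five countries of 2017, copied from the tables above.
sample = pd.DataFrame({
    "Country": ["Norway", "Denmark", "Iceland", "Switzerland", "Finland",
                "Rwanda", "Syria", "Tanzania", "Burundi",
                "Central African Republic"],
    "Happ.Rank": [1, 2, 3, 4, 5, 151, 152, 153, 154, 155],
    "Happ.Score": [7.537, 7.522, 7.504, 7.494, 7.469,
                   3.471, 3.462, 3.349, 2.905, 2.693],
    "GDP": [1.616463, 1.482383, 1.480633, 1.564980, 1.443572,
            0.368746, 0.777153, 0.511136, 0.091623, 0.000000],
    "Family": [1.533524, 1.551122, 1.610574, 1.516912, 1.540247,
               0.945707, 0.396103, 1.041990, 0.629794, 0.000000],
    "Life.Expect": [0.796667, 0.792566, 0.833552, 0.858131, 0.809158,
                    0.326425, 0.500533, 0.364509, 0.151611, 0.018773],
    "Freedom": [0.635423, 0.626007, 0.627163, 0.620071, 0.617951,
                0.581844, 0.081539, 0.390018, 0.059901, 0.270842],
})

# Pearson correlation of each numeric column with the score, strongest first.
corr_with_score = (
    sample.select_dtypes(include="number")
    .corr()["Happ.Score"]
    .drop("Happ.Score")
    .sort_values(ascending=False)
)
print(corr_with_score.round(2))
```

The rank column ends up at the bottom with a negative sign, which matches the observation that a higher score means a smaller rank number.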

In [23]:
#let's check highly correlated columns separately 

%config InlineBackend.figure_format = 'retina'

plt.figure(figsize=(16, 10))
for i, key in enumerate(['GDP', 'Family', 'Life.Expect', 'Freedom', 'Trust.to.Gov']):
    plt.subplot(2, 3, i+1)
    plt.xlabel(key)
    plt.scatter(df[key], df['Happ.Score'], alpha=0.5)

GDP, Family and Life Expectancy are very highly correlated with the Happiness Score; in other words, when GDP increases, happiness tends to increase as well.

GDP and Happiness : It is very common to use a country’s GDP or GDP per capita to evaluate its development and productivity, as well as the well-being of its people. However, over-emphasizing GDP rankings can be misleading. In particular, it is important to examine people’s happiness directly instead of using GDP as a proxy for well-being.

Family and Happiness : Relationships with family and friends, and contact with the natural environment, are important for people to be happy. When it comes to happiness, our nearest and dearest really matter. Research shows people who have strong relationships with a partner, family or close friends are happier, healthier and live longer. And it works both ways - for us and for them too.

Life Expectancy and Happiness : When people believe that they will have a longer and healthier life, they feel happier, as they do not have to worry about diseases and other factors that will shorten their lives.

Freedom and Happiness : Freedom is positively related to happiness among rich nations, but not among poor nations. This may be why we do not see a very high correlation between the two overall. Apparently freedom does not pay in poverty. Further, freedom is related to happiness only when 'opportunity' and 'capability' coincide.

Trust (Government Corruption) : Trust in government covers all governmental activities, such as the protection of human rights, the degree of corruption and the level of social trust in a society. In other words, low levels of corruption and a high degree of social trust bring high levels of happiness and social well-being.
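
The Freedom paragraph above claims that freedom relates to happiness among richer nations but not among poorer ones. One way to examine this is to split the countries by GDP and compute the Freedom and Happiness Score correlation inside each group. The sketch below only demonstrates the mechanics on ten rows copied from the 2017 tables; a sample this small cannot confirm or refute the claim, and the median split, the `GDP.group` column and the `freedom_corr` name are assumptions made for illustration.

```python
import pandas as pd

# Top five and bottom five countries of 2017, copied from the tables above.
sample = pd.DataFrame({
    "Country": ["Norway", "Denmark", "Iceland", "Switzerland", "Finland",
                "Rwanda", "Syria", "Tanzania", "Burundi",
                "Central African Republic"],
    "Happ.Score": [7.537, 7.522, 7.504, 7.494, 7.469,
                   3.471, 3.462, 3.349, 2.905, 2.693],
    "GDP": [1.616463, 1.482383, 1.480633, 1.564980, 1.443572,
            0.368746, 0.777153, 0.511136, 0.091623, 0.000000],
    "Freedom": [0.635423, 0.626007, 0.627163, 0.620071, 0.617951,
                0.581844, 0.081539, 0.390018, 0.059901, 0.270842],
})

# Label each country as richer or poorer relative to the sample median GDP.
median_gdp = sample["GDP"].median()
sample["GDP.group"] = sample["GDP"].gt(median_gdp).map(
    {True: "richer", False: "poorer"}
)

# Correlation between Freedom and the score, computed within each group.
freedom_corr = {
    name: group["Freedom"].corr(group["Happ.Score"])
    for name, group in sample.groupby("GDP.group")
}
print(freedom_corr)
```

On the full 2017 frame the same split would be applied to `df`, giving each group enough countries for the comparison to be meaningful.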

Let's see Happiness Score on the world map for 2017

In [24]:
#I wanted to see happiness score distribution on the world map for 2017 results
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot #loading necessary libraries for mapping
init_notebook_mode(connected=True)

data = dict(type = 'choropleth', #As we have only country names in data, we can use country names to see the happiness
           locations = df['Country'],
           locationmode = 'country names',
           z = df['Happ.Score'], 
           text = df['Country'],
           colorbar = {'title':'Happiness'})
layout = dict(title = 'Global Happiness 2017', 
             geo = dict(showframe = False, 
                       projection = {'type': 'mercator'})) #current plotly versions expect the lowercase name
choromap3 = go.Figure(data = [data], layout=layout)
iplot(choromap3) 

The red countries on the map above appear to be the happiest. Northern Europe, North America (including Canada) and Australia are at the top. Turkey is close to the average with a score of 5.5. As expected, African countries are below the average score, since their GDP per capita and life expectancy are very low. China is an interesting case: the country is growing economically, but the happiness of its people is comparatively low. Income inequality is another problem for governments to solve. South America, on the other hand, has interesting happiness scores: most countries there have income inequality and lower GDP per capita, yet their happiness is above the average. There must be other factors that make people happy.

Now let's look at the results for 2015

In [26]:
#Let's check head for 2015 
df5.head()
Out[26]:
Country Region Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
0 Switzerland Western Europe 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
1 Iceland Western Europe 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
2 Denmark Western Europe 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
3 Norway Western Europe 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
4 Canada North America 5 7.427 0.03553 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176

The happiest countries are largely the same, only in a different order; Western Europe is always on top.
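
The 2015 file uses spaces in its column names ("Happiness Score") while the 2017 file uses dots ("Happiness.Score"), so the yearly frames cannot be stacked directly. Below is a minimal sketch of one way to harmonise them before `pd.concat`. The tiny frames stand in for the real files, and the helper `normalise` is a hypothetical name. It only handles the simple dot-versus-space case; columns such as "Economy (GDP per Capita)" would need an explicit rename mapping.

```python
import pandas as pd

# Tiny stand-ins for the 2015 and 2017 files, using their column names.
df2015 = pd.DataFrame({
    "Country": ["Switzerland", "Iceland"],
    "Region": ["Western Europe", "Western Europe"],
    "Happiness Rank": [1, 2],
    "Happiness Score": [7.587, 7.561],
})
df2017 = pd.DataFrame({
    "Country": ["Norway", "Denmark"],
    "Happiness.Rank": [1, 2],
    "Happiness.Score": [7.537, 7.522],
})


def normalise(frame, year):
    """Return a copy with dots replaced by spaces and a Year column added."""
    out = frame.rename(columns=lambda name: name.replace(".", " ").strip())
    out["Year"] = year
    return out


# Stack both years; columns missing in one year (Region) become NaN there.
combined = pd.concat(
    [normalise(df2015, 2015), normalise(df2017, 2017)],
    ignore_index=True,
    sort=False,
)
print(combined)
```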

In [27]:
#Last 5 countries in the list 
df5.tail()
Out[27]:
Country Region Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
153 Rwanda Sub-Saharan Africa 154 3.465 0.03464 0.22208 0.77370 0.42864 0.59201 0.55191 0.22628 0.67042
154 Benin Sub-Saharan Africa 155 3.340 0.03656 0.28665 0.35386 0.31910 0.48450 0.08010 0.18260 1.63328
155 Syria Middle East and Northern Africa 156 3.006 0.05015 0.66320 0.47489 0.72193 0.15684 0.18906 0.47179 0.32858
156 Burundi Sub-Saharan Africa 157 2.905 0.08658 0.01530 0.41587 0.22396 0.11850 0.10062 0.19727 1.83302
157 Togo Sub-Saharan Africa 158 2.839 0.06727 0.20868 0.13995 0.28443 0.36453 0.10731 0.16681 1.56726

The unhappiest countries also come from the same regions; African countries are always near the bottom. This is a major global problem: resources in much of Africa are very limited, and climate and environment make it hard to improve living standards. In addition, there are wars in some regions, such as Syria. People have had to leave their countries, and many of them are trying to survive in refugee camps abroad. This is another big problem for the world, and it affects not only Syria but also countries such as Iraq, Afghanistan and Pakistan.
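
The regional pattern described above can be put into numbers. The 2017 file has no Region column, so this sketch borrows the regions from the 2015 file by merging on Country and then averages the 2017 scores per region. It uses only eight countries copied from the tables in this notebook; `regions_2015`, `scores_2017`, `merged` and `regional_mean` are illustrative names.

```python
import pandas as pd

# Country-to-region lookup taken from the 2015 table.
regions_2015 = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada",
                "Rwanda", "Syria", "Burundi"],
    "Region": ["Western Europe", "Western Europe", "Western Europe",
               "Western Europe", "North America", "Sub-Saharan Africa",
               "Middle East and Northern Africa", "Sub-Saharan Africa"],
})

# 2017 scores for the same countries.
scores_2017 = pd.DataFrame({
    "Country": ["Norway", "Denmark", "Iceland", "Switzerland", "Canada",
                "Rwanda", "Syria", "Burundi"],
    "Happ.Score": [7.537, 7.522, 7.504, 7.494, 7.316, 3.471, 3.462, 2.905],
})

# Attach the region to each 2017 row, then average the score per region.
merged = scores_2017.merge(regions_2015, on="Country", how="left")
regional_mean = (
    merged.groupby("Region")["Happ.Score"].mean().sort_values(ascending=False)
)
print(regional_mean.round(3))
```

A left merge keeps every 2017 country even when its name is missing or spelled differently in 2015, so unmatched rows show up with an empty Region instead of silently disappearing.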

Let's see Happiness Score on the world map for 2015

In [28]:
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

data = dict(type = 'choropleth', 
           locations = df5['Country'],
           locationmode = 'country names',
           z = df5['Happiness Score'], 
           text = df5['Country'],
           colorbar = {'title':'Happiness'})
layout = dict(title = 'Global Happiness 2015', 
             geo = dict(showframe = False, 
                       projection = {'type': 'mercator'})) #current plotly versions expect the lowercase name
choromap3 = go.Figure(data = [data], layout=layout)
iplot(choromap3)

When we compare the 2015 map with the 2017 map, we do not see appreciable changes. There are only very small changes in the scores and rankings. With world resources becoming more limited day by day, I do not expect to see a better picture in the future.
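
The size of the change between the two years can also be computed rather than judged from the maps. A minimal sketch, assuming the two years are merged on Country; the six countries and their scores are copied from the outputs in this notebook, and `s2015`, `s2017`, `change` and `Delta` are illustrative names.

```python
import pandas as pd

# Scores for six countries that appear in both years' outputs.
s2015 = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada",
                "Finland"],
    "Score.2015": [7.587, 7.561, 7.527, 7.522, 7.427, 7.406],
})
s2017 = pd.DataFrame({
    "Country": ["Norway", "Denmark", "Iceland", "Switzerland", "Finland",
                "Canada"],
    "Score.2017": [7.537, 7.522, 7.504, 7.494, 7.469, 7.316],
})

# Keep only countries present in both years and compute the score change.
change = s2015.merge(s2017, on="Country", how="inner")
change["Delta"] = (change["Score.2017"] - change["Score.2015"]).round(3)
change = change.sort_values("Delta", ascending=False).reset_index(drop=True)
print(change)
```

For these six countries every change is smaller than 0.12 points on the 0 to 10 scale, which supports the impression of only small movements.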

In [29]:
#2017 10 Happiest Countries 
dfl= df.groupby(['Country'], sort=False)['Happ.Score'].max().head(10) #I wanted to see top 10 Happy Countries
dfl
Out[29]:
Country
Norway         7.537
Denmark        7.522
Iceland        7.504
Switzerland    7.494
Finland        7.469
Netherlands    7.377
Canada         7.316
New Zealand    7.314
Sweden         7.284
Australia      7.284
Name: Happ.Score, dtype: float64
In [30]:
#2015 10 Happiest Countries
dfh = df5.groupby(['Country'], sort=False)['Happiness Score'].max().head(10)
dfh
Out[30]:
Country
Switzerland    7.587
Iceland        7.561
Denmark        7.527
Norway         7.522
Canada         7.427
Finland        7.406
Netherlands    7.378
Sweden         7.364
New Zealand    7.286
Australia      7.284
Name: Happiness Score, dtype: float64

The top 10 happiest countries are the same in 2015 and 2017, with only small changes in their order. After ranking fourth for the last two years, Norway jumped three spots and displaced three-time winner Denmark to take the title of "world's happiest country" for the first time. Denmark dropped to second place this year, followed by Iceland, Switzerland, Finland, the Netherlands, Canada, New Zealand, and Australia and Sweden (which tied for ninth place), according to the latest World Happiness Report, released in March 2017 by the Sustainable Development Solutions Network for the United Nations. Denmark has won the title three of the four times the report has been issued, while Switzerland has won the title just once.
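
The comparison of the two top-10 lists can be checked in a few lines. This sketch takes the country order from the two outputs above and uses the position in each list as the rank, which is an assumption: Sweden and Australia share the same 2017 score, so their relative order is arbitrary. The names `top_2015`, `top_2017`, `same_members` and `rank_change` are illustrative.

```python
import pandas as pd

# Country order copied from the two top-10 outputs above.
top_2015 = ["Switzerland", "Iceland", "Denmark", "Norway", "Canada",
            "Finland", "Netherlands", "Sweden", "New Zealand", "Australia"]
top_2017 = ["Norway", "Denmark", "Iceland", "Switzerland", "Finland",
            "Netherlands", "Canada", "New Zealand", "Sweden", "Australia"]

# Are the same ten countries in both lists, regardless of order?
same_members = set(top_2015) == set(top_2017)

# Use the list position as the rank; subtraction aligns on country name.
rank_2015 = pd.Series(range(1, 11), index=top_2015)
rank_2017 = pd.Series(range(1, 11), index=top_2017)
rank_change = (rank_2015 - rank_2017).sort_values(ascending=False)

print(same_members)
print(rank_change)  # positive values mean the country moved up
```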

Part III. Conclusion

The analysis above shows that no single factor can explain people's happiness. Factors such as GDP, family, income inequality, the degree of peace and corruption all play an important role in happiness. This suggests that when we analyze happiness, we should consider all factors together.

We know very well that money does not buy happiness by itself, but it supports the other factors that make people happier, such as a healthier life, a trustworthy government, freedom to make life choices, freedom from corruption, lower income inequality and higher levels of peace. GDP is like a catalyst that affects most of the other factors.

Happiness isn't just about money, although it's part of it.

"As demonstrated by many countries, this report gives evidence that happiness is a result of creating strong social foundations. It's time to build social trust and healthy lives, not guns or walls. Let's hold our leaders to this fact."

The future is exciting for developed countries, as they are working on new technologies such as artificial intelligence, electric cars and the Internet of Things, and most of them are preparing for climate change in the coming decades. Developed countries are investing in clean energy, agricultural science and cleaning pollutants from the air. They are likely to cope best with climate change. We can conclude that developed countries will probably keep their happiness and living standards in the future.

On the other hand, poor and unhappy countries are likely to get worse day by day, as they have limited resources, large and poorly educated populations and wars; their future looks very dark. Climate change will show its effects very quickly in the near future, faster than expected. Afterwards, the world will have two types of countries: those with very low standards and those with very high standards. There will be no middle-level countries. If this continues, developed countries will be affected indirectly. We will see more refugees around the developed countries and more wars over limited resources such as oil, clean water and food. Population is increasing uncontrollably, especially in Asian countries such as India, China and Indonesia. Air pollution is another big problem for the world, and these crowded countries have the most polluted air, above the accepted limits.

To sum up, I cannot paint an optimistic picture given all these results and conditions. Developed countries will keep their status and will be happier than the rest of the world. The scores of unhappy countries will not increase under these circumstances; they may even decrease every year. Happiness has several factors; GDP is a powerful factor but not the only one. I wish we could have a better world in the future, but I am afraid this will not happen.

Thank You