Introduction to data science with R
https://github.com/ssm123ssm/R-journey
R and RStudio
R is an open-source programming language and software environment designed for statistical computing, data analysis, and data visualization. It provides a powerful toolset for working with data, performing complex calculations, and creating high-quality graphics.
RStudio is a popular integrated development environment (IDE) for R. It provides a user-friendly interface that combines the R console, a code editor, and various panels for managing files, viewing data, plotting graphics, and accessing help documentation. RStudio streamlines the workflow for developing, debugging, and sharing R code and analyses.
Packages are collections of R functions, data, and compiled code that provide additional tools and capabilities for specific tasks or domains. With thousands of packages available, you can easily enhance R’s capabilities to suit your analytical needs.
The currently loaded packages can be viewed using the sessionInfo() function.
The tidyverse is a collection of R packages designed for data science that share a common philosophy, grammar, and data structures. It provides a consistent and intuitive approach to data manipulation, exploration, and visualization.
install.packages("tidyverse")
install.packages("dslabs")
R Objects are the fundamental building blocks of the R language. They represent various data structures such as vectors, matrices, lists, and data frames, as well as functions and user-defined objects. R Objects store data, code, and results, allowing you to perform operations, calculations, and manipulations on them.
$y = \beta_0 + \beta_1 x_1^2 + \beta_2 x_2$
Given $\beta_0 = 2$, $\beta_1 = 3$, and $\beta_2 = 4$, calculate the value of $y$ when $x_1 = 5$ and $x_2 = 6$.
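A quick check in R, storing the coefficients and inputs as named objects (the object names are arbitrary, and the formula follows the reconstruction above):

# coefficients and inputs as R objects
b0 <- 2; b1 <- 3; b2 <- 4
x1 <- 5; x2 <- 6

# y = b0 + b1 * x1^2 + b2 * x2
y <- b0 + b1 * x1^2 + b2 * x2
y  # 101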
In R, functions are first-class objects, meaning they can be treated like any other data object. This allows functions to be assigned to variables, passed as arguments to other functions, and even returned as outputs from functions.
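A small illustration of this idea (the function names here are made up for the example):

# assign a function to a variable
square <- function(x) x^2

# pass a function as an argument to another function
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)  # 81

# a function that returns another function
make_multiplier <- function(k) function(x) k * x
double <- make_multiplier(2)
double(10)  # 20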
The seq() function in R is used to generate a sequence of numbers. It allows you to create vectors of numbers within a specified range or following a specific pattern. The basic syntax is:
seq(from, to, by)
For example, seq(1, 10, by = 2) will create a sequence of odd numbers from 1 to 9: 1, 3, 5, 7, 9. The seq() function is versatile and can also generate sequences based on lengths or along specific vectors, making it a handy tool for creating numeric vectors in R.
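For instance, a sequence can be specified by its length, or generated along an existing vector:

seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
x <- c(10, 20, 30)
seq_along(x)                # 1 2 3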
Vectors in R are fundamental data structures used to store collections of elements of the same data type. They can hold numeric, character, logical, or other types of data. Vectors can be created using the c() function, specifying elements separated by commas.
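For example:

numbers <- c(1, 5, 9, 13)        # numeric vector
states  <- c("Texas", "Ohio")    # character vector
flags   <- c(TRUE, FALSE, TRUE)  # logical vector
numbers * 2                      # operations are vectorized: 2 10 18 26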
Loops in R are programming constructs used for iterating over a sequence of elements or performing repetitive tasks. The two main types of loops in R are for and while loops.
For loops: These loops execute a block of code a predetermined number of times, iterating over elements of a sequence. They are useful when the number of iterations is known in advance.
While loops: These loops continue iterating as long as a specified condition is true. They are suitable for situations where the number of iterations is not known beforehand.
A for loop in R consists of three main components: the loop variable, the sequence to iterate over, and the code block to execute for each iteration. The basic structure is as follows:
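# generic skeleton (a sketch)
# for (variable in sequence) {
#   code to execute for each value of variable
# }

# a concrete example: sum the integers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}
total  # 55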
In R, there’s often more than one way to accomplish a task or solve a problem.
Using a function.
The formula for getting the sum of the first N integers is given by the arithmetic series formula:
$\text{sum} = \frac{N \times (N + 1)}{2}$
Get the sum of the numbers divisible by 3, up to one million.
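A sketch of a few equivalent approaches: a loop, the vectorized sum(), and the arithmetic-series formula.

# 1. a for loop
s <- 0
for (i in seq(3, 1e6, by = 3)) {
  s <- s + i
}
s  # 166666833333

# 2. a vectorized one-liner
sum(seq(3, 1e6, by = 3))

# 3. using the formula: 3 * (1 + 2 + ... + N), where N = floor(1e6 / 3)
n <- floor(1e6 / 3)
3 * n * (n + 1) / 2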
Dataframes in R are essential data structures used for storing tabular data in a structured format, similar to a spreadsheet or database table.
They are two-dimensional objects consisting of rows and columns, where each column can contain data of different types, such as numeric, character, or factor.
Dataframes can be created using functions like data.frame() or by importing data from external sources like CSV files using functions such as read.csv() or read.table(). Once created, dataframes allow for easy manipulation, analysis, and visualization of data using a wide range of built-in functions and packages.
The data() function in R is used to load built-in datasets that come packaged with R or any installed packages. These datasets are often used for demonstration purposes or for practicing data analysis techniques.
The str() function in R is used to display the internal structure of an R object. It provides a compact representation of the object's data type, dimensions (if applicable), and a glimpse of its contents.
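For example, the output below shows the structure of the murders dataset from the dslabs package:

library(dslabs)
data(murders)
str(murders)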
'data.frame': 51 obs. of 5 variables:
$ state : chr "Alabama" "Alaska" "Arizona" "Arkansas" ...
$ abb : chr "AL" "AK" "AZ" "AR" ...
$ region : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
$ population: num 4779736 710231 6392017 2915918 37253956 ...
$ total : num 135 19 232 93 1257 ...
Dataframes can be created in R using the data.frame() function, which allows you to combine vectors of different types into a structured tabular format, specifying a name and the values for each column.
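A small example (the values here are made up for illustration):

df <- data.frame(
  name   = c("Alice", "Bob", "Carol"),
  age    = c(25, 31, 28),
  smoker = c(TRUE, FALSE, FALSE)
)
str(df)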
Here are some commonly used functions from the dplyr package in R:

- select(): Subset columns from a data frame.
- filter(): Subset rows based on conditions.
- mutate(): Add new columns or modify existing ones.
- summarise(): Compute summary statistics.
- group_by(): Group data by one or more variables.
- left_join(), right_join(), etc.: Combine data frames by matching rows.

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process that involves examining and summarizing the main characteristics of a dataset.
It helps to gain insights, detect patterns, and identify potential issues or outliers in the data before proceeding with further analysis or modeling. EDA is an iterative process that typically involves visualizations and statistical summaries.
The murders dataset contains state-level statistics on population and gun murders in the United States in 2010.
What are the factors associated with murder rates?
The pipe operator
The pipe operator (%>%) is a convenient way to express a sequence of operations in a more readable format.
Instead of nesting functions inside one another, the pipe operator takes the output from one function and passes it as the first argument to the next function.
library(dplyr)
library(dslabs)
data(murders)
murders %>%
arrange(desc(total)) %>%
mutate(rate_per_million = total / population * 1e6) %>%
select(state, rate_per_million) %>%
head()
state rate_per_million
1 California 33.74138
2 Texas 32.01360
3 Florida 33.98069
4 New York 26.67960
5 Pennsylvania 35.97751
6 Michigan 41.78622
While R has powerful data visualization packages like ggplot2, it also comes with several built-in functions for creating basic graphs.
These functions can be useful for quick visualizations or when you don’t need the advanced features of specialized packages.
- hist(): Create a histogram.
- plot(): Create a scatter plot.
- barplot(): Create a bar plot.
- boxplot(): Create a box plot.
- pie(): Create a pie chart.
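A few quick examples using the murders data loaded earlier (a sketch; the resulting plots are not shown here):

hist(murders$total)                       # histogram of murder counts
plot(murders$population, murders$total)   # scatter plot of population vs murders
boxplot(total ~ region, data = murders)   # box plot of murders by region
barplot(table(murders$region))            # bar plot of states per region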
The ggplot2 package
The ggplot2 package is one of the most popular and powerful data visualization packages in R.
It is based on the principles of the Grammar of Graphics, which provides a flexible and consistent framework for creating a wide range of statistical graphics.
With ggplot2, you can create highly customizable and publication-quality plots using a layered approach.
library(ggplot2)
ggplot(data = murders, aes(x = population/10^6, y = total)) +
geom_point(size = 3, aes(col = region)) +
geom_text(aes(label = abb)) +
geom_smooth(method = "lm", se = FALSE, linewidth = 1, col = 'grey', linetype = 2)+
scale_x_log10() +
scale_y_log10() +
xlab("Populations in millions (log scale)") +
ylab("Total number of murders (log scale)") +
ggtitle("US Gun Murders in 2010")
library(ggrepel)
ggplot(data = murders, aes(x = population/10^6, y = total)) +
geom_point(size = 3, aes(col = region)) +
geom_text_repel(aes(label = abb)) +
geom_smooth(method = "lm", se = FALSE, linewidth = 1, col = 'grey', linetype = 2)+
scale_x_log10() +
scale_y_log10() +
xlab("Populations in millions (log scale)") +
ylab("Total number of murders (log scale)") +
ggtitle("US Gun Murders in 2010")
library(ggthemes)
p <- ggplot(data = murders, aes(x = population/10^6, y = total)) +
geom_point(size = 3, aes(col = region)) +
geom_text_repel(aes(label = abb)) +
geom_smooth(method = "lm", se = FALSE, linewidth = 1, col = 'grey', linetype = 2)+
scale_x_log10() +
scale_y_log10() +
xlab("Populations in millions (log scale)") +
ylab("Total number of murders (log scale)") +
ggtitle("US Gun Murders in 2010") +
theme_economist()
Gapminder
Gapminder was founded on the realization that many widespread perceptions and beliefs about global development were simply not accurate or based on outdated information. Hans Rosling frequently referred to these as “devastating misconceptions.”
You are probably wrong about it!
The “Two World View” is one of the major myths or misconceptions that Gapminder has sought to dispel through its work on data visualization and global development analysis.
The Two World View refers to the oversimplified perception that the world is divided into two distinct groups:
People in the “Third World” had very low life expectancies, while those in the “First World” lived much longer lives.
High fertility rates causing overpopulation was primarily a problem in poor, developing countries.
Again, you are probably wrong about it!
Let's visualize the life expectancy and fertility rate of countries in 1962.
library(tidyverse)
library(gapminder)
library(ggplot2)
theme_set(theme_bw())
library(gt)
gapminder <- dslabs::gapminder
# display the first rows of the data with gt
gt(gapminder %>% head())
| country | year | infant_mortality | life_expectancy | fertility | population | gdp | continent | region |
|---|---|---|---|---|---|---|---|---|
| Albania | 1960 | 115.40 | 62.87 | 6.19 | 1636054 | NA | Europe | Southern Europe |
| Algeria | 1960 | 148.20 | 47.50 | 7.65 | 11124892 | 13828152297 | Africa | Northern Africa |
| Angola | 1960 | 208.00 | 35.98 | 7.32 | 5270844 | NA | Africa | Middle Africa |
| Antigua and Barbuda | 1960 | NA | 62.97 | 4.43 | 54681 | NA | Americas | Caribbean |
| Argentina | 1960 | 59.87 | 65.39 | 3.11 | 20619075 | 108322326649 | Americas | South America |
| Armenia | 1960 | NA | 66.86 | 4.55 | 1867396 | NA | Asia | Western Asia |
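The next table shows the same variables for the year 2000. A minimal sketch of how it could be produced with the same gt workflow (assuming a simple filter on year):

gt(gapminder %>% filter(year == 2000) %>% head(10))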
| country | year | infant_mortality | life_expectancy | fertility | population | gdp | continent | region |
|---|---|---|---|---|---|---|---|---|
| Albania | 2000 | 23.2 | 74.70 | 2.38 | 3121965 | 3686649387 | Europe | Southern Europe |
| Algeria | 2000 | 33.9 | 73.30 | 2.51 | 31183658 | 54790058957 | Africa | Northern Africa |
| Angola | 2000 | 128.3 | 52.30 | 6.84 | 15058638 | 9129180361 | Africa | Middle Africa |
| Antigua and Barbuda | 2000 | 13.8 | 73.80 | 2.32 | 77648 | 802526701 | Americas | Caribbean |
| Argentina | 2000 | 18.0 | 74.20 | 2.48 | 37057453 | 284203745280 | Americas | South America |
| Armenia | 2000 | 26.6 | 71.30 | 1.30 | 3076098 | 1911563665 | Asia | Western Asia |
| Aruba | 2000 | NA | 73.78 | 1.87 | 90858 | 1858659293 | Americas | Caribbean |
| Australia | 2000 | 5.1 | 79.80 | 1.76 | 19107251 | 416887521196 | Oceania | Australia and New Zealand |
| Austria | 2000 | 4.6 | 78.20 | 1.37 | 8050884 | 192070749954 | Europe | Western Europe |
| Azerbaijan | 2000 | 60.6 | 66.50 | 2.05 | 8117742 | 5272617196 | Asia | Western Asia |
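A minimal ggplot2 sketch of the 1962 snapshot (life expectancy versus fertility, coloured by continent; the original plot may differ in styling):

gapminder %>%
  filter(year == 1962) %>%
  ggplot(aes(x = fertility, y = life_expectancy, color = continent)) +
  geom_point()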
Distribution
A distribution is one of the most important concepts in statistics. It is a function that shows the possible values for a variable and how often they occur.
For example, with categorical data, the distribution simply describes the proportion of each unique category.
Describe the distribution of the built-in state.region dataset.
state.region
Northeast South North Central West
9 16 12 13
state.region
Northeast South North Central West
0.18 0.32 0.24 0.26
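The counts and proportions shown above can be obtained with table() and prop.table():

table(state.region)              # counts per region
prop.table(table(state.region))  # proportions per region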
Heights dataset
Similar to what the frequency table does for categorical data, the eCDF (empirical cumulative distribution function) defines the distribution for numerical data.
$F(a) = \text{Proportion of data points that are less than or equal to } a$
Although the eCDF concept is widely discussed in statistics textbooks, it is not very popular in practice: it is difficult to see from it whether the distribution is symmetric or what range contains 95% of the values. Histograms are much preferred because they greatly facilitate answering such questions.
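A sketch of these views of the data, assuming the heights dataset from dslabs (eCDF, histogram, and a smooth density estimate, in that order):

library(dslabs)
library(tidyverse)
data(heights)

male_heights <- heights %>% filter(sex == "Male")

# empirical CDF
ggplot(male_heights, aes(height)) +
  stat_ecdf() +
  labs(x = "a (height in inches)", y = "F(a)")

# histogram (counts per 1-inch bin)
ggplot(male_heights, aes(height)) +
  geom_histogram(binwidth = 1, color = "black")

# smooth density estimate
ggplot(male_heights, aes(height)) +
  geom_density(fill = "grey", alpha = 0.5)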
In this plot, we no longer have sharp edges at the interval boundaries and many of the local peaks have been removed. Also, the scale of the y-axis changed from counts to density.
Let's try to stratify the data by sex and see how the distribution changes. The challenge is to construct this plot using ggplot2.
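One possible solution (a sketch; the intended plot may differ in styling):

# heights comes from the dslabs package, loaded above
heights %>%
  ggplot(aes(x = height, fill = sex)) +
  geom_density(alpha = 0.3)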
What percentage of the data is within 1 standard deviation of the mean?
Let's simulate a normal distribution with mean 0 and standard deviation 1.
Try to construct this plot using ggplot2.
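A sketch of one way to do this; the seed and sample size are arbitrary choices:

set.seed(1)
sim <- data.frame(x = rnorm(1e5, mean = 0, sd = 1))

# proportion of values within 1 standard deviation of the mean (about 68%)
mean(abs(sim$x - mean(sim$x)) <= sd(sim$x))

# density of the simulated values
ggplot(sim, aes(x)) +
  geom_density()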
Let's compare the means of the two groups using a t-test.
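The first output below comes from the formula interface of t.test():

# heights from the dslabs package, loaded above
t.test(height ~ sex, data = heights)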
Welch Two Sample t-test
data: height by sex
t = -15.925, df = 374.41, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-4.915553 -3.835108
sample estimates:
mean in group Female mean in group Male
64.93942 69.31475
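The same comparison can also be made by passing the two groups as separate vectors, which produces the second output below:

t.test(heights$height[heights$sex == "Female"],
       heights$height[heights$sex == "Male"])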
Welch Two Sample t-test
data: heights$height[heights$sex == "Female"] and heights$height[heights$sex == "Male"]
t = -15.925, df = 374.41, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.915553 -3.835108
sample estimates:
mean of x mean of y
64.93942 69.31475
We can calculate the test statistic and print the p-value on the plot using the ggpubr package.
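A minimal sketch using ggpubr's stat_compare_means() (the exact plot in the original may differ):

library(ggpubr)

ggplot(heights, aes(x = sex, y = height)) +
  geom_boxplot() +
  stat_compare_means(method = "t.test")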
The stats package

| Function | Description |
|---|---|
| mean(), median(), sd() | Calculate mean, median, and standard deviation |
| t.test(), wilcox.test(), var.test() | Hypothesis tests (t-test, Wilcoxon test, variance test) |
| cor() | Calculate correlations between variables |
| dnorm(), pnorm(), qnorm() | Density, distribution, and quantile functions for the normal distribution |
| dpois(), ppois(), qpois() | Similar functions for the Poisson distribution |
| dbinom(), pbinom(), qbinom() | Functions for the binomial distribution |
| lm() | Fit linear regression models |
| glm() | Fit generalized linear models |
| summary() | Provide model summaries and other summary statistics |
R. A. Fisher was one of the first statisticians to formalize the concept of hypothesis testing. He was also a great tea lover. He was once challenged by a lady who claimed that she could tell whether the milk or the tea was added first to the cup. Fisher designed an experiment to test her claim.
If she correctly identified 3 out of 4 milk-first cups, would you be convinced that she could tell the difference?
The number of ways to choose $k$ items from $n$ is given by the binomial coefficient: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$. With 8 cups in total (4 milk-first and 4 tea-first), there are $\binom{8}{4} = 70$ equally likely ways for her to pick the 4 cups she labels as milk-first.
Therefore, the probability of selecting 3 out of 4 correctly is
$\frac{\binom{4}{3} \times \binom{4}{1}}{\binom{8}{4}} = \frac{4 \times 4}{70} \approx 0.2286$
The probability of selecting all 4 out of 4 correctly is
$\frac{\binom{4}{4} \times \binom{4}{0}}{\binom{8}{4}} = \frac{1 \times 1}{70} \approx 0.0143$
So, the p-value, or the probability of observing 3 or more correct guesses out of 4, is $0.2286 + 0.0143 = 0.2429$.
This is the p-value of Fisher's exact test.