OECD

A knowledge hub for data and analysis

Anne Sophie Gill (https://www.skemagloballab.io/gillAnneSophie.html), SKEMA Global Lab in AI (https://skemagloballab.io); Thierry Warin (https://www.nuance-r.com/principalInvestigator.html), SKEMA Business School, Raleigh, NC (https://www.skemagloballab.io)
02-25-2020

Using the SKEMA Quantum Studio (Warin 2019) framework, we show you how to use the OECD package for R to search for, download, and plot OECD data.

How to use the OECD package

Retrieve data


# Load the OECD library
library(OECD)

# get_datasets(): download the list of all datasets available from the OECD API
# (requires an internet connection)
dataset_list <- get_datasets()

# search_dataset(): narrow the list down to the datasets matching a keyword
search_dataset("unemployment", data = dataset_list)

                    id  title
93               DUR_I  Incidence of unemployment by duration
94               DUR_D  Unemployment by duration
158            AVD_DUR  Average duration of unemployment
666   AEO2012_CH6_FIG4  Figure 4: Youth and adult unemployment
697  AEO2012_CH6_FIG29  Figure 29: Youth employment and unemployment by education and country income groups
743  AEO2012_CH6_FIG19  Figure 19: The trade off between vulnerable employment and unemployment
1299               NRR  Net Replacement Rates in unemployment

# pick a dataset id from the search results: DUR_D, "Unemployment by duration"
dataset <- "DUR_D"

Extract data


# download the dataset's structure: a list of data frames describing
# each variable (dimension) and its codes
dstruc <- get_data_structure(dataset)
str(dstruc, max.level = 1)

List of 12
 $ VAR_DESC       :'data.frame':    12 obs. of  2 variables:
 $ COUNTRY        :'data.frame':    53 obs. of  2 variables:
 $ TIME           :'data.frame':    51 obs. of  2 variables:
 $ SEX            :'data.frame':    3 obs. of  2 variables:
 $ AGE            :'data.frame':    7 obs. of  2 variables:
 $ DURATION       :'data.frame':    8 obs. of  2 variables:
 $ FREQUENCY      :'data.frame':    1 obs. of  2 variables:
 $ OBS_STATUS     :'data.frame':    15 obs. of  2 variables:
 $ UNIT           :'data.frame':    316 obs. of  2 variables:
 $ POWERCODE      :'data.frame':    32 obs. of  2 variables:
 $ REFERENCEPERIOD:'data.frame':    96 obs. of  2 variables:
 $ TIME_FORMAT    :'data.frame':    5 obs. of  2 variables:

dstruc$VAR_DESC # list the dataset's variables and their descriptions

                id        description
1          COUNTRY            Country
2             TIME               Time
3              SEX                Sex
4              AGE                Age
5         DURATION           Duration
6        FREQUENCY          Frequency
7        OBS_VALUE  Observation Value
8      TIME_FORMAT        Time Format
9       OBS_STATUS Observation Status
10            UNIT               Unit
11       POWERCODE    Unit multiplier
12 REFERENCEPERIOD   Reference period

dstruc$SEX # codes and labels of the SEX dimension

     id       label
1   MEN         Men
2 WOMEN       Women
3    MW All persons

dstruc$AGE # codes and labels of the AGE dimension

      id    label
1   1519 15 to 19
2   1524 15 to 24
3   2024 20 to 24
4   2554 25 to 54
5   5564 55 to 64
6   6599      65+
7 900000    Total
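Codes such as `900000` or `MW` are what you pass in filters, but they are not self-explanatory. A small lookup turns codes back into labels. This sketch assumes each structure table has `id` and `label` columns, as the printed `dstruc$SEX` and `dstruc$AGE` tables do; `code_label()` is a hypothetical helper, and the AGE table is retyped from the output above.

```r
# AGE codes and labels, retyped from the dstruc$AGE output above
age <- data.frame(
  id = c("1519", "1524", "2024", "2554", "5564", "6599", "900000"),
  label = c("15 to 19", "15 to 24", "20 to 24", "25 to 54",
            "55 to 64", "65+", "Total"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the label of each code (NA for unknown codes)
code_label <- function(codes, table) {
  table$label[match(codes, table$id)]
}

print(code_label(c("900000", "2554"), age))
```

With live data, `code_label("900000", dstruc$AGE)` should give the same answer without retyping the table.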

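# filter the download: one element per dimension, in the order of VAR_DESC
# (COUNTRY, SEX, AGE); dimensions left out, such as DURATION, are not filtered
filter_list <- list(c("CAN", "FRA", "USA", "GBR"), "MW", "900000")
df <- get_dataset(dataset = dataset, filter = filter_list)
head(df)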

  COUNTRY SEX    AGE DURATION FREQUENCY TIME_FORMAT obsTime obsValue
1     CAN  MW 900000      UN1         A         P1Y    1976    233.2
2     CAN  MW 900000      UN1         A         P1Y    1977    264.8
3     CAN  MW 900000      UN1         A         P1Y    1978    273.7
4     CAN  MW 900000      UN1         A         P1Y    1979    273.0
5     CAN  MW 900000      UN1         A         P1Y    1980    289.2
6     CAN  MW 900000      UN1         A         P1Y    1981    305.5

# list the duration codes present in the data
unique(df$DURATION)

[1] "UN1" "UN2" "UN3" "UN4" "UN5" "UN"  "UND" "UNK"

dstruc$DURATION # codes and labels of the DURATION dimension

   id                    label
1 UN1                < 1 month
2 UN2 > 1 month and < 3 months
3 UN3 > 3 month and < 6 months
4 UN4   > 6 month and < 1 year
5 UN5          1 year and over
6  UN                    Total
7 UND           Total Declared
8 UNK                  Unknown
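The five duration bands (UN1 to UN5) sit alongside two totals (UN and UND) and an unknown category (UNK). Because the totals already include the people counted in the bands, a chart of the bands should leave them out. The sketch below keeps only the bands and turns their codes into a factor of readable labels whose level order sets the legend order. The DURATION table is retyped from the output above, and `label_durations()` is a hypothetical helper.

```r
# DURATION codes and labels, retyped from the dstruc$DURATION output above
duration <- data.frame(
  id = c("UN1", "UN2", "UN3", "UN4", "UN5", "UN", "UND", "UNK"),
  label = c("< 1 month", "> 1 month and < 3 months",
            "> 3 month and < 6 months", "> 6 month and < 1 year",
            "1 year and over", "Total", "Total Declared", "Unknown"),
  stringsAsFactors = FALSE
)

# Keep the five bands, dropping the totals (UN, UND) and the unknown category (UNK)
bands <- duration[duration$id %in% paste0("UN", 1:5), ]

# Hypothetical helper: map codes to labels, with levels ordered from the
# shortest to the longest band; codes outside the bands become NA
label_durations <- function(codes, bands) {
  factor(codes, levels = bands$id, labels = bands$label)
}

x <- label_durations(c("UN5", "UN1", "UN3"), bands)
print(x)
```

Applied to the downloaded data, `label_durations(df$DURATION, bands)` would return `NA` for the UN, UND and UNK rows, which can then be dropped before plotting.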

Visualize your data


# keep only the total across all durations (DURATION code "UN")
df_plot <- df[df$DURATION == "UN", ]

# Data wrangling

df_plot$obsTime <- as.numeric(df_plot$obsTime) # make sure the variable is in a numeric format

library(ggplot2)

# one line per country
ggplot(df_plot, aes(x = obsTime, y = obsValue, color = COUNTRY)) +
  geom_line() +
  labs(x = NULL, y = "Persons, thousands", color = NULL,
       title = "Total unemployment, all durations")

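The line chart above shows total unemployment (all durations) in Canada, France, the United Kingdom and the United States since the 1970s. The US series stands out, with a sharp rise around 2010. Keep in mind that the values are head counts in thousands, not unemployment rates, so the size of the US labor force accounts for much of its gap with the other three countries.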
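Rather than reading the peak off the chart, you can compute each country's peak year directly. The sketch below uses a hypothetical `peak_year()` helper and runs on the six Canadian rows printed by `head(df)` (duration band UN1), so it only demonstrates the mechanics; applied to `df_plot`, it would return one row per country for total unemployment.

```r
# The six Canadian rows printed by head(df) above (DURATION "UN1")
rows <- data.frame(
  COUNTRY = "CAN",
  obsTime = c(1976, 1977, 1978, 1979, 1980, 1981),
  obsValue = c(233.2, 264.8, 273.7, 273.0, 289.2, 305.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each country, the row with the highest value
peak_year <- function(data) {
  by_country <- split(data, data$COUNTRY)
  peaks <- lapply(by_country, function(d) {
    d[which.max(d$obsValue), c("COUNTRY", "obsTime", "obsValue")]
  })
  do.call(rbind, peaks)
}

p <- peak_year(rows)
print(p)
```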


We hope this helped you navigate the OECD package.


References

Warin, Thierry. 2019. "SKEMA Quantum Studio: A Technological Framework for Data Science in Higher Education." https://doi.org/10.6084/m9.figshare.8204195.v2.

Citation

For attribution, please cite this work as

Gill & Warin (2020, Feb. 25). Blog: OECD. Retrieved from https://blog.skemagloballab.io/posts/2020-02-25-oecd/

BibTeX citation

@misc{gill2020oecd,
  author = {Gill, Anne Sophie and Warin, Thierry},
  title = {Blog: OECD},
  url = {https://blog.skemagloballab.io/posts/2020-02-25-oecd/},
  year = {2020}
}