statcanR

Officially part of The Comprehensive R Archive Network.

Anne-Sophie Gill (SKEMA Global Lab), Marine Leroi (SKEMA Global Lab, https://skemagloballab.io/leroiMarine.html), Thierry Warin (SKEMA Business School, Raleigh, NC; https://www.nuance-r.com/principalInvestigator.html)
01-14-2020

SKEMA Quantum Studio (Warin 2019) has created the statcanR package to provide an efficient way to retrieve Statistics Canada data tables. It supports data collection for social and economic analyses of Canada at three levels of geographic granularity: Canada as a whole, the Canadian provinces, and Canadian metropolitan areas.

statcanR is part of CRAN (the Comprehensive R Archive Network), a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R.

Statistics Canada ensures Canadians have the key information on Canada’s economy, society and environment that they require to function effectively as citizens and decision makers.

To accelerate knowledge discovery, statcanR offers a new way to retrieve specific, frequently updated data quickly and easily.

Here is an example of how to use the statcanR package.

Package & Function


library(statcanR)
# Download table 27-10-0014-01 from Statistics Canada, in English
mydata <- sqs_statcan_data("27-10-0014-01", "eng")

Detailed arguments of the function:
"27-10-0014-01": the Statistics Canada table number
"eng": the language of the table ("eng" for English, "fra" for French)
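To make the role of the table number more concrete, here is a minimal sketch of how a number such as "27-10-0014-01" can be mapped to a full-table CSV download on Statistics Canada's website. It assumes the URL pattern `https://www150.statcan.gc.ca/n1/tbl/csv/<product id>-<lang>.zip`, where the 8-digit product ID is the table number without hyphens and without its last two digits. The helper `statcan_csv_url()` is hypothetical and is not part of statcanR's API.

```r
# Hypothetical helper (not part of statcanR): build the full-table CSV URL
# for a Statistics Canada table number, assuming the www150.statcan.gc.ca pattern.
statcan_csv_url <- function(table_number, lang = c("eng", "fra")) {
  lang <- match.arg(lang)
  # Remove hyphens: "27-10-0014-01" -> "2710001401"
  digits <- gsub("-", "", table_number, fixed = TRUE)
  if (!grepl("^[0-9]{10}$", digits)) {
    stop("expected a table number such as '27-10-0014-01'")
  }
  # Keep the first 8 digits as the product ID: "27100014"
  pid <- substr(digits, 1, 8)
  paste0("https://www150.statcan.gc.ca/n1/tbl/csv/", pid, "-", lang, ".zip")
}

statcan_csv_url("27-10-0014-01", "eng")
# [1] "https://www150.statcan.gc.ca/n1/tbl/csv/27100014-eng.zip"
```

Keeping this as a pure string function makes it easy to check without a network connection; downloading and reading the file is left to statcanR itself.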

The table number can easily be found on Statistics Canada’s website with a simple web search (for example, ‘statistics canada wages by industry metropolitan area monthly’). The figure below illustrates this for the table used here, ‘27-10-0014-01’: Federal expenditures on science and technology, by socio-economic objectives.

Data Manipulation

The date column (REF_DATE) is not in a proper date format, so we transform it: we keep the year (its last four characters) and convert it to January 1 of that year.


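library(stringr)
# Keep the last four characters of REF_DATE (the year)
mydata$date <- str_sub(mydata$REF_DATE, -4, -1)
# Turn the year into January 1 of that year and convert it to the Date class
mydata$date <- as.Date(paste0(mydata$date, "-01-01"))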
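A small self-contained check of this date step, using base R's `substr()` and `nchar()` in place of `stringr::str_sub()`. The sample values are hypothetical and assume that REF_DATE holds fiscal-year labels such as "2003/2004", whose last four characters are the year that is kept.

```r
# Hypothetical REF_DATE values; the fiscal-year format is an assumption
ref_date <- c("2003/2004", "2010/2011", "2017/2018")

# Equivalent of str_sub(ref_date, -4, -1): keep the last four characters
year <- substr(ref_date, nchar(ref_date) - 3, nchar(ref_date))

# Map each year to January 1 and convert to Date
date <- as.Date(paste0(year, "-01-01"))
print(date)
# [1] "2004-01-01" "2011-01-01" "2018-01-01"
```

Under that assumption, each fiscal year is placed on January 1 of its final calendar year, which is a plotting convenience rather than the actual start of the period.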

Next, we subset the data to keep only the series we want to plot.


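# Keep research and development expenditures only
mydata <- mydata[mydata$`Science and technology components` == "Research and development", ]
# Drop the total across all socio-economic objectives
mydata <- mydata[mydata$`Socio-economic objectives` != "Total socio-economic objectives", ]
# Keep intramural expenditures only
mydata <- mydata[mydata$`Type of expenditures` == "Intramural", ]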
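The trailing comma in `mydata[condition, ]` matters when the data is a plain data.frame: it tells R to select rows, whereas `mydata[condition]` indexes columns (a data.table, by contrast, treats `dt[condition]` as a row filter). The toy data below is hypothetical and only mirrors the column names used above.

```r
# Hypothetical toy data with the same column names as the StatCan table
toy <- data.frame(
  `Science and technology components` = c("Research and development",
                                          "Related scientific activities",
                                          "Research and development"),
  `Type of expenditures` = c("Intramural", "Intramural", "Extramural"),
  VALUE = c(10, 20, 30),
  check.names = FALSE  # keep the spaces in the column names
)

# Row filter: note the comma before the closing bracket
rd <- toy[toy$`Science and technology components` == "Research and development", ]
intramural_rd <- rd[rd$`Type of expenditures` == "Intramural", ]
print(intramural_rd)

# Without the comma, a data.frame is indexed by column: the logical vector
# c(TRUE, FALSE, TRUE) silently picks columns 1 and 3 and keeps every row
wrong <- toy[toy$`Science and technology components` == "Research and development"]
```

The second form raises no error here because the condition happens to have as many elements as the data has columns, which is why the comma is worth adding explicitly.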

# Give the objectives column a shorter, syntactic name
names(mydata)[names(mydata) == "Socio-economic objectives"] <- "se_objectives"

library(dplyr)
# Keep a selection of socio-economic objectives
mydata <- filter(mydata, se_objectives %in% c("Agriculture", "Defence", "Energy",
                                              "Environment", "Fishing", "Forestry",
                                              "Health", "Telecommunication", "Transport"))
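A chain of `==` comparisons joined by `|` keeps a row when its objective matches any name in the list, which is exactly what `%in%` tests. A quick base R check on a hypothetical vector of objective names shows the two forms agree. For a missing value, `==` returns NA while `%in%` returns FALSE; `dplyr::filter()` drops the row in both cases.

```r
# Hypothetical objective names, two of which should be dropped
se_objectives <- c("Agriculture", "Space", "Health", "Transport", "Education")
keep <- c("Agriculture", "Defence", "Energy", "Environment", "Fishing",
          "Forestry", "Health", "Telecommunication", "Transport")

# Long form: one == comparison per objective, combined with |
long_form <- se_objectives == "Agriculture" | se_objectives == "Defence" |
  se_objectives == "Energy" | se_objectives == "Environment" |
  se_objectives == "Fishing" | se_objectives == "Forestry" |
  se_objectives == "Health" | se_objectives == "Telecommunication" |
  se_objectives == "Transport"

# Short form: membership test against the vector of objectives to keep
short_form <- se_objectives %in% keep

identical(long_form, short_form)
# [1] TRUE
```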

Visualization


library(ggplot2)
library(ggthemes)
ggplot(data = mydata, aes(x = date, y = VALUE)) +
  geom_line(aes(colour = se_objectives), linewidth = 0.8) +  # ggplot2 >= 3.4.0; use size = 0.8 on older versions
  theme_hc() +
  theme(legend.position="right", title=element_text(size=10)) +
  labs(title = "Intramural research and development per socio-economic objectives",
       subtitle = "(2003 - 2018)",
       x = "Date",
       y = "Million dollars CAD",
       colour = "Socio-economic\n objectives",
       caption = "Source: SKEMA Quantum Studio")


Install statcanR now from CRAN with install.packages("statcanR").



Warin, Thierry. 2019. “SKEMA Quantum Studio: A Technological Framework for Data Science in Higher Education.” https://doi.org/10.6084/m9.figshare.8204195.v2.

Citation

For attribution, please cite this work as

Gill, et al. (2020, Jan. 14). Blog: statcanR. Retrieved from https://blog.skemagloballab.io/posts/2020-01-14-statcanr/

BibTeX citation

@misc{gill2020statcanr,
  author = {Gill, Anne-Sophie and Leroi, Marine and Warin, Thierry},
  title = {Blog: statcanR},
  url = {https://blog.skemagloballab.io/posts/2020-01-14-statcanr/},
  year = {2020}
}