Tweets on AI: Distribution per hour

When do people tweet about AI?

Marine Leroi (SKEMA Global Lab): https://skemagloballab.io/leroiMarine.html
Thierry Warin (SKEMA Business School, Raleigh, NC): https://www.nuance-r.com/principalInvestigator.html
12-16-2019

Using the SKEMA Quantum Studio framework (Warin 2019), this post shows how to manipulate tweets about AI.

Loading the data

First, let’s load the CSV files containing the tweets on AI. We collected 10,000 tweets on artificial intelligence between November 11 and 12, 2019, using two specific keywords: “ArtificialIntelligence” and “Artificial Intelligence”.


library(readr)

# Dataset with the keyword "Artificial Intelligence"
tweetsArtificial_Intelligence <- read_csv("tweetsArtificial_Intelligence.csv")

# Dataset with the keyword "ArtificialIntelligence"
tweetsArtificialIntelligence <- read_csv("tweetsArtificialIntelligence.csv")

Merging the data

The second step is to merge the two datasets into one.


library(dplyr)

# Merging the datasets
tweetsAI <- bind_rows(tweetsArtificialIntelligence, tweetsArtificial_Intelligence)

Cleaning the data

After merging, the data needs a little cleaning up. We want to keep unique tweets, but because we collected the tweets at several points in time, the same tweet may appear more than once with different likes, retweets and replies counts. The unique() function would then fail to remove such duplicates, since the rows differ in their counts. Therefore, we first remove the count columns and then apply unique().


# Removing the count columns
tweetsAI <- select(tweetsAI, -replies_count, -retweets_count, -likes_count)

# Keeping unique tweets
tweetsAI <- unique(tweetsAI)
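The effect of dropping the count columns before deduplicating can be seen on a small toy example (the data frame below is hypothetical, standing in for the real tweetsAI):

```r
library(dplyr)

# Toy example: the same tweet captured twice with different likes counts
tweets <- tibble::tibble(
  id          = c(1, 1, 2),
  tweet       = c("AI is here", "AI is here", "Hello AI"),
  likes_count = c(3, 5, 1)
)

# unique() alone keeps both copies of tweet 1, because likes_count differs
nrow(unique(tweets))   # 3 rows

# Dropping the count column first makes the duplicate rows identical
deduped <- tweets %>%
  select(-likes_count) %>%
  unique()
nrow(deduped)          # 2 rows
```

The dplyr function distinct() would achieve the same result as unique() here and is the more idiomatic choice in a dplyr pipeline.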

Now, we want to filter the data to keep only the tweets from November 11, 2019. That leaves us with about 6,000 tweets.


# Filtering the tweets
tweetsAI <- filter(tweetsAI, date == "2019-11-11")

Since we now have tweets over a one-day period, we can create a visualization showing at what time people tweet about AI.

Number of tweets per hour

One way to display the tweets per hour is to split the time column into three columns: “hour”, “minute” and “second”.


library(tidyr)

# Separating the column "time" into three columns
tweetsAI <- separate(tweetsAI, time, c("hour", "minute", "second"), sep = ":")
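An alternative to separate(), assuming the time column holds “HH:MM:SS” strings, is to extract the hour directly with lubridate. The values below are illustrative stand-ins for tweetsAI$time:

```r
library(lubridate)

# Parse "HH:MM:SS" strings and pull out the hour component
times <- c("08:15:02", "13:40:59")
hour(hms(times))   # 8 13
```

This keeps the original time column intact, whereas separate() consumes it.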

We can use the “hour” column as the x-axis to get the count of tweets per hour. Now, let’s make a basic histogram showing the distribution of tweets per hour.


library(ggplot2)

# Creating a histogram of tweet counts per hour
ggplot(data = tweetsAI, aes(x = hour)) +
  geom_bar(aes(fill = ..count..)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  xlab("Hours") + ylab("Number of tweets") +
  ggtitle("Distribution of the tweets per hour") +
  theme_minimal() +
  # theme_minimal() must come first, or it resets the legend setting
  theme(legend.position = "none")
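To read the peaks numerically rather than off the chart, dplyr::count() tallies the rows per hour. The hour values below are hypothetical illustrations in place of tweetsAI:

```r
library(dplyr)

# Hypothetical hours; in the post this would be count(tweetsAI, hour, ...)
df <- tibble::tibble(hour = c("08", "08", "13", "13", "13", "19"))
count(df, hour, sort = TRUE)
```

Sorting by descending count puts the busiest hours at the top of the table.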

We can see that tweeting about AI peaks around 8 a.m. and again between 1 p.m. and 7 p.m.



References

Warin, Thierry. 2019. “SKEMA Quantum Studio: A Technological Framework for Data Science in Higher Education.” https://doi.org/10.6084/m9.figshare.8204195.v2.

Citation

For attribution, please cite this work as

Leroi & Warin (2019, Dec. 16). Blog: Tweets on AI: Distribution per hour. Retrieved from https://blog.skemagloballab.io/posts/2019-12-16-tweetsaipart1/

BibTeX citation

@misc{leroi2019tweets,
  author = {Leroi, Marine and Warin, Thierry},
  title = {Blog: Tweets on AI: Distribution per hour},
  url = {https://blog.skemagloballab.io/posts/2019-12-16-tweetsaipart1/},
  year = {2019}
}