Introducing rtweet: Why Ever Leave the House?
Those of you who know me well know that I can consistently be found on Twitter, taking in the latest developments in AI, meeting fellow data science enthusiasts and/or hosting the #DataEveryone chat on Thursdays at 7pm ET. However, RStudio is just as much of a regular hangout for yours truly. There’s never a shortage of new packages to try, and the most eventful of them lately has been rtweet.
You can imagine my excitement when I heard about rtweet - the R package that allows us both to analyze the breadth of data on Twitter and post to our accounts straight from RStudio! The example below shows the very simple code and the way it instantly manifests on my Twitter timeline. Would you ever know I didn’t post it from my phone?
So how does this all work?
Open up RStudio, then install and load the rtweet package:
install.packages("rtweet")
library(rtweet)
In my case, RStudio made me install the rlang package separately with install.packages("rlang"), but once I did, everything worked as intended.
The first time you attempt a command such as post_tweet(), your browser will automatically open and ask you to authenticate by logging into Twitter. Once you sign into your account, you will receive a message that says “Authentication complete. Please close this page and return to R.” When you go back to your RStudio window, you will see a similar confirmation message. Now you can start posting statuses and you won’t have to authenticate again, even if you restart your R session.
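Here is a minimal sketch of what posting a status can look like, assuming the browser authentication described above has gone through. The safe_status() helper is hypothetical (it is not part of rtweet), and its 280-character check is a plain nchar() count, which is only an approximation of how Twitter counts characters. The status text is just an illustration.

```r
# Hypothetical helper (not part of rtweet): make sure the status is a single
# string that fits within the assumed 280-character limit before posting.
safe_status <- function(status, max_chars = 280) {
  stopifnot(is.character(status), length(status) == 1)
  if (nchar(status) > max_chars) {
    stop("Status is ", nchar(status), " characters; the limit is ", max_chars, ".")
  }
  status
}

# The live call needs an authenticated Twitter session, so it only runs
# in an interactive session where rtweet is installed.
if (interactive() && requireNamespace("rtweet", quietly = TRUE)) {
  rtweet::post_tweet(status = safe_status("Posting this straight from RStudio with rtweet!"))
}
```

Checking the text first means a status that is too long fails in R with a readable message instead of being rejected by Twitter.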
My Favorite Feature:
Granted, posting to my timeline directly from RStudio is pretty cool, but the options rtweet offers for comparing trends between cities on Twitter opened up my world even further.
As you can see above, the get_trends() command allows you to store the top 50 current topics trending on Twitter for a given location in a variable. It looks and feels like complete magic, but you still have to choose a city name that the API recognizes. For example, the command get_trends("nola") returns a message saying that the location “nola” cannot be found. get_trends("New Orleans"), however, produces results.
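One way to avoid guessing at city names is to search the list of locations Twitter supports before asking for trends. The sketch below assumes rtweet's trends_available() returns a data frame with a name column. The find_trend_locations() helper is hypothetical and written only for this illustration.

```r
# Hypothetical helper (not part of rtweet): search a table of supported
# locations by name. `locations` is assumed to be a data frame with a `name`
# column, such as the one rtweet::trends_available() returns.
find_trend_locations <- function(locations, pattern) {
  hits <- grepl(pattern, locations$name, ignore.case = TRUE)
  locations[hits, , drop = FALSE]
}

# Live calls need an authenticated Twitter session, so they only run
# in an interactive session where rtweet is installed.
if (interactive() && requireNamespace("rtweet", quietly = TRUE)) {
  locations <- rtweet::trends_available()
  print(find_trend_locations(locations, "orleans"))

  # Store the current trends for a name the API recognizes
  nola_trends <- rtweet::get_trends("New Orleans")
  print(head(nola_trends$trend, 10))
}
```

Searching for part of a name, such as "orleans", shows the exact spelling the API expects before you pass it to get_trends().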
And within those results…
I found some interesting patterns. Leave it to me to turn every new package exploration into a study, but I can’t help it.
New York, Los Angeles, San Francisco and Chicago all had some variation of the term coronavirus listed within their top 5 trends as of noon EST on February 29, 2020. However, the rankings and wording differed, as shown in the table below:
Why are some locations’ trends labeled as #coronavirus and others Corona? Is this difference significant? Trend rankings also don’t appear to follow a consistent pattern based on confirmed cases. For example, New York currently does not have confirmed cases of coronavirus, and yet the topic is trending higher there than it is in LA, which has a case in the city itself and three more in surrounding areas (worldometer.info).
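A comparison like this can be sketched in a few lines of R. The corona_trends() helper below is hypothetical, and it assumes that get_trends() returns each city's trends in ranked order, so that a row's position can stand in for its rank. That ordering is an assumption worth checking against your own results.

```r
# Hypothetical helper (not part of rtweet): rank each city's trends and keep
# the ones that mention "corona" within the top `top_n`.
# `trends` is assumed to be a data frame with `place` and `trend` columns,
# with rows in the order they were returned for each place.
corona_trends <- function(trends, top_n = 5) {
  # Position of each trend within its own city (assumed to reflect its rank)
  trends$rank <- ave(seq_len(nrow(trends)), trends$place, FUN = seq_along)
  is_top <- trends$rank <= top_n
  is_corona <- grepl("corona", trends$trend, ignore.case = TRUE)
  trends[is_top & is_corona, c("place", "rank", "trend")]
}

# Live calls need an authenticated Twitter session, so they only run
# in an interactive session where rtweet is installed.
if (interactive() && requireNamespace("rtweet", quietly = TRUE)) {
  cities <- c("New York", "Los Angeles", "San Francisco", "Chicago")
  all_trends <- do.call(rbind, lapply(cities, function(city) {
    out <- rtweet::get_trends(city)
    out$place <- city  # label rows with the name we asked for
    out
  }))
  print(corona_trends(all_trends))
}
```

Matching on "corona" without regard to case catches both #coronavirus and Corona, which makes it easier to see how the wording and rank differ from city to city.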
I don’t have all the answers…
And neither does Twitter. But the ability to pull data from Twitter directly into RStudio for analysis does allow us to start asking the right questions - questions we might not have had the bandwidth to ask before rtweet came on the scene.