Subreddit Algebra

This directory contains the code and data behind the story "Dissecting Trump's Most Rabid Online Following."

The raw data (an online cache of Reddit comments going back to 2005) is from Google's BigQuery, and more information about the data can be found here.

Details about the three files of code in this folder:

| File | Description |
| --- | --- |
| processData.sql | SQL code for filtering, processing, and formatting Reddit comment data from Google's BigQuery. (Note that if you click on the raw data link above, this SQL query will be loaded automatically.) |
| subredditVectorAnalysis.R | Conducts a latent semantic analysis of over 50,000 subreddits, creating a vector representation of each one based on commenter co-occurrence. It also implements "subreddit algebra": the ability to add and subtract different subreddits to reveal how they relate to one another. |
| computeUserOverlap.sql | A separate SQL query used for computing the user overlap between r/The_Donald and other subreddits. |
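
To give a feel for what "subreddit algebra" means, here is a minimal Python sketch. It is not the code in subredditVectorAnalysis.R. The subreddit names, the three-dimensional vectors, and the helper functions (`normalize`, `combine`, `cosine`, `nearest`) are all hypothetical, and it assumes cosine similarity as the measure of closeness, which the R script may or may not use.

```python
import math

# Hypothetical toy vectors, NOT real Reddit data. Each number stands in for
# how many commenters a subreddit shares with one of three made-up
# reference subreddits.
VECTORS = {
    "nba": [9.0, 1.0, 0.0],
    "minnesota": [0.0, 1.0, 9.0],
    "timberwolves": [8.0, 1.0, 8.0],
    "cooking": [0.0, 9.0, 1.0],
}


def normalize(vector):
    """Scale a vector to unit length so large subreddits do not dominate."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def combine(plus, minus=()):
    """Add the unit vectors named in `plus`, subtract those in `minus`."""
    dims = len(next(iter(VECTORS.values())))
    total = [0.0] * dims
    for name in plus:
        for i, x in enumerate(normalize(VECTORS[name])):
            total[i] += x
    for name in minus:
        for i, x in enumerate(normalize(VECTORS[name])):
            total[i] -= x
    return total


def cosine(a, b):
    """Cosine similarity: the dot product of the two unit vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def nearest(query, exclude=()):
    """Return the subreddit whose vector is most similar to `query`."""
    candidates = [name for name in VECTORS if name not in exclude]
    return max(candidates, key=lambda name: cosine(query, VECTORS[name]))


# Addition: which subreddit is closest to nba + minnesota?
added = combine(plus=["nba", "minnesota"])
print(nearest(added, exclude=["nba", "minnesota"]))

# Subtraction: which subreddit is closest to timberwolves - minnesota?
subtracted = combine(plus=["timberwolves"], minus=["minnesota"])
print(nearest(subtracted, exclude=["timberwolves", "minnesota"]))
```

The subreddits used in the query are excluded from the candidates because a sum or difference usually stays closest to its own inputs, which would hide the more interesting neighbors.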
