Presidential Approval Ratings Don't Mean Much Early On

Trump is having a lousy week. You've probably heard. The President's Charlottesville response was deemed inadequate, his lawyer has been forwarding misguided General Lee comparisons, and now we've learned that a crown jewel of Presidential gatherings - the lauded Manufacturing Council - is no more. Who knew condemning Neo-Nazis could be so hard?

The recent stream of negative press hasn't helped the President's popularity. FiveThirtyEight has Trump's approval rating at a lowly 37.3%. The site's nifty historical comparisons also show that, among post-WW2 Presidents, only Gerald Ford had a similarly poor approval rating at this point in his administration. Trump's North Korea Twitter rhetoric certainly doesn't alleviate concerns about executive or national stability either.

However, while Trump's low rating is unusual given how early we are in his administration, it is not really an outlier in the context of overall Presidential approval:


Many Presidents have, for instance, dipped below 40%. Precipitous drops in popularity are the norm rather than the exception - although they do tend to occur in a President's second term. Trump's fall is particularly jarring next to Barack Obama's comparatively serene eight years, an historic outlier that featured no true "bottoming out" a la Nixon, Carter, or either Bush.

Still, does Trump's miserable approval rating mean he won't be re-elected? To investigate, I plotted the approval of first-term Presidents by the number of days until their re-election bid, splitting out winners and losers. The 1964 election run-up is shown for LBJ.

It's clear that incumbents usually win. Only Ford, Carter, and George H.W. Bush lost their re-election bids. Voters generally prefer the devil they know.

The other noteworthy trend is that early-term Presidential approval is not a good predictor of election success. Clinton was exceptionally unpopular at times. And Truman's swings over the course of his first term make Trump's administration look positively serene. Trump is certainly in bad shape, but it's not inconceivable that he turns it around.

Of course, I do realize a President must actually make it to the election to be re-elected.

P.S. It is worth mentioning that approval rating has some severe flaws. First, the question is rather vague - why not ask a respondent directly about the President's effect on their quality of life? Second, military action tends to give administrations a nice boost (at first), which is a sub-optimal incentive structure. The metric reminds me of batting averages in baseball: no student of the game takes them seriously, but since even the layperson knows the definition, it is unlikely that Mendoza line mentions will ever fall completely out of favor.


Text Mining BBC Headlines with R

Recently I discovered Text Mining with R: A Tidy Approach, a new guide by Julia Silge and David Robinson that synthesizes common text analysis tasks with the tidyverse concepts familiar to all Hadley Wickham adherents.

To test out the book's techniques, I scraped BBC headlines since 2014 using the Wayback Machine. After the usual data wrestling/wrangling process, I was left with a de-duped dataset of 2,885 headlines that had appeared in the BBC's top headline slot. From here, a simple application of R's unnest_tokens gave me an appropriately "tidy" dataset of one word per row. I did concatenate some obvious bi-grams (e.g. North Korea) but otherwise stuck with individual words.
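For readers who want to see the "one word per row" idea outside of R, here is a minimal Python sketch. It is not the code used for this post: the function name `tidy_tokens`, the `BIGRAMS` table, and the example headlines are all made up for illustration, and the tokenizer only roughly mimics unnest_tokens (lowercasing and dropping punctuation).

```python
import re

# Hypothetical bigram table; the post only names "North Korea" as an example.
BIGRAMS = {("north", "korea"): "north_korea"}

def tidy_tokens(headlines):
    """Return (headline_id, word) rows, one word per row."""
    rows = []
    for headline_id, text in enumerate(headlines):
        # Lowercase, then keep runs of letters, digits, and apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        i = 0
        while i < len(words):
            pair = tuple(words[i:i + 2])
            if pair in BIGRAMS:
                # Merge a known bigram into a single token.
                rows.append((headline_id, BIGRAMS[pair]))
                i += 2
            else:
                rows.append((headline_id, words[i]))
                i += 1
    return rows

# Made-up headlines, not taken from the scraped dataset.
rows = tidy_tokens(["North Korea fires missile", "Trump meets Putin"])
```

Each row keeps the id of the headline it came from, which is what makes later grouping (by year, by region, by headline) straightforward.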

Jumping into some analysis, I leveraged Text Mining's code to summarize word frequencies. Here, we find some predictable terms in the top spots:

Headline Changes

While it is not surprising that the U.S. and Trump have dominated headlines, it is interesting to examine how the focus of BBC coverage has shifted over the years. One approach suggested in Text Mining is to calculate a document's tf-idf score, where tf-idf refers to "term frequency–inverse document frequency." The goal is to find words that occur frequently in a particular document (e.g. all the headlines for a given year) but are not terribly common across an entire corpus (e.g. all headlines from 2014-2017). Applying tf-idf by year, we find that Gaza stories were prominent in 2014, the Greece debt crisis was on everybody's mind in 2015, while from 2016 onwards we have been living in TrumpWorld:
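To make the tf-idf calculation concrete, here is a small Python sketch that treats each year as one document. It assumes the common definition (term frequency is a word's share of the document, and idf is the natural log of the number of documents divided by the number of documents containing the word). The function name `tf_idf` and the toy word lists are invented for illustration and are not the real headline data.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs maps a document name (e.g. a year) to its list of words.

    Returns {(doc, word): score} where
    tf  = count of word in doc / total words in doc
    idf = ln(number of docs / number of docs containing word)
    """
    counts = {doc: Counter(words) for doc, words in docs.items()}
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for c in counts.values() for word in c)
    scores = {}
    for doc, c in counts.items():
        total = sum(c.values())
        for word, n in c.items():
            idf = math.log(len(docs) / doc_freq[word])
            scores[(doc, word)] = (n / total) * idf
    return scores

# Toy data only: a handful of words per "year".
docs = {
    "2014": ["gaza", "gaza", "us", "ukraine"],
    "2015": ["greece", "us", "greece", "debt"],
    "2016": ["trump", "us", "trump", "election"],
}
scores = tf_idf(docs)
top_2014 = max(
    (word for doc, word in scores if doc == "2014"),
    key=lambda word: scores[("2014", word)],
)
```

In this toy example "us" appears in every year, so its idf (and therefore its score) is zero, while "gaza" comes out on top for 2014 because it is frequent there and absent elsewhere.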

There are other options beyond tf-idf. In particular, one of Text Mining's case studies demonstrates a model-based technique using Twitter archives. Here, a separate binomial GLM is fit for each word, modeling the word's count relative to the total word count in each time period. A positive slope for a given word's model indicates the word is appearing more often across time, while a negative slope indicates reduced frequency. Given the large number of models, significance is assessed with adjusted p-values to avoid multiple comparison issues. I used a 0.01 adjusted p-value threshold to assess significance (along with a minimum of 50 total appearances). Below are the frequencies across time for all the words found to have significant slopes:
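The per-word modeling step can be sketched in plain Python as follows. This is an illustration under several assumptions, not the analysis code: the logistic (binomial) model is fit with a hand-rolled Newton-Raphson loop, the p-value is a Wald test on the slope, and the adjustment is Holm's method (the default of R's p.adjust; the post does not say which method was used). The function names and the toy counts are hypothetical.

```python
import math

def binomial_glm_slope(times, counts, totals, iterations=50):
    """Fit logit(p_t) = b0 + b1 * t to counts out of totals.

    Returns (slope b1, two-sided Wald p-value for b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0  # score vector and information matrix
        for t, y, n in zip(times, counts, totals):
            eta = max(-30.0, min(30.0, b0 + b1 * t))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = n * p * (1.0 - p)
            u0 += y - n * p
            u1 += t * (y - n * p)
            i00 += w
            i01 += w * t
            i11 += w * t * t
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    # Standard error from the information matrix of the final iteration.
    std_err = math.sqrt(i00 / det)
    z = b1 / std_err
    return b1, math.erfc(abs(z) / math.sqrt(2.0))

def holm_adjust(p_values):
    """Holm step-down adjustment, returned in the original order."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (len(p_values) - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Toy yearly counts (not the real data): word appearances out of total words.
years = [0, 1, 2, 3]
totals = [700, 700, 700, 700]
word_counts = {
    "trump": [2, 5, 60, 90],
    "ukraine": [60, 30, 10, 5],
    "us": [50, 52, 49, 51],
}
# Keep only words with at least 50 total appearances, as in the post.
eligible = {w: c for w, c in word_counts.items() if sum(c) >= 50}
fits = {w: binomial_glm_slope(years, c, totals) for w, c in eligible.items()}
adjusted = dict(zip(fits, holm_adjust([p for _, p in fits.values()])))
significant = {w: fits[w][0] for w in fits if adjusted[w] < 0.01}
```

In the toy data, "trump" should come out as a significant riser and "ukraine" as a significant decliner, while the flat "us" series does not pass the 0.01 threshold. In practice a statistics library would be a safer choice than a hand-written fitting loop.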

Unsurprisingly, words like Trump and Gaza appear again, but the GLM approach also identifies "Ukraine" as a significant decliner - a word missed using tf-idf scores.

You might be wondering what the "opportunity cost" of all the U.S. election/Trump stories from the past 1.5 years has been. Using the BBC's regional classification for stories (parsed from headline URLs), we can see that stories from Europe, the Middle East, and Africa have borne the bulk of the reduction in press coverage since 2016:

Sentiment Analysis

Switching to a different topic, Text Mining has a nice sentiment analysis section that demonstrates usage of the sentiment dictionaries found in the book's associated tidytext package. Below I break out BBC headline word frequencies by positive and negative categories, as found in the included "bing" lexicon.

Note that using these sentiment dictionaries does require some care. For example, "trump" was listed as a positive word. This may or may not ring true depending on the reader's political persuasion, but for the purpose of analytic objectivity I thought it best to remove the word from either category.
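The lexicon lookup, including the manual exclusion described above, can be sketched in Python like this. The tiny `LEXICON` dictionary is a hypothetical stand-in for the real "bing" lexicon, and the names `EXCLUDED` and `sentiment_counts` as well as the word list are invented for the example.

```python
from collections import Counter

# Hypothetical stand-in for a sentiment lexicon (word -> category).
LEXICON = {
    "attack": "negative",
    "crash": "negative",
    "bomb": "negative",
    "win": "positive",
    "trump": "positive",
}

# Words dropped from the analysis regardless of their lexicon label.
EXCLUDED = {"trump"}

def sentiment_counts(words):
    """Count words by (sentiment, word), skipping excluded and unlisted words."""
    counts = Counter()
    for word in words:
        if word in EXCLUDED:
            continue
        sentiment = LEXICON.get(word)
        if sentiment is not None:
            counts[(sentiment, word)] += 1
    return counts

# Toy tokens, not the real headline words.
words = ["trump", "attack", "crash", "attack", "win", "talks"]
counts = sentiment_counts(words)
```

Keeping the exclusions in their own set, rather than editing the lexicon, makes it easy to see exactly which words were overridden by hand.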

In any case, we find that BBC headline coverage is dominated by one-off destructive events, from "attack" and "crash" to "strike" and "bomb." Terror is still very effective at garnering media coverage.


Network Effects

I'd like to close with my favorite visual from Text Mining: word networks using igraph and ggraph. The network below visualizes connections between words appearing in the same headline (minimum 8 matches). Here, we can see the vast quantity of BBC coverage devoted to U.S. foreign policy, from Ukraine and Syria to interactions with Russia, Iran, and China. On the edges are some anomalous stories, such as the search for MH370 plus reports on Israel and the Gaza Strip.
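The edge list behind a network like this comes from counting how often two words share a headline and keeping only the pairs above a minimum count. Here is a minimal Python sketch of that counting step only (the drawing itself is left to a graph library). The function name `cooccurrence_edges` and the toy headlines are assumptions for illustration, and the toy threshold is 2 rather than the 8 used in the post.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(headlines, min_count):
    """Count unordered word pairs that share a headline.

    headlines is a list of word lists; returns {(word_a, word_b): count}
    for pairs seen in at least min_count headlines.
    """
    pair_counts = Counter()
    for words in headlines:
        # Sort the unique words so each pair has one canonical ordering.
        for pair in combinations(sorted(set(words)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

# Toy tokenized headlines, not the real data.
headlines = [
    ["us", "russia", "sanctions"],
    ["us", "russia", "talks"],
    ["mh370", "search"],
]
edges = cooccurrence_edges(headlines, min_count=2)
```

Each surviving pair becomes an edge in the graph, with the count available as an edge weight.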

Find all code for this analysis here.
