A Random Walker

Creation Stories: Analyzing YouTube Data With R

October 11, 2017 by Adam Walker

Earlier this year, YouTube introduced Creators On The Rise, a showcase for up-and-coming stars on the platform. For those of us with somewhat modest YouTube followings, these creators are VERY intriguing. What secrets enable their lavish subscriber growth? Is it the cadence of posts? The eye-catching thumbnails? Perhaps it's the video titles perfectly written to elicit a click...

Or maybe channel popularity is a function of plain old survivorship bias. Life is unfair!

Anyway, to analyze these new media success stories I turned to tuber, a handy R package that facilitates access to YouTube's API. Just make sure to register via Google's developer console for the necessary ID and key. With a little wrangling I was able to compile a dataset of 92 YouTube channels previously featured as "Creators On The Rise," including comprehensive metadata (title, description, tags, duration, views) for 9,410 videos. All of this plus my code is available on GitHub.
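For readers who want to try this themselves, here is a rough sketch of what a first call through tuber might look like. It is not the code from my repository. The helper name `fetch_channel_summary` and the environment variable names are made up for illustration, and the field names assume that tuber hands back the channel resource in the shape the YouTube Data API documents (`snippet`, `statistics`), so check the result with `str()` before relying on it.

```r
# Sketch only: needs the tuber package plus your own Google API credentials.
# fetch_channel_summary() and the YT_* environment variables are hypothetical.
fetch_channel_summary <- function(channel_id) {
  stats <- tuber::get_channel_stats(channel_id = channel_id)
  # Field names assumed from the YouTube Data API channel resource.
  data.frame(
    channel_id  = channel_id,
    title       = stats$snippet$title,
    subscribers = as.numeric(stats$statistics$subscriberCount),
    views       = as.numeric(stats$statistics$viewCount),
    videos      = as.numeric(stats$statistics$videoCount),
    stringsAsFactors = FALSE
  )
}

app_id     <- Sys.getenv("YT_APP_ID")
app_secret <- Sys.getenv("YT_APP_SECRET")
channel_id <- Sys.getenv("YT_CHANNEL_ID")

# Only talk to the API when credentials and a channel ID are actually set.
if (nzchar(app_id) && nzchar(app_secret) && nzchar(channel_id)) {
  tuber::yt_oauth(app_id = app_id, app_secret = app_secret)
  print(fetch_channel_summary(channel_id))
}
```

Looping a helper like this over a vector of channel IDs and binding the rows together is one way to end up with a channel-level table.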



"With more than 1,000 creators crossing the 1,000 subscriber threshold every single day, new talent is constantly emerging."

YouTube introductory blog post


Now, with data in hand, time to tackle some burning questions:

1. How Long Till I Become A YouTube Superstar?

The 92 creators represent an interesting mix of fast success versus long-term grind. Channels like Deestroying and Hailey Reese have amassed subscribers with abandon since their debut, while 007craft and Amazing Grays have taken a more leisurely approach:

[Chart: subscriber growth over time for the featured channels]

YouTube's selection criteria for a "Creator On The Rise" likely mean this chart represents an overly optimistic view of the time it takes to become a YouTube superstar. It's also worth noting that some creators may have had an existing following before creating their channel. Sophina The Diva, for example, has garnered 64K subscribers in a little over a month, but had the advantage of already being rather famous. Nevertheless, building a YouTube empire does not have to be a decades-long affair.


2. Will I Need A Viral Hit?

It doesn't hurt. Let's return to the previously mentioned 007craft, a solo creator channel with around 39K subscribers. Mr. 007craft (I'm not sure anyone knows his real name) attracted attention earlier this year as the guy who was living in a storage locker. His video on the experience has collected over 3.5M views, representing an outsized portion of his channel's 11.8M total.

Of course, creating a viral video is tough. One could toil for years without a fleeting whiff of fame. Only 55 videos in the (very much non-representative!) data have broken 1 million views, representing about 0.6% of the 9,410 total clips. Still, to have a shot you probably want to keep your clip under 15-20 minutes:

[Chart: video views versus video duration]

"FEEDING THE DEVIL | Spiders and Centipede" can be found here for those interested. I thought insect-averse readers would appreciate me omitting the thumbnail, so check out the second place video instead:

In any case, have all 92 Creators On The Rise had a true viral hit? Subsetting the data to only the highest-viewed video from each channel, we find that these top clips have a median view count of 553K. This is certainly a lofty bar for non-music videos, but at least your channel doesn't have to garner a "Charlie Bit My Finger" level of fame to gain subs. Overall: viral hits help, but aren't essential.
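The two numbers in this section (the share of videos past 1 million views, and the median of each channel's top video) boil down to a couple of lines of base R. The sketch below uses a tiny made-up data frame, so the column names and values are purely illustrative. Only the final line uses figures from this post.

```r
# Hypothetical toy data: one row per video.
videos <- data.frame(
  channel = c("a", "a", "a", "b", "b", "c"),
  views   = c(3500000, 120000, 45000, 1200000, 800, 9800),
  stringsAsFactors = FALSE
)

# Share of videos with at least 1 million views.
viral_share <- mean(videos$views >= 1e6)

# Highest-viewed video per channel, then the median of those maxima.
top_views  <- tapply(videos$views, videos$channel, max)
median_top <- median(top_views)

# The post's own figures: 55 of 9,410 videos broke 1 million views.
post_share_pct <- round(100 * 55 / 9410, 2)  # 0.58, i.e. about 0.6%
```

Taking the maximum per channel before the median is what keeps prolific channels from dominating the summary: every channel contributes exactly one clip.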


3. What Should I Make My Videos About?

All the channels analyzed used tags to help people discover their content. So tag liberally! For the graph below, I parsed tags from each individual video and computed pairwise correlations between all tags appearing in at least 10 channels and 100 videos. Groups of tags with high positive correlations indicate common topics covered by many creators:

[Graph: network of positively correlated video tags]

Most topics are fairly down-to-earth! There are fitness channels, family-oriented content, make-up tutorials and the like. And no politics! Perhaps YouTube is consciously refraining from naming political channels to the "On The Rise" section? Or maybe political takes aren't that compelling after all...we can only hope.
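For the curious, here is one way the tag correlation step could be sketched in base R. This is an illustration under assumptions rather than the code from my repository: tags are assumed to be stored as a single "|"-separated string per video, the data is invented, and the thresholds are shrunk to fit the toy example (the analysis above used 10 channels and 100 videos). The correlation of two 0/1 indicator columns is the phi coefficient, which is the usual choice for "do these two tags show up on the same videos?"

```r
# Hypothetical toy data: one row per video, tags packed into one string.
videos <- data.frame(
  video_id = c("v1", "v2", "v3", "v4", "v5", "v6"),
  channel  = c("a", "a", "b", "b", "c", "c"),
  tags     = c("fitness|workout", "fitness|workout|gym", "makeup|tutorial",
               "makeup|tutorial", "fitness|gym", "vlog|family"),
  stringsAsFactors = FALSE
)

# One row per (video, tag) pair.
tag_list <- strsplit(videos$tags, "|", fixed = TRUE)
long <- data.frame(
  video_id = rep(videos$video_id, lengths(tag_list)),
  channel  = rep(videos$channel, lengths(tag_list)),
  tag      = unlist(tag_list),
  stringsAsFactors = FALSE
)

# Keep tags that are common enough (toy thresholds, not the post's 10 and 100).
min_channels <- 1
min_videos   <- 2
n_videos   <- tapply(long$video_id, long$tag, function(x) length(unique(x)))
n_channels <- tapply(long$channel,  long$tag, function(x) length(unique(x)))
keep <- names(n_videos)[n_videos >= min_videos & n_channels >= min_channels]
long <- long[long$tag %in% keep, ]

# Video-by-tag 0/1 matrix, then pairwise (phi) correlations between tags.
incidence <- unclass(table(long$video_id, long$tag))
incidence[incidence > 1] <- 1L  # guard against a tag repeated on one video
tag_cor <- cor(incidence)
```

In the toy data, "makeup" and "tutorial" always appear together, so `tag_cor["makeup", "tutorial"]` comes out as 1, while "fitness" and "makeup" never co-occur among the remaining videos and land at -1.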


4. How Much Content Does A Top Content Creator Create?

Now that you have some ideas on potential channel topics, you might be wondering if it's possible to build a large following without quitting your day job. Maybe! Most creators are posting steadily but certainly not daily. The data below shows publication frequency over the past 52 weeks for all the channels created prior to the start of the year:

[Chart: publication frequency by channel over the past 52 weeks]

Side-note: CatPusic might be my favorite of all the channels in the data. Consider adding a cat to your channel for an instant popularity boost.

In terms of timing, Friday is the most popular day of the week for publishing new videos:

[Chart: videos published by day of the week]
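Both the posting cadence and the day-of-week tally come straight from each video's publish timestamp. A minimal base R sketch, with invented timestamps and an assumed window end date, might look like this:

```r
# Hypothetical toy data: publish times (UTC) and the channel for each video.
published <- as.POSIXct(
  c("2017-09-01 15:00:00", "2017-09-08 16:30:00",
    "2017-09-12 09:00:00", "2017-09-15 18:00:00"),
  tz = "UTC"
)
channel <- c("a", "a", "b", "b")

# Videos per week over a trailing 52-week window (window end is an assumption).
window_end   <- as.POSIXct("2017-10-10 00:00:00", tz = "UTC")
window_start <- window_end - as.difftime(52 * 7, units = "days")
in_window    <- published >= window_start & published <= window_end
per_week     <- table(channel[in_window]) / 52

# ISO weekday number (1 = Monday ... 7 = Sunday), which avoids locale-specific
# day names, followed by the most common publishing day.
iso_weekday <- format(published, "%u")
busiest_day <- names(which.max(table(iso_weekday)))  # "5", i.e. Friday
```

Dividing by a fixed 52 only makes sense for channels that existed for the whole window, which is why the chart above is limited to channels created before the start of the year.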

5. The Rich Life...What's That?

The attentive reader will have noticed an outlier in the second-to-last chart. The Rich Life leads the pack by far in terms of posting frequency, cranking out a ridiculous 6.5 videos weekly. For the uninitiated, The Rich Life follows a "homeschooling family of 7 that loves to share the good, the bad and maybe the occasional ugly." Of course, the idea of family as #brand is nothing new, but adding 200K subs in a little over a year is impressive regardless.

A cursory glance at The Rich Life's recent clips reveals a channel that clearly has its act together:

[Screenshot: recent video titles and thumbnails from The Rich Life]

The ALL CAPS titles alone entice a click, plus The Rich Life's thumbnails are attention-grabbing and effective. Also, the sheer variety of hijinks encountered is incredible! Note that if you have recently been kicked in the head by a horse, suffered a break-in, evaded a tornado, AND dealt with a police run-in...then you may have what it takes to be a top YouTube creator.


Final Thoughts

I hope this article has served as a somewhat tongue-in-cheek introduction to analyzing YouTube data via tuber. There's much, much more that could be done with what's available, from running sentiment analysis on video titles and descriptions to building models for subscriber growth or a video's likelihood of going viral. Once again, all project code can be found here. Thanks for reading!


Further Adventures In R:

  • Text Mining BBC Headlines With R
  • Presidential Approval Ratings Don't Mean Much Early On
  • TV Outliers: Game of Thrones, Breaking Bad... Grey's Anatomy?