A Random Walker


Creation Stories: Analyzing YouTube Data With R

October 10, 2017 by Adam Walker

Earlier this year, YouTube introduced Creators On The Rise, a showcase for up-and-coming stars on the platform. For those of us with somewhat modest YouTube followings, these creators are VERY intriguing due to their rapid growth on generally low budgets. How frequently do creators post videos, and when? Which strategies do they use to create appealing thumbnails and titles? And most importantly, how on earth do these channels produce hours and hours of content each month? It's a bit of a puzzle.

To analyze the traits of these new media success stories, I turned to tuber, a handy R package that facilitates access to YouTube's API. The package contains several easy-to-use functions to grab data on particular channels or videos. Just make sure to register via Google's developer console to get the necessary API ID and key.
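As a rough sketch of what that data pull might look like: the helper below is hypothetical (the name `fetch_channel` and its arguments are mine, not part of tuber), and it assumes tuber's `yt_oauth()`, `get_channel_stats()`, and `get_all_channel_video_stats()` behave as documented. The post does not say which tuber calls were actually used, and the shape of the returned objects may vary between tuber versions.

```r
# Hypothetical helper: needs the tuber package plus credentials from
# Google's developer console. Nothing is requested until it is called.
fetch_channel <- function(channel_id, app_id, app_secret) {
  # Authenticate once per session (opens a browser on first use).
  tuber::yt_oauth(app_id = app_id, app_secret = app_secret)

  list(
    # Channel-level figures such as subscriber and view counts.
    channel = tuber::get_channel_stats(channel_id = channel_id),
    # One row per uploaded video, with views, likes, and comments.
    videos  = tuber::get_all_channel_video_stats(channel_id = channel_id)
  )
}
```

Calling `fetch_channel("CHANNEL_ID", "YOUR_APP_ID", "YOUR_APP_SECRET")` with real values in place of those placeholders would return both pieces for one channel; looping over a vector of channel IDs would build up the full dataset.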

With a little bit of wrangling, I compiled a dataset of 92 channels previously featured as Creators On The Rise. Available variables include subscriber counts, likes, and comments, plus metadata on 9,410 published channel videos. All data and code are available on my GitHub.



"With more than 1,000 creators crossing the 1,000 subscriber threshold every single day, new talent is constantly emerging." 

YouTube introductory post


1. How Long Till I Become A YouTube Superstar?

The 92 creators represent an interesting mix of relatively fast success versus long-term grind. Channels like Deestroying and Hailey Reese amass subscribers with abandon, while 007craft and Amazing Grays take a more leisurely approach.

[Chart: time taken by each of the 92 creators to build their subscriber base]

YouTube's selection criteria for a "Creator On The Rise" likely mean this chart presents an overly optimistic view of the time it takes to become a YouTube superstar. It's also worth noting that some creators may have had an existing following before creating their channel. Sophina The Diva, for example, has garnered 64K subscribers in a little over a month, but had the advantage of already being rather famous. Nevertheless, building a YouTube empire does not have to be a decades-long affair.

2. But Do I Need A Viral Hit?

It doesn't hurt. Let's return to the previously mentioned 007craft. 007craft (I'm not sure anyone knows his real name) attracted attention earlier this year as the guy who was living in a storage locker. His video on the experience has collected over 3.5M views, representing an outsized portion of his channel's 11.8M total.

Of course, creating a viral video is tough. Only 55 videos in the data have broken 1 million views, about 0.6% of the total. Still, to have a shot you probably want to keep your clip under 15-20 minutes:

[Chart: video views by video length]

"FEEDING THE DEVIL | Spiders and Centipede" can be found here for those interested. I thought insect-averse readers would be glad to avoid checking out the thumbnail, so here is the second place video instead:

Subsetting the data to only the most-viewed video from each channel, we find the 92 videos have a median view count of 553K. This is certainly a lofty bar, but your channel doesn't need a "Charlie Bit My Finger" level of virality to add subs.
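For anyone who wants to reproduce these two summary numbers, here is a minimal base R sketch. The data frame is toy data invented for illustration, and the column names `channel` and `views` are assumptions, not the names in the real dataset.

```r
# Toy data: one row per video (values are made up).
videos <- data.frame(
  channel = c("A", "A", "B", "B", "C"),
  views   = c(1200, 300000, 90000, 2100000, 40000)
)

# Share of all videos that have passed 1 million views.
share_over_1m <- mean(videos$views > 1e6)

# Keep only each channel's most-viewed video, then take the median.
top_per_channel  <- aggregate(views ~ channel, data = videos, FUN = max)
median_top_views <- median(top_per_channel$views)
```

On the real data the same two steps would give the share of million-view videos and the median of the 92 channel-best videos.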

3. What Should I Make My Videos About?

All the channels use tags to help people find their content. Parsing individual tags and looking for positive correlations (i.e. a given pair of tags often appears in the same video), we can get a general idea of the kinds of topics creators are discussing:

[Chart: positively correlated pairs of video tags]
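One way to compute tag correlations of this kind is sketched below in base R; it is not necessarily the method behind the chart. It assumes the tags have already been parsed into one row per video-tag pair (the `video_id` and `tag` columns and all values are made up), builds a 0/1 video-by-tag matrix, and correlates the columns, which for binary data is the phi coefficient.

```r
# Toy data: one row per (video, tag) pair.
tags <- data.frame(
  video_id = c("v1", "v1", "v2", "v2", "v3", "v3", "v4"),
  tag      = c("fitness", "workout", "fitness", "workout",
               "makeup", "tutorial", "makeup"),
  stringsAsFactors = FALSE
)

# Video-by-tag incidence matrix: 1 if the video carries the tag, else 0.
incidence <- 1 * (unclass(table(tags$video_id, tags$tag)) > 0)

# Pearson correlation between tag columns (phi coefficient for 0/1 data).
tag_cor <- cor(incidence)

# Keep each positively correlated pair once (upper triangle only).
pairs <- which(upper.tri(tag_cor) & tag_cor > 0, arr.ind = TRUE)
positive_pairs <- data.frame(
  tag_a       = rownames(tag_cor)[pairs[, "row"]],
  tag_b       = colnames(tag_cor)[pairs[, "col"]],
  correlation = tag_cor[pairs]
)
```

With thousands of distinct tags it would be sensible to drop rare tags first, since a tag that appears in only one or two videos produces noisy correlations.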

Most of the content is fairly down to earth. There are fitness channels, family-oriented affairs, make-up tutorials and more. And very little politics! Perhaps YouTube is consciously refraining from naming political channels to the "On The Rise" section? Or maybe political takes aren't all that compelling after all...

4. How Much Content Does A Top Content Creator Create?

You might wonder if it's possible to build a large following without quitting your day job. Maybe! Most creators are posting steadily but certainly not daily. The data below shows publication frequency over the past 52 weeks for all the channels created prior to the start of the period:

[Chart: publication frequency by channel over the past 52 weeks]

CatPusic might be my favorite of all the channels in the data.
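A publication-frequency figure like the one above can be approximated with a few lines of base R. This is a sketch on invented data: the `channel` and `published` columns and the reference date are assumptions, and the extra filter for channels created before the window starts is left out for brevity.

```r
# Toy data: one row per published video (dates are made up).
videos <- data.frame(
  channel   = c("A", "A", "A", "B", "B"),
  published = as.Date(c("2017-09-04", "2017-09-06", "2017-09-13",
                        "2017-09-05", "2016-01-15"))
)

# 52-week window ending on an assumed reference date.
window_end   <- as.Date("2017-10-10")
window_start <- window_end - 52 * 7

recent <- videos[videos$published > window_start &
                 videos$published <= window_end, ]

# Videos per channel inside the window, then the weekly average.
counts          <- tapply(recent$published, recent$channel, length)
videos_per_week <- counts / 52
```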

In terms of timing, Friday is the most popular day of the week for publishing new videos:

[Chart: videos published by day of the week]
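The day-of-week tally is simple to sketch in base R. The timestamps below are made up, though they follow the ISO 8601 UTC format the YouTube API uses for publication times. Because they are in UTC, the computed day can differ from the creator's local day for videos posted late at night.

```r
# Toy publication timestamps in ISO 8601 UTC form (values are made up).
published_at <- c("2017-10-06T15:00:01.000Z", "2017-10-06T20:30:00.000Z",
                  "2017-10-04T12:00:00.000Z", "2017-09-29T18:45:10.000Z",
                  "2017-10-02T09:15:00.000Z")

# Keep the date part only, then look up the weekday (0 = Sunday).
published  <- as.Date(substr(published_at, 1, 10))
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day <- factor(day_labels[as.POSIXlt(published)$wday + 1], levels = day_labels)

# Count videos per weekday and pick the busiest one.
day_counts <- table(day)
busiest    <- names(day_counts)[which.max(day_counts)]
```

Using the numeric weekday from `as.POSIXlt()` rather than `weekdays()` keeps the labels independent of the machine's locale.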

5. The Rich Life...What's That?

The attentive reader will have noticed an outlier in the second to last chart. The Rich Life follows a "homeschooling family of 7 that loves to share the good, the bad and maybe the occasional ugly." Of course, the idea of family as #brand is definitely nothing new, but 200K subs in a little over a year is impressive regardless.

Beyond its outstanding publication frequency, the channel's clip game is on point:

[Screenshot: video titles and thumbnails from The Rich Life]

The titles alone entice you to click and find out what hilarious shenanigans have occurred, and the thumbnails are attention-grabbing and varied. Plus the sheer variety of hijinks is impressive! If you have recently been kicked in the head by a horse, suffered a break-in, evaded a tornado, and dealt with a police run-in...you may have what it takes to be a top YouTube creator.

6. Final Thoughts

Overall, the data available from the YouTube API is impressive.

There is lots more to do on this topic, including:
  • Modeling out the views
  • Analyzing the video titles and descriptions
  • Looking at the thumbnails

Dataset/code is here….
