Saturday, March 29, 2014

Do you know how to extract subtitles (Captions) from a YouTube video?

The word "subtitle" is the prefix "sub-" (below) followed by "title". On YouTube, captions are available only on videos where the owner has added them, and on certain videos where YouTube provides them automatically. Subtitles exist in two forms: open subtitles are "open to all" and cannot be turned off by the viewer; closed subtitles are designed for a certain group of viewers, and can usually be turned on/off or selected by the viewer.

Note: The terms captioning and subtitling may be confusing to some readers. Both terms relate to the addition of onscreen text that renders dialogue. (Sorry! I am not going to list the differences between them; rather, I will focus on how to extract them from a YouTube video.)

Subtitles can be used in a lot of ways! If English is not your mother tongue, you can watch the video in slow mode and follow the subtitles to learn it. You can follow the step-by-step instructions in video tutorials to learn a technique while taking notes. Subtitle files are supported by most video players. You can also save the transcript as a document for any video.

Here are some of the ways you can extract the captions or subtitles from a YouTube video:

Method 1: By Using 4K Video Downloader 
4K Video Downloader lets you download video, audio and subtitles from YouTube in high quality, as fast as your computer and connection will allow. Follow these few simple steps to download a video with subtitles from YouTube.

Step 1: Download and install the 4K Video Downloader
Download and install the 4K Video Downloader application from the download page. It's available for Mac OS X, Windows and Linux.

Step 2: Copy the YouTube video link from your address bar in the browser

Step 3: Press the "Paste Url" button in the 4K Video Downloader application.

Step 4: Choose the quality of the video, specify that you want to download subtitles, and select a language. After that, click "Download".

Method 2: Extracting the XML file of the video
  • Open the video page in the Chrome browser (or any other browser that provides HTTP debugging/Developer Tools) and pause the video.
  • Right-click anywhere on the page and click Inspect Element, or hit the F12 function key.
  • Click on the Network tab.
  • Under the Network tab, look for an item called timedtext.
  • Right-click on it and open that file in a new tab.
  • An XML file opens, containing the subtitles with their timestamps (the stuff inside of < >).
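
If you would rather read that file with a script than by eye, here is a minimal Python sketch. It assumes the file looks like the sample below: a `transcript` root with `text` elements whose `start` and `dur` attributes are in seconds. The real layout YouTube returns may differ, and `SAMPLE_XML` and `parse_timedtext` are names made up for this illustration.

```python
import html
import xml.etree.ElementTree as ET

# Assumed sample of a saved timedtext file; the real layout may differ.
SAMPLE_XML = """<transcript>
  <text start="0.5" dur="2.1">Hello and welcome</text>
  <text start="2.6" dur="3">Today we&amp;#39;ll extract subtitles</text>
</transcript>"""


def parse_timedtext(xml_text):
    """Return a list of (start_seconds, duration_seconds, text) tuples."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        # Caption text may be HTML-escaped a second time, so unescape it.
        text = html.unescape(node.text or "").strip()
        cues.append((start, dur, text))
    return cues


cues = parse_timedtext(SAMPLE_XML)
for start, dur, text in cues:
    print(f"{start:7.2f}s  {text}")
```

To try it on a real file, save the XML from the browser tab and pass its contents to `parse_timedtext` in place of `SAMPLE_XML`.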

To get rid of the timestamps and just have the plain transcript, here is what you have to do:
  • Open up Microsoft Excel
  • Copy and paste the subtitles into one cell
  • Press Ctrl+H
  • In the Replace tab, type <*> in the Find What textbox, leave the Replace With textbox blank, and click Replace All. The search expression will remove all tags within the original text.

Update: The original URL pattern (struck through) appears to have changed. The subtitles can be fetched only if captions are manually transcribed, i.e. not automatically generated.

Reading text in an XML file is for machines. To view just the text in an XML (or HTML) file, paste the XML content into EditPlus and use the Ctrl + Shift + P keyboard shortcut to convert it. With other editors that support regular expressions (you can try this in Notepad++ and Visual Studio), paste the XML content in and use the expression <(.|\n)*?> with the Find and Replace option to get just the plain text.
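
The same find-and-replace can be done in a few lines of Python if you have no such editor at hand. This is only a sketch: the `sample` string is an invented stand-in for the XML you copied, and `strip_tags` is a name chosen for this example. It uses the expression from the paragraph above, but replaces each tag with a space instead of nothing so that neighbouring captions do not run together.

```python
import html
import re

# Same expression as in the text: lazily match anything between < and >.
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_tags(markup):
    """Remove every <...> tag and collapse the leftover whitespace."""
    without_tags = TAG_PATTERN.sub(" ", markup)
    # Decode entities such as &amp; that remain once the tags are gone.
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


# Invented sample standing in for the copied XML content.
sample = (
    '<transcript><text start="0.5" dur="2.1">Hello\nworld</text>'
    '<text start="2.6" dur="3">Tom &amp; Jerry</text></transcript>'
)
plain = strip_tags(sample)
print(plain)
```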

Method 3: By Using Google2SRT
Google2SRT is a tool that can download "not embedded" subtitles (Closed Captions - CC) from YouTube/Google Video videos (if those are present) and convert them to a standard format (SubRip - SRT) supported by most video players. 

In the Google2SRT window, you can load subtitles from YouTube/Google Video by entering the video URL, then choose which subtitles you want to convert and where you want to save them.
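
To give an idea of what the SubRip (SRT) format looks like, here is a small Python sketch that writes captions out as SRT text. It is not how Google2SRT works internally; the `cues` list of (start, duration, text) values is made up, and `srt_timestamp` and `to_srt` are illustrative names. Each SRT entry is a running number, a line with the start and end times as HH:MM:SS,mmm, the caption text, and a blank line.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style SubRip uses."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, duration_seconds, text) tuples."""
    blocks = []
    for index, (start, dur, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(start + dur)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Joining with a newline leaves one blank line between entries.
    return "\n".join(blocks)


# Made-up captions: (start in seconds, duration in seconds, text).
cues = [
    (0.5, 2.1, "Hello and welcome"),
    (2.6, 3.0, "Today we'll extract subtitles"),
]
srt_text = to_srt(cues)
print(srt_text)
```

Saving `srt_text` to a file with the `.srt` extension, next to the video and with the same base name, is usually enough for a player to pick it up.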

Now you can enjoy watching your favorite subscribed videos by downloading them with subtitles!


