Recently, I worked on a side project where I needed to extract captions from YouTube videos. While there are plenty of tools available for this, I decided to create my own solution in Go. This was primarily for the sake of learning Go, and now, I’d like to share that experience with you.
YouTube API
My initial approach was to determine if extracting captions was feasible using the official YouTube API. I discovered an endpoint that seemed to allow downloading video captions, so I quickly developed a Go script with the official YouTube Go package to test it out.
This was my first mistake.
I should have spent more time reading the endpoint documentation: although it is not clearly stated, this API only works if you own the video, or if the video has third-party contributions enabled for captions.
Not all of that time was wasted, though. I also explored the playlist API, which will be useful for downloading the captions of an entire playlist.
// downloadPlaylistCaptions walks a playlist page by page, calling itself
// again for as long as the API returns a next page token.
func downloadPlaylistCaptions(service *youtube.Service, playlistID string, nextPageToken string) error {
	part := []string{"snippet"}
	playlistItemsList := service.PlaylistItems.List(part).PlaylistId(playlistID)
	if nextPageToken != "" {
		playlistItemsList = playlistItemsList.PageToken(nextPageToken)
	}

	items, err := playlistItemsList.Do()
	if err != nil {
		return fmt.Errorf("unable to list playlist items: %w", err)
	}

	for _, item := range items.Items {
		// ...
	}

	// An empty token means this was the last page.
	if items.NextPageToken != "" {
		return downloadPlaylistCaptions(service, playlistID, items.NextPageToken)
	}
	return nil
}
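The function above recurses once per page until the next page token comes back empty. The same traversal can also be written as a loop. Here is a minimal sketch of that idea that runs without API credentials: `page`, `fakePages`, `fetchPage` and `collectVideoIDs` are hypothetical stand-ins for the real `service.PlaylistItems.List(...).Do()` call, not part of the YouTube package.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one playlist items response.
type page struct {
	VideoIDs      []string
	NextPageToken string
}

// fakePages simulates a playlist split across three pages, keyed by page token.
var fakePages = map[string]page{
	"":   {VideoIDs: []string{"a", "b"}, NextPageToken: "p2"},
	"p2": {VideoIDs: []string{"c", "d"}, NextPageToken: "p3"},
	"p3": {VideoIDs: []string{"e"}},
}

// fetchPage is a hypothetical replacement for the real API call.
func fetchPage(token string) (page, error) {
	p, ok := fakePages[token]
	if !ok {
		return page{}, fmt.Errorf("unknown page token %q", token)
	}
	return p, nil
}

// collectVideoIDs walks every page iteratively instead of recursing.
func collectVideoIDs() ([]string, error) {
	var ids []string
	token := ""
	for {
		p, err := fetchPage(token)
		if err != nil {
			return nil, fmt.Errorf("unable to list playlist items: %w", err)
		}
		ids = append(ids, p.VideoIDs...)
		if p.NextPageToken == "" {
			return ids, nil // an empty token marks the last page
		}
		token = p.NextPageToken
	}
}

func main() {
	ids, err := collectVideoIDs()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ids)
}
```

For a playlist of ordinary size either form works; the loop simply avoids growing the call stack by one frame per page.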
Unofficial YouTube API
After researching other existing tools, I discovered a surprisingly simple method to download video captions.
All you need to do is open the YouTube video page, such as https://www.youtube.com/watch?v=mT0RNrTDHkI.
Then, open the browser console and enter:
ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer.captionTracks[0].baseUrl
This will give you a caption file URL.
Also note that the captionTracks array contains the captions for all the available languages.
From there, we just need a simple script that fetches the video page, extracts the ytInitialPlayerResponse variable, and reads the base URL and language code of each available caption. We can then download whichever caption interests us.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

const baseURL = "https://www.youtube.com/watch?v="

type ytInitialPlayerResponse struct {
	Captions struct {
		PlayerCaptionsTracklistRenderer struct {
			CaptionTracks []struct {
				BaseUrl      string `json:"baseUrl"`
				LanguageCode string `json:"languageCode"`
			} `json:"captionTracks"`
		} `json:"playerCaptionsTracklistRenderer"`
	} `json:"captions"`
}

type Caption struct {
	BaseUrl      string
	LanguageCode string
}
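
// Download fetches the caption file and writes it to targetPath.
func (c *Caption) Download(targetPath string) error {
	resp, err := http.Get(c.BaseUrl)
	if err != nil {
		return fmt.Errorf("unable to download caption: %w", err)
	}
	defer resp.Body.Close()

	// Avoid saving an error page as if it were a caption file.
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unable to download caption: unexpected status %s", resp.Status)
	}

	file, err := os.Create(targetPath)
	if err != nil {
		return fmt.Errorf("unable to create file: %w", err)
	}
	defer file.Close()

	_, err = io.Copy(file, resp.Body)
	if err != nil {
		return fmt.Errorf("unable to write file: %w", err)
	}
	return nil
}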

// listVideoCaptions fetches the video page and returns one Caption per
// available caption track.
func listVideoCaptions(videoID string) ([]Caption, error) {
	resp, err := http.Get(baseURL + videoID)
	if err != nil {
		return nil, fmt.Errorf("unable to download video page: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unable to download video page: unexpected status %s", resp.Status)
	}

	content, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("unable to read response body: %w", err)
	}
	pageContent := string(content)

	// Find the ytInitialPlayerResponse variable.
	parts := strings.Split(pageContent, "ytInitialPlayerResponse = ")
	if len(parts) < 2 {
		return nil, fmt.Errorf("unable to find ytInitialPlayerResponse variable")
	}

	// Find the end of the variable.
	parts = strings.Split(parts[1], ";</script>")
	if len(parts) < 2 {
		return nil, fmt.Errorf("unable to find the end of the ytInitialPlayerResponse variable")
	}

	// Use a variable name that does not shadow the type.
	var playerResponse ytInitialPlayerResponse
	err = json.Unmarshal([]byte(parts[0]), &playerResponse)
	if err != nil {
		return nil, fmt.Errorf("unable to unmarshal ytInitialPlayerResponse: %w", err)
	}

	tracks := playerResponse.Captions.PlayerCaptionsTracklistRenderer.CaptionTracks
	captions := make([]Caption, 0, len(tracks))
	for _, track := range tracks {
		captions = append(captions, Caption{
			BaseUrl:      track.BaseUrl,
			LanguageCode: track.LanguageCode,
		})
	}
	return captions, nil
}

func main() {
	if len(os.Args) < 2 {
		log.Fatalf("usage: %s <videoID>", filepath.Base(os.Args[0]))
	}
	videoID := os.Args[1]

	captions, err := listVideoCaptions(videoID)
	if err != nil {
		log.Fatalf("unable to list video captions: %v", err)
	}

	for _, caption := range captions {
		if caption.LanguageCode == "en" {
			err := caption.Download(fmt.Sprintf("%s.xml", videoID))
			if err != nil {
				log.Printf("unable to download caption: %v", err)
			}
		}
	}
}
And try it with:
go run main.go mT0RNrTDHkI && cat mT0RNrTDHkI.xml
And voilà! Here’s a link to the GitHub repository. The implementation has evolved a bit, but the idea remains the same.
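One fragile spot in the script is how it finds the end of the variable: splitting on `;</script>` depends on the exact text that follows the JSON object in the page. As a possible alternative, and not necessarily what the repository does, `json.Decoder` can read a single JSON value and stop there, ignoring whatever comes after it. The sketch below assumes a made-up page fragment and a hypothetical helper named `extractPlayerResponse`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const marker = "ytInitialPlayerResponse = "

// extractPlayerResponse returns the raw JSON object assigned to
// ytInitialPlayerResponse in the given HTML page.
func extractPlayerResponse(html string) (json.RawMessage, error) {
	_, rest, found := strings.Cut(html, marker)
	if !found {
		return nil, fmt.Errorf("unable to find ytInitialPlayerResponse variable")
	}
	// Decode reads exactly one JSON value and leaves the trailing
	// JavaScript and HTML unread, so no end delimiter is needed.
	var raw json.RawMessage
	if err := json.NewDecoder(strings.NewReader(rest)).Decode(&raw); err != nil {
		return nil, fmt.Errorf("unable to decode ytInitialPlayerResponse: %w", err)
	}
	return raw, nil
}

func main() {
	// A made-up page fragment; real pages are far larger.
	page := `<script>var ytInitialPlayerResponse = {"captions":{"x":";</script>"}};var meta = 1;</script>`
	raw, err := extractPlayerResponse(page)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(raw))
}
```

The returned `json.RawMessage` can then be passed to `json.Unmarshal` with the same struct as in the script. Note that `strings.Cut` requires Go 1.18 or later.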