Do more with help from AI
As a scientist, you can save hours by turning lengthy videos into text and summarizing them instead of watching them in full. You can do this for free, provided the video is on YouTube and has a transcript available.
You will use Google Colab and the YouTube Transcript API to fetch a video’s transcript automatically, then hand it to a conversational AI tool for summarization. Here’s a step-by-step guide to help you get started:
Prerequisites:
Before diving into the process, make sure you have the following:
- A Google account.
- Basic knowledge of copying and pasting 🙂
Step 1 – Copy YouTube Video ID:
- Find the YouTube video ID by going to the YouTube video, clicking the ‘Share’ button, and copying the ID. In the share link, the ID is the part after youtu.be/ and before any ‘?’.
- Paste the video ID into a document for later use.
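If you would rather copy the whole link than pick the ID out by hand, a few lines of Python can extract it. This is a minimal sketch rather than part of the original workflow: the helper name extract_video_id is hypothetical, the sample ID is made up, and the code assumes only the two common link shapes (https://youtu.be/<id> from the Share button and https://www.youtube.com/watch?v=<id> from the address bar).

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube link, or None if no ID is found.

    Hypothetical helper; handles only youtu.be share links and
    youtube.com/watch?v=... links.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or '').lower()

    # Share-button links look like https://youtu.be/<id>?si=...
    if host == 'youtu.be':
        video_id = parsed.path.lstrip('/').split('/')[0]
        return video_id or None

    # Address-bar links look like https://www.youtube.com/watch?v=<id>
    if host.endswith('youtube.com') and parsed.path == '/watch':
        values = parse_qs(parsed.query).get('v')
        return values[0] if values else None

    return None


# The ID below is a made-up placeholder, not a real video
print(extract_video_id('https://youtu.be/abc123XYZ_-?si=example'))
print(extract_video_id('https://www.youtube.com/watch?v=abc123XYZ_-&t=42s'))
```

Other link shapes (embeds, Shorts, playlists) return None here, so check the result before using it.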
Step 2 – Create a new Python notebook in Google Colab:
- Go to https://colab.research.google.com and log in with your Google account.
- Click on “New Notebook” to create a new notebook.
Step 3 – Copy the provided code and paste into a Google Colab Notebook:
- Copy the code provided below and paste it into your notebook:
# First, install the youtube_transcript_api package
!pip install youtube_transcript_api

from youtube_transcript_api import YouTubeTranscriptApi
import re


def youtube_transcribe(video_id):
    """
    Given a YouTube video ID, this function will return the transcribed text.
    """
    # Older releases of the package expose a static get_transcript();
    # newer releases are assumed to replace it with an instance-level fetch().
    if hasattr(YouTubeTranscriptApi, 'get_transcript'):
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
    else:
        transcript = YouTubeTranscriptApi().fetch(video_id).to_raw_data()

    # Join the caption snippets into one string
    result_text = ''
    for component in transcript:
        result_text += ' ' + component['text']

    # Clean the text
    result_text = re.sub(r'\[.*?\]', '', result_text)  # remove everything within brackets
    result_text = re.sub(r'\(.*?\)', '', result_text)  # remove everything within parentheses
    result_text = re.sub(r'\n', ' ', result_text)  # remove line breaks
    result_text = re.sub(r'\s+', ' ', result_text)  # collapse extra spaces
    return result_text


# Replace the placeholder text below with your YouTube video's ID
video_id = 'Add YouTube Video ID HERE'

# Getting the transcript
transcript = youtube_transcribe(video_id)

# Writing the transcript to a .txt file
with open('transcript.txt', 'w', encoding='utf-8') as f:
    f.write(transcript)
- Paste the code from above into your newly created Google Colab notebook.
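To see what the cleaning step inside youtube_transcribe does, you can try the same four substitutions on a short made-up string. This is an illustrative sketch: the helper name clean_caption_text and the sample text are hypothetical, and the function only mirrors the regular expressions from the code above.

```python
import re


def clean_caption_text(text):
    """Apply the same four substitutions used in youtube_transcribe."""
    text = re.sub(r'\[.*?\]', '', text)  # drop bracketed cues such as [Music]
    text = re.sub(r'\(.*?\)', '', text)  # drop parenthesized asides
    text = re.sub(r'\n', ' ', text)      # turn line breaks into spaces
    text = re.sub(r'\s+', ' ', text)     # collapse runs of whitespace
    return text


# Made-up caption text with a cue, a line break, and extra spaces
sample = ' [Music] welcome back\n(applause) to the   lab'
print(repr(clean_caption_text(sample)))
```

The cues and the line break disappear, but a single leading space is left behind; calling .strip() on the result removes it if that matters for your use.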
Step 4 – Paste Video ID and Run the Code:
- You are almost ready to get your video transcript. First, paste the video ID from Step 1 into the code, replacing the placeholder text between the quotes on the video_id line. Then click the run button.
Step 5 – Get the Transcript:
- You will know the code has finished running when a small check mark appears next to the ‘Run’ button. Once it appears, click the folder icon in the left sidebar and download transcript.txt.
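The download can also be triggered from a code cell instead of the folder icon. The sketch below is a hedged illustration, not part of the original steps: the helper name transcript_summary is hypothetical, and the code assumes transcript.txt was written to the notebook's working directory by the earlier cell. The google.colab module is only available inside a Colab runtime, so the import is wrapped in a fallback.

```python
import os


def transcript_summary(path='transcript.txt'):
    """Return (character count, word count) for the saved transcript file."""
    with open(path, encoding='utf-8') as f:
        text = f.read()
    return len(text), len(text.split())


if os.path.exists('transcript.txt'):
    chars, words = transcript_summary()
    print(f'transcript.txt: {chars} characters, {words} words')
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import files
        files.download('transcript.txt')  # opens the browser download prompt
    except ImportError:
        print('Not running in Colab; the file is in the current directory.')
else:
    print('transcript.txt not found; run the Step 3 cell first.')
```

Printing the counts first is a quick check that the transcript is not empty before you download it.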
Step 6 – Converse with the transcript:
- Copy the full transcript into ChatGPT or another conversational AI tool. Ask the AI to summarize the transcript or to answer questions about its key points. This lets you hold a natural dialogue to digest and analyze the content further.
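A transcript of a long video may be too large to paste into a chat tool in one message. One possible workaround, sketched here under assumptions, is to split the text into word-bounded parts and paste them one at a time. The helper name split_into_chunks is hypothetical, and the 1500-word default is an arbitrary illustrative value, not a limit published by any particular tool.

```python
def split_into_chunks(text, max_words=1500):
    """Split text into pieces of at most max_words words each.

    max_words is an arbitrary illustrative default, not a limit
    published by any particular chat tool.
    """
    if max_words < 1:
        raise ValueError('max_words must be at least 1')
    words = text.split()
    return [
        ' '.join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Hypothetical stand-in for the contents of transcript.txt
transcript = 'word ' * 3200
chunks = split_into_chunks(transcript)
for number, chunk in enumerate(chunks, start=1):
    print(f'Part {number} of {len(chunks)}: {len(chunk.split())} words')
```

Splitting on word boundaries keeps every part readable, although a sentence can still be cut between two parts, so it helps to tell the AI that more parts will follow.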
Following this simple workflow enables you to leverage AI to efficiently process and gain insights from hours of video content.
Happy experimenting!