Turn Entire YouTube Playlists into Markdown-Formatted, Refined Textbooks (in Any Language)

Give it any YouTube playlist (an entire course, for instance) and receive a clean, formatted, and structured file with all the details of that playlist.
It's a simple yet effective script using the free Google Gemini API.
I haven't found any free tool that works at this scale, so I made one.
Check it out: https://github.com/Ebrizzzz/Youtube-playlist-to-formatted-text
The project is divided into two main processing stages:
Transcript Extraction:
Using the pytube library and youtube_transcript_api, the application fetches the transcript for each video. It handles both full playlists and single video URLs. The extracted text is then consolidated into a single text file.
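To make the extraction stage concrete, here is a minimal sketch of the consolidation step, not the project's actual code. The transcript fetcher is passed in as a callable (in practice it would wrap youtube_transcript_api), so the sketch runs offline. The function names and the "Video URL:" section header are assumptions made for illustration.

```python
from typing import Callable, Iterable
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return the video ID from a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query)["v"][0]


def consolidate_transcripts(
    video_urls: Iterable[str],
    fetch_transcript: Callable[[str], str],
) -> str:
    """Fetch each transcript and join them into one text, one section per video."""
    sections = []
    for url in video_urls:
        video_id = video_id_from_url(url)
        try:
            text = fetch_transcript(video_id)
        except Exception as error:  # e.g. the video has no transcript
            text = f"[transcript unavailable: {error}]"
        # Hypothetical section header; the real file format may differ.
        sections.append(f"Video URL: {url}\n{text}\n")
    return "\n".join(sections)


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without network access.
    fake_fetch = lambda video_id: f"transcript of {video_id}"
    urls = [
        "https://www.youtube.com/watch?v=abc123&list=PLxyz",
        "https://youtu.be/def456",
    ]
    print(consolidate_transcripts(urls, fake_fetch))
```

Catching the exception per video means one missing transcript does not abort a whole playlist; the placeholder text marks the gap in the output file.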
AI-Powered Refinement:
The extracted transcript is then processed by Google’s Gemini API. Each chunk of a video is sent along with a context prompt that instructs Gemini to reorganize the content — adding headings, bullet points, and other formatting elements — without omitting any details. The final output is a neatly formatted markdown file.
Chunking and Context Strategy for Consistent Refinement
One of the key challenges of processing long-form transcripts is maintaining a consistent flow and structure across multiple chunks. Here’s how the process works:
Segmentation by Video:
The entire transcript is first divided into separate sections for each video. This ensures that every video’s content is processed individually and retains its unique context.
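A rough sketch of this segmentation step, assuming each video's transcript in the consolidated file starts with a "Video URL: ..." header line. That header format and the function name are assumptions for illustration; the project may mark video boundaries differently.

```python
import re

# Hypothetical header line written before each video's transcript.
VIDEO_HEADER = re.compile(r"^Video URL: (\S+)[ \t]*$", re.MULTILINE)


def split_by_video(consolidated: str) -> list[tuple[str, str]]:
    """Return (url, transcript) pairs, one per video, in file order."""
    matches = list(VIDEO_HEADER.finditer(consolidated))
    sections = []
    for index, match in enumerate(matches):
        start = match.end()
        # A section runs until the next header, or to the end of the text.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(consolidated)
        sections.append((match.group(1), consolidated[start:end].strip()))
    return sections
```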
Dividing Each Video into Chunks:
For each video, the transcript is further split into chunks based on the number of words. This step is critical because the Gemini API has a limited context window. We empirically set a chunk size (e.g., 3000 words) to strike a balance between detail and manageability.
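Word-based chunking can be sketched in a few lines. The function name is made up for this example, and the default of 3000 words simply mirrors the figure mentioned above.

```python
def chunk_by_words(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]
```

Splitting on whitespace ignores sentence boundaries, so a chunk can end mid-sentence. That is one reason the context reminder for later chunks matters.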
Refinement with Contextual Prompts:
First Chunk: The initial chunk of a video is sent directly to Gemini (with the fixed prompt) for refinement. This establishes the baseline structure and style.
Subsequent Chunks: For every following chunk, the prompt is prefixed with a context reminder that includes the previously refined text. Essentially, it tells Gemini, “This is what you already processed; now, refine what comes next.” This method ensures that the AI doesn’t treat each chunk in isolation but instead builds upon previous content, maintaining a consistent narrative flow.
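The first-chunk and subsequent-chunk logic might look roughly like the sketch below. The prompt wording is a placeholder rather than the project's real prompt, and `generate` is a hypothetical callable standing in for the Gemini request, so the example runs without an API key. It also assumes only the most recently refined chunk is passed along as context.

```python
from typing import Callable

# Placeholder instruction; the project's actual prompt text is not reproduced here.
BASE_PROMPT = (
    "Reorganize the following transcript into well-structured markdown with "
    "headings and bullet points. Do not omit any details."
)


def build_prompt(chunk: str, previous_refined: str = "") -> str:
    """Build the prompt for one chunk, prefixing earlier output when available."""
    if not previous_refined:
        # First chunk: fixed prompt only.
        return f"{BASE_PROMPT}\n\nTranscript:\n{chunk}"
    # Later chunks: remind the model what it has already produced.
    return (
        "Context (already refined, do not repeat it):\n"
        f"{previous_refined}\n\n"
        f"{BASE_PROMPT}\n\nTranscript:\n{chunk}"
    )


def refine_video(chunks: list[str], generate: Callable[[str], str]) -> str:
    """Refine a video's chunks in order, carrying the last result forward."""
    refined_parts: list[str] = []
    previous = ""
    for chunk in chunks:
        refined = generate(build_prompt(chunk, previous))
        refined_parts.append(refined)
        previous = refined
    return "\n\n".join(refined_parts)
```

Passing the model call in as a parameter keeps the chunk loop independent of any particular Gemini client version and makes it easy to test with a stub.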
Choosing the Right Context Size:
Selecting the appropriate context size is tricky. It needs to be small enough to fit within the Gemini API’s input and output limits, so that no detail is dropped or over-summarized, yet large enough to give the model sufficient context to avoid hallucinations and inconsistencies. This balance is crucial for generating coherent and accurate refinements across the entire video transcript.
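One simple way to cap the context is to keep only the tail of the previously refined text, cutting at line boundaries so markdown headings and bullets stay intact. This is an illustrative sketch, not necessarily how the project does it; the function name and the 500-word default are assumptions.

```python
def trim_context(refined_text: str, max_words: int = 500) -> str:
    """Keep the trailing whole lines of refined_text that fit in max_words."""
    kept: list[str] = []
    budget = max_words
    # Walk backwards so the most recent lines are kept first.
    for line in reversed(refined_text.splitlines()):
        count = len(line.split())
        if count > budget:
            break
        kept.append(line)
        budget -= count
    return "\n".join(reversed(kept)).strip()
```

A larger `max_words` gives the model more to anchor on but leaves less room for the new chunk and its output, which is exactly the trade-off described above.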
Key Features
- Automatic Transcript Extraction: Seamlessly pulls transcripts from YouTube playlists or individual video URLs.
- Language Support: Users can specify the output language. Although English is the default, the application supports other languages — give it a try!
- PyQt5 Graphical Interface: A simple yet effective GUI built with PyQt5.
- Batch Processing & Gemini Refinement: The application handles large playlists by processing each video sequentially and using a robust prompt for the Gemini API to ensure coherent, refined output.
- Customizable Gemini Model: Choose from different Gemini models (e.g., gemini-1.5-flash, gemini-2.0-flash) depending on your needs.