Devlog: YouTube Transcript and Summary Python Code

in voilk •  13 days ago

    Disclaimer: This devlog was partly generated by AI. Of course, I've gone over it for accuracy and personality.



    Devlog: Creating a YouTube Summarizer Using NanoGPT

    Over the past week, I worked on perfecting my workflow for putting summaries of YouTube videos on the blockchain. I'm not the first to attempt this: @taskmaster4450le is a big advocate of filling this blockchain with data, and @mightpossibly has already made a summarization bot that you can pay to access.

    My workflow and the resulting summaries are a bit different from his, though. At first, it was a manual process, but I started automating parts of it, and I finally reached a complete version.

    Below are the scripts I wrote with ChatGPT's help before combining their functionality into one automated script.

    Background 1: Fetching Transcriptions

    yt_transcript.py:

    import sys
    import re
    import os
    from youtube_transcript_api import YouTubeTranscriptApi
    from bs4 import BeautifulSoup
    import requests
    
    def extract_video_id(url):
        # Accept watch?v=ID links as well as /v/, /e/, /embed/, /shorts/ and youtu.be/ forms
        regex = r'(?:youtube\.com/(?:watch\?(?:.*&)?v=|(?:v|e|embed|shorts)/)|youtu\.be/)([a-zA-Z0-9_-]{11})'
        match = re.search(regex, url)
        
        if match:
            return match.group(1)
        return None
    
    def get_video_url():
        if len(sys.argv) > 1:
            for arg in sys.argv[1:]:
                if not arg.startswith('-'):
                    return arg
        
        video_url = input("No video URL provided. Please enter a YouTube video URL: ")
        return video_url
    
    def fetch_video_title(video_id):
        try:
            print(f"Fetching video title for video ID '{video_id}'")
            url = f"https://www.youtube.com/watch?v={video_id}"
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')
            title = soup.find("title").text
            # Remove " - YouTube" from the end of the title if present
            if title.endswith(" - YouTube"):
                title = title[:-10]
            return title
        except Exception as e:
            print(f"Error fetching video title: {e}")
            return None
    
    def fetch_transcript(video_id):
        try:
            print(f"Fetching transcript for video ID '{video_id}'")
            transcript = YouTubeTranscriptApi.get_transcript(video_id)
            print("Transcript fetched successfully.")
            
            transcript_text = "\n".join([entry['text'] for entry in transcript])
            return transcript_text
        except Exception as e:
            print(f"Error: {e}")
            return None
    
    def sanitize_filename(name):
        return re.sub(r'[^a-zA-Z0-9-_]', '-', name).strip('-')
    
    def save_transcript(title, transcript):
        sanitized_title = sanitize_filename(title)
        filename = f"{sanitized_title}.txt"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(f"{title}\n\nTranscript:\n{transcript}")
        print(f"Transcript saved to '{filename}'")
    
    if __name__ == "__main__":
        video_url = get_video_url()
        video_id = extract_video_id(video_url)
    
        if not video_id:
            print("Error: Invalid YouTube URL. Unable to extract video ID.")
        else:
            title = fetch_video_title(video_id)
            if not title:
                print("Error: Unable to fetch video title.")
            else:
                transcript = fetch_transcript(video_id)
                if transcript:
                    save_transcript(title, transcript)
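
    As a side note on the URL handling, here is a minimal alternative sketch that pulls the video ID out with urllib.parse instead of a single regex. The function name video_id_from_url and the list of accepted hosts and path prefixes are my own assumptions for illustration, not part of the script above, and the list is probably not exhaustive.

```python
import re
from urllib.parse import urlparse, parse_qs

# An 11-character YouTube video ID (letters, digits, "_" and "-").
ID_PATTERN = re.compile(r'^[A-Za-z0-9_-]{11}$')

def video_id_from_url(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    if '://' not in url:
        url = 'https://' + url  # urlparse needs a scheme to locate the host
    parsed = urlparse(url)
    host = (parsed.hostname or '').lower()
    if host.startswith('www.'):
        host = host[4:]

    candidate = None
    if host == 'youtu.be':
        # Short links: the ID is the first path segment
        candidate = parsed.path.lstrip('/').split('/')[0]
    elif host in ('youtube.com', 'm.youtube.com'):
        segments = [s for s in parsed.path.split('/') if s]
        if segments and segments[0] == 'watch':
            # Regular links: the ID is the "v" query parameter
            candidate = parse_qs(parsed.query).get('v', [None])[0]
        elif len(segments) >= 2 and segments[0] in ('embed', 'shorts', 'v', 'e'):
            candidate = segments[1]

    if candidate and ID_PATTERN.match(candidate):
        return candidate
    return None

print(video_id_from_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```

    Parsing the URL first means extra query parameters (like a timestamp or playlist) don't get in the way, and links from unrelated sites are rejected.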
    
    

    Background 2: Prompting the AI's API

    For this code, I'm using the NanoGPT API, which lets me communicate with many large language models, including Llama 3.3 70B, the one I'll be using. I pay for it with the $NANO cryptocurrency.

    nanogpt.py:

    import requests
    import json
    import os
    import argparse
    from dotenv import load_dotenv
    
    # Load API key from .env file
    load_dotenv()
    API_KEY = os.getenv("API_KEY")
    
    # Constants
    BASE_URL = "https://nano-gpt.com/api"
    OUTPUT_FILE = "response_output.txt"
    
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    
    def talk_to_gpt(prompt, system_prompt=None, model="llama-3.3-70b", messages=None):
        """
        Send a prompt to the NanoGPT API.
    
        Args:
            prompt (str): The main user prompt.
            system_prompt (str, optional): The system-level prompt for setting context. Defaults to None.
            model (str, optional): The model to use. Defaults to "llama-3.3-70b".
            messages (list, optional): Conversation history messages. Defaults to None (no history).
    
        Returns:
            str: The API response, or None if there was an error.
        """
        # Copy the history so neither the caller's list nor a shared default is mutated
        messages = list(messages) if messages else []
        if system_prompt:
            messages.insert(0, {"role": "system", "content": system_prompt})
    
        data = {
            "prompt": prompt,
            "model": model,
            "messages": messages
        }
    
        try:
            response = requests.post(f"{BASE_URL}/talk-to-gpt", headers=headers, json=data)
            if response.status_code == 200:
                return response.text
            else:
                print(f"Error {response.status_code}: {response.text}")
                return None
        except requests.RequestException as e:
            print("An error occurred:", e)
            return None
    
    def save_response_to_file(response):
        """
        Save the response to a text file, ensuring Unicode characters are written correctly.
        
        Args:
            response (str): The response to save.
        """
        with open(OUTPUT_FILE, "w", encoding="utf-8") as file:
            file.write(response)
    
    def main():
        """
        Parse command-line arguments and interact with the NanoGPT API.
        """
        # Argument parsing
        parser = argparse.ArgumentParser(description="Interact with the NanoGPT API.")
        parser.add_argument("-p", "--prompt", type=str, help="The main prompt to send to the GPT model.")
        parser.add_argument("-s", "--system", type=str, help="The system-level prompt to set context (optional).")
        parser.add_argument("-m", "--model", type=str, default="llama-3.3-70b", help="The model to use (default: llama-3.3-70b).")
        args = parser.parse_args()
    
        # Check if a prompt is provided; if not, ask the user
        prompt = args.prompt or input("Enter your prompt: ")
        system_prompt = args.system
        model = args.model
    
        # Example messages (modify as needed)
        messages = [
            {"role": "user", "content": "I'll provide you with the transcript video now."},
            {"role": "assistant", "content": "Please go ahead and share the transcript of the video. I'll be happy to assist you with anything you need."}
        ]
    
        # Call the function
        response = talk_to_gpt(prompt, system_prompt=system_prompt, model=model, messages=messages)
        if response:
            # Split the response to separate the text and NanoGPT info
            parts = response.split('<NanoGPT>')
            text_response = parts[0].strip()
    
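            # Decode escaped Unicode characters; keep the raw text if it is not a valid JSON string body
            try:
                decoded_text_response = json.loads(f'"{text_response}"')
            except json.JSONDecodeError:
                decoded_text_response = text_response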
    
            # Extract the NanoGPT info
            try:
                nano_info = json.loads(parts[1].split('</NanoGPT>')[0])
    
                # Combine and format the final output as JSON
                final_output = {
                    "response_text": decoded_text_response,
                    "nano_gpt_info": nano_info
                }
    
                # Pretty print the JSON and save to a .txt file
                formatted_output = json.dumps(final_output, indent=4, ensure_ascii=False)
                print("Formatted Output:", formatted_output)
                save_response_to_file(formatted_output)
    
                print(f"Response saved to {OUTPUT_FILE}")
            except (IndexError, json.JSONDecodeError):
                print("Failed to parse NanoGPT info.")
                save_response_to_file(f"Response: {decoded_text_response}\n\nError parsing NanoGPT info.")
        else:
            print("Failed to get response from GPT")
    
    # Ensure the script can be imported and used elsewhere
    if __name__ == "__main__":
        main()
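
    The response handling in main() boils down to splitting the reply at the <NanoGPT> tag and parsing what's inside it as JSON. Here is a small sketch of that step as a standalone function. The name split_response is hypothetical, and the sample string (including its "cost" field) is made up to match the shape the script expects; I'm not claiming this is the exact payload the API returns.

```python
import json

def split_response(raw):
    """Split a raw reply into (text, info).

    info is the parsed JSON found between <NanoGPT> and </NanoGPT>,
    or None when the block is missing or is not valid JSON.
    """
    text, tag, rest = raw.partition('<NanoGPT>')
    if not tag:
        # No metadata block at all: everything is the text answer
        return raw.strip(), None
    payload = rest.partition('</NanoGPT>')[0]
    try:
        info = json.loads(payload)
    except json.JSONDecodeError:
        info = None
    return text.strip(), info

# Made-up sample in the shape the script expects; the field name is hypothetical.
sample = 'Hello there.\n<NanoGPT>{"cost": 0.001}</NanoGPT>'
print(split_response(sample))
```

    Returning None for the info part, instead of raising, lets the caller still save the text answer when the metadata block is missing or malformed.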
    
    

    The App Devlog

    Step 1: Setting the Foundation

    I wanted to build a script that could fetch YouTube transcripts, send them to NanoGPT, and generate a meaningful summary. The outputs should be in a neat Markdown format.

    So, the main function of the script takes a YouTube URL, fetches its transcript, sends it to NanoGPT, and saves the result.

    Code Snippet:

    video_id = extract_video_id(video_url)
    transcript = fetch_transcript(video_id)
    response = talk_to_gpt(transcript)
    print(response)
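
    The snippet above assumes every step succeeds. A rough sketch of the same flow with early exits could look like the following. The function summarize_video and its parameters are hypothetical, and the three steps are passed in as arguments so the flow can be tried offline with stand-in functions instead of real network calls.

```python
def summarize_video(video_url, extract_id, fetch_transcript, summarize):
    """Run the three steps in order; return None as soon as one fails."""
    video_id = extract_id(video_url)
    if not video_id:
        return None
    transcript = fetch_transcript(video_id)
    if not transcript:
        return None
    return summarize(transcript)

# Stand-in functions so the flow can be tried offline (all hypothetical).
result = summarize_video(
    "https://youtu.be/abcdefghijk",
    extract_id=lambda url: url.rsplit("/", 1)[-1],
    fetch_transcript=lambda vid: f"transcript for {vid}",
    summarize=lambda text: text.upper(),
)
print(result)
```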
    

    Step 2: Dynamic File Naming

    The next step was to save summaries with filenames based on the YouTube video title, after sanitizing it of special characters. This prevents summaries of different videos from overwriting each other and makes files easy to identify.

    Milestone:

    Each output file is named dynamically with the format Summary-{SanitizedTitle}.md.

    Code Snippet:

    sanitized_title = sanitize_filename(title)
    filename = f"Summary-{sanitized_title}.md"
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(summary)
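
    One edge case worth keeping in mind: a title made entirely of non-Latin characters sanitizes down to an empty string, which would produce a file called Summary-.md. Below is a hedged sketch of a slightly stricter variant. The name safe_filename and the "untitled" fallback are my own assumptions, and unlike the original function it also collapses runs of replaced characters into a single hyphen.

```python
import re

def safe_filename(title, fallback="untitled"):
    """Replace runs of unsupported characters with one hyphen.

    Falls back to a placeholder when nothing usable is left.
    """
    cleaned = re.sub(r'[^a-zA-Z0-9_]+', '-', title).strip('-')
    return cleaned or fallback

print(safe_filename("My Video: Part 1 (2024)!"))
print(f"Summary-{safe_filename('日本語')}.md")
```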
    

    Step 3: Enhanced Help Messages

    To make the script user-friendly, I used argparse, whose built-in -h and --help flags print the usage instructions I wrote for each argument. This ensures clarity for anyone using the script.

    Milestone:

    The script now supports --help to explain all the arguments.

    Code Snippet:

    parser = argparse.ArgumentParser(
        description="Fetch YouTube transcript and generate summary using NanoGPT."
    )
    parser.add_argument("-u", "--url", type=str, help="The YouTube video URL.")
    parser.add_argument("-p", "--prompt", type=str, help="The main prompt.")
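
    To see what argparse gives you for free, here is a small self-contained sketch that parses a sample argument list instead of the real command line. Wrapping the parser in a build_parser function is my own choice for the example; the arguments are the same two as in the snippet above.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Fetch YouTube transcript and generate summary using NanoGPT."
    )
    parser.add_argument("-u", "--url", type=str, help="The YouTube video URL.")
    parser.add_argument("-p", "--prompt", type=str, help="The main prompt.")
    return parser

# Parse a sample argument list instead of sys.argv.
args = build_parser().parse_args(["-u", "https://youtu.be/abcdefghijk"])
print(args.url)     # the URL passed with -u
print(args.prompt)  # None, because -p was not given

# The help text is generated from the description and help strings.
print(build_parser().format_help())
```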
    

    Step 4: Default System Prompt

    A default system prompt was added to guide the summarization process. If a custom prompt is provided, it takes priority.

    Milestone:
    The script uses a high-quality default prompt but allows custom overrides.

    Code Snippet:

    DEFAULT_SYSTEM_PROMPT = """Your output should use the following template:
    
    "### Title
    
    One_Paragraph
    
    ### Heading 1
    - ✅ Bulletpoint
    - ✅ Bulletpoint
    etc
    
    ### Heading 2
    - ✅ Bulletpoint
    - ✅ Bulletpoint
    etc"
    
    Below is a transcript from a Youtube video. Clean the transcript's text, then write a summary from the video content.
    
    You Must:
    - Generate a title for the summary, especially if the video title was misleading.
    - Write a high-quality one-paragraph summary in 120 words or less.
    - Convert the transcript into bullet points, maintaining all the key information.
    - Each bullet point should be concise, focusing on one main idea. Expand on it a bit when appropriate.
    - Every bullet point starts with a suitable emoji (to replace ✅) based on its text.
    - Categorize bullet points that follow the same topic under an appropriately titled subheading, ordered by their mention order from the video's transcript.
    - If an idea linked to a time/date in the video, mention the time and date.
    - If an idea was reinforced with an example, mention the example.
    - Be as similar as possible to the video's voice and manner of speech. Avoid redundant language."""
    
    if args.system:
        system_prompt = args.system
    else:
        system_prompt = DEFAULT_SYSTEM_PROMPT
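
    One thing to watch when the system prompt gets inserted into a message list: in Python, a list used as a default argument is shared between calls, so inserting into it makes it grow every time the function runs. Here is a minimal sketch of a way around that, using a hypothetical helper called build_messages that always works on a copy.

```python
def build_messages(system_prompt=None, history=None):
    """Return a new message list with the system prompt first, if any."""
    # Copy the history so the caller's list is never modified
    messages = list(history) if history else []
    if system_prompt:
        messages.insert(0, {"role": "system", "content": system_prompt})
    return messages

history = [{"role": "user", "content": "Hi"}]
first = build_messages("Summarize the transcript.", history)
second = build_messages("Summarize the transcript.", history)

# Both results have 2 messages and the original history still has 1
print(len(first), len(second), len(history))
```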
    

    Step 5: Human-Readable JSON Output

    I improved the NanoGPT response formatting by displaying the JSON part in a human-readable way, both in the terminal and the Markdown file. The AI model outputs emoji, which were harder to make compatible with the console.

    Milestone:
    JSON outputs are now neatly indented for better readability.

    Code Snippet:

    formatted_json = json.dumps(nano_info, indent=4)
    f.write(f"\n\n<NanoGPT>\n{formatted_json}\n</NanoGPT>")
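
    On the emoji point: by default json.dumps escapes every non-ASCII character, so an emoji ends up as a \uXXXX sequence. Passing ensure_ascii=False keeps it as-is, which reads better in a UTF-8 file. The sample dictionary below is made up for the example. For the console side, sys.stdout.reconfigure(encoding="utf-8") may help on terminals that choke on emoji, though I'd treat that as something to test on your own setup.

```python
import json

info = {"note": "✅ done", "cost": 0.001}  # made-up sample data

escaped = json.dumps(info, indent=4)
readable = json.dumps(info, indent=4, ensure_ascii=False)

print(escaped)  # the emoji appears as a \uXXXX escape

# Both strings decode back to the same data
print(json.loads(escaped) == json.loads(readable))
```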
    

    Step 6: Structuring the Final Output

    Finally, I structured the output to include the video title and link at the top, followed by the summary, with clear separations.

    Code Snippet:

    f.write(f"Title: {title}\nLink: {video_url}\n\n{summary}\n\n\n\n{transcript}")
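
    Building the file content as a string first, and only then writing it, makes the layout easy to check without touching the disk. A small sketch of that idea follows; the function name build_document is hypothetical, and the layout is the one from the snippet above.

```python
def build_document(title, video_url, summary, transcript):
    """Assemble the file content: header, summary, then transcript."""
    header = f"Title: {title}\nLink: {video_url}"
    # One blank line after the header; a larger gap before the transcript
    return f"{header}\n\n{summary}\n\n\n\n{transcript}"

doc = build_document("Demo", "https://youtu.be/abcdefghijk", "### Summary", "line one")
print(doc.splitlines()[0])
```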
    

    Full Code

    yt_ai_summary.py:

    import sys
    import re
    import os
    from youtube_transcript_api import YouTubeTranscriptApi
    from bs4 import BeautifulSoup
    import requests
    import argparse
    from dotenv import load_dotenv
    import json
    
    # Load API key from .env file
    load_dotenv()
    API_KEY = os.getenv("API_KEY")
    
    # Constants for NanoGPT API
    BASE_URL = "https://nano-gpt.com/api"
    OUTPUT_FILE = "response_output.txt"
    
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    
    # Default system prompt
    DEFAULT_SYSTEM_PROMPT = """Your output should use the following template:
    
    "### Title
    
    One_Paragraph
    
    ### Heading 1
    - ✅ Bulletpoint
    - ✅ Bulletpoint
    etc
    
    ### Heading 2
    - ✅ Bulletpoint
    - ✅ Bulletpoint
    etc"
    
    Below is a transcript from a Youtube video. Clean the transcript's text, then write a summary from the video content.
    
    You Must:
    - Generate a title for the summary, especially if the video title was misleading.
    - Write a high-quality one-paragraph summary in 120 words or less.
    - Convert the transcript into bullet points, maintaining all the key information.
    - Each bullet point should be concise, focusing on one main idea. Expand on it a bit when appropriate.
    - Every bullet point starts with a suitable emoji (to replace ✅) based on its text.
    - Categorize bullet points that follow the same topic under an appropriately titled subheading, ordered by their mention order from the video's transcript.
    - If an idea linked to a time/date in the video, mention the time and date.
    - If an idea was reinforced with an example, mention the example.
    - Be as similar as possible to the video's voice and manner of speech. Avoid redundant language."""
    
    def extract_video_id(url):
        # Accept watch?v=ID links as well as /v/, /e/, /embed/, /shorts/ and youtu.be/ forms
        regex = r'(?:youtube\.com/(?:watch\?(?:.*&)?v=|(?:v|e|embed|shorts)/)|youtu\.be/)([a-zA-Z0-9_-]{11})'
        match = re.search(regex, url)
        if match:
            return match.group(1)
        return None
    
    def fetch_video_title(video_id):
        try:
            print(f"Fetching video title for video ID '{video_id}'")
            url = f"https://www.youtube.com/watch?v={video_id}"
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')
            title = soup.find("title").text
            if title.endswith(" - YouTube"):
                title = title[:-10]
            return title
        except Exception as e:
            print(f"Error fetching video title: {e}")
            return None
    
    def fetch_transcript(video_id):
        try:
            print(f"Fetching transcript for video ID '{video_id}'")
            transcript = YouTubeTranscriptApi.get_transcript(video_id)
            print("Transcript fetched successfully.")
            transcript_text = "\n".join([entry['text'] for entry in transcript])
            return transcript_text
        except Exception as e:
            print(f"Error: {e}")
            return None
    
    def sanitize_filename(name):
        return re.sub(r'[^a-zA-Z0-9-_]', '-', name).strip('-')
    
    def save_summary_to_file(summary, transcript, title, video_url, nano_info):
        sanitized_title = sanitize_filename(title)
        filename = f"Summary-{sanitized_title}.md"
        with open(filename, 'w', encoding='utf-8') as f:
            # Write the title and video link
            f.write(f"### Title:\n{title}\n")
            f.write(f"### Link:\n{video_url}\n\n")  # 2 new lines separating the content
    
            # Write the summary and transcript with 4 empty lines in between
            f.write(f"{summary}" + "\n" * 5 + f"{transcript}")
            
            # Write the formatted NanoGPT response in human-readable JSON
            f.write(f"\n\n<NanoGPT>\n{json.dumps(nano_info, indent=4)}\n</NanoGPT>")
            
        print(f"Summary saved to '{filename}'")
    
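    def talk_to_gpt(prompt, system_prompt=None, model="llama-3.3-70b", messages=None):
        # Copy the history so a shared default list is never mutated between calls
        messages = list(messages) if messages else []
        if system_prompt:
            messages.insert(0, {"role": "system", "content": system_prompt})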
    
        data = {
            "prompt": prompt,
            "model": model,
            "messages": messages
        }
    
        try:
            response = requests.post(f"{BASE_URL}/talk-to-gpt", headers=headers, json=data)
            if response.status_code == 200:
                return response.text
            else:
                print(f"Error {response.status_code}: {response.text}")
                return None
        except requests.RequestException as e:
            print("An error occurred:", e)
            return None
    
    def main():
        # Argument parsing with help description
        parser = argparse.ArgumentParser(
            description="Fetch YouTube transcript and generate summary using NanoGPT.\n\n"
                        "This script fetches the transcript of a YouTube video using its video URL. "
                        "The transcript is passed as input to the NanoGPT API, which generates a summary. "
                        "The output summary is printed in the terminal and saved to a file, along with the original transcript."
        )
        parser.add_argument("-u", "--url", type=str, help="The YouTube video URL.")
        parser.add_argument("-p", "--prompt", type=str, help="The main prompt to send to the GPT model.")
        parser.add_argument("-s", "--system", type=str, help="The system-level prompt to set context (optional).")
        parser.add_argument("-m", "--model", type=str, default="llama-3.3-70b", help="The model to use (default: llama-3.3-70b).")
        args = parser.parse_args()
    
        # Get video URL
        video_url = args.url or input("Enter your YouTube video URL: ")
        video_id = extract_video_id(video_url)
        if not video_id:
            print("Error: Invalid YouTube URL. Unable to extract video ID.")
            return
    
        title = fetch_video_title(video_id)
        if not title:
            print("Error: Unable to fetch video title.")
            return
    
        transcript = fetch_transcript(video_id)
        if not transcript:
            print("Error: Unable to fetch transcript.")
            return
    
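        # Set system prompt: default or passed in argument
        system_prompt = args.system or DEFAULT_SYSTEM_PROMPT
        # A custom prompt is sent together with the transcript rather than replacing it
        prompt = f"{args.prompt}\n\n{transcript}" if args.prompt else transcript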
    
        # Get response from NanoGPT
        response = talk_to_gpt(prompt, system_prompt=system_prompt, model=args.model)
        if not response:
            print("Error: Unable to get response from NanoGPT.")
            return
    
        print("NanoGPT response:")
        print(response)
    
        # Extract NanoGPT info from the response
        try:
            parts = response.split('<NanoGPT>')
            text_response = parts[0].strip()
            # Decode JSON part
            nano_info = json.loads(parts[1].split('</NanoGPT>')[0])
    
            # Save summary and transcript to file with human-readable JSON
            save_summary_to_file(text_response, transcript, title, video_url, nano_info)
        except (IndexError, json.JSONDecodeError):
            print("Error parsing NanoGPT info.")
    
    if __name__ == "__main__":
        main()
    
    
    

    Final Thoughts

    I co-developed the code with ChatGPT incrementally, asking for one feature at a time and testing it as I went. From fetching transcripts to structuring the output with Markdown and JSON, each step brought the project closer to what I envisioned. I hope the final script is user-friendly enough.

    What’s next?

    I'm thinking of making the code post these summaries on HIVE as well, but I don't know how best to approach it yet. I'd love to add GUI functionality too! But for now, I’m just happy with how far this has come.

    Thanks for Reading.~

    Posted Using InLeo Alpha
