Automating Link Uploads to NotebookLM Using Python and Playwright

by Nathan Purvis

Background

Google’s NotebookLM is a fantastic tool that uses Gemini to digest multiple sources and generate all kinds of resources, including podcasts, mind maps, study guides and more. But if you’ve ever tried to add more than a handful of links manually, you’ll know first-hand that the existing process is a pain. With no bulk-add option and no official API, you’re left with no choice other than to copy, paste and repeat.

That’s where this project comes in! I recently built a lightweight Python solution using Playwright to automate the entire source upload flow. The goal? Allow users to supply a list of web/YouTube links, hit run, and end up with a new notebook auto-filled with those sources - up to the platform’s limit of 300 sources.

Why Playwright?

Having previously used Selenium for browser automation in the form of web scraping, I wanted to see whether another tool would suit this project better. I eventually landed on Playwright due to:

  • Support for both headless and headed modes - allowing easy testing and troubleshooting before clean execution upon release

  • Easy Chromium installation and support

  • Simple, seamless handling of persistent login states

  • Integration with Python and clean APIs for modern, dynamic sites like NotebookLM

Managing and persisting Google Login

As NotebookLM is a Google product, we need a Google account for authentication. We obviously don’t want to hard-code credentials in our code base, and bypassing login isn’t an option; nor would we want to, since we need to be able to access our notebooks after creating them! Here’s where set_login_state.py steps in to save the day. Running this script:

  1. Launches a fresh browser session

  2. Opens NotebookLM and waits for manual login

  3. Stores cookies and local/session storage to a state.json file in your directory (don’t worry, this is already in .gitignore!)

The result? You can now run the rest of the project, which references this file, without having to continuously re-authenticate. Your login state is retained without having to compromise on security.
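As a rough illustration, here is a minimal sketch of how a script like set_login_state.py could work, assuming Playwright's synchronous API and its storage_state method (which, per the Playwright docs, captures cookies and local storage). The function name save_login_state, the NOTEBOOKLM_URL and STATE_FILE constants and the wait_for_login parameter are assumptions made for this example, not the project's actual code.

```python
from pathlib import Path

# Assumed values for illustration; the real script may differ.
NOTEBOOKLM_URL = "https://notebooklm.google.com/"
STATE_FILE = Path("state.json")


def save_login_state(playwright, state_file=STATE_FILE, wait_for_login=input):
    """Open a headed browser, pause for a manual login, then save the session."""
    # Headed mode, so the user can type their credentials themselves.
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto(NOTEBOOKLM_URL)

    # Block until the user confirms they have finished logging in.
    wait_for_login("Log in to NotebookLM in the browser, then press Enter here...")

    # Persist the authenticated session to disk.
    context.storage_state(path=str(state_file))
    browser.close()
    return state_file


def main():
    # Imported lazily so the helper above can be exercised without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        save_login_state(p)
```

Calling main() would open the browser and write the state file once you press Enter. Passing the pause in as wait_for_login keeps the function easy to exercise with a stand-in object instead of a real browser.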

Automating Source Uploads

Now we move to the heart of the project - main.py. This script does all of the heavy lifting, looping through all of your sources and adding them to a new notebook. The key steps are:

  1. Loading the login state created by set_login_state.py

  2. Navigating to NotebookLM and creating a new notebook

  3. Iterating through a CSV of links in the /sources subdirectory using helper functions

    1. file_handler.py manages file operations and validations, ensuring source files are correctly formatted and located

    2. links.py handles the process of reading these links and preparing them for upload

    3. Both scripts can be found in the /functions subdirectory
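The first two steps above can be sketched roughly as follows, assuming Playwright's synchronous API, where new_context accepts a storage_state path. The open_notebooklm function and the NOTEBOOKLM_URL constant are hypothetical names for this example, and the step that actually creates the notebook is left out because it depends on UI selectors not shown here.

```python
from pathlib import Path

NOTEBOOKLM_URL = "https://notebooklm.google.com/"  # assumed entry point


def open_notebooklm(playwright, state_file="state.json", headless=True):
    """Launch Chromium with a saved login state and open NotebookLM."""
    if not Path(state_file).is_file():
        raise FileNotFoundError(
            f"{state_file} not found; run set_login_state.py first."
        )
    browser = playwright.chromium.launch(headless=headless)
    # Reuse the saved session instead of logging in again.
    context = browser.new_context(storage_state=str(state_file))
    page = context.new_page()
    page.goto(NOTEBOOKLM_URL)
    return browser, page
```

Checking for the state file up front gives a clearer error than letting the browser launch and then fail on a missing login.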

Before we can start the real magic, file_handler.py conducts a bit of pre-processing, running some sense checks on the source files to make sure they are in the right location, populated and named correctly. It also prints some informational output, such as the number of blank rows, in case the user wasn’t aware of these.
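To give a feel for this kind of pre-processing, here is a minimal sketch using only the standard library. It assumes a CSV with one link per row in the first column; the read_links function and its messages are made up for this example and are not taken from file_handler.py.

```python
import csv
from pathlib import Path


def read_links(csv_path):
    """Validate a source CSV and return (links, blank_row_count)."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Source file not found: {path}")
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")

    links, blank_rows = [], 0
    with path.open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Treat a row as blank if it has no cells or its first cell is empty.
            first = row[0].strip() if row else ""
            if first:
                links.append(first)
            else:
                blank_rows += 1

    if not links:
        raise ValueError(f"No links found in {path.name}")
    if blank_rows:
        print(f"Note: skipped {blank_rows} blank row(s) in {path.name}")
    return links, blank_rows
```

Failing early with a specific message means a misplaced or empty file is caught before the browser is ever launched.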

Onwards to the actual flow: Playwright has all of the necessary functionality built in to keep this process running smoothly. Take a code block like this, for example:

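import re

# Find the chip label for the source type (e.g. "Website"), ignoring case.
link_button = page.locator(
    "span.mdc-evolution-chip__text-label",
    has_text=re.compile(re.escape(source_type), re.I),
)
# Wait until the element is in the DOM; click() then waits until it is actionable.
link_button.wait_for(state="attached")
link_button.click()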

In this code block, the locator method finds the span element holding the chip's text label that matches the given source type (e.g., 'Website' or 'YouTube'), ignoring case. The wait_for call pauses until that element is attached to the DOM. Note that 'attached' only means the element exists in the page, not that it is visible or clickable. The click method then simulates a user click, and Playwright's built-in actionability checks make it wait until the element is visible, stable and enabled before clicking. Together, these steps stop the script from trying to click an element that hasn't loaded yet.

Looping through the sources is handled by links.py. You’ll notice two if statements that check whether we’re on the first or the last iteration, based on the number of URLs provided. On the first iteration, we need to create a new notebook before anything can be added.

On the last iteration, there are no sources left to add, so we skip clicking the ‘Add source’ button rather than continuing the flow.
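The first/last logic can be sketched as below. This is an illustration only: the ui object and its create_notebook, add_link and click_add_source methods are hypothetical stand-ins for the Playwright clicks, and the real links.py may be structured differently.

```python
def upload_links(ui, links):
    """Add every (source_type, url) pair to one new notebook."""
    last_index = len(links) - 1
    for index, (source_type, url) in enumerate(links):
        if index == 0:
            # First iteration: a notebook has to exist before anything is added.
            ui.create_notebook()

        ui.add_link(source_type, url)

        if index != last_index:
            # More links to come: reopen the 'Add source' dialog for the next one.
            ui.click_add_source()
    return len(links)
```

Hiding the browser actions behind a small object like ui also means the loop can be checked with a simple recorder class, without opening NotebookLM at all.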

Once we’ve broken out of this loop and finished adding all sources to our notebook, we close out by setting the title provided by the end-user and reporting the execution time for a bit of extra information.
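Reporting the execution time could look something like the sketch below, which uses time.perf_counter from the standard library. The timed and format_elapsed helpers are hypothetical names for this example. Setting the title itself is not shown, as it depends on UI selectors not covered in this post.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def format_elapsed(seconds):
    """Render a duration as minutes and zero-padded seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"
```

For example, format_elapsed(125.4) returns "2m 05s", so a closing message might read "Notebook created in 2m 05s".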

Note: If you ever want to watch the process in action, simply change the following in main.py:

headless=True

To:

headless=False

Future ideas

As of now, this project works great for Website and YouTube links, but I’ve been toying with a few ideas if there is enough demand, such as:

  • Adding ‘Copied text’ support for users who have notes saved locally in .txt files

  • Turning this into a CLI tool to make the user experience smoother

More thoughts and suggestions are warmly welcomed, as are pull requests!

Final thoughts

This was an incredibly enjoyable build - very simple on the surface, but with some key nuances once you get into the detail of the UI flow and handling things like account login. If you’re spending more time uploading links than actually using NotebookLM, give this a try - it’s a small but powerful script that gives you back a bunch of time!

You can find the GitHub repo here, complete with an easy-to-follow README containing more information and steps to get started. Make sure to give it a star if you find it useful!

As usual, if you have any additional suggestions, feedback, or requests for future content then please do reach out!
