After 6 months of sporadic vibe coding sessions and banging my head against my desk, I have a working MVP — Web Gems.
What’s a Web Gem?
If you’re a millennial male you may get nostalgic flashbacks of late summer nights watching this Baseball Tonight segment.
But these Web Gems are highlights from podcasts. I’m a podcast addict. Every morning I’m either listening to Prof G, My First Million, or something AI related.
I use Snipd to listen to my podcasts. Made for podcast power listeners, the iOS app allows you to highlight parts of a podcast. So if I hear something that I find interesting I can come back to it.
The problem I noticed is that I never went back to the highlights.
So I discovered that Snipd had an integration with Notion. I could send my “snips” to a Notion database and store the content (see here): highlight names, speakers, and the transcript.
Still, I never took the time to go back to Notion and look at the different highlights. But at least the highlights were in a database, which meant I could work with them programmatically. I started thinking about different things I could do with the data, like automated newsletters or LinkedIn posts.
And then I thought, wouldn’t it be cool to have a video feed of all of the content that stuck out from the podcasts? I could then send the videos to colleagues and friends whenever I heard something relevant.
Really that was the main goal, having video links I could send to whoever I wanted. I never had a grand vision of shipping something that could become a real business. I just wanted it to exist for my own personal use and I thought it would be a cool project to work on while also building up more technical skills.
On a more macro basis, I believe we’re all drowning in vast oceans of content. Useful and powerful insights are here today and gone tomorrow. Too much to consume. Web Gems is my personal life raft.
Below is the whole process. Skip ahead to the end if you just want some vibe coding tips.
Parsing Notion
First, I had to figure out some key things. For one, how do I get the podcast highlights out of Notion and into a better data structure? The way Snipd sends the content to the Notion database is less than ideal. The main object at the row level is the podcast episode, not the highlight. If I highlighted two moments in one episode, they would both be in one row. The highlight content was stored in the episode’s page using Notion toggle blocks (see below).
I worked with Cursor to understand the structure and build a function that could parse the data and extract the individual highlights. It wasn’t perfect or easy, but eventually the parser extracted the data correctly and consistently.
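Here’s a minimal sketch of the kind of parsing involved, using the official notion-client SDK. The block traversal is standard Notion API behavior; what gets pulled out of each toggle is simplified compared to my actual parser.

```python
from notion_client import Client

notion = Client(auth="NOTION_API_TOKEN")  # placeholder token

def extract_highlights(episode_page_id: str) -> list[dict]:
    """Return one dict per toggle block (i.e., per snip) on an episode page."""
    highlights = []
    blocks = notion.blocks.children.list(block_id=episode_page_id)  # pagination omitted for brevity
    for block in blocks["results"]:
        if block["type"] != "toggle":
            continue
        title = "".join(rt["plain_text"] for rt in block["toggle"]["rich_text"])
        # The toggle's children hold the transcript text for that snip
        children = notion.blocks.children.list(block_id=block["id"])
        transcript = " ".join(
            rt["plain_text"]
            for child in children["results"]
            if child["type"] == "paragraph"
            for rt in child["paragraph"]["rich_text"]
        )
        highlights.append({"title": title, "transcript": transcript})
    return highlights
```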
Moving to a Real Backend & Finding YouTube Videos
Notion’s databases are useful but I needed something with more storage capabilities that would work well with a React app.
Supabase is the database of choice for vibe coders. I was introduced to Supabase through a course I took on using Cursor. It’s a cloud-hosted PostgreSQL database with easy-to-use features.
Once I figured out how to extract the highlights from Notion, I could easily put them into a Supabase table. Now that they were neatly stored, I had to figure out how to find the matching YouTube video for each episode. I thought that would be easy. It wasn’t. With simple search-query logic I consistently ran into bugs and downloaded the wrong videos.
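Storing them was the easy half. Roughly, the load into Supabase looks like this with the supabase-py client (the table and column names are from my own schema, so treat them as placeholders):

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def store_highlights(highlights: list[dict]) -> None:
    rows = [
        {
            "title": h["title"],
            "transcript": h["transcript"],
            "podcast": h.get("podcast"),
            "snip_start_seconds": h.get("start_seconds"),  # timestamp from Snipd
        }
        for h in highlights
    ]
    # insert() stages the rows; execute() sends the request to Postgres
    supabase.table("highlights").insert(rows).execute()
```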
The YouTube search was the part that needed rethinking. Instead of generic searches on the podcast name and episode name, I changed the logic to first find the podcast’s channel on YouTube. Each of the podcasts I listen to has its own YouTube channel and channel ID, so I built a lookup table with the YouTube details for my function to reference when it came across a highlight. If it was a Prof G highlight, it would check the Prof G podcast channel. This required some manual effort (and still does whenever I add a new podcast), but it was much more accurate.
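The lookup-table idea is roughly this. The sketch below uses the YouTube Data API v3 for the channel-scoped search (one way to do it; yt-dlp’s search can work too), and the channel IDs are placeholders, not the real ones:

```python
from googleapiclient.discovery import build  # google-api-python-client

youtube = build("youtube", "v3", developerKey="YOUTUBE_API_KEY")  # placeholder key

# Hand-maintained mapping from podcast name to YouTube channel ID (placeholders)
CHANNEL_LOOKUP = {
    "The Prof G Pod": "UCxxxxxxxxxxxxxxxxxxxxxx",
    "My First Million": "UCyyyyyyyyyyyyyyyyyyyyyy",
}

def find_episode_video(podcast: str, episode_title: str) -> str | None:
    """Search only inside the podcast's own channel for the episode video."""
    channel_id = CHANNEL_LOOKUP.get(podcast)
    if channel_id is None:
        return None  # new podcast: add its channel ID to the table manually
    response = (
        youtube.search()
        .list(part="snippet", channelId=channel_id, q=episode_title,
              type="video", maxResults=1)
        .execute()
    )
    items = response.get("items", [])
    if not items:
        return None
    return f"https://www.youtube.com/watch?v={items[0]['id']['videoId']}"
```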
I then downloaded each episode’s full video content to my laptop.
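Downloading is the easy part once you have the right URL. With yt-dlp’s Python API it’s a few lines (the output format and paths here are just illustrative):

```python
from yt_dlp import YoutubeDL

def download_episode(video_url: str, out_dir: str = "episodes") -> None:
    opts = {
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # saved as <video-id>.mp4
    }
    with YoutubeDL(opts) as ydl:
        ydl.download([video_url])
```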
Smart Video Clipping
After I had the video, I needed to build functions to find the part of the video to clip. Luckily, Snipd provides timestamps in the data it sends to Notion, and I was storing them in my Supabase table. So after extracting the full video from YouTube with yt-dlp, I used the timestamps in a clipping function to create the video highlights.
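The cutting itself can be done with a standard ffmpeg call. This is a simplified sketch of the kind of clipping function I mean (ffmpeg has to be installed separately):

```python
import subprocess

def cut_clip(source: str, start: float, end: float, dest: str) -> None:
    """Cut the [start, end] range (in seconds) out of source and write it to dest."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start),           # seek to the highlight start
            "-i", source,
            "-t", str(end - start),      # keep only the highlight duration
            "-c:v", "libx264", "-c:a", "aac",  # re-encode for clean cut points
            dest,
        ],
        check=True,
    )
```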
Sounds easy enough, right? But again, it wasn’t.
See, the podcast audio can be slightly different from the YouTube video that gets posted. The main reason is ads, particularly at the beginning of episodes. So for a lot of my favorite podcasts, the video would be a few minutes off from the highlight’s timestamp, which makes a huge difference when you’re trying to cut 90-second clips.
Big headache.
I originally looked at downloading the captions from the YouTube API and storing them locally. I could then match what I had in my Supabase/Notion transcript column to the YouTube captions. However, YouTube captions aren’t reliable, and the way they’re uploaded varies: creators can auto-generate them or manually upload their own caption files, and most podcasts auto-generate. The main issue I had with the auto-generated captions was that they didn’t have timestamps, so my matching system wouldn’t work.
Then I started to think about how I could create my own transcripts for the videos. This was something I originally wanted to avoid as I wanted to do my best to keep the project simple. But I had hit a log jam and needed a better solution.
Using AI to Transcribe Videos
I knew AI transcription had become more accessible to devs over the last several years. I was also familiar with using OpenAI’s API from other projects.
So I started researching and testing OpenAI’s Whisper. Whisper transcribes audio into natural language and has a paid API. Crucially, the Whisper API can also return timestamps, which I couldn’t reliably get from the YouTube API. For a given chunk of podcast audio, each segment of the transcript carries a timestamp, meaning that if I could identify the matching transcript segment, I could get the true timestamp for the highlight in the podcast video I downloaded.
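Here’s roughly what the transcription call looks like with OpenAI’s Python SDK. The verbose_json response format is what gets you per-segment timestamps; the file path is just an example:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe_with_timestamps(audio_path: str):
    """Transcribe an audio file and return Whisper's timestamped segments."""
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",  # includes start/end for each segment
        )
    # Each segment carries .start and .end (seconds) plus .text
    return transcript.segments
```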
My fear was that transcribing my videos would be too expensive. But the cost was actually quite low. For example in the month of June I transcribed 13 hours of content for only $4.87. Shoutout to the team at OpenAI for building this product and making it so cheap.
Even after I had the Whisper transcripts, I still had to build a matching algorithm. Basically, I had to match the Whisper transcript chunks (usually a sentence long) against the transcripts I had from Notion/Supabase, and I had to create a scoring system to avoid false positives, i.e. matches driven by common words like “the”, “and”, etc.
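My actual scoring is fiddlier than this, but the shape of the idea is simple: score each Whisper segment by word overlap with the highlight transcript, ignore filler words, and only accept matches above a threshold. The stopword list and threshold below are illustrative, not my exact values:

```python
# Words too common to count as evidence of a match (abbreviated list)
STOPWORDS = {"the", "and", "a", "an", "to", "of", "in", "is", "it", "that", "i", "you"}

def tokens(text: str) -> set[str]:
    return {w.strip(".,?!\"'").lower() for w in text.split()} - STOPWORDS

def overlap_score(segment_text: str, highlight_text: str) -> float:
    seg, hl = tokens(segment_text), tokens(highlight_text)
    if not seg:
        return 0.0
    return len(seg & hl) / len(seg)  # fraction of meaningful words that match

def best_matching_segment(segments: list, highlight_text: str):
    """Return the Whisper segment (with a .text attribute) that best matches, or None."""
    if not segments:
        return None
    scored = [(overlap_score(s.text, highlight_text), s) for s in segments]
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score >= 0.6 else None  # threshold guards against weak matches
```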
I also had to make sure the audio I was sending to Whisper’s API was close enough to the highlight. As I mentioned previously, the podcast timestamps could be off by a few minutes. So if I didn’t build in smart logic to consider which podcast I was clipping, I would transcribe the wrong part of the podcast episode and get no match.
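In practice that meant pulling a generously padded audio window around Snipd’s (possibly offset) timestamp before sending it to Whisper. Something like this, where the padding values are my guess at how far an ad-laden upload can drift rather than anything exact:

```python
import subprocess

def extract_audio_window(video_path: str, approx_start: float, out_path: str,
                         pad_before: float = 180.0, pad_after: float = 240.0) -> None:
    """Extract an audio-only window around the rough highlight start for Whisper."""
    start = max(0.0, approx_start - pad_before)
    duration = pad_before + pad_after
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-i", video_path,
         "-t", str(duration),
         "-vn", "-ac", "1", "-ar", "16000",  # audio only, mono, 16 kHz
         out_path],
        check=True,
    )
```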
Building a Video Feed
Figuring out the backend processes took a good amount of time but most of it was a battle of logic. I knew how the data was structured and understood the whole backend clipping process pretty well.
Developing a front end was almost a harder battle. I wanted my front end to be a scrollable video feed of insightful business content. And I wanted it to look pristine.
While I coded all of my backend in Cursor, I knew tools like Bolt, v0, and Lovable were better suited to producing sleek front end interfaces.
So in my Cursor project I prompted the LLM to create a design guideline prompt using everything it knew about my project as well as my own direction on how I wanted the frontend to look. Basically prompt-ception.
I then uploaded the prompt to the different platforms to see which one would produce the best outcome. The nice thing about tools like Bolt, aside from the great design outputs, is that they render the front end for you without you having to mess around in the terminal.
These tools have usage limits, so you have to be careful with how many times you prompt them, which is why the original prompt is so important. After some edits I decided I liked Lovable’s design the best, so I downloaded the code to my laptop and dropped it into my Cursor project.
I then had to connect the backend to the frontend, but because my prompt gave Lovable so much detail about the project, a lot of the generated code was already compatible with my backend.
Even after this was done I still had to work through a lot of small but important details: things like where the pause button sat on the video player, making sure the desktop and mobile feeds had different user interactions, and having the videos auto-play on load. These details were often hard to describe to an LLM. I used some tricks to combat this that you’ll read about in the Tips & Tricks section below.
Now What
I’m looking to raise a pre-seed round at a 10x EBITDA of… just kidding.
I’ll probably keep making improvements here and there. I also use the videos to post on my LinkedIn and the Web Gems TikTok, and I send co-workers relevant videos that speak to problems we deal with while on the grind.
I learned a lot going through the process. If you’re just getting into coding/development, don’t expect to make something that’s going to make you money in a week or even a few months. AI is really good, but you’re still better off understanding some key computer science concepts.
Here’s the link.
Vibe Coding Tips & Tricks
- Learn the basics. There’s a lot of noise out there claiming that vibe coding means you can create full SaaS products without any technical knowledge. That’s just not true. Prior to starting this project I had worked on other minor projects, and I also work as a software product manager. This site has a great library of courses for people looking to get into vibe coding.
- Keep a Project History markdown file. Every time you log in to your vibe coding application of choice and complete a session, ask it to generate a summary of what you worked on. The next time you log in, you won’t have to explain the project all over again; you can just attach the file to the chat for context. You may say “well, I’ll just keep using the same chat”, which leads to point #3.
- Don’t overload chats. LLMs have capped context windows. The more you chat in a single thread, the more context the LLM has to work through, and eventually performance will degrade.
- Learn to context engineer. Per the last two points, you want to make sure you’re feeding the LLM context about your project. Whether that’s a Project History file, a PRD, or just pointing it at the relevant files in your project, always give it enough context about what you’re trying to accomplish.
- Use MCPs. My Cursor project had an MCP connection directly to Supabase, meaning the LLM could query the database in real time, which was super helpful when working on clip generation; otherwise I would have had to manually tell it which parts of a highlight weren’t processing. At one point I also used an MCP called Browser Tools that lets the LLM look at my browser’s console for debugging and even take screenshots of the browser. This was great for dealing with those slight front end bugs I mentioned before.
- Use Wispr Flow or another voice-to-text app. Wispr Flow is a transcription app you can download to your desktop that lets you program a key on your keyboard (’fn’ for me) so that when you hold it, you can speak and it outputs the text. I probably saved hours by not having to type every message to the LLM chatbots.
- Understand LLMs aren’t perfect. I ran into several bugs that the LLMs couldn’t figure out, but they often presented solutions with a lot of confidence, which led me to change my code and cause even more headaches. My recommendation is to not let Cursor publish major changes to your code without first understanding its proposed solution. You can do this by creating a “Development Guidelines” markdown file that explicitly tells it not to make code changes directly to files without getting approval from the user. You can also just switch the Cursor LLM mode from “Agent” to “Ask”, although I prefer being in Agent mode.
- Commit your code. As mentioned in the last point, you may end up with new bugs the LLM created and have to revert to your old files. By committing your code you’re keeping a history of your project that you can always revert back to. I use Vercel via GitHub to manage my deployments. (If you don’t know what this means, the site I recommended in point 1 has some easy-to-digest videos.)
- Leverage other LLMs to generate prompts for your coding LLM. I kicked off this project by just brainstorming in ChatGPT. After a while I had it create a prompt that I pasted into Cursor to get the project started.
- Stay the course & block out the noise. My Twitter feed is filled with people claiming to have vibe coded a $25K MRR SaaS in just a weekend. For a while I believed that but after going through the process I’m highly skeptical. Celebrate your project wins and don’t get too frustrated when the LLMs can’t figure out the bug.