My girlfriend wanted to make me dinner one night. She had found a recipe on TikTok for a lemon pasta. She opened the app, went to her bookmarks tab, scrolled down, loaded a second set of videos, looked for the thumbnail she wanted, clicked it, started watching the video, then clicked the description of the video to actually see the recipe in a small corner of the screen.
I was amazed, in a bad way, at how the average person collects recipes these days. Complaining about the long prelude to a recipe is one thing, but a system that is neither searchable nor readable really bothered me from an engineering perspective. I prefer well-designed systems and the right tool for the job.
While TikTok might be good for discovering recipes, it does a bad job of actually cataloging them for later use. I had also already built a website that uses structured data to keep my recipes manageable. As a good boyfriend, I volunteered to make her life easier by writing some code.
Structured Data
I have defined a JSON schema for my recipes already. Up until now, each recipe I’ve created was done by manually writing out a JSON or YAML file which fit the spec. This works great for me, but it can be intimidating to someone without the same experience. It also takes time.
I decided to use Gemini’s API and a bit of web scraping. Gemini’s API has a feature where you pass in a JSON schema and it gives you a JSON output matching that schema (most of the time).
// The JSON schema provided in the prompt.
const schema = {
  type: "OBJECT",
  properties: {
    author: { type: "STRING" },
    recipe: { type: "STRING" },
    spec: { type: "STRING", enum: ["v0.1.2"] },
    tags: { type: "STRING" },
    servings: { type: "NUMBER" },
    prelude: {
      type: "OBJECT",
      properties: { description: { type: "STRING" } }
    },
    steps: {
      type: "ARRAY",
      items: {
        type: "OBJECT",
        properties: {
          description: { type: "STRING" },
          ingredients: {
            type: "ARRAY",
            items: {
              type: "OBJECT",
              properties: {
                item: { type: "STRING" },
                amount: { type: "NUMBER" },
                unit: { type: "STRING" }
              }
            }
          }
        }
      }
    },
    metadata: {
      type: "OBJECT",
      properties: {
        originalUrl: { type: "STRING" }
      }
    }
  }
};
To fit Gemini’s structured output, I trimmed my fully fleshed-out JSON schema down to this close approximation of it.
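That kind of conversion can be mostly mechanical: walk the schema, upper-case each `type`, and drop any keyword the structured-output feature might not accept. Here is a minimal sketch of the idea. The `toGeminiSchema` helper and its allow-list of keywords are hypothetical, not taken from my project, and the allow-list should be checked against Gemini’s current documentation.

```typescript
// Hypothetical helper: converts a standard JSON Schema fragment into the
// upper-cased, trimmed-down shape used in the schema above.
type JsonSchema = { [key: string]: unknown };

// Assumed allow-list of simple keywords to carry over unchanged.
const KEPT_KEYWORDS = ["description", "enum", "required"];

function toGeminiSchema(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = {};

  const type = schema["type"];
  if (typeof type === "string") {
    out["type"] = type.toUpperCase(); // "object" -> "OBJECT"
  }

  for (const key of KEPT_KEYWORDS) {
    if (key in schema) out[key] = schema[key];
  }

  // Recurse into object properties.
  const properties = schema["properties"];
  if (properties && typeof properties === "object") {
    const converted: JsonSchema = {};
    for (const [name, child] of Object.entries(properties as JsonSchema)) {
      converted[name] = toGeminiSchema(child as JsonSchema);
    }
    out["properties"] = converted;
  }

  // Recurse into array item definitions.
  const items = schema["items"];
  if (items && typeof items === "object") {
    out["items"] = toGeminiSchema(items as JsonSchema);
  }

  return out;
}
```

Anything outside the allow-list (`$schema`, `minimum`, `pattern`, and so on) is silently dropped, so the result is looser than the original schema.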
I can use web scraping to grab the text contents of the video, as it’s included on the webpage in a hidden component. That goes into the prompt along with my directions and the schema.
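As a rough illustration of the scraping step, here is a sketch that pulls a caption out of raw page HTML. It assumes the text is exposed through an `og:description` meta tag, which is a guess made for the example; the actual page may keep the caption in a different element. The `extractDescription` name is hypothetical too, and fetching the HTML is left out.

```typescript
// Hypothetical helper: extracts a video caption from raw page HTML.
// Assumption: the caption lives in an og:description meta tag, with the
// property attribute appearing before the content attribute.
function extractDescription(html: string): string | null {
  const match = html.match(
    /<meta[^>]+property=["']og:description["'][^>]*content=(["'])([\s\S]*?)\1/i
  );
  if (!match) return null;

  // Decode the handful of HTML entities most likely to show up in a caption.
  // &amp; goes last so that "&amp;lt;" is not decoded twice.
  return match[2]
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    .trim();
}
```

A regular expression is fine for a single well-known tag, but a real HTML parser would be sturdier if the markup changes.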
const prompt = `
Parse the following recipe description from a social media video and convert it into the provided JSON schema.
Infer the author from the video context if possible, otherwise use the channel name.
The recipe title should be descriptive. Set the spec to "v0.1.2".
Extract tags from the hashtags. Determine the number of servings.
The prelude description should be a brief, friendly intro.
List all ingredients under their corresponding first step where they are used.
Add the original video URL to the metadata.
Description:
---
${description}
---
Source URL: ${sourceUrl}
`;
const payload = {
  contents: [{ parts: [{ text: prompt }] }],
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: schema,
  },
};
Then I just parse out the JSON and display it. Since I already have the existing UI for rendering this JSON out, it basically just works.
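For completeness, here is a sketch of what sending the payload and parsing the reply could look like against the Gemini REST endpoint. The `generateRecipe` and `extractRecipe` names, the default model, and passing the API key as a query parameter are assumptions for this example rather than the exact code in my app.

```typescript
// Minimal shape of the parts of a generateContent response used here.
interface GeminiResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Pulls the JSON text out of the first candidate and parses it.
// Throws if the response has no text or the text is not valid JSON.
function extractRecipe(response: GeminiResponse): Record<string, unknown> {
  const text = response.candidates?.[0]?.content?.parts?.[0]?.text;
  if (!text) throw new Error("Gemini returned no text");
  return JSON.parse(text) as Record<string, unknown>;
}

// Sends the payload to the REST endpoint and returns the parsed recipe.
// The model name is an assumption; use whichever model you have access to.
async function generateRecipe(
  payload: unknown,
  apiKey: string,
  model = "gemini-2.0-flash"
): Promise<Record<string, unknown>> {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${model}:generateContent?key=${apiKey}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  return extractRecipe((await res.json()) as GeminiResponse);
}
```

Keeping the parsing in its own function means it can be exercised with a canned response, without spending any API quota.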
While this doesn’t look especially impressive, it’s neat this is all coming from a JSON object.
Saving
This is neat, but then what do I do with the content? Up to this point, I have just saved my data in GitHub. I have a git repo for my cookbook so my recipes can be saved with versioning and all the benefits of git. The URL https://git-recipes.web.app/g/fleker/personal-cookbook/crepe pulls in one of those recipes right from GitHub.
I did need to make an improvement here too, since Git is not the most user-friendly tool. So I integrated Firestore as a simple database storage option. That also meant integrating Firebase Auth so that people could sign in and save recipes to their account.
async uploadRecipe() {
  const recipe = this.parsedRecipe();
  if (!recipe) return;
  if (!this.auth.currentUser) {
    alert('You must be logged in to upload a recipe.');
    return;
  }
  const user = await this.auth.currentUser;
  if (!user || !user.uid) {
    alert('User not authenticated.');
    return;
  }
  const userId = user.uid;
  // Build a document ID from the recipe title.
  const recipeName = recipe.recipe.toLowerCase()
    .replace(/\s+/g, '-') // Replace runs of whitespace with hyphens
    .replace(/[&]/g, ''); // Drop ampersands
  try {
    const cookbookRef = this.firestore.collection('cookbooks').doc(userId).collection('_cookbooks').doc('default-cookbook')
    const cookbookDoc = cookbookRef.get()
    // Note: this callback is not awaited, so errors inside it bypass the catch below.
    cookbookDoc.subscribe(async (val: any) => {
      if (!val.exists || !val.data()) {
        // First upload: create the cookbook index with this recipe in it.
        const recipeStarter = {
          recipes: {
            [recipeName]: recipeName // Computed key, matching the update branch
          },
          collections: {
            "All Items": {
              recipes: [{
                key: recipeName,
                label: recipe.recipe
              }]
            }
          }
        }
        await cookbookRef.set(recipeStarter)
      } else {
        // Existing cookbook: add the recipe to the index.
        const cookbook = val.data()
        cookbook.recipes[recipeName] = recipeName
        cookbook.collections['All Items'].recipes.push({
          key: recipeName,
          label: recipe.recipe
        })
        // Update our cookbook
        await cookbookRef.update(cookbook)
      }
    })
    // Store the recipe itself in its own document.
    await this.firestore.collection('cookbooks').doc(userId).collection('default-cookbook').doc(recipeName).set({
      text: JSON.stringify(recipe, null, 2), // Store the recipe JSON as a string
      timestamp: Date.now()
    });
    alert('Recipe uploaded successfully!');
    this.link.set(`${window.location.origin}/f/${userId}/default-cookbook/${recipeName}`)
    return `/f/${userId}/default-cookbook/${recipeName}`
  } catch (error) {
    console.error('Error uploading recipe:', error);
    alert('Failed to upload recipe.');
  }
  return;
}
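The document ID logic above is deliberately simple, and a title containing a slash would not work as a Firestore document ID. A stricter version could look like the sketch below. The `toDocumentId` helper and its fallback name are hypothetical, and because it collapses more characters than the code above, it can produce different IDs for recipes that were already saved.

```typescript
// Hypothetical helper: turns a recipe title into a URL- and Firestore-safe ID.
function toDocumentId(title: string): string {
  const slug = title
    .toLowerCase()
    .normalize("NFKD")                // Split accented letters into base + accent
    .replace(/[\u0300-\u036f]/g, "")  // Drop the combining accent marks
    .replace(/[^a-z0-9]+/g, "-")      // Collapse everything else into hyphens
    .replace(/^-+|-+$/g, "");         // Trim leading and trailing hyphens
  // Fall back to a placeholder if nothing usable is left.
  return slug || "untitled-recipe";
}
```

The trade-off is that titles written entirely in non-Latin scripts all collapse to the placeholder, so this only suits a mostly English cookbook.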
If you go to https://git-recipes.web.app/f/kZP1PYAbhhT2d5nwMAD8ozNJtxP2/default-cookbook/honey-teriaki you can now pull from Firebase as the source instead. In theory, one could add additional sources apart from Firebase and GitHub.
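The two URL shapes, `/g/...` for GitHub and `/f/...` for Firebase, suggest a small routing step that decides which backend to read from. This is a sketch of that idea and not the router in my app; the `RecipeSource` type and `parseRecipePath` function are names made up for the example.

```typescript
// Hypothetical description of where a recipe lives.
type RecipeSource =
  | { kind: "github"; owner: string; repo: string; recipe: string }
  | { kind: "firestore"; userId: string; cookbook: string; recipe: string };

// Maps a path such as "/g/fleker/personal-cookbook/crepe" to a source.
// Returns null for anything that does not have exactly four segments
// or does not start with a known prefix.
function parseRecipePath(path: string): RecipeSource | null {
  const [prefix, first, second, recipe, ...rest] = path.split("/").filter(Boolean);
  if (!prefix || !first || !second || !recipe || rest.length > 0) return null;

  switch (prefix) {
    case "g":
      return { kind: "github", owner: first, repo: second, recipe };
    case "f":
      return { kind: "firestore", userId: first, cookbook: second, recipe };
    default:
      return null;
  }
}
```

Adding a third source would then mean one more member in the union and one more `case`, with the rest of the UI left alone.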
What’s Next?
I think there’s a bit more work to do around the UX here. Everything is functional, but providing a bit more polish would be nice. The buttons could be cleaned up and there’s probably a bit more to do with improving the prompt and structure.
Additionally, I support both Instagram and TikTok as input sources, but I haven’t tested Instagram to the same extent, nor any other source that might be useful.
I think testing in general is kinda hard. How do you create a deterministic way to test a technology which is inherently non-deterministic? For a chatbot it’s not too bad, but requiring the output to be properly parseable JSON creates a different challenge.
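One partial answer is to stop asserting on exact text and assert on structure instead: whatever words the model picks, the output has to parse and has to have the right shape. The sketch below shows the idea with a hypothetical `validateRecipeJson` function. The specific checks are my guess at a reasonable minimum, not a full implementation of the recipe spec.

```typescript
// Hypothetical structural check for model output.
// Returns a list of problems; an empty list means the text passed.
function validateRecipeJson(text: string): string[] {
  let data: any;
  try {
    data = JSON.parse(text);
  } catch {
    return ["not valid JSON"];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["top level is not an object"];
  }

  const problems: string[] = [];
  if (typeof data.recipe !== "string" || data.recipe.length === 0) {
    problems.push("missing recipe title");
  }
  if (data.spec !== "v0.1.2") problems.push("unexpected spec");
  if (!Array.isArray(data.steps) || data.steps.length === 0) {
    problems.push("no steps");
    return problems;
  }

  data.steps.forEach((step: any, i: number) => {
    if (typeof step?.description !== "string") {
      problems.push(`step ${i}: missing description`);
    }
    const ingredients = Array.isArray(step?.ingredients) ? step.ingredients : [];
    for (const ingredient of ingredients) {
      if (typeof ingredient?.item !== "string") {
        problems.push(`step ${i}: ingredient without item`);
      }
      if (ingredient?.amount !== undefined && typeof ingredient.amount !== "number") {
        problems.push(`step ${i}: non-numeric amount`);
      }
    }
  });
  return problems;
}
```

Running a check like this over a folder of saved captions would at least flag when a prompt change breaks the format, even though it says nothing about whether the recipe itself is right.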
Gemini keeps improving its structured output, so this transformation should only get better over time.
Cost can become a factor. While I’ve shared code, I haven’t shared the URL for you to try yourself. I would hate for this side project to start racking up a lot of bills. A large tech company could spend a lot and take a cut, but I would need to set up billing. Firestore has its own costs, and the Gemini API is even costlier. Maybe one day I’ll do that.
Here’s the source code:
Until then, I should probably get back to my dinner.
