Build An App for ChatGPT! Step By Step!

A Server! A frontend with React! Render dynamic contents! Get & set global parameters (tool input, output, etc)! Call Tools From App!

Full Demo On my GitHub!

OpenAI just released the Apps SDK to build apps for ChatGPT!

Originally, I thought it was going to be another app provider like the App Store or Google Play, but NOOOOOO!

It is just an iframe rendered within ChatGPT!

Yes, it is like calling something rendered within a WebView of an iOS app an app! (Sounds silly, but that's all it is! Seriously!)

Which means to create an App for ChatGPT,

Five Steps!

  1. Create a Regular MCP Server with some Tools to return some structured content
  2. Create some UI/UX. We will be using Vite with React here and bundle those, but you can also just write some HTML, JS, and CSS directly!
  3. Register our UI components (actually just some HTML) as MCP Server Resources.
  4. Modify tools to return a component UI (as an embedded resource) in addition to the structured content
  5. Start the MCP Server and register it as a ChatGPT Connector

Simple! Right?

Let’s check it out! Step by step!

Before We Start

Since we deploy our app by registering our MCP server as a ChatGPT connector, we need developer mode.

Which means the free plan does not work!

  1. Make sure to pay OpenAI some money to upgrade the plan! Yes, I just paid 22 (20 + tax) dollars to try this thing out! I will make the most out of this!!!!! 22 dollars!
  2. Toggle Settings > Apps & Connectors > Advanced > Developer mode in the ChatGPT client

Project Structure

First of all, as you might have realized from the steps above, we need both a server and a frontend. Here is my folder structure.

.
├── server              # MCP Server
│   ├── package.json
│   ├── src
│   └── tsconfig.json
└── web                 # Bundled UI
    ├── dist            # Build output
    ├── package.json
    ├── src
    └── tsconfig.json

I have left out a couple of config files here for the web because they depend on the framework you want to use.

Basic MCP Server

The MCP server is the foundation of the Apps SDK integration, responsible for the following.

  1. List tools: exposes tools that the model can call, including their JSON Schema input and output contracts and optional annotations
  2. Call tools: executes the call_tool request from the model (ChatGPT) and returns structured content the model can parse.
  3. Return components (UI): If we want ChatGPT to render some UI (which we do in this article) for the tool call, those tools will need to point to an embedded resource that represents the interface to render (inline) in the ChatGPT client.

The transport protocol COULD be Server-Sent Events, but Streamable HTTP is indeed recommended!

Let’s start with an MCP server with the basic tool functionality, i.e., returning structured content. We will then add the resource-related parts after we have created our frontend UI!

In this article, please allow me to assume that you are familiar with MCP, especially the Streamable HTTP protocol. If you need a little catch-up, please feel free to check out some of my previous articles!

Set up

Let’s start with adding the following dependencies after running npm init -y in the server folder!

npm install @types/express typescript @types/node --save-dev
npm install express @modelcontextprotocol/sdk

I will be leaving out the full package.json and tsconfig.json here, but you can just grab those from my GitHub!

MCP Server

As I have said, we are starting here with a regular MCP server with a simple get_pokemon tool that gets detailed information for a specific pokemon!

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js'
import express, { Request, Response } from 'express'
import { z } from 'zod'

const PORT = 3000

/****************************/
/******** MCP Server ********/
/****************************/
const server = new McpServer({
  name: 'pokemon-server',
  version: '1.0.0',
}, {
  capabilities: {},
})

// Add get pokemon tool
server.registerTool(
  'get_pokemon',
  {
    title: 'Get Pokemon',
    description: 'Get detailed information of a pokemon.',
    inputSchema: { name: z.string({ description: "The name of the pokemon to get detail for." }) },
    outputSchema: {
      result: z.object({
        name: z.string({ description: "Pokemon name." }),
        id: z.number({ description: "Pokemon index id." }).int(),
        height: z.number({ description: "Pokemon height." }).int(),
        weight: z.number({ description: "Pokemon weight." }).int(),
        types: z.array(
          z.object({
            slot: z.number().int(),
            type: z.object({
              name: z.string({ description: "type name." }),
              url: z.string({ description: "URL to get type detail." }).url(),
            })
          }).passthrough()
        ),
        sprites: z.object({
          back_default: z.string({ description: "URL to get the back_default image." }).url().nullable(),
          back_female: z.string({ description: "URL to get the back_female image." }).url().nullable(),
          back_shiny: z.string({ description: "URL to get the back_shiny image." }).url().nullable(),
          back_shiny_female: z.string({ description: "URL to get the back_shiny_female image." }).url().nullable(),
          front_default: z.string({ description: "URL to get the front_default image." }).url().nullable(),
          front_female: z.string({ description: "URL to get the front_female image." }).url().nullable(),
          front_shiny: z.string({ description: "URL to get the front_shiny image." }).url().nullable(),
          front_shiny_female: z.string({ description: "URL to get the front_shiny_female image." }).url().nullable(),
        }, { description: "URLs to get pokemon images." }).passthrough()
      }).passthrough()
    }
  },
  async ({ name }) => {
    if (name.length == 0) {
      throw new Error("Pokemon name cannot be empty.")
    }
    const response = await fetch(`https://pokeapi.co/api/v2/pokemon/${name}`)
    if (!response.ok) {
      throw new Error(`HTTP error. status: ${response.status}`)
    }
    const json = await response.json()
    const structuredContent = {
      result: json
    }
    return {
      content: [
        { type: 'text', text: JSON.stringify(structuredContent) },
      ],
      structuredContent: structuredContent
    }
  }
)

/********************************/
/******** Express Server ********/
/********************************/
// Set up Express and HTTP transport
const app = express()
app.use(express.json())

app.post('/mcp', async (req: Request, res: Response) => {
  // Create a new transport for each request to prevent request ID collisions
  const transport = new StreamableHTTPServerTransport({
    // stateless mode
    // for stateful mode:
    // (https://levelup.gitconnected.com/mcp-server-and-client-with-sse-the-new-streamable-http-d860850d9d9d)
    // 1. use sessionIdGenerator: () => randomUUID()
    // 2. save the generated ID (const sessionId = transport.sessionId) and the corresponding transport
    // 3. try to retrieve the session id with req.headers["mcp-session-id"] for incoming requests
    // 4. if the session id is defined and there is an existing transport, use that transport instead of creating a new one
    sessionIdGenerator: undefined,
    // to use Streamable HTTP instead of SSE
    enableJsonResponse: true
  })
  res.on('close', () => {
    transport.close()
  })
  await server.connect(transport)
  await transport.handleRequest(req, res, req.body)
})

app.listen(PORT, () => {
  console.log(`Pokemon MCP Server running on http://localhost:${PORT}/mcp`)
}).on('error', error => {
  console.error('Server error:', error)
  process.exit(1)
})

Two points here!

  1. It is really important to define an outputSchema if we are planning on using that information within our component UI (and we are), so that what we return can be verified.
  2. The text content should be the JSON.stringified version of the structuredContent. The reason for this actually has something to do with the UI parts as well, so please let me leave it out for now and come back to it in a couple of seconds!

Confirm Schema

Before we move on to some frontend stuff, let’s quickly confirm that what we are returning as the structuredContent indeed conforms to what we have declared for the outputSchema!

If you have never had a chance to use the officially provided MCP Inspector, I strongly recommend it!

  1. Start the express server. If you are using the package.json from my GitHub, npm run dev.
  2. Run npx @modelcontextprotocol/inspector. This will automatically start a local server for inspection and open the following page in the browser.

Choose Streamable HTTP for Transport Type, Enter the URL, and Connect!

Under the Tools tab, choose List Tools, select the get_pokemon tool, enter a name, and confirm that the Tool Result is Success!

If our structuredContent does not match our outputSchema, this will tell us exactly where the problem occurs!
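By the way, if you prefer scripting this check instead, here is a minimal sketch using the SDK’s client with the Streamable HTTP transport. This is my own little test script (the name schema-check is made up), assuming the server above is running on port 3000:

import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'

// Connect to the local server, the same URL we gave the Inspector.
const client = new Client({ name: 'schema-check', version: '1.0.0' })
const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3000/mcp'))
await client.connect(transport)

// Call the tool and eyeball the structured content.
const result = await client.callTool({ name: 'get_pokemon', arguments: { name: 'pikachu' } })
console.log(JSON.stringify(result.structuredContent, null, 2))
await client.close()

(The Inspector is still nicer for pinpointing exactly where a schema mismatch occurs, though!)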

Confirmed?

Time to React!

The window.openai API

I am sorry, but I just lied!

We are not ready for some UI yet!

Because!

Before that, I would like to take a second to look at this window.openai.

This is the bridge between our frontend and ChatGPT. We use it to read agent-related state such as the tool input and output, persist component state, trigger server actions, and more!

Let’s take a look at what we can do with it one by one!

Starting with the definition!

Oh, by the way, please copy and paste the code we have in this section (some of it is provided by OpenAI on their GitHub, some is written by me wrapping around the provided pieces) into your web folder! Or at least the parts you need!

Yes! OpenAI is planning on creating a public package for it, but NOT YET!

window.openai Definition

Let’s start with how this window.openai is defined, because the rest of this section is just about how to make use of these OpenAiGlobals!

/**
 * Global oai object injected by the web sandbox for communicating with chatgpt host page.
 */
declare global {
  interface Window {
    openai: API & OpenAiGlobals;
  }
  interface WindowEventMap {
    [SET_GLOBALS_EVENT_TYPE]: SetGlobalsEvent;
  }
}

export type OpenAiGlobals<
  ToolInput = UnknownObject,
  ToolOutput = UnknownObject,
  ToolResponseMetadata = UnknownObject,
  WidgetState = UnknownObject
> = {
  // visuals
  theme: Theme;
  userAgent: UserAgent;
  locale: string;
  // layout
  maxHeight: number;
  displayMode: DisplayMode;
  safeArea: SafeArea;
  // state
  toolInput: ToolInput;
  toolOutput: ToolOutput | null;
  toolResponseMetadata: ToolResponseMetadata | null;
  widgetState: WidgetState | null;
  setWidgetState: (state: WidgetState) => Promise<void>;
};

// currently copied from types.ts in chatgpt/web-sandbox.
// Will eventually use a public package.
type API = {
  callTool: CallTool;
  sendFollowUpMessage: (args: { prompt: string }) => Promise<void>;
  openExternal(payload: { href: string }): void;
  // Layout controls
  requestDisplayMode: RequestDisplayMode;
};

export type UnknownObject = Record<string, unknown>;

export type Theme = "light" | "dark";

export type SafeAreaInsets = {
  top: number;
  bottom: number;
  left: number;
  right: number;
};

export type SafeArea = {
  insets: SafeAreaInsets;
};

export type DeviceType = "mobile" | "tablet" | "desktop" | "unknown";

export type UserAgent = {
  device: { type: DeviceType };
  capabilities: {
    hover: boolean;
    touch: boolean;
  };
};

/** Display mode */
export type DisplayMode = "pip" | "inline" | "fullscreen";

export type RequestDisplayMode = (args: { mode: DisplayMode }) => Promise<{
  /**
   * The granted display mode. The host may reject the request.
   * For mobile, PiP is always coerced to fullscreen.
   */
  mode: DisplayMode;
}>;

export type CallToolResponse = {
  // result: the string (text) content returned by the tool using { type: 'text', text: JSON.stringify(structuredContent) }
  result: string;
};

/** Calling APIs */
export type CallTool = (
  name: string,
  args: Record<string, unknown>
) => Promise<CallToolResponse>;

/** Extra events */
export const SET_GLOBALS_EVENT_TYPE = "openai:set_globals";
export class SetGlobalsEvent extends CustomEvent<{
  globals: Partial<OpenAiGlobals>;
}> {
  readonly type = SET_GLOBALS_EVENT_TYPE;
}

This pattern of defining a global object on window should look pretty familiar if you have ever created a custom Plugin Provider for a Web App for other developers to call! (If you have not but are interested in creating one, please feel free to check out my article on Create Custom Plugin Provider (Javascript API) for Web App!)

Now, here is a little note I would like to make on this CallToolResponse type. In the provided definition, it only contains a result string corresponding to the content returned by the tool using { type: 'text', text: JSON.stringify(structuredContent) }, which is why I recommended setting the text content to the JSON.stringified version of the structuredContent. But actually, all of the following keys will be included (summed up in a typed sketch after this list).

  1. content: the same content array we returned from the server
  2. structuredContent: again, that same structuredContent that conforms to our outputSchema
  3. isError: like the name suggests, a boolean indicating whether the tool call succeeded or not.
  4. meta: the _meta returned from the server tool call. We will take a more detailed look at this in a couple of seconds when registering resources, but this actually corresponds to the toolResponseMetadata in the OpenAiGlobals that we can use to pass data that should not influence the model’s reasoning but is needed for rendering UIs, like the full set of locations that backs a dropdown.
  5. _meta: you would think this is the field containing the _meta returned from the server? Not based on my testing! I am not sure if it is a bug or not, but I have to be honest here: I have NO clue what this field is for, because it is always null for me!
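Putting that together, here is how I would type the response I actually observed. This is my own typing based on my testing, not an official definition, so treat it as a sketch:

// My own typing of the observed callTool response; NOT an official definition.
type ObservedCallToolResponse = {
  result: string // the only documented field: the JSON.stringified text content
  content?: { type: string; text?: string }[] // the same content array returned by the server
  structuredContent?: unknown // conforms to our outputSchema
  isError?: boolean // whether the tool call failed
  meta?: Record<string, unknown> | null // the server's _meta, surfaced as toolResponseMetadata
  _meta?: unknown // always null in my testing; purpose unclear
}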

useOpenAiGlobal

This is a hook provided by OpenAI (code-wise, not package-wise yet…) that listens for the host’s openai:set_globals events and lets React components subscribe to a single global value.

import { useSyncExternalStore } from "react";

export function useOpenAiGlobal<K extends keyof OpenAiGlobals>(
  key: K
): OpenAiGlobals[K] {
  return useSyncExternalStore(
    (onChange) => {
      const handleSetGlobal = (event: SetGlobalsEvent) => {
        const value = event.detail.globals[key];
        if (value === undefined) {
          return;
        }
        onChange();
      };
      window.addEventListener(SET_GLOBALS_EVENT_TYPE, handleSetGlobal, {
        passive: true,
      });
      return () => {
        window.removeEventListener(SET_GLOBALS_EVENT_TYPE, handleSetGlobal);
      };
    },
    () => window.openai[key]
  );
}

For example, we can use it to read tool input, output, and metadata.

export function useToolInput() {
  return useOpenAiGlobal('toolInput')
}

export function useToolOutput() {
  return useOpenAiGlobal('toolOutput')
}

export function useToolResponseMetadata() {
  return useOpenAiGlobal('toolResponseMetadata')
}

Or we can add a little typing for our own use case here!

// use-tool-input.ts
import { useOpenAiGlobal } from "./use-openai-global";
import { z } from 'zod'

export function useToolInput(): InputSchema | null {
  const input = useOpenAiGlobal('toolInput')
  if (input === null) {
    return null
  }
  try {
    return InputSchema.parse(input)
  } catch (error) {
    console.error(error)
    return null
  }
}

export const InputSchema = z.object({ name: z.string().nonempty() })
export type InputSchema = z.infer<typeof InputSchema>

// use-tool-output.ts
import { useOpenAiGlobal } from "./use-openai-global";
import { z } from 'zod'

export function useToolOutput(): StructuredOutput | null {
  const output = useOpenAiGlobal('toolOutput')
  if (output === null) {
    return null
  }
  try {
    return StructuredOutput.parse(output)
  } catch (error) {
    console.error(error)
    return null
  }
}

export const StructuredOutput = z.object({
  result: z.object({
    name: z.string(),
    id: z.number().int(),
    height: z.number().int(),
    weight: z.number().int(),
    types: z.array(
      z.object({
        slot: z.number().int(),
        type: z.object({
          name: z.string(),
          url: z.url(),
        })
      }).loose()
    ),
    sprites: z.object({
      back_default: z.url().nullable(),
      back_female: z.url().nullable(),
      back_shiny: z.url().nullable(),
      back_shiny_female: z.url().nullable(),
      front_default: z.url().nullable(),
      front_female: z.url().nullable(),
      front_shiny: z.url().nullable(),
      front_shiny_female: z.url().nullable(),
    }).loose()
  }).loose()
})
export type StructuredOutput = z.infer<typeof StructuredOutput>

(You might wonder why my zod notation looks so different from the server side. That’s because the zod version on the server is fairly old, pinned by the modelcontextprotocol SDK…)

setOpenAiGlobal

This is what I call it, because OpenAI didn’t mention a single word about it! I don’t know why, but I decided to make it myself!

And unfortunately, it is not as simple as just setting the key on that window.openai object!

I mean, it is, but in that case our useOpenAiGlobal hook will never be notified, which means our UI will never update!

import { OpenAiGlobals, SetGlobalsEvent, SET_GLOBALS_EVENT_TYPE } from "./types"

export function setOpenAIGlobal<K extends keyof OpenAiGlobals>(
  key: K,
  value: OpenAiGlobals[K] | null
) {
  if (window.openai !== null && window.openai !== undefined) {
    window.openai[key] = value as any
    const event = new SetGlobalsEvent(SET_GLOBALS_EVENT_TYPE, {
      detail: {
        globals: {
          [key]: value
        }
      }
    })
    window.dispatchEvent(event)
  }
}

Yes! Dispatching that SetGlobalsEvent manually is the key!

Because, as we can see from the useOpenAiGlobal definition, that’s when onChange is triggered!

Widget State

Widget state can be used to persist component state, as well as to expose context to ChatGPT. Anything we pass to setWidgetState will be shown to the model, and hydrated into window.openai.widgetState.

Here is the helper hook provided to keep host-persisted widget state aligned with local React state.

import { useCallback, useEffect, useState, type SetStateAction } from "react";
import { useOpenAiGlobal } from "./use-openai-global";
import type { UnknownObject } from "./types";

export function useWidgetState<T extends UnknownObject>(
  defaultState: T | (() => T)
): readonly [T, (state: SetStateAction<T>) => void];
export function useWidgetState<T extends UnknownObject>(
  defaultState?: T | (() => T | null) | null
): readonly [T | null, (state: SetStateAction<T | null>) => void];
export function useWidgetState<T extends UnknownObject>(
  defaultState?: T | (() => T | null) | null
): readonly [T | null, (state: SetStateAction<T | null>) => void] {
  const widgetStateFromWindow = useOpenAiGlobal("widgetState") as T;
  const [widgetState, _setWidgetState] = useState<T | null>(() => {
    if (widgetStateFromWindow != null) {
      return widgetStateFromWindow;
    }
    return typeof defaultState === "function"
      ? defaultState()
      : defaultState ?? null;
  });

  useEffect(() => {
    _setWidgetState(widgetStateFromWindow);
  }, [widgetStateFromWindow]);

  const setWidgetState = useCallback(
    (state: SetStateAction<T | null>) => {
      _setWidgetState((prevState) => {
        const newState = typeof state === "function" ? state(prevState) : state;
        if (newState != null) {
          window.openai.setWidgetState(newState);
        }
        return newState;
      });
    },
    [window.openai.setWidgetState]
  );

  return [widgetState, setWidgetState] as const;
}

Note that currently everything passed to setWidgetState is shown to the model. For the best performance, it’s recommended to keep this payload small and not exceed 4k tokens.
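Here is a minimal usage sketch for our Pokemon component. The FavoritesState shape, the FavoriteButton component, and the helper path are all hypothetical, just for illustration:

import { useWidgetState } from "./helpers/use-widget-state" // hypothetical path

// Keep the payload small: everything here is shown to the model.
type FavoritesState = { favorites: string[] }

function FavoriteButton({ name }: { name: string }) {
  // Hydrates from window.openai.widgetState if present, else uses the default.
  const [state, setWidgetState] = useWidgetState<FavoritesState>({ favorites: [] })
  return (
    <button onClick={() => setWidgetState({ favorites: [...state.favorites, name] })}>
      ⭐ Favorite {name}
    </button>
  )
}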

Direct MCP Tool Calls

We can use window.openai.callTool to make direct MCP tool calls. This can be useful for refreshing data, fetching information for another pokemon in our example, and more!

// fetch-pokemon.ts (import paths follow my helper files above; adjust to your layout)
import { setOpenAIGlobal } from "./set-openai-global"
import { InputSchema } from "./use-tool-input"

export async function getPokemon(name: string) {
  const toolInput: InputSchema = { name }
  setOpenAIGlobal("toolInput", toolInput)
  setOpenAIGlobal("toolOutput", null)
  const response = await window.openai?.callTool("get_pokemon", { name: name });
  if ("structuredContent" in response) {
    setOpenAIGlobal("toolOutput", response["structuredContent"] as any)
  } else {
    const jsonResult = JSON.parse(response.result)
    setOpenAIGlobal("toolOutput", jsonResult)
  }
}

Two notes here.

  1. window.openai.callTool will not automatically update any states (any key defined in OpenAiGlobals) for us. We have to do it manually!
  2. The tool we are calling here needs to be marked as invocable by the component, as we will do in a couple of seconds when registering resources and embedding them!

Send Follow-up Messages

We can use window.openai.sendFollowUpMessage to insert a message into the conversation as if the user had asked it.

await window.openai?.sendFollowUpMessage({
  prompt: "some user message.",
});

Request Alternate Layouts

For example, if our component UI needs more space, such as for maps or tables, we can request it using window.openai.requestDisplayMode.

await window.openai?.requestDisplayMode({ mode: "fullscreen" })
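Note that the host may reject the request, and on mobile, PiP is always coerced to fullscreen (per the RequestDisplayMode definition above), so it is worth checking the granted mode:

// The returned mode is what the host actually granted, not what we asked for.
const { mode } = await window.openai.requestDisplayMode({ mode: "fullscreen" })
if (mode !== "fullscreen") {
  // The host said no; fall back to a compact inline layout here.
}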

Use host-backed navigation

Skybridge (the sandbox runtime) mirrors the iframe’s history into ChatGPT’s UI, which means we can use standard routing APIs such as React Router and the host will keep navigation controls in sync with our component.

To set up the routing with BrowserRouter:

import { BrowserRouter, Route, Routes } from "react-router-dom";

export default function PizzaListRouter() {
  return (
    <BrowserRouter>
      <Routes>
        <Route path="/" element={<PizzaListApp />}>
          <Route path="place/:placeId" element={<PizzaListApp />} />
        </Route>
      </Routes>
    </BrowserRouter>
  );
}

And to perform programmatic navigation, we can just use the useNavigate hook.

import { useNavigate } from "react-router-dom";

// inside a component
const navigate = useNavigate();

function openDetails(placeId: string) {
  navigate(`place/${placeId}`, { replace: false });
}

function closeDetails() {
  navigate("..", { replace: true });
}

End of window.openai!

Finally, some UI!

Component UI/UX

As I have mentioned, I will be using Vite with React here, styled with Tailwind, but you can choose whatever you like!

What is important here is that

The entry file should mount a component into a root element (which is how React works) and read initial data from window.openai.toolOutput or persisted state.

Code for UI

This is not the important part, so let me just dump some code on you using those hooks we had above: reading tool outputs, watching for changes, and calling some tools ourselves!

main.tsx

import { createRoot } from 'react-dom/client'
import App from './App.tsx'
import './index.css'
createRoot(document.getElementById('container')!).render(<App />)

App.tsx

import { useMemo, useState } from 'react'
import { InputSchema, useToolInput } from './helpers/use-tool-input'
import { StructuredOutput, useToolOutput } from './helpers/use-tool-output'
import PokemonCard from './Card'
import useEmblaCarousel from "embla-carousel-react"
import { getPokemon } from './helpers/fetch-pokemon'

const recommended = ["pikachu", "bulbasaur", "charmander", "squirtle"]

function App() {
  const toolInput: InputSchema | null = useToolInput()
  const toolOutput: StructuredOutput | null = useToolOutput()
  const [emblaRef, _emblaApi] = useEmblaCarousel({
    align: "center",
    loop: false,
    containScroll: "trimSnaps",
    slidesToScroll: "auto",
    dragFree: false,
  })
  const [name, setName] = useState("")
  const [error, setError] = useState("")
  const isLoading: boolean = useMemo(() => {
    return toolOutput === null && error === ""
  }, [toolOutput, error])
  const sprites: { title: string, url: string | null }[] | null = useMemo(() => {
    const sprites = toolOutput?.result.sprites
    if (!sprites) {
      return null
    }
    const array = Object.keys(sprites).map(function (title) {
      let url: string | null = sprites[title] as string | null
      return {
        title: title,
        url: url
      }
    })
    return array
  }, [toolOutput])

  async function getPokemonHelper(name: string) {
    setError("")
    try {
      await getPokemon(name)
    } catch (error) {
      console.error(error)
      setError("Oops, something went wrong! Please try again later!")
    }
  }

  return (
    <div className='flex flex-col gap-4 bg-amber-100 border rounded-md border-amber-700 text-black p-4' >
      <div className='flex flex-col gap-2'>
        <h2 className="font-semibold">Recommended Pokemons</h2>
        <div className="antialiased relative w-full flex flex-row gap-2">
          {recommended.map((pokemon) => (
            <button
              className="border rounded-md border-black bg-white py-1 px-2 disabled:opacity-50 disabled:cursor-not-allowed"
              disabled={isLoading}
              onClick={async () => await getPokemonHelper(pokemon)}
              key={pokemon}
            >{pokemon == "pikachu" ? `⭐ ${pokemon} ⭐` : pokemon}</button>
          ))}
        </div>
      </div>
      <div className='flex flex-row gap-2 items-center'>
        <h2 className="font-semibold">Search</h2>
        <input onChange={(event) => setName(event.target.value)} className='py-1 px-2 border rounded-md' />
        <button
          className="border rounded-md border-black bg-white py-1 px-2 disabled:opacity-50 disabled:cursor-not-allowed"
          disabled={isLoading}
          onClick={async () => {
            await getPokemonHelper(name)
            setName("")
          }}>Go</button>
      </div>
      <div className='flex flex-col gap-2'>
        <h2 className="font-semibold">Pokemon: {toolInput?.name}</h2>
        {
          toolOutput === null ?
            error !== "" ? <div className="text-red-400">{error}</div> : <div className='text-gray-400'>Loading...</div> :
            <div className='flex flex-col gap-1 text-sm'>
              <p>Id: {toolOutput.result.id}</p>
              <p>Height: {toolOutput.result.height}</p>
              <p>Weight: {toolOutput.result.weight}</p>
              <p>Type: {toolOutput.result.types.map((t) => t.type.name).join(", ")}</p>
            </div>
        }
        {
          (sprites !== null && sprites.length > 0) ?
            <div className="overflow-hidden" ref={emblaRef}>
              <div className="flex flex-row gap-2 max-sm:mx-5 items-stretch">
                {sprites?.map((sprite) => (
                  <PokemonCard key={sprite.title} url={sprite.url} title={sprite.title} />
                ))}
              </div>
            </div>
            : null
        }
      </div>
    </div>
  )
}

export default App

Card.tsx

export type PokemonCardProps = {
  title: string,
  url: string | null
}

export default function PokemonCard({ title, url }: PokemonCardProps) {
  if (url === null || url === undefined || typeof (url) !== "string" || url.length === 0) return null
  return (
    <div className="min-w-[160px] select-none max-w-[160px] w-[40vw] sm:w-[160px] self-stretch flex flex-col">
      <div className="w-full">
        <img
          src={url}
          alt={title}
          className="w-full aspect-square rounded-2xl object-cover ring ring-black/5 shadow-[1px_2px_6px_rgba(0,0,0,0.06)]"
        />
      </div>
      <div className="mt-3 flex flex-col flex-auto">
        <div className="text-base font-medium truncate line-clamp-1">{title}</div>
      </div>
    </div>
  )
}

Test UI Locally

Obviously, we are not embedded in ChatGPT yet, so we don’t have that window.openai available. But what if we want to test it locally?

Inject some values ourselves!

If you are using Vite, you should already have an index.html created at the root folder. Update it to something like the following. For the result, just copy and paste one of those API responses from PokeAPI, for example, the one for pikachu.

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>pokemon-widget</title>
    <script>
      window.openai = {
        "toolOutput": {
          "result": {...}
        },
        "toolInput": { "name": "pikachu" }
      }
    </script>
  </head>
  <body>
    <div id="container"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>

We can then run npm run dev to check it out!
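Alternatively, if you would rather not touch index.html, a small guard helper works too. Here is a sketch of my own (not from OpenAI) that falls back to a provided value whenever window.openai is missing:

import type { OpenAiGlobals } from "./types"

// My own helper (not from OpenAI): read a global safely outside ChatGPT.
export function readOpenAiGlobal<K extends keyof OpenAiGlobals>(
  key: K,
  fallback: OpenAiGlobals[K]
): OpenAiGlobals[K] {
  if (typeof window === "undefined" || window.openai == null) {
    return fallback // not embedded in ChatGPT, e.g. a local npm run dev
  }
  return window.openai[key] ?? fallback
}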

Bundle

Now that we are done writing our React component, it is time to build it into a single JavaScript/CSS module that the server can render inline, with an HTML file as the entry point.

What do I mean by that?

Or, what do we actually need here so that we can register it as an MCP server resource?

Since ChatGPT renders our UI in an iframe using the skybridge sandbox runtime, we have to provide

An HTML resource whose mimeType is text/html+skybridge and whose body loads the compiled JS/CSS bundle.

Yes, that means we cannot just run tsc -b && vite build and use the index.html generated in dist, because it uses relative references such as /assets/index-D-XA_-fY.css. (You could use absolute URLs, though.)

I mean, technically speaking, we COULD take the generated JS and CSS and compose the HTML (which is just a string) on the server side, but as we can see here, the names of those files are so random (hashed) that we would have to change some code every time!

So!

To build it, here are a couple of options we have.

Option 1: Run the esbuild command directly.

esbuild src/main.tsx --bundle --format=esm --outfile=dist/index.js

This will bundle the JS and CSS for us. We can then create the HTML on our server side like the following.

import { readFileSync } from "fs"
import { join } from "path"

const JS_PATH = join(__dirname, "..", "..", "web/dist/index.js")
const JS_FILE = readFileSync(JS_PATH, "utf8");
const CSS_PATH = join(__dirname, "..", "..", "web/dist/index.css")
const CSS_FILE = readFileSync(CSS_PATH, "utf8");
const HTML = [
  "<!doctype html>",
  "<html>",
  `<head><style>${CSS_FILE}</style></head>`,
  "<body>",
  `  <script type="module">${JS_FILE}</script>`,
  `  <div id="container"></div>`,
  "</body>",
  "</html>",
].join("\n");

Option 2: Use a custom build script to run the Vite build.

Build script: build.mts

import { build, type InlineConfig, type Plugin } from "vite"
import react from "@vitejs/plugin-react"
import fg from "fast-glob"
import path from "path"
import fs from "fs"
import tailwindcss from "@tailwindcss/vite"

const htmlName = "index.html"
const htmlRootName = "container"
const entryFileName = "src/main.tsx"
const outDir = "dist"
const PER_ENTRY_CSS_GLOB = "**/*.{css,pcss,scss,sass}"
const PER_ENTRY_CSS_IGNORE = "**/*.module.*".split(",").map((s) => s.trim())
const GLOBAL_CSS_LIST = [path.resolve("src/index.css")]

// Wraps the real entry in a virtual module that force-imports the CSS files.
function wrapEntryPlugin(
  virtualId: string,
  entryFile: string,
  cssPaths: string[]
): Plugin {
  return {
    name: `virtual-entry-wrapper:${entryFile}`,
    resolveId(id) {
      if (id === virtualId) return id
    },
    load(id) {
      if (id !== virtualId) {
        return null
      }
      const cssImports = cssPaths
        .map((css) => `import ${JSON.stringify(css)}`)
        .join("\n")
      return `
${cssImports}
export * from ${JSON.stringify(entryFile)}
import * as __entry from ${JSON.stringify(entryFile)}
export default (__entry.default ?? __entry.App)
import ${JSON.stringify(entryFile)}
`
    },
  }
}

fs.rmSync(outDir, { recursive: true, force: true })
fs.mkdirSync(outDir, { recursive: true })

const builtName = path.basename(path.dirname(entryFileName))
const entryAbs = path.resolve(entryFileName)
const entryDir = path.dirname(entryAbs)

// Global CSS (Tailwind, etc.), only include those that exist
const globalCss = GLOBAL_CSS_LIST.filter((p) => fs.existsSync(p))
const perEntryCss = fg.sync(PER_ENTRY_CSS_GLOB, {
  cwd: entryDir,
  absolute: true,
  dot: false,
  ignore: PER_ENTRY_CSS_IGNORE,
}).filter((p) => !globalCss.includes(p))

// Final CSS list (global first for predictable cascade)
const cssToInclude = [...globalCss, ...perEntryCss].filter((p) =>
  fs.existsSync(p)
)

const virtualId = `\0virtual-entry:${entryAbs}`
const createConfig = (): InlineConfig => ({
  plugins: [
    wrapEntryPlugin(virtualId, entryAbs, cssToInclude),
    tailwindcss(),
    react(),
    {
      name: "remove-manual-chunks",
      outputOptions(options) {
        if ("manualChunks" in options) {
          delete (options as any).manualChunks
        }
        return options
      },
    },
  ],
  esbuild: {
    jsx: "automatic",
    jsxImportSource: "react",
    target: "es2022",
  },
  build: {
    target: "es2022",
    outDir,
    emptyOutDir: false,
    chunkSizeWarningLimit: 2000,
    minify: "esbuild",
    cssCodeSplit: false,
    rollupOptions: {
      input: virtualId,
      output: {
        format: "es",
        entryFileNames: `${builtName}.js`,
        inlineDynamicImports: true,
        assetFileNames: (info) => {
          const name = info.names[0]
          const modified = (name || "").endsWith(".css")
            ? `${name}`
            : `[name]-[hash][extname]`
          return modified
        }
      },
      preserveEntrySignatures: "allow-extension",
      treeshake: true,
    },
  },
})

console.log(`Building ${builtName} (react)`)
await build(createConfig())
console.log(`Built ${builtName}`)

const htmlPath = path.join(outDir, htmlName)
// css get renamed sometimes
const cssPaths = fg.sync(`${outDir}/**/*.css`)
const jsPaths = fg.sync(`${outDir}/**/*.js`)

let cssBlock: string = ""
for (const cssPath of cssPaths) {
  const css = fs.existsSync(cssPath)
    ? fs.readFileSync(cssPath, { encoding: "utf8" })
    : ""
  // note the parentheses: append only when the file has content
  cssBlock = cssBlock + (css ? `\n <style>\n${css}\n </style>\n` : "")
}

let jsBlock: string = ""
for (const jsPath of jsPaths) {
  const js = fs.existsSync(jsPath)
    ? fs.readFileSync(jsPath, { encoding: "utf8" })
    : ""
  jsBlock = jsBlock + (js ? `\n <script type="module">\n${js}\n </script>` : "")
}

const html = [
  "<!doctype html>",
  "<html>",
  `<head>${cssBlock}</head>`,
  "<body>",
  `  <div id="${htmlRootName}"></div>${jsBlock}`,
  "</body>",
  "</html>",
].join("\n")
fs.writeFileSync(htmlPath, html, { encoding: "utf8" })
console.log(`${htmlPath} (generated)`)

To build it, we can then run tsx ./build.mts. This will also generate the HTML for us so that we can just read it in on our server side directly.

const HTML_PATH = join(__dirname, "..", "..", "web/dist/index.html")
const HTML = readFileSync(HTML_PATH, "utf8");

Since I am using Vite here, I went for this approach, but it actually uses esbuild under the hood anyway!

Regardless of which approach you take, make sure the id used in the HTML is the same as the one we mounted our React app onto! In my case, container!

By the way, if you are running into Error: Cannot find module ‘@rollup/rollup-darwin-arm64’ while building, simply clean everything and reinstall!

rm -rf node_modules       
rm package-lock.json
npm install

Register & Use UI Components on MCP Server

Last bit before we can deploy our ChatGPT app!

Two things we need to do here!

  1. Register our UI components (that HTML) as an MCP Server Resource.
  2. Modify the tools to return a component UI (as an embedded resource) in addition to the structured content.

First thing first, let’s register our resource!

const VERSION = "1.0.0"
const BASE_RESOURCE_URI = "ui://widget/pokemon-board.html"
const RESOURCE_URL = `${BASE_RESOURCE_URI}?version=${VERSION}`
const RESOURCE_MIME_TYPE = "text/html+skybridge"
const HTML_PATH = join(__dirname, "..", "..", "web/dist/index.html")
const HTML = readFileSync(HTML_PATH, "utf8")

server.registerResource(
  "pokemon-widget",
  RESOURCE_URL,
  {},
  async () => ({
    contents: [
      {
        uri: RESOURCE_URL,
        mimeType: RESOURCE_MIME_TYPE,
        text: HTML,
      },
    ],
  })
)

Three things here.

  1. mimeType: text/html+skybridge for the sandbox runtime
  2. resource URI: I have included a version here because ChatGPT caches templates aggressively, so unique URIs can help us prevent stale assets from loading.
  3. No inline data assignment. We are not setting anything related to that window.openai! The host, i.e., ChatGPT, will inject the data for us!

We can then link our tool to the template by setting _meta["openai/outputTemplate"] to the same resource URI.

// Add get pokemon tool
server.registerTool(
  'get_pokemon',
  {
    // ...
    _meta: {
      "openai/outputTemplate": RESOURCE_URL,
      "openai/toolInvocation/invoking": "Invoking...",
      "openai/toolInvocation/invoked": "Invoked!",
      // Allow component-initiated tool access: https://developers.openai.com/apps-sdk/build/mcp-server#%23%23allow-component-initiated-tool-access
      "openai/widgetAccessible": true
    },
    // ...
  },
  async ({ name }) => {
    if (name.length == 0) {
      throw new Error("Pokemon name cannot be empty.")
    }
    const response = await fetch(`https://pokeapi.co/api/v2/pokemon/${name}`)
    if (!response.ok) {
      throw new Error(`HTTP error. status: ${response.status}`)
    }
    const json = await response.json()
    const structuredContent = {
      result: json
    }
    return {
      // ...
      // The _meta property/parameter is reserved by MCP to allow clients and servers to attach additional metadata to their interactions.
      // This allows us to pass arbitrary JSON to the component only.
      // Use it for data that should not influence the model's reasoning, like the full set of locations that backs a dropdown.
      // _meta is never shown to the model.
      // _meta: {
      //   "key": "value"
      // }
    }
  }
)

Here I have also set openai/widgetAccessible to true so that our tool can be invoked from our frontend! There are also other optional _meta fields that let us declare properties such as security schemes.

And as I have mentioned a little above, for the data our tool returns, in addition to structuredContent and content, we can also use this _meta field to pass data to the component without exposing it to the model. For example, the sketch below.
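The allPokemonNames field here is hypothetical, just to illustrate the flow from the tool's _meta to the component:

// Server side (sketch): return component-only data via _meta.
return {
  content: [{ type: 'text', text: JSON.stringify(structuredContent) }],
  structuredContent,
  _meta: {
    // hypothetical field: backs a dropdown without entering the model's context
    allPokemonNames: ["pikachu", "bulbasaur", "charmander", "squirtle"],
  },
}

// Component side (sketch): hydrated as window.openai.toolResponseMetadata.
const metadata = useToolResponseMetadata() as { allPokemonNames?: string[] } | null
const names = metadata?.allPokemonNames ?? []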

Deploy & Connect!

Yes, we are here!

To connect our MCP server to ChatGPT, we need a stable HTTPS endpoint!

Here are some of the common deployment options.

  • Managed containers: Fly.io, Render, or Railway for quick spin-up and automatic TLS.
  • Cloud serverless: AWS Lambda, Google Cloud Run, or Azure Container Apps if we need scale-to-zero. Whichever you choose, beware that cold starts can interrupt streaming HTTP!
  • Kubernetes

But!

For testing, a local server with ngrok is more than enough!

So!

If your server is not running yet, get it started and run ngrok http 3000, assuming you are listening on port 3000!

You should see something like the following. That Forwarding HTTPS URL is what we will be using!

Connect From ChatGPT

Now, let’s head to ChatGPT settings to create a connector for our little MCP server!

Settings > Apps & Connectors > Create

Enter a name and a description you like.

For the MCP server URL, make sure to include the /mcp path, or whatever path you have for handling MCP POST requests! Here I will not have any authentication involved, but of course, you can authenticate users with OAuth 2.0!

Choose Create!

If the connection succeeds we will see a list of the tools our server advertises. If it fails, use the Testing guide to debug with MCP Inspector or the API Playground.

Call!

We could just prompt ChatGPT as we always do (I don’t use AI, so not what I always do... but!), but we can also enforce it by clicking the + button near the message composer and selecting the tool.

Let’s go!

Yeah!!!

We can also test it with other clients

  • API Playground: visit https://platform.openai.com/playground, open Tools → Add → MCP Server, and paste the same HTTPS endpoint. This is useful when we want raw request/response logs.
  • Mobile clients: once the connector is linked on web it is available on ChatGPT mobile apps as well. Automatically! And we can use it to test mobile layouts if our component has custom controls.

Also, if we decide to make any code modifications

  1. Rebuild the component bundle (npm run build) if it is a frontend change.
  2. Restart the MCP server.
  3. Refresh the connector in ChatGPT settings to pull the latest metadata.

Thank you for reading!

That’s it for this article!

Again, feel free to grab everything from my GitHub!

I don’t know about you, but while I was making this, I started wondering if I can just turn any web service into a ChatGPT app by simply throwing in another iframe within it!

I guess if your service supports CORS to allow that, it might work, but obviously, most do not!

Why can we not just read in the URL content and register it directly with the server as the resource? Because it is probably not bundled, or is using some relative JavaScript or CSS for rendering!

Anyway!

Happy App making!
