Step 1: Use ChatGPT to generate the prompt that will be fed to Codex
Example of the prompt to give ChatGPT so that it produces a Codex prompt for a data pipeline in which each service is a Spring Boot application, ordered as dataingestion-service, data-deduplication-service, data-quality-service, and data-storage-service. Each service calls a FastAPI endpoint exposed by a corresponding Python application that uses an LLM to address one use case per service: classification of data into PII and non-PII, data deduplication, median imputation for data quality, and storage in a DB.
ChatGPT generates a prompt like the one below:
You are a senior full-stack engineer. Generate a COMPLETE, RUNNABLE mono-repo implementing the following architecture. Output ALL files inline under proper code fences, grouped by a file tree first.
# Goal
Build a 4-stage data pipeline where each stage is a Spring Boot 3 (Java 17) microservice that calls a corresponding Python FastAPI service which uses an LLM:
1) dataingestion-service → calls ingestion_api → LLM for data classification (PII vs non-PII)
2) data-deduplication-service → calls dedup_api → LLM/embeddings for semantic deduplication
3) data-quality-service → calls quality_api → median imputation for missing numeric fields
4) data-storage-service → calls storage_api → persists to a relational DB
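For reference, here is a minimal sketch of what the Python side of stage 1 might look like once generated. The endpoint path `/classify`, the `gpt-4o-mini` model, and the prompt wording are illustrative assumptions, not part of the prompt above:

```python
# Hypothetical ingestion_api sketch: ask an LLM to label each record's fields as PII or non-PII.
import json

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI  # openai >= 1.0.0; reads OPENAI_API_KEY from the environment

app = FastAPI()
client = OpenAI()


class Record(BaseModel):
    id: str
    data: dict


class ClassifyRequest(BaseModel):
    records: list[Record]


@app.post("/classify")
def classify(req: ClassifyRequest):
    results = []
    for record in req.records:
        # Ask the LLM to label every field in the record as "PII" or "non-PII".
        completion = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model; any chat-capable model works
            messages=[
                {
                    "role": "system",
                    "content": "Label each JSON field as PII or non-PII. "
                               "Reply with a JSON object mapping field name to label.",
                },
                {"role": "user", "content": json.dumps(record.data)},
            ],
        )
        results.append({"id": record.id, "labels": completion.choices[0].message.content})
    return {"results": results}
```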
# Tech requirements
- Java: Spring Boot 3.x, Java 17, Gradle or Maven (pick Maven), use `spring-boot-starter-web`, `spring-boot-starter-validation`, WebClient (reactive) or RestTemplate (your choice; prefer WebClient).
- Python: FastAPI + uvicorn. For LLM use OpenAI SDK (new >=1.0.0). Read `OPENAI_API_KEY` from env; NEVER hardcode keys.
- Dedup: embeddings + cosine similarity (OpenAI text-embedding-3-small) + threshold param with default 0.9.
- Imputation: compute column-wise median over numeric fields; return an imputation report.
- DB: Postgres running in Docker Compose. Storage API writes normalized records to table `records(id TEXT PRIMARY KEY, payload JSONB, created_at TIMESTAMPTZ)`.
- Packaging: Dockerfiles for all 8 services (4 Java, 4 Python), docker-compose.yml to run end-to-end.
- Tests: Minimal unit tests per Python service; Postman/HTTPie/cURL examples; a smoke test script.
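To make the deduplication and imputation requirements concrete, here is a rough sketch of helpers the generated dedup_api and quality_api might contain. Function names, the in-memory approach, and the field handling are assumptions; only the embedding model, cosine similarity, the 0.9 threshold, and column-wise median come from the requirements above:

```python
# Hypothetical helpers for dedup_api (embeddings + cosine similarity) and quality_api (median imputation).
import math
from statistics import median

from openai import OpenAI  # reads OPENAI_API_KEY from the environment

client = OpenAI()


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def deduplicate(texts: list[str], threshold: float = 0.9) -> list[int]:
    """Return indexes of records kept after dropping near-duplicates."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = [item.embedding for item in resp.data]
    kept: list[int] = []
    for i, vec in enumerate(vectors):
        # Keep record i only if it is not too similar to any record already kept.
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def impute_median(rows: list[dict], numeric_fields: list[str]) -> dict:
    """Fill missing numeric fields with the column-wise median and return a report."""
    report: dict = {}
    for field in numeric_fields:
        present = [r[field] for r in rows if r.get(field) is not None]
        if not present:
            continue
        med = median(present)
        filled = 0
        for r in rows:
            if r.get(field) is None:
                r[field] = med
                filled += 1
        report[field] = {"median": med, "imputed_count": filled}
    return report
```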
# Data contracts (shared)
Define a common JSON record for requests flowing through all stages:
```json
{
  "records": [
    {
      "id": "string-unique-id",
      "data": { "any": "JSON fields" }
    }
  ],
  "options": {
    "dedup_threshold": 0.9,
    "impute_numeric_fields": ["amount", "score"]
  }
}
```
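On the Python side, this shared contract could be mirrored with Pydantic models along these lines; the class names are illustrative assumptions, while the field names and defaults follow the JSON above:

```python
# A minimal Pydantic mirror of the shared data contract; class names are hypothetical.
from pydantic import BaseModel, Field


class PipelineRecord(BaseModel):
    id: str
    data: dict  # arbitrary JSON fields


class PipelineOptions(BaseModel):
    dedup_threshold: float = 0.9
    impute_numeric_fields: list[str] = Field(default_factory=list)


class PipelineRequest(BaseModel):
    records: list[PipelineRecord]
    options: PipelineOptions = Field(default_factory=PipelineOptions)
```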
Copy the link for the above prompt:
Step 2: Paste it into Codex under the Environment
In the Codex prompt, instruct it to use the copied link above as the prompt and to create the application in the GitHub repository that was already created.
Codex will then run and create the application, showing the code changes.