AI is no longer a niche topic reserved for data scientists. As full-stack developers, we now have capable tools at our fingertips, and the OpenAI API is one of the most accessible gateways into that world. In this post I'll walk through how I integrate GPT-4o into a Next.js 15 application, including streaming, error handling, and keeping costs under control.
Setting Up the API Route
The Next.js App Router makes it easy to create a server-side API endpoint. Create a file at app/api/chat/route.ts and use the OpenAI Node SDK. The key is to stream, so the user sees tokens appear in real time rather than waiting for the full response, which dramatically improves perceived performance. Wrap the SDK's streamed chunks in a ReadableStream and return it as the response body so tokens reach the client as they arrive; a TransformStream is only needed if you want to reshape the chunks on the way through.
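Here is a minimal sketch of that server-side piece, not a definitive implementation. The names chunksToStream and ChatChunk are hypothetical helpers of my own, and the { messages } request body in the commented-out handler is an assumed shape. The helper only relies on each streamed chunk exposing choices[0].delta.content, which is what the SDK yields when you pass stream: true.

```typescript
// Minimal shape of a streamed chat chunk: only the fields this sketch reads.
// (Hypothetical local type; the SDK's own chunk type is structurally compatible.)
type ChatChunk = { choices: Array<{ delta: { content?: string | null } }> };

// Turn an async iterable of chunks into a byte stream of plain text.
function chunksToStream(chunks: AsyncIterable<ChatChunk>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();

  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      try {
        // Keep reading until we enqueue something or finish: a pull() that
        // enqueues nothing is not guaranteed to be called again.
        while (true) {
          const result = await iterator.next();
          if (result.done) {
            controller.close();
            return;
          }
          const text = result.value.choices[0]?.delta?.content;
          if (text) {
            controller.enqueue(encoder.encode(text));
            return;
          }
        }
      } catch (err) {
        // Surface upstream failures to whoever is reading the stream.
        controller.error(err);
      }
    },
    async cancel() {
      // The client went away: stop pulling from the upstream iterator.
      await iterator.return?.();
    },
  });
}

// Assumed wiring in app/api/chat/route.ts (requires the `openai` package):
//
// import OpenAI from "openai";
// const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
//
// export async function POST(req: Request) {
//   const { messages } = await req.json(); // assumed request body shape
//   const completion = await openai.chat.completions.create({
//     model: "gpt-4o",
//     messages,
//     stream: true,
//   });
//   return new Response(chunksToStream(completion), {
//     headers: { "Content-Type": "text/plain; charset=utf-8" },
//   });
// }
```

Using pull rather than pushing everything from start means the route only reads the next chunk when the client is ready for it, and cancel stops the upstream read if the user navigates away mid-response.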
Streaming Responses to the Frontend
On the client side, call fetch and get a reader from response.body. As each chunk arrives, decode it with a TextDecoder and append it to your state. React's useState works fine here, though for complex chat UIs I'd recommend useReducer. The result feels snappy and interactive: users see the AI 'think' in real time.
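The read loop described above might look like the following sketch. readTextStream is a hypothetical helper name, and the /api/chat URL and setReply state setter in the usage comment are assumptions about your app rather than anything prescribed.

```typescript
// Read a streamed text body chunk by chunk, calling onText for each decoded piece.
// Returns the full text once the stream ends.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";

  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      // stream: true buffers incomplete multi-byte characters until the next chunk.
      const text = decoder.decode(value, { stream: true });
      if (text) {
        full += text;
        onText(text);
      }
    }
    // Flush anything the decoder is still holding.
    const tail = decoder.decode();
    if (tail) {
      full += tail;
      onText(tail);
    }
  } finally {
    reader.releaseLock();
  }
  return full;
}

// Assumed usage inside a React event handler:
//
// const res = await fetch("/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ messages }),
// });
// if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);
// await readTextStream(res.body, (text) => setReply((prev) => prev + text));
```

Passing a functional update to the state setter matters here, because chunks can arrive faster than React re-renders and a stale closure would drop text.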
Keeping Costs Under Control
- Always set a max_tokens limit appropriate for your use case (newer versions of the Chat Completions API call this max_completion_tokens)
- Cache repeated prompts using Redis or a simple in-memory store
- Use gpt-4o-mini for tasks that don't require the full model
- Monitor usage in the OpenAI dashboard and set hard spending limits
- Send only the last N messages in long conversations, not the full history
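Two of the items above, trimming history and caching repeated prompts, can be sketched in a few lines. This is an illustration under assumptions: trimHistory, cacheKey, and createPromptCache are hypothetical names, the cache is a plain in-memory Map with a TTL (so it is not shared across server instances the way Redis would be), and keeping system messages while trimming is my own choice rather than a requirement.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Keep every system message plus the last `maxRecent` other messages.
function trimHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // Guard against slice(-0), which would return the whole array.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}

// Build a cache key from everything that affects the answer.
function cacheKey(model: string, messages: ChatMessage[]): string {
  return JSON.stringify({ model, messages });
}

// Tiny in-memory cache with a time-to-live. `now` is injectable for testing.
function createPromptCache(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: string; expiresAt: number }>();
  return {
    get(key: string): string | undefined {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (hit.expiresAt <= now()) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return hit.value;
    },
    set(key: string, value: string): void {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}
```

In a route handler you would trim first, look up cacheKey(model, trimmed), and only call the API on a miss. Caching pays off mainly for prompts that really do repeat verbatim, such as canned summaries or FAQ-style questions.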
With these patterns in place I've shipped three production AI features in the last six months. The tooling has matured enormously — streaming, function calling, and structured outputs are all first-class citizens now. There has never been a better time to add AI to your web apps.