When I built LoyalAI — a relationship assistant app on the App Store — I needed to integrate GPT-4 with streaming responses in React Native. Most tutorials show you how to call the API. None of them show you how to handle streaming on mobile, manage costs, or design a UX that doesn't feel broken. This post covers all three.
Architecture: Never Call OpenAI Directly from the App
Your OpenAI API key must never leave your server. A decompiled React Native bundle will expose any key you hardcode or store in environment variables. Always proxy through your backend.
```typescript
// Your Node.js backend — never expose this on the client
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat', authenticate, async (req, res) => {
  const { messages } = req.body;
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? '';
    if (text) res.write(`data: ${JSON.stringify({ text })}\n\n`);
  }

  res.write('data: [DONE]\n\n');
  res.end();
});
```

Streaming in React Native
React Native's fetch does not support streaming. You have two options: use EventSource (SSE) with a polyfill, or use a chunked XHR approach. For LoyalAI I used EventSource via the react-native-sse package.
```typescript
import { useCallback, useState } from 'react';
import EventSource from 'react-native-sse';

function useStreamingChat(token: string) {
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  const sendMessage = useCallback((messages: Message[]) => {
    setLoading(true);
    setResponse('');

    const es = new EventSource('https://your-api.com/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
      body: JSON.stringify({ messages }),
    });

    es.addEventListener('message', (e) => {
      if (e.data === '[DONE]') {
        setLoading(false);
        es.close();
        return;
      }
      const { text } = JSON.parse(e.data!);
      setResponse(prev => prev + text);
    });

    es.addEventListener('error', () => { setLoading(false); es.close(); });
  }, [token]);

  return { response, loading, sendMessage };
}
```

UX Patterns for AI on Mobile
Streaming text that appears character by character looks jarring on mobile unless you handle it well. Patterns that work:
- Show a typing indicator (three dots) until the first token arrives
- Auto-scroll to bottom as new tokens arrive — but stop scrolling if the user manually scrolls up
- Render markdown (bold, lists) — use react-native-markdown-display for LLM output
- Add a stop button — users will want to interrupt a long response
- Cache responses locally so hitting back doesn't re-fetch
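The auto-scroll rule above reduces to a small pure helper. This is a sketch, not a library API: the function name and the 40px threshold are my own assumptions, and you would feed it values from your FlatList's onScroll event.

```typescript
// Hypothetical helper: while tokens stream in, keep auto-scrolling only if
// the user is already near the bottom of the list.
function shouldAutoScroll(
  offsetY: number,         // current scroll offset (contentOffset.y)
  contentHeight: number,   // total content height (contentSize.height)
  viewportHeight: number,  // visible list height (layoutMeasurement.height)
  threshold: number = 40,  // assumed slack in px before the user counts as "scrolled up"
): boolean {
  const distanceFromBottom = contentHeight - viewportHeight - offsetY;
  return distanceFromBottom <= threshold;
}
```

Call it from onScroll, store the result in a ref, and only call scrollToEnd when a new token arrives and the ref is true.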
Cost Optimization
GPT-4o costs $5/1M input tokens and $15/1M output tokens. With a few thousand daily active users sending multiple messages, costs compound fast. The techniques that actually matter:
1. Keep system prompts short and reuse them with prompt caching (Anthropic) or a cached messages array
2. Summarize conversation history after every 10 exchanges instead of sending the full context
3. Use gpt-4o-mini for low-stakes responses (suggestions, quick lookups) and gpt-4o for complex generation
4. Rate-limit per user at the API layer — not just the client
5. Log token usage per request to catch runaway prompts
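The last point pairs naturally with a per-request cost estimate using the prices quoted in this section. A minimal sketch; the function and table names are mine, and you would feed it the token counts the API reports in its usage data:

```typescript
// Prices from this section: $5 / 1M input tokens, $15 / 1M output tokens (gpt-4o).
const PRICE_PER_MILLION = {
  'gpt-4o': { input: 5, output: 15 },
} as const;

// Rough per-request cost in USD, suitable for logging next to each request
// so runaway prompts show up as dollar amounts, not just token counts.
function estimateCostUSD(
  model: keyof typeof PRICE_PER_MILLION,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICE_PER_MILLION[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```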
System Prompt Design for Mobile Apps
Your system prompt is the most important lever you have. Keep it under 500 tokens. Be explicit about response format (short, mobile-friendly) and persona. A system prompt that says "respond in 2-3 sentences unless asked for detail" dramatically reduces output token costs.
Lesson from LoyalAI: a verbose system prompt that produced 400-word answers cost 10x more per conversation than a tightly scoped one that produced 80-word answers. Users actually preferred the shorter responses on mobile.
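In that spirit, a tightly scoped system prompt might look like the sketch below. The wording is illustrative only, not LoyalAI's actual prompt:

```typescript
// Illustrative system prompt (not LoyalAI's actual prompt): explicit about
// persona and response format, and well under the 500-token budget.
const SYSTEM_PROMPT = [
  'You are a warm, practical relationship assistant.',
  'Respond in 2-3 sentences unless asked for detail.',
  'Write for a phone screen: plain language, no long lists.',
].join('\n');
```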
App Store Considerations
Apple reviews AI features carefully. Make sure your app description accurately describes the AI capabilities, include a content moderation layer (OpenAI's moderation endpoint is free and fast), and have a clear privacy policy covering what data is sent to OpenAI. Apple will reject apps that send user data to third-party AI APIs without clear disclosure.
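The moderation gate can live in the same backend route as the chat proxy. Keeping the decision logic pure makes it easy to test; the surrounding API call (the OpenAI Node SDK's `openai.moderations.create`) is shown as a comment so this sketch stays self-contained:

```typescript
// Shape of one entry in the `results` array returned by OpenAI's moderation
// endpoint, simplified to the one field this gate uses.
type ModerationResult = { flagged: boolean };

// Pure gate: forward the user's message only if nothing is flagged.
// In the /api/chat route this would be fed like so (sketch):
//   const mod = await openai.moderations.create({ input: userText });
//   if (!isAllowed(mod.results)) return res.status(400).end();
function isAllowed(results: ModerationResult[]): boolean {
  return results.every((r) => !r.flagged);
}
```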