# Building the qz-l AI Chat Assistant Using Google Gemini 2.5 Flash Lite + Next.js
I recently added a new feature to qz-l.com: an AI-powered chat assistant that can shorten URLs, show analytics, search blog posts, and help users navigate the service, all through natural language.
This post explains how the assistant works under the hood using the modern `@google/genai` SDK and the free-tier model `gemini-2.5-flash-lite`, which runs at zero cost within Google's usage limits.
## Why Build an AI Assistant?
qz-l's mission is simple: privacy-first URL shortening with analytics.
But users often ask:
- "How do I create a short link?"
- "Where's the dashboard?"
- "Can I delete a URL?"
- "What does this blog post say?"
Instead of building a whole help UI, I added a chat interface that can perform real actions using function calling.
## Model Choice: gemini-2.5-flash-lite (Free)
The assistant uses:
- SDK: `@google/genai`
- Model: `gemini-2.5-flash-lite`
- Platform: Google AI Studio (free tier)
Why this model?
- ✅ Completely free within quota
- ✅ Very fast, with low latency
- ✅ Full function-calling support
- ✅ Well suited to automation + chat
- ✅ Stable enough for production workloads
Inspired by this resource list: https://github.com/cheahjs/free-llm-api-resources
## System Architecture
The assistant lives inside a Next.js App Router API route: `/api/chat`.
High-level flow:
```
User message
  ↓
Next.js API route (/api/chat)
  ↓
Gemini LLM (system prompt + tool declarations)
  ↓
Function call requested? → server executes the logic
  ↓
LLM formats the final Markdown response
  ↓
Chat UI displays the answer (links, QR codes, etc.)
```
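In qz-l this flow starts in a single route handler. Below is a minimal sketch of what an App Router handler (conventionally `app/api/chat/route.ts`) can look like. `askAssistant` is a hypothetical stand-in for the Gemini call so the handler shape is visible without network access, and the request body shape (`{ message }`) is an illustrative assumption, not qz-l's actual contract.

```typescript
// Hypothetical stand-in for the Gemini call (the real version runs
// ai.models.generateContent with the system prompt, tools, and the
// function-call loop described later in this post).
async function askAssistant(message: string): Promise<string> {
  return `You said: ${message}`;
}

// Next.js App Router handlers export HTTP-method functions that take a
// standard web Request and return a standard Response.
export async function POST(req: Request): Promise<Response> {
  const body = await req.json().catch(() => null);
  if (!body || typeof body.message !== "string") {
    return Response.json({ error: "message is required" }, { status: 400 });
  }
  const reply = await askAssistant(body.message);
  return Response.json({ reply });
}
```

Because the handler uses only the standard `Request`/`Response` web APIs, the same shape works in any runtime Next.js targets.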
## Using `@google/genai`
```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey });

const result = await ai.models.generateContent({
  model: "gemini-2.5-flash-lite",
  contents,
  config: {
    temperature: 0.7,
    maxOutputTokens: 1024,
    systemInstruction: {
      role: "system",
      parts: [{ text: SYSTEM_PROMPT }],
    },
    tools: [
      {
        functionDeclarations: [
          shortenUrlDeclaration,
          getUrlAnalyticsDeclaration,
          listRecentUrlsDeclaration,
          deleteUrlDeclaration,
          searchBlogPostsDeclaration,
        ],
      },
    ],
  },
});
```
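Once `generateContent` returns, the server has to check whether the model produced plain text or a function call. A defensive extraction over the response shape (candidates → content → parts) looks like the sketch below; the types are simplified for illustration, not the SDK's own definitions.

```typescript
// Simplified shapes for illustration only; the real SDK types are richer.
type Part = {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
};
type GenerateResult = { candidates?: { content?: { parts?: Part[] } }[] };

// Return the first function call in the response, or null if the model
// answered with text only.
function firstFunctionCall(result: GenerateResult) {
  for (const part of result.candidates?.[0]?.content?.parts ?? []) {
    if (part.functionCall) return part.functionCall;
  }
  return null;
}
```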
## Function Calling
Each tool the model can call is declared with a JSON-schema-style signature. For example, the URL shortener:

```typescript
const shortenUrlDeclaration = {
  name: "shortenUrl",
  description: "Generate a shortened URL",
  parameters: {
    type: "object",
    properties: {
      longUrl: { type: "string" },
    },
    required: ["longUrl"],
  },
};
```
When the model decides to use a tool, the response contains a function call like:

```json
{
  "functionCall": {
    "name": "shortenUrl",
    "args": { "longUrl": "https://google.com" }
  }
}
```
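On the server, that call has to be routed to real logic. Here is a sketch of the dispatch table, with a stubbed handler standing in for the actual qz-l database code (the stubbed short URL is a made-up value for illustration):

```typescript
type FunctionCall = { name: string; args: Record<string, unknown> };

// Map each declared function name to a server-side handler. The body here
// is a stub; the real handler creates and persists a short link.
const handlers: Record<string, (args: Record<string, unknown>) => Promise<unknown>> = {
  shortenUrl: async (args) => ({
    shortUrl: "https://qz-l.com/abc123", // stubbed slug
    longUrl: args.longUrl,
  }),
};

// Execute a function call requested by the model, rejecting unknown names
// so a hallucinated tool can never reach application logic.
async function executeFunctionCall(call: FunctionCall): Promise<unknown> {
  const handler = handlers[call.name];
  if (!handler) throw new Error(`Unknown function: ${call.name}`);
  return handler(call.args);
}
```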
## The Function-Call Loop
1. Ask Gemini for the next message.
2. Detect whether it requested a function.
3. Execute the function on the server.
4. Append the result to the conversation.
5. Call Gemini again for the final answer.
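The steps above can be sketched as a small loop. `generate` and `execute` are injected stand-ins for the Gemini call and the tool dispatch (so the loop itself is testable offline), the message shapes are simplified from the real API, and a turn cap guards against the model requesting functions forever.

```typescript
type ChatPart = {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
};
type ChatContent = { role: "user" | "model"; parts: ChatPart[] };

async function runChat(
  contents: ChatContent[],
  generate: (contents: ChatContent[]) => Promise<ChatPart>,
  execute: (name: string, args: Record<string, unknown>) => Promise<unknown>,
  maxTurns = 5,
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const part = await generate(contents);          // 1. ask for the next message
    if (!part.functionCall) return part.text ?? ""; // 2. no call → final answer
    contents.push({ role: "model", parts: [part] });
    const result = await execute(part.functionCall.name, part.functionCall.args); // 3.
    contents.push({                                 // 4. append the tool result
      role: "user",
      parts: [{ text: JSON.stringify({ name: part.functionCall.name, response: result }) }],
    });
    // 5. loop back and call the model again with the result appended
  }
  throw new Error("Exceeded maximum function-call turns");
}
```

Capping the number of turns is worth keeping even in the real implementation: it bounds latency and quota usage if the model keeps chaining tool calls.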
## Why This Approach Works
- ✅ Zero cost (Gemini free tier)
- ✅ Fast responses
- ✅ Structured, predictable output via function calling
- ✅ Secure: all logic runs server-side
- ✅ Easy to extend with new tools
## Current Capabilities
- Shorten URLs
- Generate QR codes
- Fetch analytics
- Delete links
- Show recent URLs
- Search blog posts
- Explain features
## Upcoming Enhancements
- Auth-protected actions
- Rate limiting
- Streaming responses
- Better UI
## Final Thoughts
You don't need expensive models to build production AI features: just solid architecture and a good system prompt.