🔍 Deep Dive: How qz-l.com Built Safety Link Preview + AI Summary
Following our previous post on how we built a lightweight headless rendering service on a GCP e2-micro instance, this article explains how we transform raw webpage data into a safety-focused AI summary.
Our goal is simple:
Let users preview a webpage safely, without opening it, and understand what they are about to click.
To achieve this, we combine:
- our custom headless page renderer
- the Gemini AI model
- a carefully designed system prompt
- lightweight JSON parsing
- a strict safety-oriented analysis pipeline
Let's walk through the architecture.
🧱 Step 1 – Render the Page (From Previous Post)
We already covered this in Building a Lightweight Headless Rendering Service on GCP e2-micro for Safety Link Preview. Our renderer extracts:
- `<title>`
- `<meta>` description
- OG tags
- favicon
- full visible body inner text
And all of this runs on:
- GCP e2-micro
- single browser instance
- single queued request
- `domcontentloaded` instead of `networkidle`
This gives us clean, structured raw data like:
```json
{
  "url": "https://example.com",
  "metadata": {
    "title": "Example Domain",
    "desc": "This domain is for use in illustrative examples...",
    "ogTitle": "Example Domain",
    "ogDesc": "",
    "image": "",
    "favicon": "/favicon.ico"
  },
  "text": "Example Domain\nThis domain is for use in illustrative examples..."
}
```
This is the input for the AI.
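For reference, the renderer output can be modeled with a small TypeScript sketch. The type names and the `capBodyText` helper (including its 4000-character limit) are illustrative, not part of the production renderer:

```typescript
// Shape of the renderer output, mirroring the JSON example above.
interface PageMetadata {
  title: string;
  desc: string;
  ogTitle: string;
  ogDesc: string;
  image: string;
  favicon: string;
}

interface RenderResult {
  url: string;
  metadata: PageMetadata;
  text: string; // full visible body inner text
}

// Body text can be arbitrarily long; capping it keeps the AI request
// small and cheap. maxChars is an illustrative default.
function capBodyText(result: RenderResult, maxChars = 4000): RenderResult {
  return {
    ...result,
    text:
      result.text.length > maxChars
        ? result.text.slice(0, maxChars) + "…"
        : result.text,
  };
}
```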
🧠 Step 2 – Constructing the AI Prompt
To make the AI response predictable and structured, we use a strict JSON-only system prompt.
The AI is not allowed to output anything except a JSON object.
Here is the exact system prompt:
```
You are a URL safety, summary, and content category analyzer.
Your task is to analyze a given webpage and return ONLY a valid JSON object with the following keys:

{
  "summary": "1-2 sentence concise description of the page content",
  "safety": "safe | maybe-unsafe | unsafe",
  "category": "specific content category of the page",
  "riskNotes": "brief explanation of any potential risks or why the page is safe"
}

Rules:

1. Safety field:
   - "safe" = generally safe, trustworthy content
   - "maybe-unsafe" = potentially risky, unknown origin, suspicious
   - "unsafe" = clearly malicious, scam, adult content, phishing, malware

2. Category field:
   - Be as specific as possible. Possible categories include but are not limited to:
     "shopping", "login", "adult", "file-download", "news", "social",
     "search-engine", "blog", "life-style", "beauty", "tech", "finance",
     "education", "entertainment", "sports", "health", "government", "non-profit",
     "personal", "forum", "other"
   - Only use "other" if the page cannot be reasonably categorized.
   - Avoid using "unknown" unless there is literally no content to analyze.

3. Summary: provide a clear, concise 1-2 sentence description of the page content.

4. RiskNotes: provide a short note explaining any potential risks, such as phishing, downloads, ads, tracking, adult content, etc.

Always output strictly valid JSON, nothing else. Do not include explanations, markdown, or extra text outside the JSON.
```
This ensures consistent results across thousands of different websites.
🎨 Step 3 – Preparing the AI Input
We build a user message containing:
- URL
- Extracted metadata
- Extracted visible text
Example:
```
Analyze the following webpage:

URL: https://example.com
Title: Example Domain
Description: This domain is for use in illustrative examples...

Visible body text:
Example Domain
This domain is for use in illustrative examples...
```
This gives the AI clear, structured context.
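A small helper makes this assembly repeatable. `buildUserMessage` is an illustrative name; the parameter shape mirrors the renderer JSON from Step 1:

```typescript
// Assembles the user message sent to the AI from the renderer's output.
function buildUserMessage(page: {
  url: string;
  metadata: { title: string; desc: string };
  text: string;
}): string {
  return [
    "Analyze the following webpage:",
    `URL: ${page.url}`,
    `Title: ${page.metadata.title}`,
    `Description: ${page.metadata.desc}`,
    "Visible body text:",
    page.text,
  ].join("\n");
}
```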
🤖 Step 4 – Calling Gemini AI
We use `gemini-2.5-flash-lite` to keep responses fast and cheap.
Here is the core code:
```ts
const aiRes = await ai.models.generateContent({
  model: "gemini-2.5-flash-lite",
  contents: [{ role: "user", parts: [{ text: userMessage }] }],
  config: {
    temperature: 0,
    maxOutputTokens: 1024,
    systemInstruction: { role: "system", parts: [{ text: SYSTEM_PROMPT }] },
  },
});
```
The temperature is set to 0 to keep the output as deterministic as possible, so the JSON structure stays stable across calls.
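Calls to any hosted model can fail transiently. A thin retry wrapper like the sketch below (not part of the original pipeline; `withRetry` and its defaults are illustrative) keeps the endpoint resilient without cluttering the call site:

```typescript
// Retries an async operation with exponential backoff.
// attempts and baseDelayMs are illustrative defaults.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 250 ms, 500 ms, 1000 ms, ... between attempts.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage would look like `await withRetry(() => ai.models.generateContent({ ... }))`.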
🧹 Step 5 – Cleaning & Parsing the JSON
The AI sometimes wraps the JSON in markdown fences:

```json
{ ... }
```

So we sanitize it:
```ts
let cleanText = textOutput
  .trim()
  .replace(/^```json\s*/i, "")
  .replace(/```$/, "");
```
Then parse:
```ts
let aiJson;
try {
  aiJson = JSON.parse(cleanText);
} catch {
  aiJson = {
    summary: "AI was unable to summarize.",
    safety: "unknown",
    category: "unknown",
    riskNotes: "Could not parse AI response.",
  };
}
```
This fallback ensures our API always returns well-formed JSON, even when the model's response cannot be parsed.
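Parsing alone does not guarantee the fields are in range, so a final normalization pass can tighten the contract. This is a sketch; `normalizeAiResult` is an illustrative name, and the allowed safety values are the three from the system prompt plus the `"unknown"` fallback used above:

```typescript
// Allowed safety levels: the three prompt values plus the "unknown"
// fallback used when parsing fails.
const SAFETY_LEVELS = ["safe", "maybe-unsafe", "unsafe", "unknown"] as const;

interface AiResult {
  summary: string;
  safety: string;
  category: string;
  riskNotes: string;
}

// Coerces a parsed AI response into the shape the API promises,
// substituting neutral values for anything missing or out of range.
function normalizeAiResult(raw: Partial<AiResult> | null): AiResult {
  const safety =
    raw &&
    typeof raw.safety === "string" &&
    (SAFETY_LEVELS as readonly string[]).includes(raw.safety)
      ? raw.safety
      : "unknown";
  return {
    summary:
      raw && typeof raw.summary === "string" && raw.summary
        ? raw.summary
        : "AI was unable to summarize.",
    safety,
    category:
      raw && typeof raw.category === "string" && raw.category
        ? raw.category
        : "unknown",
    riskNotes: raw && typeof raw.riskNotes === "string" ? raw.riskNotes : "",
  };
}
```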
📊 Example Output
Here is the type of JSON the AI returns:

```json
{
  "summary": "This page describes the Example Domain used for documentation and testing.",
  "safety": "safe",
  "category": "education",
  "riskNotes": "No apparent risks; this is an official and static informational page."
}
```
We then return this to the client and show it in the Safety Link Preview UI.
🛡️ Why This Matters
With this pipeline, qz-l.com offers:
- Preview before clicking
- AI-understandable page content
- Safety classification
- Category tagging
- Risk explanation
- Metadata extraction
- No exposure to harmful sites
- Low-cost infrastructure
It gives users confidence about whether a link is:
- safe
- suspicious
- malicious
- NSFW
- phishing
- unknown
All without loading the page in their own browser.
🚀 What's Next
Weβre extending the Safety Link system to include:
- screenshot preview (coming)
- advanced threat classification
- URL pattern analysis
- phishing heuristics
- multi-model consensus for higher accuracy
- real-time blacklist updates
Stay tuned as we continue building the safest link preview experience on the web.