🔍 Deep Dive: How qz-l.com Built Safety Link Preview + AI Summary
Following our previous post on how we built a lightweight headless rendering service on a GCP e2-micro instance, this article explains how we transform raw webpage data into a safety-focused AI summary.
Our goal is simple:
Let users preview a webpage safely, without opening it, and understand what they are about to click.
To achieve this, we combine:
- our custom headless page renderer
- the Gemini AI model
- a carefully designed system prompt
- lightweight JSON parsing
- a strict safety-oriented analysis pipeline
Let's walk through the architecture.
🧱 Step 1 – Render the Page (From Previous Post)
We already covered this in Building a Lightweight Headless Rendering Service on GCP e2-micro for Safety Link Preview. Our renderer extracts:
- `<title>`
- `<meta>` description
- OG tags
- favicon
- full visible body inner text
And all of this runs on:
- GCP e2-micro
- single browser instance
- single queued request
- `domcontentloaded` instead of `networkidle`
This gives us clean, structured raw data like:
```json
{
  "url": "https://example.com",
  "metadata": {
    "title": "Example Domain",
    "desc": "This domain is for use in illustrative examples...",
    "ogTitle": "Example Domain",
    "ogDesc": "",
    "image": "",
    "favicon": "/favicon.ico"
  },
  "text": "Example Domain\nThis domain is for use in illustrative examples..."
}
```
This is the input for the AI.
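For reference, the renderer output can be modeled with a small TypeScript sketch. The type names and the `capBodyText` helper (including its 4000-character limit) are illustrative, not part of the production renderer:

```typescript
// Shape of the renderer output, mirroring the JSON example above.
interface PageMetadata {
  title: string;
  desc: string;
  ogTitle: string;
  ogDesc: string;
  image: string;
  favicon: string;
}

interface RenderResult {
  url: string;
  metadata: PageMetadata;
  text: string; // full visible body inner text
}

// Body text can be arbitrarily long; capping it keeps the AI request
// small and cheap. maxChars is an illustrative default.
function capBodyText(result: RenderResult, maxChars = 4000): RenderResult {
  return {
    ...result,
    text:
      result.text.length > maxChars
        ? result.text.slice(0, maxChars) + "…"
        : result.text,
  };
}
```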
🧠 Step 2 – Constructing the AI Prompt
To make the AI response predictable and structured, we use a strict JSON-only system prompt.
The AI is not allowed to output anything except a JSON object.
Here is the exact system prompt:
```
You are a URL safety, summary, and content category analyzer.
Your task is to analyze a given webpage and return ONLY a valid JSON object with the following keys:

{
  "summary": "1-2 sentence concise description of the page content",
  "safety": "safe | maybe-unsafe | unsafe",
  "category": "specific content category of the page",
  "riskNotes": "brief explanation of any potential risks or why the page is safe"
}

Rules:

1. Safety field:
   - "safe" = generally safe, trustworthy content
   - "maybe-unsafe" = potentially risky, unknown origin, suspicious
   - "unsafe" = clearly malicious, scam, adult content, phishing, malware

2. Category field:
   - Be as specific as possible. Possible categories include but are not limited to:
     "shopping", "login", "adult", "file-download", "news", "social",
     "search-engine", "blog", "life-style", "beauty", "tech", "finance",
     "education", "entertainment", "sports", "health", "government", "non-profit",
     "personal", "forum", "other"
   - Only use "other" if the page cannot be reasonably categorized.
   - Avoid using "unknown" unless there is literally no content to analyze.

3. Summary: provide a clear, concise 1-2 sentence description of the page content.

4. RiskNotes: provide a short note explaining any potential risks, such as phishing, downloads, ads, tracking, adult content, etc.

Always output strictly valid JSON, nothing else. Do not include explanations, markdown, or extra text outside the JSON.
```
This ensures consistent results across thousands of different websites.
🎨 Step 3 – Preparing the AI Input
We build a user message containing:
- URL
- Extracted metadata
- Extracted visible text
Example:
```
Analyze the following webpage:

URL: https://example.com
Title: Example Domain
Description: This domain is for use in illustrative examples...

Visible body text:
Example Domain
This domain is for use in illustrative examples...
```
This gives the AI clear, structured context.
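A small helper makes this assembly repeatable. `buildUserMessage` is an illustrative name; the parameter shape mirrors the renderer JSON from Step 1:

```typescript
// Assembles the user message sent to the AI from the renderer's output.
function buildUserMessage(page: {
  url: string;
  metadata: { title: string; desc: string };
  text: string;
}): string {
  return [
    "Analyze the following webpage:",
    `URL: ${page.url}`,
    `Title: ${page.metadata.title}`,
    `Description: ${page.metadata.desc}`,
    "Visible body text:",
    page.text,
  ].join("\n");
}
```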
🤖 Step 4 – Calling Gemini AI
We use `gemini-2.5-flash-lite` to keep responses fast and cheap.
Here is the core code:
```ts
const aiRes = await ai.models.generateContent({
  model: "gemini-2.5-flash-lite",
  contents: [{ role: "user", parts: [{ text: userMessage }] }],
  config: {
    temperature: 0,
    maxOutputTokens: 1024,
    systemInstruction: { role: "system", parts: [{ text: SYSTEM_PROMPT }] },
  },
});
```
The temperature is set to 0 to keep the output as deterministic as possible, so the JSON structure stays stable across calls.
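Calls to any hosted model can fail transiently. A thin retry wrapper like the sketch below (not part of the original pipeline; `withRetry` and its defaults are illustrative) keeps the endpoint resilient without cluttering the call site:

```typescript
// Retries an async operation with exponential backoff.
// attempts and baseDelayMs are illustrative defaults.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 250 ms, 500 ms, 1000 ms, ... between attempts.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage would look like `await withRetry(() => ai.models.generateContent({ ... }))`.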
🧹 Step 5 – Cleaning & Parsing the JSON
The AI sometimes wraps the JSON in markdown fences:

```json
{ ... }
```

So we sanitize it:
```ts
let cleanText = textOutput
  .trim()
  .replace(/^```json\s*/i, "")
  .replace(/```$/, "");
```
Then parse:
```ts
let aiJson;
try {
  aiJson = JSON.parse(cleanText);
} catch {
  aiJson = {
    summary: "AI was unable to summarize.",
    safety: "unknown",
    category: "unknown",
    riskNotes: "Could not parse AI response.",
  };
}
```
This fallback ensures our API always returns well-formed JSON, even when the model's response cannot be parsed.
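Parsing alone does not guarantee the fields are in range, so a final normalization pass can tighten the contract. This is a sketch; `normalizeAiResult` is an illustrative name, and the allowed safety values are the three from the system prompt plus the `"unknown"` fallback used above:

```typescript
// Allowed safety levels: the three prompt values plus the "unknown"
// fallback used when parsing fails.
const SAFETY_LEVELS = ["safe", "maybe-unsafe", "unsafe", "unknown"] as const;

interface AiResult {
  summary: string;
  safety: string;
  category: string;
  riskNotes: string;
}

// Coerces a parsed AI response into the shape the API promises,
// substituting neutral values for anything missing or out of range.
function normalizeAiResult(raw: Partial<AiResult> | null): AiResult {
  const safety =
    raw &&
    typeof raw.safety === "string" &&
    (SAFETY_LEVELS as readonly string[]).includes(raw.safety)
      ? raw.safety
      : "unknown";
  return {
    summary:
      raw && typeof raw.summary === "string" && raw.summary
        ? raw.summary
        : "AI was unable to summarize.",
    safety,
    category:
      raw && typeof raw.category === "string" && raw.category
        ? raw.category
        : "unknown",
    riskNotes: raw && typeof raw.riskNotes === "string" ? raw.riskNotes : "",
  };
}
```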
📊 Example Output
Here is the type of JSON the AI returns:

```json
{
  "summary": "This page describes the Example Domain used for documentation and testing.",
  "safety": "safe",
  "category": "education",
  "riskNotes": "No apparent risks; this is an official and static informational page."
}
```
We then return this to the client and show it in the Safety Link Preview UI.
🛡️ Why This Matters
With this pipeline, qz-l.com offers:
- Preview before clicking
- AI-understandable page content
- Safety classification
- Category tagging
- Risk explanation
- Metadata extraction
- No exposure to harmful sites
- Low-cost infrastructure
It gives users confidence about whether a link is:
- safe
- suspicious
- malicious
- NSFW
- phishing
- unknown
All without loading the page in their own browser.
🚀 What's Next
Weβre extending the Safety Link system to include:
- screenshot preview (coming)
- advanced threat classification
- URL pattern analysis
- phishing heuristics
- multi-model consensus for higher accuracy
- real-time blacklist updates
Stay tuned as we continue building the safest link preview experience on the web.