plan.md

# pranala.org — Indonesia-only Search Engine

**Positioning:** A transparent Indonesian web search engine: slow but honest. (Indonesian slogan: "Mesin pencari web Indonesia yang transparan, lambat tapi jujur.")
**Tagline:** Google ranks the web. Pranala ranks Indonesian trust.

**Two hard constraints, equal weight:**
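1. **100% Cloudflare**: Workers + D1 + R2 + KV + Cron + Vectorize + Workers AI, plus Queues + Durable Objects once the project moves to a paid plan (on the free tier both are replaced by D1 tables, see Free-Tier Reality). Zero origin servers, zero PM2, zero Docker in the request path.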
2. **100% AI automated** — every step from crawl to ranking to billing to ad serving to support runs without a human in the loop. The only human surface allowed is a legally-mandated DMCA contact form (scoped below).

---

## Automation Constitution (hard rules)

These rules are enforced in code via `src/lib/automation-guard.ts`. Any PR that violates them must fail CI.

| # | Rule | Enforcement |
|---|---|---|
| A1 | No admin UI may have an "Approve" or "Reject" button for content, listings, ads, badges, payouts, or rank decisions. | Lint rule: forbid `<button>Approve</button>` patterns; admin UI is read-only dashboards + AI-action audit log. |
| A2 | No D1 column may be named `manual_review`, `pending_approval`, `reviewer_id`, or equivalent. | Schema lint at migration time. |
| A3 | No worker route may require an `admin` JWT to write production data. All writes are AI-driven via Queue consumer or Cron. | Route-level test: every POST/PUT/DELETE traces back to `cron`, `queue`, `webhook`, or `verified-self-service`. |
| A4 | No Slack/email notification may have "approve here" CTAs. Notifications are post-hoc reports only. | Template lint. |
| A5 | Every AI decision must write `(decision, score, model, prompt_hash, ts)` to `ai_decisions` D1 table for audit. | Wrapper function `aiDecide()` is the only callable; ESLint forbids direct `env.AI.run()`. |
| A6 | DMCA counter-notice handling is the ONLY allowed human surface, and is bounded to a single inbox processed by external counsel weekly — never inside the worker. | Single `legal@pranala.org` mailbox; nothing else routes to a human. |

If a rule needs to break, the PR description must include `AUTOMATION-EXCEPTION: <ticket>` and the exception is permanent technical debt visible on `/admin/debt`.
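
As an illustration of rule A2, the schema lint could be as small as the sketch below. This is an assumption about how `automation-guard.ts` might work, not its actual contents: the function name `findForbiddenColumns` is hypothetical, and the "or equivalent" part of the rule is not covered here.

```typescript
// Hypothetical sketch of the A2 schema lint: scan migration SQL for
// the column names that rule A2 forbids outright.
const FORBIDDEN_COLUMNS = ["manual_review", "pending_approval", "reviewer_id"];

function findForbiddenColumns(migrationSql: string): string[] {
  const lowered = migrationSql.toLowerCase();
  // \b treats "_" as a word character, so only whole identifiers match.
  return FORBIDDEN_COLUMNS.filter((name) =>
    new RegExp(`\\b${name}\\b`).test(lowered),
  );
}
```

A CI step could run this over every file in the migrations folder and fail when the returned list is non-empty.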

---

## Free-Tier Reality (HARD CONSTRAINT)

**Pranala lives entirely on Cloudflare Free tier until revenue forces an upgrade.** Slow, honest, cheap. No surprises.

### Free-tier ceilings (per Cloudflare docs)
| Resource | Free quota | Pranala budget |
|---|---|---|
| Workers requests | 100K/day | ≤ 80K/day for SERP + AC + API combined |
| Workers CPU (request) | 10ms | All hot paths must finish < 10ms |
| Workers CPU (cron) | 30s | Crawl/index/rank budget lives here |
| D1 storage | 5GB total, 1GB/DB | One DB only, ≤ 800MB metadata, HTML in R2 |
| D1 reads | 5M/day | SERP cached → ≤ 1M reads/day |
| D1 writes | 100K/day | Crawl ≤ 10K URLs/day → ≤ 50K writes |
| KV reads | 100K/day | AC trie cached in module scope |
| KV writes | 1K/day | Trie rebuilt weekly, not daily |
| KV storage | 1GB | Trie + config only, never per-page |
| R2 storage | 10GB | Gzipped HTML (~5KB avg) → ≤ 2M pages |
| R2 Class A ops | 1M/mo | ≤ 33K writes/day |
| R2 Class B ops | 10M/mo | Reads cheap, fine |
| Workers AI | 10K Neurons/day | Cron-only, ≤ 100 LLM calls/day |
| Vectorize stored dims | 5M/mo | 1024d × 5K vectors max |
| Vectorize query dims | 30M/mo | ≤ 30K queries/mo (cron AC fallback only) |
| Analytics Engine | free w/ caps | Search log + clickthroughs only |
| Cron triggers | 1K invocations/day on free | One `* * * * *` multiplexer = 1440/day → use `*/2 * * * *` (720/day) |

### Forbidden on free tier
- ❌ **Queues** (paid only) → replaced with **D1 polling work table** (`task_queue` row + `claimed_at` timestamp + `WHERE state='pending' LIMIT 50`)
- ❌ **Durable Objects** (paid only) → replaced with **D1 host-throttle row** (`host_state(host, last_fetch_at, crawl_delay_s)`, cron checks `WHERE last_fetch_at < datetime('now', '-' || crawl_delay_s || ' seconds')`)
- ❌ **Workers AI per request** → all AI moved to cron batch jobs
- ❌ **Vectorize on hot path** → Vectorize used only in weekly cron to update AC suggestions, never in the request
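
The two D1 replacements above can be modeled in memory to check the logic before writing SQL. The sketch below is only an illustration: the type and function names are hypothetical, timestamps are plain epoch seconds, "one task per host per tick" is an assumption, and a host with no `host_state` row is treated as due.

```typescript
interface HostState { host: string; lastFetchAt: number; crawlDelayS: number }
interface Task { id: number; host: string; state: "pending" | "claimed" | "done"; claimedAt: number | null }

// Mirrors: WHERE last_fetch_at < datetime('now', '-' || crawl_delay_s || ' seconds')
function isHostDue(h: HostState, nowS: number): boolean {
  return h.lastFetchAt < nowS - h.crawlDelayS;
}

// Claims up to `limit` pending tasks whose host is due, at most one per host.
function claimTasks(tasks: Task[], hosts: Map<string, HostState>, nowS: number, limit = 50): Task[] {
  const claimed: Task[] = [];
  const usedHosts = new Set<string>();
  for (const t of tasks) {
    if (claimed.length >= limit) break;
    if (t.state !== "pending" || usedHosts.has(t.host)) continue;
    const h = hosts.get(t.host);
    if (h && !isHostDue(h, nowS)) continue; // host still inside its crawl delay
    t.state = "claimed";
    t.claimedAt = nowS; // the claimed_at timestamp from the task_queue row
    usedHosts.add(t.host);
    claimed.push(t);
  }
  return claimed;
}
```

In D1 the same claim would have to be a single `UPDATE ... WHERE state='pending'` so that two overlapping cron runs cannot claim the same row.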

### Growth ladder (upgrade triggers tied to revenue)
| Stage | URLs indexed | Traffic | Plan | Why upgrade |
|---|---|---|---|---|
| α (now) | 0 → 100K | < 50K req/day | Free | Bootstrap |
| β | 100K → 500K | 50–200K req/day | Workers Paid $5/mo | Need 30s CPU on requests for autocomplete semantic + Queues |
| γ | 500K → 5M | 200K–2M req/day | + R2 paid + D1 paid | HTML storage exceeds 10GB |
| δ | 5M+ | 2M+ req/day | + Workers AI paid | More Llama capacity |

Until pranala makes ≥ Rp 75K/mo from flio ad revenue (covers $5 Workers Paid), it stays on Free. **Premium subscriptions self-fund the next upgrade tier.** No cash burn.

---

## Cloudflare Resource Map (Free tier — single project)

- **Worker:** `pranala-org` → custom domain `pranala.org` (handles routes + cron in one Worker to stay simple and within request budget)
- **D1:** `pranala_db` (single DB ≤ 1GB; sharding deferred to stage γ)
- **R2:** `pranala-html` (gzipped HTML only, hash-keyed)
- **KV:** `PRANALA_KV` (AC trie, config, parsed robots.txt — single namespace)
- **Vectorize:** `pranala-ac-v1` (≤ 5K query embeddings, used in cron only) — added at stage β
- **Workers AI (cron only):** `@cf/baai/bge-m3`, `@cf/meta/llama-3.3-70b-instruct-fp8-fast`, `@cf/meta/m2m100-1.2b`
- **Analytics Engine:** `pranala_events`
- **Cron:** `*/2 * * * *` single multiplexer that dispatches by minute-modulo (crawl, index-roll-up, AC trie rebuild, etc.)
- ~~Queues~~ → D1 `task_queue` polling
- ~~Durable Objects~~ → D1 `host_state` row throttling
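
A minimal sketch of the minute-modulo multiplexer follows. The job names and the rule that crawl and index alternate between ticks are assumptions made for illustration; the plan only states that one `*/2` cron dispatches by minute-modulo. The 03:00 WIB rebuild time is taken from the Phase 1 scope, and WIB is UTC+7.

```typescript
type Job = "crawl" | "index" | "ac-trie-rebuild";

// The cron fires on even minutes, so the tick index within the hour is minute / 2 (0..29).
function jobsForTick(utc: Date): Job[] {
  const tick = Math.floor(utc.getUTCMinutes() / 2);
  // Assumption: alternate crawl and index so each tick does one kind of work.
  const jobs: Job[] = [tick % 2 === 0 ? "crawl" : "index"];
  // 03:00 WIB equals 20:00 UTC on the previous calendar day.
  if (utc.getUTCHours() === 20 && utc.getUTCMinutes() === 0) {
    jobs.push("ac-trie-rebuild");
  }
  return jobs;
}
```

The Worker's `scheduled` handler would call `jobsForTick(new Date(event.scheduledTime))` and run each returned job in order.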

---

## PHASE 1: Bootstrap (everything on the free tier, all UI in Indonesian)

**Scope shipped:**
1. Single Worker with a Hono router (TS).
2. Minimal D1 schema: `urls`, `documents`, `links`, `host_state`, `task_queue`, `submissions`, `ai_decisions`, `search_log`.
3. Cron `*/2 * * * *` → minute-modulo dispatcher: crawl 5 URLs/tick, index 10 documents/tick, AC trie rebuild at 03:00 WIB.
4. Mobile-native home page (in Indonesian) with search box + microphone button + trie autocomplete.
5. Mobile SERP page (in Indonesian): bottom nav, result cards, horizontal filter tabs.
6. Endpoint `/api/ac`: trie + typo (BK-tree) only in Phase 1; semantic completion is deferred to Phase 2 (Vectorize).
7. Endpoint `/api/submit`: public domain/sitemap submission; results are enqueued in `task_queue`.
8. Cron crawler: take a URL → fetch (UA `Mozilla/5.0 (compatible; pranala-bot/1.0; +https://pranala.org/bot)`) → store gzipped HTML in R2 → extract title/meta/outlinks → write to D1 with host throttling.
9. PWA manifest + Service Worker offline fallback.
10. `/dmca` page (public form) → `dmca_intake` D1.
11. `/bot` page (crawler info), referenced in the User-Agent.
12. `/transparansi` page: public ranking formula, list of trusted seeds, live index statistics.

**DEFERRED to Phase 2+ (needs Workers Paid):**
- PageRank shard processing (R2 graph shards): Phase 2
- Vectorize semantic AC fallback: Phase 2
- Premium tier billing (Xendit): Phase 2
- Owner dashboard + entity badge auto-verifier: Phase 3
- View Transitions API + Speculation Rules: not deferred, ships in Phase 1 (free in the browser)
- Llama-generated reports: Phase 2 (needs more AI quota)
- API key issuance: Phase 3

**Language: 100% Indonesian.** No English text in the public UI. Slogan: "Mesin pencari web Indonesia yang transparan, lambat tapi jujur."
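
The per-tick numbers in the cron scope item (5 URLs and 10 documents per tick) translate into daily throughput as sketched below. The helper names are hypothetical; the 720 ticks/day figure and the 10K URLs/day crawl budget come from the free-tier table.

```typescript
const TICKS_PER_DAY = (24 * 60) / 2; // a */2 cron fires 720 times per day
const CRAWL_BUDGET_URLS_PER_DAY = 10000; // "Crawl ≤ 10K URLs/day" from the free-tier table

function dailyThroughput(perTick: number): number {
  return perTick * TICKS_PER_DAY;
}

// Whole days needed to reach a target index size at a fixed per-tick rate.
function daysToReach(targetUrls: number, perTick: number): number {
  return Math.ceil(targetUrls / dailyThroughput(perTick));
}

function withinCrawlBudget(perTick: number): boolean {
  return dailyThroughput(perTick) <= CRAWL_BUDGET_URLS_PER_DAY;
}
```

At 5 URLs per tick the crawler fetches 3,600 URLs per day, which stays under the crawl budget and reaches the 100K URL ceiling of stage α in about 28 days, ignoring failed fetches and recrawls.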

---

## Component Plans + Automation Contracts

### 1. Free slow crawl
**Flow:** Cron (1m) → drain N URLs from `crawl_queue` D1 table → enqueue to `pranala-fetch` → consumer fetches with `Mozilla/5.0 (compatible; pranala-bot/1.0; +https://pranala.org/bot)` → store gzipped HTML in R2 (`html/{sha256}.gz`) → metadata + outlinks to D1 shard → enqueue outlinks back into `pranala-discover` → discoverer dedupes against `seen_urls` (D1) → re-enqueues novel ones.

**Politeness:** `HostThrottle` DO per host, 1 req/sec default, honors `Crawl-delay` from robots.txt cached 24h in `pranala-robots` KV. No per-host concurrency.

**Automation contract:**
- Robots.txt parsed by code, never overridden by humans.
- Recrawl interval is a pure function of `(rank_score, last_change_detected, content_hash_age)`. No manual "force recrawl" button anywhere.
- Domain blocklist is auto-populated from spam vector hits + repeated 4xx/5xx; entries auto-expire after 30d unless re-flagged.

**CF limit fit:** consumer batch ≤ 100 URLs × 3 subreq each = 300 < 1000 cap. CPU < 30s. R2 PUT < 1000/invocation.
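
To make the politeness rule concrete, here is a minimal sketch of extracting `Crawl-delay` from a robots.txt body. It is a simplification under stated assumptions: the function name is hypothetical, user agents are matched by exact lowercase name rather than by product-token prefix, and only `User-agent` and `Crawl-delay` lines are interpreted.

```typescript
// Returns the Crawl-delay in seconds that applies to `agent`,
// preferring an agent-specific group over "*", else `defaultS` (1 req/sec default).
function crawlDelayFor(robotsTxt: string, agent: string, defaultS = 1): number {
  const wanted = agent.toLowerCase();
  let groupAgents: string[] = [];
  let inRules = false; // true once the current group has seen a rule line
  let specific: number | null = null;
  let wildcard: number | null = null;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      // A User-agent line after rule lines starts a new group.
      if (inRules) { groupAgents = []; inRules = false; }
      groupAgents.push(value.toLowerCase());
      continue;
    }
    inRules = true;
    if (key !== "crawl-delay" || value === "") continue;
    const seconds = Number(value);
    if (!Number.isFinite(seconds) || seconds < 0) continue;
    if (groupAgents.indexOf(wanted) >= 0) specific = seconds;
    else if (groupAgents.indexOf("*") >= 0) wildcard = seconds;
  }
  return specific ?? wildcard ?? defaultS;
}
```

The returned value would be stored as `crawl_delay_s` for the host and cached alongside the parsed robots.txt for 24h.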

### 2. Premium indexing (sell speed, not rank)
**Tiers (auto-billed via Xendit):**
- Free: best-effort crawl, no SLA
- Starter Rp 99K/mo: 1K pages, weekly recrawl, indexing report
- Pro Rp 499K/mo: 10K pages, daily recrawl, structured-data AI report, broken-link AI report
- Business Rp 2.5M/mo: 100K pages, hourly recrawl, API access, entity badge auto-issuance

**Webhook → activation:**
1. Xendit `/webhook/xendit` verifies `x-callback-token`.
2. INSERT into `subscriptions`, UPDATE site `tier`, increment `crawl_priority` integer.
3. Crawler scheduler reads `crawl_priority DESC` first.
4. Indexing report is generated weekly by Cron → Llama 3.3 70B → markdown → R2 → linked from dashboard.

**Automation contract:** No human ever sees a payment row. Refund flow: Xendit chargeback webhook auto-disables tier and writes `ai_decisions` row.
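
The webhook step can be sketched as a pure function so it is testable without a Worker runtime. Everything here is illustrative: the function names and the tier-to-priority numbers are assumptions, and a production handler would also need to write the `subscriptions` row. The only detail taken from the plan is that the `x-callback-token` header is compared against a configured secret.

```typescript
type Tier = "free" | "starter" | "pro" | "business";

// Hypothetical mapping; the plan only says crawl_priority is an integer.
const CRAWL_PRIORITY: Record<Tier, number> = { free: 0, starter: 1, pro: 2, business: 3 };

// Compares two strings without exiting early on the first mismatch.
// Note: the length check still reveals whether the lengths differ.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function handleXenditCallback(
  headerToken: string | null,
  expectedToken: string,
  tier: Tier,
): { status: number; crawlPriority: number | null } {
  if (headerToken === null || !constantTimeEqual(headerToken, expectedToken)) {
    return { status: 401, crawlPriority: null }; // reject unverified callbacks
  }
  return { status: 200, crawlPriority: CRAWL_PRIORITY[tier] };
}
```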

### 3. Ranking algorithm (offline cron, no human levers)
```
Rank = 0.30·LinkRank + 0.20·TrustRank + 0.15·EntityRank
     + 0.10·Freshness + 0.10·IndoRelevance + 0.10·QueryRel
     + 0.05·Engagement − SpamPenalty
```
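
The formula above maps directly onto a constants file. The sketch below assumes every signal is already normalized to the range 0 to 1 and that `SpamPenalty` is subtracted unweighted, as written in the formula; the interface and function names are hypothetical.

```typescript
interface Signals {
  linkRank: number; trustRank: number; entityRank: number;
  freshness: number; indoRelevance: number; queryRel: number;
  engagement: number; spamPenalty: number;
}

// Weights copied from the formula; the positive weights sum to 1.00.
const WEIGHTS = {
  linkRank: 0.30, trustRank: 0.20, entityRank: 0.15,
  freshness: 0.10, indoRelevance: 0.10, queryRel: 0.10,
  engagement: 0.05,
} as const;

function rank(s: Signals): number {
  return WEIGHTS.linkRank * s.linkRank
    + WEIGHTS.trustRank * s.trustRank
    + WEIGHTS.entityRank * s.entityRank
    + WEIGHTS.freshness * s.freshness
    + WEIGHTS.indoRelevance * s.indoRelevance
    + WEIGHTS.queryRel * s.queryRel
    + WEIGHTS.engagement * s.engagement
    - s.spamPenalty;
}
```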

**LinkRank (PageRank at 10M scale on Workers):**
- Adjacency stored as R2 JSONL shards (`graph/shard-{0..255}.jsonl`), 1M edges/shard.
- Cron tick: load 1 shard → compute partial rank delta → write to D1 `rank_partial` → atomic merge.
- One full iteration ≈ 256 ticks ≈ 256 minutes.
- 30 iterations to convergence ≈ 30 × 256 = 7,680 ticks ≈ 5.3 days at one tick per minute. Each tick must fit within the CF Worker 5-min cron CPU limit.
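
The shard-per-tick scheme can be sketched as two functions: one that computes the partial contributions of a single shard, and one that merges all partials once an iteration is complete. This is a simplified illustration, not the planned implementation: the names are hypothetical, the damping factor 0.85 is the conventional default rather than a value from the plan, and dangling nodes (no outlinks) are ignored.

```typescript
type Edge = [number, number]; // [fromNode, toNode]

// One tick: rank contributions carried by the edges of a single shard,
// computed from the ranks of the previous full iteration.
function shardDelta(edges: Edge[], prev: number[], outDegree: number[]): Map<number, number> {
  const delta = new Map<number, number>();
  for (const [from, to] of edges) {
    const share = (prev[from] ?? 0) / (outDegree[from] || 1);
    delta.set(to, (delta.get(to) ?? 0) + share);
  }
  return delta;
}

// After all shards are processed: next[v] = (1 - d) / N + d * sum of deltas for v.
function mergeIteration(deltas: Map<number, number>[], n: number, damping = 0.85): number[] {
  const next = new Array<number>(n).fill((1 - damping) / n);
  for (const delta of deltas) {
    delta.forEach((value, node) => {
      next[node] = (next[node] ?? 0) + damping * value;
    });
  }
  return next;
}
```

In the plan's layout each `shardDelta` result would be written to `rank_partial`, and `mergeIteration` corresponds to the atomic merge after the 256th shard.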

**TrustRank:** seeds in `pranala-config` KV `trusted_seeds` = hardcoded `go.id`, `*.go.id`, `ojk.go.id`, `bpom.go.id`, `*.ac.id`, `kompas.com`, `tempo.co`, `detik.com` (curated once at launch, never edited by humans — changes go through a `seed_changes.sql` migration that requires `AUTOMATION-EXCEPTION` if added post-launch).

**EntityRank:** auto-derived from registry verification (see §7).

**Freshness:** content hash diff timestamp from R2 `html/` versioned objects.

**IndoRelevance:** `m2m100` language detect → if `id` weight = 1.0, if `en` and `.id` ccTLD = 0.6, else 0.0. Geo IP of origin server adds bonus.

**QueryRel:** Vectorize cosine + BM25 over title/h1/anchor.

**Engagement:** Analytics Engine rollup (CTR, dwell, pogo-stick rate) into D1 `engagement_daily`.

**SpamPenalty:** Vectorize cosine to `pranala-spam-v1`. Threshold ≥ 0.85 = full deindex; 0.70–0.85 = −0.5 rank; < 0.70 = clean. Threshold values in code, not in admin UI.
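
The IndoRelevance and SpamPenalty rules above are simple threshold functions, sketched here with hypothetical names. The geo-IP bonus is left out because the plan does not give its size, and the language code is assumed to arrive from whatever detector is used.

```typescript
// IndoRelevance: 1.0 for Indonesian, 0.6 for English on a .id domain, else 0.0.
function indoRelevance(lang: string, hostname: string): number {
  if (lang === "id") return 1.0;
  if (lang === "en" && hostname.toLowerCase().endsWith(".id")) return 0.6;
  return 0.0;
}

// SpamPenalty: cosine similarity to the spam index decides deindex vs. rank penalty.
function spamVerdict(cosine: number): { deindex: boolean; penalty: number } {
  if (cosine >= 0.85) return { deindex: true, penalty: 0 };  // full deindex
  if (cosine >= 0.70) return { deindex: false, penalty: 0.5 }; // -0.5 rank
  return { deindex: false, penalty: 0 };                       // clean
}
```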

**Automation contract:** Weights are constants in `src/ranker/weights.ts`. Changing them requires a code commit + canary diff report (auto-generated). No runtime knobs.

### 4. PSE seed graph
One-time D1 seed at launch: ~5K trusted Indonesian domains. Crawler walks outward. No human re-seeds; `seed_v2` would be a code commit.

### 5. SERP (search results page)
- Worker reads top-K from D1 + Vectorize fan-out.
- Cache key = `sha256(query + lang + region)` → KV 60s.
- Organic results UNION ALL flio ad results, with `is_sponsored` flag rendered as yellow "Iklan" pill above and below the organic block. Ranker MUST NOT see ad bid as a feature (lint guards `bid` access in ranker module).
- **Speculation Rules API** prerenders top 3 organic links on hover/viewport (`<script type="speculationrules">`). Result tap feels instant.
- **View Transitions API** morphs result card → destination page on tap (where same-origin) and morphs SERP filters (All/Web/Image/News/Q&A) on swipe.
- **Streaming HTML** via `TransformStream` — first 3 results paint < 200ms, rest stream in.

### 5a. Autocomplete (instant search) — sub-100ms, fully AI-ranked
**Endpoint:** `GET /api/ac?q=<prefix>&lang=id` returns JSON `{suggestions: [{text, type, score}]}` in ≤ 100ms p95 from edge.

**Three-stage suggestion pipeline (all AI, zero human-curated lists):**
1. **Prefix trie (KV-backed):** popular query log rolled up nightly into a compressed trie stored in `pranala-cache` KV under `ac:trie:v{N}`. Worker loads once per cold start, pinned in module scope. Returns top-10 prefix matches in <5ms.
2. **Typo tolerance:** Damerau-Levenshtein distance ≤ 2 against a hot-vocabulary set (top 100K queries). Implemented as a BK-tree also in KV. ~10ms.
3. **Semantic completion:** if prefix length ≥ 4 chars and trie returns < 5 results, embed prefix with `@cf/baai/bge-m3` and fan-out Vectorize query against `pranala-content-v1` titles. Returns conceptually-related queries even with novel phrasing. ~60ms. Cached per-prefix in KV 5min.
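
Stage 1 of the pipeline above can be illustrated with a small counted trie. This is an uncompressed sketch with hypothetical class names; the plan's compressed KV format and its serialization are not shown.

```typescript
class TrieNode {
  children = new Map<string, TrieNode>();
  count = 0; // how often the query ending at this node was searched
}

class QueryTrie {
  private root = new TrieNode();

  insert(query: string, count: number): void {
    let node = this.root;
    for (const ch of query) {
      let next = node.children.get(ch);
      if (!next) { next = new TrieNode(); node.children.set(ch, next); }
      node = next;
    }
    node.count += count;
  }

  // Returns up to k stored queries starting with `prefix`, most popular first.
  suggest(prefix: string, k = 10): string[] {
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (!next) return [];
      node = next;
    }
    const hits: { text: string; count: number }[] = [];
    const walk = (n: TrieNode, text: string): void => {
      if (n.count > 0) hits.push({ text, count: n.count });
      n.children.forEach((child, ch) => walk(child, text + ch));
    };
    walk(node, prefix);
    return hits.sort((a, b) => b.count - a.count).slice(0, k).map((h) => h.text);
  }
}
```

A production trie would likely cache the top-k list at each node so that `suggest` does not walk the whole subtree on every keystroke.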

**Indonesian-aware tokenization:**
- Stemmer: lightweight Sastrawi-derived rules ported to TS (handles `ber-`, `me-`, `pe-`, `-kan`, `-an`, `-i` affixes).
- City/region expansion: `jkt` → `jakarta`, `sby` → `surabaya`, `bdg` → `bandung` — table loaded from KV `pranala-config:city_aliases` (auto-built once from Wikipedia geo data, never hand-edited).
- Common typo map: `gigi → gigit`, `klinikgigi → klinik gigi` (segmentation), `apotik → apotek` — auto-generated from query log misspellings via Llama 3.3 weekly cron.
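
The alias and typo tables above amount to a per-token lookup before the query reaches the trie. The sketch below hardcodes a few entries taken from the examples in this section; in the plan these tables are loaded from KV, and the function name is hypothetical. Stemming is not shown.

```typescript
// Sample entries only; the real tables are auto-built and loaded from KV.
const CITY_ALIASES = new Map<string, string>([
  ["jkt", "jakarta"], ["sby", "surabaya"], ["bdg", "bandung"],
]);
const TYPO_MAP = new Map<string, string>([
  ["apotik", "apotek"], ["klinikgigi", "klinik gigi"],
]);

// Lowercases, splits on whitespace, and rewrites each token via the two tables.
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .map((tok) => CITY_ALIASES.get(tok) ?? TYPO_MAP.get(tok) ?? tok)
    .join(" ");
}
```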

**Personalization (privacy-respecting):**
- Per-user recent queries stored in `localStorage` only — never sent to server, never stored in D1.
- Suggestion re-rank on client side: queries the user has searched before float to top.
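
The client-side re-rank can be a stable partition of the server suggestions, as in this sketch. The function name is hypothetical and `recent` stands for whatever list is read from `localStorage`.

```typescript
// Moves suggestions the user has searched before to the front,
// keeping the server's order within each group.
function rerank(suggestions: string[], recent: string[]): string[] {
  const seen = new Set(recent.map((q) => q.toLowerCase()));
  const mine = suggestions.filter((s) => seen.has(s.toLowerCase()));
  const rest = suggestions.filter((s) => !seen.has(s.toLowerCase()));
  return mine.concat(rest);
}
```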

**Voice autocomplete:**
- `SpeechRecognition` API with `lang="id-ID"` and `interimResults=true`.
- As partial transcript arrives, fires `/api/ac` on each pause.
- "Tap to talk" button uses `navigator.vibrate(20)` haptic on press.

**Trending suggestions (zero-state, when input is empty):**
- Cron rolls up Analytics Engine top-N queries from last 1h/24h/7d into KV `ac:trending:v{N}`.
- Indonesia-only filter via geo of original searches.
- No human ever picks a trending term. If something abusive trends, the spam-vector classifier auto-suppresses it (cosine to spam corpus on the suggestion text).

**Automation contract:** there is no `autocomplete_blocklist` table editable by humans. Suppression is purely vector-similarity and abuse-classifier driven, decisions written to `ai_decisions`.

**CF limit fit:** trie payload < 2MB (within 25MB KV value cap, well under cold-start budget). BK-tree similar. Vectorize query < 50ms p95. KV read 1ms.

### 6. flio ads integration
- pranala registers itself as a flio publisher unit at startup.
- SERP renders `<div data-flio-key="unit_pranala_serp_top" data-flio-mode="native">`.
- All ad logic (bidding, targeting, fraud, payout) handled by flio.net — already 100% AI.
- Revenue: flio remits to pranala wallet weekly via existing flio payout cron.

### 7. Site owner dashboard + entity badge (zero human verification)
**Domain ownership verification:** DNS TXT record `pranala-verify=<token>` OR `/.well-known/pranala-{token}` file fetch. Worker fetches and verifies. Auto-grants ownership.
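
Once the TXT records for a domain have been fetched (the lookup itself is not shown and is left as an assumption), the ownership check reduces to an exact match, sketched here with a hypothetical function name.

```typescript
// True when any TXT record equals "pranala-verify=<token>".
// Surrounding double quotes, as some resolvers return them, are stripped first.
function txtRecordsProveOwnership(txtRecords: string[], token: string): boolean {
  if (token.length === 0) return false; // never accept an empty token
  const expected = `pranala-verify=${token}`;
  return txtRecords.some((r) => r.trim().replace(/^"|"$/g, "") === expected);
}
```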

**Entity badge — AI-only verification chain:**
1. NPWP regex match → check the `pajak.go.id` public NPWP validator endpoint.
2. PT/CV → `ahu.go.id` company name search → fuzzy match.
3. Fintech → OJK whitelist scrape (cached in KV daily).
4. Food/cosmetic/drug → BPOM lookup.
5. Healthcare → Kemenkes faskes registry.
6. All required passes? → auto-issue badge + write `ai_decisions`. Any fail? → auto-deny + write reason. Owner sees AI-generated explanation, can re-submit after fixing — but no human ever reviews.

**Automation contract:** there is no `verifications.reviewer_id` column.

### 8. API (auto-issued)
- Signup → auto-generate API key (32 bytes hex).
- `RateLimitDO` enforces per-key QPS and monthly quota.
- No human ever provisions a key.

### 9. Anti-abuse (AI-only)
- Spam detection: Vectorize cosine + Llama classify combo.
- Cloaking detection: render Worker fetches with `Googlebot` UA vs `pranala-bot` UA, diff > 30% by tokens → auto-flag.
- Click fraud on flio ads: handled by flio's existing CF Turnstile + bot management layer.
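
The plan does not define how "diff > 30% by tokens" is measured. One possible reading, sketched below with hypothetical names, is the Jaccard distance between the token sets of the two fetched bodies.

```typescript
// Jaccard distance between token sets: 0 = identical vocabulary, 1 = disjoint.
function tokenDiffRatio(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  let shared = 0;
  ta.forEach((t) => { if (tb.has(t)) shared++; });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : 1 - shared / union;
}

// Flags a page when the two fetches differ by more than the threshold (30%).
function isCloaking(googlebotBody: string, pranalaBotBody: string, threshold = 0.30): boolean {
  return tokenDiffRatio(googlebotBody, pranalaBotBody) > threshold;
}
```

A set-based measure ignores word order and repetition, so pages with rotating ads or timestamps may need the threshold tuned against real crawl data.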

### 10. DMCA — the single legal carve-out
Indonesian UU ITE / Permenkominfo 5/2020 requires a reachable contact for takedown counter-notices. This is the ONLY human-readable surface.

- Public form `/dmca` posts to `dmca_intake` D1 table.
- AI auto-classifies notice validity (URL exists, claimant info complete, sworn statement present). If valid → auto-deindex matching URLs within 1 hour and email claimant.
- Counter-notice form posts to `dmca_counter`. The `dmca_counter` rows are emailed weekly to `legal@pranala.org` (external counsel); no in-app review UI exists.
- Counsel responds via email; their reply is processed by an inbound email worker (Cloudflare Email Routing → Worker) that parses an HMAC-signed verdict token. No human clicks "approve" inside pranala's UI.

This is the maximum tolerated human contact: 1 mailbox, 1 weekly digest, decisions returned via signed token. Everything else is forbidden by Constitution rule A6.

---

## CF Limits Compliance Matrix

| CF limit | Worst-case load | Mitigation |
|---|---|---|
| Worker CPU 30s req | SERP fan-out ≤ 200ms | Vectorize + KV cache 60s |
| Worker CPU 5min cron | PageRank shard tick | 1 shard / tick, 256 shards |
| Subrequests 1000 | Crawler batch | ≤ 100 URLs × 3 subreq |
| D1 10GB/DB | 10M URLs metadata only | Sharded 10 DBs by URL hash |
| D1 ~100 SQL bind vars | Bulk inserts | Chunk to 80 |
| D1 30s query | Joins | Pre-denormalized hot tables |
| KV 1 write/sec/key | Counters | Counters live in DOs, never KV |
| KV 25MB value | HTML | HTML never in KV; goes to R2 |
| Queues 100 msg/batch | Crawl fanout | Re-enqueue, not recursion |
| Cron 250/account | Multiple schedulers | Single `* * * * *` multiplexer + DO routing |
| Workers AI rate-limit/model | Reports | Queue consumer, never inline |
| Vectorize 5M vectors/index | 10M docs | 2 named indexes by year shard |
| R2 list ops cost | URL discovery | Never list; index in D1 |

---

## Mobile-Native UI Standards

Pranala SERP must feel like a native iOS/Android app, not a desktop search page. All standards from `/home/ucok/CLAUDE.md` apply, plus search-specific patterns below.

### Layout
- **Sticky compact header (52px):** logo, search input, voice mic, profile avatar. Auto-hides on scroll-down, reveals on scroll-up via `IntersectionObserver`.
- **Bottom navigation (fixed, 64px + safe-area-inset-bottom):** `Cari` | `Trending` | `Riwayat` | `Tersimpan` | `Akun`. Active state = filled icon + label.
- **Search input always front-and-center** — never behind a hamburger.
- **NO hamburger menu anywhere.**
- **Pull-to-refresh** on result lists via `touchstart`/`touchmove` + CSS transform.
- **Skeleton loaders** (gray shimmer cards matching result-card shape) — no spinners.

### Search box (the killer surface)
- Full-width pill with 16px radius, 48px tall, system font 17px (no zoom-on-focus on iOS).
- Live autocomplete dropdown drops below input, full-width on mobile, max-height 60vh, scroll-snaps each item to 56px tap target.
- Each suggestion row: leading icon (clock for recent, fire for trending, sparkle for AI semantic, location for places), text, trailing arrow → on tap fills input; arrow tap = submit.
- **Voice mic button** inside the pill on the right, 44×44px, animates to pulsing red circle while listening.
- **Long-press on a recent query** = bottom sheet with `Hapus` / `Bagikan` / `Cari di tab baru`.
- **Swipe left on a recent query** = delete with undo toast.
- Esc / swipe-down on dropdown closes it; on iOS, `inputmode="search"` shows the right keyboard with a "Cari" key.

### Filter tabs (horizontal swipe, no taps required)
- Below the search header: `Semua` · `Web` · `Gambar` · `Berita` · `Tanya Jawab` · `Maps` · `Toko` · `Tokoh`.
- Native horizontal scroll-snap container (`scroll-snap-type: x mandatory`).
- Buttons render at edges (left/right chevron) when overflow, per CLAUDE.md `feedback_horizontal_slide` rule. Pure CSS detection via `:has(.snap-overflow)`.
- Tab swipe triggers View Transition (cross-fade + slide).

### Result cards
- 16px corner radius, subtle shadow, 12px gap, full-bleed thumbnail when available.
- Each card shows: favicon · domain · title · snippet · meta row (time, geo, entity badge if any).
- **Long-press** = bottom-sheet menu: `Buka` / `Buka di tab baru` / `Bagikan` (uses `navigator.share()`) / `Salin tautan` / `Tidak relevan` (writes negative-feedback row → ranker training signal).
- **Swipe right** on a card = save to `Tersimpan` (offline-readable via Service Worker cache).
- **Tap** triggers View Transition where the favicon morphs into the destination page header.

### Image search
- Masonry grid via CSS `column-count` (1 col mobile, 2 tablet, 3 desktop).
- Tap → fullscreen lightbox with pinch-zoom (CSS `touch-action: pinch-zoom`).
- Stories-style horizontal swipe between images, dot indicators top.
- Long-press = save / share / report.

### Maps tab
- Leaflet + offline tiles served from R2 + Cloudflare cache.
- "Use my location" = `navigator.geolocation` with `enableHighAccuracy: false` (battery-friendly).
- Result pins clustered, tap = peek card slides up from bottom (50% sheet → drag up = full).

### Native cues
- System font stack: `-apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif`.
- Safe-area insets on top, bottom, left (for landscape notch).
- Tap targets ≥ 48×48px.
- Spacing 12px between interactive elements.
- `touch-action: manipulation` to kill 300ms tap delay.
- `prefers-reduced-motion: reduce` disables View Transitions.
- Dark mode auto via `prefers-color-scheme`.
- Haptic feedback on pull-to-refresh, voice-mic press, save-toast: `navigator.vibrate(15)`.

### Advanced web platform features (progressive enhancement)
| Feature | Use |
|---|---|
| **View Transitions API** | SERP ↔ result page, filter tab switches |
| **Speculation Rules API** | Prerender top 3 organic links |
| **Navigation API** | Back/forward feels instant, no white flash |
| **`navigator.share()`** | Native share sheet for results |
| **`SpeechRecognition`** (id-ID) | Voice input, real-time transcript |
| **`SpeechSynthesis`** (id-ID) | Read result snippets aloud (accessibility + voice answers) |
| **Web Push API** | Optional: notify when a saved query has new results |
| **Service Worker** | Network-first for `/api/*`, cache-first for static, offline fallback page, saved-results offline access |
| **Web App Manifest** | `display: standalone`, theme color, 192/512 icons, share-target API so users can share TO pranala |
| **Web Share Target API** | Pranala becomes a destination in Android share sheets — share-to-search any URL/text |
| **Background Sync** | Saved queries refresh in background |
| **`content-visibility: auto`** | Off-screen result cards skip layout/paint |
| **`scroll-snap`** | Horizontal carousels and image gallery |
| **`@view-transition` CSS** | Page-level navigation transitions |
| **CSS container queries** | Result card adapts to grid/list density |
| **CSS `:has()`** | Style header based on whether search has focus, etc. |
| **103 Early Hints** | Preload critical CSS + autocomplete trie before HTML response |
| **Brotli** | Auto on Cloudflare |
| **AVIF + WebP** with `<picture>` | Image search thumbs |

### PWA installability
- Manifest at `/manifest.webmanifest`.
- After 2nd visit: bottom-sheet promo to "Pasang Pranala" (Add to Home Screen).
- Standalone display: hide URL bar, full-bleed brand color.
- Share Target: register pranala as receiver of `text/plain` and `text/uri-list` — share any link from Chrome/WA → pranala opens with that URL pre-loaded as a "more like this" semantic search.

### Accessibility (WCAG 2.1 AA)
- Visible focus rings via `:focus-visible`.
- Skip-to-content link as first focusable element.
- ARIA live region announces "X hasil ditemukan" after each search.
- Screen reader labels in Indonesian (`aria-label="Tombol mikrofon untuk pencarian suara"`).
- Contrast ratios audited in CI.
- Text scales to 200% without horizontal scroll.

### Performance budget (enforced in CI)
- LCP < 2.0s on 3G Indonesia (target, not 2.5s).
- INP < 100ms (interaction).
- CLS < 0.05.
- JS bundle < 50KB gzipped on critical path; rest lazy-loaded.
- Autocomplete p95 < 100ms edge.
- Lighthouse Mobile ≥ 95 across all four scores.

---

## Build Order

1. Worker scaffold + wrangler.toml (multi-env, multi-D1 binding)
2. D1 schema migrations: ctrl + 10 idx shards (sharding helper module)
3. Trusted-seed loader + crawl_queue seed
4. `HostThrottle` Durable Object
5. Crawler queue consumer (`pranala-org-crawler`)
6. HTML parser + outlink extractor
7. Indexer queue consumer + Vectorize embedder
8. Ranker cron multiplexer + R2 graph shard writer
9. PageRank/TrustRank/EntityRank shard processors
10. SERP route + KV cache + flio ad slot
11. **Autocomplete pipeline** — trie builder cron, BK-tree typo, semantic Vectorize fallback, voice input, trending zero-state
12. **Mobile UI shell** — sticky header, bottom nav, skeleton loaders, View Transitions, Speculation Rules, scroll-snap filter tabs, swipe-saveable result cards, long-press bottom sheets
13. **PWA layer** — manifest, Service Worker (network-first API, cache-first static), Share Target receiver, Web Push opt-in
14. **Indonesian-aware tokenizer** — Sastrawi-derived stemmer + city aliases + auto-typo map (Llama weekly cron)
15. Owner dashboard (DNS-TXT auto-verify)
16. Entity badge auto-verifier (5 registry adapters)
17. Premium tier Xendit webhook + crawl_priority
18. AI report generator (Llama 3.3 70B → R2 markdown)
19. API key issuance + RateLimitDO
20. Spam vectorize index + cron classifier
21. DMCA intake + AI classifier + email-routing worker
22. `automation-guard.ts` lint suite + CI gate
23. CI perf budget gate (Lighthouse mobile ≥ 95, autocomplete p95 < 100ms)

---

## Automation acceptance gate (must pass before launch)

- [ ] `grep -rn "Approve\|Reject\|reviewer_id\|manual_review" src/` → 0 hits
- [ ] All POST/PUT/DELETE routes traced to `cron|queue|webhook|verified-self-service` (automated trace test)
- [ ] `ai_decisions` table has rows for last 24h covering ≥ 95% of state changes
- [ ] No human Slack/email alert says "click to approve"
- [ ] DMCA flow: only `legal@pranala.org` mailbox is human; counter-notice replies are HMAC-signed tokens
- [ ] Lint rule blocks `env.AI.run` direct calls outside `aiDecide()` wrapper
- [ ] Admin dashboard renders read-only — `<form>` count = 0 except `/dmca` public intake
- [ ] Autocomplete: no `autocomplete_blocklist` table; suppression is vector-classifier only
- [ ] Trending suggestions sourced from Analytics Engine rollup, not human picks
- [ ] Mobile Lighthouse ≥ 95 across all four categories (CI gate)
- [ ] Autocomplete p95 < 100ms from edge (synthetic check in CI)
- [ ] PWA installable check passes (manifest + SW + HTTPS)
- [ ] Share Target API registered (verified by intent simulation)
- [ ] No hamburger menus anywhere — bottom nav only
- [ ] Voice search works in Chrome Android + Safari iOS (lang=id-ID)

---

## ⚙ HARD CONSTRAINTS (enforced for all sites)

This domain MUST operate within these constraints — no exceptions:

- **100% Cloudflare serverless** — Workers + D1 + R2 + KV + Workers AI + Vectorize. NEVER PM2, NEVER VPS, NEVER Docker in production path.
- **100% AI-automated** — every customer interaction, every moderation decision, every transaction reconcile = AI. No manual queue, no live human chat support, no physical fulfillment.
- **1-operator solo** — one person can run the entire operation from a phone. No team meetings, no shared inbox, no shift rotation.
- **WhatsApp AI bot** for all support (24/7, instant response, no SLA promises that need humans).
- **Mayar QRIS** for all Indonesian payments (subscription auto-renew, no manual invoicing).
- **Indonesian UI primary** — bahasa-first, English fallback only where unavoidable.
- **Privacy** — opt-in only, delete-on-request honored within 24h (cron-driven).
- **No physical goods, no inventory** — digital products + affiliate referrals only.

**If the plan above describes any flow that violates these constraints, treat the plan as ASPIRATIONAL only and rework before building. The constraint trifecta wins.**

PREVIEW

pranala.org — Indonesia-only Search Engine

Positioning: Mesin pencari web Indonesia yang transparan, lambat tapi jujur. Tagline: Google ranks the web. Pranala ranks Indonesian trust.

Two hard constraints, equal weight:

100% Cloudflare — Workers + D1 + R2 + KV + Queues + Durable Objects + Cron + Vectorize + Workers AI. Zero origin servers, zero PM2, zero Docker in the request path.
100% AI automated — every step from crawl to ranking to billing to ad serving to support runs without a human in the loop. The only human surface allowed is a legally-mandated DMCA contact form (scoped below).

Automation Constitution (hard rules)

These rules are enforced in code via src/lib/automation-guard.ts. Any PR that violates them must fail CI.

#	Rule	Enforcement
A1	No admin UI may have an "Approve" or "Reject" button for content, listings, ads, badges, payouts, or rank decisions.	Lint rule: forbid `<button>Approve</button>` patterns; admin UI is read-only dashboards + AI-action audit log.
A2	No D1 column may be named `manual_review`, `pending_approval`, `reviewer_id`, or equivalent.	Schema lint at migration time.
A3	No worker route may require an `admin` JWT to write production data. All writes are AI-driven via Queue consumer or Cron.	Route-level test: every POST/PUT/DELETE traces back to `cron`, `queue`, `webhook`, or `verified-self-service`.
A4	No Slack/email notification may have "approve here" CTAs. Notifications are post-hoc reports only.	Template lint.
A5	Every AI decision must write `(decision, score, model, prompt_hash, ts)` to `ai_decisions` D1 table for audit.	Wrapper function `aiDecide()` is the only callable; ESLint forbids direct `env.AI.run()`.
A6	DMCA counter-notice handling is the ONLY allowed human surface, and is bounded to a single inbox processed by external counsel weekly — never inside the worker.	Single `legal@pranala.org` mailbox; nothing else routes to a human.

If a rule needs to break, the PR description must include AUTOMATION-EXCEPTION: <ticket> and the exception is permanent technical debt visible on /admin/debt.

Free-Tier Reality (HARD CONSTRAINT)

Pranala lives entirely on Cloudflare Free tier until revenue forces an upgrade. Slow, honest, cheap. No surprises.

Free-tier ceilings (per CF docs you linked)

Resource	Free quota	Pranala budget
Workers requests	100K/day	≤ 80K/day for SERP + AC + API combined
Workers CPU (request)	10ms	All hot paths must finish < 10ms
Workers CPU (cron)	30s	Crawl/index/rank budget lives here
D1 storage	5GB total, 1GB/DB	One DB only, ≤ 800MB metadata, HTML in R2
D1 reads	5M/day	SERP cached → ≤ 1M reads/day
D1 writes	100K/day	Crawl ≤ 10K URLs/day → ≤ 50K writes
KV reads	100K/day	AC trie cached in module scope
KV writes	1K/day	Trie rebuilt weekly, not daily
KV storage	1GB	Trie + config only, never per-page
R2 storage	10GB	Gzipped HTML (~5KB avg) → ≤ 2M pages
R2 Class A ops	1M/mo	≤ 33K writes/day
R2 Class B ops	10M/mo	Reads cheap, fine
Workers AI	10K Neurons/day	Cron-only, ≤ 100 LLM calls/day
Vectorize stored dims	5M/mo	1024d × 5K vectors max
Vectorize query dims	30M/mo	≤ 30K queries/mo (cron AC fallback only)
Analytics Engine	free w/ caps	Search log + clickthroughs only
Cron triggers	1K invocations/day on free	One `* * * * ` multiplexer = 1440/day → use `/2 * * * *` (720/day)

Forbidden on free tier

❌ Queues (paid only) → replaced with D1 polling work table (task_queue row + claimed_at timestamp + WHERE state='pending' LIMIT 50)
❌ Durable Objects (paid only) → replaced with D1 host-throttle row (host_state(host, last_fetch_at, crawl_delay_s), cron checks WHERE last_fetch_at < datetime('now', '-' || crawl_delay_s || ' seconds'))
❌ Workers AI per request → all AI moved to cron batch jobs
❌ Vectorize on hot path → Vectorize used only in weekly cron to update AC suggestions, never in the request
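
The D1 replacement for the Durable Object throttle comes down to one timestamp comparison. The sketch below mirrors that `WHERE last_fetch_at < datetime('now', '-' || crawl_delay_s || ' seconds')` predicate in memory so it can be reasoned about and unit-tested. `HostState`, `isHostReady`, and `readyHosts` are hypothetical names, and reusing the `LIMIT 50` from the task table as the default batch size is an assumption; the real cron would run the SQL against D1.

```typescript
// Mirrors one host_state(host, last_fetch_at, crawl_delay_s) row from the plan.
interface HostState {
  host: string;
  lastFetchAt: number; // epoch milliseconds of the last fetch
  crawlDelayS: number; // per-host delay in seconds
}

// Same predicate as the SQL WHERE clause: a host is ready once strictly more
// than crawl_delay_s seconds have passed since last_fetch_at.
function isHostReady(state: HostState, nowMs: number): boolean {
  return state.lastFetchAt < nowMs - state.crawlDelayS * 1000;
}

// Pick at most `limit` hosts that may be fetched on this tick.
// The default of 50 is an assumption borrowed from the task_queue LIMIT.
function readyHosts(states: HostState[], nowMs: number, limit = 50): string[] {
  return states
    .filter((s) => isHostReady(s, nowMs))
    .slice(0, limit)
    .map((s) => s.host);
}
```

A host fetched exactly `crawl_delay_s` seconds ago is still held back, which matches the strict `<` in the SQL.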

Growth ladder (upgrade triggers tied to revenue)

| Stage | URLs indexed | Traffic | Plan | Why upgrade |
|---|---|---|---|---|
| α (now) | 0 → 100K | < 50K req/day | Free | Bootstrap |
| β | 100K → 500K | 50–200K req/day | Workers Paid $5/mo | Need 30s CPU on requests for semantic autocomplete + Queues |
| γ | 500K → 5M | 200K–2M req/day | + R2 paid + D1 paid | HTML storage exceeds 10GB |
| δ | 5M+ | 2M+ req/day | + Workers AI paid | More Llama capacity |

Until pranala makes ≥ Rp 75K/mo from flio ad revenue (covers $5 Workers Paid), it stays on Free. Premium subscriptions self-fund the next upgrade tier. No cash burn.

Cloudflare Resource Map (Free tier — single project)

Worker: pranala-org → custom domain pranala.org (handles routes + cron in one Worker to stay simple and within request budget)
D1: pranala_db (single DB ≤ 1GB; sharding deferred to stage γ)
R2: pranala-html (gzipped HTML only, hash-keyed)
KV: PRANALA_KV (AC trie, config, parsed robots.txt — single namespace)
Vectorize: pranala-ac-v1 (≤ 5K query embeddings, used in cron only) — added at stage β
Workers AI (cron only): @cf/baai/bge-m3, @cf/meta/llama-3.3-70b-instruct-fp8-fast, @cf/meta/m2m100-1.2b
Analytics Engine: pranala_events
Cron: */2 * * * * single multiplexer that dispatches by minute-modulo (crawl, index-roll-up, AC trie rebuild, etc.)
~~Queues~~ → D1 task_queue polling
~~Durable Objects~~ → D1 host_state row throttling

PHASE 1: Bootstrap (everything on the free tier, all UI in Indonesian)

Scope shipped:

Single Worker with a Hono router (TS).
Minimal D1 schema: urls, documents, links, host_state, task_queue, submissions, ai_decisions, search_log.
Cron */2 * * * * → minute-modulo dispatcher: crawl 5 URLs/tick, index 10 documents/tick, AC trie rebuild at 03:00 WIB.
Mobile-native home page (in Indonesian) with a search box + microphone button + trie autocomplete.
Mobile SERP page (in Indonesian): bottom nav, result cards, horizontal filter tabs.
Endpoint /api/ac: trie + typo tolerance (BK-tree) only in Phase 1; semantic completion is deferred to Phase 2 (Vectorize).
Endpoint /api/submit: public domain/sitemap submission; accepted submissions are enqueued in task_queue.
Crawler cron: pick a URL → fetch (UA Mozilla/5.0 (compatible; pranala-bot/1.0; +https://pranala.org/bot)) → store gzipped HTML in R2 → extract title/meta/outlinks → write to D1 with host throttling.
PWA manifest + Service Worker offline fallback.
/dmca page (public form) → dmca_intake D1 table.
/bot page (crawler info), referenced in the User-Agent.
/transparansi page: public ranking formula, list of trusted seeds, live index statistics.
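
A minimal sketch of how the `*/2 * * * *` minute-modulo dispatcher could pick its jobs from the scheduled time alone. Crawl and index run on every tick and the trie rebuild fires at 03:00 WIB (UTC+7, so 20:00 UTC), as in the plan. The 10-minute roll-up cadence, the `Task` names, and `tasksForTick` are assumptions for illustration only.

```typescript
type Task = "crawl" | "index" | "index-roll-up" | "ac-trie-rebuild";

// 03:00 WIB (UTC+7, no daylight saving) is 20:00 UTC on the previous day.
const TRIE_REBUILD_UTC_HOUR = 20;
// Hypothetical cadence; the plan does not say how often the roll-up runs.
const ROLL_UP_EVERY_MINUTES = 10;

// Decide which jobs a single cron tick should run.
// In a Worker, scheduledTimeMs would come from the scheduled event's scheduledTime.
function tasksForTick(scheduledTimeMs: number): Task[] {
  const t = new Date(scheduledTimeMs);
  const minute = t.getUTCMinutes();
  const tasks: Task[] = ["crawl", "index"]; // every tick: 5 URLs, 10 documents
  if (minute % ROLL_UP_EVERY_MINUTES === 0) {
    tasks.push("index-roll-up");
  }
  if (t.getUTCHours() === TRIE_REBUILD_UTC_HOUR && minute === 0) {
    tasks.push("ac-trie-rebuild");
  }
  return tasks;
}
```

Because the schedule only fires on even minutes, any modulo used here has to be even as well, otherwise some slots would never be hit.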

DEFERRED to Phase 2+ (requires Workers Paid):

PageRank shard processing (R2 graph shards): Phase 2
Vectorize semantic AC fallback: Phase 2
Premium tier billing (Xendit): Phase 2
Owner dashboard + entity badge auto-verifier: Phase 3
View Transitions API + Speculation Rules: not deferred, ships in Phase 1 (free, runs in the browser)
Llama-generated reports: Phase 2 (needs more AI quota)
API key issuance: Phase 3

Language: 100% Indonesian. No English text in the public UI. Slogan: "Mesin pencari web Indonesia yang transparan, lambat tapi jujur."

Component Plans + Automation Contracts

1. Free slow crawl

Flow: Cron (1m) → drain N URLs from crawl_queue D1 table → enqueue to pranala-fetch → consumer fetches with Mozilla/5.0 (compatible; pranala-bot/1.0; +https://pranala.org/bot) → store gzipped HTML in R2 (html/{sha256}.gz) → metadata + outlinks to D1 shard → enqueue outlinks back into pranala-discover → discoverer dedupes against seen_urls (D1) → re-enqueues novel ones.

Politeness: HostThrottle DO per host, 1 req/sec default, honors Crawl-delay from robots.txt cached 24h in pranala-robots KV. At most one in-flight request per host (no per-host concurrency).

Automation contract:

Robots.txt parsed by code, never overridden by humans.
Recrawl interval is a pure function of (rank_score, last_change_detected, content_hash_age). No manual "force recrawl" button anywhere.
Domain blocklist is auto-populated from spam vector hits + repeated 4xx/5xx; entries auto-expire after 30d unless re-flagged.

CF limit fit: consumer batch ≤ 100 URLs × 3 subreq each = 300 < 1000 cap. CPU < 30s. R2 PUT < 1000/invocation.

2. Premium indexing (sell speed, not rank)

Tiers (auto-billed via Xendit):

Free: best-effort crawl, no SLA
Starter Rp 99K/mo: 1K pages, weekly recrawl, indexing report
Pro Rp 499K/mo: 10K pages, daily recrawl, structured-data AI report, broken-link AI report
Business Rp 2.5M/mo: 100K pages, hourly recrawl, API access, entity badge auto-issuance

Webhook → activation:

Xendit /webhook/xendit verifies x-callback-token.
INSERT into subscriptions, UPDATE site tier, increment crawl_priority integer.
Crawler scheduler reads crawl_priority DESC first.
Indexing report is generated weekly by Cron → Llama 3.3 70B → markdown → R2 → linked from dashboard.

Automation contract: No human ever sees a payment row. Refund flow: Xendit chargeback webhook auto-disables tier and writes ai_decisions row.

3. Ranking algorithm (offline cron, no human levers)

Rank = 0.30·LinkRank + 0.20·TrustRank + 0.15·EntityRank
     + 0.10·Freshness + 0.10·IndoRelevance + 0.10·QueryRel
     + 0.05·Engagement − SpamPenalty
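
The weighted sum above can be written as a pure function. This is a sketch under the assumption that every signal is already normalized to [0, 1]; `RankSignals`, `WEIGHTS`, and `rank` are illustrative names, not the contents of `src/ranker/weights.ts`.

```typescript
interface RankSignals {
  linkRank: number;
  trustRank: number;
  entityRank: number;
  freshness: number;
  indoRelevance: number;
  queryRel: number;
  engagement: number;
}

// Weights copied from the formula; they sum to 1.00.
const WEIGHTS: RankSignals = {
  linkRank: 0.3,
  trustRank: 0.2,
  entityRank: 0.15,
  freshness: 0.1,
  indoRelevance: 0.1,
  queryRel: 0.1,
  engagement: 0.05,
};

// Weighted sum of the seven signals minus the spam penalty.
// Assumes each signal is normalized to [0, 1] before it gets here.
function rank(signals: RankSignals, spamPenalty: number): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof RankSignals)[]) {
    total += WEIGHTS[key] * signals[key];
  }
  return total - spamPenalty;
}
```

With all signals at 1 and no penalty the score is 1.0, so a penalty of 0.5 removes half of the maximum attainable rank.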

LinkRank (PageRank at 10M scale on Workers):

Adjacency stored as R2 JSONL shards (graph/shard-{0..255}.jsonl), 1M edges/shard.
Cron tick: load 1 shard → compute partial rank delta → write to D1 rank_partial → atomic merge.
One full iteration ≈ 256 ticks ≈ 256 minutes.
30 iterations to convergence ≈ 5–6 days. Fits CF Worker 5-min cron CPU per tick.
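
A minimal sketch of what one shard tick might compute. It assumes each JSONL line has the shape `{"src": "...", "dst": "..."}`, that global out-degrees are available from a separate table, and that the damping factor is the conventional 0.85; none of these are stated in the plan, and `parseShard` and `shardContribution` are hypothetical names.

```typescript
interface Edge {
  src: string;
  dst: string;
}

// Parse one JSONL shard, skipping blank lines.
function parseShard(jsonl: string): Edge[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Edge);
}

// Partial rank delta contributed by the edges of one shard:
// every edge passes damping * rank(src) / outDegree(src) to its destination.
function shardContribution(
  edges: Edge[],
  rank: Map<string, number>,
  outDegree: Map<string, number>,
  damping = 0.85, // assumed value, not taken from the plan
): Map<string, number> {
  const partial = new Map<string, number>();
  for (const edge of edges) {
    const r = rank.get(edge.src) ?? 0;
    const deg = outDegree.get(edge.src) ?? 0;
    if (deg === 0) continue; // unknown source page: nothing to distribute
    partial.set(edge.dst, (partial.get(edge.dst) ?? 0) + (damping * r) / deg);
  }
  return partial;
}
```

The merge step would add the partials from all 256 shards plus the teleport term (1 − damping) / N per page. At one tick per minute, 256 ticks × 30 iterations is 7,680 minutes, about 5.3 days; on a `*/2` schedule the same walk would take roughly twice as long.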

TrustRank: seeds in pranala-config KV trusted_seeds = hardcoded go.id, *.go.id, ojk.go.id, bpom.go.id, *.ac.id, kompas.com, tempo.co, detik.com (curated once at launch, never edited by humans — changes go through a seed_changes.sql migration that requires AUTOMATION-EXCEPTION if added post-launch).

EntityRank: auto-derived from registry verification (see §7).

Freshness: content hash diff timestamp from R2 html/ versioned objects.

IndoRelevance: m2m100 language detect → weight 1.0 if the detected language is `id`; 0.6 if it is `en` on a `.id` ccTLD; otherwise 0.0. The GeoIP location of the origin server adds a bonus.

QueryRel: Vectorize cosine + BM25 over title/h1/anchor.

Engagement: Analytics Engine rollup (CTR, dwell, pogo-stick rate) into D1 engagement_daily.

SpamPenalty: Vectorize cosine similarity to pranala-spam-v1. Cosine ≥ 0.85 = full deindex; 0.70 ≤ cosine < 0.85 = −0.5 rank; cosine < 0.70 = clean. Threshold values live in code, not in the admin UI.
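
The three spam bands map naturally onto a small pure function. In this sketch the boundary value 0.85 is treated as a deindex and 0.70 as a penalty, which is one reading of the ranges above; `SpamVerdict` and `spamVerdict` are hypothetical names.

```typescript
type SpamVerdict =
  | { action: "deindex" }
  | { action: "penalize"; penalty: number }
  | { action: "clean" };

const DEINDEX_THRESHOLD = 0.85;
const PENALTY_THRESHOLD = 0.7;
const RANK_PENALTY = 0.5;

// Map the cosine similarity against the spam index to an action.
function spamVerdict(cosine: number): SpamVerdict {
  if (cosine >= DEINDEX_THRESHOLD) return { action: "deindex" };
  if (cosine >= PENALTY_THRESHOLD) return { action: "penalize", penalty: RANK_PENALTY };
  return { action: "clean" };
}
```

Keeping the thresholds as module constants matches the rule that they live in code and never in an admin UI.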

Automation contract: Weights are constants in src/ranker/weights.ts. Changing them requires a code commit + canary diff report (auto-generated). No runtime knobs.

4. PSE seed graph

One-time D1 seed at launch: ~5K trusted Indonesian domains. Crawler walks outward. No human re-seeds; seed_v2 would be a code commit.

5. SERP (search results page)

Worker reads top-K from D1 + Vectorize fan-out.
Cache key = sha256(query + lang + region) → KV 60s.
Organic results UNION ALL flio ad results, with is_sponsored flag rendered as yellow "Iklan" pill above and below the organic block. Ranker MUST NOT see ad bid as a feature (lint guards bid access in ranker module).
Speculation Rules API prerenders top 3 organic links on hover/viewport (<script type="speculationrules">). Result tap feels instant.
View Transitions API morphs result card → destination page on tap (where same-origin) and morphs SERP filters (All/Web/Image/News/Q&A) on swipe.
Streaming HTML via TransformStream — first 3 results paint < 200ms, rest stream in.
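
A sketch of the SERP cache key, assuming `node:crypto` is available (on Workers that would mean the `nodejs_compat` flag; the async `crypto.subtle.digest` is the Web Crypto alternative). The query normalization, the NUL separator, and the `serp:` prefix are assumptions added for illustration.

```typescript
import { createHash } from "node:crypto";

// Build the KV cache key for one SERP request.
function serpCacheKey(query: string, lang: string, region: string): string {
  // Assumed normalization: trim, lowercase, collapse whitespace runs.
  const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
  // Join with a separator so that ("ab", "c") and ("a", "bc") cannot collide,
  // which plain string concatenation would allow.
  const material = [normalized, lang, region].join("\u0000");
  return "serp:" + createHash("sha256").update(material, "utf8").digest("hex");
}
```

The key would then be written with a 60-second TTL, which is also the shortest `expirationTtl` that KV accepts.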

5a. Autocomplete (instant search) — sub-100ms, fully AI-ranked

Endpoint: GET /api/ac?q=<prefix>&lang=id returns JSON {suggestions: [{text, type, score}]} in ≤ 100ms p95 from edge.

Three-stage suggestion pipeline (all AI, zero human-curated lists):

Prefix trie (KV-backed): popular query log rolled up nightly into a compressed trie stored in pranala-cache KV under ac:trie:v{N}. Worker loads once per cold start, pinned in module scope. Returns top-10 prefix matches in <5ms.
Typo tolerance: Damerau-Levenshtein distance ≤ 2 against a hot-vocabulary set (top 100K queries). Implemented as a BK-tree also in KV. ~10ms.
Semantic completion: if prefix length ≥ 4 chars and trie returns < 5 results, embed prefix with @cf/baai/bge-m3 and fan-out Vectorize query against pranala-content-v1 titles. Returns conceptually-related queries even with novel phrasing. ~60ms. Cached per-prefix in KV 5min.
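
The typo-tolerance stage can be sketched as a BK-tree. For brevity this example uses plain Levenshtein distance as a stand-in for Damerau-Levenshtein, which would additionally count adjacent transpositions as a single edit. `BkTree` and its methods are illustrative; the plan's KV serialization is not shown.

```typescript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface BkNode {
  word: string;
  children: Map<number, BkNode>; // keyed by distance to this node's word
}

class BkTree {
  private root: BkNode | null = null;

  add(word: string): void {
    if (this.root === null) {
      this.root = { word, children: new Map() };
      return;
    }
    let node = this.root;
    for (;;) {
      const d = levenshtein(word, node.word);
      if (d === 0) return; // already present
      const child = node.children.get(d);
      if (child === undefined) {
        node.children.set(d, { word, children: new Map() });
        return;
      }
      node = child;
    }
  }

  // Return every stored word within maxDist edits of the query.
  search(query: string, maxDist: number): string[] {
    const hits: string[] = [];
    const stack: BkNode[] = this.root ? [this.root] : [];
    while (stack.length > 0) {
      const node = stack.pop() as BkNode;
      const d = levenshtein(query, node.word);
      if (d <= maxDist) hits.push(node.word);
      // Triangle inequality: only subtrees whose edge distance lies in
      // [d - maxDist, d + maxDist] can contain a match.
      node.children.forEach((child, edge) => {
        if (edge >= d - maxDist && edge <= d + maxDist) stack.push(child);
      });
    }
    return hits;
  }
}
```

The pruning step relies on the distance being a true metric, so a Damerau-Levenshtein variant used in its place would need to satisfy the triangle inequality as well.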

Indonesian-aware tokenization:

Stemmer: lightweight Sastrawi-derived rules ported to TS (handles ber-, me-, pe-, -kan, -an, -i affixes).
City/region expansion: jkt → jakarta, sby → surabaya, bdg → bandung — table loaded from KV pranala-config:city_aliases (auto-built once from Wikipedia geo data, never hand-edited).
Common typo map: gigi → gigit, klinikgigi → klinik gigi (segmentation), apotik → apotek — auto-generated from query log misspellings via Llama 3.3 weekly cron.

Personalization (privacy-respecting):

Per-user recent queries stored in localStorage only — never sent to server, never stored in D1.
Suggestion re-rank on client side: queries the user has searched before float to top.
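
A sketch of the client-side re-rank, assuming the recent queries have already been read from `localStorage` into an array. The `Suggestion` shape follows the `/api/ac` response; `rerankWithHistory` is a hypothetical name and the case-insensitive match is an assumption.

```typescript
interface Suggestion {
  text: string;
  type: string;
  score: number;
}

// Float suggestions the user has searched before to the top.
// Server order is preserved inside each of the two groups.
function rerankWithHistory(suggestions: Suggestion[], recentQueries: string[]): Suggestion[] {
  const seen = new Set(recentQueries.map((q) => q.trim().toLowerCase()));
  const isRecent = (s: Suggestion): boolean => seen.has(s.text.trim().toLowerCase());
  const recent = suggestions.filter(isRecent);
  const rest = suggestions.filter((s) => !isRecent(s));
  return recent.concat(rest);
}
```

Nothing in this function touches the network, which keeps the history on the device as the privacy rule requires.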

Voice autocomplete:

SpeechRecognition API with lang="id-ID" and interimResults=true.
As partial transcript arrives, fires /api/ac on each pause.
"Tap to talk" button uses navigator.vibrate(20) haptic on press.

Trending suggestions (zero-state, when input is empty):

Cron rolls up Analytics Engine top-N queries from last 1h/24h/7d into KV ac:trending:v{N}.
Indonesia-only filter via geo of original searches.
No human ever picks a trending term. If something abusive trends, the spam-vector classifier auto-suppresses it (cosine to spam corpus on the suggestion text).

Automation contract: there is no autocomplete_blocklist table editable by humans. Suppression is purely vector-similarity and abuse-classifier driven, decisions written to ai_decisions.

CF limit fit: trie payload < 2MB (within 25MB KV value cap, well under cold-start budget). BK-tree similar. Vectorize query < 50ms p95. KV read 1ms.

6. flio ads integration

pranala registers itself as a flio publisher unit at startup.
SERP renders <div data-flio-key="unit_pranala_serp_top" data-flio-mode="native">.
All ad logic (bidding, targeting, fraud, payout) handled by flio.net — already 100% AI.
Revenue: flio remits to pranala wallet weekly via existing flio payout cron.

7. Site owner dashboard + entity badge (zero human verification)

Domain ownership verification: DNS TXT record pranala-verify=<token> OR /.well-known/pranala-{token} file fetch. Worker fetches and verifies. Auto-grants ownership.

Entity badge — AI-only verification chain:

NPWP regex match → check against the pajak.go.id public NPWP validator endpoint.
PT/CV → ahu.go.id company name search → fuzzy match.
Fintech → OJK whitelist scrape (cached in KV daily).
Food/cosmetic/drug → BPOM lookup.
Healthcare → Kemenkes faskes registry.
All required passes? → auto-issue badge + write ai_decisions. Any fail? → auto-deny + write reason. Owner sees AI-generated explanation, can re-submit after fixing — but no human ever reviews.

Automation contract: there is no verifications.reviewer_id column.

8. API (auto-issued)

Signup → auto-generate API key (32 bytes hex).
RateLimitDO enforces per-key QPS and monthly quota.
No human ever provisions a key.
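
A minimal sketch of key issuance, assuming `node:crypto` is available (on Workers via `nodejs_compat`; `crypto.getRandomValues` would be the Web Crypto route). Storing only a SHA-256 hash of the key is an assumption added here, not something the plan specifies, and `issueApiKey` is a hypothetical name.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface IssuedKey {
  key: string; // shown to the user once
  keyHash: string; // what the database would store (assumption)
}

// Generate a 32-byte random API key, hex-encoded to 64 characters.
function issueApiKey(): IssuedKey {
  const key = randomBytes(32).toString("hex");
  const keyHash = createHash("sha256").update(key, "utf8").digest("hex");
  return { key, keyHash };
}
```

Looking keys up by hash means a leaked database row cannot be replayed as a working key.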

9. Anti-abuse (AI-only)

Spam detection: Vectorize cosine + Llama classify combo.
Cloaking detection: render Worker fetches with Googlebot UA vs pranala-bot UA, diff > 30% by tokens → auto-flag.
Click fraud on flio ads: handled by flio's existing CF Turnstile + bot management layer.
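
The plan does not define how the "30% by tokens" cloaking diff is measured. One possible reading, sketched below, is the Jaccard distance between the token sets of the two fetched bodies. The tokenizer (lowercase ASCII letters and digits) and the names `tokenDiffRatio` and `looksCloaked` are assumptions.

```typescript
// Split text into a set of lowercase alphanumeric tokens.
function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 0),
  );
}

// Jaccard distance between the two token sets: 0 = identical, 1 = disjoint.
function tokenDiffRatio(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  const union = ta.size + tb.size - shared;
  return 1 - shared / union;
}

// Flag a page when the two user agents see more than 30% different tokens.
function looksCloaked(bodyA: string, bodyB: string, threshold = 0.3): boolean {
  return tokenDiffRatio(bodyA, bodyB) > threshold;
}
```

A set-based measure ignores word order and repetition, so pages that only rotate ads or timestamps stay well under the threshold.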

10. DMCA — the single legal carve-out

Indonesian UU ITE / Permenkominfo 5/2020 requires a reachable contact for takedown counter-notices. This is the ONLY human-readable surface.

Public form /dmca posts to dmca_intake D1 table.
AI auto-classifies notice validity (URL exists, claimant info complete, sworn statement present). If valid → auto-deindex matching URLs within 1 hour and email claimant.
Counter-notice form posts to dmca_counter. The dmca_counter rows are emailed weekly to legal@pranala.org (external counsel) — no in-app review UI exists.
Counsel responds via email; their reply is processed by an inbound email worker (Cloudflare Email Routing → Worker) that parses an HMAC-signed verdict token. No human clicks "approve" inside pranala's UI.

This is the maximum tolerated human contact: 1 mailbox, 1 weekly digest, decisions returned via signed token. Everything else is forbidden by Constitution rule A6.
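
A sketch of how the signed verdict token might be checked by the inbound email worker. The token layout `<caseId>.<verdict>.<hex HMAC-SHA256>`, the verdict names, and the use of `node:crypto` (via `nodejs_compat` on Workers) are all assumptions; the plan only says the token is HMAC-signed.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Verdict = "uphold" | "restore"; // hypothetical verdict values

// Issue a token for counsel to return; caseId must not contain dots.
function signVerdict(caseId: string, verdict: Verdict, secret: string): string {
  const payload = `${caseId}.${verdict}`;
  const mac = createHmac("sha256", secret).update(payload, "utf8").digest("hex");
  return `${payload}.${mac}`;
}

// Return the parsed verdict, or null if the token is malformed or forged.
function verifyVerdict(
  token: string,
  secret: string,
): { caseId: string; verdict: Verdict } | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const caseId = parts[0];
  const verdict = parts[1];
  const mac = parts[2];
  if (verdict !== "uphold" && verdict !== "restore") return null;
  const expected = createHmac("sha256", secret)
    .update(`${caseId}.${verdict}`, "utf8")
    .digest();
  const given = Buffer.from(mac, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== expected.length) return null;
  if (!timingSafeEqual(given, expected)) return null;
  return { caseId, verdict };
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.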

CF Limits Compliance Matrix

| CF limit | Worst-case load | Mitigation |
|---|---|---|
| Worker CPU 30s req | SERP fan-out ≤ 200ms | Vectorize + KV cache 60s |
| Worker CPU 5min cron | PageRank shard tick | 1 shard / tick, 256 shards |
| Subrequests 1000 | Crawler batch | ≤ 100 URLs × 3 subreq |
| D1 10GB/DB | 10M URLs metadata only | Sharded 10 DBs by URL hash |
| D1 ~100 SQL bind vars | Bulk inserts | Chunk to 80 |
| D1 30s query | Joins | Pre-denormalized hot tables |
| KV 1 write/sec/key | Counters | Counters live in DOs, never KV |
| KV 25MB value | HTML | HTML never in KV; goes to R2 |
| Queues 100 msg/batch | Crawl fanout | Re-enqueue, not recursion |
| Cron 250/account | Multiple schedulers | Single `* * * * *` multiplexer + DO routing |
| Workers AI rate-limit/model | Reports | Queue consumer, never inline |
| Vectorize 5M vectors/index | 10M docs | 2 named indexes by year shard |
| R2 list ops cost | URL discovery | Never list; index in D1 |
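
The "chunk to 80" mitigation for bulk inserts can be sketched as a small helper. It reads the 80 as a cap on bound parameters per statement, which is an interpretation of the table row; `chunkRows` is a hypothetical name.

```typescript
// Split rows into chunks so that rows * columnsPerRow never exceeds maxBindVars.
function chunkRows<T>(rows: T[], columnsPerRow: number, maxBindVars = 80): T[][] {
  if (columnsPerRow < 1 || columnsPerRow > maxBindVars) {
    throw new RangeError("columnsPerRow must be between 1 and maxBindVars");
  }
  const rowsPerChunk = Math.floor(maxBindVars / columnsPerRow);
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push(rows.slice(i, i + rowsPerChunk));
  }
  return chunks;
}
```

For a four-column insert this yields 20 rows per statement, leaving headroom under the roughly 100-variable limit.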

Mobile-Native UI Standards

Pranala SERP must feel like a native iOS/Android app, not a desktop search page. All standards from /home/ucok/CLAUDE.md apply, plus search-specific patterns below.

Layout

Sticky compact header (52px): logo, search input, voice mic, profile avatar. Auto-hides on scroll-down, reveals on scroll-up via IntersectionObserver.
Bottom navigation (fixed, 64px + safe-area-inset-bottom): Cari | Trending | Riwayat | Tersimpan | Akun. Active state = filled icon + label.
Search input always front-and-center — never behind a hamburger.
NO hamburger menu anywhere.
Pull-to-refresh on result lists via touchstart/touchmove + CSS transform.
Skeleton loaders (gray shimmer cards matching result-card shape) — no spinners.

Search box (the killer surface)

Full-width pill with 16px radius, 48px tall, system font 17px (no zoom-on-focus on iOS).
Live autocomplete dropdown drops below input, full-width on mobile, max-height 60vh, scroll-snaps each item to 56px tap target.
Each suggestion row: leading icon (clock for recent, fire for trending, sparkle for AI semantic, location for places), text, trailing arrow → on tap fills input; arrow tap = submit.
Voice mic button inside the pill on the right, 44×44px, animates to pulsing red circle while listening.
Long-press on a recent query = bottom sheet with Hapus / Bagikan / Cari di tab baru.
Swipe left on a recent query = delete with undo toast.
Esc / swipe-down on dropdown closes it; on iOS, inputmode="search" shows the right keyboard with a "Cari" key.

Filter tabs (horizontal swipe, no taps required)

Below the search header: Semua · Web · Gambar · Berita · Tanya Jawab · Maps · Toko · Tokoh.
Native horizontal scroll-snap container (scroll-snap-type: x mandatory).
Buttons render at edges (left/right chevron) when overflow, per CLAUDE.md feedback_horizontal_slide rule. Pure CSS detection via :has(.snap-overflow).
Tab swipe triggers View Transition (cross-fade + slide).

Result cards

16px corner radius, subtle shadow, 12px gap, full-bleed thumbnail when available.
Each card shows: favicon · domain · title · snippet · meta row (time, geo, entity badge if any).
Long-press = bottom-sheet menu: Buka / Buka di tab baru / Bagikan (uses navigator.share()) / Salin tautan / Tidak relevan (writes negative-feedback row → ranker training signal).
Swipe right on a card = save to Tersimpan (offline-readable via Service Worker cache).
Tap triggers View Transition where the favicon morphs into the destination page header.

Image search

Masonry grid via CSS column-count (1 col mobile, 2 tablet, 3 desktop).
Tap → fullscreen lightbox with pinch-zoom (CSS touch-action: pinch-zoom).
Stories-style horizontal swipe between images, dot indicators top.
Long-press = save / share / report.

Maps tab

Leaflet + offline tiles served from R2 + Cloudflare cache.
"Use my location" = navigator.geolocation with enableHighAccuracy: false (battery-friendly).
Result pins clustered, tap = peek card slides up from bottom (50% sheet → drag up = full).

Native cues

System font stack: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif.
Safe-area insets on top, bottom, left, and right (the notch can sit on either side in landscape).
Tap targets ≥ 48×48px.
Spacing 12px between interactive elements.
touch-action: manipulation to kill 300ms tap delay.
prefers-reduced-motion: reduce disables View Transitions.
Dark mode auto via prefers-color-scheme.
Haptic feedback on pull-to-refresh, voice-mic press, save-toast: navigator.vibrate(15).

Advanced web platform features (progressive enhancement)

| Feature | Use |
|---|---|
| View Transitions API | SERP ↔ result page, filter tab switches |
| Speculation Rules API | Prerender top 3 organic links |
| Navigation API | Back/forward feels instant, no white flash |
| `navigator.share()` | Native share sheet for results |
| `SpeechRecognition` (id-ID) | Voice input, real-time transcript |
| `SpeechSynthesis` (id-ID) | Read result snippets aloud (accessibility + voice answers) |
| Web Push API | Optional: notify when a saved query has new results |
| Service Worker | Network-first for `/api/*`, cache-first for static, offline fallback page, saved-results offline access |
| Web App Manifest | `display: standalone`, theme color, 192/512 icons, share-target API so users can share TO pranala |
| Web Share Target API | Pranala becomes a destination in Android share sheets: share-to-search any URL/text |
| Background Sync | Saved queries refresh in background |
| `content-visibility: auto` | Off-screen result cards skip layout/paint |
| `scroll-snap` | Horizontal carousels and image gallery |
| `@view-transition` CSS | Page-level navigation transitions |
| CSS container queries | Result card adapts to grid/list density |
| CSS `:has()` | Style header based on whether search has focus, etc. |
| 103 Early Hints | Preload critical CSS + autocomplete trie before HTML response |
| Brotli | Auto on Cloudflare |
| AVIF + WebP with `<picture>` | Image search thumbs |

PWA installability

Manifest at /manifest.webmanifest.
After 2nd visit: bottom-sheet promo to "Pasang Pranala" (Add to Home Screen).
Standalone display: hide URL bar, full-bleed brand color.
Share Target: register pranala as receiver of text/plain and text/uri-list — share any link from Chrome/WA → pranala opens with that URL pre-loaded as a "more like this" semantic search.

Accessibility (WCAG 2.1 AA)

Visible focus rings via :focus-visible.
Skip-to-content link as first focusable element.
ARIA live region announces "X hasil ditemukan" after each search.
Screen reader labels in Indonesian (aria-label="Tombol mikrofon untuk pencarian suara").
Contrast ratios audited in CI.
Text scales to 200% without horizontal scroll.

Performance budget (enforced in CI)

LCP < 2.0s on 3G in Indonesia (our target, stricter than the usual 2.5s "good" threshold).
INP < 100ms (interaction).
CLS < 0.05.
JS bundle < 50KB gzipped on critical path; rest lazy-loaded.
Autocomplete p95 < 100ms edge.
Lighthouse Mobile ≥ 95 across all four scores.

Build Order

Worker scaffold + wrangler.toml (multi-env, multi-D1 binding)
D1 schema migrations: ctrl + 10 idx shards (sharding helper module)
Trusted-seed loader + crawl_queue seed
HostThrottle Durable Object
Crawler queue consumer (pranala-org-crawler)
HTML parser + outlink extractor
Indexer queue consumer + Vectorize embedder
Ranker cron multiplexer + R2 graph shard writer
PageRank/TrustRank/EntityRank shard processors
SERP route + KV cache + flio ad slot
Autocomplete pipeline — trie builder cron, BK-tree typo, semantic Vectorize fallback, voice input, trending zero-state
Mobile UI shell — sticky header, bottom nav, skeleton loaders, View Transitions, Speculation Rules, scroll-snap filter tabs, swipe-saveable result cards, long-press bottom sheets
PWA layer — manifest, Service Worker (network-first API, cache-first static), Share Target receiver, Web Push opt-in
Indonesian-aware tokenizer — Sastrawi-derived stemmer + city aliases + auto-typo map (Llama weekly cron)
Owner dashboard (DNS-TXT auto-verify)
Entity badge auto-verifier (5 registry adapters)
Premium tier Xendit webhook + crawl_priority
AI report generator (Llama 3.3 70B → R2 markdown)
API key issuance + RateLimitDO
Spam vectorize index + cron classifier
DMCA intake + AI classifier + email-routing worker
automation-guard.ts lint suite + CI gate
CI perf budget gate (Lighthouse mobile ≥ 95, autocomplete p95 < 100ms)

Automation acceptance gate (must pass before launch)

grep -rn "Approve\|Reject\|reviewer_id\|manual_review" src/ → 0 hits
All POST/PUT/DELETE routes traced to cron|queue|webhook|verified-self-service (automated trace test)
ai_decisions table has rows for last 24h covering ≥ 95% of state changes
No human Slack/email alert says "click to approve"
DMCA flow: only legal@pranala.org mailbox is human; counter-notice replies are HMAC-signed tokens
Lint rule blocks env.AI.run direct calls outside aiDecide() wrapper
Admin dashboard renders read-only — <form> count = 0 except /dmca public intake
Autocomplete: no autocomplete_blocklist table; suppression is vector-classifier only
Trending suggestions sourced from Analytics Engine rollup, not human picks
Mobile Lighthouse ≥ 95 across all four categories (CI gate)
Autocomplete p95 < 100ms from edge (synthetic check in CI)
PWA installable check passes (manifest + SW + HTTPS)
Share Target API registered (verified by intent simulation)
No hamburger menus anywhere — bottom nav only
Voice search works in Chrome Android + Safari iOS (lang=id-ID)

⚙ HARD CONSTRAINTS (enforced for all sites)

This domain MUST operate within these constraints — no exceptions:

100% Cloudflare serverless — Workers + D1 + R2 + KV + Workers AI + Vectorize. NEVER PM2, NEVER VPS, NEVER Docker in production path.
100% AI-automated — every customer interaction, every moderation decision, every transaction reconcile = AI. No manual queue, no live human chat support, no physical fulfillment.
1-operator solo — one person can run the entire operation from a phone. No team meetings, no shared inbox, no shift rotation.
WhatsApp AI bot for all support (24/7, instant response, no SLA promises that need humans).
Mayar QRIS for all Indonesian payments (subscription auto-renew, no manual invoicing).
Indonesian UI primary — bahasa-first, English fallback only where unavoidable.
Privacy — opt-in only, delete-on-request honored within 24h (cron-driven).
No physical goods, no inventory — digital products + affiliate referrals only.

If the plan above describes any flow that violates these constraints, treat the plan as ASPIRATIONAL only and rework before building. The constraint trifecta wins.
