Positioning: "Mesin pencari web Indonesia yang transparan, lambat tapi jujur" (a transparent Indonesian web search engine: slow but honest). Tagline: Google ranks the web. Pranala ranks Indonesian trust.
Two hard constraints, equal weight:
These rules are enforced in code via src/lib/automation-guard.ts. Any PR that violates them must fail CI.
| # | Rule | Enforcement |
|---|---|---|
| A1 | No admin UI may have an "Approve" or "Reject" button for content, listings, ads, badges, payouts, or rank decisions. | Lint rule: forbid <button>Approve</button> patterns; admin UI is read-only dashboards + AI-action audit log. |
| A2 | No D1 column may be named manual_review, pending_approval, reviewer_id, or equivalent. | Schema lint at migration time. |
| A3 | No worker route may require an admin JWT to write production data. All writes are AI-driven via Queue consumer or Cron. | Route-level test: every POST/PUT/DELETE traces back to cron, queue, webhook, or verified-self-service. |
| A4 | No Slack/email notification may have "approve here" CTAs. Notifications are post-hoc reports only. | Template lint. |
| A5 | Every AI decision must write (decision, score, model, prompt_hash, ts) to ai_decisions D1 table for audit. | Wrapper function aiDecide() is the only callable; ESLint forbids direct env.AI.run(). |
| A6 | DMCA counter-notice handling is the ONLY allowed human surface, and is bounded to a single inbox processed by external counsel weekly — never inside the worker. | Single legal@pranala.org mailbox; nothing else routes to a human. |
If a rule needs to break, the PR description must include AUTOMATION-EXCEPTION: <ticket> and the exception is permanent technical debt visible on /admin/debt.
Pranala lives entirely on Cloudflare Free tier until revenue forces an upgrade. Slow, honest, cheap. No surprises.
| Resource | Free quota | Pranala budget |
|---|---|---|
| Workers requests | 100K/day | ≤ 80K/day for SERP + AC + API combined |
| Workers CPU (request) | 10ms | All hot paths must finish < 10ms |
| Workers CPU (cron) | 30s | Crawl/index/rank budget lives here |
| D1 storage | 5GB total, 1GB/DB | One DB only, ≤ 800MB metadata, HTML in R2 |
| D1 reads | 5M/day | SERP cached → ≤ 1M reads/day |
| D1 writes | 100K/day | Crawl ≤ 10K URLs/day → ≤ 50K writes |
| KV reads | 100K/day | AC trie cached in module scope |
| KV writes | 1K/day | Trie rebuilt weekly, not daily |
| KV storage | 1GB | Trie + config only, never per-page |
| R2 storage | 10GB | Gzipped HTML (~5KB avg) → ≤ 2M pages |
| R2 Class A ops | 1M/mo | ≤ 33K writes/day |
| R2 Class B ops | 10M/mo | Reads cheap, fine |
| Workers AI | 10K Neurons/day | Cron-only, ≤ 100 LLM calls/day |
| Vectorize stored dims | 5M/mo | 1024d × 5K vectors max |
| Vectorize query dims | 30M/mo | ≤ 30K queries/mo (cron AC fallback only) |
| Analytics Engine | free w/ caps | Search log + clickthroughs only |
| Cron triggers | 1K invocations/day on free | One * * * * * multiplexer = 1440/day → use */2 * * * * (720/day) |
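The single `*/2 * * * *` multiplexer from the Cron triggers row could dispatch work by minute-modulo roughly as sketched below. This is a minimal illustration, not the project's dispatcher: the task names, the alternating crawl/index split, and the rebuild slot at 03:00 WIB are assumptions, and `tasksForTick` and `ticksPerDay` are hypothetical helpers.

```typescript
// Hypothetical task names for illustration only.
type Task = "crawl" | "index" | "ac-trie-rebuild";

const WIB_OFFSET_HOURS = 7; // WIB is UTC+7

// Number of cron invocations per day for a given interval in minutes.
function ticksPerDay(intervalMinutes: number): number {
  return (24 * 60) / intervalMinutes;
}

// Decide which tasks run on one tick of the */2 cron.
// Assumption: crawl and index alternate ticks; the real split may differ.
function tasksForTick(scheduledTimeMs: number): Task[] {
  const d = new Date(scheduledTimeMs);
  const minute = d.getUTCMinutes();
  const hourWib = (d.getUTCHours() + WIB_OFFSET_HOURS) % 24;
  const tasks: Task[] = [];
  if (minute % 4 === 0) tasks.push("crawl");
  if (minute % 4 === 2) tasks.push("index");
  // Assumed slot for the trie rebuild: the 03:00 WIB tick.
  if (hourWib === 3 && minute === 0) tasks.push("ac-trie-rebuild");
  return tasks;
}
```

In a Worker, the `scheduled` handler would pass `controller.scheduledTime` into `tasksForTick`. `ticksPerDay(2)` gives 720, which is the figure that keeps the schedule under the 1K/day budget in the table.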

- task_queue row + claimed_at timestamp + WHERE state='pending' LIMIT 50)
- host_state(host, last_fetch_at, crawl_delay_s), cron checks WHERE last_fetch_at < datetime('now', '-' || crawl_delay_s || ' seconds'))

| Stage | URLs indexed | Daily traffic | Plan | Why upgrade |
|---|---|---|---|---|
| α (now) | 0 → 100K | < 50K req/day | Free | Bootstrap |
| β | 100K → 500K | 50–200K req/day | Workers Paid $5/mo | Need 30s CPU on requests for autocomplete semantic + Queues |
| γ | 500K → 5M | 200K–2M req/day | + R2 paid + D1 paid | HTML storage exceeds 10GB |
| δ | 5M+ | 2M+ req/day | + Workers AI paid | More Llama capacity |
Until pranala makes ≥ Rp 75K/mo from flio ad revenue (covers $5 Workers Paid), it stays on Free. Premium subscriptions self-fund the next upgrade tier. No cash burn.
- pranala-org → custom domain pranala.org (handles routes + cron in one Worker to stay simple and within request budget)
- pranala_db (single DB ≤ 1GB; sharding deferred to stage γ)
- pranala-html (gzipped HTML only, hash-keyed)
- PRANALA_KV (AC trie, config, parsed robots.txt; single namespace)
- pranala-ac-v1 (≤ 5K query embeddings, used in cron only), added at stage β
- @cf/baai/bge-m3, @cf/meta/llama-3.3-70b-instruct-fp8-fast, @cf/meta/m2m100-1.2b
- pranala_events
- */2 * * * * single multiplexer that dispatches by minute-modulo (crawl, index-roll-up, AC trie rebuild, etc.)
- task_queue polling
- host_state row throttling

Scope shipped:
- urls, documents, links, host_state, task_queue, submissions, ai_decisions, search_log.
- */2 * * * * → minute-modulo dispatcher: crawl 5 URLs/tick, index 10 documents/tick, AC trie rebuild at 03:00 WIB.
- /api/ac: trie + typo (BK-tree) only in Phase 1; semantic is deferred to Phase 2 (Vectorize).
- /api/submit: public domain/sitemap submission; results are enqueued in task_queue.
- Mozilla/5.0 (compatible; pranala-bot/1.0; +https://pranala.org/bot)) → store gzipped HTML in R2 → extract title/meta/outlinks → write to D1 with host throttling.
- /dmca (public form) → dmca_intake D1.
- /bot (crawler info), referenced in the User-Agent.
- /transparansi: public ranking formula, list of trusted seeds, live index statistics.

DEFERRED to Phase 2+ (requires Workers Paid):
Language: 100% Indonesian. No English text in the public UI. Slogan: "Mesin pencari web Indonesia yang transparan, lambat tapi jujur."
Flow: Cron (1m) → drain N URLs from crawl_queue D1 table → enqueue to pranala-fetch → consumer fetches with Mozilla/5.0 (compatible; pranala-bot/1.0; +https://pranala.org/bot) → store gzipped HTML in R2 (html/{sha256}.gz) → metadata + outlinks to D1 shard → enqueue outlinks back into pranala-discover → discoverer dedupes against seen_urls (D1) → re-enqueues novel ones.
Politeness: HostThrottle DO per host, 1 req/sec default, honors Crawl-delay from robots.txt cached 24h in pranala-robots KV. No per-host concurrency.
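The politeness rule above boils down to a per-host "is this host due yet?" check. The sketch below shows one way to express it as a pure function. The `HostState` shape and the helper names are hypothetical, and clamping a robots.txt Crawl-delay below one second up to the 1 req/sec default is an assumption rather than something the plan specifies.

```typescript
// Hypothetical per-host state; field names are illustrative.
interface HostState {
  host: string;
  lastFetchAtMs: number;      // epoch ms of the last fetch to this host
  crawlDelayS: number | null; // Crawl-delay from robots.txt, null if absent
}

const DEFAULT_DELAY_S = 1; // 1 req/sec default from the politeness rule

// Effective delay: the larger of the default and the robots.txt Crawl-delay.
function effectiveDelayS(state: HostState): number {
  const fromRobots = state.crawlDelayS === null ? 0 : state.crawlDelayS;
  return Math.max(DEFAULT_DELAY_S, fromRobots);
}

// True when enough time has passed since the last fetch to this host.
function isHostDue(state: HostState, nowMs: number): boolean {
  return nowMs - state.lastFetchAtMs >= effectiveDelayS(state) * 1000;
}
```

Because there is no per-host concurrency, the caller would update `lastFetchAtMs` before issuing the fetch so that a second request to the same host cannot slip through in the same tick.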
Automation contract:
- (rank_score, last_change_detected, content_hash_age). No manual "force recrawl" button anywhere.

CF limit fit: consumer batch ≤ 100 URLs × 3 subreq each = 300 < 1000 cap. CPU < 30s. R2 PUT < 1000/invocation.
Tiers (auto-billed via Xendit):
Webhook → activation:
- /webhook/xendit verifies x-callback-token.
- subscriptions, UPDATE site tier, increment crawl_priority integer.
- crawl_priority DESC first.

Automation contract: No human ever sees a payment row. Refund flow: Xendit chargeback webhook auto-disables tier and writes ai_decisions row.
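The x-callback-token check on /webhook/xendit could look roughly like the following sketch. It compares the header value to the configured token in constant time so response timing does not leak how many leading characters matched. The function names are hypothetical, and where the expected token is stored (for example a Worker secret) is an assumption.

```typescript
// Compare two strings without exiting early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : 0;
    const cb = i < b.length ? b.charCodeAt(i) : 0;
    diff |= ca ^ cb;
  }
  return diff === 0;
}

// headerToken: value of the x-callback-token request header, or null if missing.
// expectedToken: the verification token configured for the account (assumed to
// come from a secret binding; an empty value is treated as misconfiguration).
function verifyXenditCallback(headerToken: string | null, expectedToken: string): boolean {
  if (headerToken === null || expectedToken.length === 0) return false;
  return constantTimeEqual(headerToken, expectedToken);
}
```

A route handler would call `verifyXenditCallback(request.headers.get("x-callback-token"), token)` and reject the request before touching any payment state when it returns false.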
Rank = 0.30·LinkRank + 0.20·TrustRank + 0.15·EntityRank
+ 0.10·Freshness + 0.10·IndoRelevance + 0.10·QueryRel
+ 0.05·Engagement − SpamPenalty
LinkRank (PageRank at 10M scale on Workers):
- graph/shard-{0..255}.jsonl), 1M edges/shard.
- rank_partial → atomic merge.

TrustRank: seeds in pranala-config KV trusted_seeds = hardcoded go.id, *.go.id, ojk.go.id, bpom.go.id, *.ac.id, kompas.com, tempo.co, detik.com (curated once at launch, never edited by humans; changes go through a seed_changes.sql migration that requires AUTOMATION-EXCEPTION if added post-launch).
EntityRank: auto-derived from registry verification (see §7).
Freshness: content hash diff timestamp from R2 html/ versioned objects.
IndoRelevance: m2m100 language detect → if id weight = 1.0, if en and .id ccTLD = 0.6, else 0.0. Geo IP of origin server adds bonus.
QueryRel: Vectorize cosine + BM25 over title/h1/anchor.
Engagement: Analytics Engine rollup (CTR, dwell, pogo-stick rate) into D1 engagement_daily.
SpamPenalty: Vectorize cosine to pranala-spam-v1. Threshold ≥ 0.85 = full deindex; 0.70–0.85 = −0.5 rank; < 0.70 = clean. Threshold values in code, not in admin UI.
Automation contract: Weights are constants in src/ranker/weights.ts. Changing them requires a code commit + canary diff report (auto-generated). No runtime knobs.
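To make the formula concrete, here is a minimal sketch of how the constant weights, the SpamPenalty tiers, and the IndoRelevance rule might fit together. The weights and thresholds are copied from the text above. Everything else is an assumption for illustration: the `Signals` shape, the function names, the premise that each signal is normalised to [0, 1], and the handling of the exact boundary at 0.85 (treated here as deindex). The Geo IP bonus is left out.

```typescript
// Weights from the Rank formula; they sum to 1.0 before the penalty.
const WEIGHTS = {
  linkRank: 0.30,
  trustRank: 0.20,
  entityRank: 0.15,
  freshness: 0.10,
  indoRelevance: 0.10,
  queryRel: 0.10,
  engagement: 0.05,
} as const;

// Assumption: every signal is already normalised to [0, 1].
type Signals = Record<keyof typeof WEIGHTS, number>;

// SpamPenalty tiers: >= 0.85 deindex, 0.70 to 0.85 subtract 0.5, below 0.70 clean.
function spamPenalty(spamCosine: number): { deindex: boolean; penalty: number } {
  if (spamCosine >= 0.85) return { deindex: true, penalty: 0 };
  if (spamCosine >= 0.70) return { deindex: false, penalty: 0.5 };
  return { deindex: false, penalty: 0 };
}

// IndoRelevance: "id" scores 1.0, "en" on a .id ccTLD scores 0.6, else 0.0.
function indoRelevance(lang: string, host: string): number {
  if (lang === "id") return 1.0;
  if (lang === "en" && host.endsWith(".id")) return 0.6;
  return 0.0;
}

// Returns null when the page is deindexed, otherwise the final score.
function rank(signals: Signals, spamCosine: number): number | null {
  const spam = spamPenalty(spamCosine);
  if (spam.deindex) return null;
  let score = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]) {
    score += WEIGHTS[key] * signals[key];
  }
  return score - spam.penalty;
}
```

Keeping `WEIGHTS` as a frozen literal matches the automation contract: the only way to change a weight is to change this file in a commit.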
One-time D1 seed at launch: ~5K trusted Indonesian domains. Crawler walks outward. No human re-seeds; seed_v2 would be a code commit.
- sha256(query + lang + region) → KV 60s.
- is_sponsored flag rendered as yellow "Iklan" pill above and below the organic block. Ranker MUST NOT see ad bid as a feature (lint guards bid access in ranker module).
- <script type="speculationrules">). Result tap feels instant.
- TransformStream: first 3 results paint < 200ms, rest stream in.

Endpoint: GET /api/ac?q=<prefix>&lang=id returns JSON {suggestions: [{text, type, score}]} in ≤ 100ms p95 from edge.
Three-stage suggestion pipeline (all AI, zero human-curated lists):
- pranala-cache KV under ac:trie:v{N}. Worker loads once per cold start, pinned in module scope. Returns top-10 prefix matches in <5ms.
- @cf/baai/bge-m3 and fan-out Vectorize query against pranala-content-v1 titles. Returns conceptually-related queries even with novel phrasing. ~60ms. Cached per-prefix in KV 5min.

Indonesian-aware tokenization:
- ber-, me-, pe-, -kan, -an, -i affixes).
- jkt → jakarta, sby → surabaya, bdg → bandung: table loaded from KV pranala-config:city_aliases (auto-built once from Wikipedia geo data, never hand-edited).
- gigi → gigit, klinikgigi → klinik gigi (segmentation), apotik → apotek: auto-generated from query log misspellings via Llama 3.3 weekly cron.

Personalization (privacy-respecting):
- localStorage only; never sent to server, never stored in D1.

Voice autocomplete:
- SpeechRecognition API with lang="id-ID" and interimResults=true.
- /api/ac on each pause.
- navigator.vibrate(20) haptic on press.

Trending suggestions (zero-state, when input is empty):
- ac:trending:v{N}.

Automation contract: there is no autocomplete_blocklist table editable by humans. Suppression is purely vector-similarity and abuse-classifier driven, decisions written to ai_decisions.
CF limit fit: trie payload < 2MB (within 25MB KV value cap, well under cold-start budget). BK-tree similar. Vectorize query < 50ms p95. KV read 1ms.
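The prefix stage of the autocomplete pipeline can be illustrated with a small in-memory trie. This is a sketch under assumptions, not the shipped data structure: the node layout, the `score` field (imagined here as a query popularity count), and the `"prefix"` type label are hypothetical. The result shape follows the `{text, type, score}` suggestions described for /api/ac.

```typescript
interface TrieNode {
  children: Map<string, TrieNode>;
  score: number | null; // set only where a complete query ends
}

interface Suggestion {
  text: string;
  type: "prefix"; // hypothetical label for trie-derived suggestions
  score: number;
}

function newNode(): TrieNode {
  return { children: new Map<string, TrieNode>(), score: null };
}

// Insert a query with its score, creating nodes along the way.
function insertQuery(root: TrieNode, text: string, score: number): void {
  let node = root;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    let next = node.children.get(ch);
    if (!next) {
      next = newNode();
      node.children.set(ch, next);
    }
    node = next;
  }
  node.score = score;
}

// Walk to the prefix node, collect every completion below it, keep the best `limit`.
function suggest(root: TrieNode, prefix: string, limit = 10): Suggestion[] {
  let node = root;
  for (let i = 0; i < prefix.length; i++) {
    const next = node.children.get(prefix.charAt(i));
    if (!next) return [];
    node = next;
  }
  const out: Suggestion[] = [];
  const stack: Array<[TrieNode, string]> = [[node, prefix]];
  while (stack.length > 0) {
    const [cur, text] = stack.pop() as [TrieNode, string];
    if (cur.score !== null) out.push({ text, type: "prefix", score: cur.score });
    cur.children.forEach((child, ch) => {
      stack.push([child, text + ch]);
    });
  }
  out.sort((a, b) => b.score - a.score);
  return out.slice(0, limit);
}
```

Building the trie once in module scope and reusing it across requests is what the "pinned in module scope" note refers to. A production version would likely precompute the top completions per node rather than walking the whole subtree on every keystroke.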
- <div data-flio-key="unit_pranala_serp_top" data-flio-mode="native">.

Domain ownership verification: DNS TXT record pranala-verify=<token> OR /.well-known/pranala-{token} file fetch. Worker fetches and verifies. Auto-grants ownership.
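The matching step of domain ownership verification might be sketched as below. How the TXT records are fetched is deliberately left out (one option is a DNS-over-HTTPS lookup from the Worker, which is an assumption). `txtProvesOwnership` and `wellKnownPath` are hypothetical helper names.

```typescript
const VERIFY_PREFIX = "pranala-verify=";

// txtRecords: raw TXT values for the domain, however they were fetched.
function txtProvesOwnership(txtRecords: string[], token: string): boolean {
  if (token.length === 0) return false;
  return txtRecords.some((raw) => {
    // Some resolvers return TXT data wrapped in double quotes; strip them.
    const value = raw.trim().replace(/^"(.*)"$/, "$1");
    // Require an exact match so a longer or shorter token never passes.
    return value === VERIFY_PREFIX + token;
  });
}

// Path the Worker would fetch for the file-based alternative.
function wellKnownPath(token: string): string {
  return "/.well-known/pranala-" + token;
}
```

Either check passing would be enough to auto-grant ownership, in line with the "DNS TXT record OR file fetch" wording.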
Entity badge — AI-only verification chain:
- pajak.go.id public NPWP validator endpoint.
- ahu.go.id company name search → fuzzy match.
- ai_decisions. Any fail? → auto-deny + write reason. Owner sees AI-generated explanation, can re-submit after fixing, but no human ever reviews.

Automation contract: there is no verifications.reviewer_id column.

- RateLimitDO enforces per-key QPS and monthly quota.
- Googlebot UA vs pranala-bot UA, diff > 30% by tokens → auto-flag.

Indonesian UU ITE / Permenkominfo 5/2020 requires a reachable contact for takedown counter-notices. This is the ONLY human-readable surface.

- /dmca posts to dmca_intake D1 table.
- dmca_counter. The dmca_counter rows are emailed weekly to legal@pranala.org (external counsel); no in-app review UI exists.

This is the maximum tolerated human contact: 1 mailbox, 1 weekly digest, decisions returned via signed token. Everything else is forbidden by Constitution rule A6.
| CF limit | Worst-case load | Mitigation |
|---|---|---|
| Worker CPU 30s req | SERP fan-out ≤ 200ms | Vectorize + KV cache 60s |
| Worker CPU 5min cron | PageRank shard tick | 1 shard / tick, 256 shards |
| Subrequests 1000 | Crawler batch | ≤ 100 URLs × 3 subreq |
| D1 10GB/DB | 10M URLs metadata only | Sharded 10 DBs by URL hash |
| D1 ~100 SQL bind vars | Bulk inserts | Chunk to 80 |
| D1 30s query | Joins | Pre-denormalized hot tables |
| KV 1 write/sec/key | Counters | Counters live in DOs, never KV |
| KV 25MB value | HTML | HTML never in KV; goes to R2 |
| Queues 100 msg/batch | Crawl fanout | Re-enqueue, not recursion |
| Cron 250/account | Multiple schedulers | Single * * * * * multiplexer + DO routing |
| Workers AI rate-limit/model | Reports | Queue consumer, never inline |
| Vectorize 5M vectors/index | 10M docs | 2 named indexes by year shard |
| R2 list ops cost | URL discovery | Never list; index in D1 |
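The "Chunk to 80" mitigation for bulk inserts could be implemented along these lines. The sketch reads the budget as 80 bound parameters per statement (rows × columns), which is one interpretation of the table row. `chunkRows` and `insertSql` are hypothetical helpers, and table and column names are assumed to be trusted constants, never user input.

```typescript
const MAX_BIND_VARS = 80; // budget from the table, kept below D1's ~100 limit

// Split rows so that rows-per-chunk × columns-per-row stays within the budget.
function chunkRows<T>(rows: T[][], columnsPerRow: number, maxVars = MAX_BIND_VARS): T[][][] {
  if (columnsPerRow < 1 || columnsPerRow > maxVars) {
    throw new RangeError("columnsPerRow must be between 1 and " + maxVars);
  }
  const rowsPerChunk = Math.floor(maxVars / columnsPerRow);
  const chunks: T[][][] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push(rows.slice(i, i + rowsPerChunk));
  }
  return chunks;
}

// Build a multi-row INSERT with one "?" placeholder per value.
function insertSql(table: string, columns: string[], rowCount: number): string {
  const one = "(" + columns.map(() => "?").join(", ") + ")";
  const groups: string[] = [];
  for (let i = 0; i < rowCount; i++) groups.push(one);
  return "INSERT INTO " + table + " (" + columns.join(", ") + ") VALUES " + groups.join(", ");
}
```

Each chunk then becomes one prepared statement whose flattened values are bound in order, so a 4-column insert carries at most 20 rows per statement.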
Pranala SERP must feel like a native iOS/Android app, not a desktop search page. All standards from /home/ucok/CLAUDE.md apply, plus search-specific patterns below.

- IntersectionObserver.
- Cari | Trending | Riwayat | Tersimpan | Akun. Active state = filled icon + label.
- touchstart/touchmove + CSS transform.
- Hapus / Bagikan / Cari di tab baru.
- inputmode="search" shows the right keyboard with a "Cari" key.
- Semua · Web · Gambar · Berita · Tanya Jawab · Maps · Toko · Tokoh.
- scroll-snap-type: x mandatory).
- feedback_horizontal_slide rule. Pure CSS detection via :has(.snap-overflow).
- Buka / Buka di tab baru / Bagikan (uses navigator.share()) / Salin tautan / Tidak relevan (writes negative-feedback row → ranker training signal).
- Tersimpan (offline-readable via Service Worker cache).
- column-count (1 col mobile, 2 tablet, 3 desktop).
- touch-action: pinch-zoom).
- navigator.geolocation with enableHighAccuracy: false (battery-friendly).
- -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif.
- touch-action: manipulation to kill 300ms tap delay.
- prefers-reduced-motion: reduce disables View Transitions.
- prefers-color-scheme.
- navigator.vibrate(15).

| Feature | Use |
|---|---|
| View Transitions API | SERP ↔ result page, filter tab switches |
| Speculation Rules API | Prerender top 3 organic links |
| Navigation API | Back/forward feels instant, no white flash |
| navigator.share() | Native share sheet for results |
| SpeechRecognition (id-ID) | Voice input, real-time transcript |
| SpeechSynthesis (id-ID) | Read result snippets aloud (accessibility + voice answers) |
| Web Push API | Optional: notify when a saved query has new results |
| Service Worker | Network-first for /api/*, cache-first for static, offline fallback page, saved-results offline access |
| Web App Manifest | display: standalone, theme color, 192/512 icons, share-target API so users can share TO pranala |
| Web Share Target API | Pranala becomes a destination in Android share sheets — share-to-search any URL/text |
| Background Sync | Saved queries refresh in background |
| content-visibility: auto | Off-screen result cards skip layout/paint |
| scroll-snap | Horizontal carousels and image gallery |
| @view-transition CSS | Page-level navigation transitions |
| CSS container queries | Result card adapts to grid/list density |
| CSS :has() | Style header based on whether search has focus, etc. |
| 103 Early Hints | Preload critical CSS + autocomplete trie before HTML response |
| Brotli | Auto on Cloudflare |
| AVIF + WebP with <picture> | Image search thumbs |

- /manifest.webmanifest.
- text/plain and text/uri-list: share any link from Chrome/WA → pranala opens with that URL pre-loaded as a "more like this" semantic search.
- :focus-visible.
- aria-label="Tombol mikrofon untuk pencarian suara").
- HostThrottle Durable Object
- pranala-org-crawler)
- automation-guard.ts lint suite + CI gate
- grep -rn "Approve\|Reject\|reviewer_id\|manual_review" src/ → 0 hits
- cron|queue|webhook|verified-self-service (automated trace test)
- ai_decisions table has rows for last 24h covering ≥ 95% of state changes
- legal@pranala.org mailbox is human; counter-notice replies are HMAC-signed tokens
- env.AI.run direct calls outside aiDecide() wrapper
- <form> count = 0 except /dmca public intake
- autocomplete_blocklist table; suppression is vector-classifier only

This domain MUST operate within these constraints, no exceptions:
If the plan above describes any flow that violates these constraints, treat the plan as ASPIRATIONAL only and rework before building. The constraint trifecta wins.