Platform Architecture
How RLVNCE turns web sources into queryable, monitorable corpora. Seven services, extensible at every layer.
Service architecture
Purpose-built services, each owning a specific domain
Client / Agent
|
API Gateway (routing, auth, rate limiting)
|
+-- Control Plane ---------- Corpora, crawls, changes, usage
| |
| +-- Connector Apps -- rlvnce-web-crawler, custom, ...
| +-- Index Nodes ----- BM25 shards (retrieval engine)
|
+-- Ranker Apps ------------ rlvnce-ranker, custom, ...
| |
| +-- Index Nodes ----- retrieval fan-out
|
+-- Dispatch --------------- Webhook subscriptions + delivery
+-- App -------------------- Dashboard, billing, API keys
+-- MCP Server ------------- Agent tool integration
API Gateway
Edge routing, API key authentication, per-plan rate limiting, request tracing. All public traffic enters through the gateway.
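Per-plan rate limiting at the gateway can be pictured as a token bucket per API key. This is an illustrative sketch, not the gateway's actual implementation; the `TokenBucket` class, `check_rate_limit` helper, and capacity/refill values are all hypothetical.

```python
import time

class TokenBucket:
    """Illustrative per-API-key limiter; capacity and refill rate are made up."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond with HTTP 429

buckets: dict = {}  # api_key -> TokenBucket, one bucket per key/plan

def check_rate_limit(api_key: str, plan_qps: float) -> bool:
    bucket = buckets.setdefault(
        api_key, TokenBucket(capacity=int(plan_qps), refill_per_sec=plan_qps))
    return bucket.allow()
```

The bucket's capacity bounds burst size while the refill rate bounds sustained throughput, which is why token buckets are a common fit for per-plan limits.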
Control Plane
Coordination layer for the platform. Manages corpora, sources, crawl policies, job scheduling, document storage, change feeds, and usage metering. Serves the public REST API for these domains.
Connectors
Apps that ingest documents into corpora. The standard web connector (rlvnce-web-crawler) is an SQS-driven Python app that handles HTTP crawling, extraction, and sitemap discovery. It runs as a regular app with the same rights as any third-party connector. Custom connectors plug into the same framework.
Index Nodes
Rust-based BM25/BM25F search engine. Documents are distributed across shards, which are automatically created and rebalanced across nodes as corpora grow. Supports replicas for high availability.
Rankers
Apps that handle the search read path. The standard ranker (rlvnce-ranker) requests a query plan from the Control Plane, fans out to index nodes, and applies BM25F ranking with attribute boosts and recency decay. It runs as a regular app - custom rankers use the same framework and have the same capabilities.
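The fan-out step can be sketched as querying every shard in parallel and merging the top results. This is a minimal sketch; `search_shard` stands in for the real index-node RPC, and the hit shape (`id`, `score`) is assumed.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(shards, search_shard, query, k=10):
    """Query every shard concurrently and merge the top-k hits by score.
    `search_shard(shard, query)` is a placeholder for an index-node call."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda s: search_shard(s, query), shards)
    hits = [hit for shard_hits in per_shard for hit in shard_hits]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:k]
```

Because BM25-style scores are computed independently per shard, a simple merge-by-score is enough to assemble the global result set.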
Dispatch
Webhook subscription management, event matching, and outbound delivery. Each delivery is HMAC-SHA256 signed. Failed deliveries are retried with exponential backoff.
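HMAC-SHA256 signing lets a receiver confirm a delivery really came from Dispatch. A minimal sketch of signing and constant-time verification, assuming a hex-encoded signature over the raw payload bytes (the actual header name and encoding are not specified here):

```python
import hashlib
import hmac

def sign_delivery(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_delivery(secret: bytes, payload: bytes, signature: str) -> bool:
    """Recompute and compare in constant time to avoid timing leaks."""
    expected = sign_delivery(secret, payload)
    return hmac.compare_digest(expected, signature)
```

Receivers should always use a constant-time comparison (`hmac.compare_digest`) rather than `==` when checking signatures.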
App
Dashboard UI, user accounts, organizations, API key management, plan billing (Stripe), and the apps catalog for connectors and rankers.
The corpus lifecycle
From source definition to queryable index
Define
Create a corpus with seed URLs and allowed domains. Configure crawl policies (depth, rate, schedule) and search policies (field weights, boosts, filters).
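A corpus definition along these lines might look as follows. Every field name and value here is illustrative, not the real API schema:

```python
# Hypothetical corpus definition; keys and values are illustrative only.
corpus = {
    "name": "docs-corpus",
    "seed_urls": ["https://example.com/docs/"],
    "allowed_domains": ["example.com"],
    "crawl_policy": {
        "max_depth": 3,          # how many links deep to follow
        "rate_limit_rps": 1.0,   # politeness limit per host
        "schedule": "daily",     # recrawl interval
    },
    "search_policy": {
        "field_weights": {"title": 3.0, "headings": 2.0, "body": 1.0},
        "boosts": [{"attribute": "language", "equals": "en", "factor": 1.2}],
        "recency_half_life_days": 30,
    },
}
```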
Crawl
Stateless crawler workers pull tasks from a queue, fetch pages with conditional GET, discover sitemaps automatically, and report extracted documents. Scheduled recrawls run on configurable intervals.
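Conditional GET means sending the validators from the previous fetch so the server can answer 304 Not Modified instead of resending the page. A sketch of that logic, with a hypothetical cache-record shape:

```python
def conditional_headers(cached: dict) -> dict:
    """Build revalidation headers from a prior fetch's validators, if any."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers

def should_reextract(status_code: int) -> bool:
    """A 304 response means the page is unchanged; skip extraction."""
    return status_code != 304
```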
Extract
Readability-style content extraction: title, H1-H3 headings, body text, meta description, language detection, published date. URL canonicalization, SHA-256 content hashing, and raw HTML snapshots.
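URL canonicalization and content hashing can be sketched as below. The exact canonicalization rules are an assumption (lowercasing, dropping fragments, stripping default ports and trailing slashes are common choices); the SHA-256 hash is what lets the crawler detect unchanged content across fetches.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """One plausible canonical form: lowercase scheme/host, drop the
    fragment, strip default ports and trailing slashes on the path."""
    parts = urlsplit(url)
    netloc = parts.netloc.lower()
    if parts.scheme == "http" and netloc.endswith(":80"):
        netloc = netloc[:-3]
    if parts.scheme == "https" and netloc.endswith(":443"):
        netloc = netloc[:-4]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))

def content_hash(text: str) -> str:
    """SHA-256 of the extracted body text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```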
Index
Documents are stored and pushed to index node shards for BM25F full-text search. Shards are created automatically (fill-then-spill) and rebalanced across nodes. Custom attributes are indexed for filtering and boosting.
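Fill-then-spill placement can be illustrated in a few lines: append documents to the current shard until it reaches capacity, then open a new one. This is only a sketch of the allocation idea; real shard creation, replication, and rebalancing involve far more state.

```python
def assign_shard(shards: list, doc_id: str, capacity: int) -> list:
    """Fill-then-spill sketch: fill the last shard to `capacity` docs,
    then spill into a freshly created shard."""
    if not shards or len(shards[-1]) >= capacity:
        shards.append([])  # open a new shard
    shards[-1].append(doc_id)
    return shards
```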
Query
The ranker requests a query plan (shard locations + search policy), fans out to index nodes in parallel, applies ranking (field weights, attribute boosts, recency decay), and assembles the final result set.
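The scoring step combines three of the ingredients named above: per-field scores weighted by the search policy, an attribute boost, and exponential recency decay. A minimal sketch; the function shape, document fields, and policy keys are assumptions, and the per-field BM25 scores are taken as given inputs.

```python
def score(doc: dict, field_scores: dict, policy: dict, now: float) -> float:
    """Weighted field scores x attribute boost x half-life recency decay.
    `field_scores` holds per-field BM25 scores from the index nodes."""
    base = sum(policy["field_weights"].get(f, 1.0) * s
               for f, s in field_scores.items())
    boost = policy["boosts"].get(doc.get("language"), 1.0)
    age_days = (now - doc["published_at"]) / 86400  # published_at: epoch secs
    decay = 0.5 ** (max(age_days, 0.0) / policy["recency_half_life_days"])
    return base * boost * decay
```

With a 30-day half-life, a document's recency factor halves every 30 days, so fresh documents outrank stale ones with equal text relevance.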
Monitor
Every document create/update/delete emits a change event. Cursor-based change feed API for polling. Webhook subscriptions for push-based notifications, signed with HMAC-SHA256.
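Cursor-based polling means draining pages of events and persisting the last cursor for the next poll. A sketch of that loop, where `fetch_page` stands in for the real change feed API call and its `(events, next_cursor, has_more)` return shape is assumed:

```python
def poll_changes(fetch_page, cursor=None):
    """Drain a cursor-based change feed.
    `fetch_page(cursor)` -> (events, next_cursor, has_more)."""
    events = []
    while True:
        page, cursor, has_more = fetch_page(cursor)
        events.extend(page)
        if not has_more:
            return events, cursor  # persist cursor for the next poll
```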
Extensible by design
Every corpus has a connector and a ranker - both are swappable apps
RLVNCE's standard connector (rlvnce-web-crawler) and standard ranker (rlvnce-ranker) are regular apps that run within the same framework as any third-party or user-built app. They have no special privileges - the platform treats all apps equally. This means custom apps have the same capabilities, APIs, and scaling properties as the standard ones.
Connectors
A connector defines how documents get into a corpus - it discovers, fetches, parses, and submits documents. Each connector declares its config schema (user-configurable settings) and attribute schema (what metadata it produces) via a manifest. Swap in a different connector to index a different type of source - APIs, databases, SaaS tools. The Apps SDK for building custom connectors is coming soon.
Rankers
A ranker defines how documents come out of a corpus - it takes a query, retrieves candidates from index shards, applies scoring logic, and returns ranked results. Rankers can also post-process results (aggregate, filter, transform, deduplicate). Swap in a different ranker for ML reranking or domain-specific scoring. The Apps SDK for building custom rankers is coming soon.
Apps catalog
All connectors and rankers - including RLVNCE's own - are apps with versioned manifests, config schemas, and attribute declarations. Deploy privately for your organization or publish to the marketplace. The platform validates compatibility between connector attributes and ranker requirements at install time.
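Install-time compatibility checking can be pictured as verifying that every attribute a ranker requires is declared, with a matching type, by the corpus's connector. The manifest shape below is hypothetical:

```python
def validate_install(connector_manifest: dict, ranker_manifest: dict) -> list:
    """Return a list of problems; empty means the pair is compatible.
    Manifest shape is illustrative, not the real schema."""
    declared = {a["name"]: a["type"] for a in connector_manifest["attributes"]}
    problems = []
    for req in ranker_manifest.get("required_attributes", []):
        if req["name"] not in declared:
            problems.append(f"missing attribute: {req['name']}")
        elif declared[req["name"]] != req["type"]:
            problems.append(f"type mismatch for {req['name']}")
    return problems
```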
Infrastructure
Built for reliability and automatic scaling
Automatic shard management
Index shards are created, replicated, and rebalanced across nodes automatically as corpora grow. No manual capacity planning.
Crawl scheduling
Background scheduler evaluates corpus crawl policies and triggers recrawls on configurable intervals (hourly, daily, weekly, or custom).
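Computing the next recrawl time from a policy interval is straightforward; a sketch, where the named intervals follow the list above and a custom interval is assumed to arrive as a `timedelta`:

```python
from datetime import datetime, timedelta

INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_run(last_run: datetime, schedule) -> datetime:
    """Next recrawl time from a named interval or a custom timedelta."""
    if isinstance(schedule, timedelta):
        interval = schedule
    else:
        interval = INTERVALS.get(schedule)
    if interval is None:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return last_run + interval
```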
Fault tolerance
Crawler tasks retry automatically on failure via queue visibility timeouts. Index node replicas serve reads if a primary is unavailable. Draining nodes migrate shards before shutdown.
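The retry delay between attempts is commonly computed as capped exponential backoff with jitter; a sketch with illustrative parameters (the jitter and the specific base/cap values are assumptions, not the platform's documented settings):

```python
import random

def retry_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Capped exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2^attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries out in time, which avoids synchronized retry storms when many tasks fail at once.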
Usage metering
Per-tenant, per-corpus metering of crawl pages, queries, document fetches, and storage. Race-safe atomic counters. Plan limits enforced in real time.
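Race-safe metering with real-time limit enforcement hinges on making increment-and-check a single atomic step. A minimal in-process sketch using a lock (a production system would use an atomic counter in a shared store; the class and method names are hypothetical):

```python
import threading

class UsageMeter:
    """Increment-and-check under one lock, so concurrent requests
    cannot race past the plan limit."""
    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def try_consume(self, amount: int = 1) -> bool:
        with self._lock:
            if self.count + amount > self.limit:
                return False  # plan limit reached; reject in real time
            self.count += amount
            return True
```

Checking the limit and incrementing as separate steps would allow two concurrent requests to both pass the check and overshoot the limit; the lock closes that window.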
Data isolation
Tenant-scoped at every layer
Every organization's data is isolated at the application and database level. Corpora, documents, API keys, and usage data are scoped to your organization. Every API request is authenticated via the gateway and scoped to the authenticated tenant - no cross-tenant data access.
Published corpora in the catalog are accessible read-only to subscribers. Your corpus configuration, crawl policies, and source lists remain private. Subscription access is checked on every query - subscribers can search and read documents but cannot modify the corpus.