Platform Architecture
How RLVNCE turns web sources into queryable, monitorable corpora. Seven services, extensible at every layer.
Service architecture
Purpose-built services, each owning a specific domain
Client / Agent
|
API Gateway (routing, auth, rate limiting)
|
+-- Control Plane ---------- Corpora, crawls, changes, usage
| |
| +-- Connector Apps -- rlvnce-web-crawler, custom, ...
| +-- Index Nodes ----- BM25 shards (retrieval engine)
|
+-- Ranker Apps ------------ rlvnce-ranker, custom, ...
| |
| +-- Index Nodes ----- retrieval fan-out
|
+-- Dispatch --------------- Webhook subscriptions + delivery
+-- App -------------------- Dashboard, billing, API keys
+-- MCP Server ------------- Agent tool integration
API Gateway
Edge routing, API key authentication, per-plan rate limiting, request tracing. All public traffic enters through the gateway.
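Per-plan rate limiting at the gateway can be pictured as a token bucket per API key. This is an illustrative sketch, not the gateway's actual implementation; the `TokenBucket` class, `check_rate_limit` helper, and capacity/refill values are all hypothetical.

```python
import time

class TokenBucket:
    """Illustrative per-API-key limiter; capacity and refill rate are made up."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond with HTTP 429

buckets: dict = {}  # api_key -> TokenBucket, one bucket per key/plan

def check_rate_limit(api_key: str, plan_qps: float) -> bool:
    bucket = buckets.setdefault(
        api_key, TokenBucket(capacity=int(plan_qps), refill_per_sec=plan_qps))
    return bucket.allow()
```

The bucket's capacity bounds burst size while the refill rate bounds sustained throughput, which is why token buckets are a common fit for per-plan limits.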
Control Plane
Coordination layer for the platform. Manages corpora, sources, crawl policies, job scheduling, document storage, change feeds, and usage metering. Serves the public REST API for these domains.
Connectors
Apps that ingest documents into corpora. The standard web connector (rlvnce-web-crawler) is an SQS-driven Python app that handles HTTP crawling, extraction, and sitemap discovery. It runs as a regular app with the same rights as any third-party connector. Custom connectors plug into the same framework.
Index Nodes
Rust-based BM25/BM25F search engine. Documents are distributed across shards, which are automatically created and rebalanced across nodes as corpora grow. Supports replicas for high availability.
Rankers
Apps that handle the search read path. The standard ranker (rlvnce-ranker) requests a query plan from the Control Plane, fans out to index nodes, and applies BM25F ranking with attribute boosts and recency decay. It runs as a regular app - custom rankers use the same framework and have the same capabilities.
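The fan-out step can be sketched as querying every shard in parallel and merging the top results. This is a minimal sketch; `search_shard` stands in for the real index-node RPC, and the hit shape (`id`, `score`) is assumed.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(shards, search_shard, query, k=10):
    """Query every shard concurrently and merge the top-k hits by score.
    `search_shard(shard, query)` is a placeholder for an index-node call."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda s: search_shard(s, query), shards)
    hits = [hit for shard_hits in per_shard for hit in shard_hits]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:k]
```

Because BM25-style scores are computed independently per shard, a simple merge-by-score is enough to assemble the global result set.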
Dispatch
Webhook subscription management, event matching, and outbound delivery. Each delivery is HMAC-SHA256 signed. Failed deliveries are retried with exponential backoff.
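HMAC-SHA256 signing lets a receiver confirm a delivery really came from Dispatch. A minimal sketch of signing and constant-time verification, assuming a hex-encoded signature over the raw payload bytes (the actual header name and encoding are not specified here):

```python
import hashlib
import hmac

def sign_delivery(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_delivery(secret: bytes, payload: bytes, signature: str) -> bool:
    """Recompute and compare in constant time to avoid timing leaks."""
    expected = sign_delivery(secret, payload)
    return hmac.compare_digest(expected, signature)
```

Receivers should always use a constant-time comparison (`hmac.compare_digest`) rather than `==` when checking signatures.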
App
Dashboard UI, user accounts, organizations, API key management, plan billing (Stripe), and the apps catalog for connectors and rankers.
The corpus lifecycle
From source definition to queryable index
Define
Create a corpus with seed URLs and allowed domains. Configure crawl policies (depth, rate, schedule) and search policies (field weights, boosts, filters).
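A corpus definition along these lines might look as follows. Every field name and value here is illustrative, not the real API schema:

```python
# Hypothetical corpus definition; keys and values are illustrative only.
corpus = {
    "name": "docs-corpus",
    "seed_urls": ["https://example.com/docs/"],
    "allowed_domains": ["example.com"],
    "crawl_policy": {
        "max_depth": 3,          # how many links deep to follow
        "rate_limit_rps": 1.0,   # politeness limit per host
        "schedule": "daily",     # recrawl interval
    },
    "search_policy": {
        "field_weights": {"title": 3.0, "headings": 2.0, "body": 1.0},
        "boosts": [{"attribute": "language", "equals": "en", "factor": 1.2}],
        "recency_half_life_days": 30,
    },
}
```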
Crawl
Stateless crawler workers pull tasks from a queue, fetch pages with conditional GET, discover sitemaps automatically, and report extracted documents. Scheduled recrawls run on configurable intervals.
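Conditional GET means sending the validators from the previous fetch so the server can answer 304 Not Modified instead of resending the page. A sketch of that logic, with a hypothetical cache-record shape:

```python
def conditional_headers(cached: dict) -> dict:
    """Build revalidation headers from a prior fetch's validators, if any."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers

def should_reextract(status_code: int) -> bool:
    """A 304 response means the page is unchanged; skip extraction."""
    return status_code != 304
```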
Extract
Readability-style content extraction: title, H1-H3 headings, body text, meta description, language detection, published date. URL canonicalization, SHA-256 content hashing, and raw HTML snapshots.
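URL canonicalization and content hashing can be sketched as below. The exact canonicalization rules are an assumption (lowercasing, dropping fragments, stripping default ports and trailing slashes are common choices); the SHA-256 hash is what lets the crawler detect unchanged content across fetches.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """One plausible canonical form: lowercase scheme/host, drop the
    fragment, strip default ports and trailing slashes on the path."""
    parts = urlsplit(url)
    netloc = parts.netloc.lower()
    if parts.scheme == "http" and netloc.endswith(":80"):
        netloc = netloc[:-3]
    if parts.scheme == "https" and netloc.endswith(":443"):
        netloc = netloc[:-4]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))

def content_hash(text: str) -> str:
    """SHA-256 of the extracted body text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```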
Index
Documents are stored and pushed to index node shards for BM25F full-text search. Shards are created automatically (fill-then-spill) and rebalanced across nodes. Custom attributes are indexed for filtering and boosting.
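Fill-then-spill placement can be illustrated in a few lines: append documents to the current shard until it reaches capacity, then open a new one. This is only a sketch of the allocation idea; real shard creation, replication, and rebalancing involve far more state.

```python
def assign_shard(shards: list, doc_id: str, capacity: int) -> list:
    """Fill-then-spill sketch: fill the last shard to `capacity` docs,
    then spill into a freshly created shard."""
    if not shards or len(shards[-1]) >= capacity:
        shards.append([])  # open a new shard
    shards[-1].append(doc_id)
    return shards
```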
Query
The ranker requests a query plan (shard locations + search policy), fans out to index nodes in parallel, applies ranking (field weights, attribute boosts, recency decay), and assembles the final result set.
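The scoring step combines three of the ingredients named above: per-field scores weighted by the search policy, an attribute boost, and exponential recency decay. A minimal sketch; the function shape, document fields, and policy keys are assumptions, and the per-field BM25 scores are taken as given inputs.

```python
def score(doc: dict, field_scores: dict, policy: dict, now: float) -> float:
    """Weighted field scores x attribute boost x half-life recency decay.
    `field_scores` holds per-field BM25 scores from the index nodes."""
    base = sum(policy["field_weights"].get(f, 1.0) * s
               for f, s in field_scores.items())
    boost = policy["boosts"].get(doc.get("language"), 1.0)
    age_days = (now - doc["published_at"]) / 86400  # published_at: epoch secs
    decay = 0.5 ** (max(age_days, 0.0) / policy["recency_half_life_days"])
    return base * boost * decay
```

With a 30-day half-life, a document's recency factor halves every 30 days, so fresh documents outrank stale ones with equal text relevance.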
Monitor
Every document create/update/delete emits a change event. Cursor-based change feed API for polling. Webhook subscriptions for push-based notifications, signed with HMAC-SHA256.
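Cursor-based polling means draining pages of events and persisting the last cursor for the next poll. A sketch of that loop, where `fetch_page` stands in for the real change feed API call and its `(events, next_cursor, has_more)` return shape is assumed:

```python
def poll_changes(fetch_page, cursor=None):
    """Drain a cursor-based change feed.
    `fetch_page(cursor)` -> (events, next_cursor, has_more)."""
    events = []
    while True:
        page, cursor, has_more = fetch_page(cursor)
        events.extend(page)
        if not has_more:
            return events, cursor  # persist cursor for the next poll
```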
Extensible by design
Every corpus has a connector and a ranker - both are swappable apps
RLVNCE's standard connector (rlvnce-web-crawler) and standard ranker (rlvnce-ranker) are regular apps that run within the same framework as any third-party or user-built app. They have no special privileges - the platform treats all apps equally. This means custom apps have the same capabilities, APIs, and scaling properties as the standard ones.
Connectors
A connector defines how documents get into a corpus - it discovers, fetches, parses, and submits documents. Each connector declares its config schema (user-configurable settings) and attribute schema (what metadata it produces) via a manifest. Swap in a different connector to index a different type of source - APIs, databases, SaaS tools. The Apps SDK for building custom connectors is coming soon.
Rankers
A ranker defines how documents come out of a corpus - it takes a query, retrieves candidates from index shards, applies scoring logic, and returns ranked results. Rankers can also post-process results (aggregate, filter, transform, deduplicate). Swap in a different ranker for ML reranking or domain-specific scoring. The Apps SDK for building custom rankers is coming soon.
Apps catalog
All connectors and rankers - including RLVNCE's own - are apps with versioned manifests, config schemas, and attribute declarations. Deploy privately for your organization or publish to the marketplace. The platform validates compatibility between connector attributes and ranker requirements at install time.
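Install-time compatibility checking can be pictured as verifying that every attribute a ranker requires is declared, with a matching type, by the corpus's connector. The manifest shape below is hypothetical:

```python
def validate_install(connector_manifest: dict, ranker_manifest: dict) -> list:
    """Return a list of problems; empty means the pair is compatible.
    Manifest shape is illustrative, not the real schema."""
    declared = {a["name"]: a["type"] for a in connector_manifest["attributes"]}
    problems = []
    for req in ranker_manifest.get("required_attributes", []):
        if req["name"] not in declared:
            problems.append(f"missing attribute: {req['name']}")
        elif declared[req["name"]] != req["type"]:
            problems.append(f"type mismatch for {req['name']}")
    return problems
```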
Infrastructure
Built for reliability and automatic scaling
Automatic shard management
Index shards are created, replicated, and rebalanced across nodes automatically as corpora grow. No manual capacity planning.
Crawl scheduling
Background scheduler evaluates corpus crawl policies and triggers recrawls on configurable intervals (hourly, daily, weekly, or custom).
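Computing the next recrawl time from a policy interval is straightforward; a sketch, where the named intervals follow the list above and a custom interval is assumed to arrive as a `timedelta`:

```python
from datetime import datetime, timedelta

INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_run(last_run: datetime, schedule) -> datetime:
    """Next recrawl time from a named interval or a custom timedelta."""
    if isinstance(schedule, timedelta):
        interval = schedule
    else:
        interval = INTERVALS.get(schedule)
    if interval is None:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return last_run + interval
```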
Fault tolerance
Crawler tasks retry automatically on failure via queue visibility timeouts. Index node replicas serve reads if a primary is unavailable. Draining nodes migrate shards before shutdown.
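The retry delay between attempts is commonly computed as capped exponential backoff with jitter; a sketch with illustrative parameters (the jitter and the specific base/cap values are assumptions, not the platform's documented settings):

```python
import random

def retry_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Capped exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2^attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries out in time, which avoids synchronized retry storms when many tasks fail at once.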
Usage metering
Per-tenant, per-corpus metering of crawl pages, queries, document fetches, and storage. Race-safe atomic counters. Plan limits enforced in real time.
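Race-safe metering with real-time limit enforcement hinges on making increment-and-check a single atomic step. A minimal in-process sketch using a lock (a production system would use an atomic counter in a shared store; the class and method names are hypothetical):

```python
import threading

class UsageMeter:
    """Increment-and-check under one lock, so concurrent requests
    cannot race past the plan limit."""
    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def try_consume(self, amount: int = 1) -> bool:
        with self._lock:
            if self.count + amount > self.limit:
                return False  # plan limit reached; reject in real time
            self.count += amount
            return True
```

Checking the limit and incrementing as separate steps would allow two concurrent requests to both pass the check and overshoot the limit; the lock closes that window.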
Data isolation
Tenant-scoped at every layer
Every organization's data is isolated at the application and database level. Corpora, documents, API keys, and usage data are scoped to your organization. Every API request is authenticated via the gateway and scoped to the authenticated tenant - no cross-tenant data access.
Published corpora in the catalog are accessible read-only to subscribers. Your corpus configuration, crawl policies, and source lists remain private. Subscription access is checked on every query - subscribers can search and read documents but cannot modify the corpus.