IDX Market Data Platform

Real-time Indonesian stock market data pipeline — from exchange feed to client dashboard

Parse Rate: 3.1M/s (messages per second)

Tickers: 900+ (IDX listed stocks)

Latency: <50ms (feed-to-DB p50)

Components: Docker services

Uptime: 99.9% (during market hours)

What This Platform Does

The platform receives the real-time ITCH binary feed from the Indonesia Stock Exchange (IDX) and parses it at wire speed. It stores ticks and aggregated bars in hot storage (QuestDB), drains historical data to cold storage (ClickHouse), and computes metrics (the proprietary HD indicator and RSI). Everything is served through a Go/Fiber API to web clients and an AmiBroker plugin.

Tech Stack

Rust Parser

Zero-alloc ITCH decoder. 3.1M msg/s, <0.3µs per message.

Redpanda

Kafka-compatible message bus. Single binary, 6 topics.

QuestDB

Hot time-series store. ILP ingestion, SAMPLE BY aggregation.

ClickHouse

Cold analytical store. 40:1 compression, unlimited history.

Redis

Sub-ms cache. Sessions, rate limits, WS fan-out.

PostgreSQL

Auth DB. Users, keys, audit log, partitioned tables.

Go + Fiber

Single binary API. REST + WS + SSE + Admin + Portal.

TradingView

Professional charting via UDF datafeed protocol.

Architecture

Single binary Go API serving three domains — Admin, Portal, Data API

System Data Flow

Exchange (ITCH Feed) → Parser (Rust) → Bus (Redpanda) → Hot Store (QuestDB) → API (Go/Fiber) → Clients (TradingView)

API Server — Three Domains

Admin Panel

/admin/*

Session Auth + CSRF
Manage users, keys, plugins
HD config, tier config, audit

User Portal

/portal/*

Session Auth + CSRF
Own keys, machines, usage
Subscription management

Data API

/v1/*

JWT Auth (RS256)
Rate limited, tier enforced
REST + WS + SSE

All three domains run in a single Go binary on port :18090. Shared connection pools for PostgreSQL, Redis, QuestDB, and ClickHouse.

Data Pipeline

Zero-loss path from exchange feed to client-facing API

Exchange Feed

RabbitMQ AMQP from IDX feed provider. Prefetch=500, <1ms local network latency.

Rust Parser (zero-alloc)

Binary ITCH decode at 3.1M msg/s. Handles Trade, Orderbook, Snapshot, Index messages.

Outputs: QuestDB (ILP TCP), Redpanda (async)

OHLCV Aggregator (Rust)

Consumes idx.ticks, builds 1-minute OHLCV bars. Filters board=RG only.

Outputs: QuestDB idx_ohlcv, Redpanda idx.ohlcv, Redis PUBLISH

Metric Worker (Rust)

Consumes idx.ohlcv, computes HD (7-step pipeline) + RSI-14 per 1m bar.

Outputs: QuestDB metrics_hd, Redis cache (sub-ms), ClickHouse archive

Go API Server (:18090)

QuestDB for hot data (14 days), ClickHouse for historical, Redis for real-time cache + WS pub/sub.

Clients

TradingView Chart (UDF), AmiBroker Plugin (REST + WS), Admin Panel (HTMX), User Portal (HTMX).
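The OHLCV aggregation step in the pipeline above can be sketched in Go for illustration (the production aggregator is written in Rust). This is a minimal sketch, not the platform's implementation: the `Tick`, `Bar`, and `Aggregator` types are hypothetical, timestamps are assumed to be Unix seconds, and late ticks are simply folded into the open bar.

```go
package main

import "fmt"

// Tick is a hypothetical trade record; fields mirror the idx_ticks columns.
type Tick struct {
	Ticker string
	Board  string
	Price  float64
	Volume int64
	TS     int64 // Unix seconds (assumption)
}

// Bar is a 1-minute OHLCV bar. Start is the first second of the minute.
type Bar struct {
	Ticker                 string
	Start                  int64
	Open, High, Low, Close float64
	Volume                 int64
}

// Aggregator keeps one in-progress bar per ticker.
type Aggregator struct {
	open map[string]*Bar
}

func NewAggregator() *Aggregator {
	return &Aggregator{open: make(map[string]*Bar)}
}

func newBar(t Tick, start int64) *Bar {
	return &Bar{Ticker: t.Ticker, Start: start, Open: t.Price, High: t.Price,
		Low: t.Price, Close: t.Price, Volume: t.Volume}
}

// Add folds a tick into the open bar for its ticker. When the tick belongs
// to a later minute, the finished bar is returned and a new bar is opened.
// Ticks from boards other than RG are ignored.
func (a *Aggregator) Add(t Tick) (Bar, bool) {
	if t.Board != "RG" {
		return Bar{}, false
	}
	start := t.TS - t.TS%60
	cur, ok := a.open[t.Ticker]
	if !ok {
		a.open[t.Ticker] = newBar(t, start)
		return Bar{}, false
	}
	if start > cur.Start {
		done := *cur
		a.open[t.Ticker] = newBar(t, start)
		return done, true
	}
	if t.Price > cur.High {
		cur.High = t.Price
	}
	if t.Price < cur.Low {
		cur.Low = t.Price
	}
	cur.Close = t.Price
	cur.Volume += t.Volume
	return Bar{}, false
}

func main() {
	agg := NewAggregator()
	ticks := []Tick{
		{Ticker: "BBCA", Board: "RG", Price: 8900, Volume: 100, TS: 60},
		{Ticker: "BBCA", Board: "NG", Price: 9999, Volume: 500, TS: 61}, // skipped
		{Ticker: "BBCA", Board: "RG", Price: 8925, Volume: 50, TS: 90},
		{Ticker: "BBCA", Board: "RG", Price: 8910, Volume: 10, TS: 120},
	}
	for _, t := range ticks {
		if bar, done := agg.Add(t); done {
			fmt.Printf("%+v\n", bar)
		}
	}
}
```

A bar is only emitted when the first tick of the next minute arrives, so a real aggregator would also need a timer-based flush for illiquid tickers; that part is left out here.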

Latency Budget

Hop	p50	p99	Notes
RabbitMQ → Parser	<1ms	<2ms	Local network, prefetch 500
Parser → QuestDB	~50ms	~150ms	ILP batch 500 msgs / 100ms
Parser → Redpanda	<5ms	<10ms	Async produce
Redpanda → Metric Worker	<5ms	<20ms	Consumer lag
Go API query	1-5ms	10-50ms	QuestDB HTTP SQL
End-to-end	<100ms	<300ms

Services & Ports

All services run as Docker containers with idxmdp- prefix

Service	Container	Port(s)	Purpose
Go API	`go-api`	:18090	REST + WS + SSE + Admin + Portal + Chart + Docs
Go API metrics	(same binary)	:2112	Prometheus /metrics scrape endpoint
Rust Parser	`idxmdp-parser`	-	ITCH feed → QuestDB + Redpanda
Metric Worker	`idxmdp-metric-worker`	-	HD + RSI computation
QuestDB	`idxmdp-questdb`	:19000 (HTTP) :19009 (ILP)	Hot storage (7 tables)
ClickHouse	`idxmdp-clickhouse`	:28123 (HTTP) :29001 (native)	Cold storage (historical)
Redpanda	`idxmdp-redpanda`	:29092 :28082 :29644	Message bus (6 topics)
Redis	`idxmdp-redis`	:26379	Cache, sessions, pub/sub
PostgreSQL	`idxmdp-postgres`	:25432	Auth DB (16 tables)
CH Drain	`idxmdp-ch-drain`	-	QuestDB → ClickHouse drain
Dashboard	`idxmdp-dashboard`	:18080	Legacy Rust monitoring
Grafana	`idxmdp-grafana`	:13000	Monitoring dashboards (2 pre-provisioned)
Prometheus	`idxmdp-prometheus`	:19090	Metrics collection + alert rules
Alertmanager	`idxmdp-alertmanager`	:19093	Alert routing → Telegram

QuestDB Tables (7)

Table	Description	Key Columns
`idx_ticks`	Raw trade ticks	ticker, board, price, volume, ts
`idx_ohlcv`	1-minute OHLCV bars	ticker, open, high, low, close, volume, ts
`idx_snapshot`	Best bid/ask snapshots	ticker, bid, ask, bid_vol, ask_vol, ts
`idx_orderbook`	Order book updates	ticker, side, price, volume, ts
`idx_index`	Index values (IHSG etc)	index_code, value, ts
`idx_contracts`	Contract metadata	ticker, name, board, ts
`metrics_hd`	HD + RSI metric values	ticker, freq, hd, rsi, ts

Redpanda Topics (6)

Topic	Producer	Consumer	Partitions
`idx.ticks`	Parser	Aggregator, CH Drain	10
`idx.ohlcv`	Aggregator	Metric Worker, WS	10
`idx.orderbook`	Parser	(deferred to M4)	10
`idx.snapshot`	Parser	Snapshot cache	10
`idx.index`	Parser	Index display	10
`metrics.hd`	Metric Worker	Real-time HD	10

Credentials (Dev Environment)

These credentials are for local development only. Never use them in production.

Service	Host:Port	Username	Password
PostgreSQL	`localhost:25432`	`admin`	`secret`
Redis	`localhost:26379`	-	`idxmdp_redis_dev_2026`
ClickHouse	`localhost:28123`	`default`	(none)
QuestDB	`localhost:19000`	-	(none)
Redpanda	`localhost:28082`	-	(none)
Grafana	`localhost:13000`	`admin`	`admin` (change on first login)
Prometheus	`localhost:19090`	-	(none)
Alertmanager	`localhost:19093`	-	(none)

Admin Panel Access

URL	Username	Password	Role
`/auth/login`	`superadmin`	`Admin123456`	superadmin

Environment Variables

PORT=18090
DATABASE_URL=postgres://admin:secret@localhost:25432/idx_admin?sslmode=disable
REDIS_ADDR=localhost:26379
REDIS_PASSWORD=idxmdp_redis_dev_2026
QUESTDB_HOST=localhost
QUESTDB_HTTP_PORT=19000
CLICKHOUSE_DSN=http://localhost:28123
JWT_PRIVATE_KEY_PATH=secrets/jwt_private.pem
JWT_PUBLIC_KEY_PATH=secrets/jwt_public.pem
JWT_ISSUER=idx-market-data
CHART_LIB_PATH=../charting_library-master
SUPERADMIN_PASSWORD=Admin123456

Docker Compose

All containers use idxmdp- prefix on idxmdp_net network

# Start all services
docker compose up -d

# View parser logs (by container name)
docker logs -f idxmdp-parser

# Check container status
docker ps --filter "name=idxmdp"

# Restart parser (by container name)
docker restart idxmdp-parser

Volume Mounts

Service	Volume	Purpose
QuestDB	`./data/questdb`	Time-series data
ClickHouse	`./data/clickhouse`	Historical data
PostgreSQL	`./data/postgres`	Auth database
Redis	`./data/redis`	AOF persistence
Redpanda	`./data/redpanda`	Kafka log segments

Storage & Retention

Hot/cold storage strategy with automatic data lifecycle

Store	Data	Retention	Strategy
QuestDB	Ticks, Snapshots, Orderbook	3 days	Partition TTL cron (DROP PARTITION)
QuestDB	OHLCV, metrics_hd	14 days	Partition TTL cron
ClickHouse	OHLCV + Ticks	Unlimited	Columnar compression (~40:1)
Redis	Metric cache	24h TTL	Key expiry
PostgreSQL	Audit log	6 months	Monthly partitions, auto-detach

The idxmdp-ch-drain container periodically reads completed QuestDB partitions and inserts them into ClickHouse, typically running at 16:30 WIB after market close.
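As an illustration of the partition TTL strategy in the table above, the sketch below builds the QuestDB statements a retention cron might run. It assumes the designated timestamp column is `ts` (as listed in the QuestDB tables) and uses QuestDB's `ALTER TABLE ... DROP PARTITION WHERE` form; the `policy` type and `dropPartitionSQL` helper are hypothetical, and the statement should be checked against the QuestDB version in use.

```go
package main

import "fmt"

// policy pairs a QuestDB table with its retention window in days.
type policy struct {
	Table string
	Days  int
}

// policies mirrors the QuestDB rows of the retention table (assumed mapping).
var policies = []policy{
	{Table: "idx_ticks", Days: 3},
	{Table: "idx_snapshot", Days: 3},
	{Table: "idx_ohlcv", Days: 14},
	{Table: "metrics_hd", Days: 14},
}

// dropPartitionSQL returns a statement that drops every partition whose
// rows are older than the given number of days.
func dropPartitionSQL(table string, days int) string {
	return fmt.Sprintf(
		"ALTER TABLE %s DROP PARTITION WHERE ts < dateadd('d', -%d, now());",
		table, days)
}

func main() {
	for _, p := range policies {
		fmt.Println(dropPartitionSQL(p.Table, p.Days))
	}
}
```

The generated statements can be sent to QuestDB over its HTTP SQL endpoint or the PostgreSQL wire protocol.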

Public Endpoints

No authentication required

Method	Path	Description
GET	`/health`	Basic liveness → `{"ok":true}`
GET	`/ready`	Deep check: Redis, PG, QuestDB, ClickHouse
POST	`/v1/auth/token`	API key → JWT (15 min TTL)
POST	`/v1/auth/refresh`	Refresh expiring JWT

Token Exchange Example

// Request
POST /v1/auth/token
Content-Type: application/json

{ "api_key": "idx_live_a1b2c3d4..." }

// Response
{
  "ok": true,
  "data": {
    "token": "eyJhbGciOiJSUzI1NiIs...",
    "expires_in": 900,
    "tier": "pro"
  }
}
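A client could decode the token response above with a small generic helper. This is a sketch under assumptions: the `Envelope`, `TokenData`, and `Decode` names are hypothetical, and the wrapper is assumed to carry `ok`, `data`, `error`, and `meta` fields as in the examples on this page.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope models the ok/data/error/meta wrapper used by API responses.
type Envelope[T any] struct {
	OK    bool           `json:"ok"`
	Data  T              `json:"data"`
	Error string         `json:"error"`
	Meta  map[string]any `json:"meta"`
}

// TokenData mirrors the "data" object of the token exchange response.
type TokenData struct {
	Token     string `json:"token"`
	ExpiresIn int    `json:"expires_in"`
	Tier      string `json:"tier"`
}

// Decode unwraps an envelope and turns ok=false into a Go error.
func Decode[T any](body []byte) (T, error) {
	var zero T
	var env Envelope[T]
	if err := json.Unmarshal(body, &env); err != nil {
		return zero, fmt.Errorf("decode envelope: %w", err)
	}
	if !env.OK {
		return zero, errors.New(env.Error)
	}
	return env.Data, nil
}

func main() {
	body := []byte(`{"ok":true,"data":{"token":"eyJ...","expires_in":900,"tier":"pro"}}`)
	tok, err := Decode[TokenData](body)
	if err != nil {
		fmt.Println("token exchange failed:", err)
		return
	}
	fmt.Printf("tier=%s, refresh in %d seconds\n", tok.Tier, tok.ExpiresIn)
}
```

The same `Decode` helper can be reused for any endpoint by changing the type parameter.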

Response Envelope

All API responses use a standard envelope:

Success

{ "ok": true,
  "data": {...},
  "meta": { "latency_us": 142 } }

Error

{ "ok": false,
  "error": "description",
  "meta": { ... } }

Data API

JWT required — Authorization: Bearer {token}

Method	Path	Description
GET	`/v1/snapshot/:ticker`	Latest tick for a symbol
GET	`/v1/ohlcv/:ticker`	OHLCV bars (QuestDB + ClickHouse hybrid)
GET	`/v1/symbols`	All available tickers
GET	`/v1/metrics/latest`	Latest metric from Redis (sub-ms)
GET	`/v1/metrics/history`	Metric history (QuestDB/ClickHouse)
GET	`/v1/hd/chart/:ticker`	HD dashboard (enterprise only)

OHLCV Query Example

GET /v1/ohlcv/BBCA?tf=1m&from=2026-03-25&to=2026-03-26
Authorization: Bearer eyJ...

{
  "ok": true,
  "data": [{
    "ticker": "BBCA",
    "date": "2026-03-25T02:00:00Z",
    "bar_time": "09:00",
    "freq": "1m",
    "open": 8900, "high": 8925,
    "low": 8875, "close": 8925,
    "volume": 123456,
    "freq_cnt": 42,
    "hd": 0
  }]
}

Plugin Endpoints

AmiBroker DLL wire protocol — API key in body, not JWT

Method	Path	Description
POST	`/v1/plugin/activate`	Machine licensing (key + machine ID)
POST	`/v1/plugin/deactivate`	Release machine slot
POST	`/v1/plugin/heartbeat`	Keepalive + token refresh (every 10 min)
POST	`/v1/plugin/report`	DLL error/event telemetry

Plugin Lifecycle Flow

Activate

Send API key + machine_id + machine_name + version. Returns JWT + tier capabilities.

Heartbeat (every 10 min)

Send current token. Returns fresh token with new expiry.

Fetch Data

Use JWT from activate/heartbeat on GET /v1/ohlcv, /v1/symbols, /v1/snapshot.

Stream (optional)

WS /v1/stream with JWT. Subscribe to tickers for real-time 1m bars.

Deactivate

On DB unload, release machine slot. Send api_key + machine_id.

Streaming (WS/SSE)

Real-time OHLCV bars via WebSocket or Server-Sent Events

WebSocket

WS /v1/stream

Bidirectional. Subscribe/unsubscribe per ticker. JWT in header. Best for plugins.

Server-Sent Events

SSE /v1/live

Server-push only. Query params for symbols. Auto-reconnect. Best for web clients.

WebSocket Protocol

// Subscribe
{"action":"subscribe", "tickers":["BBCA","BBRI"], "freq":"1m"}

// Unsubscribe
{"action":"unsubscribe", "tickers":["BBRI"]}

// Incoming bar
{"ticker":"BBCA", "bar_time":"09:15", "open":8900,
 "high":8925, "low":8875, "close":8925,
 "volume":45000, "hd":0}

Tier Limits

Tier	WebSocket	Max Symbols
Free	Blocked	-
Pro	Allowed	50
Enterprise	Allowed	Unlimited
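The WebSocket messages shown in the protocol section can be modelled with plain structs. The sketch below only builds and parses the JSON payloads; it does not open a connection, and the `Command`, `StreamBar`, `Subscribe`, `Unsubscribe`, and `ParseBar` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Command is a client-to-server message on WS /v1/stream.
type Command struct {
	Action  string   `json:"action"`
	Tickers []string `json:"tickers"`
	Freq    string   `json:"freq,omitempty"`
}

// StreamBar is a server-to-client bar update.
type StreamBar struct {
	Ticker  string  `json:"ticker"`
	BarTime string  `json:"bar_time"`
	Open    float64 `json:"open"`
	High    float64 `json:"high"`
	Low     float64 `json:"low"`
	Close   float64 `json:"close"`
	Volume  int64   `json:"volume"`
	HD      float64 `json:"hd"`
}

// Subscribe builds a subscribe message for the given bar frequency.
func Subscribe(freq string, tickers ...string) ([]byte, error) {
	return json.Marshal(Command{Action: "subscribe", Tickers: tickers, Freq: freq})
}

// Unsubscribe builds an unsubscribe message.
func Unsubscribe(tickers ...string) ([]byte, error) {
	return json.Marshal(Command{Action: "unsubscribe", Tickers: tickers})
}

// ParseBar decodes an incoming bar message.
func ParseBar(msg []byte) (StreamBar, error) {
	var b StreamBar
	err := json.Unmarshal(msg, &b)
	return b, err
}

func main() {
	sub, _ := Subscribe("1m", "BBCA", "BBRI")
	fmt.Println(string(sub))

	bar, err := ParseBar([]byte(`{"ticker":"BBCA","bar_time":"09:15","open":8900,
		"high":8925,"low":8875,"close":8925,"volume":45000,"hd":0}`))
	if err != nil {
		fmt.Println("bad bar:", err)
		return
	}
	fmt.Printf("%s %s close=%.0f\n", bar.Ticker, bar.BarTime, bar.Close)
}
```

A Pro-tier client would additionally need to keep its subscribed ticker count within the 50-symbol limit before sending a subscribe message.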

TradingView UDF Datafeed

UDF-compatible endpoints for the TradingView charting library

Method	Path	Description
GET	`/udf/config`	Datafeed configuration
GET	`/udf/symbols?symbol=BBCA`	Symbol resolution
GET	`/udf/search?query=BB&limit=30`	Symbol search
GET	`/udf/history?symbol=X&resolution=R&from=T&to=T`	OHLCV bars (columnar)
GET	`/udf/time`	Server Unix timestamp

Supported resolutions: 1 (1m), 5, 15, 30, 60 (1h), D (daily). Access chart at /chart?symbol=BBCA&interval=D&theme=dark

History Response Format

// Success (columnar format)
{ "s": "ok",
  "t": [1711324800, 1711411200],
  "o": [8900, 8925],
  "h": [8950, 8975],
  "l": [8875, 8900],
  "c": [8925, 8950],
  "v": [123456, 98765] }

// No data (with hint)
{ "s": "no_data", "nextTime": 1711238400 }
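Clients that prefer one record per bar can transpose the columnar history response. The following is a minimal sketch; the `History` and `Candle` types and the `Candles` function are hypothetical names, and a `no_data` response is treated as an empty result.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// History mirrors the columnar /udf/history response.
type History struct {
	Status   string    `json:"s"`
	T        []int64   `json:"t"`
	O        []float64 `json:"o"`
	H        []float64 `json:"h"`
	L        []float64 `json:"l"`
	C        []float64 `json:"c"`
	V        []float64 `json:"v"`
	NextTime int64     `json:"nextTime"`
}

// Candle is one row of the transposed result.
type Candle struct {
	Time                           int64
	Open, High, Low, Close, Volume float64
}

// Candles converts a columnar response into a slice of rows.
func Candles(body []byte) ([]Candle, error) {
	var h History
	if err := json.Unmarshal(body, &h); err != nil {
		return nil, err
	}
	if h.Status == "no_data" {
		return nil, nil
	}
	if h.Status != "ok" {
		return nil, fmt.Errorf("udf status %q", h.Status)
	}
	n := len(h.T)
	if len(h.O) != n || len(h.H) != n || len(h.L) != n || len(h.C) != n || len(h.V) != n {
		return nil, errors.New("udf columns have different lengths")
	}
	out := make([]Candle, n)
	for i := range out {
		out[i] = Candle{Time: h.T[i], Open: h.O[i], High: h.H[i],
			Low: h.L[i], Close: h.C[i], Volume: h.V[i]}
	}
	return out, nil
}

func main() {
	body := []byte(`{"s":"ok","t":[1711324800,1711411200],"o":[8900,8925],
		"h":[8950,8975],"l":[8875,8900],"c":[8925,8950],"v":[123456,98765]}`)
	rows, err := Candles(body)
	if err != nil {
		fmt.Println("history error:", err)
		return
	}
	for _, r := range rows {
		fmt.Printf("%d O=%.0f H=%.0f L=%.0f C=%.0f V=%.0f\n",
			r.Time, r.Open, r.High, r.Low, r.Close, r.Volume)
	}
}
```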

Authentication

Dual auth model — JWT for API, sessions for Admin/Portal

API Key Lifecycle

Create Key

Generate 32 random bytes → format as idx_live_{64 hex} → SHA-256 hash stored in DB. Full key shown once.

Exchange for JWT

Web: POST /v1/auth/token. Plugin: POST /v1/plugin/activate. Returns RS256 JWT valid for 15 minutes.

Revoke

Revoked keys fail on next token exchange. Active JWTs expire naturally (max 15 min).
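The key creation step can be sketched as follows. This is an illustration rather than the platform's code: `NewAPIKey` and `HashKey` are hypothetical names, and hashing the full key string (prefix included) is an assumption, since the page does not say whether the hash covers the prefix.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const keyPrefix = "idx_live_"

// HashKey returns the hex SHA-256 digest that would be stored in the DB.
// Assumption: the digest covers the whole key string, prefix included.
func HashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// NewAPIKey generates 32 random bytes, formats them as idx_live_{64 hex},
// and returns the key (shown once) together with its stored hash.
func NewAPIKey() (key, hash string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	key = keyPrefix + hex.EncodeToString(raw)
	return key, HashKey(key), nil
}

func main() {
	key, hash, err := NewAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println("show once:", key)
	fmt.Println("store:    ", hash)
}
```

On token exchange, the server would hash the presented key the same way and look up the digest, so the plaintext key never needs to be stored.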

JWT Claims Structure

{
  "sub": "user-uuid",
  "iss": "idx-market-data",
  "tier": "pro",
  "key_id": "key-uuid",
  "machine_id": "sha256(...)",  // plugin only
  "exp": 1711929600,
  "iat": 1711928700
}

Tier System

Single source of truth in internal/config/tier.go

Feature	Free	Pro	Enterprise
Rate Limit	60 req/min	600 req/min	Unlimited
Timeframes	Daily only	All (1m-1d)	All (1m-1d)
History Depth	90 days	730 days	Unlimited
HD Access	Stripped	Obfuscated (FNV)	Raw values
WebSocket	Blocked	50 symbols	Unlimited
Max Machines	1	2	5
Streaming	No	Yes	Yes

Tier Override Priority

Highest: API Key Override

Medium: DB Tier Config

Fallback: Hardcoded tier.go
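The priority order above amounts to a three-level lookup. The sketch below is illustrative only: the `Limits` struct, the `hardcoded` map, and `Resolve` are hypothetical, overrides are assumed to replace the whole limit set rather than single fields, and -1 is used here to mean unlimited.

```go
package main

import "fmt"

// Limits is a hypothetical subset of per-tier settings; -1 means unlimited.
type Limits struct {
	ReqPerMin   int
	MaxSymbols  int
	MaxMachines int
}

// hardcoded stands in for the defaults in internal/config/tier.go,
// filled with the values from the tier table.
var hardcoded = map[string]Limits{
	"free":       {ReqPerMin: 60, MaxSymbols: 0, MaxMachines: 1},
	"pro":        {ReqPerMin: 600, MaxSymbols: 50, MaxMachines: 2},
	"enterprise": {ReqPerMin: -1, MaxSymbols: -1, MaxMachines: 5},
}

// Resolve applies the priority order: API key override first, then the DB
// tier config, then the hardcoded defaults. The bool is false when the
// tier is unknown at every level.
func Resolve(tier string, keyOverride *Limits, dbConfig map[string]Limits) (Limits, bool) {
	if keyOverride != nil {
		return *keyOverride, true
	}
	if l, ok := dbConfig[tier]; ok {
		return l, true
	}
	l, ok := hardcoded[tier]
	return l, ok
}

func main() {
	db := map[string]Limits{"pro": {ReqPerMin: 900, MaxSymbols: 50, MaxMachines: 2}}
	override := &Limits{ReqPerMin: 1200, MaxSymbols: 100, MaxMachines: 3}

	fmt.Println(Resolve("pro", override, db)) // key override wins
	fmt.Println(Resolve("pro", nil, db))      // DB config wins
	fmt.Println(Resolve("pro", nil, nil))     // hardcoded fallback
}
```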

RBAC & HD Access

Role-based access control + tier-based data filtering

Roles

Role	Scope	Access
superadmin	System	Everything — user management, HD config, tier config, imports
admin	System	View all users/keys, manage plugins, view audit
user	Self	Own keys, own machines, request upgrades, view own usage

HD (Hidden Delta) Access by Tier

Tier	HD Behavior	Implementation
Free	HD = 0 (stripped)	Middleware zeroes hd field on all bars
Pro	HD obfuscated (relative)	FNV hash seeded by client_id — relative values preserved
Enterprise	Raw HD (full access)	No modification, raw computed values
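The per-tier behavior in the table can be pictured as a small filter applied to each bar's `hd` field. The Pro branch below is only a guess at what "FNV hash seeded by client_id" might look like: it assumes the hash selects a fixed positive scale factor per client, which keeps sign and ordering intact. The `FilterHD` and `scaleFor` names and the factor range are hypothetical; the actual obfuscation scheme is not documented here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// scaleFor derives a stable positive factor in [0.5, 1.5) from the client
// ID using 32-bit FNV-1a. The range is an arbitrary illustrative choice.
func scaleFor(clientID string) float64 {
	h := fnv.New32a()
	h.Write([]byte(clientID))
	return 0.5 + float64(h.Sum32()%1000)/1000.0
}

// FilterHD returns the HD value a client of the given tier would see.
func FilterHD(tier, clientID string, hd float64) float64 {
	switch tier {
	case "enterprise":
		return hd // raw value
	case "pro":
		return hd * scaleFor(clientID) // relative values preserved
	default:
		return 0 // free tier and unknown tiers: stripped
	}
}

func main() {
	for _, tier := range []string{"free", "pro", "enterprise"} {
		fmt.Printf("%-10s %v\n", tier, FilterHD(tier, "client-42", 1500))
	}
}
```

Because the factor is constant per client, comparisons between bars stay meaningful for that client while absolute values differ from the raw series.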

Monitoring

Real-time observability across the full platform stack

Monitoring Stack

Go API (:2112/metrics)

Exposes Prometheus-format metrics: request rates, latency histograms, error counters, rate limiter stats, audit health, plugin gauges.

Prometheus (:19090)

Scrapes Go API metrics every 15s. Evaluates alert rules (feed staleness, error rates, pool exhaustion). Stores 15-day time-series history.

Outputs: Grafana datasource, alert rules

Grafana (:13000)

Visualization layer. 2 pre-provisioned dashboards with auto-configured Prometheus datasource. Login: admin / admin.

Alertmanager (:19093)

Routes alerts from Prometheus to Telegram bot. Requires TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID env vars.

Dashboards

Built-in

/ops/latency

QuestDB ages, Redis/PG/CH health, Redpanda consumer lag, parser channel buffers, pipeline performance

Built-in

/ops/ranking

Top 20 stocks by volume, value, trade frequency (auto-refresh 10s)

Grafana Dashboard

IDX API Dashboard

10 panels: request rate, latency p50/p95/p99, error rate (5xx), active WS connections, active SSE connections, rate limiter fail-open counter, audit fallback writes, stale plugin activations, request rate by status code, plugin activations.

Grafana Dashboard

IDX Platform Overview

8 panels: service health, QuestDB write rate (ILP rows/sec), ClickHouse insert rate, Redpanda consumer lag, Redis memory usage, Redis connected clients, disk usage, CPU usage.

Grafana dashboards are auto-provisioned from monitoring/grafana/dashboards/. Datasource auto-configured to Prometheus at :19090. No manual setup needed — just docker compose up -d grafana.

Access Points

Tool	URL	Auth	Purpose
Grafana	`http://localhost:13000`	`admin` / `admin`	Time-series dashboards (API + Platform)
Prometheus	`http://localhost:19090`	none	PromQL queries, alert rule status
Alertmanager	`http://localhost:19093`	none	Active alerts, silences, routing
Pipeline Monitor	/ops/latency	none	Live pipeline health (built into Go API)
Market Ranking	/ops/ranking	none	Top 20 volume/value/frequency

Prometheus Metrics Exposed

# Request metrics
http_requests_total{method, path, status}
http_request_duration_seconds{method, path}

# Rate limiting
rate_limit_hits_total
rate_limit_fail_open_total

# Audit
audit_writes_total
audit_fallback_writes_total

# Plugin
plugin_activations_active
plugin_heartbeat_failures_total

# WebSocket / SSE
ws_connections_active
sse_connections_active

Docker Compose Commands

# Start monitoring stack
docker compose -f docker-compose.dev.yml up -d grafana prometheus alertmanager

# Check status
docker ps --filter "name=idxmdp-grafana"
docker ps --filter "name=idxmdp-prometheus"

# View Grafana logs
docker logs idxmdp-grafana --tail 20

# Restart monitoring
docker compose -f docker-compose.dev.yml restart grafana prometheus

Alert Thresholds

Metric	Warning	Critical	Action
Feed → DB latency	>5s	>60s	Check parser logs, QuestDB health
Request p99	>50ms	>200ms	Check DB pool exhaustion, slow queries
rate_limit_fail_open	>0 for 5m	>0 for 15m	Redis connectivity issue
audit_fallback_writes	>0	>0 for 10m	PostgreSQL connection issue
Consumer lag (Redpanda)	>1000	>10000	Metric worker or CH drain backpressure

Alertmanager requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID env vars. Without them, Alertmanager will restart-loop. Set via .env or skip Alertmanager if Telegram bot isn't configured yet.

HD & RSI Metrics

Market indicators computed in real time by the Rust metric worker: the proprietary HD and the standard RSI-14

HD (Hidden Delta) — 7-Step Pipeline

Price Delta

delta = close - prev_close

Signed Volume

signed_vol = (delta >= 0) ? +volume : -volume — only during gate hours (09:05 – 14:50 WIB)

Cumulative Signed Volume

cum_signed = running_sum(signed_vol) — resets daily

Net Signed Volume

net_signed = cum_positive + cum_negative

MAPO

mapo = ema(net_signed, period=20) — Moving Average Price x Outstanding

DMAPO

dmapo = mapo - prev_day_mapo — frozen at 15:00 WIB

HD Value

hd = net_signed - dmapo — the final Hidden Delta indicator
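To make the seven steps concrete, here is one possible reading of them in Go for a single ticker and a single trading day. Several details are assumptions that the page does not specify: the gate interval is treated as half-open (open inclusive, close exclusive), `net_signed` is taken to equal the running signed sum, the EMA is seeded with the first value and uses alpha = 2/(period+1), and the 15:00 DMAPO freeze is not modelled. The `MinuteBar` type and `HDSeries` function are hypothetical names.

```go
package main

import "fmt"

// MinuteBar is a simplified 1-minute bar. Minute is minutes since
// midnight WIB (for example 09:05 is 545).
type MinuteBar struct {
	Minute int
	Close  float64
	Volume float64
}

const (
	gateOpen  = 9*60 + 5   // 09:05 WIB
	gateClose = 14*60 + 50 // 14:50 WIB
	emaPeriod = 20
)

// HDSeries returns one HD value per bar. prevClose is the last close before
// the first bar; prevDayMAPO is the previous day's frozen MAPO.
func HDSeries(bars []MinuteBar, prevClose, prevDayMAPO float64) []float64 {
	alpha := 2.0 / float64(emaPeriod+1)
	var netSigned, mapo float64
	seeded := false
	out := make([]float64, 0, len(bars))
	for _, b := range bars {
		delta := b.Close - prevClose // step 1: price delta
		// steps 2-4: signed volume, accumulated only inside the gate
		if b.Minute >= gateOpen && b.Minute < gateClose {
			if delta >= 0 {
				netSigned += b.Volume
			} else {
				netSigned -= b.Volume
			}
		}
		// step 5: MAPO as an EMA of net signed volume
		if !seeded {
			mapo, seeded = netSigned, true
		} else {
			mapo = alpha*netSigned + (1-alpha)*mapo
		}
		dmapo := mapo - prevDayMAPO        // step 6
		out = append(out, netSigned-dmapo) // step 7
		prevClose = b.Close
	}
	return out
}

func main() {
	bars := []MinuteBar{
		{Minute: 545, Close: 101, Volume: 10},
		{Minute: 546, Close: 100, Volume: 4},
	}
	fmt.Println(HDSeries(bars, 100, 0))
}
```

The reference implementation is the Rust metric worker, which is verified against the Go reference in CI; this sketch is only meant to show how the steps chain together.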

Gate Times (Configurable via Admin)

Gate	Default	Description
Gate Open	09:05:00 WIB	Start accumulating signed volume
Gate Close	14:50:00 WIB	Stop accumulating signed volume
DMAPO Freeze	15:00:00 WIB	Freeze daily MAPO snapshot

Gate times are configurable via /admin/hd-config (superadmin only) with hot-reload via Redis pub/sub to the Rust worker. No restart needed.

RSI (Relative Strength Index)

// Wilder's RSI-14 on 1-minute close prices
delta = close - prev_close
gain  = wilder_avg(max(delta, 0), 14)
loss  = wilder_avg(max(-delta, 0), 14)   // magnitude of down moves
rs    = gain / loss                      // loss == 0 → rsi = 100
rsi   = 100 - (100 / (1 + rs))
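A compact Go version of Wilder's RSI is shown below for reference (the production engine is in Rust). The `WilderRSI` function name and the choice to return 100 when there are no losses are assumptions of this sketch; the first value is seeded with a simple average over the first `period` deltas, then smoothed with Wilder's recurrence.

```go
package main

import "fmt"

// rsiFrom converts average gain and loss into an RSI value.
// With no losses in the window the RSI is defined here as 100.
func rsiFrom(gain, loss float64) float64 {
	if loss == 0 {
		return 100
	}
	rs := gain / loss
	return 100 - 100/(1+rs)
}

// WilderRSI returns one RSI value for each close from index period onward.
// It returns nil when there are not enough closes.
func WilderRSI(closes []float64, period int) []float64 {
	if period <= 0 || len(closes) <= period {
		return nil
	}
	var gain, loss float64
	// Seed: simple average of the first `period` gains and losses.
	for i := 1; i <= period; i++ {
		d := closes[i] - closes[i-1]
		if d > 0 {
			gain += d
		} else {
			loss -= d
		}
	}
	gain /= float64(period)
	loss /= float64(period)
	out := []float64{rsiFrom(gain, loss)}
	// Wilder smoothing: avg = (prev*(period-1) + current) / period.
	for i := period + 1; i < len(closes); i++ {
		d := closes[i] - closes[i-1]
		g, l := 0.0, 0.0
		if d > 0 {
			g = d
		} else {
			l = -d
		}
		gain = (gain*float64(period-1) + g) / float64(period)
		loss = (loss*float64(period-1) + l) / float64(period)
		out = append(out, rsiFrom(gain, loss))
	}
	return out
}

func main() {
	closes := []float64{8900, 8925, 8900, 8950, 8975, 8950, 8925, 8950,
		8975, 9000, 8975, 9000, 9025, 9000, 9025, 9050}
	fmt.Println(WilderRSI(closes, 14))
}
```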

Key Decisions

Architecture and design choices with rationale

Why One Binary (not 8)?

The reference idx-data-api used 8 separate binaries. We consolidated because:

Solo operator = one process to monitor, one set of logs
Fiber handles REST + WebSocket on one port
Shared connection pools (no duplicated PG/Redis connections)
Fewer containers = less Docker complexity

Why QuestDB + ClickHouse (dual storage)?

QuestDB: Optimized for ingestion (ILP, 10M+ rows/s). SAMPLE BY for OHLCV aggregation.
ClickHouse: Optimized for analytical queries. 40:1 compression.
Drain: Once QuestDB partitions pass their retention window (3 days for ticks, 14 days for OHLCV and metrics), they are drained to ClickHouse and dropped.

Why Redpanda (not Kafka)?

Single binary (no ZooKeeper/KRaft)
Kafka-compatible API (same client libraries)
Lower resource overhead for single-node
Built-in admin API at :29644 (HTTP proxy at :28082)

Why Rust Parser (not Go)?

3.1M msg/s vs ~500K in Go (6x faster)
Zero-alloc parsing (<0.3µs per message)
No GC pauses during critical ingestion path
Binary ITCH protocol needs tight memory control

Why Board='RG' Filter?

IDX has boards: RG (Regular), NG (Negotiated), TN (Tunai/Cash). Only RG represents true market price discovery. NG/TN are negotiated off-market and would distort OHLCV.

Why HTMX for Admin (not React)?

Admin panel = 12 CRUD pages, not a charting app
HTMX via embed.FS = zero npm, zero build step, zero CORS
Tailwind + DaisyUI = modern UI with minimal effort
Alpine.js for lightweight interactivity

Project Stages

Scope-driven development stages with phases inside each — not time-boxed sprints

Each Stage delivers one major capability. Stages contain Phases — ordered implementation steps. The tree below shows what was built, which services were introduced, and current status.

S1

Parser Backbone

Live ITCH feed → parsed data into hot storage

Status: DONE

P1 — Rust ITCH Parser
Zero-alloc binary decoder for 7 message types: Trade, Snapshot, Orderbook (bid/ask), Contract, Index. Benchmarked at 3.1M msg/s with <0.3µs per message. Fat LTO + native CPU (AVX2/SSE4.2).

Introduced: Rust binary
P2 — Dual RabbitMQ Consumer
Two AMQP exchanges (itchdata + idxdata) on one shared channel. Prefetch=500, zero lock contention. Auto-reconnect on connection loss.

Introduced: RabbitMQ (remote)
P3 — QuestDB Ingestion
6 parallel ILP writers over TCP with TCP_NODELAY. Batch 500 msgs or 100ms flush. Auto-reconnect. Writes to 6 tables.

Introduced: QuestDB
P4 — OHLCV Aggregation
1-minute bar state machine using FxHashMap per symbol. Filters board=RG only (excludes NG/TN negotiated trades).
P5 — Pipeline Channels
9 crossbeam bounded channels for backpressure: consumer→parser (65K), parser→aggregator (65K), parser→writers (256–8K).
P6 — Dashboard + LogView
Built-in Rust HTTP dashboards. OHLCV charts, ranking, latency page (:18080). Live message stream, per-table stats (:18081).

What was achieved

Live IDX feed ingestion at 12K msg/s with 250x headroom
6 QuestDB tables populated: idx_ticks, idx_ohlcv, idx_snapshot, idx_orderbook, idx_index, idx_contracts
Criterion benchmarks with TSV history tracking (13 benchmark points)
Operations guide + decision log documented

Services Introduced

Rust Parser

Data Ingestion Engine

Receives binary ITCH feed from IDX exchange via RabbitMQ AMQP. Decodes 7 message types at wire speed. Produces parsed records to QuestDB (ILP TCP) and Redpanda (async Kafka). The core backbone — everything downstream depends on this.

Parse Rate: 3.1M/s

Threads

Per Message: <0.3µs

QuestDB

Hot Time-Series Store

Ingestion-optimized columnar database. Receives data via ILP (Influx Line Protocol) over TCP at port 19009. Purpose-built for time-series with native SAMPLE BY for OHLCV aggregation. Stores the last 3–14 days of live data depending on table.

Tables

Retention: 3–14 days

HTTP Port: :19000

S2

Metric Pipeline

Proprietary indicators (HD + RSI) computed from live OHLCV bars

Status: DONE

P1 — Redpanda Message Bus
Kafka-compatible single-binary message broker. 6 topics (idx.ticks, idx.ohlcv, idx.orderbook, idx.snapshot, idx.index, metrics.hd), 10 partitions each. Replaces need for full Kafka cluster.

Introduced: Redpanda
P2 — ClickHouse Cold Storage
Columnar analytical database for unlimited history. 40:1 compression ratio. Receives drained data from QuestDB via ch-drain worker. ReplacingMergeTree for deduplication.

Introduced: ClickHouse, CH Drain
P3 — Redis Cache Layer
In-memory cache for sub-millisecond metric lookups. Stores latest HD/RSI per ticker with 24h TTL. Also used for WS pub/sub fan-out, sessions, and rate limiting.

Introduced: Redis
P4 — HD Engine (7-Step Pipeline)
Rust MetricEngine trait. Computes Hidden Delta: price delta → signed volume → cumulative → net signed → MAPO (EMA-20) → DMAPO (daily diff) → HD value. Gate times: 09:05–14:50 WIB.
P5 — HD Accuracy Verification
100% match vs Go reference implementation across 606 tickers. Accuracy fixture with CSV comparison gate in CI.
P6 — RSI-14 Engine
Standard Wilder's RSI on 1-minute close prices. 14-period lookback. 4 unit tests. Runs alongside HD in the same metric worker.
P7 — Metric Worker Consumer
Kafka consumer group metric-worker-hd consuming idx.ohlcv topic. Computes HD+RSI per bar, writes to QuestDB metrics_hd + Redis cache + ClickHouse archive.
P8 — QuestDB Partition TTL
Cron job drops old partitions: 3 days for ticks/snapshots/orderbook, 14 days for OHLCV/metrics. Prevents storage exhaustion.
P9 — Historical Backfill
Replay tool ingests historical fixture files (10.3M messages) to populate ClickHouse with pre-launch data.
P10 — ClickHouse Drain
Consumer group ch-drain reads from multiple Redpanda topics and inserts into ClickHouse. Typically runs after market close (16:30 WIB).
P11 — Orderbook Kafka Deferral
Orderbook publishing to Kafka disabled (82% of message volume). QuestDB still receives orderbook data. Re-enable for M4 order flow metric.

What was achieved

HD metric: 100% accuracy match vs Go reference across all 606 tickers
RSI-14 Wilder's with 4 passing tests
Full hot/cold storage pipeline: QuestDB (hot, 3-14d) → ClickHouse (cold, unlimited)
Redpanda message bus with 6 topics, 2 consumer groups operational
Redis cache serving sub-ms metric lookups

Services Introduced

Redpanda

Kafka-Compatible Message Bus

Single-binary Kafka replacement. No ZooKeeper/KRaft needed. Serves as the event backbone: parser publishes parsed data, consumers (metric worker, CH drain, WS broadcaster) subscribe to topics. Built-in admin API + Prometheus metrics at :29644.

Topics: 6

Partitions: 10 per topic

Kafka Port: :29092

ClickHouse

Cold Analytical Store

Columnar OLAP database for unlimited historical data. Achieves 40:1 compression ratio. Uses ReplacingMergeTree for deduplication. Receives data from CH Drain worker after QuestDB partitions age out. Powers historical OHLCV queries and analytics.

Compression: 40:1

Retention: unlimited

HTTP Port: :28123

Redis

In-Memory Cache + Pub/Sub

Sub-millisecond key-value store. Caches latest HD/RSI metric values per ticker (24h TTL). Powers WebSocket fan-out via PUBLISH/SUBSCRIBE. Also handles session storage, rate limit counters (Lua sliding window), and API response caching.

Latency: <1ms

Cache TTL: 24h

Port: :26379

Metric Worker

Real-Time Indicator Engine

Rust consumer that subscribes to idx.ohlcv topic. Computes HD (7-step pipeline) and RSI-14 for every 1-minute bar. Outputs to QuestDB (metrics_hd table), Redis (sub-ms cache), and ClickHouse (historical archive). Hot-reloadable gate times via Redis pub/sub.

Tickers: 606

Metrics: 2 (HD, RSI-14)

Bar Interval: 1m

S3

Reconstructed API Layer

Single Go binary serving REST + WS + SSE + Admin + Portal

Status: DONE

P1 — Foundation + Auth
Go/Fiber project structure with internal/ package layout. PostgreSQL migrations (16 tables). JWT RS256 issuer. API key hashing (SHA-256). Rate limiter (Redis Lua sliding window). Health/ready endpoints.

Introduced: PostgreSQL, Go/Fiber
P2 — Data Endpoints
GET /v1/snapshot/:ticker (QuestDB), GET /v1/ohlcv/:ticker (hybrid QuestDB+ClickHouse), GET /v1/symbols, GET /v1/metrics/latest (Redis sub-ms), GET /v1/hd/chart/:ticker. Tier enforcement in RBAC middleware.
P3 — Plugin Endpoints
AmiBroker DLL wire protocol: POST /v1/plugin/activate (machine licensing), /deactivate, /heartbeat (10-min keepalive), /report (telemetry). Stale plugin cleanup job (5-min). API key auto-expire (hourly).
P4 — WebSocket + SSE Streaming
WS /v1/stream (Redis PubSub → WS broadcast). SSE /v1/live (alternative for web). Subscribe/unsubscribe JSON protocol matching plugin expectations. Tier gating (Free=blocked, Pro=50 sym, Enterprise=unlimited).
P5 — Admin Panel
HTMX + Tailwind + Alpine.js + DaisyUI. Session auth + CSRF. Dashboard, users, API keys, plugins, audit, CSV import. HD config page (gate times + hot-reload). Tier config page (per-tier variables + per-key overrides). Subscription approval queue.
P6 — User Portal
Self-service: own API keys, own machines, usage stats, setup guide. Subscription page (tier comparison, upgrade requests). Email verification + password reset flows.
P7 — Zero-Silent-Error + Polish
Audit write-ahead buffer with fallback file. Prometheus counters/gauges. Graceful shutdown (drain + close pools). TradingView UDF datafeed integration. Response envelope matching plugin expectations.

What was achieved

Single Go binary replaces 8 separate services from reference project
Three-domain architecture: Admin (/admin/*), Portal (/portal/*), Data API (/v1/*)
PostgreSQL with 16 tables: users, API keys, plugins, audit, settings, subscriptions, payments
Full AmiBroker plugin compatibility (exact endpoint paths + response format)
TradingView charting via UDF datafeed at /chart
Rate limiting, RBAC, HD obfuscation, tier enforcement from single TierMatrix

Services Introduced

Go API Server

Unified Application Server

Single Fiber binary serving everything: REST API with JWT auth, WebSocket streaming, SSE, Admin panel (HTMX), User portal, TradingView UDF, Prometheus metrics. Shared connection pools for all databases. Replaces 8 separate binaries from reference architecture.

Port: :18090

Domains: 3

Endpoints: 50+

PostgreSQL

Authentication & State Database

Relational database for all application state: user accounts, API keys (SHA-256 hashed), plugin activations, audit log (monthly partitions), system settings, subscription requests, payment records. CITEXT for case-insensitive email. Supports hot-reload of tier + HD configs.

Tables: 16

JWT Signing: RS256

Port: :25432

S4

Monitoring & Alerting

Prometheus + Grafana + Alertmanager + Telegram notifications

Status: DONE

P1 — Prometheus Metrics
Go API exposes /metrics at :2112. Request rates, latency histograms, error counters, rate limit hits, audit fallback writes, plugin activation gauges.

Introduced: Prometheus
P2 — Grafana Dashboards
Pre-configured dashboards for pipeline health, API performance, and database status. Auto-provisioned data sources.

Introduced: Grafana
P3 — Alertmanager Rules
Alert rules for feed staleness (>60s), request p99 (>200ms), rate limit fail-open, audit fallback writes, DB connection pool exhaustion.

Introduced: Alertmanager
P4 — Telegram Bot Integration
Alertmanager → Telegram bot for ops notifications. Separate channels planned: ops (private), status (public), clients (private).
P5 — Ops Dashboards (Go API)
/ops/latency page: QuestDB table health, Redis/PG/CH status, Redpanda consumer lag, parser channel capacities. /ops/ranking: top 20 by volume, value, frequency.
P6 — HD Hot-Reload
Admin changes HD gate times → PostgreSQL → Redis PUBLISH "config:hd" → Rust metric worker picks up new config → next bar uses new gates. Zero downtime.

What was achieved

Full observability stack: Prometheus → Grafana → Alertmanager → Telegram
Self-hosted ops pages at /ops/latency and /ops/ranking (live data, no external tools needed)
HD configuration hot-reload without service restart

S5

WebSocket & SSE Streaming

Real-time data push to clients (built inside S3 Phase 4)

Status: DONE

P1 — WebSocket Server
WS /v1/stream — bidirectional. Subscribe/unsubscribe per ticker with JSON protocol. Redis PubSub → WS broadcast. JWT in Authorization header.
P2 — Server-Sent Events
GET /v1/live — server-push only. Query params for symbols + metrics. Auto-reconnect built in. Best for web clients that don't need bidirectional.
P3 — Tier Gating
Free = WS blocked entirely. Pro = max 50 symbols. Enterprise = unlimited. HD values stripped/obfuscated per tier in stream.

S5 was implemented as part of S3 Phase 4. Listed separately for architectural clarity — streaming is a distinct capability.

S6

Additional Metrics

Expanding the indicator library beyond HD

Status: PARTIAL

M1 — RSI-14 (Wilder's)
Relative Strength Index with 14-period lookback on 1-minute close prices. Smoothed average gains/losses. 4 unit tests passing.
M2 — MACD
Moving Average Convergence Divergence. EMA-12 / EMA-26 / Signal-9. On demand.
M3 — OBV
On-Balance Volume. Cumulative volume with sign determined by close direction.
M4 — Order Flow
Requires re-enabling orderbook Kafka publishing (currently deferred, 82% of volume).
M5–M9 — Future Indicators
Bollinger Bands, Stochastic, ATR, VWAP, custom signals. Each = 1 engine file in Rust.

Each new metric follows the same pattern: implement MetricEngine trait in Rust, add to worker consumer, write to metrics_hd table + Redis. ~1 session per metric.

S7

Stock Screener

Multi-criteria stock filtering and ranking engine

Status: HOLD

P1 — Screener Query Engine
Filter stocks by metric thresholds (HD > X, RSI < 30, volume > Y). Combine multiple conditions with AND/OR logic.
P2 — Saved Screens
Users save custom screener configurations. Alert when stocks match criteria.
P3 — Screener API
REST endpoint: GET /v1/screener?filters=... Returns ranked list of matching tickers.

On hold — requires discussion on scope, tier access, and which metrics to expose in screener filters.

S8

Broker Scraper

External broker data collection and integration

Status: HOLD

P1 — Broker Data Source
Scrape or integrate with broker APIs for additional data not available in ITCH feed.
P2 — Data Normalization
Normalize broker-specific formats into platform standard schema.

On hold — needs discussion on target brokers, data scope, and legal considerations.

S9

Status Page

Public-facing service status + incident communication

Status: DONE

P1 — Error Response Integration
5xx and degraded responses include status_url field pointing to public status page. Clients can show "check status" link automatically.
P2 — /ready Degraded Mode
GET /ready returns degraded status when any backend (Redis, PG, QuestDB, CH) is unhealthy. Prometheus alert triggers on degraded state.
P3 — Instatus Setup
External hosted status page (Instatus.com). Manual setup required: create account, configure components, set STATUS_PAGE_URL env var.

S10

Client Dashboard

TradingView charts + documentation + ops monitoring pages

Status: DONE

P1 — TradingView Chart
Professional charting at /chart with UDF datafeed. Supports 1m, 5m, 15m, 30m, 1h, daily resolutions. Dark theme. Symbol search. HD metric overlay for enterprise tier.
P2 — Documentation Site
Comprehensive docs at /docs. Architecture, pipeline, services, credentials, API reference, tier system, metrics, key decisions. Interactive with collapsible sections and keyboard navigation.
P3 — Pipeline Latency Monitor
/ops/latency — live dashboard showing QuestDB table health, Redis/PG/CH connectivity, Redpanda consumer lag, parser channel buffer capacity, pipeline performance stats.
P4 — Market Ranking
/ops/ranking — top 20 stocks by volume, value, and trade frequency. Auto-refreshes every 10 seconds during trading hours.

S11

Payment Integration

Xendit/Midtrans gateway for automated tier upgrades

HOLD ▶

P1 — Database Schema (prepared in S3)
Tables ready: subscription_plans, subscriptions, payments, webhook_events. Seed data for Pro Monthly (Rp 299K), Pro Annual (Rp 2.99M), Enterprise Monthly (Rp 999K), Enterprise Annual (Rp 9.99M).
P2 — PaymentGateway Interface
internal/payment/gateway.go defines CreateInvoice, VerifyWebhook, GetPaymentStatus, CancelSubscription. Webhook stub routes return 501.
P3 — Xendit Integration
Native IDR support, virtual accounts (BCA/BNI/BRI/Mandiri), QRIS, e-wallets (OVO/Dana/GoPay). HMAC webhook verification.
P4 — Billing Automation
Recurring billing, grace period (7 days), auto-downgrade on expiry. Portal billing page for payment history + receipts.

On hold — database schema and interfaces prepared. Actual payment gateway integration deferred until customer base justifies it. Manual tier upgrades via admin panel work in the meantime.
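
The P4 grace-period rule can be sketched in a few lines. This is a hedged illustration: the function name, the `"free"` fallback tier, and the state labels are assumptions, and whether the expiry day itself counts as active is a policy decision not stated above.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=7)  # from P4
FALLBACK_TIER = "free"            # assumed name of the tier after auto-downgrade


def effective_tier(paid_tier, expires_on, today):
    """Return (tier, state) for a subscription on a given day."""
    if today <= expires_on:
        return paid_tier, "active"
    if today <= expires_on + GRACE_PERIOD:
        # Payment is overdue but access is kept during the grace period.
        return paid_tier, "grace"
    return FALLBACK_TIER, "expired"


expiry = date(2026, 4, 1)
on_expiry_day = effective_tier("pro", expiry, date(2026, 4, 1))
last_grace_day = effective_tier("pro", expiry, date(2026, 4, 8))
after_grace = effective_tier("pro", expiry, date(2026, 4, 9))
```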

Project Repository

idx-market-data-platform/
  go-api/                      Go/Fiber API server (:18090)
  rust-workers/                Parser + metric worker + aggregator + ch-drain
  docker/                      Docker compose configs + Grafana/Prometheus
  schema/                      PostgreSQL (16 tables) + ClickHouse schemas
  fixture/                     Test fixtures, HD accuracy CSV, replay data
  charting_library-master/     TradingView charting library (licensed)
  docs/                        Tech specs, decisions log, operations guide

Service Architecture Summary

12 services in total. All use idxmdp- container prefix on idxmdp_net Docker network. Fully independent — no shared services with other projects.

Rust Parser

Ingestion

Metric Worker

HD + RSI

Redpanda

Message Bus

QuestDB

Hot Store

ClickHouse

Cold Store

Redis

Cache + PubSub

PostgreSQL

Auth DB

Go API

:18090

Telegram Alerts Setup

Get notified on your phone the moment a stock matches your criteria

The Screener can deliver alerts directly to your Telegram. This guide walks you through the one-time setup and creating your first alert. Once linked, every alert you create gets delivered to your chat — not a shared channel.

What you'll need

A Telegram account on your phone
An account on this platform (login required)
About 5 minutes

Step 1 — Find your Telegram chat ID

The platform needs to know which Telegram chat to send your alerts to. Find your numeric chat ID:

Open Telegram on your phone
Search for the bot @userinfobot
Tap Start
The bot replies with your Id: — a positive integer like 560442208
Copy that number — you'll need it in Step 3

Step 2 — Start a chat with the IDX Alerts bot

Critical step. Telegram bots cannot send messages to a user who has never messaged them first. Skip this and your alerts will silently fail with chat not found.

In Telegram, search for @idx_testhink_bot
Open the chat
Tap Start (or send any message — /start is the convention)

That's it. You don't need to interact with the bot beyond this. It's purely for delivery.

Step 3 — Link Telegram in the dashboard

Log into the dashboard
Open Screener → Alerts tab
In the Telegram delivery card, click Link Telegram
A 6-digit code appears (e.g. 451096)
In the form below the code, fill in the inputs left-to-right:
- Left field: your chat ID from Step 1 (e.g. 560442208)
- Right field: the 6-digit code from above — auto-filled, but verify it matches
Click Verify

The card flips to show Linked to chat <your ID> with a green Linked badge. You're set up.

Step 4 — Create your first alert

In the same Alerts tab, scroll down to the Create alert form:

Field	What to enter
Ticker	The stock symbol, e.g. `BBCA`
Alert name	Anything memorable, e.g. `BBCA oversold`
Conditions	Click + Add condition, set field, operator, value

Worked example — notify me when BBRI's RSI drops below 30:

Ticker: BBRI
Alert name: BBRI oversold
Condition: field RSI, operator <, value 30

Click Create alert. The new alert appears in the Active alerts table above with status enabled.

Step 5 — Wait for delivery

The alert worker checks every 10 seconds. When the condition transitions from false to true, a message lands on your phone:

🔔 BBRI
BBRI oversold

The message includes the ticker and your alert name.

How delivery actually works

Each transition fires once. If RSI drops below 30, you get one message. If it rises back above 30 and then drops again, you get another. While the condition stays true, the alert does not fire again.
Quotas: 50 alerts per user.
Disable temporarily: toggle the alert in the Active alerts list to pause without deleting.
One ticker per alert: a single alert tracks one ticker. To watch multiple tickers, create one alert per ticker.
Cross-ticker conditions are not allowed for alerts (e.g. you cannot reference index membership). Use the Screener tab for cross-ticker scans.
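
The "fires once per transition" rule is edge-triggered evaluation. A minimal sketch, assuming one boolean result per 10-second cycle (the function name and the RSI series are illustrative):

```python
def fired_transitions(evaluations, last_eval=False):
    """Return the indexes of the cycles that would send a message."""
    fired = []
    for i, current in enumerate(evaluations):
        # Fire only on a false -> true edge, never while the condition stays true.
        if current and not last_eval:
            fired.append(i)
        last_eval = current
    return fired


rsi_per_cycle = [35, 29, 28, 31, 27]
condition = [rsi < 30 for rsi in rsi_per_cycle]  # [False, True, True, False, True]
fired = fired_transitions(condition)             # cycles 1 and 4
```

Cycle 2 is still below 30 but sends nothing, because the condition was already true in cycle 1.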

Troubleshooting

Symptom	Likely cause	Fix
`code unknown or expired` on Verify	The code in the page expired, or the dashboard cached an old one	Refresh the browser tab and request a fresh Link Telegram code
Alert stays `enabled` but no message arrives	You skipped Step 2 (didn't DM the bot first)	Open Telegram, DM `@idx_testhink_bot` once. The alert auto-retries on the next 10-second cycle — no need to recreate.
`Verification failed: code does not match this user`	You pasted someone else's code	Click Re-link for a fresh code tied to your account
Got a message but the data feels stale	The screener cache may be slow to refresh	Check the Live badge in the Alerts page header. Yellow or red means upstream isn't current.
`Screener is disabled in this build.` on the Screener tab	The platform operator has not enabled the screener (`S7_ENABLED=false`)	Contact the operator to flip the flag

Stop receiving alerts

Delete individual alerts from the Active alerts list, OR
Toggle an alert to disabled to pause it without losing the configuration, OR
Click Re-link in the Telegram delivery card and verify a different chat ID to redirect alerts elsewhere

What's happening behind the scenes

When you create an alert, it is stored in PostgreSQL with last_eval = false. Every 10 seconds, the alert-worker service runs a cycle:

SELECTs all enabled alerts
HGETs the current snapshot from Redis (screener:state:tickers)
For each alert, evaluates the condition tree against the snapshot row for that alert's ticker
Compares the result to the alert's last_eval. On a false → true transition, the worker:
- Inserts a row into alert_fired (PK on (alert_id, transition_ts) — idempotent, so worker restarts can't re-deliver the same transition)
- Calls Telegram's sendMessage API with your linked chat ID
- On 200 OK, marks delivery_status = sent
- On 4xx/5xx, marks failed and retries with exponential backoff (1, 2, 4, 8, 16 seconds, max 5 attempts)

The dispatcher's most common failure mode is HTTP 400 chat not found — that's the bot-anti-spam rule from Step 2 biting late.
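
The delivery step above can be sketched as below. Two assumptions are baked in and may not match the worker: the five listed delays are read as the waits before five retries that follow the first call, and an in-memory set stands in for the `alert_fired` primary key. The function names are hypothetical.

```python
RETRY_DELAYS = (1, 2, 4, 8, 16)  # seconds, from the backoff schedule above


def record_transition(fired, alert_id, transition_ts):
    """Stand-in for the idempotent insert: False if this transition was already recorded."""
    key = (alert_id, transition_ts)
    if key in fired:
        return False
    fired.add(key)
    return True


def deliver(send, sleep):
    """Call send() until it returns 200 or the retry schedule is exhausted."""
    status = send()
    for delay in RETRY_DELAYS:
        if status == 200:
            break
        sleep(delay)      # wait 1, 2, 4, 8, 16 seconds between attempts
        status = send()
    return "sent" if status == 200 else "failed"


# Simulated Telegram responses: two failures, then success.
waits = []
responses = iter([400, 400, 200])
outcome = deliver(lambda: next(responses), waits.append)
```

Passing `send` and `sleep` in as callables keeps the sketch testable without network access or real waiting. A persistent `chat not found` would run through the whole schedule and end as `failed`.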

IDX Symbol Names Refresh

Regenerate the ticker → company-name map shown in the chart's symbol search

The chart's symbol search displays the full company name next to each IDX ticker (e.g. BBCA → "PT Bank Central Asia Tbk"). That mapping lives in go-api/internal/refdata/idx_symbols.json and is embedded into the API binary at build time. This guide explains how to regenerate the file when new tickers list on IDX or when names change.

When to refresh

New IPO — ticker shows the bare code instead of company name in symbol search
Company rename after a merger or corporate action (e.g. EXCL → XLSmart Telecom Sejahtera)
Quarterly hygiene — KSEI republishes ownership data monthly; pulling a fresh snapshot picks up renames you might have missed

Typical cadence: every 2–3 months, or whenever the screener shows an unmapped ticker that traders ask about.

Why we don't fetch live from IDX

The official endpoints at www.idx.co.id/primary/ListedCompany/… and /secondary/get/v1/… are behind Cloudflare's bot challenge. Plain curl returns HTTP 403 with the “Just a moment…” interstitial. A headless-browser scraper would bypass it but is slow and brittle, so we use KSEI's republished ownership CSV instead — same roster, no bot challenge.

Source of truth: KSEI ownership data

KSEI (the Indonesian central securities depository) publishes monthly stock-ownership snapshots which include the full issuer_name for every listed company. The community repo aryakdaniswara/idx-stock-ownership mirrors these as structured CSV. We use the latest CSV's share_code + issuer_name columns.

Refresh procedure

The whole pipeline is a single bash session — no committed script, since the source URL changes per snapshot. Run from the repo root.

Step 1 — Find the latest KSEI CSV

# List recent CSV files in the source repo
curl -sS "https://api.github.com/repos/aryakdaniswara/idx-stock-ownership/contents/data" \
  | grep '"download_url".*\.csv'

Pick the most recent file (filename pattern kepemilikan_saham_YYYYMMDD.csv) and copy its download_url.

Step 2 — Download and extract unique pairs

# Replace URL with the latest from Step 1
curl -sS "https://raw.githubusercontent.com/aryakdaniswara/idx-stock-ownership/main/data/kepemilikan_saham_YYYYMMDD.csv" \
  -o /tmp/ksei.csv

# Extract unique (ticker, name) pairs from columns 2 + 3
awk -F',' 'NR>1 {print $2"\t"$3}' /tmp/ksei.csv | sort -u > /tmp/ksei_pairs.tsv
wc -l /tmp/ksei_pairs.tsv   # expect ~950

Step 3 — Pull current QDB ticker universe

We only emit entries for tickers that actually trade in our QDB — warrants and delisted codes get the bare-ticker fallback automatically.

curl -sS -G "http://localhost:19000/exec" \
  --data-urlencode "query=SELECT DISTINCT ticker FROM idx_ohlcv ORDER BY ticker" \
  | python3 -c "import json,sys; d=json.load(sys.stdin); [print(r[0]) for r in d['dataset']]" \
  > /tmp/qdb_tickers.txt

# Build the intersection (preserve all KSEI variants per ticker)
awk -F'\t' '
  NR==FNR { ksei[$1] = ksei[$1] "\n" $2; next }
  ($1 in ksei) {
    n = split(ksei[$1], variants, "\n")
    for (i=2; i<=n; i++) print $1"\t"variants[i]
  }
' /tmp/ksei_pairs.tsv /tmp/qdb_tickers.txt > /tmp/intersection.tsv

Step 4 — Title-case + write JSON

python3 << 'PYEOF' > go-api/internal/refdata/idx_symbols.json
import re, json

ticker_to_names = {}
with open('/tmp/intersection.tsv') as f:
    for line in f:
        line = line.rstrip('\n')
        if '\t' not in line: continue
        ticker, name = line.split('\t', 1)
        ticker_to_names.setdefault(ticker, []).append(name)

def pick_canonical(names):
    # Drop sub-class variants like "MVS GOTO ..." in favor of plain "GOTO ..."
    candidates = [n for n in names if not re.match(r'^(MVS|DR)\s', n)]
    if not candidates: candidates = names
    return min(candidates, key=len)

pairs = {t: pick_canonical(ns) for t, ns in ticker_to_names.items()}

def fix_parens(s):
    s = re.sub(r'\(\s+', '(', s); s = re.sub(r'\s+\)', ')', s); return s

ACRONYMS = {
    'PT', 'XL', 'CIMB', 'BNI', 'BCA', 'BRI', 'BTPN', 'BFI', 'AKR',
    'MNC', 'GMF', 'KAI', 'IDX', 'OCBC', 'NISP', 'UOB', 'HSBC', 'IFG',
    'BSI', 'WSBP', 'KB', 'IBK', 'QNB', 'SMBC', 'BTN', 'CBP',
}

def smart_title(s):
    s = fix_parens(s); out = []
    for word in s.split():
        m = re.match(r'^(\W*)(.*?)(\W*)$', word)
        prefix, body, suffix = (m.group(1), m.group(2), m.group(3)) if m else ('', word, '')
        if not body: out.append(word); continue
        upper = body.upper()
        if body.lower() == 'tbk':       out.append(prefix + 'Tbk' + suffix)
        elif upper in ACRONYMS:          out.append(prefix + upper + suffix)
        elif body[0].isalpha():          out.append(prefix + body.capitalize() + suffix)
        else:                            out.append(word)
    return ' '.join(out)

out = {
    "_comment": "IDX ticker -> full company name. Sourced from KSEI ownership data, filtered to tickers actually present in our QDB universe, title-cased. Tickers absent fall back to the bare ticker."
}
for ticker in sorted(pairs):
    cleaned = smart_title(pairs[ticker])
    if not cleaned.startswith('PT '): cleaned = 'PT ' + cleaned
    out[ticker] = cleaned

print(json.dumps(out, indent=2, ensure_ascii=False))
PYEOF

Step 5 — Hand-clean the diff

Smart-title gets ~95% right, but a handful of brand acronyms come out wrong (e.g. XLSmart → Xlsmart). Diff against the previous version and patch the obvious ones in-place:

git diff go-api/internal/refdata/idx_symbols.json | head -100

Common touch-ups:

Add new acronym to the ACRONYMS set if it appears in >1 company name
Single-occurrence quirks: just hand-edit the JSON value
Verify BBCA, TLKM, BMRI, BBRI, GOTO as smoke-check anchors

Step 6 — Rebuild + verify live

docker compose -f docker-compose.dev.yml build api
docker compose -f docker-compose.dev.yml up -d api

# Smoke test: search "BBCA" should return the full company name
curl -sS -b /tmp/cookies.txt "http://localhost:18090/udf/search?query=BBCA&limit=3" \
  | python3 -m json.tool

Expected first result:

{
  "description": "PT Bank Central Asia Tbk",
  "exchange": "IDX",
  "full_name": "IDX:BBCA",
  "symbol": "BBCA",
  "ticker": "BBCA",
  "type": "stock"
}

How the lookup is wired

File: go-api/internal/refdata/idx_symbols.json — flat { "TICKER": "Full Name" } map plus a leading _comment key.
Loader: go-api/internal/refdata/refdata.go — //go:embed's the JSON, parses on package init, exposes refdata.CompanyName(ticker).
Consumer: go-api/internal/handler/udf.go — UDFSearch and UDFSymbolResolve call the helper stockDescription(ticker) which falls back to the bare ticker for unmapped equities.
Search filter: UDFSearch also matches against the company name — typing “Bank Central” finds BBCA.

Adding a single ticker without a full refresh

If only one or two new tickers need adding (e.g. an IPO this week), skip the pipeline and patch the JSON directly:

{
  …
  "NEWX": "PT New Company Tbk",
  …
}

Then rebuild api. Keep the file alphabetically sorted to keep diffs reviewable.
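
Because a hand edit can introduce a trailing comma or break the sort order, a quick check before rebuilding can help. This is a sketch, not a committed script; `check_symbols` is a hypothetical helper that works on the file's text.

```python
import json


def check_symbols(text):
    """Return a list of problems found in idx_symbols.json content."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Trailing commas and unescaped quotes land here.
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]
    tickers = [key for key in data if key != "_comment"]
    problems = []
    if tickers != sorted(tickers):
        problems.append("tickers are not alphabetically sorted")
    problems += [f"{t}: empty name" for t in tickers if not str(data[t]).strip()]
    return problems


good = '{"_comment": "x", "BBCA": "PT Bank Central Asia Tbk", "NEWX": "PT New Company Tbk"}'
trailing_comma = '{"BBCA": "PT Bank Central Asia Tbk",}'
unsorted_keys = '{"NEWX": "PT New Company Tbk", "BBCA": "PT Bank Central Asia Tbk"}'
```

To run it against the real file, read `go-api/internal/refdata/idx_symbols.json` and pass its contents to `check_symbols`; an empty list means the file parses and is sorted.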

Troubleshooting

Symptom	Likely cause	Fix
Ticker still shows bare code in search after rebuild	JSON didn't actually change OR build didn't include refdata package	Verify with `git diff` + `docker compose build api --no-cache`
Search returns ALL tickers as bare code	JSON parse error on init, likely a trailing comma or unescaped quote	Check api logs: `docker logs idxmdp-api 2>&1 | grep "refdata:"`. The loader logs `failed to parse` on bad JSON
Company name is title-cased weirdly (e.g. `Xlsmart`)	Brand acronym not in the `ACRONYMS` set — smart-title fell back to `str.capitalize()`	Either hand-edit that one entry, or add the acronym to the set and regenerate
KSEI repo at the GitHub URL is gone or stale	Maintainer abandoned it — happens with one-person open-source data mirrors	Search GitHub for `idx saham indonesia csv` for an alternative mirror, OR fall back to a Playwright scraper against IDX directly

Operations Guide

Start · Stop · Status · Monitoring · Logging — everything you need to run the platform day-to-day

Full source: docs/OPERATIONS-GUIDE.md. This page mirrors the runbook verbatim for quick in-browser access. All commands run from the repo root: /home/testhink/idx-market-data-platform.

Quick Links

URL	What it shows
`http://localhost:18082`	Ops Dashboard — system-wide status, services, tables, topics
`/ops/latency`	Latency monitor — per-table age, LIVE/STALE, write rate
`/ops/ranking`	Top 20 by volume/value/frequency
`http://localhost:19000`	QuestDB console — SQL queries
`http://localhost:28123`	ClickHouse HTTP — direct queries
`http://localhost:13000`	Grafana — `admin` / `admin`

1. Start / Stop

# Start everything (all containers, dependency order)
docker compose -f docker-compose.dev.yml up -d

# Start individual services
docker compose -f docker-compose.dev.yml up -d questdb
docker compose -f docker-compose.dev.yml up -d parser
docker compose -f docker-compose.dev.yml up -d metric-worker

# Stop everything (preserves volumes)
docker compose -f docker-compose.dev.yml down

# Full wipe (removes data volumes — re-apply schemas after)
docker compose -f docker-compose.dev.yml down -v

2. Status Check

# Visual (recommended)
# → http://localhost:18082 — auto-refresh every 10s

# CLI
docker compose -f docker-compose.dev.yml ps
docker logs idxmdp-parser --tail 20
docker logs idxmdp-metric-worker --tail 20
docker logs idxmdp-ch-drain --tail 20

3. Monitoring Commands

Kafka / Redpanda

docker exec idxmdp-redpanda rpk topic list
docker exec idxmdp-redpanda rpk topic consume idx.ohlcv --num 1
docker exec idxmdp-redpanda rpk group describe metric-worker-hd
docker exec idxmdp-redpanda rpk group describe ch-drain

QuestDB (hot storage)

curl -s "http://localhost:19000/exec?query=SELECT%20count()%20FROM%20idx_ticks"
curl -s "http://localhost:19000/exec?query=SELECT%20count()%20FROM%20metrics_hd"
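
Hand-encoding SQL into the URL gets error-prone once a query contains quotes or comparison operators. The sketch below builds the same `/exec` URL programmatically; the helper names are hypothetical, and the `dataset` key is the one the symbol-refresh script in this documentation reads from the response.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

QUESTDB_URL = "http://localhost:19000"  # port taken from the commands above


def questdb_exec_url(sql, base=QUESTDB_URL):
    """Build a /exec URL with the SQL percent-encoded (spaces become %20)."""
    return f"{base}/exec?{urlencode({'query': sql}, quote_via=quote)}"


def questdb_rows(sql, base=QUESTDB_URL):
    """Run the query and return the rows from the response's `dataset` array."""
    with urlopen(questdb_exec_url(sql, base), timeout=5) as resp:
        return json.load(resp)["dataset"]


url = questdb_exec_url("SELECT count() FROM idx_ticks")
```

`questdb_rows("SELECT count() FROM metrics_hd")` would perform the request against a running QuestDB; `questdb_exec_url` alone is enough to produce a URL for `curl`.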

ClickHouse (cold storage)

docker exec idxmdp-clickhouse clickhouse-client --database market_data \
  --query "SELECT count() FROM idx_ticks"

docker exec idxmdp-clickhouse clickhouse-client \
  --query "SHOW TABLES FROM market_data"

Redis (cache)

docker exec idxmdp-redis redis-cli -a idxmdp_redis_dev_2026 GET last:hd:BBCA
docker exec idxmdp-redis redis-cli -a idxmdp_redis_dev_2026 KEYS "last:hd:*"
docker exec idxmdp-redis redis-cli -a idxmdp_redis_dev_2026 DBSIZE

4. Logs & Levels

# Follow all / specific service
docker compose -f docker-compose.dev.yml logs -f
docker compose -f docker-compose.dev.yml logs -f parser

# Last N lines
docker logs idxmdp-parser --tail 50

# RUST_LOG levels (set in docker-compose.dev.yml)
RUST_LOG=idx_parser=info        # default
RUST_LOG=idx_parser=debug       # every skipped message
RUST_LOG=idx_parser=warn        # errors only

All services use Docker json-file driver with rotation: max 50 MB per file × 3 files = ~150 MB per service.

5. Maintenance

# QuestDB retention (keep 14 days)
./scripts/questdb-retention.sh 14 --dry-run
./scripts/questdb-retention.sh 14

# Recommended cron — daily after market close (16:30 WIB)
# 30 16 * * 1-5 /home/testhink/idx-market-data-platform/scripts/questdb-retention.sh 14

# Rebuild after code change
docker compose -f docker-compose.dev.yml build parser
docker compose -f docker-compose.dev.yml up -d

6. Daily Workflow (Trading Day)

07:50   docker compose -f docker-compose.dev.yml up -d
        Open http://localhost:18082 — verify all green

08:30   Pre-market: check logview for SNAP/BID/ASK messages

09:00   Market opens: check TRADE messages in logview
        Dashboard candles building (BBCA, TLKM etc.)
        ops dashboard: metrics_hd row count growing

During  Monitor via ops dashboard (:18082)
        docker logs idxmdp-parser --tail 5
        docker logs idxmdp-metric-worker --tail 5

16:15   Market closes — aggregator flushes open bars
16:30   Run retention: ./scripts/questdb-retention.sh 14

7. Graceful Shutdown & Morning Startup

Principle: stop writers upstream-first, let the pipeline drain, verify zero in-flight, then stop storage bottom-up. Start in the exact reverse order. Never use down -v — that wipes volumes.

7.1 Shutdown — Phase 1: Pre-shutdown drain check

Do not start shutdown until you confirm the pipeline is idle (after 16:15 WIB post-trade window).

# 1a. Verify no new ticks are flowing (should be stable between two checks ~10s apart)
curl -s "http://localhost:19000/exec?query=SELECT%20count()%20FROM%20idx_ticks%20WHERE%20ts%20%3E%20dateadd('m',-1,now())"

# 1b. Verify Kafka consumer lag is zero for every group / every partition
docker exec idxmdp-redpanda rpk group describe ch-drain
docker exec idxmdp-redpanda rpk group describe metric-worker-hd
docker exec idxmdp-redpanda rpk group describe metric-worker-rsi
# Every partition must show LAG = 0. If not, WAIT — do not proceed.
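
Reading the LAG column by eye is easy to get wrong across three groups. The sketch below parses text piped from `rpk group describe` and lists partitions that are not at zero. The column layout in `SAMPLE` is an assumption about rpk's output and may differ between versions; the function name is hypothetical.

```python
def partitions_with_lag(describe_output):
    """Return (topic, partition, lag) for every partition whose LAG is not 0."""
    header, lagging = None, []
    for line in describe_output.splitlines():
        cols = line.split()
        # Locate the table header, then index columns by name rather than position.
        if {"TOPIC", "PARTITION", "LAG"} <= set(cols):
            header = {name: i for i, name in enumerate(cols)}
            continue
        if header is None or len(cols) <= header["LAG"]:
            continue
        lag = cols[header["LAG"]]
        # Anything other than a literal 0 (including "-") is reported.
        if lag != "0":
            lagging.append((cols[header["TOPIC"]], cols[header["PARTITION"]], lag))
    return lagging


# Assumed shape of the command output, shortened for illustration.
SAMPLE = """\
GROUP      ch-drain
STATE      Stable
TOTAL-LAG  7

TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  MEMBER-ID
idx.ticks  0          100             100             0    m-1
idx.ohlcv  0          43              50              7    m-1
"""
blocked = partitions_with_lag(SAMPLE)
```

An empty result means it is safe to continue. The values are kept as strings so that a non-numeric lag is surfaced instead of being silently skipped.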

7.2 Shutdown — Phase 2: Run daily retention (optional)

./scripts/questdb-retention.sh 14

7.3 Shutdown — Phase 3: Stop writers upstream-first (drain the pipeline)

# 3a. Stop parser FIRST — cuts off new data at the source
docker compose -f docker-compose.dev.yml stop parser

# 3b. Wait ~30s for the aggregator to flush in-memory 1m bars to Kafka
sleep 30

# 3c. Re-verify consumer lag is zero
docker exec idxmdp-redpanda rpk group describe metric-worker-hd
docker exec idxmdp-redpanda rpk group describe ch-drain

# 3d. Stop metric-worker (drains idx.ohlcv → metrics_hd / metrics_rsi)
docker compose -f docker-compose.dev.yml stop metric-worker

# 3e. Stop ch-drain (drains idx.* → ClickHouse)
docker compose -f docker-compose.dev.yml stop ch-drain

7.4 Shutdown — Phase 4: Stop read-side tier

docker compose -f docker-compose.dev.yml stop api dashboard logview ops
docker compose -f docker-compose.dev.yml stop grafana alertmanager prometheus node-exporter redis-exporter

7.5 Shutdown — Phase 5: Snapshot data stores

# 5a. Redis — force a background save so in-memory state hits disk
docker exec idxmdp-redis redis-cli -a idxmdp_redis_dev_2026 BGSAVE
sleep 3
docker exec idxmdp-redis redis-cli -a idxmdp_redis_dev_2026 LASTSAVE

# 5b. ClickHouse: optional forced merge (skip if in a hurry; merges happen on next start)
docker exec idxmdp-clickhouse clickhouse-client --database market_data \
  -q "OPTIMIZE TABLE idx_ticks FINAL" 2>/dev/null || true
docker exec idxmdp-clickhouse clickhouse-client --database market_data \
  -q "OPTIMIZE TABLE idx_ohlcv FINAL" 2>/dev/null || true

# 5c. QuestDB auto-commits its WAL on clean stop — nothing to do

7.6 Shutdown — Phase 6: Stop infrastructure bottom-up

# Redpanda must stop AFTER all consumers are gone (done in §7.3)
docker compose -f docker-compose.dev.yml stop redpanda

# Then storage
docker compose -f docker-compose.dev.yml stop clickhouse questdb redis postgres

# Verify all stopped
docker compose -f docker-compose.dev.yml ps

Faster alternative after §7.3 drain: once writers are stopped and lag is zero, you can collapse §7.4–7.6 into a single docker compose -f docker-compose.dev.yml stop — graceful SIGTERM to everything. Volumes are preserved. Data-loss risk: zero.

7.7 Startup — Phase 1: Infrastructure first (T-60min, 08:00 WIB)

cd /home/testhink/idx-market-data-platform

# Bring up storage + Kafka first; wait for healthchecks
docker compose -f docker-compose.dev.yml up -d questdb redpanda clickhouse redis postgres

# Poll until all five are (healthy) — usually <30s
for i in 1 2 3 4 5 6; do
  docker compose -f docker-compose.dev.yml ps questdb redpanda clickhouse redis postgres
  sleep 5
done

7.8 Startup — Phase 2: Smoke-test infrastructure

# QuestDB — row counts persisted from last session
curl -s "http://localhost:19000/exec?query=SELECT%20count()%20FROM%20idx_ticks"
curl -s "http://localhost:19000/exec?query=SELECT%20count()%20FROM%20idx_ohlcv"

# Redpanda — topics present (disk-persisted)
docker exec idxmdp-redpanda rpk topic list

# ClickHouse — tables present
docker exec idxmdp-clickhouse clickhouse-client -q "SHOW TABLES FROM market_data"

# Redis — ping
docker exec idxmdp-redis redis-cli -a idxmdp_redis_dev_2026 PING

# Postgres — users table
docker exec idxmdp-postgres psql -U idxmdp -d idxmdp -c "SELECT count(*) FROM users"

If any check fails, STOP here. Do not start writers on top of broken storage.

7.9 Startup — Phase 3: Start consumers before the producer

# ch-drain + metric-worker first — so they're already consuming
# by the time parser starts firing. Prevents startup lag spikes.
docker compose -f docker-compose.dev.yml up -d ch-drain metric-worker

# Confirm they joined their consumer groups cleanly
docker logs --since 30s idxmdp-ch-drain 2>&1 | tail -10
docker logs --since 30s idxmdp-metric-worker 2>&1 | tail -10

7.10 Startup — Phase 4: API, monitoring, read-side tier

docker compose -f docker-compose.dev.yml up -d api
curl -sf http://localhost:18090/health && echo ' API OK'

docker compose -f docker-compose.dev.yml up -d dashboard logview ops
docker compose -f docker-compose.dev.yml up -d prometheus grafana alertmanager node-exporter redis-exporter

7.11 Startup — Phase 5: Parser LAST (T-30min, 08:30 WIB)

# Parser is the data source — start it last so everything downstream is listening
docker compose -f docker-compose.dev.yml up -d parser

# Watch it connect to RabbitMQ and start consuming
docker logs -f --since 10s idxmdp-parser
# Expect: "RabbitMQ connected", message decode counts climbing

7.12 Startup — Phase 6: Pre-market verification (T-30 to T-0)

# 6a. All containers running and healthy
docker compose -f docker-compose.dev.yml ps

# 6b. Parser metrics flowing
curl -s http://localhost:9464/metrics | grep -E 'idx_parser_itch_bytes_total|idx_parser_messages_total'

# 6c. Consumer groups rejoined with zero lag
docker exec idxmdp-redpanda rpk group describe ch-drain
docker exec idxmdp-redpanda rpk group describe metric-worker-hd

# 6d. Pre-market ticks flowing after 08:45 WIB
curl -s "http://localhost:19000/exec?query=SELECT%20count()%20FROM%20idx_ticks%20WHERE%20ts%20%3E%20dateadd('m',-5,now())"

# 6e. Ops dashboard visual check
# → http://localhost:18082  — all 15+ services green
# → http://localhost:13000/d/pipeline-flow  — Grafana pipeline-flow dashboard

7.13 Startup — Phase 7: Market open (09:00 WIB)

# First trades should arrive within the first minute
docker logs --since 2m idxmdp-parser 2>&1 | grep -i trade | head -5

# 1m bars start building at 09:01
curl -s "http://localhost:19000/exec?query=SELECT%20ticker%2C%20ts%2C%20close%20FROM%20idx_ohlcv%20WHERE%20ts%20%3E%20dateadd('m'%2C-2%2Cnow())%20LIMIT%205"

# Chart at /chart?symbol=BBCA should show live candles updating

7.14 Recovery — if something goes wrong

Symptom	Likely cause	Fix
Consumer lag growing on startup	metric-worker started before Kafka was ready	`docker compose -f docker-compose.dev.yml restart metric-worker` after confirming Redpanda is healthy
Parser can’t connect to RabbitMQ	Upstream RabbitMQ down or credentials expired	Check `docker logs idxmdp-parser`; verify `.env` AMQP URL
QuestDB not accepting writes	WAL corruption or disk full	`df -h`; `docker logs idxmdp-questdb`
ClickHouse slow to start	Large merge from prior `OPTIMIZE FINAL`	Wait; `docker logs idxmdp-clickhouse`
Chart shows stale bars	Browser cache or stale connection	Hard reload (Ctrl+Shift+R); check `/health`
Gaps in today’s data after startup	Parser missed early messages during startup lag	Use the Backfill Guide runbook

8. Troubleshooting

Symptom	Check	Fix
No trade data	`rpk topic list` — `idx.ticks` missing	Market may be closed; check RabbitMQ connectivity
`metrics_hd` empty	`docker logs idxmdp-metric-worker`	Check `idx.ohlcv` exists (needs trade ticks first)
ClickHouse empty	`docker logs idxmdp-ch-drain`	Check consumer-group lag; old-format messages are skipped
Container restarting	`docker logs <container> --tail 50`	Check config/connection errors
Port conflict	`ss -tlnp | grep <port>`	Find the collision with `docker ps`

Monitoring Guide

Grafana · Prometheus · Alertmanager · Telegram — the authoritative monitoring runbook

Full source: docs/MONITORING-GUIDE.md. The shorter Monitoring page is a UI-level summary; this one is the operational bible.

1. Scope

In scope: container health, QuestDB ILP throughput, ClickHouse inserts, Redpanda consumer lag, Redis memory & clients, host CPU/disk/mem, Go API RPS/latency/errors, Telegram alert routing.

Out of scope: business metrics, billing state, user analytics, market-data correctness (HD accuracy — see Backfill Guide).

The Rust parser/metric-worker/ch-drain binaries do not currently expose Prometheus metrics. Placeholder scrape jobs exist at ports 9464/9465/9466 in monitoring/prometheus/prometheus.yml (commented out). S2 follow-up: axum-based /metrics endpoints in rust-workers/src/metrics_server.rs.

2. Scrape Targets

Container	Port	Path	Exposes
`idxmdp-api`	2112	`/metrics`	Go/Fiber: RPS, latency histograms, error counters, WS/SSE counts
`idxmdp-questdb`	9003	`/metrics`	ILP committed rows/sec, table row counts
`idxmdp-clickhouse`	9363	`/metrics`	insert rate, system metrics
`idxmdp-redpanda`	9644	`/public_metrics`	consumer lag per group/topic
`idxmdp-node-exporter`	9100	`/metrics`	host CPU / memory / disk
`idxmdp-redis-exporter`	9121	`/metrics`	Redis memory / client count

3. Access Points

Tool	URL	Auth
Grafana	`http://localhost:13000`	`admin` / `admin`
Prometheus	`http://localhost:19090`	none
Alertmanager	`http://localhost:19093`	none

4. Start / Verify Stack

# Bring up monitoring
docker compose -f docker-compose.dev.yml up -d grafana prometheus alertmanager \
  node-exporter redis-exporter

# Verify each scrape target is UP
curl -s http://localhost:19090/api/v1/targets | jq '.data.activeTargets[] | {job:.labels.job, health:.health}'

# Check a specific metric is being scraped
curl -s 'http://localhost:19090/api/v1/query?query=up' | jq '.data.result'

5. Dashboards (pre-provisioned)

Grafana

IDX API Dashboard

10 panels — request rate, latency p50/p95/p99, 5xx rate, active WS/SSE, rate-limit fail-open, audit fallback writes, stale plugin activations.

Grafana

IDX Platform Overview

8 panels — service health, QuestDB ILP rows/sec, ClickHouse insert rate, Redpanda lag, Redis memory/clients, disk, CPU.

6. Alert Thresholds

Metric	Warning	Critical	Action
Feed → DB latency	>5s	>60s	Check parser logs, QuestDB health
API request p99	>50ms	>200ms	DB pool exhaustion, slow queries
`rate_limit_fail_open`	>0 for 5m	>0 for 15m	Redis connectivity
`audit_fallback_writes`	>0	>0 for 10m	PostgreSQL connectivity
Redpanda consumer lag	>1000	>10000	Metric worker or CH drain backpressure
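
For the purely numeric rows of the table, the warning and critical levels reduce to a two-threshold comparison. A minimal sketch; the metric keys are hypothetical names, not the Prometheus series, and the duration-based rows (`>0 for 5m`) need a time window and are left out.

```python
# (warning, critical) thresholds copied from the table above.
THRESHOLDS = {
    "feed_to_db_latency_s": (5, 60),
    "api_p99_ms": (50, 200),
    "redpanda_consumer_lag": (1000, 10000),
}


def severity(metric, value):
    """Classify a reading as ok, warning, or critical (strictly greater-than)."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


reading = severity("redpanda_consumer_lag", 2500)
```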

7. Telegram Routing

# Required env vars (set in .env before starting alertmanager)
TELEGRAM_BOT_TOKEN=...
TELEGRAM_CHAT_ID=...

# Test send
docker exec idxmdp-alertmanager amtool alert add test severity=warning \
  --alertmanager.url=http://localhost:9093

Without these env vars Alertmanager will restart-loop. Either set them or remove the alertmanager service from the compose file until Telegram is configured.

8. Common Troubleshooting

Symptom	Diagnosis	Fix
Target `DOWN` in Prometheus	`curl http://<container>:<port>/metrics` from inside `idxmdp-prometheus`	Port / network mismatch in `prometheus.yml`
Grafana shows "No Data"	Check time range, datasource URL (should be `http://prometheus:9090`)	Re-provision datasource from `monitoring/grafana/`
Alertmanager restart-loop	`docker logs idxmdp-alertmanager`	Set Telegram env vars or disable service
QuestDB metric gaps	QuestDB was restarted (counter reset)	Query counters with `rate()`, which compensates for resets; use `resets()` to confirm when a restart occurred

Backfill Guide

Recover · Recompute · Restore — the data-recovery swiss army knife

Full source: docs/BACKFILL-GUIDE.md. All commands run inside idxmdp-parser. The binary is at /usr/local/bin/backfill.

Quick Reference

I need to…	Run this
Recompute 1m bars from raw ticks for a time window	Runbook A (§6.1)
Recompute HD/RSI metrics from existing 1m bars	Runbook B (§6.2)
Restore historical OHLCV from CSV files	Runbook C (§6.3)
Repair HD/RSI flat-day after metric-worker cold-start	Runbook D (§6.4) — `./scripts/repair-metrics.sh`

Where to look first when something goes wrong:
• Parser logs: docker logs --since 1h idxmdp-parser
• Aggregator metrics: curl -s http://localhost:9464/metrics | grep aggregator
• QuestDB console: http://localhost:19000
• ClickHouse: docker exec idxmdp-clickhouse clickhouse-client --database market_data -q "SELECT count() FROM idx_ohlcv WHERE date='2026-04-08'"

1. Tiers Rebuilt

Tier	Storage	Subcommand
Hot OHLCV	QuestDB `idx_ohlcv`	`ohlcv`
Cold OHLCV	ClickHouse `idx_ohlcv`	`ohlcv`, `restore`
Metrics (HD/RSI)	QuestDB `metrics_*`	`metric`

All subcommands re-use the live pipeline's own Aggregator and MetricEngine code so output is byte-identical. Every run is idempotent.

2. Binary Location

The backfill binary lives at rust-workers/target/release/backfill after cargo build --release --bin backfill. For production runs, always invoke it inside the parser container (docker exec idxmdp-parser backfill…) so it uses the same network, env vars and config as the live workers. The host-side binary is only for --dry-run validation.

3. Subcommand Reference

3.1 `ohlcv` — re-aggregate 1-minute bars

# One day, all tickers
docker exec idxmdp-parser backfill ohlcv \
  --from 2026-04-07 --to 2026-04-08

# A single minute, single ticker
docker exec idxmdp-parser backfill ohlcv \
  --from 2026-04-07T09:14:00 --to 2026-04-07T09:15:00 --ticker BBCA

# Dry-run (count ticks, no writes)
docker exec idxmdp-parser backfill ohlcv \
  --from 2026-04-07 --to 2026-04-08 --dry-run

3.2 `metric` — recompute HD or RSI

docker exec idxmdp-parser backfill metric \
  --engine hd --from 2026-04-01 --to 2026-04-08

docker exec idxmdp-parser backfill metric \
  --engine rsi --from 2026-04-07 --to 2026-04-08 --ticker BBCA

Stateful engines (HD, RSI) require warm-up. The tool automatically queries bars before --from and feeds them into the engine without writing output. For RSI-14 specifically, widen the warm-up to at least 14 prior bars or the first 14 rows will be NaN/0.
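Why the first rows come out empty without warm-up can be seen in a small sketch. The code below assumes a Wilder-style RSI (simple average over the first `period` price changes, then exponential smoothing); the names `Rsi` and `update` are illustrative and are not taken from the real MetricEngine.

```rust
/// Wilder-style RSI that yields no value until it has been seeded.
/// Illustrative only; the production engine may differ in detail.
struct Rsi {
    period: usize,
    prev_close: Option<f64>,
    changes_seen: usize,
    avg_gain: f64,
    avg_loss: f64,
}

impl Rsi {
    fn new(period: usize) -> Self {
        Rsi { period, prev_close: None, changes_seen: 0, avg_gain: 0.0, avg_loss: 0.0 }
    }

    /// Feeds one closing price; returns `None` while still warming up.
    fn update(&mut self, close: f64) -> Option<f64> {
        // The very first bar has no predecessor, so there is no price change yet.
        let prev = self.prev_close.replace(close)?;
        let change = close - prev;
        let (gain, loss) = (change.max(0.0), (-change).max(0.0));
        let n = self.period as f64;
        self.changes_seen += 1;

        if self.changes_seen <= self.period {
            // Seeding phase: simple average of the first `period` changes.
            self.avg_gain += gain / n;
            self.avg_loss += loss / n;
            if self.changes_seen < self.period {
                return None;
            }
        } else {
            // Steady state: Wilder smoothing.
            self.avg_gain = (self.avg_gain * (n - 1.0) + gain) / n;
            self.avg_loss = (self.avg_loss * (n - 1.0) + loss) / n;
        }

        if self.avg_loss == 0.0 {
            // No losses in the window. Returning 50 for a completely flat
            // window is a convention chosen for this sketch.
            return Some(if self.avg_gain == 0.0 { 50.0 } else { 100.0 });
        }
        let rs = self.avg_gain / self.avg_loss;
        Some(100.0 - 100.0 / (1.0 + rs))
    }
}
```

With a period of 14, the first 14 closes fed into `update` return `None` and the 15th returns a value, which is why 14 prior bars of warm-up are enough for the first in-window bar to carry a real RSI.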

3.3 `restore` — import CSV into ClickHouse

docker cp /home/testhink/dumps/2026-Q1.csv idxmdp-parser:/tmp/2026-Q1.csv
docker exec idxmdp-parser backfill restore /tmp/2026-Q1.csv

4. Time Format & UTC Gotcha

Accepted forms (all interpreted as UTC, --from inclusive, --to exclusive):

YYYY-MM-DD — e.g. 2026-04-07 (midnight UTC)
YYYY-MM-DDTHH:MM:SS ISO — 2026-04-07T09:14:00
YYYY-MM-DD HH:MM:SS — 2026-04-07 09:14:00

IDX trades on WIB (UTC+7). Subtract 7 hours before passing to backfill.

Event	WIB	UTC
Session 1 open	09:00	02:00
Session 1 close	12:00	05:00
Session 2 open	13:30	06:30
Pre-close auction	16:14	09:14
Market close	16:15	09:15

If --dry-run reports 0 ticks retrieved for a window when the market was open, you forgot to subtract 7 hours.
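The conversion in the table above is plain arithmetic and can be sketched as a helper. The function name `wib_to_utc` is hypothetical; the backfill tool itself only accepts UTC and does not ship such a helper as far as this guide states.

```rust
/// WIB (Western Indonesia Time) is UTC+7, expressed here in minutes.
const WIB_OFFSET_MIN: i32 = 7 * 60;

/// Converts a WIB wall-clock time to UTC.
///
/// Returns `(day_shift, hour, minute)`, where `day_shift` is -1 when the
/// UTC time falls on the previous calendar day, or `None` for invalid input.
fn wib_to_utc(hour: u32, minute: u32) -> Option<(i32, u32, u32)> {
    if hour > 23 || minute > 59 {
        return None;
    }
    let total = hour as i32 * 60 + minute as i32 - WIB_OFFSET_MIN;
    let day_shift = if total < 0 { -1 } else { 0 };
    let wrapped = total.rem_euclid(24 * 60) as u32;
    Some((day_shift, wrapped / 60, wrapped % 60))
}
```

For market hours the day shift is always 0 (09:00 WIB maps to 02:00 UTC on the same date). It only becomes -1 for WIB times before 07:00, which matters when a window starts before the session opens.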

5. Runbook A — Recover a missing minute bar

# 1. Detect (QuestDB)
SELECT ticker, count() FROM idx_ohlcv
WHERE ts >= '2026-04-07T09:14:00.000Z'
  AND ts <  '2026-04-07T09:15:00.000Z'
GROUP BY ticker;

# 2. Dry-run
docker exec idxmdp-parser backfill ohlcv \
  --from 2026-04-07T09:14:00 --to 2026-04-07T09:15:00 --dry-run

# 3. Run
docker exec idxmdp-parser backfill ohlcv \
  --from 2026-04-07T09:14:00 --to 2026-04-07T09:15:00

# 4. Verify — both tiers should now match

6. Runbook B — Recompute HD/RSI for a full day

# 1. Detect gaps
SELECT count() FROM metrics_hd
WHERE ts >= '2026-04-07T02:00:00.000Z'
  AND ts <  '2026-04-07T09:30:00.000Z';

# 2. Dry-run (watch for "Warm-up complete: N bars replayed")
docker exec idxmdp-parser backfill metric \
  --engine hd --from 2026-04-07 --to 2026-04-08 --dry-run

# 3. Run
docker exec idxmdp-parser backfill metric \
  --engine hd --from 2026-04-07 --to 2026-04-08

# 4. Verify
SELECT symbol, count(), min(ts), max(ts)
FROM metrics_hd WHERE ts IN '2026-04-07'
GROUP BY symbol LIMIT 10;

Note: IN '2026-04-07' is QuestDB shorthand for the entire UTC day.

7. Runbook C — Restore ClickHouse from CSV

docker cp /home/testhink/dumps/idx_ohlcv-2026Q1.csv idxmdp-parser:/tmp/restore.csv
docker exec idxmdp-parser backfill restore /tmp/restore.csv --dry-run
docker exec idxmdp-parser backfill restore /tmp/restore.csv

# Then re-run Runbook B to rebuild dependent metrics

8. Runbook D — Repair HD/RSI after a metric-worker cold-start

Use this when an unexpected restart leaves metric-worker running with empty in-memory state. The symptom is HD value=0.0 (or RSI stuck at 50.0) for every bar of the affected day across all symbols. Root cause: metric-worker started before ClickHouse was ready to serve queries, so warm-up logged "warm-up failed: ... — starting cold" and the engine was never seeded.

1. Detect — confirm the cold-start case (not a real flat market):

-- QuestDB. zero_cnt ≈ total_cnt means the whole day is bogus.
SELECT count(*) AS total_cnt,
       sum(case when value=0 then 1 else 0 end) AS zero_cnt,
       count(distinct symbol) AS syms
FROM metrics_hd WHERE ts >= '2026-05-07';

# Cross-check metric-worker startup log
docker logs idxmdp-metric-worker --since 24h | grep -E "warm-up failed|warm-up: [0-9]+ bars replayed"

A warm-up failed: ... line on the most recent restart confirms the diagnosis.

2. One-command fix:

# today (UTC)
./scripts/repair-metrics.sh

# a specific past day
./scripts/repair-metrics.sh 2026-05-07

The script runs four steps. The fourth is easy to overlook: the first version of this runbook omitted it, which left the chart showing huge |delta| bars at the day boundary because ClickHouse still held the cold-start zeros while QuestDB had already been repaired.

Restarts idxmdp-metric-worker and waits for warm-up: N bars replayed (rebuilds in-memory state from ClickHouse).
Drops the bad day's partition in QuestDB metrics_hd and metrics_rsi.
Re-runs backfill metric --engine hd and --engine rsi for that day — both subcommands do their own warm-up before processing, so output values are continuous with the prior day. Writes to QuestDB only.
Runs sync-metrics-hd-to-ch.py and sync-metrics-rsi-to-ch.py to mirror the corrected rows into ClickHouse metrics_hd / metrics_rsi. Charts at 1D+ resolution query CH (not QDB), so without this step the histogram shows tall green+red bars at the boundary where CH has stale zeros while QDB is clean.

3. Verify — pick a known symbol and confirm continuity:

-- QuestDB. Today's first value should equal yesterday's last value
-- (first bar of a new day inherits prev-day frozen DMAPO).
SELECT ts, value FROM metrics_hd
WHERE symbol = 'PIPA' AND ts >= '2026-05-06' AND ts < '2026-05-08'
ORDER BY ts;

Idempotent: safe to re-run. Elapsed time: ~60s warm-up + ~25s HD backfill + ~25s RSI backfill (about 110s in total). Re-seeding refdata or the symbol map does not fix this; refdata is unaffected, and the problem is purely one of in-memory engine state.

9. Troubleshooting

Where to look first

docker logs --since 1h idxmdp-parser
docker logs --since 1h idxmdp-metric-worker
docker logs --since 1h idxmdp-ch-drain

# QuestDB console
# → http://localhost:19000

# ClickHouse client
docker exec -it idxmdp-clickhouse clickhouse-client --database market_data

# Prometheus (backfill + aggregator counters)
# → http://localhost:19090/graph?g0.expr=aggregator_bars_emitted_total
# → http://localhost:19090/graph?g0.expr=backfill_rows_written_total

Symptom	Likely cause	Fix
`Retrieved 0 ticks from QuestDB`	Passed WIB time as UTC	Subtract 7 hours
`Authentication failed`	Wrong `IDX__CLICKHOUSE__*`	`docker exec idxmdp-parser env | grep CLICKHOUSE`
`QuestDB ILP: connect refused`	Running on the host, not in the container	Use `docker exec idxmdp-parser…`
`No bars found for the given range`	Empty `idx_ohlcv` for window	Run `backfill ohlcv` first
`No hd output produced`	Warm-up window had no prior bars	Widen `--from`

Data Pipeline Flow

End-to-end flow from IDX ITCH feed to client dashboards and AmiBroker plugin

Full source: docs/DATA-PIPELINE-FLOW.md. The Data Pipeline page in the Overview group is a visual summary; this page contains the narrative.

Stage Map

IDX Exchange → RabbitMQ

Binary ITCH v5 feed published to two upstream AMQP exchanges (itchdata, idxdata).

Rust Parser (idxmdp-parser)

Zero-alloc decoder on 9 threads. Consumes from both exchanges with prefetch=500, emits into 9 crossbeam bounded channels for backpressure. 3.1M msg/s at <0.3µs per message.

QuestDB (hot)

6 parallel ILP writers over TCP (port 9009, NODELAY). Batch 500 msgs or 100ms flush. 7 tables: idx_ticks, idx_ohlcv, idx_snapshot, idx_orderbook, idx_index, idx_contracts, metrics_hd.

Redpanda (message bus)

Kafka-compatible single binary. 6 topics with 10 partitions each. Five are active: idx.ticks, idx.ohlcv, idx.snapshot, idx.index, metrics.hd. The sixth (orderbook) is currently disabled (82% of volume; will be re-enabled for M4).

Metric Worker (idxmdp-metric-worker)

Kafka consumer group metric-worker-hd. Consumes idx.ohlcv, runs HD 7-step pipeline + RSI-14, writes to QuestDB metrics_hd/metrics_rsi + Redis cache + ClickHouse archive.

ClickHouse Drain (idxmdp-ch-drain)

Consumer group ch-drain. Reads all topics, inserts into ClickHouse ReplacingMergeTree. Typically runs after market close (16:30 WIB).

Go API (idxmdp-api)

Single Fiber binary serving Admin / Portal / Data API on :18090. REST + WebSocket + SSE + TradingView UDF. Reads from QuestDB (recent), ClickHouse (history), Redis (latest metrics).

Clients

TradingView charting library (web), AmiBroker plugin (desktop), admin HTMX panel.
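The "batch 500 msgs or 100ms flush" rule of the QuestDB stage above can be sketched as a small batcher. This is a minimal illustration, not the db_writer code: the `Batcher` type is hypothetical, and it assumes the age limit is measured from the first row of the current batch.

```rust
use std::time::{Duration, Instant};

/// Collects rows and releases them when the batch is full or old enough.
struct Batcher {
    max_rows: usize,
    max_age: Duration,
    rows: Vec<String>,
    /// Arrival time of the first row in the current batch (assumption of this sketch).
    batch_started: Option<Instant>,
}

impl Batcher {
    fn new(max_rows: usize, max_age: Duration) -> Self {
        Batcher { max_rows, max_age, rows: Vec::new(), batch_started: None }
    }

    /// Adds a row and returns the batch to write if either limit has been reached.
    fn push(&mut self, row: String, now: Instant) -> Option<Vec<String>> {
        if self.rows.is_empty() {
            self.batch_started = Some(now);
        }
        self.rows.push(row);
        self.take_if_due(now)
    }

    /// Returns the pending batch when it is due; also meant to be called on a timer tick.
    fn take_if_due(&mut self, now: Instant) -> Option<Vec<String>> {
        // `None` here means the buffer is empty, so there is nothing to flush.
        let started = self.batch_started?;
        let full = self.rows.len() >= self.max_rows;
        let stale = now.saturating_duration_since(started) >= self.max_age;
        if full || stale {
            self.batch_started = None;
            Some(std::mem::take(&mut self.rows))
        } else {
            None
        }
    }
}
```

Passing `now` in explicitly keeps the policy deterministic under test; a writer loop would call `Batcher::new(500, Duration::from_millis(100))` and supply `Instant::now()`.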

Topic & Table Matrix

Data	Kafka topic	QuestDB table	ClickHouse table
Raw trades	`idx.ticks`	`idx_ticks`	`idx_ticks`
1m bars	`idx.ohlcv`	`idx_ohlcv`	`idx_ohlcv`
Snapshots	`idx.snapshot`	`idx_snapshot`	`idx_snapshot`
Orderbook	deferred	`idx_orderbook`	—
Index	`idx.index`	`idx_index`	`idx_index`
HD metric	`metrics.hd`	`metrics_hd`	`metrics_hd`

Retention

Tier	QuestDB	ClickHouse
Ticks / snapshots / orderbook	3 days	unlimited
1m OHLCV	14 days	unlimited
Metrics (HD / RSI)	14 days	unlimited

Partition TTL runs as a cron job after market close. See Operations Guide §5.

Parser Tech Spec

Rust ITCH parser — 9-thread pipeline, zero-alloc decoder, 3.1M msg/s

Full source: docs/PARSER-TECH-SPEC.md.

Architecture — 9 Thread Pipeline

RabbitMQ ──▶ Consumer (x2, prefetch=500)
                  │
                  ▼
           Parser threads (x1, zero-alloc decoder)
                  │
        ┌─────────┼──────────┐
        ▼         ▼          ▼
   Aggregator  ILP Writers  Kafka Producer
   (1m bars)   (x6, NODELAY) (async)
        │         │          │
        ▼         ▼          ▼
     idx_ohlcv  6 QDB tables Redpanda (6 topics)

Supported Message Types

Type	Name	Purpose
1	Trade	Price / volume / board (`RG` / `NG` / `TN`)
2	Snapshot	OHLC + volume + value per ticker
3	Orderbook Bid	Top-of-book bid update
4	Orderbook Ask	Top-of-book ask update
5	Index	IHSG and sector indices
6	Contract	Instrument metadata
9	Heartbeat / status	Feed health

Performance

Metric	Value	Notes
Parse rate	3.1M msg/s	Criterion bench, fat LTO + native CPU
Per-message	<0.3 µs	Zero allocations on the hot path
Live load	~12K msg/s	250x headroom vs live feed
Channel capacity	65K / 65K / 256–8K	Consumer → parser → writers

Board Filter

IDX has three boards: RG (Regular — real market price discovery), NG (Negotiated — off-market), TN (Tunai / Cash). Only RG ticks flow into idx_ohlcv. Filtering out NG/TN matches the Go reference implementation and prevents negotiated trades from distorting OHLCV.
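A minimal sketch of the board filter follows. The board codes come from the paragraph above; the `Board` enum and the function names are illustrative and do not claim to mirror the parser's actual types.

```rust
/// The three IDX boards named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Board {
    Regular,    // RG
    Negotiated, // NG
    Cash,       // TN
}

impl Board {
    /// Maps a two-letter board code to a board; unknown codes yield `None`.
    fn from_code(code: &str) -> Option<Board> {
        match code {
            "RG" => Some(Board::Regular),
            "NG" => Some(Board::Negotiated),
            "TN" => Some(Board::Cash),
            _ => None,
        }
    }

    /// Only regular-board trades contribute to OHLCV bars.
    fn feeds_ohlcv(self) -> bool {
        self == Board::Regular
    }
}

/// Decides whether a trade with this board code should reach the aggregator.
/// Unknown codes are rejected (a choice made for this sketch).
fn keep_for_ohlcv(code: &str) -> bool {
    Board::from_code(code).map_or(false, Board::feeds_ohlcv)
}
```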

Build Flags

# Cargo.toml release profile
[profile.release]
lto = "fat"
codegen-units = 1
panic = "abort"

# Build command
RUSTFLAGS="-C target-cpu=native" cargo build --release

Bench & Testing Guide

Criterion benchmarks, TSV history tracking, HD accuracy fixtures

Full source: docs/BENCH-GUIDE.md. All commands run from the repo root.

Run Benchmarks

# Full bench suite (from host, NOT container)
cd rust-workers && cargo bench

# Record a bench point with a note in docs/bench-history.tsv
./scripts/bench-record.sh "hd engine v2 with EMA fast-path"

# View historical trend
cat docs/bench-history.tsv

HD Accuracy Fixture

The HD engine has a 100% accuracy gate in CI: the Rust implementation is compared byte-for-byte against the Go reference across 606 tickers. The fixture lives at fixture/hd-accuracy.csv.

# Run the accuracy test locally
cd rust-workers && cargo test --release hd_accuracy

# If it fails, diff is written to /tmp/hd-diff.csv

Unit Tests

# Rust workers
cd rust-workers && cargo test --release

# Go API
cd go-api && go test ./...

# Go API with race detector
cd go-api && go test -race ./...

Replay Fixtures

# Replay a recorded ITCH dump into the live pipeline
docker exec idxmdp-parser replay /etc/idx-parser/fixture/itch-low26.txt

# Large fixture (10.3M messages)
docker exec idxmdp-parser replay /etc/idx-parser/fixture/full-day.txt

Project Status

Stage-by-stage progress snapshot — what's done, what's next, what's on hold

Full source: docs/PROJECT-STATUS.md. See also the Project Stages page for the detailed phase tree inside each stage.

Current Status

Completed

S1–S6, S9

Parser, Metrics, API, Auth, Monitoring

Next

S10

Client dashboard

On Hold

S7, S8, S11

Screener, Scraper, Payments

Stage Roster

Stage	Scope	Status
S1	Parser Backbone — Rust ITCH decoder, QuestDB ingestion, OHLCV aggregator	DONE
S2	Metric Pipeline — HD (7-step, 100% Go match), RSI-14, Redpanda, ClickHouse drain, backfill tool	DONE
S3	API — Go/Fiber single binary with Admin / Portal / Data API, 16 Postgres tables	DONE
S4	Telegram Bot — status alerts, admin commands	DONE
S5	TradingView charting, HD/RSI chart overlays	DONE
S6	RSI engine integration into metric worker + chart	DONE
S7	Screener	HOLD
S8	Broker scraper	HOLD
S9	Monitoring — Prometheus, Grafana, Alertmanager, Telegram, Instatus	DONE
S10	Client dashboard	NEXT
S11	Payments (Xendit)	HOLD

Recent Milestones

2026-04-08 — Operations / Monitoring guides expanded; backfill-guide quality fixes (TL;DR, RSI warm-up note, troubleshooting "where to look").
2026-04-07 — S2-B5 HD accuracy fixture at 100% match vs Go reference across 606 tickers.
2026-04-03 — S3 API DONE: 1 binary, 3 domains, 16 PG tables, HTMX admin, Xendit prep.
2026-04-02 — S2 backfill tool + CI/CD gate DONE.

Enterprise Code Review

Comprehensive E2E audit — 2026-04-12 — Branch: feat/ch-qdb-hybrid-query

This review covers the full platform: Rust workers, Go API, infrastructure, security, and end-to-end data pipeline. Findings are grouped by severity with reasoning and file references.

Audit Scope

Five review agents examined distinct domains in parallel. Each agent performed an independent deep read of all files in its domain.

Domain	Agent Focus	Files Reviewed
Rust Workers	Consumer, parser, aggregator, db_writer, engines, ch_drain, metric_worker	18 files
Go API	Handlers, middleware, DB clients, cache, audit, tier system, tests	25+ files
Infrastructure	Docker Compose (dev/staging/prod), Dockerfiles, schemas, monitoring	15 files
Security	OWASP Top 10, auth/authz, credential management, injection surfaces	All public endpoints + middleware
Data Pipeline	End-to-end trace from Kafka ingest to API response	13 files in data path order

Findings Overview

Critical

Fix before Monday trading

Important

Fix this sprint

Minor

Backlog

Total

Unique de-duplicated findings

Data Flow with Failure Points

The diagram below traces a single market tick from vendor Redpanda through every processing stage to the client API response. Red markers indicate where data can be silently lost.

Vendor Redpanda (port 19092, SASL SCRAM-SHA-256)

Two topics: itchdata (snapshots, orderbook, index) and idxdata (trade ticks). Two independent consumer threads with separate group IDs. If one topic crashes, the other keeps flowing.

consumer.rs → crossbeam channel (65,536 cap)

Backpressure via pause()/resume() on Kafka partitions. When the channel is full, the consumer pauses fetching and retries with 50ms sleep. Auto-commit is true (5s interval) — offset commits happen based on wall time, not processing completion.

Backpressure: correct

parser.rs → aggregator.rs

Pipe-delimited ASCII parsing. Trade ticks use blocking send (never dropped). Snapshots, orderbook, and index use try_send — dropped if channel full.

Snapshot/Index: Kafka published BEFORE drop check · OHLCV bars: silent let _ = try_send

db_writer.rs → QuestDB ILP (TCP)

ILP over TCP with reconnect-once-then-drop. If QuestDB is down and the second write attempt also fails, the entire batch is permanently lost. The caller always calls buf.clear() regardless of success.

CRITICAL: batch lost on double-write failure

kafka_producer.rs → Internal Redpanda (29092)

Fire-and-forget publish. No backpressure propagation from internal Redpanda back to the consumer. If internal Redpanda is full, messages are silently dropped by librdkafka’s internal queue.

ch_drain.rs → ClickHouse

Consumes from internal Redpanda. Batches 1000 rows, HTTP INSERT with 3 retries. Two critical bugs: (a) auto-commit fires before CH insert confirms — on crash, offset is advanced but rows never reach CH. (b) buf.rows.clear() runs unconditionally even when insert fails — rows are permanently discarded after 3 retries.

CRITICAL: auto-commit before insert · CRITICAL: rows cleared on failure

metric_worker.rs → HD/RSI Engines → QuestDB + Redis

Warm-up from ClickHouse on startup (entire history buffered into RAM, an OOM risk). auto.offset.reset=latest means fresh consumer groups skip all historical messages. Sink errors (QDB + Redis) are logged, but the offset is committed anyway.

OOM risk on warm-up · Stale-seeded metrics post-restart

Go API → Hybrid CH + QDB Queries → Client

Split at now-1h: ClickHouse for cold data, QuestDB for hot. QDB wins on overlap. MetricHistory incorrectly calls QueryOHLCV (price bars) for the CH window instead of QueryMetrics (HD/RSI values).

MetricHistory CH window returns wrong data type
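The hybrid read in the last stage can be illustrated with a short sketch of the split and merge. It is written in Rust only to stay consistent with the other examples here (the API itself is Go). The function names, the half-open ranges, and the `(timestamp, value)` row shape are assumptions.

```rust
use std::collections::BTreeMap;

/// Width of the hot window: data newer than now - 1h is read from QuestDB.
const HOT_WINDOW_SECS: i64 = 3600;

/// Splits a half-open range [from, to) of epoch seconds into a cold part
/// (ClickHouse) and a hot part (QuestDB) around the cut-off.
fn split_range(from: i64, to: i64, now: i64) -> (Option<(i64, i64)>, Option<(i64, i64)>) {
    let hot_cut = now - HOT_WINDOW_SECS;
    let cold = if from < hot_cut { Some((from, to.min(hot_cut))) } else { None };
    let hot = if to > hot_cut { Some((from.max(hot_cut), to)) } else { None };
    (cold, hot)
}

/// Merges cold and hot rows keyed by timestamp, sorted ascending.
/// When both sides hold the same timestamp, the hot (QuestDB) row wins.
fn merge_hot_cold(cold: &[(i64, f64)], hot: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let mut merged: BTreeMap<i64, f64> = cold.iter().copied().collect();
    // `extend` overwrites existing keys, which implements "QDB wins on overlap".
    merged.extend(hot.iter().copied());
    merged.into_iter().collect()
}
```

The merge is only meaningful when both sides return the same kind of row, which is exactly what the MetricHistory finding above violates.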

Severity Definitions

Level	Definition	SLA
CRITICAL	Data loss, security breach, or build failure. System is actively vulnerable or losing data.	Fix before next trading session
IMPORTANT	Reliability risk, correctness bug, or defense-in-depth gap. Not actively exploited but will cause incidents.	Fix within 1 week
MINOR	Code quality, latent risk, or operational improvement. Low probability of triggering.	Backlog

Data Pipeline Audit

End-to-end trace from Kafka ingest to API response — every failure point identified

C4: ch_drain Auto-Commits Before ClickHouse Insert

CRITICAL rust-workers/src/bin/ch_drain.rs:313-314

What happens: The ch_drain Kafka consumer uses enable.auto.commit = true with a 5-second interval. The consumer accumulates rows in memory and flushes to ClickHouse when the batch reaches 1,000 rows or 5 seconds pass. Since auto-commit fires on wall time (not on successful insert), there is a race window on every cycle.

Failure scenario:

Consumer polls and buffers 800 rows over 4.5 seconds
At 5.0 seconds, Kafka auto-commit fires — offsets for those 800 messages are committed to the broker
At 5.0 seconds, the flush timer also fires — HTTP INSERT to ClickHouse begins
ClickHouse returns 503 (overloaded). All 3 retry attempts fail
The 800 rows are discarded (see C5 below). The offsets are already committed. Those messages will never be replayed

Impact: Up to 1,000 rows per table per event are permanently lost from ClickHouse cold storage. The user sees gaps in historical charts that don’t appear in QuestDB (which got the data via direct ILP write).

Fix: Switch to enable.auto.commit = false and commit offsets manually after ch.insert_json_rows() succeeds. The metric_worker already does this correctly at worker/mod.rs:177.

C5: ch_drain Clears Row Buffer on Insert Failure

CRITICAL rust-workers/src/bin/ch_drain.rs:424-431

What happens: After the insert attempt (success or failure), buf.rows.clear() and buf.last_flush = Instant::now() execute unconditionally — they are outside the match arms.

match ch.insert_json_rows(&database, buf.table, &buf.rows).await {
    Ok(()) => { total_inserted += count; ... }
    Err(e) => { total_errors += count; ... }
}
// OUTSIDE the match — runs regardless:
buf.rows.clear();          // ← rows gone forever
buf.last_flush = Instant::now();

Combined with C4: This guarantees that any ClickHouse hiccup results in permanent data loss. The rows cannot be retried (cleared from memory) and cannot be replayed from Kafka (offset already committed).

Fix: Move buf.rows.clear() into the Ok(()) arm only. On failure, leave rows in the buffer for retry on the next flush cycle.
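Taken together, the C4 and C5 fixes amount to one rule: clear the buffer and release the offset only after a confirmed insert. The sketch below shows that shape with stand-in types. `RowSink`, `RowBuffer`, and `FlakySink` are hypothetical and replace the real ClickHouse client and Kafka consumer.

```rust
/// Hypothetical stand-in for the ClickHouse client; only success/failure matters here.
trait RowSink {
    fn insert(&mut self, rows: &[String]) -> Result<(), String>;
}

/// Rows waiting to be inserted, plus the highest Kafka offset they cover.
struct RowBuffer {
    rows: Vec<String>,
    last_offset: Option<i64>,
}

/// Attempts one insert. Returns the offset that is now safe to commit,
/// or `None` when nothing was inserted.
fn flush<S: RowSink>(sink: &mut S, buf: &mut RowBuffer) -> Option<i64> {
    if buf.rows.is_empty() {
        return None;
    }
    match sink.insert(&buf.rows) {
        Ok(()) => {
            // Success: only now drop the rows and hand the offset to the caller.
            buf.rows.clear();
            buf.last_offset.take()
        }
        // Failure: keep rows and offset untouched for the next flush cycle.
        Err(_) => None,
    }
}

/// Test double: fails a set number of times, then accepts rows.
struct FlakySink {
    failures_left: u32,
    stored: usize,
}

impl RowSink for FlakySink {
    fn insert(&mut self, rows: &[String]) -> Result<(), String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err("503 overloaded".to_string());
        }
        self.stored += rows.len();
        Ok(())
    }
}
```

A caller would commit the returned offset manually and skip the commit on `None`. A real implementation would also want an upper bound on the retained buffer so that a long ClickHouse outage cannot grow memory without limit.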

C6: db_writer Drops Entire Batch on QuestDB Double-Failure

CRITICAL rust-workers/src/db_writer.rs:148-156

What happens: The ILP writer attempts a TCP write. On failure, it reconnects and retries once. If the retry also fails, the function returns without error — and the caller’s macro always calls buf.clear() afterward.

fn flush(stream: &mut TcpStream, buf: &str, addr: &str, label: &str) {
    if let Err(e) = stream.write_all(buf.as_bytes()) {
        *stream = connect(addr, label);             // reconnect
        if let Err(e2) = stream.write_all(buf.as_bytes()) {
            tracing::error!("{}: retry write failed, {} bytes lost", ...);
            // ← returns normally, caller will .clear() the buffer
        }
    }
}

Impact: A QuestDB outage lasting more than one flush cycle (typically seconds) causes silent loss of ticks and OHLCV bars from the hot store. Since the parser doesn’t retry at the channel level, these rows are gone.

Fix: Return Result from flush(). On failure, skip buf.clear() so the batch is preserved for the next flush attempt. Add an idx_parser_db_writer_rows_lost_total counter.
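One possible shape for that fix is sketched below. It is generic over `std::io::Write` so it can be exercised without a socket. The `reconnect` closure and the `DeadStream` test double are assumptions of this sketch, and the counter is left out for brevity.

```rust
use std::io::{self, Write};

/// Writes the batch, reconnecting once on failure. The buffer is cleared
/// only after a successful write; on error it is left intact for the caller
/// to retry on the next flush cycle.
fn flush_batch<W, F>(stream: &mut W, reconnect: F, buf: &mut String) -> io::Result<()>
where
    W: Write,
    F: FnOnce() -> io::Result<W>,
{
    if buf.is_empty() {
        return Ok(());
    }
    if stream.write_all(buf.as_bytes()).is_err() {
        // First attempt failed: reconnect once and retry the whole batch.
        *stream = reconnect()?;
        stream.write_all(buf.as_bytes())?;
    }
    buf.clear();
    Ok(())
}

/// Test double that behaves like a dead connection.
struct DeadStream;

impl Write for DeadStream {
    fn write(&mut self, _data: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "connection lost"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Retrying the whole batch after a partial write can send some rows twice, so this approach relies on the destination tolerating duplicates. Whether that holds for the affected tables is not stated here and would need checking.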

I1: Aggregator OHLCV Bars Silently Dropped

IMPORTANT rust-workers/src/aggregator.rs:277-279

What happens: Completed OHLCV bars are sent via let _ = self.ohlcv_tx.try_send(cb). The let _ = pattern discards the Result — if the channel is full, the bar vanishes with no log, no metric, no alert.

Why it matters: A slow QuestDB writer causes the OHLCV channel to fill. Bars dropped here affect both QuestDB and ClickHouse (since the Kafka producer is downstream). Unlike tick drops (logged with a warning), OHLCV drops are completely invisible.

Fix: Log on drop and increment idx_parser_aggregator_ohlcv_drops_total.
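A hedged sketch of that fix follows. It uses `std::sync::mpsc::sync_channel` as a stand-in for the crossbeam bounded channel (both expose `try_send` with `Full` and `Disconnected` error variants) and a plain `AtomicU64` in place of the Prometheus counter. The function name is illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{SyncSender, TrySendError};

/// Sends without blocking. A full channel is counted and logged instead of
/// being discarded silently. Returns `true` when the bar was queued.
fn send_or_count<T>(tx: &SyncSender<T>, bar: T, drops: &AtomicU64) -> bool {
    match tx.try_send(bar) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            let total = drops.fetch_add(1, Ordering::Relaxed) + 1;
            eprintln!("aggregator: OHLCV channel full, bar dropped (total drops: {})", total);
            false
        }
        Err(TrySendError::Disconnected(_)) => {
            eprintln!("aggregator: OHLCV channel closed, bar dropped");
            false
        }
    }
}
```

Logging every drop can itself become noisy under sustained backpressure, so rate-limiting the log line while keeping the counter exact is a reasonable refinement.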

I3: Snapshot Published to Kafka Before QDB Drop Check

IMPORTANT rust-workers/src/pipeline.rs:251,272

What happens: The Kafka publish (kp.send_snapshot) fires before the try_send to the QuestDB channel. If the QDB channel is full, the snapshot is sent to Kafka (and eventually to ClickHouse via ch_drain) but never reaches QuestDB.

Impact: ClickHouse has snapshot/index rows that QuestDB doesn’t. Any query that reads from QuestDB exclusively will see gaps. The same pattern affects idx_index messages.

I4: Orderbook Depth Indexing Mismatch

IMPORTANT rust-workers/src/db_writer.rs:82 vs ch_drain.rs:212

Destination	Depth Index	Best Bid/Ask
QuestDB (ILP writer)	0-based (`enumerate()` starts at 0)	depth = 0
ClickHouse (ch_drain)	1-based (`i + 1`)	depth = 1

Any cross-database join or comparison on orderbook depth is off-by-one. Screener features that merge both sources will silently mis-label price levels.

I5: MetricHistory Returns Wrong Data Type from ClickHouse

IMPORTANT go-api/internal/handler/data.go:337-355

What happens: The MetricHistory handler (for HD/RSI values) calls ch.QueryOHLCV() for the ClickHouse time window. QueryOHLCV reads from idx_ohlcv (price bars: open, high, low, close, volume) — not from metrics_hd or metrics_rsi (metric values: value, mapo, direction).

Impact: For any request where the date range extends more than 1 hour into the past, the CH portion returns OHLCV price bars instead of HD/RSI metric values. The QDB portion (last hour) is correct. The merged response is structurally valid JSON but semantically wrong — the client chart displays price data where it expects metric indicator values.

Fix: Create ch.QueryMetrics(ctx, symbol, metric, from, to) that reads from metrics_hd/metrics_rsi and call it from MetricHistory.

I10: HDChart hotCut Computed Per-Goroutine

IMPORTANT go-api/internal/handler/hdchart.go:127

What happens: The HD chart handler spawns 6 goroutines (one per timeframe: 1m, 5m, 15m, 30m, 1h, 1d). Each goroutine computes hotCut = time.Now().Add(-time.Hour) independently. If goroutine scheduling crosses a second boundary, different timeframes use different split points.

Impact: The 6-timeframe chart response has inconsistent overlap boundaries. One timeframe may show a gap or duplicate bar at the split point while others are clean. Visible as chart glitches near the 1-hour mark.

Fix: Compute hotCut once before launching goroutines and pass it as a parameter.

Rust Workers Review

Consumer, parser, aggregator, db_writer, Kafka producer, engines, ch_drain, metric_worker

C2: Credentials in Committed config.toml

CRITICAL rust-workers/config.toml:6,12,34

The Kafka SASL password (bridge2025!) and Redis password (idxmdp_redis_dev_2026) are in config.toml, which is committed to git and COPY-ed into the Docker image (Dockerfile line 72). Anyone with repo access or image registry access can extract these credentials.

Why this matters: The Kafka credentials grant read access to the live IDX market data feed. The Redis password grants access to cached metrics, session data, and tier configuration.

Fix: Replace config.toml values with placeholders. Supply real credentials only via IDX__* environment variables at runtime. The settings loader already supports this — config.toml should be a template, not a credential store.

I6: Unsafe UTF-8 in kafka_producer.rs

IMPORTANT rust-workers/src/kafka_producer.rs:58-98

What happens: The producer uses unsafe { self.buf.as_mut_vec() } to write JSON directly into a String’s internal buffer via serde_json::to_writer. If serde_json encounters an IO error mid-write, the String may contain partial UTF-8, violating its invariant and causing undefined behaviour on any subsequent string operation.

Fix: Replace with serde_json::to_string(tick), which is safe and has negligible cost since the buffer is cloned anyway.

I8: metric_worker auto.offset.reset=latest

IMPORTANT rust-workers/src/worker/mod.rs:81

What happens: When the metric worker starts with a new consumer group (no committed offset), it only processes bars arriving after startup. The warm-up from ClickHouse compensates, but if ch_drain is also lagging, the warm-up produces incomplete state.

Impact: HD/RSI values appear correct after ~14 bars but are stale-seeded for the first few minutes post-restart. This is visible as a small “jump” in the metric chart immediately after parser restart.

I9: warmup.rs Buffers Entire CH History in RAM

IMPORTANT rust-workers/src/worker/warmup.rs:52-53

What happens: resp.text().await? buffers the complete ClickHouse response (all rows from idx_ohlcv) into a single String before line-by-line parsing begins. At 700 symbols × 330 bars/day × N days, this easily reaches hundreds of MB.

Fix: Add WHERE ts >= now() - INTERVAL 30 DAY to bound the warm-up query, or use reqwest::Response::bytes_stream() for streaming line-by-line processing.
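The streaming half of that fix boils down to processing one line at a time with a reused buffer. The sketch below shows the idea with `std::io::BufRead` rather than `reqwest`, so it stays self-contained. The function name and the callback shape are assumptions.

```rust
use std::io::{self, BufRead};

/// Feeds each non-empty line to `on_row` while holding only one line in
/// memory at a time. Returns the number of rows processed.
fn for_each_row<R: BufRead, F: FnMut(&str)>(mut reader: R, mut on_row: F) -> io::Result<usize> {
    let mut line = String::new();
    let mut rows = 0;
    loop {
        // Reuse the same allocation for every line.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        let trimmed = line.trim_end_matches(&['\r', '\n'][..]);
        if trimmed.is_empty() {
            continue;
        }
        on_row(trimmed);
        rows += 1;
    }
    Ok(rows)
}
```

Peak memory then depends on the longest line and on whatever state `on_row` keeps, not on the size of the full response.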

I21: String Keys in HD/RSI Engine Hot Path

IMPORTANT rust-workers/src/engine/hd.rs:117

What happens: The engine uses FxHashMap<String, HdTickerState>. The .entry(symbol.to_owned()) call allocates a new heap String on every bar (~1,400/s), even for tickers already in the map. The main parser correctly uses fixed [u8; 16] keys.

Fix: Use SmallVec<[u8; 16]> or a fixed-size array key, matching the aggregator’s pattern.
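The fixed-size array variant of that fix might look like the sketch below. It uses the standard `HashMap` in place of `FxHashMap`, and `TickerState` is a hypothetical placeholder for `HdTickerState`. The zero-padding scheme is an assumption.

```rust
use std::collections::HashMap;

/// Fixed-size, stack-allocated map key: the symbol bytes padded with zeros.
type SymbolKey = [u8; 16];

/// Builds a key without touching the heap; rejects empty or over-long symbols.
fn symbol_key(symbol: &str) -> Option<SymbolKey> {
    let bytes = symbol.as_bytes();
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    let mut key = [0u8; 16];
    key[..bytes.len()].copy_from_slice(bytes);
    Some(key)
}

/// Stand-in for per-ticker engine state (hypothetical; the real state is richer).
#[derive(Default)]
struct TickerState {
    bars_seen: u64,
}

/// Looks up or creates the ticker state and returns how many bars it has seen.
fn on_bar(states: &mut HashMap<SymbolKey, TickerState>, symbol: &str) -> Option<u64> {
    let key = symbol_key(symbol)?;
    // `entry` takes the 16-byte key by value: a copy, not a heap allocation.
    let state = states.entry(key).or_default();
    state.bars_seen += 1;
    Some(state.bars_seen)
}
```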

M1: parse_u32 Wrapping Overflow

MINOR rust-workers/src/parser.rs:403-408

The parser uses wrapping_mul/wrapping_add to avoid panics, but a 10-digit volume string like "5000000000" passes the length guard (len ≤ 10) yet overflows a u32 (max 4,294,967,295). The wrapping produces a silently wrong value (705,032,704 instead of 5,000,000,000).

Fix: Replace with checked_mul(10)?.checked_add(...)? to return None on overflow.
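The difference between the two behaviours is easy to reproduce. The sketch below is not the code from parser.rs. `parse_u32_wrapping` only mimics the described wrapping arithmetic and `parse_u32_checked` applies the suggested fix; both names are illustrative.

```rust
/// Parses an unsigned decimal field, returning `None` on any non-digit
/// byte or on overflow.
fn parse_u32_checked(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() || bytes.len() > 10 {
        return None;
    }
    let mut acc: u32 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        acc = acc.checked_mul(10)?.checked_add(u32::from(b - b'0'))?;
    }
    Some(acc)
}

/// Wrapping variant, kept only to reproduce the wrong value from the finding.
fn parse_u32_wrapping(bytes: &[u8]) -> u32 {
    bytes.iter().fold(0u32, |acc, &b| {
        acc.wrapping_mul(10).wrapping_add(u32::from(b.wrapping_sub(b'0')))
    })
}
```

For the input "5000000000", the wrapping version returns 705,032,704 while the checked version returns `None`, so the caller can reject or log the field instead of storing a wrong volume.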

M3: Pipeline Comment Says “RabbitMQ”

MINOR rust-workers/src/pipeline.rs:19

Comment says “Raw bytes from RabbitMQ consumer” — should say Kafka/Redpanda after the migration.

M4: Redis KEYS Command in Ops Dashboard

MINOR rust-workers/src/bin/ops.rs:277

KEYS last:hd:* is O(N) and blocks Redis. Safe at current scale (~2,800 keys) but runs every 10 seconds on the ops dashboard. Replace with SCAN for non-blocking enumeration.

Go API Review

Handlers, middleware, DB clients, cache, audit, tier system, tests

C9: Redis Key Injection via Unvalidated metric Param

CRITICAL go-api/internal/handler/data.go:281

key := "last:" + metric + ":" + symbol

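symbol is validated by ValidTicker (regex ^[A-Z0-9]{1,10}$), but metric is taken directly from c.Query("metric") with zero validation. An attacker can supply arbitrary text, for example ?metric=../../session, to probe Redis keys outside the intended last:hd:* and last:rsi:* namespaces. Redis keys are flat strings, so ../ carries no traversal meaning and the last: prefix always remains; what becomes reachable is any key of the form last:<anything>:<SYMBOL>. Session tokens or rate-limit buckets are exposed only if their keys happen to match that shape.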

Fix:

var validMetrics = map[string]struct{}{"hd": {}, "rsi": {}}
if _, ok := validMetrics[metric]; !ok {
    return c.Status(400).JSON(models.Err("invalid metric"))
}

C10: Silent Date Parse Failure Bypasses Tier Limits

CRITICAL go-api/internal/handler/data.go:322-323

from, _ := time.Parse("2006-01-02", fromStr)
to, _   := time.Parse("2006-01-02", toStr)

Parse errors are silently discarded. When time.Parse fails, it returns zero-time (year 0001-01-01). The history-depth clamp on line 327 clamps this to earliest = now - HistoryDays, which works for non-enterprise tiers. But enterprise tiers with IsUnlimitedHistory() == true skip the clamp entirely, passing year-0001 to the database and returning the entire ClickHouse history.

Compare with: OHLCVCached (lines 169-175) correctly returns HTTP 400 on parse failure.

Fix: Return 400 on bad date, matching the existing pattern.

I2: indexCache Thundering Herd

IMPORTANT go-api/internal/handler/udf.go:56-75

What happens: The index name cache uses a read-unlock → check-freshness → re-lock-and-write pattern with no singleflight guard. When the 5-minute TTL expires, all concurrent TradingView chart loads simultaneously call qdb.Indices(ctx) instead of just one. TradingView fires multiple /udf/symbols and /udf/search requests per chart load.

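Fix: Use golang.org/x/sync/singleflight to deduplicate concurrent refreshes. sync.Once alone is not enough, because it fires only once per process and the cache must refresh on every TTL expiry.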

I7: Tier Subscriber Has No Reconnect

IMPORTANT go-api/internal/config/tier_subscriber.go:41-44

When the Redis pubsub channel closes (Redis restart, network blip), the subscriber goroutine logs a warning and exits permanently. After this, no tier hot-reloads will be applied to this API instance until the process restarts. This is the only background job without reconnect logic — StartAuditReplay and StartKeyExpiry both have retry loops.

Fix: Wrap the subscribe/listen loop in an outer reconnect loop with exponential backoff.

I20: Admin Self-Promotion of Tier

IMPORTANT go-api/internal/handler/admin/users_edit.go:109-111

The admin UserEdit handler correctly gates role changes to superadmin-only. But tier changes have no such restriction. An admin can pass their own user ID and promote their account tier to “enterprise”, bypassing billing entirely.

Fix: Prevent admins from editing their own tier, or require superadmin role for tier changes.

Other Important Findings

ID	Finding	File
I17	`TokenRefresh` doesn’t verify API key is still active before issuing new JWT	`handler/auth.go:59-90`
I19	`generateSecureToken` ignores `rand.Read` error — zero-entropy token on failure	`handler/auth_session.go:440-443`
I6 (audit)	Partial JSONL write on `WriteByte('\n')` failure corrupts fallback file	`audit/async.go:117-135`

Minor Findings

ID	Finding	File
M5	UDF W/M resolutions silently alias to 1d instead of returning error	`handler/udf.go:17-29`
M6	CSRF cookie missing `Secure` flag	`middleware/csrf.go:29-35`
M7	Session cookie missing `Secure` flag	`handler/auth_session.go:83-90`

Security Audit

OWASP Top 10 — authentication, authorization, injection, credential management

C1: Telegram Bot Token in .env.example

CRITICAL .env.example:151

TELEGRAM_BOT_TOKEN=8715922974:AAHWx7cmM6WL1QD2CfMoqzZDwTPRVMA5a0s

.env.example is explicitly allowed through .gitignore (line 6: !.env.example), meaning it is committed and visible in git history. This token follows the exact structure of a real Telegram bot API token. Possession grants full bot control: receiving alert messages, sending to channels, enumerating chat IDs.

Immediate action: Revoke via @BotFather (/revoke), generate new token, replace in .env.example with CHANGE_ME_YOUR_BOT_TOKEN.

C3: /ops/* Endpoints Have Zero Authentication

CRITICAL go-api/cmd/server/main.go:177-180

Four routes registered before the JWT auth group:

app.Get("/ops/latency", handler.OpsLatencyPage())
app.Get("/ops/ranking", handler.OpsRankingPage())
app.Get("/ops/api/latency", handler.OpsLatencyAPI(qdb, ch, rdb, pool))
app.Get("/ops/api/ranking", handler.OpsRankingAPI(qdb))

/ops/api/latency returns: QuestDB table names and row counts, ClickHouse row counts, Redis DB size, PostgreSQL pool stats, Redpanda consumer group lag with topic names. This is a full infrastructure inventory that directly aids targeted attacks.

Fix: Wrap in SessionAuth + RequireSessionRole("admin").

C7: ClickHouse Has Empty Password

CRITICAL .env:31, docker-compose.dev.yml:98

The default ClickHouse user has no password. The network restriction in clickhouse-users.xml allows connections from all of 172.0.0.0/8, a range far wider than the Docker bridge subnets it is meant to cover. Any container on any bridge network on the host can query, insert into, or drop all market data tables without authenticating.

C8: Prometheus Metrics Exposed Externally

CRITICAL go-api/cmd/server/main.go:358

http.ListenAndServe(":2112", mux)

The listener binds to 0.0.0.0:2112 and the port is published via Docker. The /metrics endpoint leaks rate limit counters, session counts, active WebSocket connections, and audit buffer stats. For a financial SaaS, this is operational intelligence for an attacker.

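Fix: Bind to 127.0.0.1:2112 on the host side (or stop publishing the port) so that Prometheus scrapes via the internal Docker network only. Note that binding the process itself to 127.0.0.1 inside the container would also block scrapes from the Prometheus container.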

I15: /udf/history Bypasses All Auth and Tier Limits

IMPORTANT go-api/cmd/server/main.go:199-204

The TradingView UDF endpoints are entirely unauthenticated. /udf/history proxies to the same QuestDB and ClickHouse backends as the paid /v1/ohlcv endpoint, returning full OHLCV data with no tier enforcement, no rate limiting, and no history depth limits. Anyone who reverse-engineers the UDF URL gets free access to data that paying customers pay for.

Fix: Gate the UDF group behind SessionAuth, or apply the same tier-based limits from OHLCVCached to UDFHistory.
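
A rough sketch of the second option, clamping the requested start time to a per-tier history depth. The tier names and durations are hypothetical placeholders, and the actual limits in `OHLCVCached` are not shown in this review.

```go
package main

import (
	"fmt"
	"time"
)

// maxHistory holds per-tier history depth. Tier names and durations are
// hypothetical placeholders, not the platform's real tier config.
var maxHistory = map[string]time.Duration{
	"free": 30 * 24 * time.Hour,
	"pro":  365 * 24 * time.Hour,
}

// clampFrom moves a requested start time forward so it never reaches
// further back than the tier allows. Unknown tiers get the free window.
func clampFrom(tier string, from, now time.Time) time.Time {
	depth, ok := maxHistory[tier]
	if !ok {
		depth = maxHistory["free"]
	}
	earliest := now.Add(-depth)
	if from.Before(earliest) {
		return earliest
	}
	return from
}

// demoClamp requests five years of history as of 2026-04-12 and returns
// the clamped start date for the given tier.
func demoClamp(tier string) string {
	now := time.Date(2026, 4, 12, 0, 0, 0, 0, time.UTC)
	from := now.AddDate(-5, 0, 0)
	return clampFrom(tier, from, now).Format("2006-01-02")
}

func main() {
	fmt.Println(demoClamp("free")) // 2026-03-13
	fmt.Println(demoClamp("pro"))  // 2025-04-12
}
```

Clamping silently is one design choice; returning an error for out-of-range requests would make the limit visible to clients instead.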

I16: Rate Limiter Fails Open

IMPORTANT go-api/internal/middleware/ratelimit.go:39-42

When the Redis check fails, the rate limiter lets the request through. This is a documented design decision, but in a financial SaaS context where tier enforcement is revenue-critical, any Redis disruption (OOM, network partition, or key eviction under allkeys-lru) makes all rate limits disappear. A free-tier user becomes unlimited.
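
One way to avoid a fully open failure mode is to degrade to a conservative in-process limit when the primary check errors. The sketch below assumes that approach; `localLimiter`, `allow`, and `redisDown` are hypothetical names, and the fallback has no window reset, which a real implementation would need.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// localLimiter is a conservative in-process fallback: a fixed number of
// requests per key. Window reset is omitted to keep the sketch short.
type localLimiter struct {
	mu    sync.Mutex
	limit int
	seen  map[string]int
}

func newLocalLimiter(limit int) *localLimiter {
	return &localLimiter{limit: limit, seen: make(map[string]int)}
}

// Allow reports whether key is still under the fallback limit.
func (l *localLimiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[key] >= l.limit {
		return false
	}
	l.seen[key]++
	return true
}

// allow asks the primary (Redis-backed) check first. On error it degrades
// to the local limiter instead of allowing every request.
func allow(key string, primary func(string) (bool, error), fallback *localLimiter) bool {
	ok, err := primary(key)
	if err != nil {
		return fallback.Allow(key)
	}
	return ok
}

var errRedisDown = errors.New("redis unavailable")

// redisDown simulates a primary check that always fails.
func redisDown(string) (bool, error) { return false, errRedisDown }

func main() {
	fb := newLocalLimiter(2)
	for i := 0; i < 3; i++ {
		fmt.Println(allow("key-1", redisDown, fb)) // true, true, false
	}
}
```

Because the fallback counts per process, the effective limit scales with the number of API instances; with the single Go binary described here that is one.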

I18: Plugin Report Accepts Unauthenticated Data

IMPORTANT go-api/internal/handler/plugin.go:159-208

POST /v1/plugin/report is outside the JWT group. An unauthenticated attacker can write arbitrary strings into the reports table and inject content into structured log output. If logs are forwarded to a SIEM, this is log injection.
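
Moving the route behind authentication addresses the root cause. As a complementary measure, report fields can be stripped of control characters before they are logged or stored. The helper below is a hypothetical sketch of that sanitising step, not code from `plugin.go`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeField drops control characters (including CR and LF) and keeps
// at most max runes, so a value cannot forge extra log lines.
func sanitizeField(s string, max int) string {
	var b strings.Builder
	kept := 0
	for _, r := range s {
		if kept >= max {
			break
		}
		if unicode.IsControl(r) {
			continue
		}
		b.WriteRune(r)
		kept++
	}
	return b.String()
}

func main() {
	// The embedded newline is removed, so the forged suffix stays on one line.
	fmt.Printf("%q\n", sanitizeField("v1.2\nlevel=error msg=forged", 64))
}
```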

Other Security Findings

ID	Severity	Finding
I17	IMPORTANT	`TokenRefresh` does not check if API key is still active before issuing new JWT
I19	IMPORTANT	`generateSecureToken` ignores `rand.Read` error — potential zero-entropy token
M8	MINOR	Failed login attempts not audited (OWASP A09)
M6/M7	MINOR	CSRF + session cookies missing `Secure` flag

Infrastructure Review

Docker Compose, Dockerfiles, schemas, monitoring, networking

C11: go-api/Dockerfile References Go 1.25 (Does Not Exist)

CRITICAL go-api/Dockerfile:1

FROM golang:1.25-alpine AS builder

Go 1.25 does not exist (latest stable is 1.24.x as of April 2026). This causes docker build to fail with an image-not-found error. The CI job and dev compose api service both use this Dockerfile. The API image cannot be built.

Fix: Change to golang:1.24-alpine.

C12: Postgres Audit Log Partitions Only Cover Through 2025-06

CRITICAL schema/postgres.sql:91-95

Only two partitions exist: audit_log_2025_01 (Jan 2025) and audit_log_2025_06 (Jun 2025). Today is 2026-04-12. Any audit write with created_at ≥ 2025-07-01 fails with PostgreSQL's "no partition of relation ... found for row" error, because no partition covers that range. All current audit logging is therefore broken.

Fix: Create partitions for the current date range:

CREATE TABLE audit_log_2025_07 PARTITION OF audit_log
  FOR VALUES FROM ('2025-07-01') TO ('2026-01-01');
CREATE TABLE audit_log_2026_01 PARTITION OF audit_log
  FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
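
To keep this from recurring, partition DDL could be generated ahead of time instead of written by hand. The sketch below assumes monthly partitions named after the `audit_log_YYYY_MM` pattern, which is an alternative to the wider ranges above and must not be applied alongside them, since the ranges would overlap. `monthlyPartitionDDL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// monthlyPartitionDDL returns one CREATE TABLE statement per month in
// [from, to). from is expected to be the first day of a month.
func monthlyPartitionDDL(table string, from, to time.Time) []string {
	var out []string
	for m := from; m.Before(to); m = m.AddDate(0, 1, 0) {
		next := m.AddDate(0, 1, 0)
		out = append(out, fmt.Sprintf(
			"CREATE TABLE IF NOT EXISTS %s_%s PARTITION OF %s FOR VALUES FROM ('%s') TO ('%s');",
			table, m.Format("2006_01"), table,
			m.Format("2006-01-02"), next.Format("2006-01-02")))
	}
	return out
}

// demoPartitions generates DDL for July and August 2025.
func demoPartitions() []string {
	from := time.Date(2025, 7, 1, 0, 0, 0, 0, time.UTC)
	to := time.Date(2025, 9, 1, 0, 0, 0, 0, time.UTC)
	return monthlyPartitionDDL("audit_log", from, to)
}

func main() {
	for _, stmt := range demoPartitions() {
		fmt.Println(stmt)
	}
}
```

Running such a generator from a scheduled job a few months ahead would remove the dependency on someone remembering to extend the schema file.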

I11: Alertmanager Has No Active Receiver

IMPORTANT monitoring/alertmanager/alertmanager.yml:57

The telegram-ops receiver has no telegram_configs (commented out). Alerts fire in Prometheus, reach Alertmanager, and are recorded only in the in-memory UI — no Telegram, no email, no PagerDuty. A QuestDBDown or APIDown critical alert will silently expire after 5 minutes without operator notification.

I12: No Memory/CPU Limits on Any Container

IMPORTANT docker-compose.dev.yml (all 18 services)

No service has deploy.resources.limits. On a single-host deployment, a runaway ClickHouse query or Redpanda log storm can OOM the entire host and take down all 18 containers simultaneously.

I13: Parser Missing depends_on for Outbound Redpanda

IMPORTANT docker-compose.dev.yml:247

The parser publishes to idxmdp-redpanda:9092 but only depends on questdb. If outbound-redpanda is not yet healthy at parser startup, initial publish attempts fail silently.

Other Infrastructure Findings

ID	Severity	Finding
I14	IMPORTANT	`metric-worker` has no Docker health check
M9	MINOR	QuestDB ILP port 19009 exposed to 0.0.0.0 with no auth
M10	MINOR	Staging/prod compose references non-existent `Dockerfile.drain` and `target: runtime`
M11	MINOR	Production QuestDB volume mount path wrong (`/var/lib/questdb` vs `/root/.questdb`)
M12	MINOR	ClickHouse image tag `24.3` is floating — should pin to patch

Action Plan

Prioritised fix schedule with effort estimates

Before Monday Trading (Tonight)

These items are either actively losing data, actively exploitable, or blocking builds. Fix before 08:45 WIB Monday.

#	Action	Files	Est.
1	C1 Revoke Telegram bot token via @BotFather, replace with placeholder in `.env.example`	`.env.example`	2 min
2	C4+C5 Fix ch_drain: disable auto-commit, commit after successful insert, move `buf.rows.clear()` into Ok arm	`ch_drain.rs`	30 min
3	C3 Add auth to `/ops/*` — wrap in `SessionAuth + RequireSessionRole("admin")`	`main.go`	15 min
4	C9 Validate `metric` param — whitelist `{"hd","rsi"}`	`data.go:281`	5 min
5	C10 Return 400 on bad date in `MetricHistory`	`data.go:322`	5 min
6	C12 Create Postgres audit log partitions for 2025-07 through 2027-01	`postgres.sql`	10 min
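
Items 4 and 5 amount to validating query parameters before they reach the query layer. A minimal sketch follows, assuming a `YYYY-MM-DD` date format; `validateMetricQuery` is a hypothetical helper, not the handler code in `data.go`.

```go
package main

import (
	"fmt"
	"time"
)

// allowedMetrics is the whitelist named in action item 4.
var allowedMetrics = map[string]bool{"hd": true, "rsi": true}

// validateMetricQuery returns the HTTP status to respond with and the
// parsed date. The YYYY-MM-DD layout is an assumption about the API.
func validateMetricQuery(metric, date string) (int, time.Time) {
	if !allowedMetrics[metric] {
		return 400, time.Time{}
	}
	d, err := time.Parse("2006-01-02", date)
	if err != nil {
		// A bad date is a client error, not something to default silently.
		return 400, time.Time{}
	}
	return 200, d
}

func main() {
	status, d := validateMetricQuery("hd", "2026-04-10")
	fmt.Println(status, d.Format("2006-01-02"))
	status, _ = validateMetricQuery("hd; DROP TABLE ticks", "2026-04-10")
	fmt.Println(status)
	status, _ = validateMetricQuery("rsi", "2026-13-01")
	fmt.Println(status)
}
```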

This Week

Reliability and defense-in-depth improvements. Schedule across sprint.

#	Action	Files
7	C2 Scrub `config.toml` — placeholder values only, rotate `bridge2025!`	`config.toml`, `.env`
8	C6 Fix `db_writer` — return Result from flush, preserve buffer on failure	`db_writer.rs`
9	C7 Set ClickHouse password, restrict network access	`.env`, `clickhouse-users.xml`
10	C8 Bind Prometheus metrics to `127.0.0.1:2112`	`main.go`, `docker-compose`
11	C11 Fix Go version in Dockerfile (`1.25` → `1.24`)	`go-api/Dockerfile`
12	I1 Add drop counters to aggregator OHLCV channel	`aggregator.rs`
13	I4 Fix orderbook depth consistency (0-based everywhere)	`ch_drain.rs`
14	I5 Implement `ch.QueryMetrics()` for MetricHistory CH window	`clickhouse/client.go`, `data.go`
15	I7 Add reconnect loop to `SubscribeTierConfig`	`tier_subscriber.go`
16	I15 Add auth/tier limits to UDF endpoints	`main.go`, `udf.go`

Backlog

#	Action	Files
17	I6 Remove `unsafe` from kafka_producer.rs	`kafka_producer.rs`
18	I9 Bound warm-up query or stream response	`warmup.rs`
19	I10 Compute hotCut once before goroutines	`hdchart.go`
20	I11 Wire Alertmanager Telegram receiver	`alertmanager.yml`
21	I12 Add container resource limits	`docker-compose.dev.yml`
22	I21 Optimize engine hash map keys	`hd.rs`, `rsi.rs`
23	MINOR Remaining 12 minor findings	Various

Review methodology: 5 parallel agents, each with isolated context, ran for ~4-5 minutes each. Total analysis time: ~5 minutes wall clock. Cross-domain findings were de-duplicated manually. Confidence scores ranged from 80-100% for critical findings.