When Postgres isn't the right choice
published:
I've been spending most of my recent unemployment working on the Microwave Propagation prediction website.
The propagation map ingests HRRR weather every hour and re-scores the entire CONUS grid across many amateur radio microwave bands. That is 95,000 grid points * 23 bands * 19 forecast hours = ~41.5 million score values per chain. For a long time those lived in a Postgres table called propagation_scores.
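The volume math, as a quick sanity check (constants taken straight from the paragraph above):

```python
# Per-chain score volume: every CONUS grid point, scored for every
# band, for every forecast hour in the f00-f18 chain.
grid_points = 95_000
bands = 23
forecast_hours = 19

scores_per_chain = grid_points * bands * forecast_hours  # 41_515_000, ~41.5M
```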
The shape mismatch
Each hourly run did this:
DELETE FROM propagation_scores WHERE valid_time = $1;
INSERT INTO propagation_scores (lat, lon, band_mhz, valid_time, score, factors) ...
475,000 rows per forecast hour (at the time it was only 5 bands; at today's 23 bands it would be 2,185,000), funneled through four indexes (a UUID primary key, a uniqueness tuple, valid_time, and (band_mhz, valid_time)), plus a JSONB factors blob for the analysis hour. Scoring + upsert took about 4m 40s per forecast hour. Across 19 hours that's more than an entire hourly cron cycle, so we had to back off to once every 3 hours just to keep up.
I was originally running the Postgres server and the app on a Dell R630 running Proxmox, backed by some RAID5 SAS spinny drives. It quickly became obvious that IOPS were the primary issue, so the first mitigation was moving the database to an NVMe-backed server.
The problem wasn't Postgres. Postgres is an amazing database when you use it for what it's good at. The problem was trying to store ephemeral data that rotates once an hour: autovacuum couldn't keep up even on the faster NVMe, so ANALYZE never ran, and the bloat compounded every hour.
The map is a canvas heatmap. It wants one dense (lat, lon) -> score array for a single (band, valid_time). We were storing it as half a million row tuples with MVCC bookkeeping, TOAST pages, and B-tree indexes on columns we'd never query by. Every write rematerialized metadata the reader throws away.
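To put a rough number on that overhead (back-of-envelope; the per-row Postgres costs below are assumptions, not measurements):

```python
# Back-of-envelope, per (band, valid_time) slice: a dense uint8 grid
# vs. one Postgres row per cell. The per-row costs are rough
# assumptions (heap tuple header + line pointer, column payloads,
# four index entries), not anything I measured.
cells = 95_000

dense_bytes = cells * 1              # one uint8 score per cell

tuple_overhead = 23 + 4              # heap tuple header + line pointer (assumed)
payload = 4 + 4 + 4 + 8 + 2 + 16     # lat, lon, band_mhz, valid_time, score, UUID PK (assumed)
index_entries = 4 * 16               # four B-tree indexes at ~16 bytes/entry (assumed)

pg_bytes = cells * (tuple_overhead + payload + index_entries)
ratio = pg_bytes // dense_bytes      # roughly two orders of magnitude
```

Even with generous assumptions, the row store pays on the order of 100x the bytes of the dense array, and every one of those bytes is rewritten each hour.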
Before rewriting the store, we tuned what was there:
- DELETE + INSERT ... with no ON CONFLICT. The chain worker rewrites a full (valid_time, all bands) slice every hour, so conflict detection was pure waste. Skipping it shaved time off.
- Skip factors on forecast hours. factors is a ~200-byte JSONB per row describing the ten scoring components (rain, humidity, refractivity, etc.). We only show it on the analysis hour (f00) when the user clicks a cell. Passing factors: nil for f01–f18 skips a JSONB encode plus a toast write for roughly half the volume.
- UNLOGGED table. Writes bypass the WAL entirely. On unclean shutdown the table truncates — fine, because PropagationGridWorker rebuilds it from HRRR every hour. Durability was never the point; this was a cache with extra steps.
These helped, but the per-row overhead of a row-oriented store holding a dense numeric grid is a ceiling you can't tune past. Postgres still couldn't keep up with the hourly churn.
Enter the binary file
Each (band_mhz, valid_time) grid now lands in one file at /data/scores/{band}/{iso}.ntms, on a volume the pods mount over NFS:
magic : 4 bytes "NTMS"
version : 1 byte 0x01
band_mhz : 4 bytes uint32 LE
valid_time : 8 bytes int64 unix seconds LE
lat_min : 4 bytes float32 LE
lon_min : 4 bytes float32 LE
step_deg : 4 bytes float32 LE
n_rows : 2 bytes uint16 LE
n_cols : 2 bytes uint16 LE
scores : n_rows * n_cols bytes (uint8, 0–100; 255 = no-data)
33-byte header, dense uint8 array. A full CONUS grid serializes to ~93 KB per band per hour. A cell at (lat, lon) lives at byte offset 33 + row * n_cols + col, so a point lookup is arithmetic plus a single File.read/1. No index, no query planner, no lock manager.
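For illustration, here's a hypothetical reader for this layout sketched in Python's struct notation; the field order and widths come from the table above, but NTMS is this site's own format, so treat the code as a sketch rather than a reference implementation:

```python
import struct

# "<" = little-endian, no padding: 4s magic, B version, I band_mhz,
# q valid_time, f lat_min, f lon_min, f step_deg, H n_rows, H n_cols
HEADER = struct.Struct("<4sBIqfffHH")
assert HEADER.size == 33

def read_score(buf: bytes, lat: float, lon: float) -> int:
    """Point lookup: parse the header, then index the dense grid."""
    magic, version, band, t, lat_min, lon_min, step, n_rows, n_cols = \
        HEADER.unpack_from(buf, 0)
    assert magic == b"NTMS" and version == 1
    row = round((lat - lat_min) / step)
    col = round((lon - lon_min) / step)
    # row-major uint8 scores start right after the 33-byte header
    return buf[HEADER.size + row * n_cols + col]
```

A point lookup is two subtractions, a multiply, and one byte index, which is the whole appeal.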
Writes go through the temp-then-rename pattern:
# stage the payload in a uniquely named temp file in the same
# directory, then rename it into place; rename is atomic on a
# single filesystem, so readers see either the old file or the
# new one, never a partial write
tmp = path <> ".tmp." <> unique_suffix()
File.write!(tmp, binary, [:binary])
File.rename!(tmp, path)
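For comparison, the same pattern sketched in Python (os.replace is the atomic rename; putting the temp file in the destination directory keeps the rename on one filesystem, which is what makes it atomic):

```python
import os
import tempfile

def atomic_write(path: str, payload: bytes) -> None:
    # temp file in the same directory as the destination, so the
    # final rename never crosses a filesystem boundary
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())   # data on disk before the rename
        os.replace(tmp, path)      # readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```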
With Postgres off the hot path, the full f00–f18 chain dropped from ~170 min to ~45–60 min, so we cranked the cron back up to hourly. The 15-minute prune cron reaps expired files with rm, which is instant.
The f00 factor breakdown
One thing binary files don't handle well is the analysis-hour factor breakdown (the 10-component panel you see when you click a cell). Those factors are heterogeneous Elixir maps, not a numeric grid.
The solution was a second file format, ProfilesFile: one compressed ETF file per valid_time at /data/scores/profiles/{iso}.etf.gz, storing the enriched HRRR grid keyed by {lat, lon}. On a click, we reload the relevant cell's profile and rescore on demand. Same atomic-rename write pattern, different payload — ETF where heterogeneity matters, raw binary where density does.
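In Python terms the shape of that is something like the following, with pickle standing in for :erlang.term_to_binary/1 and gzip for the compression; the cell keys and factor values here are made up for illustration:

```python
# Heterogeneous per-cell profiles: serialize the whole map, compress
# it, and pay the decode cost only when a user clicks a cell.
import gzip
import pickle

profiles = {
    (39.0, -104.5): {"rain": 0.1, "humidity": 62.0, "refractivity": 315.2},
    (39.0, -104.0): {"rain": 0.0, "humidity": 58.5, "refractivity": 320.7},
}

blob = gzip.compress(pickle.dumps(profiles))        # write this atomically
restored = pickle.loads(gzip.decompress(blob))      # on click: reload, rescore
```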
Summary
The whole propagation project has been fun to write and tweak continuously. I continue to find and fix various bottlenecks and limitations. Each individual component in the stack is great at what it does, but sometimes when you start throwing millions of rows per hour at one and then deleting them, it gets a little fussy.