How I Tried to Break My Own Encrypted Journaling App — Ten Times
An authorized, ten-round penetration test of MoodHaven Journal — 65+ targets, 41 confirmed-and-fixed vulnerabilities, and the bespoke tooling it took to prove the encryption actually holds.
Contents
- The short version (read this part even if you read nothing else)
- Why bother attacking your own app?
- How the testing actually worked
- The headline findings (in plain language)
- The ninth round — when the custom attack tool started finding bugs by itself
- The tenth round — attacking the ninth round’s fixes, then a clean pass
- The tools I built to do this
- What the testing didn’t find (and why that counts)
- The ten rounds at a glance
- Lessons worth keeping
- Why this is a portfolio piece, not a postmortem
An authorized penetration test of MoodHaven Journal, run across ten iterative rounds with a Kali attack box, Windows and Ubuntu victims, and Claude Code as the orchestrator. 65+ targets tested; 41 real vulnerabilities confirmed through the seventh round, all 41 fixed; an eighth round that found the flagship encryption feature had never actually engaged — now verified working on the installed Windows build; a ninth round that turned a custom attack tool on the app and found six more; and a tenth round that red-teamed the ninth round’s fixes themselves, found a bug in each one, fixed both, and then — after an independent verification hunt came back clean — closed the campaign.
The short version (read this part even if you read nothing else)
I build MoodHaven Journal, a private journaling app that keeps everything on your own computer — no accounts, no cloud, no servers reading your entries. The whole point of the app is privacy, so “trust me, it’s secure” was never going to be good enough. I needed to actually try to break it.
So I built a small attack lab — a dedicated attacker machine running Kali Linux, plus two “victim” machines (a Windows 11 PC and an Ubuntu PC) running the real, installed app — and ran a structured penetration test against it, the same kind of adversarial testing a security firm would do. I did it not once but ten times in a row, fixing what I found between each round and then attacking the fixed version again — and the final round attacked the previous round’s fixes specifically. And to do it properly I had to build my own instrumentation — including a from-scratch reimplementation of the app’s own encrypted sync protocol — because off-the-shelf scanners simply can’t speak to a custom local-first app like this. (There’s a whole section on the tools further down; building them turned out to be half the work.)
Here’s the part that matters:
- 65+ specific attacks were attempted. Each one was a real probe, not a checklist tick.
- 41 of them found a genuine vulnerability through the seventh round. Some were serious (a way to read your data, a way to lose your edits silently). Some were small.
- All 41 were fixed, with the code changes tied to specific releases and pull requests.
- The eighth round found the big one: a flagship “encrypted at rest” feature that had never actually engaged. That fix is now verified working end-to-end on the installed Windows build — fresh setup completes the migration, the database is genuinely encrypted on disk, the app unlocks cleanly, and the sync server starts. (Re-validation on Linux is still pending, and I say so wherever it comes up.)
- The ninth round turned my own custom attack tool on the app and found six more issues — including one that surfaced live during the attack, not in code review. All six are fixed and committed, proven by standalone reproductions and regression tests.
- The tenth round did the thing that finally let me stop: it red-teamed the ninth round’s fixes themselves — because a fix is new, untested code — and found a real bug in each of the two trickiest ones. Both were fixed, and then an independent verification hunt came back clean (no new high-severity finding, every happy path intact). That’s what closed the campaign.
- The remaining attacks failed — which is its own kind of good news. They proved that defenses I’d designed actually held up under attack.
The most interesting finding wasn’t a single bug. It was a pattern: round after round — including the most serious findings — uncovered a new problem that an earlier round’s fix had accidentally introduced. The eighth round delivered the sharpest example of all: a flagship “encrypted at rest” feature, added to fix an earlier round’s “your database is readable” finding, that had never actually engaged in any build, on any operating system. The ninth round kept the pattern alive — the very fixes for the eighth round’s encryption work introduced two new data-loss-class bugs of their own. And the tenth round proved the pattern was the method, not an accident: I deliberately attacked the ninth round’s two fixes, and each one had a bug hiding in it. Fixing a security issue is itself a code change, and any code change is new ground that hasn’t been attacked yet. That insight — that you can’t fix your way to “done” in one pass, and that a fix’s own existence is not evidence it works — is the spine of this whole project.
It also tells you how a project like this ends honestly. You don’t reach zero bugs; you converge. The externally-reachable attack surface — what someone on your network or hitting your app from outside can touch — went to zero. The bugs that remained were local-access and lockout-class: they need a thief who already has your unlocked machine or a device you once trusted, and the worst they do is deny you access, not hand your journal to anyone. And the one invariant the whole app exists to protect — that nobody, in any round, could read journal content they weren’t meant to — held across all ten rounds. When the tenth round’s fresh hunt for new high-severity bugs came back empty while the happy paths still worked, that was the signal the loop had terminated, not a failure to keep finding things.
If you’re a hiring manager or a non-technical reader, you can stop here with a fair picture: this was rigorous, honest, finish-the-job security work, done with a modern AI-assisted workflow on bespoke tooling I built for the job — including the humbling discovery that a feature I’d shipped and believed in was silently inert, and a final round that proved the test loop terminates when you keep attacking your own fixes until a clean pass comes back. The rest of this post is the detailed version for people who want to see the actual vulnerabilities, the tools, and how the testing worked. The plain-language story stands on its own; the deep-dives below are folded away behind “For the technically inclined” toggles, so you can open exactly as much depth as you want and no more.
Why bother attacking your own app?
I designed MoodHaven to be secure from day one: every journal entry is encrypted on your device with AES-256-GCM before it ever touches disk, the encryption key is derived from your password and never stored, and there’s no server in the middle that could be breached.
But there’s a trap in that sentence. I designed it to be secure. Reading your own code and nodding along is not the same thing as attacking it. You see what you intended to build, not what you actually built. The gap between those two is exactly where vulnerabilities live.
After I shipped a major peer-to-peer sync feature — letting two of your own devices exchange entries directly over your home network, with no cloud — that gap started to bother me. Sync means a network protocol, a pairing handshake, cryptographic key exchange — a lot of new surface. I’d reasoned about it carefully. I hadn’t attacked it.
So I decided to treat my own app the way an external security auditor would: assume nothing, try everything, and only believe a defense works after I’ve failed to break it.
How the testing actually worked
This is the part I’m most proud of from an engineering standpoint, so I want to be concrete about the setup.
The lab
My laptop (Linux) — Claude Code, acting as the test orchestrator
│
├── attacker: Kali Linux — the standard penetration-testing toolkit
├── victim 1: Windows 11 — the real app, installed from the production installer
└── victim 2: Ubuntu — same app, different OS, to catch platform-specific bugs
The attacker and victim machines share a local network, because a lot of the most interesting tests target the peer-sync feature, which is a real network protocol. I drove everything from one terminal on my laptop.
A few deliberate choices here matter:
- I tested the installed app, not a dev build. The thing I attacked is the same artifact a real user downloads. Several of the most important findings only exist in the packaged build (file permissions, encryption-at-rest, OS-level key storage).
- Two operating systems, on purpose. Some of the worst bugs in this whole campaign were Windows-only — for example, a file-locking quirk that silently broke the database encryption migration on Windows while working perfectly on Linux. If I’d only tested one OS, I’d have shipped a serious bug believing it was fixed.
Why use an AI orchestrator (Claude Code)
The novel part of this workflow is that I used Claude Code as the orchestrator running the campaign. In plain terms, that meant the AI could:
- Read the entire codebase and flag plausible weak points before any machine was touched.
- Connect to the attacker and victim machines, write small exploit scripts, run real attack tools, and read back the results.
- Run several independent investigations in parallel — for example, one thread auditing the network protocol while another checked how encryption keys are wiped from memory and a third reviewed access-control rules — then pull the findings back together.
- Write the actual fix, in the same session, while the full context of the bug was still in view.
What it could not do is just as important, and I’ll be honest about it:
- It can’t click around a remote app’s window. Tests that needed the app to be in a specific on-screen state (a pairing dialog open, the lock screen showing) had to be set up by hand. A couple of tests are still marked “deferred” for exactly this reason.
- It refuses to actually steal secrets, even in authorized testing. When a test would have involved exfiltrating a real encryption key, the safety system blocked it regardless of my stated intent. That’s a sensible guardrail — but it means “the key is sitting in memory” had to be confirmed by other means rather than by walking out the door with it.
So this was not “AI does security for you.” It was AI as a tireless, fast, well-read collaborator that kept attack state straight across multi-day sessions, with a human keeping it pointed at the right targets and making the judgment calls.
For the technically inclined — the method, one sentence per step
- Read first. Static analysis of the code to generate candidate weaknesses.
- Actually attack each candidate — a real exploit attempt, not just “the code looks risky.”
- Watch the wire. Capture real network traffic; it reveals things code review never will.
- Look in memory. Dump the running process in locked, unlocked, and just-locked states and search for key material.
- Distinguish “the risky pattern exists” from “the attack actually works.” Many candidates were already defended.
- Fix in the same session, while the context is warm.
- Run the whole thing again against the fixed build — because fixes introduce new code, and new code hasn’t been attacked.
The lab connectivity used Tailscale for remote SSH from the laptop to both victims, with the two victims also sharing a 192.168.1.0/24 LAN adapter for the direct network attacks the peer-sync tests required.
The headline findings (in plain language)
You don’t need to read code to understand what was at stake. Here are the most significant confirmed issues, described for a general reader, with the technical detail tucked into a toggle for anyone who wants it.
Your data was encrypted — but the story around it wasn’t
The journal entries themselves were always properly encrypted. But when I copied the app’s database file off the victim machine and opened it with a standard database tool, I could still read a lot: which days you wrote, your daily mood scores, and — most sensitively — your tag names. Tags like “therapy,” “anxiety,” or a person’s name are themselves revealing, even if the entry text is scrambled. Someone who stole the database could reconstruct a detailed behavioral profile without decrypting a single word.
There’s a sting in the tail here, and it earns its own section below. Several rounds later, the eighth round of testing discovered that this very fix — the database encryption everyone, including earlier pentest rounds, believed was working — had never actually engaged (and has since been fixed and verified). See “The flagship encryption feature that never turned on” at the end of this section.
For the technically inclined — what sat in the plaintext SQLite file, and the offline-cracking path it opened
Entry content was AES-256-GCM ciphertext, but the surrounding metadata sat in a plain, readable SQLite file:
- books table: names, colors, emojis, creation dates
- tags table: all tag names (can reveal sensitive categories like “therapy,” “anxiety”)
- entry_tags: which entries carry which tags
- settings table: all preference keys and values
- journal_entries: mood score (1–5), creation/update timestamps, privacy mode, book assignment, pin status
Worse, the settings table held the password_hash and password_salt rows in plaintext. Those are readable with nothing more than file access, and they enable an offline dictionary attack: the hash is PBKDF2-HMAC-SHA256 with 600,000 iterations — slow by design, impractical against a strong unique password, but crackable against common or short passphrases with a GPU and hashcat. The hash parameters are stored alongside the hash, so a targeted attack needs no guessing about the KDF.
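To make the offline path concrete, here is a minimal Python sketch of what file access alone allows. Only the KDF and its iteration count come from the finding above; the function name, the idea of a candidate list, and treating the stored hash as raw bytes are assumptions for illustration.

```python
import hashlib
import hmac

ITERATIONS = 600_000  # from the finding; the storage format around it is assumed


def offline_guess(stored_hash: bytes, salt: bytes, candidates, iterations: int = ITERATIONS):
    """Return the first candidate whose PBKDF2 output matches, else None.

    Nothing here talks to the running app, so no lockout file can slow it down.
    """
    for candidate in candidates:
        derived = hashlib.pbkdf2_hmac(
            "sha256", candidate.encode("utf-8"), salt, iterations, dklen=len(stored_hash)
        )
        if hmac.compare_digest(derived, stored_hash):
            return candidate
    return None
```

The only cost to the attacker is the KDF work per guess, which is why a strong unique password holds and a common one does not.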
Fix: full database encryption with SQLCipher, keyed from the user’s password. With the whole file encrypted at rest, the metadata leak and the offline-cracking path close at once. Shipped in v1.7.1. (Read on for the sting: several rounds later, this fix turned out never to have engaged.)
A single root cause hiding behind three “separate” problems
Three of the findings looked like three independent tasks: the readable database, the exposed password hash, and a way to reset the app’s brute-force lockout by deleting a file. But all three had one root cause — the database wasn’t encrypted at rest. Fixing that one thing resolved all three.
For the technically inclined — why the lockout-bypass finding is less critical than it looks
The lockout (5 failed password attempts → 30-second ban) is persisted in pw_lockout.json in the same AppData directory as the database. Deleting the file resets the counter — confirmed by writing a fake lockout valid until 2099 and then removing it in one command.
But this finding is less critical than it looks in isolation, and the calibration matters. The lockout only protects against online brute force — guessing passwords through the running app. An attacker with file access doesn’t need the app at all: they take the plaintext password hash and crack it offline, on their own machine, with no rate limit possible no matter what the lockout file says. Deleting pw_lockout.json adds nothing to an attacker who already holds the database.
So all three findings collapse into one: encrypt the database (SQLCipher), and the hash is no longer readable, which kills both the offline-crack path and the relevance of the lockout file. The priority list that looked like three urgent tasks was one. Always ask which findings share a root cause before scheduling them as independent work.
Edits that silently disappeared
The sync feature decides which version of an edited entry “wins” by comparing timestamps. The comparison treated the timestamp as plain text — and in text, "9999-12-31" sorts as larger than any real date. A compromised device could stamp an entry with a far-future date so that every future edit you made would silently lose and never survive a sync. The app looked fine; your changes just wouldn’t stick.
For the technically inclined — lexicographic compare on updated_at, and the future-timestamp guard
Last-write-wins used updated_at > local.as_str() — a lexicographic string compare, not a date compare (sync.rs, conflict.rs). "9999-12-31" beats "2026-06-04T10:00:00Z" because '9' > '2'. A compromised trusted peer (not an arbitrary attacker) sends an entry with updated_at: "9999-12-31"; every later edit loses permanently. Impact: silent, permanent data loss.
Fix: parse updated_at as RFC 3339 before comparing, and reject any value more than MAX_FUTURE_SECS ahead of local time — which also closes the future-date poisoning path. Shipped in commit 3cd3a60.
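The difference between the two comparisons is easy to see in a few lines. This is a Python sketch of the idea, not the app's Rust code; the value of MAX_FUTURE_SECS and the function names are assumptions, since only the constant's name is given above.

```python
from datetime import datetime, timedelta, timezone

MAX_FUTURE_SECS = 300  # hypothetical value; only the constant's name is known


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp and insist on an explicit UTC offset."""
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no timezone: {value!r}")
    return parsed


def remote_wins(remote: str, local: str, now: datetime) -> bool:
    """Last-write-wins on parsed instants, rejecting far-future remote values."""
    try:
        remote_ts = parse_rfc3339(remote)
    except ValueError:
        return False  # malformed or date-only input never wins
    if remote_ts - now > timedelta(seconds=MAX_FUTURE_SECS):
        return False  # future-date poisoning attempt
    return remote_ts > parse_rfc3339(local)


# The original bug in one line: as plain strings, the poisoned value sorts higher.
poisoned_sorts_higher = "9999-12-31" > "2026-06-04T10:00:00Z"
current_time = datetime.now(timezone.utc)  # what a caller would pass as `now`
```

Passing `now` in explicitly keeps the guard testable without touching the system clock.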
Secrets leaking over the local network
This one I’d never have caught by reading code — I only saw it by capturing the actual network traffic with Wireshark. The device-discovery feature was broadcasting each device’s full public encryption key across the local network every 30 seconds. Because the sync encryption key is derived from both devices’ public keys, anyone passively listening on the same network could collect both keys and decrypt all the sync traffic — with no pairing and no password.
Exhibit 2: The public key is off the wire, both at discovery and in the sync stream. Two captures from the fixed build: the mDNS service record it broadcasts on the LAN (only a device ID, name, type, and version; no public key and no key hint, where the vulnerable build carried a pubkey_hint= field and the UDP fallback broadcast the full Ed25519 key), and a live capture of an actual sync connection showing the plaintext handshake giving way to opaque AES-256-GCM frames the instant the ECDH key is established. (Anonymized: device IDs and IPs redacted, with real LAN IPs rewritten to RFC 5737 documentation addresses.)
View raw records — mDNS TXT (fixed vs. vulnerable) + on-wire handshake→ciphertext
# FIXED build (current):
_moodhaven._tcp.local TXT
device_id=d2be…1240f device_name="Study PC" device_type=desktop version=1.8.0
# no public_key, no pubkey_hint
# VULNERABLE build (pre-PT3, for contrast):
_moodhaven._tcp.local TXT
device_id=d2be…1240f pubkey_hint=<first 8 chars of pubkey> # ← public-key prefix leaked
And here is the other half of “watching the traffic” — a capture of the fixed build’s actual sync connection, so you can see exactly where plaintext stops and ciphertext begins:
# Live sync capture, fixed v1.8.0 build. attacker 192.0.2.10 ↔ victim 192.0.2.20, TCP port 44950.
# 17 packets, ~42 KB total. Frames shown by role, not raw bytes.
# ── plaintext handshake (by design — no secret in it) ──
→ Hello { did: d2be…1240f, eph_pub: <X25519 ephemeral pub> }
← Ok { name: <redacted>, eph_pub: <X25519 ephemeral pub>, challenge: <32-byte challenge> }
→ Auth { signature: <Ed25519 sig over "moodhaven-hello-auth-v1:"||challenge> }
# device IDs + per-connection ephemeral X25519 keys are visible here, and nothing else;
# the static public key is NOT on the wire (that was the leak this section is about)
# ── after the ECDH key is established, every frame is opaque ──
← [4-byte len][12-byte nonce][AES-256-GCM ciphertext] # manifest, entries, etc.
→ [4-byte len][12-byte nonce][AES-256-GCM ciphertext]
... (all remaining frames identical in shape; payload unreadable without the session key)
A passive listener now sees the handshake metadata and a stream of indistinguishable ciphertext blocks — and, the point of this whole section, never the static public key needed to derive the session key. (The raw pcap was captured on the attacker box, which is a party to the connection, then run through the anonymization pipeline below.)
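For readers who want to walk a capture like this themselves, the post-handshake frame shape can be split apart with a few lines of Python. This is a sketch under stated assumptions: the capture shows a 4-byte length and a 12-byte nonce, but the byte order and whether the length counts the nonce are guesses, and split_frame is a hypothetical helper name.

```python
import struct

NONCE_LEN = 12  # from the capture: every frame carries a 12-byte nonce


def split_frame(buf: bytes):
    """Split one [len][nonce][ciphertext] frame off the front of buf.

    Assumes a big-endian length that counts nonce + ciphertext; the real
    protocol's byte order and length semantics are not stated in the capture.
    Returns (nonce, ciphertext, remaining_bytes).
    """
    if len(buf) < 4:
        raise ValueError("incomplete length prefix")
    (body_len,) = struct.unpack(">I", buf[:4])
    if body_len < NONCE_LEN:
        raise ValueError("frame too short to hold a nonce")
    if len(buf) < 4 + body_len:
        raise ValueError("incomplete frame")
    body = buf[4:4 + body_len]
    return body[:NONCE_LEN], body[NONCE_LEN:], buf[4 + body_len:]
```

Splitting frames is as far as a passive listener gets: without the session key, the ciphertext portion stays opaque.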
For the technically inclined — the UDP broadcast, the mDNS hint, and the QR/PIN coupling
The UDP discovery fallback (run_udp_discovery) broadcast to 255.255.255.255:4243 every 30 seconds with the full Ed25519 public key in the payload. The sync transport key is SHA-256("moodhaven-sync-v1:" + sorted(pubKeyA, pubKeyB)) — both keys are required, and both were now on the wire. Any LAN host that captured one probe from each device could precompute the transport key and decrypt all sync traffic, with no pairing and no authentication.
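The weakness is visible in the derivation itself: every input is public. A minimal Python sketch of the formula quoted above, assuming the two keys are sorted as byte strings and concatenated directly (the exact encoding is not stated, and transport_key is a hypothetical name):

```python
import hashlib


def transport_key(pub_a: bytes, pub_b: bytes) -> bytes:
    """SHA-256("moodhaven-sync-v1:" + sorted(pubKeyA, pubKeyB)).

    Assumes byte-wise sorting and plain concatenation of the two keys.
    There is no secret input, so anyone holding both public keys gets the same result.
    """
    first, second = sorted((pub_a, pub_b))
    return hashlib.sha256(b"moodhaven-sync-v1:" + first + second).digest()
```

Sorting makes the result independent of which side computes it, which is convenient for the two peers and equally convenient for an eavesdropper who captured one broadcast from each.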
The mDNS service TXT record made it even easier: it carried pubkey_hint= — the first 8 characters of the base64url public key — passively delivered to every device on the LAN. The documented security model explicitly said the public key is only shared during pairing; the discovery path violated that invariant.
A related finding from the same round: the pairing QR code embedded the PIN, defeating the “read the PIN off the other screen separately” out-of-band design — a screenshot would capture both at once.
Fixes: strip public_key and pubkey_hint from all discovery payloads (now only device_id, device_name, device_type, version — enough for detection, insufficient for key derivation), and remove the PIN from the QR so it must still be typed by hand. Shipped in PR #122.
The “fixed it, then broke it” problem — twice critical
This is the most important pattern in the whole campaign, so it gets its own spotlight.
- When I shipped the database encryption fix above, the next round found that the encryption migration could fail on Windows in a way that left the app silently running on the old, unencrypted database — so a user who upgraded believing their data was now encrypted might not actually be. A critical bug, living inside the fix for an earlier critical bug.
- The recovery code I then added to handle that failure had its own critical bug: under a precise crash timing, it could lock the user out of their database entirely, requiring a factory reset and losing data. A critical bug inside the fix for the fix.
Both were eventually caught — and the second one only surfaced when I installed and ran the actual build and force-killed it at the wrong millisecond. No amount of code reading would have reliably found it. That’s the whole argument for round-after-round testing on real machines, in one example.
For the technically inclined — the Windows handle quirk and the null-salt recovery regression
The migration failure (PT3). encrypt_in_place exported the DB to an encrypted temp file (moodhaven_enc.db), then renamed it over the original. On Windows, SQLite in WAL mode can retain the SHM file handle after the Connection is dropped, so the rename failed: moodhaven.db kept its plaintext SQLite magic bytes, moodhaven_enc.db existed but was never renamed, and no db_state.json was written. Same migration worked on Linux. Fix: retry loops (5×50 ms) for SHM/WAL removal and for the rename. Shipped in PR #122.
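The "plaintext magic bytes" check is simple enough to script. A plaintext SQLite file always begins with the 16-byte string "SQLite format 3" followed by a NUL; the helper and demo names below are hypothetical, and this is a sketch of the check rather than the tooling actually used in the lab.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of any plaintext SQLite file


def looks_plaintext(path: str) -> bool:
    """True if the file still starts with the readable SQLite header."""
    with open(path, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC


def demo():
    """Compare a real plaintext database with a file of random bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.db")
        conn = sqlite3.connect(plain)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        conn.close()
        opaque = os.path.join(tmp, "opaque.db")
        with open(opaque, "wb") as fh:
            fh.write(os.urandom(4096))  # stand-in for an encrypted file
        return looks_plaintext(plain), looks_plaintext(opaque)
```

The check only proves a negative: a missing header shows the file is not plaintext SQLite, not that it was encrypted with a key the app can reopen.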
The orphaned-file gap (PT4). If a crash hit between the export step and writing db_state.json, the encrypted file existed but the state file didn’t — and the startup recovery check was gated on db_state.json saying encrypted=true, so it never fired. Every launch silently opened the plaintext DB and ignored the encrypted copy forever. Fix: check for moodhaven_enc.db unconditionally, before the state-file gate; write db_state.json atomically (.tmp then rename()); complete the rename.
The recovery regression (QA pass). The orphan-recovery path then promoted the encrypted file but wrote {encrypted: true, salt: null} — it had no salt to record. The next unlock called db_salt() → None → “Database encryption record is missing,” a permanent unlock failure requiring factory reset. Surfaced only by installing the build and force-killing it between export and salt-write. Fix: encrypt_in_place now pre-writes {encrypted: false, salt: Some(salt)} before creating the encrypted file; recovery branches on salt.is_some() — complete the migration if the salt is known, discard the orphan otherwise. Shipped in PR #127.
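The two fixes in this chain, the atomic state write and the salt-aware recovery branch, can be sketched together. This is a Python illustration of the logic described above, not the app's Rust code; the function names and the returned action labels are assumptions.

```python
import json
import os


def write_state_atomic(path: str, state: dict) -> None:
    """Write the state file via a temp file plus rename, so a crash never leaves it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # swaps the complete file into place in one step


def recovery_action(state, orphan_exists: bool) -> str:
    """Decide what startup should do with a leftover encrypted file.

    `state` is the parsed db_state.json, or None if the file is missing.
    """
    if not orphan_exists:
        return "nothing"
    salt = (state or {}).get("salt")
    if salt is not None:
        return "complete_migration"  # salt was pre-written, so the orphan can be unlocked
    return "discard_orphan"  # no salt on record: promoting it would lock the user out
```

Note that the orphan check comes first and does not depend on the state file saying encrypted=true, which is the gap the PT4 finding closed.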
Locked, but not really locked
In later rounds, the testing fanned out across the app’s full command surface and found a class of access-control gaps: a number of commands could be triggered while the app was still locked. Depending on the command, that meant a locked-session attacker with access to the app could enumerate voice memos, start or accept a device pairing, list trusted devices, or even — in the browser version of the app — read and write journal data and analytics while the lock screen was up.
For the technically inclined — the missing require_unlocked guards and the default-allow browser gate
Six voice-memo commands (list_voice_memos, get_voice_memo, delete_voice_memo, patch_voice_memo_transcription, transcribe_voice_memo, link_voice_memo_to_entry), four peer-pairing commands (peer_generate_pairing_token, peer_accept_pairing, peer_get_trusted, peer_revoke_device), and two sync helpers (upsert_entry_from_sync, get_entry_timestamps) lacked the require_unlocked guard. store_voice_memo is intentionally pre-auth, because the Wear OS watch needs to deliver audio while the app is locked, but the review-path commands in the same module had never been audited separately.
The browser/PWA build was worse. Its lock gate was default-allow: a LOCK_GATED_COMMANDS set that listed only seven commands, so every other IndexedDB-backed command (all journal reads/writes, settings, books, analytics, export, time capsule, StillHaven) ran regardless of lock state — roughly 40 data commands leaking while locked. Content stayed AES-encrypted, but metadata, analytics, and writes did not.
Fixes: add lock guards to every sensitive native command; expand the browser shim’s gate to the full data surface — with a dozen new tests asserting each category refuses to run while locked. The deeper lesson is that a default-allow gate is the wrong shape; the consistent fix this release was to enumerate the data commands rather than invert to default-deny mid-release. Shipped in PR #133.
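For contrast, here is what the default-deny shape looks like. To be clear, this is not what shipped (the release enumerated the data commands instead); it is a Python sketch of the alternative, and the allowlist contents other than store_voice_memo are hypothetical names.

```python
# Hypothetical allowlist: only commands that must work before unlock.
PRE_AUTH_ALLOWED = frozenset({"unlock", "get_lock_state", "store_voice_memo"})


class LockedError(Exception):
    """Raised when a command is refused because the app is locked."""


def dispatch(command: str, locked: bool, handlers: dict):
    """Default-deny: while locked, only explicitly allowlisted commands run."""
    if locked and command not in PRE_AUTH_ALLOWED:
        raise LockedError(f"{command} refused while locked")
    return handlers[command]()
```

The practical difference shows up later: a command added next year is refused while locked unless someone deliberately allowlists it, whereas an enumerated deny list silently misses it.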
The encryption key, written out in plaintext on every unlock
The recurring theme across the last several rounds was memory hygiene: making sure secret keys are actively wiped from memory after use, not just dropped and left for the system to maybe-overwrite-eventually. Round after round found one more place this was missed. The deepest one: the database key was protected in its original form, but the code built a SQL string containing that key in full and never wiped the string. So on every unlock, the single most sensitive secret in the app — the key to the entire database — was left sitting in memory.
For the technically inclined — Zeroizing on the key but not the string built from it
The key bytes were wrapped in Zeroizing, but format!("PRAGMA hexkey = '{}'", hex) produced a fresh, unzeroized String holding the full key in hex. Four sites, all now Zeroizing. This was the same class of bug found one layer shallower in earlier rounds: PT4 wrapped verify_password and unlock_app; PT5 swept the rest of the PBKDF2 call sites and caught three more (TOTP secret encryption in two_factor.rs, export payloads in data_management.rs, and media key derivation in media.rs), changing derive_key’s return type from [u8; 32] to Zeroizing<[u8; 32]>. The broader lesson: protecting the key isn’t enough — you have to protect every string you build out of it. Shipped across PRs #124, #125, and #133.
Exhibit pending — the one empirical check still owed here. The live memory-dump test against the latest build (dump the just-locked process and grep for key-shaped material to confirm the Zeroizing wipes actually clear it at runtime) is the single remaining check from this round, and it isn’t done — I was away from the lab machine when the rest of this was captured. So I’m not showing a dump here, and I’m not claiming one. The zeroization itself is code-verified, not yet runtime-verified: every key site above is wrapped in Zeroizing, with tests over the call sites, and earlier rounds’ dumps (PT4/PT5) came back clean. The fresh live dump is a deferred, GUI-required residual — I’ll add the capture when it exists rather than imply it already does.
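When that dump is taken, the search itself is straightforward. The sketch below assumes a throwaway test key known to the lab (never a real user key) and a dump already saved as bytes; find_key_material is a hypothetical helper, not part of the existing tooling.

```python
def find_key_material(dump: bytes, key: bytes):
    """Return (encoding, offset) hits for a known test key inside a memory dump.

    Checks the raw bytes plus lower- and upper-case hex, since the hex string
    built for the PRAGMA is exactly the copy that was missed.
    """
    needles = {
        "raw": key,
        "hex_lower": key.hex().encode("ascii"),
        "hex_upper": key.hex().upper().encode("ascii"),
    }
    hits = []
    for encoding, needle in needles.items():
        offset = dump.find(needle)
        while offset != -1:
            hits.append((encoding, offset))
            offset = dump.find(needle, offset + 1)
    return hits
```

A clean result is an empty list for the just-locked dump; any hit names the encoding that leaked, which points back at the call site that built it.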
The full-database restore that anyone with an old key could trigger
The new-device setup flow lets a fresh install pull your entire database from an existing device over your home network. The receiving side asked; the serving side just… served. There was no prompt and no approval on the device that held the data — so a device that had been paired once, then lost or stolen, could quietly pull your whole journal whenever your real device was running.
Exhibit 3: The restore consent gate, probed both ways. A still-trusted attack client completes the full handshake and sends a RestoreRequest. Unarmed, the gate blocks it before any database bytes leave disk. Armed (a human flips the one-shot window in Settings → Devices), the server does start streaming, and the probe aborts on the first chunk by design, exfiltrating nothing; a re-probe is then rejected because the arm is one-shot. Full-database transfer now requires a present human, not just possession of an old key. (Anonymized: device IDs and IPs redacted; the probe never writes the stream to disk. The field extract below is hand-built from the run; the underlying raw capture, e3_armed.pcap, still has to go through the anonymization pipeline before the binary itself is ever published.)
View raw capture — restore gate: unarmed reject, armed allow (probe aborts), re-probe reject (anonymized)
# Probe = the custom v2 sync-client emulator running with a still-trusted Ed25519 key
# (the same e3 tool that surfaced the non-blocking-socket bug). attacker 192.0.2.10 → victim 192.0.2.20.
# ── Run 1: source device NOT armed ──────────────────────────────────────────
→ Hello { did: d2be…1240f, eph_pub: <X25519 ephemeral pub> }
← Ok { name: <redacted>, eph_pub: <X25519 ephemeral pub>, challenge: <32-byte challenge> }
→ Auth { signature: <Ed25519 sig over "moodhaven-hello-auth-v1:"||challenge> } # valid — key still trusted
# handshake completes: Ed25519 identity proven + X25519 ECDH session key derived
→ RestoreRequest { } # sent inside the AES-256-GCM frame
← [GATE BLOCKED] Restore not authorized # → UNARMED-REJECT
<connection closed; zero database bytes left disk>
# ── Run 2: user armed restore via Settings → Devices (one-shot, 5-min window) ─
<handshake identical to Run 1 — same trusted key, same ECDH>
→ RestoreRequest { }
← [GATE OPEN] server began streaming DB: seq=0 total_bytes=249856 # → ARMED-ALLOW
-> ABORTING (not exfiltrating) # probe aborts on the first chunk by design
<nothing exfiltrated; the emulator never writes the stream to disk>
# ── Run 3: re-probe immediately after Run 2 (arm is one-shot) ────────────────
→ RestoreRequest { }
← [GATE BLOCKED] Restore not authorized # → UNARMED-REJECT (window already consumed)
For the technically inclined — trust-at-pairing is not authorization-to-exfiltrate
After the Ed25519 handshake, if the first message was a RestoreRequest, the server read the whole SQLCipher file off disk and streamed it — no prompt, no approval. The problem: a device paired once still holds a valid trusted keypair, so a lost/stolen/compromised peer can complete the handshake on its own and pull the full database whenever the source device’s sync server is up. Trust established at pairing time is not authorization to exfiltrate everything later.
Fix: mirror the pairing model — the serving device must explicitly arm restore (Settings → Devices → “Set up a new device”) for a single 5-minute, one-shot window. Unarmed RestoreRequests are rejected. Full-DB exfiltration now requires a live, present human on the source device, not just possession of an old key. Shipped in PR #133.
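The arm-then-consume behaviour seen in the three probe runs reduces to a small state machine. A Python sketch, assuming a monotonic clock value in seconds is passed in by the caller; the class and method names are hypothetical, and the real server would also need to guard this state against concurrent connections.

```python
ARM_WINDOW_SECS = 5 * 60  # the 5-minute window described above


class RestoreGate:
    """One-shot, time-limited authorization for a full-database restore."""

    def __init__(self):
        self._armed_until = None

    def arm(self, now: float) -> None:
        """Called only from a deliberate user action on the source device."""
        self._armed_until = now + ARM_WINDOW_SECS

    def try_consume(self, now: float) -> bool:
        """Return True at most once per arm, and only inside the window."""
        armed_until, self._armed_until = self._armed_until, None  # always disarm
        return armed_until is not None and now <= armed_until
```

Disarming on every request, successful or not, is what makes Run 3 fail: the window is spent the moment anything asks.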
The flagship encryption feature that never turned on
This is the finding I least wanted to write and most needed to.
Several rounds earlier, the “your database is readable” problem was fixed by adding SQLCipher — encrypting the entire database file at rest. It was the headline security feature of that whole stretch of work. It was documented. It passed subsequent pentest rounds. I believed it. Everyone reviewing it believed it. The trouble is that none of that is evidence the code actually does what it says.
The eighth round confirmed, on a real installed build, that the database was never encrypted on any install, on any operating system. Every copy of the app had been quietly running on a plaintext database the whole time — the exact problem the SQLCipher feature was supposed to have solved. The encryption migration was firing on first unlock, failing its own verification check, and silently falling back to the original plaintext file. Because the fallback was silent and the app kept working normally, nobody noticed.
The cause was a mismatch in how the same key was applied on the way in versus the way out. The code encrypted the database with a raw key, but read it back with a command that quietly ran that key through a key-derivation step first — producing a different key. So the file was written with one key and every read asked for a transformed version of it. The result was “file is not a database” on every reopen, the verification step failing, and the fallback to plaintext.
Two independent investigations landed on the same root cause. To be sure it wasn’t a misreading, I built a minimal standalone program that did nothing but encrypt a database one way and reopen it the other — and reproduced the exact failure in isolation. The same reproduction, run against the corrected key handling, now opens cleanly.
There was no test covering the encrypt-then-reopen round trip. That single missing test is the whole reason a non-functional security feature shipped and survived multiple rounds of adversarial review.
Exhibit 1 — Encryption-at-rest, finally real. This is a before/after of the database file’s header bytes: the “before” capture shows the readable SQLite magic string (SQLite format 3) of the inert build, and the “after” shows the high-entropy SQLCipher header of a database written and reopened with the corrected key handling — taken from the installed Windows build that now verifies clean end-to-end. (Capture will be anonymized: any file paths redacted.)
View raw capture — before/after xxd of the DB header (anonymized)
# BEFORE (inert build — plaintext DB on disk):
$ xxd moodhaven.db | head -1
00000000 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00 SQLite format 3.
# └──────── readable "SQLite format 3" magic — file is NOT encrypted ────────┘
# AFTER (corrected key handling — installed Windows build, written and reopened cleanly):
$ xxd moodhaven.db | head -1
00000000 bd bc 97 0c 23 d5 0e 0f 1e db 42 21 3d b7 be 42 ....#.....B!=..B
# └──────── no "SQLite format 3" magic — high-entropy SQLCipher ciphertext ──┘
# And the encryption-state file, post-migration:
$ cat db_state.json
{"encrypted":true,"salt":"<16-byte base64 salt>"}
# └──────── encrypted=true, real salt present, no orphaned moodhaven_enc.db ──┘
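The header check in the capture above is easy to automate. Here is a minimal Python sketch, assuming only the standard 16-byte SQLite magic string; the helper names `is_plaintext_sqlite` and `read_header` are hypothetical.

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a plaintext SQLite file


def is_plaintext_sqlite(header: bytes) -> bool:
    """True if the header carries the readable SQLite magic string."""
    return header[:16] == SQLITE_MAGIC


def read_header(path: str) -> bytes:
    """Read the first 16 bytes of a file, enough for the magic check."""
    with open(path, "rb") as f:
        return f.read(16)
```

Note the asymmetry: finding the magic string proves the file is plaintext, but its absence only shows the file is not plain SQLite. Confirming real encryption still needs the reopen-with-key check described in the text.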
For the technically inclined — the raw-key vs. KDF-key PRAGMA mismatch, and the standalone repro that proved it
The migration encrypted via ATTACH DATABASE ... KEY "x'<hex>'" — the x'...' literal form, which SQLCipher treats as a raw 256-bit key with no KDF. But every unlock and verify reopened the database with PRAGMA hexkey = '<hex>', which SQLCipher interprets differently: it decodes the hex to 32 bytes and then runs PBKDF2 over them, deriving a key that is not the raw key the file was written with.
write: ATTACH ... KEY "x'<hex>'" → raw key K
read: PRAGMA hexkey = '<hex>' → PBKDF2(decode(hex)) ≠ K
result: "file is not a database" → first-unlock verify fails → silent fallback to the plaintext file
This was confirmed two ways: a standalone cargo reproduction (case D = the app’s read path fails; case E = PRAGMA key = "x'...'" succeeds), and a trace into the vendored SQLCipher C source (the raw-key branch requires the literal x'...' wrapper; hexkey pre-decodes to 32 bytes, fails the raw-key length test, and falls through to PBKDF2).
Fix: the three read-path pragmas — in apply_key, and in encrypt_in_place’s verify step and final open — now all use PRAGMA key = "x'<hex>'", the same raw x'...' literal form as the encryption path. It is fully backward-compatible: existing files were always raw-keyed, so only the readers were wrong; they now open the already-encrypted files with no re-encryption needed. A regression test now covers the full encrypt (via ATTACH + sqlcipher_export) → close → reopen (via PRAGMA key) round trip. Commit e6fb416 on PR #133.
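The mismatch is easy to see outside SQLCipher. The Python sketch below is an illustration, not the app's code: the hash, salt, and iteration count are stand-ins chosen for speed, and SQLCipher's real KDF parameters are not reproduced here. It only shows that running a raw key through PBKDF2 yields a different 32-byte key than the raw key itself.

```python
import hashlib


def raw_key(hex_key: str) -> bytes:
    """The write path: the hex string is used directly as a 256-bit key."""
    key = bytes.fromhex(hex_key)
    if len(key) != 32:
        raise ValueError("expected 64 hex characters (32 bytes)")
    return key


def kdf_key(hex_key: str, salt: bytes, iterations: int = 1000) -> bytes:
    """The broken read path: decode the hex, then derive a key from it.

    The hash name and iteration count are illustrative assumptions only.
    """
    return hashlib.pbkdf2_hmac(
        "sha512", bytes.fromhex(hex_key), salt, iterations, dklen=32
    )
```

Any file written with `raw_key(h)` and reopened with `kdf_key(h, salt)` is being opened with the wrong key, which is the shape of the “file is not a database” failure described above.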
Status — verified end-to-end on the green Windows installed build: root cause confirmed two independent ways; fix applied and proven by the standalone reproduction and the new regression test; and — the part I withheld claiming until it was true — re-validated on a real installed Windows build. After a fresh setup the migration now completes: db_state.json reads {"encrypted": true} with a real salt, there is no orphaned moodhaven_enc.db left behind, the on-disk database header is high-entropy ciphertext (no SQLite format 3 magic), the app unlocks cleanly, and the peer-sync server starts. Linux (Ubuntu/“purple”) re-validation of the same migration is a deferred residual — still pending, not yet claimed. See commit e6fb416, PR #133, branch fix/security-pt6-acl-lockguard.
The irony is total and worth sitting with: an earlier round fixed the readable-database problem by adding SQLCipher, and the eighth round found that the fix never took effect. The earlier round wasn’t wrong about the design — it was wrong to believe the design was running. That gap, between “we wrote the fix” and “the fix executes correctly on a real machine,” is the entire reason this campaign keeps going. And it’s exactly why I refused to call this one done until the corrected build was reinstalled from scratch and the encrypted-on-disk, clean-unlock result was confirmed by hand on Windows.
One more from the same round — the Windows “Erase & Start Fresh” failure
The eighth round also surfaced a real Windows-only bug in “Erase & Start Fresh”: the app held its own open handle to the SQLite database file, and Windows refuses to delete or rename a file that a process still has open — so the factory reset failed outright. Fix: release the app’s own open database connection before touching the file (and surface the real error instead of swallowing it). Commit b142f31. A close cousin of the Windows file-locking quirk that broke the encryption migration in an earlier round — same operating system, same lesson about open handles.
And the lab itself grew this round: a third machine, an Ubuntu victim, was brought online so the campaign now runs across all three machines — Kali attacker, Windows victim, Ubuntu victim. At this point the campaign was far from finished — two more rounds followed before it closed.
The ninth round — when the custom attack tool started finding bugs by itself
The eighth round fixed the encryption-at-rest feature. The ninth round did something different: instead of reading the code looking for what might be wrong, I pointed a piece of instrumentation I’d built — a from-scratch reimplementation of the app’s own encrypted sync protocol (described in the tools section further down) — directly at the running app and watched what broke. Six issues came out of it, and the most interesting one I’d never have found by reading code at all.
The bug the attack tool found that code review couldn’t
My custom sync client connected to the app, completed the full cryptographic handshake, and then — on Windows — the app kept dropping the connection partway through. Not rejecting it, not erroring: silently dropping trusted peers mid-handshake. That’s the kind of thing that doesn’t show up in source review because the code looks correct; it only manifests as a timing-dependent failure on one operating system, under real network conditions, against a real client. Watching my own tool get hung up on it is what exposed it. Fix: force each accepted connection back into blocking mode (a default that differs subtly across platforms) so the read loop behaves identically everywhere. Committed in 949f9a9.
Two new data-loss bugs hiding inside the eighth round’s encryption fixes
The “fixed it, then broke it” pattern struck again — twice — and both new bugs lived inside the encryption work from the previous round:
- The recovery path could destroy data. When the encryption migration is interrupted at exactly the wrong moment, a recovery routine cleans up the half-finished encrypted file. The bug: it could promote that half-finished file into place without first checking that it actually opened with the key — so a crashed or corrupt migration could overwrite the user’s good database with a broken one. Fix: never promote the temporary encrypted file until it has been re-opened and key-verified; if it fails verification, the original untouched database is preserved. Committed in 0774a3e, with two new regression tests (one proving a corrupt temp file leaves the original intact, one proving a valid temp file is promoted only after it verifies).
- The “Sync from Another Device” restore could permanently lock you out of the restored copy. When a fresh device pulls your whole database from an existing one, it receives the encrypted database bytes — but the transfer was never sending the small piece of data needed to derive the decryption key (the database’s salt). The restored device would receive a perfectly good encrypted database it could never open: every unlock attempt failed with “encryption record is missing.” The feature was non-functional end-to-end. Fix: the restore protocol now carries the encryption state (encrypted flag + salt) alongside the data, and the restored device writes a matching state file so the same password derives the same key. Committed in 07e9d44.
Both of these are fixed and committed, and proven by reproductions/regression tests — and both got attacked again in the very next round. That tenth round red-teamed these two fixes specifically and found a real bug in each (covered below): the recovery fix could be fooled into accepting an empty-database decoy, and the restore-salt fix wrote an attacker-supplied salt with no validation. Both follow-on bugs were fixed in turn. What’s still honestly outstanding is the live, GUI-driven re-validation — actually restoring onto a fresh device over the network and force-killing a real migration on the installed builds. That’s a residual, deferred item; the code paths are correct and tested, and I won’t pretend the on-real-hardware re-run has happened when it hasn’t.
Three more, smaller but real
- Two more locked-while-unlocked gaps. The access-control sweep from earlier rounds had missed two commands — one that reports database statistics, one that regenerates two-factor backup codes — which could still run while the app was locked. Both now require an unlocked session. And the restore-arming window (the “I’m setting up a new device right now” switch from an earlier round) is now cleared whenever the app locks, so walking away re-secures it. All committed in 949f9a9.
- The pairing QR code rendered nothing. Not a vulnerability, but a real correctness bug found in the same pass: the QR code on the device-pairing screen silently failed to draw (a dynamic module load that broke in the production build), leaving a perpetual spinner. The whole point of the QR code is to let you pair without typing — so this quietly forced everyone onto manual PIN entry. Fix: render it with a statically-bundled component instead of a runtime import, plus a regression test that asserts the QR actually appears. Committed in 4443a2b.
For the technically inclined — the non-blocking accept, the deferred key-verified promotion, the restore-salt protocol change, and the lock-guard additions
Non-blocking accepted socket (949f9a9). On Windows, a TcpStream accepted from a listener could end up in non-blocking mode, so the sync read loop hit WouldBlock mid-handshake and the connection was torn down before the v2 key exchange finished. The fix calls set_nonblocking(false) on every accepted stream (plus a read_timeout for liveness), making the post-handshake blocking read loop behave identically on all platforms. This surfaced live via the e3 sync-client emulator — the standalone client repeatedly failed to complete a handshake that the source code said should succeed, which is what pointed at the socket mode rather than the protocol.
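The same pitfall exists in other socket APIs, so the fix translates directly. As a Python analogue (the app's fix is the Rust `set_nonblocking(false)` call named above), a minimal sketch might look like this. The helper name `prepare_accepted` and the 30-second timeout are assumptions for illustration.

```python
import socket


def prepare_accepted(conn: socket.socket, read_timeout: float = 30.0) -> socket.socket:
    """Force an accepted connection into a known mode before the read loop.

    The mode an accepted socket starts in can depend on the platform and on
    the listener's mode, so it is set explicitly instead of being trusted.
    """
    conn.setblocking(True)         # blocking reads, regardless of what was inherited
    conn.settimeout(read_timeout)  # still blocking, but bounded for liveness
    return conn
```

In an accept loop this would wrap every connection, for example `conn, addr = listener.accept()` followed by `prepare_accepted(conn)`, so the handshake code never sees a would-block error it was not written to handle.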
Deferred, key-verified promotion in the recovery path (0774a3e). encrypt_in_place exports the plaintext DB into a temporary encrypted file, then promotes it over the original. The regression: the startup recovery/promotion path could rename the temp file into place before confirming it opened with the derived key. If the migration had crashed mid-export, the temp file was truncated/corrupt — and promoting it overwrote a good plaintext DB with an unopenable one. The fix defers promotion until apply_key re-opens the temp and key-verifies it; a crashed/corrupt temp is discarded and the original is preserved. Two tests were added (corrupt-temp-preserves-original; valid-temp-promotes-after-verify); the suite went to 172 passing.
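The verify-then-promote ordering can be sketched in Python as follows. This is a simplified illustration, not the app's recovery code: `promote_if_verified` is a hypothetical name, and the `verify` callback stands in for re-opening the temp file with the derived key.

```python
import os
from typing import Callable


def promote_if_verified(tmp_path: str, final_path: str,
                        verify: Callable[[str], bool]) -> bool:
    """Promote tmp_path over final_path only after verify(tmp_path) passes."""
    if not os.path.exists(tmp_path):
        return False  # nothing pending, original stays as it is
    if not verify(tmp_path):
        os.remove(tmp_path)  # discard the crashed or corrupt temp file
        return False         # the original was never touched
    os.replace(tmp_path, final_path)  # atomic when both paths share a filesystem
    return True
```

The important property is that the only code path that overwrites the original sits behind a successful verification.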
Restore-salt transfer (07e9d44). “Sync from Another Device” streamed only the encrypted SQLCipher database bytes and never the source’s db_state.json salt. On the restored device db_salt() returned None, so verify_password failed with “Database encryption record is missing” — a permanent, un-unlockable state. The fix propagates the source’s encryption state (encrypted + salt) through the restore completion so the restored device writes a matching db_state.json and derives the same SQLCipher key via PBKDF2(password, salt). The salt is not a secret — it already lives in plaintext in db_state.json on every device; the same password across a user’s devices is the security boundary, exactly as the peer-sync model intends.
Lock guards + restore-arm clear-on-lock (949f9a9). get_data_stats and regenerate_backup_codes were missing the require_unlocked guard and now have it; the restore-arm state is cleared on lock_app so a momentary “set up a new device” window doesn’t survive the session locking. One honestly-deferred LOW remains: the restore-arm window’s clear-on-lock is in, but per-device targeting (scoping an armed window to a specific incoming device) is still pending, and a couple of Zeroizing sweeps on TOTP/hardware-key material are noted as deferred — not claimed as done.
QR render fix (4443a2b). The pairing screen used a dynamic import('qrcode') + toDataURL path that failed silently under Tauri’s production chunk loading (CJS/ESM interop), leaving a spinner and no code. Swapped to the already-bundled qrcode.react <QRCodeSVG> (static, synchronous), with a render regression test over the pairing tab. Not a security finding — a correctness regression with a security-adjacent effect (it forced manual PIN entry).
The tenth round — attacking the ninth round’s fixes, then a clean pass
Every round so far had attacked the app. The tenth round attacked the previous round’s patches — on the premise that has driven this entire campaign: a fix is new, untested code, and the code most likely to hide a fresh bug is the code I wrote last, while convinced I’d just made things safer. So I took the ninth round’s two trickiest fixes — the recovery path and the restore-salt transfer — and red-teamed them as if they were a stranger’s pull request. Each one had a real bug in it.
The recovery fix that could be handed an empty-database decoy
The ninth round’s recovery fix was careful to never promote a half-finished encrypted database until it verified the file opened with the key. But the check it used to decide whether the user’s original database was still good plaintext had a hole. SQLCipher’s connection-open call creates the file if it’s missing, and an unkeyed SELECT count(*) succeeds against a freshly created empty database. So a local attacker could delete or swap the original, and the recovery probe would happily auto-create an empty database, see that it “opened,” and accept that empty decoy as the user’s real journal — silently substituting an empty journal for the real one. Rated HIGH (local-access).
Fix: open existing-setup databases with no-create semantics, so a missing file is an honest error rather than a fabricated empty decoy; and require the probe to find a real journal_entries table before it accepts a file as the user’s original, so a zero-table file is never mistaken for the genuine database. Two regression tests now assert the decoy is neither fabricated nor accepted. Committed in 2334269.
The restore-salt fix that wrote an attacker-controlled salt unchecked
The ninth round’s restore-salt fix made “Sync from Another Device” actually work by carrying the encryption salt along with the database. But it wrote the incoming salt and encrypted flag straight to the new device’s state file with no validation, and the integrity checksum covered only the database bytes — not the state file. A trusted-but-compromised source peer could therefore send a garbage or wrong salt and permanently lock out the freshly restored device (the same password would derive the wrong key, forever), and a local attacker could swap just the salt past the checksum. Rated HIGH (lockout / availability).
Fix: validate the incoming salt at the moment of receipt — encrypted: false must carry no salt; encrypted: true must carry a standard-base64, exactly-16-byte salt; anything else is rejected and the restore aborts before any state file is written. The integrity digest was rebound to cover the database bytes and the state JSON together, so the salt can no longer be tampered with independently, and a missing checksum now discards the restore instead of proceeding unverified. Seven new tests; the same-password happy path is preserved exactly. Committed in fa2d299.
The part that actually closed the campaign
Finding a bug in each of the two fixes I’d specifically set out to attack was, oddly, the encouraging result — it meant the method was working. The decisive moment came after. With both follow-on bugs fixed, I ran an independent verification hunt — a fresh-context pass looking for new high-severity issues and re-checking that every happy path still worked — and it came back clean. No new HIGH. No journal-content exposure in any round. The same-password restore, the encrypt-and-reopen round trip, the peer handshake, the lock guards — all still intact.
That clean pass is what let me stop, and it’s worth being precise about why it’s a legitimate stopping point rather than fatigue dressed up as confidence:
- The externally-reachable surface converged to zero. Everything an attacker on your network — or hitting your app without first owning your machine — could reach was closed across the ten rounds. The bugs that remained at the end were all local-access or lockout-class: they require a thief who already holds your unlocked device, or a device you once paired and then lost. That’s a real but fundamentally different threat tier, and I flag each remaining residual — the deferred live memory dump, the Linux re-validation, the per-device restore-arm scoping, the last Zeroizing sweeps — at the point it comes up rather than burying it.
- The worst-case shifted from confidentiality to availability. The tenth round’s two findings could deny you access (an empty decoy, a lockout) — they could not read your journal. That direction of travel matters: a campaign that keeps finding “attacker can read your data” bugs hasn’t converged; one whose residuals are “attacker who already owns your laptop can make you re-set-up” has.
- The one invariant held all ten rounds. Across every round, on every OS, no attack ever exposed journal content to a party not meant to see it. The zero-knowledge core — entries are AES-256-GCM ciphertext, the key derives from your password and is never stored — was never broken. Everything fixed was around that core, never through it.
Convergence, not perfection, is the honest claim. You do not attack your way to a proof that zero bugs remain — no finite test can show that. What you can do is drive the reachable surface to zero, watch the residuals migrate from “reads your data” to “needs your unlocked laptop and only annoys you,” confirm the central invariant never bent, and then run one more independent hunt and have it come back empty. When that happens, the loop has terminated honestly. That’s where the tenth round left it.
For the technically inclined — the create-on-missing decoy, the unvalidated salt, and the integrity rebind
Recovery empty-DB decoy (2334269). Two coupled defects in the 0774a3e recovery refactor. First, Database::new and the startup recovery probe opened databases via SQLCipher’s Connection::open, which creates-on-missing — and an unkeyed SELECT count(*) succeeds against a fresh empty DB — so the “is the original a readable plaintext database?” probe accepted an auto-created empty file as the user’s original, and Database::new fabricated empty DBs for missing-but-expected files. Fixes: open existing-setup DBs (db_state.encrypted || salt present) with SQLITE_OPEN_READ_WRITE and no CREATE, so a missing file errors ("database file missing") instead of fabricating a decoy (the genuine fresh-install path still creates); and promote_pending_tmp’s original-is-plaintext probe now opens without CREATE and requires a real journal_entries table, so an empty/zero-table file is never accepted — the tmp is left untouched rather than discarded. Two more fixes rode along: a revert-to-encrypted:false path now preserves the existing salt (it had written salt: None, stranding a recoverable encrypted DB), and moodhaven_enc.db-wal/-shm are cleaned before the atomic promote. +2 tests; 174 pass.
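Both halves of that fix have close equivalents in plain SQLite. Here is a minimal Python sketch using the standard `sqlite3` module (not SQLCipher, and not the app's Rust code); the function names `open_existing` and `is_real_journal` are hypothetical. The `mode=rw` URI parameter is SQLite's way of opening read-write without the create flag.

```python
import sqlite3
from pathlib import Path


def open_existing(path: str) -> sqlite3.Connection:
    """Open a database read-write WITHOUT create-on-missing semantics."""
    uri = Path(path).resolve().as_uri() + "?mode=rw"
    return sqlite3.connect(uri, uri=True)  # raises OperationalError if missing


def is_real_journal(path: str) -> bool:
    """Accept a file only if it opens and holds a journal_entries table."""
    try:
        conn = open_existing(path)
    except sqlite3.OperationalError:
        return False  # missing file is an honest error, not an empty decoy
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master "
            "WHERE type = 'table' AND name = 'journal_entries'"
        ).fetchone()
        return row is not None
    except sqlite3.DatabaseError:
        return False  # not a readable plaintext database
    finally:
        conn.close()
```

A probe written this way cannot be satisfied by a freshly created empty file, because it never creates one and never accepts a zero-table database.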
Unvalidated restore salt + integrity rebind (fa2d299). The 07e9d44 change wrote the attacker-controlled salt/encrypted from RestoreEnd to db_state.json with no validation, and the SHA-256 covered only the DB bytes — so a compromised source peer could send a garbage salt for a permanent lockout, and a local attacker could swap just the salt past the checksum. Fixes: validate_restore_salt() at receipt (encrypted:false ⇒ salt:None; encrypted:true ⇒ Some + standard base64 + exactly 16 bytes; true+None rejected), with do_full_restore_client aborting (drop handle, remove tmp) before writing any companion file on violation; restore_integrity_digest() = SHA-256(db_bytes || dbstate_json) so the writer and both verifiers (peer_apply_and_restart and the lib.rs startup check) recompute a bound digest and re-validate the salt before applying, and a missing checksum now discards rather than proceeding unverified; plus a cleanup of stale pending/.tmp/.sha256/.dbstate at restore start. Auth/consent ordering and serde-default wire-compatibility preserved; the same-password happy path is unchanged. +7 tests; 181 pass.
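The validation rules and the bound digest are small enough to sketch in Python. The function names mirror the ones quoted above, but this is an illustrative reimplementation under the stated rules, not the app's code; the error messages are invented.

```python
import base64
import binascii
import hashlib
from typing import Optional


def validate_restore_salt(encrypted: bool, salt: Optional[str]) -> None:
    """Raise ValueError unless (encrypted, salt) is one of the allowed shapes."""
    if not encrypted:
        if salt is not None:
            raise ValueError("unencrypted state must not carry a salt")
        return
    if salt is None:
        raise ValueError("encrypted state requires a salt")
    try:
        # validate=True rejects characters outside the standard alphabet,
        # including the URL-safe variants '-' and '_'.
        raw = base64.b64decode(salt, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("salt is not standard base64") from exc
    if len(raw) != 16:
        raise ValueError("salt must decode to exactly 16 bytes")


def restore_integrity_digest(db_bytes: bytes, dbstate_json: bytes) -> str:
    """SHA-256 over the database bytes followed by the state JSON."""
    return hashlib.sha256(db_bytes + dbstate_json).hexdigest()
```

Because the digest covers both inputs, changing only the salt in the state JSON changes the digest, which is what stops the swap-the-salt-past-the-checksum attack.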
The verification hunt. After both fixes, an independent fresh-context pass hunted for new high-severity findings and re-exercised the happy paths (same-password restore, encrypt→close→reopen round trip, peer handshake, lock guards). It produced no new HIGH and confirmed the paths intact — and added regression coverage rather than new findings (updater-integrity and peer-sync password-mismatch tests in 6b678f5). That empty result against intact happy paths is the close-out signal.
The tools I built to do this
Here’s the thing nobody tells you about pentesting your own software: for an app like this, the tools don’t exist yet. The off-the-shelf security scanners — the ones that crawl a website or hammer a public API — have nothing to say about a local-first desktop app that speaks its own private, encrypted protocol between two of your own machines. There’s no URL to point them at. So a real test of this app meant building the instrumentation myself, and that turned out to be as much of the work — and as much of the skill on display — as finding the bugs. The honest meta-lesson of the whole campaign is that the gap between “I designed it securely” and “I proved it” was only closable with bespoke tooling, and that tooling caught bugs the generic scanners never could have.
In plain terms, here’s what I had to build:
- A real attack lab. A dedicated Kali Linux attacker machine and two “victim” machines (Windows and Ubuntu) running the actual installed app, plus an occasional real Mac when it was on the network — all driven from a single orchestrator over SSH, so one command could build, install, attack, and tear down across every machine.
- A from-scratch clone of the app’s own encrypted sync protocol. This is the centerpiece. To test the network features properly, I reimplemented the app’s entire secure handshake — the cryptographic key exchange, the device-identity challenge-and-signature, the derived session key, and the encrypted message framing — as a standalone attack client. That let me act like a trusted device and probe exactly one thing at a time. It’s also the tool that caught the dropped-connection bug live.
- A pairing-screen fuzzer and a traffic-anonymization pipeline. One tool hammered the device-pairing server with malformed and oversized requests to confirm it holds up. Another captured the real network traffic off the wire and then carefully scrubbed it — stripping IP addresses, device IDs, keys, and PINs — so the captures could become honest, blog-safe evidence without leaking anything.
- A multi-OS build-and-install harness. Getting the installed app (not a dev build) onto each operating system and into a testable on-screen state was its own engineering problem — automating a Windows GUI from a headless session, getting the encrypted-database build to compile correctly, and installing and running the real .msi and .deb artifacts a user would actually download.
- An AI-orchestrated testing loop with a verification gate. The whole campaign ran as fan-out investigations that were then adversarially verified — every candidate finding re-checked from scratch against the source before it counted — with a security-review step acting as the gate and a background health monitor watching the lab. The point of the verification gate is that an attack tool throwing an error isn’t a finding until it can be re-derived from the code.
None of this is the app. It’s the scaffolding around the app — and building it well is the difference between “I ran a scanner” and “I actually attacked this.”
For the technically inclined — the v2 sync-client emulator, the fuzzer + pcap anonymization, and the orchestration
The lab. Orchestrator (laptop, Claude Code) drives everything over SSH: red (Kali, attacker — holds the probe library at ~/pt6/ and a stable Ed25519 pentest identity in red_peer_key.bin, 0600, so it can be added to a victim’s trusted_devices.json), green (Windows 11, installed release .msi), purple (Ubuntu 22.04, .deb/AppImage), and an opportunistic macbook when it’s on the LAN. Red’s network probes are OS-agnostic — the same script hits green, purple, or the Mac; only the victim IP and the device-derived port change. Sync port = 44000 + (first-4-hex of device_id % 1000); pairing port uses the 43000 base.
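The port rule is a one-liner. In the Python sketch below, the sync formula comes straight from the text; applying the same offset to the 43000 pairing base is an assumption, since the text only says the pairing port "uses the 43000 base". The function names are hypothetical.

```python
def derived_port(device_id: str, base: int) -> int:
    """base + (first four hex characters of device_id, as an integer, mod 1000)."""
    return base + (int(device_id[:4], 16) % 1000)


def sync_port(device_id: str) -> int:
    return derived_port(device_id, 44000)


def pairing_port(device_id: str) -> int:
    # Assumption: pairing uses the same device-derived offset as sync.
    return derived_port(device_id, 43000)
```

For a device ID beginning `ab12`, the offset is 0xab12 mod 1000 = 794, so the sync port would be 44794.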
The custom v2 sync-client emulator (e3_restore_gate.py). A full reimplementation of MoodHaven’s peer-sync v2 protocol in Python (using cryptography), enough to be a trusted client and exercise the restore consent gate:
- Load a stable Ed25519 identity (red_peer_key.bin) so the victim recognizes the device.
- Compute the victim’s sync port from its device_id.
- Handshake: send Hello { did, eph_pub } with a fresh X25519 ephemeral public key → receive Ok { eph_pub, challenge } → answer the Ed25519 challenge by signing "moodhaven-hello-auth-v1:" || challenge_bytes and sending Auth { signature }.
- Derive the session key exactly as the app does: SHA-256("moodhaven-sync-v2:" || X25519_shared || sorted(static_pub_A, static_pub_B)).
- Speak the framed transport — [4-byte BE length][12-byte random nonce][AES-256-GCM ciphertext] — to send a single encrypted RestoreRequest and observe the consent gate: unarmed → encrypted Err/reject; armed → the server begins streaming, and the probe aborts immediately (it never exfiltrates). A whoami mode prints red’s device_id + public key so it can be added to the victim’s trusted list.
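The key-derivation and framing steps in the list above can be sketched with the Python standard library alone. This is not the emulator's code. It treats the AES-256-GCM ciphertext as opaque bytes and rests on two labeled assumptions: the two static public keys are concatenated in sorted order, and the length prefix counts the nonce plus the ciphertext. The 16 MiB cap is the limit quoted in the frame-fuzzing results.

```python
import hashlib
import struct

MAX_FRAME = 16 * 1024 * 1024  # 16777216, the frame cap the server reports


def derive_session_key(shared: bytes, pub_a: bytes, pub_b: bytes) -> bytes:
    """SHA-256(label || X25519 shared secret || both static keys, sorted)."""
    first, second = sorted((pub_a, pub_b))  # assumption: byte-wise sort, then concat
    return hashlib.sha256(b"moodhaven-sync-v2:" + shared + first + second).digest()


def pack_frame(nonce: bytes, ciphertext: bytes) -> bytes:
    """[4-byte big-endian length][12-byte nonce][ciphertext]."""
    if len(nonce) != 12:
        raise ValueError("nonce must be 12 bytes")
    body = nonce + ciphertext  # assumption: the length covers nonce + ciphertext
    return struct.pack(">I", len(body)) + body


def unpack_frame(frame: bytes) -> tuple:
    """Split one frame into (nonce, ciphertext), enforcing the size cap."""
    if len(frame) < 4:
        raise ValueError("truncated frame")
    (length,) = struct.unpack(">I", frame[:4])
    if length > MAX_FRAME:
        raise ValueError(f"Frame too large (limit {MAX_FRAME})")
    body = frame[4:4 + length]
    if len(body) != length or length < 12:
        raise ValueError("truncated frame")
    return body[:12], body[12:]
```

Sorting the static keys makes the derivation symmetric, so both peers compute the same session key regardless of which one initiated the connection.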
Reimplementing the protocol (rather than driving the app’s own client) is what made it a probe: it can send one malformed or out-of-sequence message at a time, with an attacker’s identity, and observe the server in isolation. It’s also what surfaced the non-blocking-socket bug — the emulator’s handshake kept failing where the source said it should succeed, isolating the defect to the accepted socket’s mode rather than the protocol logic.
The pairing fuzzer (e1_pairing_probes.sh) + pcap pipeline. e1 nmaps the pairing port range, then fires oversized bodies (2 MB > the 1 MB cap → expect HTTP 400), malformed JSON (expect 400, server survives), a 5-attempt PIN brute force (expect 429 + server closes on the 5th), and a post-lockout probe (expect connection refused). Packet captures run on red (which is a party to the TCP connections, so no span port or ARP trickery is needed; it sees mDNS multicast natively). The anonymization pipeline keeps exhibits real but blog-safe: tcprewrite/bittwiste rewrite L2/L3 (real LAN IPs → RFC 5737 documentation IPs, MACs → placeholders), and because those tools don’t touch application-layer strings, the published figures are hand-built tshark field extracts with sed redaction of device IDs (xxxx…xxxx), public keys (len=32, ab12…ef90), challenge nonces, the 6-digit PIN (######), and ciphertext (short prefix + length only). A pre-publish checklist greps the final files for any real 192.168.1.x, full device ID, full key, or PIN before anything ships.
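The pre-publish check is described as a grep; a Python equivalent might look like the sketch below. It covers only two of the listed patterns, private 192.168.x.x addresses and full 32-byte keys written as 64 hex characters. The device-ID and PIN formats are not given precisely enough in the text to write reliable patterns, so they are left out. `LEAK_PATTERNS` and `find_leaks` are hypothetical names.

```python
import re

LEAK_PATTERNS = {
    "private LAN IP": re.compile(r"\b192\.168\.\d{1,3}\.\d{1,3}\b"),
    "full 32-byte hex value": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def find_leaks(text: str) -> list:
    """Return (label, matched_text) for every suspicious string found."""
    hits = []
    for label, pattern in LEAK_PATTERNS.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(text))
    return hits
```

An empty result is the gate: a capture is publishable only when `find_leaks` returns nothing for every exhibit file.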
The build/install harness. The hardest automation problem was getting a GUI app into a testable state on a headless-driven Windows box: solved with interactive Scheduled Tasks (LogonType Interactive, RunLevel Highest) that build the release .msi, install it elevated via msiexec, and launch the installed exe onto the logged-in RDP desktop. A hard-won build lesson is baked in: do not set OPENSSL_DIR/OPENSSL_STATIC — rusqlite’s bundled-sqlcipher-vendored-openssl must compile its own vendored OpenSSL (pointing it at a prebuilt OpenSSL was what first masked the SQLCipher readback bug), and nasm is required on Windows for the vendored asm but not on Linux. The rule throughout: build from the PR branch and test the installed artifact, not tauri dev.
The orchestrated loop + health monitor. Each round is static-analysis → live-exploit → packet-capture → memory-forensics → fix → re-test, with subagents fanning out per surface (lock guards / crypto-zeroization / network-injection / browser-shim parity). Every candidate finding is adversarially verified by a fresh-context pass that must quote the motivating file:line from source or the finding is suppressed — the same discipline the cso (“Chief Security Officer”) security-review skill formalizes as a confidence gate, and the de-flaker that keeps transient network noise from becoming a “finding.” A proactive health monitor (harness-monitor.sh) tails per-victim phase status and distinguishes an infrastructure failure (victim never came up) from a genuine security PASS/FAIL, so a flaky SSH hop never gets misread as a clean result.
What the testing didn’t find (and why that counts)
A good penetration test is not measured only by the holes it finds. The roughly two dozen attacks that failed are evidence that the defenses I’d built actually work under fire:
- Cross-site scripting via book and tag names didn’t work — React escapes text by default, and the app never uses the unsafe raw-HTML escape hatch (the two raw-HTML sinks that do exist are run through DOMPurify).
- A denial-of-service flood against the sync engine didn’t work — an Ed25519 challenge-response rejects untrusted devices before any large data is read, and a hard frame cap blocks memory-exhaustion payloads. This round I fuzzed the sync frame parser directly: 4 GB and 256 MB length prefixes, truncated frames, a garbage HELLO, zero-length frames, and random bytes. Every one was rejected cleanly — the oversized prefixes hit “Frame too large (limit 16777216)” (the 16 MiB cap), the malformed HELLO came back “missing field did” — and the app survived all of them with no panic and no out-of-memory.
- Settings injection from a malicious peer didn’t work — only a single, explicitly allowlisted preferences blob is allowed to sync; credentials and auth secrets are blocked at the data layer.
- Brute-forcing the recovery key is infeasible: its 24 characters from a 32-symbol alphabet carry 5 bits each, for 120 bits of entropy (32^24 = 2^120 ≈ 1.3×10^36 combinations); at realistic cracking speeds that’s on the order of ~10^27 years to exhaust, ~10^26 to clear even half the keyspace.
- Pulling secrets out of the shipped binary didn’t work: no hardcoded keys or passwords; the release build is stripped and hardened (`strip`, `lto`, `panic = "abort"`).
- Lifting the device’s signing key off disk didn’t work: I checked, and the Ed25519 private key (`peer_key.bin`) isn’t sitting in the app’s data directory at all; it’s held in the OS keyring. File-system access alone doesn’t hand an attacker the device identity key.
- Memory dumps after the fixes came back clean: the key-shaped hex strings that appeared in earlier dumps were gone. (One honest caveat: the live memory-dump pass against the latest build, the empirical re-check of key-zeroization on the just-locked process, is the one test still pending; see the note below.)
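The frame-parser fuzzing described in the list can be sketched as a handful of hostile inputs run against a length-checked reader. Everything about the wire format here is an assumption made for illustration: the article does not give it, so this sketch uses a 4-byte big-endian length prefix. Only the 16 MiB cap and the `Frame too large` message come from the text; the other error strings and the names `read_frame`, `FUZZ_CASES`, and `run_cases` are hypothetical.

```python
import struct

MAX_FRAME = 16 * 1024 * 1024  # 16777216, the cap quoted in the article


class FrameError(Exception):
    """Raised for any frame the reader refuses."""


def read_frame(buf):
    """Parse one length-prefixed frame from bytes and return its payload.

    ASSUMPTION: 4-byte big-endian length prefix (not stated in the article).
    """
    if len(buf) < 4:
        raise FrameError("truncated length prefix")
    (length,) = struct.unpack(">I", buf[:4])
    # Check the cap before touching the payload, so a hostile prefix never
    # drives a large allocation.
    if length > MAX_FRAME:
        raise FrameError(f"Frame too large (limit {MAX_FRAME})")
    if length == 0:
        raise FrameError("empty frame")  # illustrative wording
    payload = buf[4:4 + length]
    if len(payload) != length:
        raise FrameError("truncated frame")
    return payload


FUZZ_CASES = {
    "4 GB prefix": struct.pack(">I", 0xFFFFFFFF),
    "256 MB prefix": struct.pack(">I", 256 * 1024 * 1024),
    "truncated frame": struct.pack(">I", 10) + b"abc",
    "zero-length frame": struct.pack(">I", 0),
    "short prefix": b"\x00\x01",
}


def run_cases(cases):
    """Feed each case to the reader and record how it was handled."""
    results = {}
    for name, data in cases.items():
        try:
            read_frame(data)
            results[name] = "accepted"
        except FrameError as err:
            results[name] = str(err)
    return results
```

A passing run is one where no case comes back as `"accepted"` and nothing other than `FrameError` escapes; against a real target the same cases would be sent over the socket instead of passed to a local function.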
Confirming that a defense works is a different kind of value from finding a hole, but it’s real value. It turns “I think this is safe” into “I tried to break this and couldn’t.”
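The recovery-key arithmetic in the list above can be reproduced in a few lines. This is a sketch for checking the numbers, not code from the app; the function names are hypothetical, and the guess rate is a free parameter because the article does not state which rate it assumed.

```python
import math

ALPHABET_SIZE = 32   # symbols in the recovery-key alphabet (from the article)
KEY_LENGTH = 24      # characters in the recovery key (from the article)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(alphabet_size, length):
    """Entropy of a uniformly random key: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(alphabet_size, length, guesses_per_second):
    """Years needed to try every key at a fixed, assumed guess rate."""
    keyspace = alphabet_size ** length  # exact integer
    return keyspace / guesses_per_second / SECONDS_PER_YEAR
```

`entropy_bits(ALPHABET_SIZE, KEY_LENGTH)` gives the 120 bits quoted above. The year count is inversely proportional to the guess rate passed in, so any figure from `years_to_exhaust` only means something alongside the rate it assumes.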
The ten rounds at a glance
| Round | Focus | Confirmed | Status |
|---|---|---|---|
| PT1 | Sync engine, browser build, conflict resolution | 3 | Fixed (commit 3cd3a60) |
| PT2 | Encryption at rest, key files, lockout | 4 | Fixed (v1.7.1) |
| PT3 | The encryption migration + live network traffic | 6 | Fixed (v1.7.2, PR #122) |
| PT4 | Memory forensics, startup recovery, binary hardening | 5 | Fixed (v1.7.3, PRs #123, #124) |
| PT5 | Completeness sweep: file paths, reset, every key path | 3 | Fixed (v1.7.4, PR #125) |
| QA | Running the real build through first-time setup | 2 | Fixed (v1.7.5, PR #127) |
| PT6 | Access-control audit across the full command surface | 7 | Fixed (PR #133) |
| PT7 | Verify the prior fixes, then hunt again | 11 | Fixed (PR #133) |
| PT8 | Prove the encryption actually works + Windows reset + Ubuntu victim | 2 | SQLCipher fix verified end-to-end on green Windows (commit e6fb416); Linux re-validation pending |
| PT9 | Turn the custom sync-client tool on the running app | 6 | Fixed + committed (949f9a9, 0774a3e, 07e9d44, 4443a2b) |
| PT10 | Red-team the PT9 fixes; then an independent verification hunt | 2 | Both fix-bugs fixed (2334269, fa2d299); verification hunt clean → campaign closed |
| Total | 65+ targets | 41 through PT7 (all fixed) + 2 in PT8 + 6 in PT9 + 2 in PT10 | All fixed; PT8 verified on green Windows; PT10 verification hunt clean → closed |
A note on PT7 that captures the spirit of the whole thing: before hunting anything new, I pointed the testing at the previous round’s pull request with one job — prove it fixes what it claims. That verification pass found two commands the project’s own documentation said were protected and which, in fact, were not. A fix’s own documentation is not evidence the fix is complete. You verify against the code.
PT8 took that same instinct to its logical extreme — and it paid off twice over. First, instead of trusting that the flagship SQLCipher encryption was working because it was documented and had passed earlier rounds, I set out to prove it on a real machine. It wasn’t working at all: the database had been plaintext on every install the whole time. Then, holding to the same standard, I refused to mark the fix done until the corrected build was reinstalled from scratch on Windows and confirmed encrypted-on-disk with a clean unlock — which it now is. The lesson PT7 hinted at, PT8 made undeniable: you don’t verify a fix against its documentation, you verify it against a running build — and ideally against a minimal reproduction that can’t lie to you.
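One shape such a minimal reproduction could take is a header check on the database file. Every plaintext SQLite file begins with the 16-byte string `SQLite format 3` followed by a NUL byte, so finding that header on a supposedly encrypted database is a definite failure. The sketch below is not the reproduction used in the campaign; the function names are hypothetical, and it assumes default SQLCipher settings, where the start of the file is encrypted rather than left as a plaintext header.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a plaintext SQLite file


def looks_plaintext(path):
    """True if the file starts with the standard plaintext SQLite header."""
    with open(path, "rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def make_plaintext_db(path):
    """Create an ordinary, unencrypted SQLite file to calibrate the check."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE entries (body TEXT)")
    conn.execute("INSERT INTO entries VALUES ('dear diary')")
    conn.commit()
    conn.close()
```

Running `looks_plaintext` first on a file from `make_plaintext_db` confirms the check can actually fire, and then on the installed app's database gives a one-line answer. The check is one-sided: a header that does not match rules out plain SQLite, but it does not by itself prove the key was applied correctly, which is why the clean unlock on a fresh install still matters.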
PT9 is where that discipline turned into automation. The custom sync-client emulator built to probe the app became a tool that found a bug on its own — a Windows-only dropped-connection defect that no amount of source review would have reliably surfaced — and the round closed out six issues in total, including two data-loss bugs that the eighth round’s own encryption fixes had introduced.
PT10 is where the loop terminated, honestly. I red-teamed PT9’s two trickiest fixes as their own attack surface and found a bug in each — the recovery probe accepting an empty-database decoy, and the restore-salt transfer writing an attacker-controlled salt unchecked. Both fixed. Then an independent verification hunt for new high-severity issues came back clean while every happy path still worked. That’s the close-out signal: you don’t reach zero bugs, you converge — the externally-reachable surface goes to zero, the residuals settle into local-access / lockout-class items, the no-journal-content-exposure invariant holds, and a fresh hunt comes back empty. Every fix is new attack surface; you keep attacking your own fixes until a clean pass comes back.
Lessons worth keeping
These generalize past this one app, which is really why I’m writing them down.
- Network capture beats code review for whole classes of bugs. The leaked-public-key findings were invisible in the source and obvious the moment I watched the actual traffic. Static analysis has a ceiling; a packet capture doesn’t.
- Fix the root cause, not the symptoms. Three “separate” findings were one. Always ask which findings share a cause before scheduling them as independent work.
- A fix is new, untested code, and “we wrote it” is not “it works.” Every round found something introduced by the previous round’s fixes, including two critical bugs that lived inside earlier fixes, a flagship encryption feature that turned out never to have engaged at all, and then two more data-loss bugs hiding inside the fixes for that. Re-test after you patch, and verify the fix actually executes correctly on a real build, not just that the code was written.
- The browser build needs the same discipline as the native build. Some of the worst access-control gaps were in the browser version, where there’s no shared type system to enforce parity with the backend. You have to audit them together.
- When you fix a class of bug, sweep the whole codebase. Fixing key-wiping in two functions wasn’t enough; later rounds found the same mistake in three, then four more places. Do the completeness pass.
- At some point you have to install the thing and break it. The most serious bugs survived multiple rounds of code review and showed up only when the real build ran on a real machine, including a security feature that looked correct in the source but was silently inert in production for its entire life. Code review can’t observe runtime state. Build it and break it. And when a result still surprises you, write a ten-line standalone reproduction that proves the mechanism; it can’t lie to you the way a passing-looking review can.
- For a custom app, you have to build your own tools, and that’s where the real signal is. No off-the-shelf scanner can speak a private encrypted desktop protocol. Reimplementing the app’s own sync handshake as a standalone attack client wasn’t a side quest; it was the thing that found a bug code review never would have, and it’s the clearest signal of the work. If the instrumentation doesn’t exist, building it well is the test.
- You converge, you don’t reach zero, and a clean independent pass is how you know. The honest end of a campaign like this isn’t “no bugs left,” which no finite test can prove. It’s that the externally-reachable surface has gone to zero, the residuals have migrated from “reads your data” to “needs your already-unlocked machine and only denies you access,” the one invariant that matters never bent, and a fresh, independent hunt for new high-severity issues comes back empty while the happy paths still work. The tenth round is what produced that pass, by attacking the ninth round’s own fixes first.
Why this is a portfolio piece, not a postmortem
I’m sharing this because it demonstrates the way I like to work, and because “we take security seriously” should mean something concrete.
What I want it to show:
- A security mindset by default. The app was designed to be private; this campaign was about proving it, adversarially, rather than asserting it.
- Engineering rigor. Real lab, real installed builds, two victim operating systems, network captures, memory forensics, custom protocol tooling, and — the part most teams skip — re-testing every fix instead of trusting it.
- Follow-through. Every confirmed finding through PT7 was fixed, each tied to a specific release or pull request; the eighth round’s flagship encryption fix is now verified end-to-end on the installed Windows build; the ninth round’s six fixes are committed and proven by reproductions and regression tests; and the tenth round red-teamed those fixes, fixed the bug it found in each, and closed on a clean independent verification pass. Found and fixed is the only standard that counts — and I’d rather tell you exactly where a fix stands than round it up to “done.”
- Tooling as a deliverable. The hardest and most telling part wasn’t the bugs — it was building the lab, the from-scratch sync-protocol attack client, the fuzzer and the anonymization pipeline, and the orchestrated, adversarially-verified loop that drove it all. Bespoke instrumentation was required, and it caught what generic scanners couldn’t.
- A modern, AI-assisted workflow used honestly. The AI orchestrator made the campaign faster and more thorough; I’ve been equally clear about what it couldn’t do and where a human had to stay in the loop.
The honest takeaway is the one I started with, only sharper now: I thought I’d built something secure, and I had — mostly. I even thought I’d fixed the parts that weren’t, and on one flagship feature I was wrong about that for a long time. Trying hard to break it — on real machines, with tools I had to build myself and reproductions that can’t be argued with — is what turned “mostly” into something I can actually stand behind. The defenses I couldn’t break are the ones I now trust; and the feature I thought was protecting me, but wasn’t, is exactly the kind of thing this whole process exists to catch.
The full technical findings and the code diffs for the fixes are tracked across the project’s release notes and pull requests: commit 3cd3a60, PR #122 (PT3 fixes), and PR #133 (the latest rounds). The later fixes sit on the current security branch (fix/security-pt6-acl-lockguard, version 1.8.0): the SQLCipher key-application fix (commit e6fb416, verified on the installed Windows build); the Windows factory-reset fix (commit b142f31); the ninth round’s fixes, namely the non-blocking sync socket and lock guards (949f9a9), key-verified recovery promotion (0774a3e), restore-salt transfer (07e9d44), and the pairing-QR fix (4443a2b); and the tenth round’s fixes to those fixes, the recovery empty-DB-decoy fix (2334269) and the validated-restore-salt fix (fa2d299). Found and fixed is the standard; where a residual is local-access or lockout-class, or a live GUI-driven re-run is still owed, I’ve said so in the open.