Engineering updates from the team

We write these for people who want to know what we're actually building and learning, not for press releases. Notes on architecture decisions, early access observations, and things that didn't work the way we expected.

Early access cohort one: six weeks in

Our first cohort of 12 teams has been running NessForge for six weeks. Three things stand out. First, teams with monorepos get less value out of the box than teams with discrete repositories — we knew this gap existed but it's wider than we expected. The root cause is that path-filter-based monorepos don't give us enough signal to build precise service graphs without additional configuration. We're working on a guided setup flow that closes most of this gap.

Second — and this one surprised us — the pre-merge risk scoring feature on PRs gets used significantly more than the post-incident root cause analysis. The use case that motivated us to build this tool turns out not to be the primary hook for early adopters. People like catching problems before the deploy, not investigating them after. We're adjusting roadmap priorities accordingly.

Third: nobody likes the alert format. The information is right but the presentation is too dense. We're redesigning the PagerDuty and Slack payloads to be more scannable at 2am. Update shipping next week.

On monorepo support: harder than we expected

We assumed monorepo support would be a configuration option — a few fields in the setup form to tell us which directories map to which services. It turned out to be an architectural decision that required us to think harder about what "service" means in our data model.

Monorepos with Bazel or Nx have explicit dependency graphs we can import directly. Those work well. Plain monorepos with path-based CI triggers — the much more common case — don't give us a graph. You have to infer it from directory structure, import patterns in the code, and CI job overlap. The inference is fuzzy by nature.
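Of the three signals, CI job overlap is the easiest to sketch. The Python below is a minimal illustration, not NessForge's inference code. It assumes each deploy job is described by a list of path globs (the job names and paths are hypothetical) and counts how many repository files would trigger more than one job. Files that trigger several jobs are usually shared code, so a higher count hints at tighter coupling between those services.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from itertools import combinations

# Hypothetical path-filter config: deploy job name -> trigger globs.
CI_JOBS = {
    "deploy-auth": ["services/auth/*", "libs/common/*"],
    "deploy-billing": ["services/billing/*", "libs/common/*"],
    "deploy-search": ["services/search/*"],
}


def infer_coupling(jobs: dict[str, list[str]], files: list[str]) -> dict[tuple[str, str], int]:
    """Count, for each pair of jobs, the files that would trigger both.

    fnmatch's '*' also matches '/', so "libs/common/*" covers nested paths too.
    """
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for path in files:
        triggered = sorted(
            job for job, globs in jobs.items() if any(fnmatchcase(path, g) for g in globs)
        )
        for pair in combinations(triggered, 2):
            edges[pair] += 1
    return dict(edges)


files = [
    "services/auth/handler.py",
    "services/billing/invoice.py",
    "libs/common/retry.py",
    "libs/common/http/client.py",
    "services/search/index.py",
]
print(infer_coupling(CI_JOBS, files))
# {('deploy-auth', 'deploy-billing'): 2}
```

This signal is undirected and noisy on its own, which is part of why inference is fuzzy: a fuller version would combine it with directory structure and import analysis before proposing edges.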

We now handle both modes. The graph-import mode produces precise service boundaries and accurate cross-service impact scores. The inference mode is useful but requires manual review of the initial service graph — there's a new UI for that in the setup flow. We're writing up the full technical approach in a longer post next week.
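One way to hold both modes in a single graph is to record where each edge came from. The sketch below is an assumption about the shape, not NessForge's actual data model: imported edges (from a Bazel or Nx graph) are trusted immediately, while inferred edges are held back from impact scoring until someone confirms them during setup review. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeSource(Enum):
    IMPORTED = "imported"  # read from an explicit Bazel/Nx dependency graph
    INFERRED = "inferred"  # guessed from directories, imports, and CI overlap


@dataclass
class ServiceEdge:
    upstream: str
    downstream: str
    source: EdgeSource
    reviewed: bool = False  # set when a user confirms the edge during setup

    def usable_for_scoring(self) -> bool:
        # Imported edges are precise; inferred ones need a human check first.
        return self.source is EdgeSource.IMPORTED or self.reviewed


def pending_review(edges: list[ServiceEdge]) -> list[ServiceEdge]:
    """Edges the setup flow would show to the user for confirmation."""
    return [e for e in edges if not e.usable_for_scoring()]


edges = [
    ServiceEdge("common-lib", "auth", EdgeSource.IMPORTED),
    ServiceEdge("auth", "billing", EdgeSource.INFERRED),
    ServiceEdge("auth", "search", EdgeSource.INFERRED, reviewed=True),
]
print([(e.upstream, e.downstream) for e in pending_review(edges)])
# [('auth', 'billing')]
```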

False positives and the confidence threshold problem

When we started logging which root cause hypotheses we surfaced and whether users confirmed them, our initial accuracy rate on the top-ranked hypothesis was about 62%. That sounds reasonable until you realize that a wrong answer during an incident is worse than no answer — it sends people in the wrong direction with false confidence.

We spent most of April tightening the confidence threshold logic. The current model no longer labels a hypothesis as the "root cause" unless its confidence score is at least 70%; hypotheses below that threshold are surfaced as "possible contributing factors" instead. The investigation view shows the raw evidence for every hypothesis above 40%, so engineers can reason from the data directly even when the model isn't confident. Top-ranked hypothesis accuracy is now 84% on our labeled test set.
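The two thresholds above can be expressed as a small presentation step. This is a minimal sketch, not NessForge's code: the 0.70 and 0.40 values come from the text, while the function and field names, the example hypotheses, and the exact boundary handling (at least 0.70 for "root cause", strictly above 0.40 for evidence) are assumptions.

```python
from dataclasses import dataclass

ROOT_CAUSE_THRESHOLD = 0.70  # below this, a hypothesis is never labeled "root cause"
EVIDENCE_THRESHOLD = 0.40    # above this, raw evidence is shown in the investigation view


@dataclass(frozen=True)
class Presentation:
    hypothesis: str
    label: str           # "root cause" or "possible contributing factor"
    show_evidence: bool  # whether raw evidence is displayed for this hypothesis


def present(hypotheses: dict[str, float]) -> list[Presentation]:
    """Rank hypotheses by confidence and attach a label and evidence flag."""
    ranked = sorted(hypotheses.items(), key=lambda kv: kv[1], reverse=True)
    out = []
    for name, confidence in ranked:
        if confidence >= ROOT_CAUSE_THRESHOLD:
            label = "root cause"
        else:
            label = "possible contributing factor"
        out.append(Presentation(name, label, confidence > EVIDENCE_THRESHOLD))
    return out


for p in present({"config push": 0.82, "pool exhaustion": 0.55, "DNS flap": 0.20}):
    print(p.hypothesis, "|", p.label, "| evidence shown:", p.show_evidence)
```

Separating the label from the evidence flag keeps the two decisions independent: a low-confidence hypothesis can still expose its evidence without being presented as the answer.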

The 16% that are wrong tend to cluster around two patterns: services with very short deploy histories (fewer than 30 pipeline runs) and cascades that cross more than three service hops. We're working on explicit handling for both. The short-history case is easier: we can fall back to a simpler heuristic model with wider confidence intervals. The multi-hop cascade case requires rethinking how we weight indirect evidence.
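The short-history fallback can be sketched as a switch on run count. Only the 30-run cutoff comes from the text; the rest is an illustrative assumption. Here the "simpler heuristic" is stood in for by a service's historical deploy failure rate reported with a Wilson score interval, a standard interval for a binomial proportion that widens automatically when there are few runs. NessForge's actual fallback may look quite different, and the function names are hypothetical.

```python
import math

MIN_RUNS_FOR_FULL_MODEL = 30  # below this, history is too short for the main model


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the deploy failure rate (z=1.96 is roughly 95%)."""
    if runs == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = failures / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return (max(0.0, center - half_width), min(1.0, center + half_width))


def pick_model(runs: int) -> str:
    """Route short-history services to the heuristic fallback."""
    return "full" if runs >= MIN_RUNS_FOR_FULL_MODEL else "heuristic"


# Same 20% failure rate, very different certainty:
for failures, runs in [(2, 10), (20, 100)]:
    low, high = wilson_interval(failures, runs)
    print(pick_model(runs), f"{low:.2f}-{high:.2f}")
```

The useful property is that the uncertainty is explicit: a service with 10 runs gets a much wider interval than one with 100 runs at the same observed failure rate, which is what "wider confidence intervals" should mean in practice.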

Building a service graph from CI logs

The component of NessForge we spent the most time on isn't the root cause analysis engine — it's the graph construction layer that the analysis depends on. Most CI pipelines don't explicitly declare a service dependency graph. They just build and deploy things. Reconstructing that graph from logs required building a pattern library that recognizes how different teams name and structure their CI jobs.

We currently handle 40+ pipeline step patterns: Helm install/upgrade, kubectl apply, ecs update-service, Docker push/pull, various Terraform apply patterns, and a long tail of custom deploy scripts. For each pattern, we extract the service name, the environment, and the artifact version. Getting name extraction right took about two months of iteration and a lot of early access data, because teams name the same service very differently: "helm upgrade auth-service-prod" in one pipeline and "helm upgrade prod-auth" in another can refer to the same service.
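Two of these step types can be sketched as regular expressions plus a name-normalization pass. This is a minimal illustration, not NessForge's pattern library: the regexes, the environment alias table, and the rule that drops generic tokens like "service" are all hypothetical, and a production version would need to handle flags in any order, Helm values files, and many more naming conventions. It shows how "auth-service-prod" and "prod-auth" can collapse to the same service name.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table; real pipelines use many more environment spellings.
ENV_ALIASES = {"prod": "prod", "production": "prod", "staging": "staging",
               "stage": "staging", "dev": "dev"}
NOISE_TOKENS = {"service", "svc"}  # generic tokens dropped from service names

HELM = re.compile(r"\bhelm\s+(?:upgrade|install)\s+(?:--install\s+)?(?P<release>[A-Za-z0-9][\w.-]*)")
HELM_IMAGE_TAG = re.compile(r"image\.tag=(?P<version>[\w.-]+)")
DOCKER_PUSH = re.compile(r"\bdocker\s+push\s+(?:\S+/)?(?P<name>[\w.-]+):(?P<version>[\w.-]+)")


@dataclass(frozen=True)
class DeployEvent:
    service: str
    environment: Optional[str]
    version: Optional[str]


def normalize(raw: str) -> tuple[str, Optional[str]]:
    """Split 'auth-service-prod' or 'prod-auth' into ('auth', 'prod')."""
    tokens = raw.lower().split("-")
    env = next((ENV_ALIASES[t] for t in tokens if t in ENV_ALIASES), None)
    core = [t for t in tokens if t not in ENV_ALIASES and t not in NOISE_TOKENS]
    return ("-".join(core) or raw.lower(), env)


def parse_step(line: str) -> Optional[DeployEvent]:
    """Try each pattern in turn; None means the step isn't a recognized deploy."""
    if m := HELM.search(line):
        service, env = normalize(m.group("release"))
        tag = HELM_IMAGE_TAG.search(line)
        return DeployEvent(service, env, tag.group("version") if tag else None)
    if m := DOCKER_PUSH.search(line):
        service, env = normalize(m.group("name"))
        return DeployEvent(service, env, m.group("version"))
    return None


print(parse_step("helm upgrade auth-service-prod ./charts/auth --set image.tag=1.4.2"))
print(parse_step("helm upgrade --install prod-auth ./charts/auth"))
print(parse_step("docker push registry.example.com/auth-service:1.4.2"))
```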

The current pattern library covers about 85% of the pipelines we've seen so far without manual configuration. For the remaining 15%, a guided annotation flow lets you review the service graph NessForge inferred and correct any wrong groupings. After that one-time review, subsequent deploys are attributed correctly.
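The "correct once, attributed correctly afterwards" behavior amounts to letting stored corrections take precedence over inference. A minimal sketch, assuming corrections are keyed by the raw deploy name as it appears in the pipeline; the class, its methods, and the naive inference rule are hypothetical.

```python
from typing import Callable, Dict


class ServiceResolver:
    """Maps raw deploy names to services; user corrections override inference."""

    def __init__(self, infer: Callable[[str], str]) -> None:
        self._infer = infer
        self._corrections: Dict[str, str] = {}

    def correct(self, raw_name: str, service: str) -> None:
        # Recorded once in the annotation flow, reused for every later deploy.
        self._corrections[raw_name] = service

    def resolve(self, raw_name: str) -> str:
        if raw_name in self._corrections:
            return self._corrections[raw_name]
        return self._infer(raw_name)


# Naive hypothetical inference: the first dash-separated token is the service.
resolver = ServiceResolver(lambda name: name.split("-")[0])
print(resolver.resolve("payments-api"))       # payments
print(resolver.resolve("legacy-pay-deploy"))  # legacy (wrong grouping)
resolver.correct("legacy-pay-deploy", "payments")
print(resolver.resolve("legacy-pay-deploy"))  # payments
```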