We built a code review AI Agent that reads every merge request, parses the diff, and posts concise, actionable inline comments, so reviewers get high-signal feedback before they touch a single line of code. For CEOs, we see this as a force-multiplier: the agent enforces basic standards, surfaces security and performance issues, and keeps human reviewers focused on design and edge cases instead of busywork.
This isn’t about replacing engineers; it’s about removing repetitive load, shortening time-to-merge, and creating an auditable trail of why a comment was made. We piloted the workflow on several repos and measured fewer trivial review cycles and faster merges.
Benefits
We built a code review agent because our teams were drowning in routine merge request work that added no strategic value. Every day we saw small, repetitive issues (style nits, missing null checks, trivial security smells) come up again and again, while reviewers spent their time on low-signal feedback instead of design and architecture. The result: merge requests piling up, reviewers burning out, and meaningful releases delayed by avoidable cycles.
The problem is threefold. First, review bottlenecks slow delivery: a single stalled MR can block downstream work. Second, feedback is inconsistent across reviewers and teams, creating rework and confusion. Third, reviewer fatigue makes it harder to catch real risks consistently. That’s why we built an automated, LLM-powered MR workflow that parses diffs, runs targeted checks, and posts concise inline suggestions, freeing humans to focus on higher-value judgment calls.
The business impact is straightforward: faster time-to-merge, fewer reviewer hours spent on routine checks, and more consistent onboarding for new engineers because the agent enforces baseline standards. That consistency also reduces risk: security and performance weaknesses are flagged earlier, not after production incidents.
Use cases we prioritized:

- Style and idiom nits (for example, === vs ==) that eat review cycles without improving the design.
- Missing null checks and similar routine correctness gaps.
- Common security smells such as SQL injection, XSS patterns, and unsafe eval usage.
- Performance anti-patterns like inefficient loops and unbounded allocations.
In short, our code review agent targets the drudgery so humans can do the unique, high-impact work only people can do.
| Object | Purpose | Where Used |
|---|---|---|
| mergeRequestLink | Parsed MR URL components including domain, project path, MR ID, and MR URL. | Webhook parsing, project resolution, email summaries. |
| projectId | Canonical numeric GitLab project ID, resolved dynamically if missing. | All GitLab API calls including diff fetch and discussions. |
| getMRDataChanges | Complete MR payload from GitLab containing metadata and file diffs. | Validation, diff parsing, commit SHA extraction. |
| changes[] | Per-file diff objects with paths, diff hunks, and file state flags. | Split logic, skip rules, AI diff processing. |
| originalCode / newCode | Sanitized code blocks extracted from diff hunks without +/- markers. | LLM review prompt payload. |
| position[...] | Inline comment anchor including path, SHA references, and line numbers. | GitLab inline discussion posting. |
| notificationKey | Workflow outcome identifier (Reviewed, Blocked, Conflict, etc.). | Email templates, logs, webhook response. |
| gitlabMRReviewPrompt | Static or external system prompt defining AI review rules. | AI agent invocation. |
| canReview | Boolean toggle to skip review when disabled. | Pre-review gating logic. |
| has_conflicts | GitLab conflict indicator blocking automated review. | Validation gate. |
| merge_status | Mergeability status indicating if MR can be merged. | Validation before processing and commenting. |
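To make these objects concrete, here’s a minimal TypeScript sketch of the shapes we pass between steps. Field names mirror the table and GitLab’s changes API; exact types and optionality are our assumptions.

```typescript
// Hypothetical typings for the workflow's core objects; names mirror
// the table above, exact types and optionality are assumptions.
interface MergeRequestLink {
  domain: string;      // e.g. "gitlab.example.com"
  projectPath: string; // e.g. "group/project"
  mrId: number;        // MR iid parsed from the webhook URL
  mrUrl: string;       // full MR URL, reused in email summaries
}

interface DiffChange {
  old_path: string;
  new_path: string;
  diff: string;         // raw hunk text, starts with "@@" for text files
  new_file: boolean;
  renamed_file: boolean;
  deleted_file: boolean;
}

interface InlinePosition {
  base_sha: string;
  start_sha: string;
  head_sha: string;
  old_path: string;
  new_path: string;
  old_line?: number; // anchor on the old side of the diff
  new_line?: number; // anchor on the new side of the diff
}

// Non-exhaustive: the table's "etc." implies further outcomes.
type NotificationKey = "Reviewed" | "Blocked" | "Conflict";
```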
Diffs are the agent’s map: we turn each @@ -a,b +c,d @@ hunk into a tiny, reviewable story that the code review agent can understand. The hunk header means: a is the old-file start line, b is how many lines the old hunk covers; c is the new-file start line, and d is the new count. From there we walk each hunk line by line: lines beginning with a space are context (advance both old and new counters), - lines advance only the old counter (they belong in originalCode), and + lines advance only the new counter (they belong in newCode). This produces sanitized originalCode and newCode blocks (no leading +/- or @@ markers) and an exact mapping from hunk offsets to the old_line / new_line integers used for anchor placement.
Edge cases matter. Skip diffs where deleted_file == true (nothing to anchor), skip renamed_file entries that show no content change, and skip diffs that don’t start with @@ (binary or generated files). Those rules keep the agent from posting unusable comments.
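Here’s a minimal TypeScript sketch of that walk plus the skip rules. Function names and the ParsedHunk shape are ours; it assumes one hunk per string and reuses the DiffChange shape sketched earlier.

```typescript
// Apply the skip rules above before any parsing happens.
function shouldReview(change: DiffChange): boolean {
  if (change.deleted_file) return false;                              // nothing to anchor
  if (change.renamed_file && change.diff.trim() === "") return false; // pure rename
  return change.diff.startsWith("@@");                                // text diffs only
}

interface ParsedHunk {
  originalCode: string[]; // context + removed lines, markers stripped
  newCode: string[];      // context + added lines, markers stripped
  anchors: { old_line?: number; new_line?: number }[]; // one per changed line
}

// Walk one "@@ -a,b +c,d @@" hunk, advancing old/new counters per line.
function parseHunk(hunk: string): ParsedHunk | null {
  const lines = hunk.split("\n");
  const header = lines[0].match(/^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
  if (!header) return null;

  let oldLine = parseInt(header[1], 10); // a: old-file start line
  let newLine = parseInt(header[2], 10); // c: new-file start line
  const parsed: ParsedHunk = { originalCode: [], newCode: [], anchors: [] };

  for (const line of lines.slice(1)) {
    if (line === "" || line.startsWith("\\")) continue; // trailing artifacts
    if (line.startsWith("-")) {
      parsed.originalCode.push(line.slice(1));
      parsed.anchors.push({ old_line: oldLine });
      oldLine++;
    } else if (line.startsWith("+")) {
      parsed.newCode.push(line.slice(1));
      parsed.anchors.push({ new_line: newLine });
      newLine++;
    } else {
      // Context line: belongs to both sides, advances both counters.
      parsed.originalCode.push(line.slice(1));
      parsed.newCode.push(line.slice(1));
      oldLine++;
      newLine++;
    }
  }
  return parsed;
}
```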
Anchoring depends on SHA correctness: GitLab requires the right base_sha, start_sha, and head_sha together with the chosen old_line or new_line. If the MR advances between fetch and post, anchors fail (422). Re-fetch SHAs just before posting or skip posting if they’ve changed. This is why precise line mapping is essential and why SHA revalidation is baked into the workflow.
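In practice the revalidate-then-post step looks roughly like this. It calls GitLab’s MR and discussions REST endpoints and reuses the InlinePosition shape from earlier; the helper name and token handling are illustrative.

```typescript
// Re-fetch diff_refs immediately before posting; if the head SHA moved,
// skip the comment instead of risking a 422 on a stale anchor.
async function postInlineComment(
  baseUrl: string, // e.g. "https://gitlab.example.com"
  projectId: number,
  mrIid: number,
  token: string,
  position: InlinePosition,
  body: string,
): Promise<"posted" | "skipped"> {
  const api = `${baseUrl}/api/v4/projects/${projectId}/merge_requests/${mrIid}`;
  const headers = { "PRIVATE-TOKEN": token, "Content-Type": "application/json" };

  const mr = await (await fetch(api, { headers })).json();
  if (mr.diff_refs?.head_sha !== position.head_sha) return "skipped";

  const res = await fetch(`${api}/discussions`, {
    method: "POST",
    headers,
    body: JSON.stringify({ body, position: { position_type: "text", ...position } }),
  });
  return res.status === 422 ? "skipped" : "posted";
}
```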
Together, these guardrails (skip rules, precise line mapping, and SHA revalidation) catch the common failures that break inline comments and keep the agent from posting noisy mistakes.
Our code review AI agent succeeds or fails based on the prompt and how we shape its behavior. This section explains the goals, the payload we send, the token strategy, and how we clean the agent’s output so it’s safe to post to a merge request.
We design the system prompt for determinism: the agent must return either Done or a short (1–3 line) actionable comment. We bias the agent toward safety and precision: language heuristics (file extension / path hints to choose idiomatic rules), focused security checks (SQLi, XSS patterns, unsafe eval usage), code-idiom suggestions (e.g., === vs ==), and performance anti-pattern detection (inefficient loops, unbounded allocations). Keep the model settings conservative (low temperature, small max_tokens) so output stays terse and parseable.
Each LLM call should include two parts: the system prompt + a user message containing (a) file path and language hint (from extension), (b) hunk metadata (hunk offsets, start lines), and (c) the sanitized originalCode and newCode blocks. This gives the agent the exact context to decide whether the change introduces a problem, and whether to anchor feedback to old_line or new_line. Keep the payload structured so parsing the agent output is trivial.
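Putting that together, here’s a sketch of the call we assemble per hunk. The prompt text is illustrative, not our production gitlabMRReviewPrompt, and the message shape follows the usual chat-completion convention.

```typescript
// Illustrative system prompt; the production gitlabMRReviewPrompt is
// longer and maintained separately.
const systemPrompt = [
  "You are a code reviewer. Reply exactly 'Done' if the change is fine.",
  "Otherwise reply with one short (1-3 line) actionable comment.",
  "Focus on security, performance, and idiomatic usage.",
].join("\n");

// Build the two-part payload: system prompt + structured user message.
function buildReviewMessages(filePath: string, hunkHeader: string, parsed: ParsedHunk) {
  const languageHint = filePath.split(".").pop() ?? "unknown"; // extension heuristic
  return [
    { role: "system", content: systemPrompt },
    {
      role: "user",
      content: [
        `File: ${filePath} (language hint: ${languageHint})`,
        `Hunk: ${hunkHeader}`,
        "originalCode:",
        parsed.originalCode.join("\n"),
        "newCode:",
        parsed.newCode.join("\n"),
      ].join("\n"),
    },
  ];
}
```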
Send hunks (plus a handful of context lines) rather than full files. For most repos we recommend ~400–800 tokens per call and tight max_tokens for replies so the agent stays concise. For multi-hunk files you can either: (a) call the agent per-hunk (simpler anchors), or (b) consolidate hunks into a single prompt when cross-hunk reasoning is needed, but then ensure you include clear hunk boundaries and map responses to the correct hunk. Cache by file SHA to avoid repeated costs.
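Caching is little more than a keyed map. A sketch that keys on head SHA plus file path (our assumption; any stable per-file identifier works):

```typescript
// Avoid re-billing the model when the workflow re-runs on an unchanged MR.
const reviewCache = new Map<string, string>();

async function reviewWithCache(
  headSha: string,
  filePath: string,
  runReview: () => Promise<string>,
): Promise<string> {
  const key = `${headSha}:${filePath}`;
  const hit = reviewCache.get(key);
  if (hit !== undefined) return hit;
  const result = await runReview();
  reviewCache.set(key, result);
  return result;
}
```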
After the agent returns, normalize done → Done (case-insensitive), trim whitespace, and truncate multi-line answers to the first actionable sentence. If the output cannot be parsed or the agent returns verbose/ambiguous text, mark that file as agent_error and route it to a human-review notification (email/log). Finally, always re-validate MR SHAs before posting to avoid 422 anchor failures.
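The post-processing step is small but load-bearing. A sketch, with the agent_error escalation path described above; truncating to the first non-empty line is our simplification of "first actionable sentence."

```typescript
// Normalize raw LLM output into "Done", one actionable comment, or an
// agent_error flag that routes the file to human review.
type ReviewOutcome =
  | { kind: "Done" }
  | { kind: "comment"; text: string }
  | { kind: "agent_error"; raw: string };

function normalizeAgentOutput(raw: string): ReviewOutcome {
  const trimmed = raw.trim();
  if (/^done[.!]?$/i.test(trimmed)) return { kind: "Done" }; // done -> Done

  const firstLine = trimmed.split("\n").find((l) => l.trim().length > 0);
  if (!firstLine || firstLine.length > 300) {
    return { kind: "agent_error", raw }; // unparseable or too verbose
  }
  return { kind: "comment", text: firstLine.trim() };
}
```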
This combination of tight prompts, conservative token strategy, and robust post-processing lets us automate routine checks safely while keeping humans in the loop for ambiguous or high-risk findings.
Failures happen, so we designed the workflow to fail loudly and safely. We classify errors into four buckets and handle each with clear retries, escalation, and audit logging.
Always surface errors in the execution stamp (executionId, errorMessage, status) so operators can triage quickly.
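A sketch of building that stamp. The four bucket names below are our assumption (the workflow’s actual taxonomy isn’t enumerated here); executionId, errorMessage, and status match the stamp fields above.

```typescript
// Illustrative error taxonomy; bucket names are assumptions, stamp
// fields come from the workflow's execution stamp.
type ErrorBucket = "validation" | "gitlab_api" | "llm" | "posting";

interface ExecutionStamp {
  executionId: string;
  status: "success" | "error";
  errorMessage?: string;
  bucket?: ErrorBucket;
}

function stampError(executionId: string, bucket: ErrorBucket, err: unknown): ExecutionStamp {
  return {
    executionId,
    status: "error",
    bucket,
    errorMessage: err instanceof Error ? err.message : String(err),
  };
}
```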
We built the code review agent to deliver measurable outcomes: faster time-to-merge, fewer reviewer hours spent on trivial checks, more consistent feedback, and an auditable trail of why comments were made, all while reducing the chance that obvious security or performance issues reach production. Our risk posture is conservative: the agent flags routine problems and defers ambiguous or high-risk cases to humans; it never blocks merges.
Next step: pilot this on one repository or team, measure reviewer time and comments-per-MR, then iterate. Contact our AI Agent experts if you’d like us to run a two-week pilot and deliver a clear delta in reviewer time and comment quality.