This article was written in collaboration with César Ortega, whose insights and discussions helped shape the ideas presented here.
Building the right data product starts with sitting down with business partners to understand day-to-day workflows, handoffs, and bottlenecks. In this article, we discuss a challenge that doesn’t require a complicated solution, just a simple optimization problem. It’s a good example of how basic tools can still solve high-value problems. Specifically, we focus on optimizing the assignment of online insurance policies to trusted partners, independent insurance agencies (IIAs), at a global insurance company.
Independent insurance agencies are privately owned intermediaries that sell insurance policies from multiple insurers. Unlike large insurance companies, they don’t design products, set prices, underwrite risk, or pay claims; instead, they compare options across carriers, and place coverage that best fits the client’s needs, typically earning commissions for doing so. Here, the idea is to work together to deliver the best value for both the agency and the client.
Reducing complexity
Optimization in the real world is a spectrum. At one end are exact methods that can prove optimality, but they can be computationally heavy at scale and struggle as the problem grows in size and operational detail. At the other end are heuristics, ranging from simple rule-based baselines that are easy to explain but hard to maintain as complexity grows (often living in large Excel sheets), to more advanced metaheuristics that scale well computationally but can be harder to justify, audit, or debug.
In practice, the most effective approach often sits in the middle: pragmatic “good-enough” formulations, built with carefully chosen constraints that reflect both business rules and real operational limits such as human workload and service quality.
The goal is not theoretical perfection, but a solution that is deliverable, comparable against baselines, and easy to iterate on. With a modular structure and a staged modeling strategy, we can start simple, measure impact with both tangible KPIs (time to assignment, optimal agency selection, etc.) and intangible ones (avoiding unfair concentration of policies in a few agencies, etc.), and evolve the system through small, safe improvements rather than waiting months for a textbook-optimal model.

That’s why we chose a lightweight optimization formulation. It captures the constraints that matter (capacity, geographic eligibility, fairness, and bucket mix) and delivers a deterministic, auditable answer fast enough for real-time latency requirements. If needed, we can later extend the approach with decomposition techniques, stronger solvers, or heuristics without changing the system’s core contract.
The baseline
Historically, these digital policy-to-agency assignments were made manually, guided by non-standard criteria and individual judgment. While this approach sometimes worked, it often resembled a round-robin: policies were distributed sequentially among available agencies (IIAs), with little consideration for differences in capacity, expertise, or expected performance.

While simple and seemingly fair, this often led to delays, missed opportunities, and uncertainty about which agency (IIA) was the best fit. The process also did not scale well, creating further assignment delays, and the outcomes did not consistently align with strategic goals such as profitability, quality, reproducibility, and transparency.
For this reason, we present how we solved an important problem using a lightweight integer programming approach that matches incoming online insurance policies to agencies in real time. The method maximizes a productivity score (reflecting how well an agency has performed in the past) while balancing agency capacity, fairness, and geographic admissibility constraints based on ZIP codes. We outline the mathematical formulation, the live-update logic, and the PuLP implementation.

What problem are we solving?
When a new online policy is purchased for a client, someone still has to decide which agency should handle it. We rely on agencies because they add value beyond the usual, such as advocating at claim time, servicing changes and renewals, cross-selling, and more. Importantly, agencies also originate demand: they bring new clients (and consequently new policies) into the funnel through their relationships and local presence, which compounds growth for the insurance company.
From a customer perspective, this matters because the agency is often the primary point of contact: the quality and speed of agency (iia) service can shape the overall experience, especially during high-stress moments like claims or urgent coverage changes.
Since agencies differ in licensing, geography, product strengths, sales reach, and day-to-day capacity, the “best” agency can vary from moment to moment. A real-time assignment optimization system routes each new policy to eligible, available agencies that are most likely to deliver value to both the business and the client, are treated fairly under clear rules, and are best positioned to drive future growth.
Good Old-Fashioned optimization
To create a clear assignment process, it’s essential to consider broader business goals, such as making sure the right agency handles the right type of policy to maximize key performance indicators (KPIs) like policy volume and quality. It’s also important that agencies understand how these decisions are made.
The implemented optimization algorithm should therefore intelligently allocate policies to agencies based on KPIs, including the number and quality of policies they handle. Instead of relying on subjective or inconsistent human judgment, the algorithm uses real-time, data-driven decisions to run the policy assignment process efficiently and fairly.
The optimization model allocates policies to agencies based on measurable performance signals rather than subjective judgment. To make decisions reproducible, we translate agency performance into a numeric value the optimizer can use. This is done through productivity weights, where the key input is the swap ratio: a metric that captures how much value an agency brings per unit of policy it receives (for example loss ratio, tenure, premium, cross-selling, etc.).
In practice, the swap ratio allows the model to differentiate agencies that consistently deliver strong outcomes from those that underperform. Higher-value policies can then be directed toward agencies that have demonstrated the ability to handle them effectively, while still respecting capacity limits, geographic eligibility, fairness requirements, and bucket-mix constraints.
Rather than relying on static rules, the system recalculates decisions as constraints change, ensuring that assignments remain aligned with current operational capacity and business priorities.
The system operates in two modes:
- Batch mode: Optimizes based on historical allowances, providing a comprehensive review of past data to improve future allocations.
- Online mode: Re-optimizes with each new incoming policy, including these new policies in the optimization process, then updates the inventory and refines the batch optimization accordingly.
In essence, the batch mode handles historical data to establish baseline rules and patterns, while the online mode ensures real-time adaptability by dynamically adjusting to new policies and conditions. This approach helps maintain optimal performance in a constantly changing environment.
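The interplay between the two modes can be sketched as follows. Here `solve` is a deliberately trivial greedy stand-in for the integer program developed later in this article, and all agency names and numbers are illustrative; only the control flow is the point.

```python
# Illustrative control flow only: `solve` is a trivial greedy placeholder
# (fill agencies in order, up to capacity) standing in for the real
# optimization model. All names and numbers are hypothetical.

def solve(inventory: int, capacity: dict) -> dict:
    """Placeholder solver: assign inventory to agencies up to capacity."""
    plan = {}
    for agency, cap in capacity.items():
        take = min(cap, inventory)
        plan[agency] = take
        inventory -= take
    return plan

def batch_mode(inventory: int, capacity: dict) -> dict:
    # Full re-optimization over a fixed inventory: the monthly baseline.
    return solve(inventory, capacity)

def online_mode(plan: dict, capacity: dict) -> dict:
    # A new policy arrived: re-optimize with the previous total plus one.
    return solve(sum(plan.values()) + 1, capacity)

capacity = {"A1": 6, "A2": 5, "A3": 5}
baseline = batch_mode(10, capacity)
updated = online_mode(baseline, capacity)
print(baseline)  # {'A1': 6, 'A2': 4, 'A3': 0}
print(updated)   # {'A1': 6, 'A2': 5, 'A3': 0}
```

In production, `solve` is the PuLP model described below; the batch call establishes the baseline and each online call folds in exactly one new policy.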
The Solution: Optimization Algorithm
Given a set of agencies A and an incoming flow of policies P, we want to decide how many policies to assign to each agency and each policy category (Gold, Silver, Bronze) so that we maximize total productivity while adhering to certain constraints (agency capacity, ZIP code eligibility, total count, penalties, etc.).
Objective function:

maximize ∑_{a ∈ A} ∑_{c ∈ C} w_a · x_{a,c}

- x_{a,c} is the decision variable in the optimization problem: the number of policies assigned to agency a in category c. Only nonnegative integer values are allowed.
- A: set of agencies (size |A| = m); a∈A.
- C: set of categories {Gold, Silver, Bronze} (|C| = p = 3); c ∈ C.
- The productivity weight w_a is a single number per agency that estimates the benefit of sending one more policy to that agency. It is calculated from the agency’s tenure combined with its swap ratio.
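In PuLP terms, the decision variables and objective above can be sketched like this (agency names and weights are illustrative, not production values):

```python
import pulp

# Hypothetical inputs, just to show the variable/objective construction.
agencies = ["A1", "A2"]
categories = ["Gold", "Silver", "Bronze"]
w = {"A1": 1.2, "A2": 0.9}  # productivity weight per agency

prob = pulp.LpProblem("policy_assignment", pulp.LpMaximize)

# x[a, c]: nonnegative integer count of policies for agency a, category c.
x = pulp.LpVariable.dicts(
    "x",
    [(a, c) for a in agencies for c in categories],
    lowBound=0,
    cat=pulp.LpInteger,
)

# Objective: maximize the sum over a, c of w_a * x_{a,c}.
prob += pulp.lpSum(w[a] * x[(a, c)] for a in agencies for c in categories)
```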
Rules we must respect (constraints):
Logical constraints:
Logical constraints are the ones required for the model to be mathematically well-defined regardless of business context (e.g., variables are integers and totals balance).
1. Integrality & Non-negativity: you can’t send negative or fractional policies.

x_{a,c} ∈ ℤ, x_{a,c} ≥ 0 for all a ∈ A, c ∈ C

2. Global conservation: the total number of policies assigned across all agencies and buckets must equal the total inventory T available for assignment in this run (the sum of all agency capacities).

∑_{a ∈ A} ∑_{c ∈ C} x_{a,c} = T

Business constraints:
Business constraints encode domain policy choices or operational rules (e.g., per‑agency capacity, ZIP admissibility, bucket mix, online floors) that could change if the business rules change.
1. Per-agency capacity: an agency cannot receive more policies than it can currently handle (U_a); this bounds each row sum of the policy assignment matrix.

∑_{c ∈ C} x_{a,c} ≤ U_a for all a ∈ A

2. ZIP admissibility: agencies are only licensed or authorized to service policies in specific geographic areas. If a ZIP is inadmissible for agency a, lock its row total:

∑_{c ∈ C} x_{a,c} = 0

By enforcing ZIP eligibility in the optimization, we ensure every assignment is operationally feasible, protecting service quality, because agencies are strongest in the regions where they have local presence and expertise.
3. Bucket bounds: a business control that keeps the monthly allocation balanced across policy tiers. With a minimum L_c and maximum M_c per bucket:

L_c ≤ ∑_{a ∈ A} x_{a,c} ≤ M_c for all c ∈ C

Without them, the optimizer might push almost everything into the most profitable tier, which can create risk concentration and operational strain. By setting minimums and maximums per bucket, you enforce a healthy mix that reflects risk appetite, service capacity, and strategic targets.
What’s Not in the batch
Batch mode is a full re‑optimization on a fixed inventory. It finds the best baseline allocation without reacting to a single new policy event. For that reason, we exclude the following “live” constraints that are only needed when a new policy arrives:
- Per‑agency floors from the previous allocation. Floors are an online safeguard that prevents any agency from losing policies when a new one arrives. In batch we are computing the baseline itself, so there’s no “previous” baseline to protect.

- No ZIP lock. The ZIP lock is a live-mode safety rule: when a single new policy arrives whose ZIP is not allowed for agency A, we freeze agency A at the cell level (Gold/Silver/Bronze) at its previous values, so the new policy can’t be assigned there and no existing policies are moved away.
- No headroom (“+1”) trick. Headroom is used in online mode to keep feasibility when adding exactly one new policy. Batch mode does not add a single policy; it allocates the entire inventory at once.
- Bucket bounds, by contrast, still apply online: each new policy must keep the Gold/Silver/Bronze totals within their minimum/maximum. These restrictions are updated monthly or as business requirements change.
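A minimal sketch of the online step, under illustrative weights, capacities, and baseline plan: one new policy arrives, and we re-optimize with per-cell floors, a ZIP lock on one agency, one unit of headroom, and bucket maximums.

```python
import pulp

# All weights, capacities, and the baseline plan below are illustrative.
agencies = ["A1", "A2", "A3"]
categories = ["Gold", "Silver", "Bronze"]
w = {"A1": 1.3, "A2": 1.0, "A3": 0.8}   # productivity weight per agency
U = {"A1": 6, "A2": 5, "A3": 5}         # per-agency capacity
bucket_max = {"Gold": 4, "Silver": 4, "Bronze": 4}

# Previous (baseline) allocation: one policy per agency per bucket.
prev = {(a, c): 1 for a in agencies for c in categories}

# The new policy's ZIP is inadmissible for A2.
zip_admissible = {"A1": True, "A2": False, "A3": True}

prob = pulp.LpProblem("online_step", pulp.LpMaximize)
x = pulp.LpVariable.dicts(
    "x",
    [(a, c) for a in agencies for c in categories],
    lowBound=0,
    cat=pulp.LpInteger,
)
prob += pulp.lpSum(w[a] * x[(a, c)] for a in agencies for c in categories)

# Headroom ("+1"): previous total plus exactly the one new policy.
prob += pulp.lpSum(x.values()) == sum(prev.values()) + 1

for a in agencies:
    # Capacity still applies online.
    prob += pulp.lpSum(x[(a, c)] for c in categories) <= U[a]
    for c in categories:
        if zip_admissible[a]:
            # Floor: no agency loses policies when a new one arrives.
            prob += x[(a, c)] >= prev[(a, c)]
        else:
            # ZIP lock: freeze the inadmissible agency at its previous cells.
            prob += x[(a, c)] == prev[(a, c)]

# Bucket bounds still apply: column totals stay within their maximums.
for c in categories:
    prob += pulp.lpSum(x[(a, c)] for a in agencies) <= bucket_max[c]

prob.solve(pulp.PULP_CBC_CMD(msg=0))

# The delta relative to the baseline is the single incremental assignment.
delta = {k: int(x[k].value()) - prev[k] for k in x if int(x[k].value()) != prev[k]}
print(delta)
```

With these numbers the single increment lands on A1 (highest weight, admissible ZIP, spare capacity), matching the behavior described above.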
Why this works
By separating the process into batch (global balance) and online (local adjustment), the system achieves both stability and responsiveness. Batch optimization provides a consistent, auditable reference point, while live decisioning handles real-time arrivals without disrupting the overall structure. This combination enables fast operational decisions while preserving fairness, capacity control, and alignment with strategic targets.
E2E Implementation
The end-to-end process involves more than encoding rules in an optimization model. In our AWS setup, Airflow orchestrates scheduled data pipelines that refresh intermediate tables on daily, weekly, and monthly cadences. These jobs pull upstream data, build curated datasets and live inventory tables, and store them in S3. The Optimization service reads the latest inputs from S3 and, when needed, calls a SageMaker endpoint to score candidates and select the best agency under the capacity, fairness, and ZIP-code constraints described earlier. External applications send requests through an HTTPS endpoint on API Gateway, which routes them via middleware responsible for authentication, validation, and request transformation before invoking the Optimization service (and SageMaker, if required). The response (containing the selected agency and decision metadata) is returned to the Contact Center and ultimately the end user. Finally, outcomes and logs are written back to S3, feeding Airflow-driven monitoring and retraining, and Jenkins redeploys updated components to close the loop.
Toy example
To illustrate the mechanics of the original production implementation in a simplified and self-contained manner, we created a synthetic, runnable toy example that demonstrates the core logic behind policy-to-agency assignment using integer linear programming with the PuLP library in Python.
The example sets up a small scenario with four agencies and three policy categories (“Gold,” “Silver,” and “Bronze”). Productivity scores and capacity limits are assigned for each agency, along with constraints such as ZIP code eligibility and minimum/maximum policy mix per category. The goal is to maximize the total productivity score while respecting these constraints.
While the example is synthetic and uses randomly generated weights and capacities, it effectively illustrates the fundamental optimization logic and workflow, including variable construction, constraint enforcement, and solution interpretation. This approach can be directly scaled and adapted to real-world data and business constraints as demonstrated in the full implementation.
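A condensed sketch in the same spirit as that toy example (all inputs are synthetic; the repository version is the reference implementation):

```python
import pulp
import random

random.seed(42)  # reproducible synthetic inputs

agencies = ["A1", "A2", "A3", "A4"]
categories = ["Gold", "Silver", "Bronze"]

# Synthetic productivity weights and capacities.
w = {a: round(random.uniform(0.5, 1.5), 2) for a in agencies}
U = {a: random.randint(5, 10) for a in agencies}

# Synthetic ZIP eligibility: A4 cannot service this run's region.
admissible = {a: a != "A4" for a in agencies}

total = 12                               # inventory to assign in this run
bucket_min = {c: 2 for c in categories}  # minimum mix per bucket
bucket_max = {c: 6 for c in categories}  # maximum mix per bucket

prob = pulp.LpProblem("toy_policy_assignment", pulp.LpMaximize)

# Decision variables: nonnegative integer counts x[a, c].
x = pulp.LpVariable.dicts(
    "x",
    [(a, c) for a in agencies for c in categories],
    lowBound=0,
    cat=pulp.LpInteger,
)

# Objective: maximize total productivity.
prob += pulp.lpSum(w[a] * x[(a, c)] for a in agencies for c in categories)

# Global conservation: assign exactly the available inventory.
prob += pulp.lpSum(x.values()) == total

for a in agencies:
    # Per-agency capacity (row sums).
    prob += pulp.lpSum(x[(a, c)] for c in categories) <= U[a]
    # ZIP admissibility: lock inadmissible rows to zero.
    if not admissible[a]:
        prob += pulp.lpSum(x[(a, c)] for c in categories) == 0

# Bucket bounds (column sums).
for c in categories:
    col = pulp.lpSum(x[(a, c)] for a in agencies)
    prob += col >= bucket_min[c]
    prob += col <= bucket_max[c]

prob.solve(pulp.PULP_CBC_CMD(msg=0))

print(pulp.LpStatus[prob.status])
for a in agencies:
    print(a, [int(x[(a, c)].value()) for c in categories])
```

The solver fills high-weight agencies first while keeping every row under capacity, the inadmissible row at zero, and every column inside its bucket bounds.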

In Table 1, we illustrate a simple iteration. Batch mode first computes a baseline monthly plan that allocates the initial inventory. Online mode then simulates incoming policies one at a time toward a target monthly total; each arrival triggers a re-optimization that preserves existing allocations and assigns only the incremental policy to an eligible agency (e.g., respecting ZIP admissibility). In this example, the new policy is a high-value (Gold) policy and its ZIP is admissible for A1, so the increment goes to A1. If the ZIP were inadmissible for A1, the policy would be routed to the best admissible agency instead. This process repeats until the monthly bucket target is reached.
Code
The code is available in this repository: Link to the repository
To run the experiments, set up a Python ≥3.11 environment with the required libraries (e.g., pulp, etc.). It is recommended to use a virtual environment (via venv or conda) to keep dependencies isolated.
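For example, on a POSIX shell (assuming Python 3.11+ is available as `python3`):

```shell
python3 -m venv .venv       # create an isolated environment
source .venv/bin/activate   # activate it
pip install pulp            # PuLP bundles the CBC solver
```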
Conclusion
Compared to a round-robin baseline that assigns policies with no intelligence, our approach uses a productivity matrix derived from a swap ratio to route policies where they are expected to create the most value. The optimization balances tangible metrics (the measurable value and capacity each agency can deliver) with intangible considerations (fairness, stability, and the trust agencies place in a predictable allocation process). In short, it replaces a blind rotation with a transparent, auditable decision rule that reflects both performance and operational constraints.
By making policy assignments more transparent and predictable, we’ve built trust and collaboration. Agencies (IIAs) now understand how decisions are being made, which has increased their confidence in the process.
This example shows how even a relatively small optimization problem can generate meaningful improvements. By starting with a simple, well-defined formulation, we create a solid foundation that delivers immediate value while enabling future evolution. The same framework can be extended through incremental iterations, incorporating richer signals, and more advanced decision logic. In practice, the greatest impact often comes not from building a complex system upfront, but from starting simple and improving continuously as the business learns and the data matures.
References
[1] PuLP documentation, “PuLP 3.3.0 documentation,” COIN-OR. https://coin-or.github.io/pulp/main/includeme.html