Idempotency and Deduplication in Workflow Automation: The Complete Guide to Preventing Duplicate Records, Double-Charges, and Repeated Emails
19 min read
By LogicLot Team · Last updated March 2026
Duplicate records are the most common data quality problem in automated systems. A 2023 analysis by Gartner found that poor data quality costs organisations an average of $12.9 million per year, with duplicate records being a leading contributor. In automation specifically, duplicates arise from a predictable set of causes: webhook retries, workflow re-executions, overlapping scheduled runs, and partial failures that leave the system in an inconsistent state.
The consequences range from annoying to catastrophic. A duplicate CRM contact is a minor data quality issue. A duplicate charge on a customer's credit card is a customer service crisis. A duplicate order fulfilment ships two packages and doubles your shipping costs. A duplicate compliance notification can trigger regulatory scrutiny. The solution is idempotency—designing every step in your automation to be safe when executed more than once.
This guide explains idempotency in practical terms, walks through real-world duplicate scenarios, covers implementation strategies from idempotency keys to database constraints, and provides platform-specific guidance for Zapier, Make, n8n, Stripe, HubSpot, and Salesforce.
What is idempotency and why it matters for automation
An operation is idempotent if executing it multiple times produces the same result as executing it once. The concept comes from mathematics (multiplying by 1 is idempotent—the value never changes no matter how many times you multiply) and has been adopted in distributed systems engineering as a fundamental reliability pattern.
In HTTP terms: GET, PUT, and DELETE are designed to be idempotent. Reading data (GET) does not change it. Replacing a resource (PUT) with the same data produces the same state. Deleting a resource that is already deleted produces the same state (the resource does not exist). POST is not idempotent by default—calling `POST /api/orders` twice with the same data creates two orders.
For automation workflows, idempotency means: if your workflow runs twice for the same trigger event, the end state is identical to running it once. No duplicate records, no double charges, no repeated emails, no extra tasks.
The real-world cost of non-idempotent automation
Duplicate order processing. An e-commerce store uses a webhook from Shopify to trigger order fulfilment in their warehouse management system. The webhook handler times out (Shopify's 5-second limit), Shopify retries, and the handler processes the retry as a new order. Two packages ship. The customer receives duplicates. The business absorbs the cost of the extra shipment, return handling, and customer support. Shopify's webhook documentation explicitly warns that webhooks may be delivered more than once and that handlers must be idempotent.
Double-charging customers. A subscription platform uses Stripe to process recurring payments. A `payment_intent.succeeded` webhook triggers an internal workflow that creates an invoice and sends a receipt email. The webhook is delivered twice due to a network timeout on the first delivery. Without deduplication, two invoices are created, two receipt emails are sent, and the customer's billing history shows a phantom duplicate. While the customer is not actually charged twice (Stripe handles payment idempotency), the internal records are incorrect, causing confusion for accounting and customer support.
Repeated notification emails. A workflow sends a welcome email when a new user signs up. The workflow platform retries the email step after a temporary SMTP failure, but the first attempt actually succeeded—the response just arrived late. The user receives two identical welcome emails. At scale (thousands of signups per month), this creates a perception of unprofessionalism and can impact email deliverability scores if recipients mark duplicates as spam.
Duplicate CRM contacts. A form submission triggers contact creation in HubSpot. The workflow runs, creates the contact, and then fails on the next step (Slack notification). The platform retries the entire workflow, including the contact creation step. Without deduplication, a second contact is created with the same email. Sales reps see duplicate records, activity history is split, and lead scoring is inaccurate.
The five causes of duplicates in automation
Understanding why duplicates happen is the first step to preventing them.
1. Webhook retries
The most common cause. Webhook providers deliver events with at-least-once semantics—they guarantee the event will arrive at least once, but it may arrive more than once. If your endpoint does not respond within the provider's timeout window (5 seconds for Shopify, 20 seconds for Stripe, 10 seconds for GitHub), the provider marks the delivery as failed and schedules a retry. The retry delivers the same event payload with the same event ID.
Retries also happen when your endpoint returns a 5xx error, even if processing partially completed. The provider cannot know whether your system processed the event—it only knows the response indicated failure.
2. Workflow platform retries
Zapier, Make, and n8n retry failed workflow steps. The retry behaviour varies: some platforms retry only the failed step, others retry the entire workflow from the beginning. If Step 3 of a 5-step workflow fails, and the platform retries from Step 1, Steps 1 and 2 execute again. If Step 1 creates a CRM contact and Step 2 sends an email, the retry creates a second contact and sends a second email.
3. Manual re-runs
A team member debugging a failed workflow clicks "Run again" or "Replay" in the workflow platform UI. The workflow executes with the same trigger data, creating duplicates of any records produced by the original run. Manual re-runs are especially dangerous because the person triggering them may not realise that the original run partially succeeded.
4. Scheduled workflow overlap
A workflow runs on a schedule—every 15 minutes, for example—to sync new records from System A to System B. If one execution takes 20 minutes (due to a large batch or slow API), the next scheduled execution starts before the first completes. Both executions process overlapping time windows, and records that fall in the overlap are processed twice.
5. API polling race conditions
A workflow polls an API for new records using a "created after" timestamp. If two polling cycles overlap (due to variable execution times), both may fetch records created in the overlapping window. Without deduplication, those records are processed twice.
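This overlap-safe polling pattern can be sketched in Python by tracking already-seen record IDs across cycles (the function and field names here are illustrative, not any platform's API):

```python
def poll_new_records(fetch, seen_ids, since):
    """Fetch records created after `since`, skipping IDs processed in a
    previous cycle -- overlapping polling windows then become harmless."""
    new = []
    for record in fetch(since):
        if record["id"] in seen_ids:
            continue                    # record fell in the overlap window
        seen_ids.add(record["id"])
        new.append(record)
    return new

seen = set()
batch1 = [{"id": "r1"}, {"id": "r2"}]
batch2 = [{"id": "r2"}, {"id": "r3"}]   # r2 appears in both windows
assert [r["id"] for r in poll_new_records(lambda s: batch1, seen, None)] == ["r1", "r2"]
assert [r["id"] for r in poll_new_records(lambda s: batch2, seen, None)] == ["r3"]
```

In production, `seen_ids` would persist between runs in Redis, a database, or the platform's built-in storage rather than in memory.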
Integrate.io's analysis of webhook best practices emphasises that both duplicates and gaps are expected in distributed systems. Designing exclusively for one without addressing the other leaves vulnerabilities.
Idempotency keys: the primary defence for API calls
An idempotency key is a unique identifier that you send with an API request. The server stores the key along with the response. If the same key is sent again, the server returns the stored response instead of processing the request again. No duplicate is created.
How Stripe implements idempotency keys
Stripe provides the canonical implementation. Include an `Idempotency-Key` header with any POST request:
```
POST /v1/charges
Idempotency-Key: order_12345_charge
```
Stripe stores the idempotency key for 24 hours. If the same key is sent again within that window, Stripe returns the original response (success or error) without processing the request again. This prevents double-charging even if your automation retries the charge request.
Key generation strategy: The idempotency key should be deterministic—derived from data that is unique to the operation and constant across retries. Good patterns include:
- `{order_id}_{action}` — e.g. `order_12345_charge`
- `{workflow_run_id}_{step_id}` — e.g. `run_abc123_step_create_charge`
- `{event_id}_{action}` — e.g. `evt_xyz789_fulfil`
Do not use random UUIDs generated at request time. A new UUID on each retry defeats the purpose—each retry gets a unique key and is processed as a new request.
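A deterministic key can be derived by hashing the stable identifiers. A minimal sketch (the helper name and the choice of SHA-256 are illustrative; Stripe accepts any string up to 255 characters):

```python
import hashlib

def idempotency_key(event_id: str, action: str) -> str:
    # Built only from values that are constant across retries, so every
    # retry of the same operation sends the same key to the API.
    raw = f"{event_id}:{action}"
    return hashlib.sha256(raw.encode()).hexdigest()

# A retry reuses the key; a different action gets a different key.
assert idempotency_key("evt_xyz789", "fulfil") == idempotency_key("evt_xyz789", "fulfil")
assert idempotency_key("evt_xyz789", "fulfil") != idempotency_key("evt_xyz789", "refund")
```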
Idempotency key support across platforms
| Platform / API | Header or parameter | TTL | Documentation |
|----------------|---------------------|-----|---------------|
| Stripe | `Idempotency-Key` header | 24 hours | Stripe idempotent requests |
| PayPal | `PayPal-Request-Id` header | Varies | PayPal API idempotency |
| Square | `idempotency_key` in request body | Varies by endpoint | Square idempotency |
| Adyen | `Idempotency-Key` header | Supported | Adyen idempotency |
| Twilio | Not natively supported | N/A | Use deduplication at your layer |
| HubSpot | Not natively supported | N/A | Use upsert by email instead |
For APIs that do not support idempotency keys natively, implement deduplication at your automation layer (see below).
Implementing idempotency keys in workflow platforms
**In Zapier:** Zapier's built-in app connectors do not expose idempotency key headers. Use a Code step (JavaScript or Python) to make HTTP requests directly with custom headers, or use the Webhooks by Zapier app to send custom POST requests with an `Idempotency-Key` header.
**In Make:** Use the HTTP module to make custom API requests with arbitrary headers. Set the `Idempotency-Key` header using a value derived from the trigger data (e.g. the webhook event ID or a formula combining unique fields).
**In n8n:** Use the HTTP Request node with custom headers. Or use a Code node to construct the request with the idempotency key. n8n's expression system allows you to reference the workflow execution ID (`$execution.id`) as part of the key, though note that retried executions may get a new execution ID depending on how the retry is triggered.
Event ID deduplication for webhooks
Every major webhook provider includes a unique identifier in each event. Storing and checking these identifiers before processing prevents duplicate handling.
Provider event IDs
- **Stripe:** The `id` field in the event object (`evt_1Nq5dF2eZvKYlo2CzQ3U9FXh`). Unique across all Stripe events.
- **Shopify:** The `X-Shopify-Webhook-Id` header. Unique per delivery.
- **GitHub:** The `X-GitHub-Delivery` header (UUID). Unique per delivery.
- **HubSpot:** Event objects include unique identifiers.
- **SendGrid:** Each event includes an `sg_event_id`.
- **PayPal:** The `id` field in the webhook event.
Redis-based deduplication
Redis is the most popular choice for high-throughput deduplication because of its sub-millisecond response times and built-in TTL support.
Pattern: Use the `SET` command with `NX` (only set if not exists) and `EX` (expiration in seconds):
```
SET webhook:{event_id} 1 NX EX 604800
```
If the command returns `OK`, the event is new—process it. If it returns `nil`, the event has already been seen—acknowledge with 200 and skip processing.
TTL selection: 7 days (604800 seconds) is a safe default. It covers the retry window of all major providers (Stripe retries for up to 3 days, Shopify for up to 48 hours). For high-volume systems, 3 days may be sufficient and reduces memory usage.
Memory estimation: Each Redis key uses approximately 50-100 bytes. At 10,000 webhook events per day with a 7-day TTL, you store approximately 70,000 keys using 3.5-7 MB of memory. Redis handles this trivially.
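The check-and-mark logic can be modelled in plain Python with an in-memory store standing in for Redis (in production you would call redis-py's `r.set(key, 1, nx=True, ex=604800)` instead of this class):

```python
import time

class DedupStore:
    """In-memory stand-in for Redis SET with NX and EX, for illustration only."""

    def __init__(self):
        self._seen = {}  # event_id -> expiry timestamp

    def mark_if_new(self, event_id: str, ttl_seconds: int = 604800) -> bool:
        now = time.monotonic()
        expiry = self._seen.get(event_id)
        if expiry is not None and expiry > now:
            return False               # duplicate: seen within the TTL
        self._seen[event_id] = now + ttl_seconds
        return True                    # new event: caller should process it

store = DedupStore()
assert store.mark_if_new("evt_123") is True    # first delivery: process
assert store.mark_if_new("evt_123") is False   # retry: ack with 200, skip
```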
Database-based deduplication
For systems that do not use Redis, a database table provides durable deduplication with the added benefit of an audit trail.
Table structure:
- `event_id` VARCHAR PRIMARY KEY
- `provider` VARCHAR (stripe, shopify, github)
- `event_type` VARCHAR (payment_intent.succeeded, orders/create)
- `received_at` TIMESTAMP DEFAULT NOW()
- `processed_at` TIMESTAMP NULL
- `status` VARCHAR (pending, completed, failed)
- `payload_hash` VARCHAR (optional, for verification)
Deduplication logic: Attempt an INSERT. If it succeeds (no primary key violation), the event is new—process it and update `status` to `completed`. If it fails with a unique constraint violation, the event is a duplicate—skip processing.
Cleanup: Schedule a job to delete rows older than your retention period (7-30 days) to prevent the table from growing indefinitely. Use a partial index on `status = 'failed'` to efficiently query events that need investigation.
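The insert-first logic can be sketched with SQLite, where a primary-key violation signals a duplicate; any database with unique constraints behaves the same way (table and function names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processed_events (
        event_id TEXT PRIMARY KEY,
        provider TEXT,
        status   TEXT DEFAULT 'pending'
    )
""")

def claim_event(event_id: str, provider: str) -> bool:
    """True if this call claimed the event (process it); False if the
    primary key already exists (duplicate: skip processing)."""
    try:
        conn.execute(
            "INSERT INTO processed_events (event_id, provider) VALUES (?, ?)",
            (event_id, provider),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False

assert claim_event("evt_123", "stripe") is True   # first delivery
assert claim_event("evt_123", "stripe") is False  # retry is skipped
```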
Composite deduplication keys
Sometimes the event ID alone is not sufficient. Consider a workflow that processes a Stripe `invoice.payment_succeeded` event and performs two actions: creates a record in the accounting system and sends a receipt email. If the workflow fails after the first action and is retried, you need to deduplicate each action independently.
Use composite keys: `{event_id}:{action}`. Store `evt_123:create_record` and `evt_123:send_receipt` as separate deduplication entries. On retry, the first action is skipped (already completed) while the second action proceeds.
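The composite-key pattern looks like this in outline, with a set standing in for the Redis or database deduplication store:

```python
def action_key(event_id: str, action: str) -> str:
    # One key per side effect, so a retry skips the completed action
    # but still performs the one that failed.
    return f"{event_id}:{action}"

done = set()  # stand-in for the deduplication store

for action in ("create_record", "send_receipt"):
    key = action_key("evt_123", action)
    if key in done:
        continue          # already completed on a previous attempt
    # ... perform the side effect here ...
    done.add(key)

assert done == {"evt_123:create_record", "evt_123:send_receipt"}
```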
Upsert strategies: create-or-update as a deduplication pattern
An upsert (update-or-insert) operation checks whether a record with a given key exists. If it does, the record is updated. If it does not, a new record is created. Upserts are inherently idempotent for the same input—running an upsert twice with the same data produces the same result.
Database-level upserts
SQL databases support upserts natively:
PostgreSQL:

```
INSERT INTO contacts (email, name)
VALUES ('user@example.com', 'Alice')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
```

MySQL:

```
INSERT INTO contacts (email, name)
VALUES ('user@example.com', 'Alice')
ON DUPLICATE KEY UPDATE name = VALUES(name);
```
The `ON CONFLICT` / `ON DUPLICATE KEY` clause requires a unique constraint or primary key on the deduplication field (email in this case). Without the constraint, the database cannot detect duplicates.
CRM upserts
**HubSpot:** The Contacts API supports "create or update" by email. If a contact with the specified email exists, it is updated. If not, a new contact is created. This is the recommended approach for automation workflows that create contacts—always use the upsert endpoint instead of the create endpoint.
**Salesforce:** Salesforce supports upsert by external ID. Map a field in your automation (e.g. a form submission ID or CRM sync ID) to an External ID field in Salesforce. Use the PATCH method with the external ID to upsert: the record is created if the external ID is new, or updated if it exists.
**Pipedrive:** Pipedrive's person API supports search-then-create-or-update. Search by email first; if found, update. If not found, create.
Email platform upserts
**Mailchimp:** Use the PUT endpoint for list members: `PUT /lists/{list_id}/members/{subscriber_hash}`. The subscriber hash is the MD5 hash of the lowercase email address. PUT creates or updates the member—idempotent by design.
**SendGrid:** The "Add or Update Contacts" endpoint accepts contacts and deduplicates by email. Sending the same contact twice updates the existing record.
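Mailchimp's subscriber hash is straightforward to compute; a minimal sketch:

```python
import hashlib

def subscriber_hash(email: str) -> str:
    # Mailchimp identifies list members by the MD5 hash of the
    # lowercase email address, which makes PUT upserts idempotent.
    return hashlib.md5(email.lower().encode()).hexdigest()

# Case variations of the same address map to the same member.
assert subscriber_hash("User@Example.com") == subscriber_hash("user@example.com")
# Then: PUT /lists/{list_id}/members/{hash} creates or updates the member.
```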
Database-level uniqueness constraints as a safety net
Even with application-level deduplication, database constraints provide a final safety net. If your application logic has a bug that allows a duplicate to slip through, the database constraint prevents the duplicate record from being created.
Unique constraints
Add unique constraints on the fields that must be unique:
- `email` for contacts
- `order_id` + `line_item_id` for order line items
- `external_reference_id` for records synced from external systems
- `event_id` for processed webhook events
When your application attempts to insert a duplicate, the database returns a constraint violation error. Your application should handle this error gracefully—log it, skip the insert, and continue processing.
Partial unique indexes
Sometimes uniqueness applies only to a subset of records. For example, you may want to enforce uniqueness on `email` only for records with `status = 'active'` (allowing multiple archived records with the same email). PostgreSQL supports partial indexes:
```
CREATE UNIQUE INDEX idx_contacts_email_active
ON contacts (email)
WHERE status = 'active';
```
Advisory locks for critical operations
For operations that must not run concurrently (e.g. processing a specific order), use advisory locks. PostgreSQL's `pg_advisory_xact_lock()` acquires a lock for the duration of the transaction. If another process attempts to acquire the same lock, it waits. This prevents race conditions where two webhook deliveries for the same event are processed simultaneously.
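The effect of an advisory lock can be illustrated in-process with a mutex plus a re-check after acquisition. This is only a sketch: in production, `pg_advisory_xact_lock()` (keyed per order, e.g. via `hashtext(order_id)`) provides the same serialisation across processes and machines:

```python
import threading

processed = []                     # stand-in for the order's side effects
_lock = threading.Lock()           # stand-in for the advisory lock

def process_order(order_id: str) -> None:
    with _lock:                    # a concurrent delivery waits here
        if order_id in processed:  # re-check once the lock is held
            return                 # duplicate: first delivery already ran
        processed.append(order_id)

threads = [threading.Thread(target=process_order, args=("order_42",))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert processed == ["order_42"]   # exactly one delivery took effect
```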
Handling 409 Conflict responses
Some APIs return HTTP 409 Conflict when you attempt to create a resource that already exists, or when an idempotency key has already been used with different request parameters.
When 409 means "already done": If you are retrying a create operation and receive 409, it typically means the original request succeeded and the record already exists. Treat this as success—do not retry again. Your automation should extract any relevant data from the 409 response (some APIs return the existing record in the response body) and continue to the next step.
When 409 means "conflict": If you send an idempotency key with request parameters that differ from the original request, Stripe and similar APIs return 409 to indicate that the key was already used with different data. This is a programming error—your idempotency key generation logic is producing the same key for different operations. Fix the key generation.
In workflow platforms: Configure error handling to treat 409 as a non-error. In Make, use an error handler on the HTTP module that checks the status code and continues if it is 409. In n8n, use an IF node after the HTTP Request that checks `$json.statusCode === 409` and routes to a "skip" path. In Zapier, use a Code step with try/catch to handle the 409 response.
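In custom code (e.g. a Code step), the same rule reduces to a small branch on the status code. A sketch with illustrative names; some APIs return the existing record in the 409 body, so the handler reuses it when present:

```python
def handle_create_response(status_code: int, body: dict) -> dict:
    # 2xx: the record was created as expected.
    if 200 <= status_code < 300:
        return {"ok": True, "record": body}
    # 409 on a retried create usually means the original attempt
    # succeeded; treat it as success, not as a failure to retry.
    if status_code == 409:
        return {"ok": True, "record": body.get("existing_record")}
    # Anything else is a genuine failure worth retrying or alerting on.
    return {"ok": False, "record": None}

assert handle_create_response(201, {"id": "c_1"})["ok"] is True
assert handle_create_response(409, {"existing_record": {"id": "c_1"}})["ok"] is True
assert handle_create_response(500, {})["ok"] is False
```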
Reconciliation: catching what deduplication misses
Deduplication prevents duplicates, but it does not prevent gaps. A webhook that never arrives, a workflow that silently fails, or a race condition that causes both processes to skip an event—these create missing records. Periodic reconciliation catches them.
How reconciliation works
1. Define the source of truth. Choose one system as authoritative. For e-commerce, this is usually the payment processor (Stripe) or the e-commerce platform (Shopify). For CRM, it is usually the CRM itself.
2. Compare records. Query the source system for all records in a time window (e.g. the last 24 hours). Query the target system for corresponding records. Identify discrepancies: records in the source that are missing in the target (gaps), and records in the target that are missing in the source (orphans).
3. Backfill gaps. For missing records, trigger the creation process. Use the same idempotent logic so that if the record was actually created but the reconciliation query missed it (due to eventual consistency), the backfill is a safe no-op.
4. Investigate orphans. Records in the target that do not exist in the source may indicate a bug in your automation or a record that was deleted in the source. Investigate and clean up as appropriate.
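The comparison step reduces to set differences over record IDs; a minimal sketch:

```python
def reconcile(source_ids, target_ids):
    source, target = set(source_ids), set(target_ids)
    gaps = source - target     # in the source of truth, missing downstream
    orphans = target - source  # downstream only: investigate or clean up
    return gaps, orphans

gaps, orphans = reconcile(
    ["ord_1", "ord_2", "ord_3"],   # e.g. orders in the source system
    ["ord_1", "ord_3", "ord_9"],   # e.g. records in the target system
)
assert gaps == {"ord_2"}       # backfill via the idempotent create path
assert orphans == {"ord_9"}    # investigate manually
```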
Scheduling reconciliation
For critical data flows (payments, orders, compliance), run reconciliation daily. For less critical flows (marketing contacts, analytics), weekly is sufficient. Schedule reconciliation during off-peak hours to reduce API load.
Google Cloud Workflows documentation discusses patterns for ensuring exactly-once execution in cloud workflows, but acknowledges that reconciliation remains necessary for distributed systems where true exactly-once delivery is impractical.
Platform-specific deduplication guidance
Zapier
Zapier handles some deduplication automatically through its trigger deduplication feature: Zapier stores the IDs of previously triggered records and skips duplicates. However, this only applies to polling triggers, not webhook triggers.
For webhook triggers: implement deduplication in a Code step. Check the event ID against a stored list (use Zapier's Storage by Zapier or an external store). If the ID has been seen, use a Filter step to stop the Zap.
For action steps: when creating records in CRMs or databases, use the "Find or Create" action pattern where available (e.g. HubSpot's "Create or Update Contact"). Where not available, add a "Find" step before "Create" and use a Filter to skip creation if the record exists.
Make (formerly Integromat)
Make provides several tools for deduplication:
Data Stores: Make's built-in Data Store module can store processed event IDs. Use a "Search Records" operation at the beginning of your scenario to check for the event ID. If found, use a Router with a "No further modules" route. If not found, add the event ID to the Data Store and continue processing.
Error handling: Configure error handlers on modules that may produce duplicates. If a "Create Contact" module fails with a duplicate error (409 or similar), the error handler can route to an "Update Contact" module instead.
Aggregators: For batch operations, use the Array Aggregator to collect items and the Set module to deduplicate by a key field before processing.
n8n
n8n provides the most flexibility for custom deduplication logic:
Code node: Write custom JavaScript to check a deduplication store (Redis, database, or n8n's built-in static data) and skip processing if the event has been seen.
IF node: After a "Check if exists" query, route to different branches based on whether the record exists.
Merge node: Use the Merge node with "Remove Duplicates" mode to deduplicate items in a batch by a key field.
Built-in retry: n8n has per-node retry settings. When configuring retries, ensure that the retried node is idempotent. If it is not, implement deduplication logic before the node.
Observability: monitoring deduplication health
Deduplication logic, like any code, can fail. Monitor these metrics to detect issues early:
- Deduplication hit rate: The percentage of incoming events that are identified as duplicates and skipped. A baseline rate of 1-3% is normal (from webhook retries). A sudden spike above 10% indicates a delivery problem—the source system is retrying heavily, or your handler is responding too slowly.
- Deduplication miss rate: The number of duplicate records that slipped through your deduplication logic and were created in the target system. This should be zero. If it is not zero, you have a bug in your deduplication logic or a race condition.
- Reconciliation delta: The number of discrepancies found during each reconciliation run. A consistently high delta indicates systematic gaps in your automation.
- 409 response rate: The percentage of API calls that return 409. A high rate means your automation is frequently attempting to create records that already exist—either the deduplication logic is not catching them, or the workflow is being triggered more often than expected.
- Idempotency key collision rate: How often the same idempotency key is used with different request parameters (resulting in an error). This should be zero—any non-zero rate indicates a key generation bug.
Track these metrics in your monitoring system (Datadog, Grafana, or platform-native dashboards) and set up alerts for anomalies.
Designing idempotent multi-step workflows
In a multi-step workflow, each step that has side effects must be individually idempotent. The workflow as a whole is only idempotent if every step is idempotent.
Step-by-step idempotency patterns
Create contact: Use the CRM's upsert-by-email endpoint. If the contact exists, it is updated. If not, it is created. Same input, same result.
Send email: Use a composite deduplication key: `{recipient_email}:{template_id}:{trigger_event_id}`. Check if an email with this key has already been sent (store in database or email platform tags). If yes, skip. Alternatively, some email APIs support idempotency keys directly.
Create task in project management: Use an external reference ID. Before creating, search for a task with the same external reference. If found, skip or update. Asana, Monday.com, and ClickUp support external IDs or custom fields for this purpose.
Create invoice: Use an idempotency key derived from the order ID and invoice type. Stripe supports idempotency keys on invoice creation. For other billing systems, check for an existing invoice with the same order reference before creating.
Send Slack notification: Slack does not prevent duplicate messages natively. Use a deduplication store to track `{channel}:{trigger_event_id}`. If already sent, skip. Alternatively, use Slack's `update` method to overwrite an existing message instead of posting a new one.
The saga pattern for rollback
When a multi-step workflow fails partway through and the completed steps have side effects, you need a strategy for recovery. The saga pattern defines compensating actions for each step:
- Step 1: Create CRM contact. Compensating action: Delete the contact.
- Step 2: Create project. Compensating action: Archive the project.
- Step 3: Send welcome email. Compensating action: None (email cannot be unsent—this is why email should be the last step).
If Step 2 fails, run the compensating action for Step 1 (delete the contact). This returns the system to a consistent state. In practice, most automation teams prefer idempotent at-least-once processing over saga-based rollback, because compensating actions add complexity and some operations (like sending emails) cannot be compensated.
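A saga executor can be sketched as a list of (do, undo) pairs, with `None` for steps that cannot be compensated; this is an illustrative outline, not a production framework:

```python
def run_saga(steps):
    """Each step is a (do, undo) pair. On failure, run the compensating
    actions for every completed step, in reverse order."""
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
        return True
    except Exception:
        for undo in reversed(completed):
            if undo is not None:   # some steps (emails) cannot be undone
                undo()
        return False

log = []

def create_contact(): log.append("create_contact")
def delete_contact(): log.append("delete_contact")
def create_project(): raise RuntimeError("project API down")

ok = run_saga([(create_contact, delete_contact), (create_project, None)])
assert ok is False
assert log == ["create_contact", "delete_contact"]  # contact was rolled back
```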
Production checklist
- Use idempotency keys for every API call that creates a resource. Derive keys deterministically from the trigger event.
- Deduplicate webhook events using provider event IDs. Store in Redis (with TTL) or a database table (with cleanup job).
- Prefer upserts over create-then-update. Use CRM upsert endpoints, database ON CONFLICT clauses, and email platform PUT methods.
- Add database uniqueness constraints on critical fields as a safety net. Handle constraint violations gracefully in application code.
- Treat 409 Conflict as success when retrying create operations. Extract data from the response if needed.
- Run periodic reconciliation to catch gaps. Compare source and target systems daily for critical data.
- Design each workflow step to be independently idempotent. The workflow is only safe if every step is safe.
- Monitor deduplication health. Track hit rates, miss rates, 409 rates, and reconciliation deltas. Alert on anomalies.
- Order side effects strategically. Put irreversible operations (email sends, external notifications) last in the workflow.
- Document your deduplication strategy. Every team member who modifies workflows should understand how duplicates are prevented.
Experts on LogicLot design idempotent automation systems that handle retries, duplicates, and partial failures correctly. If you are experiencing duplicate records, double-charges, or repeated notifications in your current automation, post a Custom Project or book a Discovery Scan for a targeted assessment and fix.