
AWS Tagging Governance with IaC: The Hidden Disadvantages of IaC-Only Tag Governance (Part 3)

Part 3 of the IaC series: why strong Terraform/CloudFormation baselines and Tag Policies still leave critical runtime governance gaps.


This is part 3 of the IaC tagging governance series.

In part 1 (Terraform), we covered provider defaults, module contracts, and plan-time controls, then highlighted runtime blind spots such as drift tug-of-war, child-resource propagation gaps, and out-of-band creation.

In part 2 (CloudFormation), we covered stack-level and resource-level tagging, cfn-lint/cfn-guard validation, and CloudFormation Hooks for deployment-time enforcement.

Those two parts show that IaC is a strong baseline. This part explains why IaC + Tag Policies alone still fails as a full lifecycle governance model in real production environments.

Enforcing tags natively through Terraform default_tags, CloudFormation controls, and AWS Tag Policies appears to provide an end-to-end model. In practice, relying exclusively on IaC creates operational gaps, enforcement boundaries, and post-deploy governance failures.

| Area | What IaC + Tag Policies do well | Where they fall short |
| --- | --- | --- |
| Deployment | Consistent tags on resources created through pipelines. | Cannot see console/one-off API paths. |
| Taxonomy | Standard keys and value formats across accounts. | No guarantee that every runtime resource actually complies. |
| Drift handling | Can detect differences between code and state. | Easily devolves into tug-of-war with remediation tools. |

Disadvantage 1: Drift Tug-of-War After Deployment

The most disruptive disadvantage of managing tags strictly through static IaC files is configuration drift. In mature cloud environments, organizations frequently deploy reactive automation tools, such as AWS Config automated remediation, Cloud Custodian, or custom Lambda functions, to fix missing tags, append cost-center data, or apply operational metadata after a resource is provisioned.

When these external systems modify a tag, the resource's actual state in AWS diverges from the IaC codebase. The next time a developer runs terraform apply, Terraform refreshes the live tags, detects the drift, and reverts them to exactly what the static file declares; a CloudFormation stack update can likewise overwrite tags on the resources it updates. This creates a perpetual tug-of-war between infrastructure code and governance bots.

To stop Terraform from perpetually reverting tags, engineers often clutter their configurations with lifecycle { ignore_changes = [tags] } blocks. Each such block means the code no longer describes the tags that actually exist on the resource, which undermines the "Infrastructure as Code" philosophy.

Operational systems often change tags later:

  • remediation automation
  • incident response scripts
  • migration tooling
  • manual emergency changes
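A simplified way to see the tug-of-war: treat the code as the complete tag set, the way Terraform treats a resource's tags, and compute what the next apply would change. The function name `plan_tag_changes` and the tag values are hypothetical, and real provider behavior varies by resource type, so read this as a model rather than Terraform's actual algorithm.

```python
def plan_tag_changes(declared, actual):
    """Return the tag changes a full-ownership IaC apply would make.

    Simplified model: the code is treated as the complete tag set, so any
    key present in AWS but not in code is removed, and changed values are
    reverted to the declared value.
    """
    to_add = {k: v for k, v in declared.items() if k not in actual}
    to_revert = {k: v for k, v in declared.items() if k in actual and actual[k] != v}
    to_remove = sorted(k for k in actual if k not in declared)
    return {"add": to_add, "revert": to_revert, "remove": to_remove}


# Tags declared in code (e.g. provider default_tags merged with resource tags).
declared = {"Project": "checkout", "Environment": "prod"}

# Live tags after a remediation bot added CostCenter and an engineer edited Environment.
actual = {"Project": "checkout", "Environment": "prod-hotfix", "CostCenter": "cc-1234"}

changes = plan_tag_changes(declared, actual)
print(changes)
```

Each apply strips CostCenter and resets Environment; the remediation bot then puts CostCenter back, and the cycle repeats.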

Drift smell

If you see many ignore_changes = [tags] blocks, it usually means IaC is being forced to ignore governance reality instead of working with it.
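A rough way to measure this smell is to count blanket tag ignores across a repository. The sketch below is a bracket-matching heuristic, not an HCL parser; the helper names are hypothetical, and it deliberately does not count per-key ignores such as tags["CostCenter"], which are narrower in scope.

```python
import re
from pathlib import Path

# Start of an `ignore_changes = [` list; the body is extracted by bracket matching.
IGNORE_START = re.compile(r"ignore_changes\s*=\s*\[")
BLANKET_TAG_ATTRS = {"tags", "tags_all"}


def ignore_change_lists(hcl_text):
    """Yield the raw body of every ignore_changes list, handling nested brackets."""
    for match in IGNORE_START.finditer(hcl_text):
        depth, i = 1, match.end()
        while i < len(hcl_text) and depth:
            if hcl_text[i] == "[":
                depth += 1
            elif hcl_text[i] == "]":
                depth -= 1
            i += 1
        # If the list is unterminated, return everything after the opening bracket.
        yield hcl_text[match.end():i - 1] if depth == 0 else hcl_text[match.end():]


def count_blanket_tag_ignores(hcl_text):
    """Count ignore_changes lists that ignore the whole tags or tags_all map."""
    count = 0
    for body in ignore_change_lists(hcl_text):
        items = {item.strip() for item in body.split(",")}
        if items & BLANKET_TAG_ATTRS:
            count += 1
    return count


def scan_repo(root):
    """Return {path: count} for every .tf file under root with blanket tag ignores."""
    results = {}
    for path in Path(root).rglob("*.tf"):
        count = count_blanket_tag_ignores(path.read_text(encoding="utf-8"))
        if count:
            results[str(path)] = count
    return results


sample = """
resource "aws_instance" "a" {
  lifecycle {
    ignore_changes = [tags]
  }
}
resource "aws_instance" "b" {
  lifecycle {
    ignore_changes = [
      tags["CostCenter"],
      ami,
    ]
  }
}
"""
print(count_blanket_tag_ignores(sample))
```

Running scan_repo over an infrastructure repository and tracking the total over time turns the smell into a number you can watch.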

Disadvantage 2: Service-Created Child Resources Stay Partially Governed

Another major weakness appears when AWS services create additional resources after the initial IaC transaction is already complete. Terraform and CloudFormation can tag the parent resource deterministically, but the service can later create child artifacts through internal control-plane actions that are not fully covered by the original tag contract. This is especially common in autoscaling, managed networking, backups, and restore workflows.

Many AWS services create resources after the IaC transaction:

  • EC2 instances
  • volumes
  • snapshots
  • scaling/replacement artifacts
  • load balancers and their components

Why this becomes a governance problem:

  • the child resource is created outside the original module/template execution step
  • propagation behavior differs by service and by resource type
  • some children require separate explicit tag configuration paths
  • lifecycle events (replace, failover, restore) can generate new untagged artifacts later

Even with clean template/module tagging, child resources can remain partially tagged or untagged, especially when created asynchronously by managed services. The operational impact is serious: cost attribution becomes incomplete, ownership becomes ambiguous, ABAC rules fail unpredictably, and cleanup automation misses orphaned assets because the expected metadata was never present.
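A minimal sketch of the check this implies: compare each child's tags against the parent's and against the required keys. The key names and resource IDs are placeholders, and how you discover the children (describe calls, AWS Config, an inventory export) is left out as an assumption.

```python
REQUIRED_KEYS = ("CostCenter", "Owner")  # hypothetical required keys


def child_tag_gaps(parent_tags, children, required=REQUIRED_KEYS):
    """Report children that miss required keys or disagree with the parent.

    `children` maps a child resource ID to its current tag dict.
    """
    gaps = {}
    for child_id, tags in children.items():
        missing = [k for k in required if k not in tags]
        # A key present on both sides but with a different value is a propagation error.
        mismatched = [
            k for k in required
            if k in tags and k in parent_tags and tags[k] != parent_tags[k]
        ]
        if missing or mismatched:
            gaps[child_id] = {"missing": missing, "mismatched": mismatched}
    return gaps


parent = {"CostCenter": "cc-100", "Owner": "team-payments"}  # e.g. an Auto Scaling group
children = {
    "i-placeholder1": {"CostCenter": "cc-100", "Owner": "team-payments"},
    "vol-placeholder1": {},  # volume created without propagated tags
    "snap-placeholder1": {"CostCenter": "cc-999"},  # snapshot from a later backup job
}

print(child_tag_gaps(parent, children))
```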

Disadvantage 3: Tag Policies Are Guardrails, Not Runtime Governance

AWS Tag Policies are highly valuable for standardizing taxonomy (approved keys, allowed value formats, and naming consistency). They reduce entropy across accounts and help platform teams move from free-text tagging to controlled vocabulary.

However, Tag Policies are not a full lifecycle enforcement engine. They define policy expectations, but they do not automatically guarantee every resource is corrected in real time after creation, replacement, restore, or manual changes.

Tag Policies are valuable for key/value consistency, but they do not replace:

  • create-time enforcement in all flows
  • post-create remediation
  • dynamic context tagging

In practice, mature teams still need:

  • API-level preventive controls for non-pipeline creation paths
  • event-driven remediation for late-created or modified resources
  • periodic coverage scans to detect long-tail noncompliance
  • ownership workflows to resolve policy exceptions quickly

Teams still need controls for resources created outside ideal provisioning paths and for resources that change after initial creation. Without those controls, policy documents exist, but operational compliance remains inconsistent.
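The periodic coverage scan is the simplest of these controls to sketch: given an inventory of resources and their tags, compute per-key coverage and the list of noncompliant resources. How the inventory is collected is an assumption left out here, and the ARNs use the documentation placeholder account 111122223333.

```python
def coverage_report(inventory, required):
    """Return (coverage per key, sorted noncompliant ARNs).

    `inventory` maps resource ARN -> tag dict. An empty value counts as missing.
    """
    total = len(inventory)
    coverage = {}
    for key in required:
        tagged = sum(1 for tags in inventory.values() if tags.get(key))
        coverage[key] = tagged / total if total else 1.0
    noncompliant = sorted(
        arn for arn, tags in inventory.items()
        if any(not tags.get(key) for key in required)
    )
    return coverage, noncompliant


PREFIX = "arn:aws:ec2:us-east-1:111122223333:instance/"
inventory = {
    PREFIX + "i-a": {"CostCenter": "cc-100", "Owner": "team-a"},
    PREFIX + "i-b": {"CostCenter": "cc-100"},
    PREFIX + "i-c": {},
    PREFIX + "i-d": {"CostCenter": "", "Owner": "team-b"},
}

coverage, noncompliant = coverage_report(inventory, ("CostCenter", "Owner"))
print(coverage)
print(noncompliant)
```

Tracking these per-key ratios over time is also a natural basis for the KPIs mentioned later in this post.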

Disadvantage 4: Tag Policy Quotas

AWS Organizations Tag Policies are notoriously difficult to scale for large enterprises. AWS limits the maximum size of a single tag policy document to 10,000 characters. Because Tag Policies require you to explicitly list every single resource type you want to enforce, and because many AWS services lack wildcard support (for example, you cannot use *:* to blanket-target all resources), you burn through this character quota incredibly fast.

Including all supported resource types for just a single tag consumes over 4,600 characters. Consequently, within the 10,000-character limit you can realistically fit only about two enforced tags per policy document. AWS allows up to 10 tag policies per Organizational Unit (OU), so this works out to a practical ceiling of roughly 20 enforced tags per OU, a severe architectural bottleneck for mature FinOps practices.
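To see how quickly enforcement lists eat into the limit, the sketch below builds a tag policy that enforces N keys on the same resource-type list and measures its minified JSON length. It assumes whitespace does not count toward the limit and uses a synthetic list of resource-type strings, not the real list of supported types; the helper names are hypothetical.

```python
import json

TAG_POLICY_MAX_CHARS = 10_000  # tag policy document size limit cited above


def build_tag_policy(tag_keys, resource_types):
    """Build a tag policy that enforces each key on every listed resource type."""
    return {
        "tags": {
            key.lower(): {
                "tag_key": {"@@assign": key},
                "enforced_for": {"@@assign": list(resource_types)},
            }
            for key in tag_keys
        }
    }


def policy_size(policy):
    """Length of the minified JSON (assumption: whitespace does not count)."""
    return len(json.dumps(policy, separators=(",", ":")))


def max_enforced_tags(resource_types, limit=TAG_POLICY_MAX_CHARS):
    """How many keys, each enforced on all resource_types, fit in one document."""
    count = 0
    while policy_size(build_tag_policy([f"Tag{i}" for i in range(count + 1)], resource_types)) <= limit:
        count += 1
    return count


# Synthetic stand-in for a long enforced_for list (not the real supported-type list).
synthetic_types = [f"service{i:03d}:resource-type" for i in range(300)]
print(policy_size(build_tag_policy(["CostCenter"], synthetic_types)))
print(max_enforced_tags(synthetic_types))
```

Because every enforced key repeats the full resource-type list, the document grows linearly with the number of keys; swapping in your real enforced_for list gives a quick estimate of how many keys one document can carry.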

Here is a small, valid example that shows why policy management becomes heavy at scale.

Goal:

  • require two tags across the organization: CostCenter and Project
  • keep different allowed values per account

To do this, you typically define the standard keys at the organization root, then attach separate account-level policies that set the allowed values.

Organization root tag policy (key definitions only, locked so child policies cannot change them):

JSON
{
    "tags": {
        "CostCenter": {
            "tag_key": {
                "@@assign": "CostCenter",
                "@@operators_allowed_for_child_policies": ["@@none"]
            }
        },
        "Project": {
            "tag_key": {
                "@@assign": "Project",
                "@@operators_allowed_for_child_policies": ["@@none"]
            }
        }
    }
}

Account tag policy (allowed values for one example account):

JSON
{
    "tags": {
        "CostCenter": {
            "tag_value": {
                "@@assign": [
                    "Production"
                ]
            }
        },
        "Project": {
            "tag_value": {
                "@@assign": [
                    "A"
                ]
            }
        }
    }
}

In real environments, you must repeat similar account-level policies for each account/OU that needs different allowed values. This quickly increases policy maintenance overhead and consumes policy size/quota budget.
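One way to keep the repetition manageable is to generate the account-level documents from a single mapping. The sketch below reproduces the account policy shown above for one placeholder account ID and adds a second hypothetical account; creating and attaching the policies through the AWS Organizations API is out of scope. This only moves the maintenance into code: each account still needs its own policy object.

```python
import json


def account_value_policy(allowed):
    """Build an account-level tag policy that sets only allowed values.

    `allowed` maps tag key -> allowed values for one account. The keys
    themselves are defined (and locked) by the root policy.
    """
    return {
        "tags": {
            key: {"tag_value": {"@@assign": sorted(values)}}
            for key, values in allowed.items()
        }
    }


# Hypothetical per-account allowed values (placeholder account IDs).
ACCOUNT_VALUES = {
    "111122223333": {"CostCenter": ["Production"], "Project": ["A"]},
    "444455556666": {"CostCenter": ["Development"], "Project": ["B", "C"]},
}

policies = {
    account: json.dumps(account_value_policy(values), indent=4)
    for account, values in ACCOUNT_VALUES.items()
}
print(policies["111122223333"])
```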

Disadvantage 5: The "Missing Tag" Loophole and SCP Breakages

Perhaps the most glaring architectural flaw in AWS Tag Policies is how they handle omission. If a user creates an AWS resource without providing any tags at all, the tag policy is frequently bypassed entirely. Tag policies generally only trigger if the tag key is physically present on the payload but the value is non-compliant.
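A toy model makes the loophole visible: a check that validates values only for keys that are present passes a resource with no tags at all, while a separate presence check catches it. Both functions and the allowed values are hypothetical illustrations, not how AWS evaluates tag policies internally.

```python
ALLOWED_VALUES = {"CostCenter": ["cc-100", "cc-200"]}  # hypothetical policy values


def value_violations(tags, allowed=ALLOWED_VALUES):
    """Value-only check: keys that are present but hold a disallowed value."""
    return [key for key, values in allowed.items() if key in tags and tags[key] not in values]


def presence_violations(tags, required=tuple(ALLOWED_VALUES)):
    """The check value-only enforcement lacks: required keys that are absent."""
    return [key for key in required if key not in tags]


untagged = {}
mistagged = {"CostCenter": "cc-999"}

print(value_violations(untagged), presence_violations(untagged))
print(value_violations(mistagged), presence_violations(mistagged))
```

The untagged resource produces no value violation at all, which is exactly the gap that pushes teams toward SCPs.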

To make builders include tags at creation time, organizations fall back on aggressive Service Control Policies (SCPs). A platform team might write a Deny SCP for actions like ec2:RunInstances when specific tags are absent from the request. However, this routinely breaks IaC deployments. Many AWS services and third-party modules provision resources first and then make a secondary API call to apply tags. The SCP blocks the initial creation call regardless, causing the entire Terraform or CloudFormation deployment to fail.

Furthermore, some resources, such as SageMaker notebooks, cannot be tagged at creation time at all, requiring retroactive tagging that inherently conflicts with strict SCPs.
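The SCP below is a common shape for this kind of guardrail (a Deny with a Null condition on aws:RequestTag/CostCenter), built as a Python dict so it can be printed with json.dumps. The evaluator is a deliberately tiny, hypothetical model that ignores Resource matching, wildcards, and every other IAM rule; it exists only to show why a create-then-tag workflow fails at its first step.

```python
import json

# Deny RunInstances when the request carries no CostCenter tag
# ("Null": "true" matches when the condition key is absent from the request).
SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRunInstancesWithoutCostCenter",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {"Null": {"aws:RequestTag/CostCenter": "true"}},
        }
    ],
}


def is_denied(action, request_tags, scp=SCP):
    """Toy evaluator for Deny statements with Null conditions on aws:RequestTag/<key>."""
    for stmt in scp["Statement"]:
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        if stmt["Effect"] != "Deny" or action not in actions:
            continue
        null_conditions = stmt.get("Condition", {}).get("Null", {})
        # Keys under one condition operator are ANDed, as in IAM.
        if all(
            (flag == "true") == (key.split("/", 1)[1] not in request_tags)
            for key, flag in null_conditions.items()
        ):
            return True
    return False


def first_denied_step(steps):
    """Index of the first (action, tags) call that the SCP denies, or None."""
    for index, (action, tags) in enumerate(steps):
        if is_denied(action, tags):
            return index
    return None


create_then_tag = [("ec2:RunInstances", {}), ("ec2:CreateTags", {"CostCenter": "cc-100"})]
tag_on_create = [("ec2:RunInstances", {"CostCenter": "cc-100"})]

print(json.dumps(SCP, indent=2))
print(first_denied_step(create_then_tag), first_denied_step(tag_on_create))
```

The same intent (every instance ends up with a CostCenter tag) is satisfied by both workflows, yet only the tag-on-create path survives the guardrail.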

Disadvantage 6: Out-of-Band Resource Creation Bypasses IaC Contracts

Even if your Terraform/CloudFormation pipelines are strict, not all infrastructure is created through those pipelines. Engineers still use console workflows, one-off CLI scripts, SDK automations, and emergency operational actions. These paths bypass module contracts, policy checks in CI, and code review controls by design.

Common out-of-band sources:

  • manual console creation during incidents
  • temporary migration scripts and ad-hoc SDK utilities
  • third-party tools creating resources directly via API
  • service-side operations triggered by external control planes

Without event-based controls, these resources can stay untagged (or incorrectly tagged) until periodic audits catch them. That delay creates real operational risk.
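An event-driven control can be sketched as a handler for the EventBridge "AWS API Call via CloudTrail" event emitted for RunInstances. The field paths follow the CloudTrail record shape for that call as we understand it (verify them against your own events), the CreatedBy and TagSource keys are hypothetical, and the tagging call is injected so the sketch runs without boto3; in Lambda it could wrap boto3's ec2.create_tags.

```python
def handle_run_instances(event, apply_tags):
    """Tag instances from a RunInstances CloudTrail event with their creator.

    `apply_tags(resource_ids, tags)` is injected; in Lambda it could call
    boto3's ec2.create_tags(Resources=ids, Tags=[{"Key": k, "Value": v}, ...]).
    Returns the instance IDs that were tagged.
    """
    detail = event.get("detail", {})
    if detail.get("eventName") != "RunInstances":
        return []
    # responseElements can be null (e.g. for failed calls), so fall back to an empty dict.
    items = (detail.get("responseElements") or {}).get("instancesSet", {}).get("items", [])
    instance_ids = [item["instanceId"] for item in items if "instanceId" in item]
    if instance_ids:
        creator = detail.get("userIdentity", {}).get("arn", "unknown")
        apply_tags(instance_ids, {"CreatedBy": creator, "TagSource": "event-remediation"})
    return instance_ids


# Trimmed example event with placeholder IDs.
sample_event = {
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.ec2",
    "detail": {
        "eventName": "RunInstances",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/console-user"},
        "responseElements": {"instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}},
    },
}

calls = []
print(handle_run_instances(sample_event, lambda ids, tags: calls.append((ids, tags))))
print(calls)
```

Injecting the tagging function keeps the event parsing testable on its own; the handler only adds its own keys and leaves any existing tags untouched.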

Disadvantage 7: Overcorrection via Strict Deny Policies

After seeing drift and out-of-band gaps, many organizations overcorrect by writing broad Deny policies for any create call that is missing required tags. This feels safe, but in practice it often blocks valid deployments because some services apply tags in follow-up calls, and some resources support partial or delayed metadata at creation.

Why strict deny-only models become unstable:

  • deployment behavior differs by AWS service and API path
  • some managed workflows create then tag
  • transient race conditions can fail otherwise healthy releases
  • teams respond by adding many exceptions, weakening policy clarity

Aggressive deny policies therefore shift governance pain into delivery pipelines. Instead of consistent compliance, teams get failed releases, emergency policy bypasses, and recurring conflict between security and platform engineering.
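A less brittle complement to deny-at-create is to evaluate tags after a short grace window, so a create call followed by a tagging call is never flagged. The sketch below assumes you have a creation timestamp per resource; the 15-minute window, the resource IDs, and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GRACE_WINDOW = timedelta(minutes=15)  # hypothetical time allowed for create-then-tag


def overdue_untagged(resources, required, now, grace=GRACE_WINDOW):
    """Return sorted IDs still missing required tags after the grace window.

    `resources` maps resource ID -> (created_at, tags). Resources younger than
    the window are skipped, so create-then-tag workflows are not flagged.
    """
    overdue = []
    for resource_id, (created_at, tags) in resources.items():
        if now - created_at < grace:
            continue  # still inside the window; a follow-up tag call may be pending
        if any(key not in tags for key in required):
            overdue.append(resource_id)
    return sorted(overdue)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
resources = {
    "vol-new": (now - timedelta(minutes=2), {}),  # just created, tags may still arrive
    "vol-old": (now - timedelta(hours=3), {}),  # well past the window, still untagged
    "vol-ok": (now - timedelta(hours=3), {"CostCenter": "cc-100"}),
}
print(overdue_untagged(resources, ("CostCenter",), now))
```

Only vol-old is reported, so healthy releases that tag in a follow-up call never fail, while resources that stay untagged are still caught.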

What These Disadvantages Cause in Practice

  • lower cost allocation coverage
  • unknown owner incidents
  • ABAC access failures from missing tags
  • slower remediation because source of truth is fragmented
  • repeated policy exceptions that reduce trust in governance controls

Better Model: Layered Governance

Use IaC and Tag Policies as baseline, then add:

  1. central rules for calculated/normalized tags
  2. event-driven enforcement for out-of-band creation
  3. scan-based remediation for residual drift
  4. KPI tracking to measure sustained control
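For the first layer, central normalization rules can start as an alias table that maps key variants to one canonical spelling and trims values. The alias table below is a hypothetical example, not a recommended taxonomy.

```python
# Hypothetical alias table: lowercase variant -> canonical key.
KEY_ALIASES = {
    "costcenter": "CostCenter",
    "cost-center": "CostCenter",
    "cost_center": "CostCenter",
    "owner": "Owner",
    "project": "Project",
}


def normalize_tags(tags, aliases=KEY_ALIASES):
    """Return tags with canonical key spellings and trimmed values.

    If both a variant and the canonical key are present, the canonical key's
    value wins; between two variants, the first one seen is kept.
    """
    normalized = {}
    for key, value in tags.items():
        stripped = key.strip()
        canonical = aliases.get(stripped.lower(), stripped)
        if canonical in normalized and stripped != canonical:
            continue  # a value for this canonical key is already set
        normalized[canonical] = value.strip()
    return normalized


print(normalize_tags({"cost-center": " cc-100 ", "CostCenter": "cc-200", "env": "prod"}))
```

Keys without an alias pass through unchanged, so the table can grow gradually as new variants show up in coverage scans.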

Final Takeaway

Part 1 and part 2 established the same core pattern in two different IaC systems: great deployment-time control, incomplete lifecycle control.

IaC and Tag Policies are necessary building blocks, not a complete governance runtime. The real gap is lifecycle continuity: what happens after deployment and outside deployment pipelines.

Key takeaway

AWS-native tools define “how things should look on paper”; you still need a continuous runtime layer to keep reality aligned with that design.

Seeing drift despite strong IaC discipline?

TagOps helps teams close runtime governance gaps without breaking delivery velocity.
