prepme.io

Terraform Interview Questions and Answers

This guide is for DevOps, platform, SRE, and cloud engineers preparing for an interview where infrastructure as code with Terraform is on the table — whether that is a junior screen on HCL syntax or a senior round on state architecture, module design, and CI pipelines. You will get the concepts interviewers actually probe, the commands and trade-offs that separate a real practitioner from someone who has only run terraform apply on a tutorial, and a bank of real interview questions with model answers.

Terraform questions cluster around four areas: the language (HCL, variables, outputs, data sources), state (the single hardest concept to get right in production), composition (modules, workspaces, multi-environment layout), and operations (drift, import, providers, and how plan/apply behaves inside CI/CD). We cover all four, keep the commands current with recent Terraform releases, then show how to prepare by actually breaking and fixing infrastructure rather than memorizing definitions.

Where Terraform shows up in DevOps and platform interviews

Almost every DevOps, platform, and cloud-infrastructure interview has a Terraform component, because IaC is how modern teams provision and version their cloud. At the junior level you will be asked to explain the language and the basic workflow: what a resource is, the difference between terraform plan and terraform apply, and why you commit configuration to git. At mid-to-senior level the questions shift to judgment: how you structure state for ten teams, how you keep environments from clobbering each other, how you recover from a corrupted state file, and how you wire Terraform into CI without letting two pipelines apply at once.

Interviewers use Terraform as a proxy for whether you understand declarative, idempotent systems. A strong candidate talks about the desired-state model: Terraform compares your configuration against its recorded state and against the real infrastructure as reported by the provider API, then computes the set of create/update/delete actions needed to converge them. That mental model (config, state, real world, and the diff between them) is the through-line behind state management, drift, and import questions, so anchor your answers to it.

  • Junior: HCL syntax, resources and providers, plan vs apply, variables and outputs.
  • Mid-level: remote state, locking, modules, data sources, environment structure.
  • Senior: state architecture at scale, drift remediation, import strategy, CI/CD safety, provider/version pinning, blast-radius control.

HCL fundamentals: resources, variables, outputs, data sources

HCL (HashiCorp Configuration Language) is declarative — you describe the end state, not the steps. The core building block is the resource block, which maps to a single object the provider manages (an EC2 instance, an S3 bucket, a Kubernetes namespace). Each resource has a type and a local name, and you reference its attributes elsewhere with aws_instance.web.id; those references are what build Terraform's dependency graph. You rarely need explicit ordering, but depends_on exists for the cases where a dependency is not expressed through an attribute reference.

Variables parameterize a configuration (input), outputs expose computed values (output, often consumed by other modules or printed after apply), and locals name intermediate expressions to avoid repetition. Data sources are read-only lookups — they fetch information about existing infrastructure you did not create, such as the latest AMI or an existing VPC, without managing it. A frequent point of confusion interviewers test: a resource creates and owns an object; a data source only reads one. Knowing count and for_each (and why for_each with a map or set is usually preferable, because each instance is keyed by a stable identifier instead of a numeric position, so removing a middle element does not shift indices and force re-creation) signals real fluency.

  • resource — creates and manages an object; its attributes feed the dependency graph.
  • variable / output / locals — inputs, exposed values, and named intermediate expressions.
  • data source — read-only lookup of infrastructure Terraform does not manage.
  • for_each over count when iterating — stable keys avoid spurious re-creation when the collection changes.
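The building blocks above can be sketched in one small configuration. This is a minimal illustration, not a production layout: the service names, the AMI name pattern, and the resource names are assumptions chosen for the example.

```hcl
# Input: parameterizes the configuration.
variable "instance_type" {
  type    = string
  default = "t3.micro"
}

# Local: names an intermediate expression (hypothetical service names).
locals {
  services = toset(["api", "worker"])
}

# Data source: read-only lookup of an image Terraform does not manage.
# The name pattern is an assumption; adjust it to the image you actually want.
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

# Resource: created and owned by Terraform, one instance per set element.
# The reference to data.aws_ami.al2023.id adds an edge to the dependency graph.
resource "aws_instance" "app" {
  for_each      = local.services
  ami           = data.aws_ami.al2023.id
  instance_type = var.instance_type

  tags = {
    Name = each.key
  }
}

# Output: exposes computed values, keyed by the same stable identifiers.
output "instance_ids" {
  value = { for name, inst in aws_instance.app : name => inst.id }
}
```

Each instance is addressed by key, for example aws_instance.app["api"], so removing "worker" from the set leaves the "api" instance untouched.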

State management: remote state, locking, and why state matters

State is the most important — and most failure-prone — concept in Terraform. The state file (terraform.tfstate) is Terraform's record of which real-world objects map to which resource blocks, including their last-known attributes. Without it Terraform could not tell the difference between create, update, and delete; it would not know the ID of the EC2 instance it already made. This is why losing or corrupting state is so painful: Terraform forgets it owns infrastructure and may try to recreate it.

In any team setting state must be remote, not on a laptop. A remote backend (S3, Terraform Cloud/HCP, GCS, azurerm) stores state centrally so everyone plans against the same truth, and provides state locking so two engineers, or two CI runs, cannot apply concurrently and race each other into corruption. On the S3 backend, native lockfile locking (use_lockfile = true, experimental in Terraform 1.10 and generally available since 1.11) is now the recommended approach, though it is opt-in rather than enabled by default; the older pattern paired S3 with a DynamoDB table, but the dynamodb_table argument is deprecated. Mentioning the native lockfile is a quick way to show your knowledge is current rather than several versions old.

State can contain secrets in plaintext (a database password attribute, a generated key), so remote backends should be encrypted at rest and access-controlled, and tfstate should never be committed to git. Know the recovery tools cold: terraform state list and state show to inspect, state mv to refactor without destroying, state rm to drop a resource from state without touching the real object, and how to surgically fix state rather than blow it away.

  • State maps config to real objects — it is how Terraform knows whether to create, update, or destroy.
  • Remote backend + locking is mandatory for teams; locking prevents concurrent applies from corrupting state.
  • S3 backend: native lockfile (use_lockfile = true) is the modern, opt-in recommendation; DynamoDB-table locking is the deprecated legacy pattern.
  • State may hold secrets in plaintext — encrypt at rest, restrict access, never commit tfstate to git.
  • Surgical tools: state list, state show, state mv (refactor), state rm (forget without deleting), import (adopt existing).
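A remote, locked, encrypted backend along these lines might look like the sketch below. The bucket name, key, and region are placeholders, and the bucket is assumed to already exist with access restricted to the people and pipelines that run Terraform.

```hcl
terraform {
  backend "s3" {
    bucket       = "example-tfstate-bucket"    # hypothetical bucket name
    key          = "network/terraform.tfstate" # path of this stack's state object
    region       = "us-east-1"                 # assumed region
    encrypt      = true                        # server-side encryption at rest
    use_lockfile = true                        # native S3 lockfile locking
  }
}
```

After adding or changing a backend block, terraform init is what configures the backend and, if needed, offers to migrate existing state into it.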

Modules, workspaces, and multi-environment structure

A module is just a directory of .tf files you can call from elsewhere with the module block, passing inputs and reading outputs. The root module is your entry point; child modules are reusable components (a 'vpc' module, an 'eks-cluster' module). Good module design is a senior signal: keep them small and composable, expose a tight set of variables, version them (a git ref or a registry version), and avoid baking environment-specific values inside. DRY matters, but over-abstraction — a single module with thirty toggle flags — is a real anti-pattern interviewers like to surface.

For multiple environments there are two common approaches, and you should be able to argue between them. Terraform workspaces give you multiple named state files from one configuration (terraform workspace new staging), which is lightweight but risky: a single misstep can apply prod changes from the wrong workspace, and configuration cannot truly diverge per environment. The more common production pattern is directory-per-environment — separate root modules (envs/prod, envs/staging) each with their own backend/state and tfvars, calling shared child modules. This gives stronger isolation and blast-radius control at the cost of some duplication. Workspaces shine for ephemeral, identical copies (per-developer or per-PR stacks), not for prod-vs-staging divergence. Note that CLI workspaces are distinct from HCP Terraform / Terraform Cloud workspaces, which are a heavier, separately-configured unit — interviewers occasionally probe that distinction.

  • Module = a directory called with inputs/outputs; version it and keep it composable, not a mega-module of flags.
  • Workspaces = multiple state files from one config — good for ephemeral identical copies, risky for prod/staging.
  • Directory-per-environment = separate root + backend per env — stronger isolation, the common production layout.
  • tfvars files supply per-environment values; never hardcode environment specifics inside shared modules.
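As a sketch of the directory-per-environment layout, a root module such as envs/prod/main.tf could call a shared, versioned child module like this. The repository URL, the module's input names, and its vpc_id output are all hypothetical.

```hcl
# envs/prod/main.tf (hypothetical path)

# Per-environment value, supplied from a tfvars file such as prod.tfvars.
variable "vpc_cidr" {
  type = string
}

# Shared child module pinned to a git tag, so prod and staging can upgrade independently.
module "vpc" {
  source = "git::https://example.com/platform/terraform-modules.git//vpc?ref=v1.4.0"

  name       = "prod"       # assumed module input
  cidr_block = var.vpc_cidr # assumed module input
}

# Re-export a module output for other stacks or for display after apply.
output "vpc_id" {
  value = module.vpc.vpc_id # assumes the module declares a vpc_id output
}
```

envs/staging would hold the same call with its own backend, its own tfvars, and possibly a different ref, which is where the isolation comes from.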

Drift, import, providers, and the plan/apply lifecycle in CI

Drift is when the real infrastructure no longer matches state — someone clicked in the console, an autoscaler changed a value, or another tool edited the resource. terraform plan detects drift by refreshing state against the provider and showing the difference; the fix is either to update your config to match reality or re-apply to push the resource back to your declared state. To inspect drift without proposing config changes, run terraform plan -refresh-only (and apply -refresh-only to reconcile state) — the standalone terraform refresh command is deprecated because it rewrote state silently with no review step.

terraform import brings an existing, unmanaged object under Terraform control. The modern, reviewable form is the import block (Terraform 1.5+): you declare the resource address and real ID in config, and terraform plan -generate-config-out=generated.tf can even scaffold a starter resource block for you to clean up. The classic gotcha applies mainly to the older CLI command (terraform import ADDRESS ID), which only writes the object into state and generates no configuration: you have to hand-write the matching resource block yourself, and if its arguments do not match the real object, or the block is later removed, the next plan proposes to change, replace, or destroy what you just imported.

Providers are the plugins that translate HCL into API calls (aws, google, kubernetes, helm). Pin provider versions with required_providers and a committed .terraform.lock.hcl, and constrain the Terraform CLI itself with required_version (the lock file records providers only), so every machine and CI runner resolves identical versions; version drift between a laptop and the pipeline causes maddening, non-reproducible plans. In CI the safe pattern is: terraform fmt -check and validate on every PR, terraform plan -out=tfplan as a reviewable artifact on the PR, and terraform apply tfplan only after merge/approval, applying that exact saved plan so what was reviewed is what runs. Remote state locking is what keeps two pipeline runs from applying simultaneously; -auto-approve belongs only behind a gated, single-writer job.

  • Drift = real world diverges from state; plan (or plan -refresh-only) reveals it, then reconcile config or re-apply. Standalone terraform refresh is deprecated — use apply -refresh-only.
  • import adopts existing resources into state: modern import blocks can scaffold config via -generate-config-out; the legacy CLI import writes only state, so you must author matching config yourself or the next plan will try to change or destroy the object.
  • Pin versions: required_version for the CLI, plus required_providers and a committed .terraform.lock.hcl for providers, for reproducible plans across laptop and CI.
  • CI lifecycle: fmt/validate on PR, plan -out as a reviewed artifact, apply the saved plan post-approval, rely on state locking for single-writer safety.
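The version pinning and import-block points can be illustrated together. The bucket name and version constraints below are assumptions for the example, not recommendations.

```hcl
terraform {
  # Constrains the Terraform CLI; import blocks need 1.5 or later.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # acceptable range; the exact pick lands in .terraform.lock.hcl
    }
  }
}

# Declares that an existing, manually created bucket should be adopted
# at this resource address. The ID is a hypothetical bucket name.
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket"
}

# Matching configuration for the imported object.
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}
```

With this in place the import shows up in terraform plan like any other change and takes effect on apply; once the object is in state, the import block can be deleted.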

Common interview questions & answers

What is the Terraform state file and why does it matter?

State is Terraform's record of the mapping between resource blocks in your configuration and the real objects they manage, along with their last-known attributes. It is how Terraform decides whether each resource needs to be created, updated, or destroyed on the next apply — without it, Terraform cannot tell that the EC2 instance it already provisioned exists. Because state can also contain sensitive attributes in plaintext, it must be stored in an encrypted, access-controlled remote backend rather than on a developer's laptop or in git.

What is the difference between terraform plan and terraform apply?

plan refreshes state against the provider, compares it to your configuration, and prints the set of actions Terraform would take — what it would create, change, or destroy — without making any changes. apply executes those actions. The safe production pattern is to run plan -out=tfplan to capture a reviewable artifact, then apply that exact saved plan, so what is reviewed is precisely what runs and nothing changes in between.

How do you manage remote state for a team, and why is locking important?

Use a remote backend such as S3, Terraform Cloud/HCP, GCS, or azurerm, so everyone and every CI run plans against the same shared state. Locking ensures only one apply can mutate state at a time; without it, two concurrent applies can race and corrupt the state file, leaving Terraform's record inconsistent with reality. On the S3 backend the current approach is native lockfile locking (use_lockfile = true, experimental in Terraform 1.10 and generally available since 1.11); the older S3-plus-DynamoDB pattern still works, but the dynamodb_table argument is now deprecated. The backend should also be encrypted at rest and access-restricted because state can hold secrets.

What is the difference between a resource and a data source?

A resource block creates, updates, and owns an object — Terraform manages its full lifecycle including deletion. A data source is a read-only lookup that fetches information about infrastructure Terraform does not manage, such as the most recent AMI or an existing VPC, so you can reference its attributes without taking ownership. Using a data source is how you wire your config to objects another team or process controls.

When would you use count versus for_each?

Both create multiple instances of a resource, but count uses a numeric index while for_each iterates over a map or a set of strings keyed by stable identifiers. for_each is usually preferable because the resources are addressed by key rather than position: if you remove an element from the middle of a count list, every subsequent index shifts and Terraform destroys and recreates those resources. for_each keeps each instance pinned to its key, so removing one does not disturb the others.
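A side-by-side sketch makes the addressing difference concrete. The user names and name prefixes are hypothetical, and a real configuration would normally use only one of the two resources.

```hcl
variable "user_names" {
  type    = list(string)
  default = ["alice", "bob", "carol"]
}

# count: instances are addressed by position, e.g. aws_iam_user.by_index[1].
resource "aws_iam_user" "by_index" {
  count = length(var.user_names)
  name  = "idx-${var.user_names[count.index]}"
}

# for_each: instances are addressed by key, e.g. aws_iam_user.by_key["bob"].
resource "aws_iam_user" "by_key" {
  for_each = toset(var.user_names)
  name     = "key-${each.value}"
}
```

If "bob" is removed from the list, the count version sees index 1 now mean "carol" and index 2 disappear, so the plan touches both remaining positions. The for_each version plans a single destroy for the "bob" key.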

How do you bring an existing resource that was created manually under Terraform management?

Use Terraform's import. The modern, reviewable approach (Terraform 1.5+) is an import block that declares the resource address and its real ID, planned like any other change; you can even run terraform plan -generate-config-out=generated.tf to scaffold a starter resource block. The older terraform import ADDRESS ID command only writes the object into state and generates no configuration, so with that route you must hand-author a matching resource block yourself. If that block's arguments do not match the real object, the next plan proposes to change or replace it, and if the block is missing from config altogether, Terraform plans to destroy the resource you just imported.

What is drift and how do you detect and remediate it?

Drift is when the real infrastructure no longer matches Terraform's state — typically from a console edit, another tool, or an autoscaler changing a value. terraform plan detects it by refreshing state against the provider and showing the difference, and terraform plan -refresh-only lets you inspect drift without proposing config changes. You remediate by deciding the source of truth: update your configuration to match the real change, re-apply to revert the resource to your declared state, or run terraform apply -refresh-only to reconcile state if the object was legitimately changed out of band. Note the standalone terraform refresh command is deprecated in favor of the -refresh-only flags, which show changes before writing them.

How should you structure Terraform for multiple environments like dev, staging, and prod?

The common production approach is directory-per-environment: separate root modules (for example envs/prod and envs/staging), each with its own backend and state and tfvars, all calling shared versioned child modules. This gives strong isolation and limits blast radius. Workspaces are an alternative that produce multiple state files from one configuration, but they are risky for prod-versus-staging because a single mistake can apply the wrong environment and the config cannot truly diverge — workspaces fit ephemeral, identical copies like per-PR stacks better.

How do you handle secrets in Terraform?

Avoid hardcoding secrets in .tf files or committing them; mark variables and outputs as sensitive so they are redacted in plan and CLI output, and source secret values from a manager like Vault, AWS Secrets Manager, or SSM via data sources at runtime. Critically, remember that resolved secret values still land in the state file in plaintext, so the real protection is an encrypted, access-controlled remote backend rather than the sensitive flag alone.
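A minimal sketch of both techniques, assuming AWS Secrets Manager as the secret store; the secret ID and variable name are hypothetical.

```hcl
# Supplied at runtime (for example via a TF_VAR_db_password environment variable),
# never hardcoded. sensitive = true redacts it in plan and CLI output only.
variable "db_password" {
  type      = string
  sensitive = true
}

# Reads the current secret value at plan/apply time instead of storing it in code.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password" # hypothetical secret name
}

# Outputs derived from sensitive values must be marked sensitive as well.
output "db_password_from_manager" {
  value     = data.aws_secretsmanager_secret_version.db.secret_string
  sensitive = true
}
```

Both values are still written to state in plaintext, which is why the backend's encryption and access controls carry the real weight.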

What is the .terraform.lock.hcl file and why pin provider versions?

The dependency lock file records the exact provider versions and checksums Terraform selected, so every machine and CI runner resolves identical providers. You declare acceptable ranges in required_providers and commit the lock file. Pinning prevents version drift between a developer laptop and the pipeline, which is a common cause of non-reproducible plans where the same code produces different diffs in different places.

How would you run Terraform safely in a CI/CD pipeline?

Run terraform fmt -check and validate on every pull request, generate terraform plan -out=tfplan as an artifact for human review on the PR, and only run terraform apply tfplan after approval and merge, applying the exact saved plan so reviewed equals executed. Rely on remote state locking to enforce a single writer, restrict apply to a gated job rather than scattering -auto-approve, and inject credentials through short-lived CI identities (such as OIDC) rather than long-lived static keys.

What do terraform state mv and terraform state rm do, and when would you use them?

state mv renames or moves a resource within state — essential when refactoring, such as moving a resource into a module, so Terraform updates its bookkeeping instead of destroying and recreating the object. (For refactors expressed in config, moved blocks are the modern, reviewable equivalent.) state rm removes a resource from state without touching the real infrastructure, which you use when you want Terraform to forget an object — for example before handing it to another configuration. Both edit state surgically, which is far safer than deleting the whole state file.
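The config-based equivalents can be sketched as follows. The module path and resource names are hypothetical, and the removed block is an assumption about your Terraform version: it requires 1.7 or later.

```hcl
# Hypothetical child module that now contains the instance definition.
module "web" {
  source = "./modules/web"
}

# Reviewable counterpart of `terraform state mv`: the object formerly tracked
# as aws_instance.web is now tracked inside the module, with no destroy/recreate.
moved {
  from = aws_instance.web
  to   = module.web.aws_instance.this
}

# Reviewable counterpart of `terraform state rm` (Terraform 1.7+):
# forget the bucket in state but leave the real object in place.
removed {
  from = aws_s3_bucket.legacy

  lifecycle {
    destroy = false
  }
}
```

Both blocks show up in terraform plan before anything changes, which is the main advantage over running the state subcommands by hand.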

What does terraform taint (or the modern -replace flag) do?

It marks a resource for forced re-creation on the next apply even though its configuration has not changed, useful when a resource is in a bad runtime state that Terraform cannot detect — a corrupted instance or a failed bootstrap. Modern Terraform prefers terraform apply -replace=ADDRESS over the older, deprecated taint command because the replacement is shown in the plan and reviewed before it runs, rather than being a hidden state mutation.

How does Terraform decide the order in which to create resources?

Terraform builds a dependency graph from the references between resources — when aws_instance.web references aws_subnet.main.id, it knows the subnet must exist first — and creates independent resources in parallel. You almost never specify order manually; for the rare dependency that is not expressed through an attribute reference, depends_on adds an explicit edge. Understanding that the graph, not file order, drives execution is the key insight.
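A small sketch of implicit versus explicit edges. The CIDR ranges, bucket name, and the idea that the instance reads the bucket at boot are assumptions for illustration.

```hcl
variable "ami_id" {
  type = string # supplied by the caller; hypothetical input
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit edge: referencing aws_vpc.main.id orders the subnet after the VPC.
resource "aws_subnet" "main" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# No reference to the VPC or subnet, so this can be created in parallel with them.
resource "aws_s3_bucket" "assets" {
  bucket = "example-app-assets" # hypothetical bucket name
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.main.id # implicit edge to the subnet

  # Explicit edge: assume the instance's boot script reads from the bucket,
  # a dependency that no attribute reference expresses.
  depends_on = [aws_s3_bucket.assets]
}
```

Reordering these blocks within the file, or splitting them across files in the same directory, does not change the plan, because only the graph matters.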

What is the difference between Terraform and a configuration management tool like Ansible?

Terraform is a provisioning tool built around declarative, immutable infrastructure — it creates and manages the existence of cloud objects and tracks them in state. Configuration management tools like Ansible focus on configuring software on already-existing machines and are often procedural and mutable. They are complementary: a common pattern is Terraform to provision the servers and network, then a configuration tool or cloud-init to install and configure software on them.

Practice this for real, from your target job

Reading about it only gets you so far. Paste a job description into prepme and get hands-on k3s/Terraform labs, auto-graded exams, and an architecture round — generated for that exact role and scored 0–100. Generating a briefing is free.

FAQ

What level of Terraform knowledge do interviewers expect?

It depends on seniority. Junior roles expect comfort with HCL, the plan/apply workflow, and basic variables and outputs. Mid-level adds remote state, modules, and environment structure. Senior roles probe judgment: state architecture at scale, drift and import strategy, CI safety, and version pinning. Match your prep depth to the role described in the job posting.

Should I memorize Terraform commands or focus on concepts?

Concepts first, but know the key commands cold because interviewers ask you to walk through real scenarios. You should be able to explain when to reach for state mv versus state rm, an import block versus a fresh resource, or plan -out versus apply -auto-approve. Understanding why each command exists is what turns a memorized list into a convincing answer.

How do I practice Terraform for an interview without a cloud account?

The most effective practice is hands-on: provision real infrastructure, intentionally break it, and fix it, so the workflow becomes muscle memory. prepme generates Terraform labs from a real job description and runs them as live containers in your browser, so you can practice the plan/apply lifecycle and state operations without setting up your own AWS account or risking a bill. Each lab is AI-graded out of 100 with feedback.

Is Terraform vs OpenTofu likely to come up, and how should I answer?

It can come up as a current-events or judgment question. OpenTofu is the community-driven, open-source fork of Terraform created after HashiCorp moved Terraform to the Business Source License; it is largely compatible with Terraform configurations and tracks the same HCL and core workflow. A good answer notes the licensing background, that HCL and the plan/apply workflow carry over, and that the choice often comes down to licensing posture and ecosystem rather than day-to-day usage — while flagging that the two are now diverging on some newer features, so version compatibility is worth checking.

How is Terraform usually tested in an interview — verbal or hands-on?

Both. Many interviews start with verbal questions on state, modules, and the lifecycle, then add a practical component: reading a plan output and explaining what it will do, debugging a broken configuration, or designing a multi-environment layout on a whiteboard. Practicing against a real plan and apply, plus an architecture round, prepares you for both formats.

What is the single most common Terraform interview mistake?

Underestimating state. Candidates often explain syntax fluently but stumble on what state is, why it must be remote and locked, and how to recover when it is corrupted or out of sync. Because state is the heart of how Terraform works, a shaky answer here undercuts an otherwise strong interview — make sure you can discuss remote backends, locking, secrets in state, and the surgical state commands.
