Terraforming 1Password (agilebits.com)
346 points by kawera on Jan 27, 2018 | hide | past | favorite | 118 comments


Great post. It's always good to see more examples of people putting these tools to work.

With that said, I consider getting the AMI id dynamically to be an anti-pattern which undermines the principles of infrastructure-as-code. Specifically, it introduces an implicit build variable "time of `terraform apply`", which is not tracked in version control. Happily, because of Terraform's design, this sort of thing mostly won't cause your infrastructure to drift into unexpected states (e.g. production instance 1 running AMI X and production instance 2 running AMI Y). Within an environment, things should be consistent, but your staging environment may run AMI X while production is running AMI Y, and you wouldn't know from looking at your Terraform definitions.

I previously wrote about similar ideas in the context of pinning dependency versions, where wildcards can and often do get you into bad states. https://jonathan.bergknoff.com/journal/always-pin-your-versi...
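For concreteness, the pattern being criticized looks something like this (filter values hypothetical): a dynamic lookup that resolves to whatever the newest matching image happens to be at apply time, with nothing in version control recording which one that was.

```hcl
# Sketch of the dynamic lookup in question (filter values hypothetical).
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["app-server-*"]
  }
}

resource "aws_instance" "app" {
  ami           = "${data.aws_ami.app.id}"
  instance_type = "t2.micro"
}
```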


Interesting point. Here is the counter-argument.

With a continuous delivery tool like GoCD (not just continuous integration), you solve this problem with connected pipelines (value streams).

In this case, the ami-id is resolved dynamically, but the particular version comes from the pipeline and is entirely repeatable and traceable (and, dare I say, immutable).

The scenario: the first pipeline builds the AMI, then stores the AMI ID as a value/variable/text file that is passed on to the terraform apply pipeline. You can repeat the terraform apply with the same ami-id, or run a new AMI build and get a new terraform apply with the AMI from that build.

With respect, sir, I believe there is room for more nuance in your claim that 'dynamic ami-ids in terraform are an anti-pattern'. This is a solvable problem. This has been solved.
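The hand-off described above might look roughly like this in a pipeline script (file names and the machine-readable parsing are illustrative, not specific to any one CD tool):

```shell
# Stage 1: bake the AMI and record its ID as a build artifact.
AMI_ID=$(packer build -machine-readable packer.json \
  | awk -F, '$3 == "artifact" && $5 == "id" {print $6}' \
  | cut -d: -f2)
echo "$AMI_ID" > ami_id.txt   # published as a pipeline artifact

# Stage 2: consume that artifact, so every apply is repeatable and traceable.
terraform apply -var "ami_id=$(cat ami_id.txt)"
```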


We just store the AMI ID in a terraform file itself, and update it as needed. I mean, if you are already updating a text file, you might as well update a text file called ami.tf with the new AMI ID and commit it to source control as part of the build process. This ensures that the terraform plan is an easy-to-read source of truth for what is supposed to be up in production.
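A sketch of that checked-in file (the ID is hypothetical): ami.tf lives in source control, and the build process bumps the default when a new AMI ships.

```hcl
# ami.tf -- updated by the build process, committed alongside the code.
variable "base_ami" {
  default = "ami-0a1b2c3d4e5f67890"
}

resource "aws_instance" "web" {
  ami           = "${var.base_ami}"
  instance_type = "t2.micro"
}
```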


This is a really important point. Our company is using Terraform in a limited way, but Puppet is our primary automated configuration management tool. Similarly in Puppet, `ensure => latest,` on package resources isn't necessarily dangerous, but it can add a lot of confusion without intentional commits against the control repositories.


Puppet rules are typically applied constantly, at short intervals. The installed version should always be the same everywhere: the latest available from the repository. The risk is more that updates can get installed at inopportune times. It's true that the version change is not reflected in the configuration management, but this isn't normally a problem for minor version changes.


The default run interval for puppet agents is typically 30 minutes, if I'm not mistaken. I've inherited a bit of a DevOps mess with some 600-700 nodes in various states of management.

In previous positions, there was a great hue and cry when the run intervals were increased from 30 minutes to 60 minutes... eventually every four hours for production resources.

In my current position, production nodes are provisioned to run puppet once daily as a rule, triggered by cron jobs at a pseudorandom minute between 0200 and 0400 defined at server provisioning time.
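A sketch of that provisioning-time setup in Puppet itself (resource title hypothetical), using the standard `fqdn_rand` function so each host gets a stable pseudorandom slot, here between 02:00 and 03:59:

```puppet
cron { 'puppet-agent-daily':
  command => '/opt/puppetlabs/bin/puppet agent --onetime --no-daemonize',
  hour    => 2 + fqdn_rand(2),   # 2 or 3, stable per host
  minute  => fqdn_rand(60),      # pseudorandom minute, stable per host
}
```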

How do these intervals compare to what you've seen?


Well, I've seen many environments, and some only ran puppet manually, in situations where they previously would have used ssh and made a change directly. Another allowed one group of admins to keep logging in and making small config changes, then got alerts and diffs of those changes from running puppet in noop mode, and incorporated them into git after the fact.

It's a tool to be used however you see fit and what you describe sounds reasonable, but the most common (because default) setup I've seen is the 30 minute interval.

Which is why I would argue "time of last puppet run" is mostly recent and consistent across machines, whereas terraform apply is most often not run automatically, not even at daily intervals.


> The installed version should always be the same everywhere,

Agreed.

> the latest available from the repository.

Highly controversial statement. It depends on the policy of the package maintainer. I’ve seen too many subtle bugs introduced by changes in configuration file behaviour, new defaults, etc.

I would say if you can guarantee “latest version” means “this version plus security patches” or the maintainer is absolutely pedantic about semantic versioning, have at it. Otherwise, consider the pros and cons of stability vs being up-to-date, and make a judgement call accordingly.
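In Puppet terms, the conservative alternative to `ensure => latest` is pinning the exact version (version string hypothetical), so upgrades happen only via an intentional commit to the control repository:

```puppet
package { 'nginx':
  ensure => '1.14.0-1.el7',   # explicit pin; bump via a reviewed commit
}
```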


You should be mirroring repositories locally and pushing out new copies after they've been thoroughly tested if this is a concern to you. This is where tools like Katello/Red Hat Satellite shine: you take snapshots of your upstream repositories and promote them through your various lifecycle environments to test the packages before they even hit your production systems.


AMIs in general are a pain in the ass in AWS, though. If you're using custom-built encrypted AMIs in multiple accounts, the easiest way is to build the AMI once in each account, which then leaves you with a different AMI ID in each account. So now your template with hardcoded ID that works in one account won't work in another.

See: https://github.com/hashicorp/packer/issues/4772

Since you're probably also rebuilding AMIs on a regular basis to keep up with upstream updates/patches, your best bet is to use the latest version of an AMI in your template and be sure that you're also pruning old/incompatible AMIs from accounts regularly.


The code comparison between CloudFormation’s abysmal JSON formatting and Terraform’s DSL is a bit disingenuous.

CloudFormation has supported YAML for at least a year or two now, and it’s leagues more readable and compact, not to mention maintainable—you can even add comments to your code with YAML (something that is impossible with the old JSON format).

I’ve spent a lot of time working between the two, and while Terraform does have a lot to offer and is often a very valid option, CloudFormation has its virtues, especially the fact that almost all bleeding edge AWS features are first available to be managed via CloudFormation (sometimes with weeks or months of lead time), and many bugs can be more readily ironed out with AWS support (assuming you have it).

Again, not saying don’t use Terraform, just that I don’t think the decision is quite as black and white as this blog post seems to make it.
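For comparison, a resource in CloudFormation's YAML form (IDs hypothetical), with the comments the JSON form can't express:

```yaml
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0a1b2c3d4e5f67890  # hypothetical base image
      InstanceType: t2.micro          # comments like these are the point
```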


Personally, I find YAML for CloudFormation worse than JSON because of the whitespace requirements. JSON isn't much better, mind, especially once your template reaches hundreds of lines with nested objects.

Recently I've come around to using Troposphere [1] to write CloudFormation templates; it's actually very pleasant to use. You just write your infrastructure in Python, and it generates a template for you at the end. The developers seem to respond quickly to changes AWS makes to the CF templating language, too.

[1] https://github.com/cloudtools/troposphere
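The gist of the generate-templates-from-Python approach can be shown without troposphere itself; this is a plain-Python sketch of the same idea (resource names and AMI ID hypothetical), building the template as data structures and serializing at the end:

```python
import json

def ec2_instance(name, ami, instance_type):
    # One CloudFormation resource as a plain dict -- the idea troposphere
    # wraps in typed classes with validation.
    return {name: {"Type": "AWS::EC2::Instance",
                   "Properties": {"ImageId": ami,
                                  "InstanceType": instance_type}}}

template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": {}}
# Ordinary Python loops replace copy-pasted JSON stanzas.
for az in ["a", "b"]:
    template["Resources"].update(
        ec2_instance("Web" + az.upper(), "ami-0a1b2c3d4e5f67890", "t2.micro"))

print(json.dumps(template, indent=2))
```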


I'm sorry, what? You don't like YAML because it uses whitespace as a delimiter instead of curly braces? That is probably the least important feature that YAML adds for CF templates.

YAML, despite its warts, is much more readable and maintainable for CF templates than JSON, particularly when you are doing non-trivial things and need to use a lot of intrinsic functions and string manipulation. Or you want to put comments in your template.

I've worked on huge JSON and huge YAML CF templates (I'm talking templates that are thousands of lines long--in YAML). YAML is without a doubt easier to maintain.

I can't recommend any CF libraries for generating templates either, unless you want to wait around for new CloudFormation features to be implemented or suffer from half-broken existing implementations (or waste time hunting down bugs and submitting patches). Sometimes it makes sense to use a template engine like Jinja or ERB. But I'd stay away from libraries that generate CF templates--they're riddled with missing edge cases and they're mostly an unnecessary dependency.


One supports inline comments, the other doesn’t. That should be enough for anyone to choose one over the other.


Absolutely agree! I've been smashing out YAML templates for 12 months with pleasure (3 years of JSON templates before that). Comments are the winning proposition, allowing neat sectioning of the various parts.


Doesn’t Python also have some pretty strict whitespace requirements?


+1 for Troposphere. It reduces the noise from your CF template while bringing the flexibility and ease-of-use of Python. I've also discovered that it's far easier to get new developers spun up on than Terraform.


JSON is a subset of yaml, so if you want to just use JSON it should still work.


> The code comparison between CloudFormation’s abysmal JSON formatting and Terraform’s DSL is a bit disingenuous. CloudFormation has supported YAML for at least a year or two now, and it’s leagues more readable and compact, not to mention maintainable—you can even add comments to your code with YAML (something that is impossible with the old JSON format).

There are also decent tools atop CloudFormation for using an actual programming language instead of whatever HCL wants you to think it is. The Cfer[0] project that I contribute to (and use as the underpinnings for the Auster[1] cloud workflow tooling) is the thinnest possible wrapper around CloudFormation that we could come up with. But just by existing, it lets us do things like...y'know...if statements and loops...without breaking our backs. Code reuse is better than Terraform's module reuse, too; I wrote a gem (which reminds me that I need to open-source it) that lets me roll out a standardized three-tier network of varying size without really thinking about it too hard (important because AWS tends to work best with /24 subnets, but hey, you might want more than three of them per tier).

(Back when Terraform was new, I tried writing a halfway decent DSL on top of it; it turned out "it's really just JSON under the hood" was untrue and no testing had been done on that path. I assume it has improved since. But HCL still exists, and HCL is still pretty awful.)

I would say not to use Terraform if you're using only AWS, because Terraform has a nice habit of hosing your state when you look at it funny and I've had it literally regress to the point of making states in version X unreadable in version X+1. But those issues are separate from the clunky DSL.

[0] - https://github.com/seanedwards/cfer

[1] - https://github.com/eropple/auster


> The Cfer project that I contribute to [...] is the thinnest possible wrapper around CloudFormation that we could come up with.

I've found preprocessing CloudFormation YAML using a standard template language (e.g., ERB) to be the thinnest possible wrapper around CloudFormation for providing if statements and loops without breaking our backs. Cfer looks nice and lean, but it still adds another domain-specific language on top of the stack while ERB is part of Ruby's existing standard library.

Granted, using an existing template preprocessor or a lightweight DSL can both work well and I think it's largely a matter of preference as to which feels thinner/easier to work with.
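A minimal sketch of that ERB preprocessing (resource names hypothetical): a plain Ruby loop generates the repetitive YAML stanzas that would otherwise be copy-pasted.

```ruby
require 'erb'
require 'yaml'

# ERB template for a CloudFormation fragment; <%- ... -%> tags (enabled by
# trim_mode: '-') keep the loop lines out of the rendered output.
template = <<~TMPL
  Resources:
  <%- %w[A B C].each do |suffix| -%>
    Subnet<%= suffix %>:
      Type: AWS::EC2::Subnet
  <%- end -%>
TMPL

rendered = ERB.new(template, trim_mode: '-').result(binding)
puts rendered
```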

I agree with your assessment of HCL. A range of tooling choices for CloudFormation is possible largely because CloudFormation runs on standard JSON/YAML documents rather than the domain-specific, vendor-specific HashiCorp Configuration Language, which lacks robust tooling and support across languages/IDEs.


It is a DSL, but it's literally as thin as it gets. =) `method_missing` calls define properties. I'd take YAML+ERB over HCL for sure, though. And Cfer treating everything as objects gives us a really simple packaging mechanism for reuse--they're just gems.

Objects just work way better for reuse and aggregation than text, IME.


> CloudFormation has its virtues

Which seem to mostly revolve around AWS vendor lock-in. Which, you know, can be a valid business decision - accepting vendor-exclusive tooling for better vendor support to try and simplify initial deployment and lower near-term, initial ramp-up costs - but it has its downsides too, like losing control of costs over the long term and making advanced work (like multi-cloud deployments to improve reliability) more fragile, difficult, and sometimes de-facto impossible.

Seems to me like the question of CloudFormation vs. Terraform has more to do with whether the business makes a strategic decision to allow for vendor lock-in or not, rather than mostly aesthetic discussions over the merits of YAML vs HCL.


> Seems to me like the question of CloudFormation vs. Terraform has more to do with whether the business makes a strategic decision to allow for vendor lock-in or not, rather than mostly aesthetic discussions over the merits of YAML vs HCL.

While I agree that CloudFormation is a vendor-specific tool which makes Terraform the obvious choice to avoid vendor lock-in more generally, in this particular case, the author is fully entrenched in the AWS ecosystem without any hint of desire to avoid lock-in, and gave as reason #1 for his decision a mostly aesthetic discussion over the merits of YAML vs HCL.


I’m assuming someone making the decision of CF or TF would already have sold their souls to AWS at that point.


Unless your vendor goes out of their way to be compatible with AWS APIs, no tooling is really portable. True, you can write terraform targeting both AWS and GCE, but there are no non-trivial terraform plans you can apply to both AWS and GCE.


I am sorry but I just could not grasp YAML for some reason. I am always confused by its indentation and maps vs lists notation.


I am not ashamed to admit that I felt the same way for a time. I don't know when things changed, but it started to click recently, and I actually like yaml. I guess that it helps that it has become rather ubiquitous in our build tools between cloud-init, puppet hieradata, and ansible roles/playbooks.

What hasn't been helpful is what others here and elsewhere have reminded me time and time again... "JSON is a subset of yaml"


The documentation is actually quite good!

http://yaml.org/spec/1.2/spec.html

They provide a huge number of examples, and YAML is, I think, actually a lot more powerful than most folks realize.


Initially I was that way too; it only took a couple of weeks and a few reference templates that demonstrated the constructs to get past it, and I haven't done JSON since. Definitely more efficient to develop with.


The real advantage of CloudFormation comes in combining it with the AWS APIs -- you can define your infrastructure as code in Python, NodeJS, Java, Ruby, &c.

* troposphere (Python): https://github.com/cloudtools/troposphere

* cloudform (JavaScript, TypeScript): https://github.com/bright/cloudform

Driving infrastructure with a general purpose programming language actually works pretty well in practice.

As things have developed, I've come to question infrastructure-specific external DSLs, like Salt, Terraform, even Chef to some extent. Infrastructure is not any less programmable than payments, machine learning, graphics, app servers. Why can't the tooling be provided as libraries, consumable from ordinary programming languages?


I believe there was a case last year when one of the new AWS features was available in Terraform before it made it to CloudFormation :)


> I believe there was a case last year when one of the new AWS features was available in Terraform before it made it to CloudFormation :)

Yes, this has been known to happen, here is one example: https://stackoverflow.com/a/42142791/2518355 - this AWS feature was added Sep 21 2016, Terraform resource released support on May 11 2017, CloudFormation resource released support on Jun 6 2017.

However, several points should be added to this comparison:

- CloudFormation supports Custom Resources, so you can always implement new AWS features yourself with a few lines of JavaScript, without waiting for the official resource implementation to be published.

- CloudFormation resources have extensive documentation, and are officially maintained + supported by AWS.


It's still the case. I can't find it on mobile, but I asked a StackOverflow question and later answered it myself after discovering something like this buried in the AWS documentation.


Yup, found it: https://stackoverflow.com/questions/45858031/how-to-get-elas...

Terraform supports it, CloudFormation doesn't...


CloudFormation is really bad compared to TF.


Because...?


Side note, for anyone wondering like I was, the visualization tool is Cloudcraft: https://cloudcraft.co/ . I'd love to have something similar for doing isometric views of any kind of diagramming.


I am not sure I get the point of making it (fake-)3D. Except for all text to be diagonal and me having to tilt my head to read it, and cute pseudo-3D pictures of variously formed boxes, I don't see any advantage. It's still essentially 2D - there's no third dimension I could usefully explore - it's just presented in a visually cute but informationally cluttering way. Is there any advantage to this form of presentation?


It's funny, I came to the comments to specifically complain about that graphic. There are significant disadvantages to the isometric view for that data, and absolutely no advantage that I can see, other than it looks pretty if you're not trying to get information out of it.

I think that Tufte needs to make a resurgence with this generation of designers.


Take it up with BeOS? :D


Hah, I think icons are the place isometric views work really well, actually! Lots of room for highly differentiated objects with an isometric view. However, text and interconnections and maps are really not great at these isometric angles. And I do really like the icons used in the figure in the original post, just not the whole.


AWS has sometimes used it in their diagrams, so everyone wants it when creating theirs. I work on https://www.lucidchart.com (which has "only" 2D diagrams) and it's a common request.

You're correct that 2D makes for greater readability and information density.


Have a look at https://www.pathwaysystems.com/video/#racks1

The video is a bit slow because it's meant to be instructional. There are other videos on the same page which show how the model (graph) can be viewed logically instead of isometrically (what we call spatially).

I'm the CTO, happy to answer questions.


The product looks pretty interesting, but I don’t see any pricing. What would it cost for a single user?


Our marketing is currently geared towards teams, so our pricing starts at 12k USD/yr for a 10 user subscription. We haven't yet rolled out official individual pricing so I don't want to jump the gun here by announcing prematurely, but if you start a free trial and mention you're seeking a single user, we can discuss further via email.


Every time I see Cloudcraft, I wish it was available for GCP. I think it's a fantastic way to visualize interconnected systems


Ah. I wondered what made these terrible diagrams. They look cute but the text and the symbols are incomprehensible.


Looks mostly comprehensible to me. A lot of the icons are obvious if you deal with this stuff a lot, like the ELB, Redis icons, and the ones with instance types written on them. Bit too zoomed out to read the small text, though.


If they didn't have the precious isometric projection and just showed a regular 2D diagram it would be more legible.


Love the visuals. Readability is less than great in some cases. I'd probably model my team's infrastructure this way and keep it as a reference to onboard new members and problem solve.

Is the readability just due to the size that the images get reproduced at? If that's the case, it looks like the text isn't great on the "standard" size people reproduce the graphic in (landing page of your site), but it looks nice inside the editor (second section on your site, very crisp). Optimizing it for size, or scaling up the representation somehow would make this a lot more pleasant to look at for me.


Here I am wishing that CloudFormation had one killer feature that would have allowed us to use it at work: the ability to adopt existing resources into a CF stack. When we were starting on the path of "hey maybe all our infra shouldn't be pointy clicky", we chose between CloudFormation, Terraform, and making something in-house. Out of those three, Terraform was the clear winner for us at the time, but it has not been without issues.

Nobody else in this whole thread seems to be complaining about state management. I think it's insane that Terraform encodes where a given resource is in your filesystem / module hierarchy into the state JSON structure. Unless something has changed since I last looked, if you want to move things around in your .tf source, terraform can only apply that by tearing down the old resource and recreating it.

For our large setup, in order to adopt Terraform, we've had to spend a ton of time upfront thinking very hard about how all of our .tf sources are going to look, and it's delayed our deployment by months.


You can move state around with `terraform state mv`. It's a bit tricky and has some gotchas but works (which applies to terraform as a whole really)


Terraform allows you to make modifications to the state file yourself, both with terraform commands like terraform state mv, or manually if you're brave enough to edit the JSON. It requires confidence in using the tool, of course. But it also encourages you to create your cloud resources in a way where it's safe to let Terraform destroy and re-create most of it at any time.


`terraform state mv` is indeed the trick. It took me a while to understand it, but this blog post helped. [1] It leads you through refactoring some resources into a module.

The key takeaway for me was "we really only need to consider the nodes that map to the physical resources of our infrastructure when we are planning our state surgery. This means we can ignore all of the nodes that correspond to data sources, variables, and providers."

So after a refactor, this is what I do now: (1) run plan to get the names of everything terraform wants to delete and recreate; (2) pair all the resource nodes manually and translate them to state mv commands; (3) re-run plan and verify that terraform is now convinced there is nothing to do.

It would be nice if terraform could do this for me, of course, but I find that it is generally possible to avoid delete and recreate if all I've done is a refactoring.

[1] https://ryaneschinger.com/blog/terraform-state-move/
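Condensed, the workflow looks like this (resource addresses hypothetical):

```shell
terraform plan                 # 1. note what would be destroyed and recreated
terraform state mv \
  aws_instance.web module.app.aws_instance.web   # 2. one mv per moved resource
terraform plan                 # 3. should now report nothing to do
```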


I'm not so sure about this. My experience has been limited, but so far, Terraform has been a treat to work with. I didn't think Terraform cared about where the resources were defined during the plan step. I'd love for someone with more experience to weigh in here.

Also, Terraform state on the file system? Do you have the luxury of solo development without the need for a Terraform remote backend?


I think the complaint is that if you define a resource in some module, then do a bunch of refactoring with the result that the very same resource is now defined in another module, terraform will need to be explicitly told that it's still the same resource, or else it will destroy and recreate it on apply.

That is, it's about the file system layout of the .tf tree, not the state.


There's also `terraform import` and `terraform state rm` to manually import resources into your state and remove them from it, respectively.


You can use outputs in CloudFormation to adopt resources from existing CF stacks. https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...


I heard this third-hand, but apparently CFN was slated to get this feature and it was nixed because of concerns about increased support requests. The official stance from Amazon now is that you should be able to trivially destroy/recreate resources, and CFN isn't going to adopt anything that makes it easier to create stateful deployments.

It does suck to have to copy the contents of an S3 bucket just to move a bucket inside of a template, though, for sure.


Their AWS bill seems like it would be a lot higher than it needs to be. They're spinning up always-on staging, testing and development environments in 3 different regions. I know they've got autoscaling configured, so their production environment should be significantly larger than the others, but it still should be possible to be much more economical while accomplishing all of those non-production workloads.

The whole point of infrastructure-as-code should be the ability to spin up environments on-demand, do work, and then spin down. There's no reason to spin up an always-on shared development environment when developers can easily spin up their own environment when they need to do testing and kill it when they're done. Most development tasks can be tested in a single AZ, let alone region. Similarly, QA shouldn't need an always-on, 3-region setup.


You are right. Our AWS bill could certainly be lower and we will have to start optimizing it at some point.

It does require extra time/effort, though. We deployed 1password.ca and 1password.eu just a few months ago and never really got to that.

The only thing we "optimize" at the moment is the smaller number and size of EC2 instances in non-production environments.


FWIW, I wasn't trying to be critical. Or, at least, I was trying to be constructively critical. Getting to a Terraform setup is an excellent start...lots of companies can't get that far. I guess I'd just encourage you to view it as the beginning of a longer process that allows you to take full advantage of the fact that AWS bills by the hour and lets you launch as much infrastructure as you want.

Always-on non-production environments are, to my mind, a vestige of the time when you had physical servers that needed to be provisioned by a person and cost roughly the same amount when switched off. Or at least the time when ops built and maintained each AWS VM by hand. On-demand just offers so much more flexibility... stuff like the example from my response to a sibling comment: you should be able to type a single command and spin up an environment from a pull request any time a code reviewer wants to do testing to ensure that what the code looks like it does is what it actually does.

The more you can leverage the work you've done to get to where you are, the more you can drive down your AWS costs while giving greater flexibility and isolation to your non-prod workload.


In cloud development environments, you need to account for multi-AZ deployments and the challenges associated with such.


Sure, but in every environment? If a developer is pushing a feature that's mostly business logic, it's probably okay to test in a single-AZ environment. Given their setup, it looks like they still have QA and a final go-no-go test in staging. Issues from a multi-AZ or multi-region setup, should be infrequent enough that you don't need to catch them at the development stage.

I'd wager that they'd have far more issues from sharing a development environment between developers who are all pushing unreleased code to the same environment than they'd have from differences with the production environment. It should be pretty easy for a developer to spin up a simplified environment on demand from a branch or specific commit to do their testing. As a bonus, anyone reviewing code can spin up a similar environment from a pull request to ensure that code does what they think it does.

Infrastructure as code is a good first step. Embracing the freedom you get from that reproducibility is the next step.


I use CloudFormation to manage similar AWS web-app infrastructure. I've been continuously evaluating Terraform over the years (it is indeed maturing quickly), but have still decided to stick with CloudFormation for now, and would still continue to recommend the same for anyone managing an AWS-exclusive (or mostly-AWS) deployment.

To respond to some specific items mentioned in this post:

- "Terraform has a more straightforward and powerful language (HCL) that makes it easier to write and review code."

It's easy to pipe configuration through your favorite 'straightforward and powerful' templating language of choice to generate the stack template used by CloudFormation. I definitely wouldn't use CloudFormation at all without an extra preprocessing step of some sort.

- "Terraform has another gem of a feature that we rely on: terraform plan. It allows us to visualize the changes that will happen to the environment without performing them."

CloudFormation has a similar feature called "Change Sets" (released in March 2016).
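For reference, a minimal sketch of previewing changes via the AWS CLI (stack and change-set names hypothetical):

```shell
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name preview \
  --template-body file://template.yml

aws cloudformation describe-change-set \
  --stack-name my-stack \
  --change-set-name preview
```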

Finally, two more things to note:

- CloudFormation supports "Custom Resources", which allow you to write Lambda-function scripts to perform any custom operation you want (e.g., interact with third-party APIs, or support new AWS resources that don't yet have an official CloudFormation resource implementation). The library isn't as extensive as the set of providers Terraform supports, but you can often find open-source custom resources to fill the gaps for an AWS resource until an official implementation is released. And if not, you can just write a quick and dirty implementation yourself. Terraform supports a similar feature in "Custom Providers", but they are Go-only, as opposed to the various language runtimes Lambda supports.

- CloudFormation and all official AWS-resource implementations not only have excellent public documentation but are officially supported by AWS Support, which is a big deal if you're deploying something with any degree of complexity or cost/risk.

As for Terraform's strengths: its extensive set of resource providers beyond AWS ecosystem can't be beat. Also, it has great support for mapping existing infrastructure resources, which is a feature still sorely lacking in CloudFormation. (I'm surprised the author of this post didn't take advantage of this for their migration to avoid downtime!)


> It's easy to pipe configuration through your favorite 'straightforward and powerful' templating language of choice to generate the stack template used by CloudFormation. I definitely wouldn't use CloudFormation at all without an extra preprocessing step of some sort.

The tricky bit is what kind of logic you can apply to values that are the result of provisioning/looking up resources in the first place, isn't it? Like, what dynamically determined resource IDs are exposed to the language and in what places you can reference them. Basically the capability of terraform's interpolation language vs. CloudFormation Fn::* constructs, I suppose.


Disclaimer: I work for AgileBits.

CloudFormation is a great tool, don't get us wrong. It is one of the first cloud IaC tools.

And it has its downsides:

- inability to address a specific object in the state, similar to the `-target` option in terraform

- forcible re-deployment (`taint`ing) of a resource is not possible either; well, not as easily as in TF

- when creating a ChangeSet in CF, it is absolutely useless on nested stacks; all you see in it is "Stack will be updated", even if there is nothing to update

- targeted `destroy`s do not exist, as well as destroy `plan`s: similar to what `terraform plan -out plan.out -destroy -target aws_resource.name; terraform apply plan.out` does

- state manipulation (renaming, reassigning, deleting and importing resources) is missing in CF entirely

- TF refreshes the state of the resources on each run, checking their actual state and trying to revert any changes applied to them outside of the flow; CF assumes that nothing has changed and gets very surprised when resources have drifted or simply don't match its expectations. (I know there is drift detection now, but does it actually restore the desired state? I've never had a chance to check.)
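For anyone who hasn't used them, the Terraform commands referenced in the list above look roughly like this (resource addresses and IDs are hypothetical):

```shell
# Rename a resource in state without touching real infrastructure
terraform state mv aws_instance.old_name aws_instance.new_name

# Adopt an existing, manually created resource into state
terraform import aws_instance.web i-0abc123de456789f0

# Force recreation of a single resource on the next apply
terraform taint aws_instance.web

# Plan, then apply, a destroy limited to one resource
terraform plan -out plan.out -destroy -target=aws_instance.web
terraform apply plan.out
```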

When we deployed the new TF stack, we imported part of the resources created by the old CF template. And now I am a bit worried about cleaning it up, because it might try to delete the old resources, even with `"DeletionPolicy": "Retain"`. I have no visibility or control over its actions. Basically: apply the template and pray.

ChangeSets also failed me once, doing something other than what the plan told me. A Terraform plan, once captured to a file, will do exactly what it promised when applied.

To sum up: both tools have their pros and cons, and personally I feel more comfortable with Terraform.


Could you share what preprocessors/templating languages you've used with CloudFormation?


At JUXT we use Clojure's EDN, bolstered with some tag literals courtesy of our Aero library (described here: https://juxt.pro/blog/posts/aero.html). We use ClojureScript to compile to TF's JSON. EDN allows comments, ignores commas and otherwise is a nicer JSON. Aero allows us to encode multiple environments in a single document, and include ciphertext for secrets. We're pretty happy with the overall result.


I've used Embedded Ruby (ERB); I was also converting YAML to JSON before CloudFormation added native support for YAML templates (Sep 2016). My company's web application was a Ruby/Rails stack, so the `.yml.erb` syntax was already quite familiar to our team. Pre-processing the template is a simple shell one-liner that you can easily add to your test/deploy scripts:

    cat template.yml.erb | ruby -rerb -e "puts ERB.new(ARGF.read, nil, '-').result" > template.yml
A lightweight template-preprocessor step adds just enough scripting automation (in a familiar language/environment of your choice) to cut through boilerplate, and avoids imposing yet another domain-specific intermediate abstraction layer on top of the whole stack (e.g., troposphere's Python API, arguably also Terraform's HCL).
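For teams not on Ruby, the same lightweight preprocessing step can be done with nothing but a language's standard library. A rough stand-in sketch in Python (the template fragment and variable names are hypothetical):

```python
from string import Template

# Hypothetical CloudFormation YAML fragment; $instance_type and $env are
# placeholders substituted before the result is handed to CloudFormation.
TEMPLATE = """\
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: $instance_type
      Tags:
        - Key: Environment
          Value: $env
"""

# Render one environment's template; substitute() raises on missing keys,
# so a typo in the template fails loudly instead of deploying garbage.
rendered = Template(TEMPLATE).substitute(env="staging", instance_type="t2.micro")
print(rendered)
```

The same idea scales to per-environment variable files: load a dict per environment and pass it to `substitute(**env_vars)`.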


cfn-builder[0] is my take on simplifying CloudFormation templates and letting project teams manage their own stacks. I built it specifically for use in a CI/CD environment, so it doesn't use things like input parameters that might change from one run to another. Most variables are stored either in a global namespace (AccountId, IP addresses, AMIs, etc.) or in separate environment namespaces (subnet IDs, CidrBlocks, etc.). It even includes several built-in commands to help you maintain your environment, including one to update the global namespace with current AMI information.

The CFN coverage is not as complete as I would like, but I've built and managed production workloads with it and it does the job. Since it's built on NodeJS, if you know JSON and a little bit of Mustache it's not hard to understand. Anyone want to help?

[0] https://github.com/KangarooBox/cfn-builder


I’ve always wondered — from a security perspective, is this kind of an in-depth engineering blogpost a good idea? You’re basically handing a map of your internal infrastructure to any potential attacker who reads the blog.

Of course obscurity is not security blah blah blah. Still I can’t help feeling that writeups like this could backfire down the line.


On the flipside, being this open might mean that weaknesses come to light because of the increased scrutiny.

It's just like when talking about secure communication, you explain exactly how the public/private key exchange works, what algorithms are used, and how the entire handshake takes place. You don't say "We are keeping those details secret just in case it might aid some hacker".

The added benefit of everybody examining the process and agreeing that it's sound and secure is just too great to give up.


The internal infrastructure of AWS apps is fairly guessable anyway. There are only a limited number of AWS resources and they fit together in predictable ways: VPC, gateway, ASG, EC2, etc.

I think it's really great that they've talked about this, it's quite rare to hear about these kinds of internal migrations, and it's something I do a lot with clients but it's not really glamorous enough to talk about.


Hi vr46, where should I go / what resources should I consult if I want to start learning more about AWS & setting up infrastructure? Thanks!


Hi KurtMueller, get yourself a new AWS account with the free tier, find out exactly which resources ARE NOT included in the free tier, and get going with a simple Terraform file. Terraform code has a one-to-one mapping with AWS resources, so it's easy to follow. One coding test my previous company used was to create a simple load-balanced web server in Ansible or Chef, but you could do it with Terraform too. Two servers and a load balancer in front: simple, but it teaches you a lot of fundamental stuff.


Forgive the self promotion but I'm building a subscription video site - http://stackleap.com/ - to teach people how to build and deploy to AWS infrastructure.

The difference between StackLeap and sites like acloud.guru is that I'm focusing on the day-to-day stuff rather than the high-level concepts you need to know to pass the certification exams.

I'm hoping to launch a beta in the next few weeks.


I've been very impressed with CloudAcademy - it seems to issue AWS accounts per course/per lab - it guides you through the steps, checks whether you completed them by inspecting the AWS account, and then tears it all down at the end.

You might not be ready for that level of integration, but see if you can get a demo or trial.


If our security depended on this information being private then we would be in big trouble.

I would love to go into more details about the environment and some of the things we did to build on top of the default AWS security settings. It was too much information for this post, maybe we will do another one that focuses on security.


The map / description in the blog post probably describes about 50% of all the applications running in AWS that I’ve seen, almost identically.

If they listed out security group and IAM configurations, or how exactly they’re connecting through the bastion, then it would be a little more risky, yes.


CDN in front of S3, LB in front of multi-AZ EC2, and RDS in the background plus Redis for caching... this is basically the infrastructure Hello World of an AWS application. From a security perspective, that might be new to script kiddies, but probably any random hacker will assume such a setup.


On the other hand, you have a team of X employees who work at the company. I'm sure many of them have access to this map.

By sharing this with the world, you are encouraged to face any vulnerabilities that you may have overlooked.


All this tells me is that they use EC2, Aurora, and other AWS products. What am I supposed to do with that?


I imagine the examples are probably fictional. The only thing you know is that they are using AWS.


No, these are real code snippets. I only masked the account identifiers.


Interesting blog post, but AgileBits should have communicated planned downtime to their customers via email, which they did not. I’ve been a 1Password user for years and recently switched to their hosted offering. I know a massive infrastructure migration is rare, but that’s all the more reason to be transparent ahead of time.


What impact did this changeover actually have on you?


I am sorry if the downtime affected you in any way.

Sending several million emails would be a challenge, considering that most of our customers depend on the 1Password apps and are usually not affected by the downtime.

We do have a status page and Twitter feed where we make announcements:

https://status.1password.com

https://twitter.com/1passwordstatus


Have you considered building a notification system into the app?


If you want to try out Terraform yourself for a simple Web server/DB Server/Load balancer infrastructure, I made a tutorial for just that here: https://simonfredsted.com/1459


If you are using vim for Terraform, you should try https://github.com/juliosueiras/vim-terraform-completion. I'm hoping to migrate it to a language server this year so any editor can use it.

P.S. sorry for the self-promo


Awesome post! Always happy to see more committed infrastructure. We made this move a while back and were really happy with the results. Check out Packer for AMI builds and you're all set.

One suggestion re: TF files is to keep each service in a separate TF file/state, and to keep all your state files on S3 so that if someone does an apply it's always consistent. Keeping things separated means you don't have to worry about hitting other services when you do an apply.


There is a "Terraforming" GitHub project which exports existing Amazon EC2 infrastructure and generates the corresponding Terraform tf code and tfstate. It does not seem to be related to 1Password.

Terraforming github repo https://github.com/dtan4/terraforming http://terraforming.dtan4.net/


Wow, I found it fascinating to read such a detailed description of the architecture of a running business! Is anyone else aware of similar blog posts from other companies?


I've found http://highscalability.com/ & https://stackshare.io/featured-posts to be really useful. I constantly refer to these at work for design inspiration.


Terraform manages infrastructure; if you also want to manage the installed software in a declarative and reproducible way consider NixOps (based on Nix/ NixOS): http://container-solutions.com/step-towards-future-configura...


> servers will be down for the next few hours. We are recreating our entire environment to replace AWS CloudFormation with @HashiCorp Terraform.

One of the greatest strengths of terraform (vs say, cloudformation) is that you can adopt existing resources, and ZERO downtime is needed to migrate.

Why take downtime?
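(For reference, adoption boils down to one `terraform import` per existing resource, and the mapping step can be scripted so it's reviewable rather than typed by hand. A sketch with hypothetical addresses and resource IDs:)

```python
# Hypothetical mapping of Terraform resource addresses to the IDs of
# already-running AWS resources we want to adopt into state.
MAPPING = {
    "aws_vpc.main": "vpc-0a1b2c3d",
    "aws_db_instance.primary": "prod-db-1",
    "aws_s3_bucket.assets": "example-assets-bucket",
}


def import_commands(mapping):
    """Emit a deterministic, reviewable list of `terraform import` commands."""
    return [f"terraform import {addr} {res_id}"
            for addr, res_id in sorted(mapping.items())]


# Print the commands so they can be code-reviewed before being run.
for cmd in import_commands(MAPPING):
    print(cmd)
```

The generated script can live in version control next to the TF files, which addresses some of the "hard to test, prone to human error" concern.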


Because we needed to re-deploy our database, which unfortunately requires downtime.


With infrastructure in multiple regions, do you have independent Aurora databases running in each region? Do you have any data synchronising between regions or are they running completely independently?


They are completely independent, that was the goal.


What are your thoughts on using modules inside Terraform?


Not the op, but I have some brief thoughts.

Given a bit of time with Terraform, the need for modules becomes obvious as you identify common resources for the infrastructure you're modeling. I thought writing modules first was the "right way," but starting with modules ended up being a waste.


I'll second this. I started out not using modules, but figured I'd migrate things over to using them as and when it would make things tidier. I never migrated things.

Modules are really neat but I don't think they're a sensible starting point if you're not doing a lot of repetition or don't exactly know what you need to build yet.


Everything described in the blog post uses modules, making the infrastructure description abstract enough to understand and deploy/manage.


I find that both terraform and cloudformation have their pros and cons like many things in life. It comes down to personal preference and how you think (vim/Emacs?).

For me, I chose cfndsl to build my CloudFormation templates. This gives me the ability to actually write code. JSON, YAML and HCL aren't full languages. Yes, TF and CF try to provide language-like concepts (iteration, etc.), but at the end of the day they are just a definition of your environment, not strictly code.

Note: I maintain cfndsl, so I'm totally biased.


One major drawback of Cloudformation is its inability to view changesets of nested templates.


Two of my least favorite technologies. I have been using pass for a while and will never go back.


It would've been easier to just migrate to k8s and use some kind of ignition/managed k8s (and maybe Ansible, if things still need some manual tweaking).


Let's make one thing very clear here - k8s is never easy. I've been running a cluster since it became usable, and it definitely is the opposite of easy.

Kubernetes is powerful, it is modular, and it makes everything a lot more efficient, but setting it up - especially in such large deployments as 1Password would have here - is never easy.


Any recommendations toward primers and getting started with k8s?


Manning's Kubernetes in Action is a good start; additionally, I can recommend everything Kelsey Hightower has created on the topic.

Also, I recommend that you first try using kubernetes before you dive in with setting it up. Kubernetes.io has a live tutorial where you can work with a minikube cluster in your browser, afterwards you might want to use Google Cloud's free tier for a small Kubernetes cluster or Minikube until you're comfortable with kubernetes.

For setting up your own cluster, I've heard great things about Kubernetes the Hard Way: not to actually set one up, but to learn how the internals work, so you can then fix issues in the cluster you'll set up with kubeadm/kops/GKE.


I would love to migrate to k8s at some point. It would make some things easier. We still need to figure out how to do that without downgrading our existing security configuration -- a lot of it depends on running different services in their own subnets and AWS security roles.

It might be easier now that AWS is starting to support Kubernetes.


I'm really interested in migrating some of our own systems/services to k8s and I'd love it if you could elaborate a bit. How do IAM resources or VPC subnets etc. map to k8s concepts?


You can use IAM to auth to a cluster using heptio authenticator https://github.com/heptio/authenticator/blob/master/README.m...

You can grant IAM roles to individual pods running in k8s using kube2iam, though there are new advancements seemingly coming, or already out, now that amazon has announced eks. https://github.com/jtblin/kube2iam

Kops can provision and manage a cluster that incorporates multiple subnets allowing you to have a multi-az buildout https://github.com/kubernetes/kops/blob/master/docs/high_ava...


Well, k8s has its own RBAC system, so IAM does not map 1:1. The same goes for VPC subnets; you actually have your own network within k8s. Some network plugins can peer via BGP, like kube-router (as far as I know) and Calico (which I use). So basically k8s is a little bit different from the AWS concepts.


Wait. People are discussing and giving accolades for how marvelous it is that a company migrated its infrastructure to one flavor-of-the-month tool from some other month's flavor, when the company has the audacity to say they would be down for hours?

Are you kidding?


I don't think there is any shortage of literature on the need to avoid vendor lock-in. Selecting a cloud agnostic tool like those developed by Hashicorp and the opensource community offers the folks at 1Password additional flexibility in their choice of cloud providers. AWS is great, but still...

I don't see any mention of Terraform Enterprise here, either. I imagine they're perfectly capable of pursuing Terraform with the foss version, although the enterprise complement has some pretty great additional features.


It is possible I was not clear:

The blog post is patting itself on the back for a migration that caused downtime.

It included this gem:

> Couldn’t you’ve imported all online resources? Just wondering.

> That is certainly possible, and it would have allowed us to avoid downtime. Unfortunately, it also requires manual mapping of all existing resources. Because of that, it’s hard to test, and the chance of a human error is high – and we know humans are pretty bad at this. As a wise person on Twitter said: “If you can’t rebuild it, you can’t rebuild it“.



