Amazon AWS: To NAT or not to NAT, That is the Question
The Conceptual Stuff
I was curious about Amazon’s options for using NAT inside the VPC construct, so I decided to do some research into its merits. Before I delve into the practicalities, here are the whys and wherefores.
Amazon recommends a NAT configuration if you have Internet-facing web servers that communicate with back-end servers. That statement shows how much AWS is geared around “Web Services”, although it’s fair to say that most applications these days have a web-based front-end with an application server/database server back-end. The alternative to this NAT configuration is simply to have public/private subnets protected with Security Groups, with no NAT. In this setup a heavily secured “jumpbox” or “bastion” instance is used as the access point for those environments. This would be a very typical setup for a test/dev environment where only developers need access to whatever Amazon AWS is hosting…
To get a NAT system up and running you have two main options:
- “NAT Instance” – The NAT runs as just another instance amongst your other instances. You can choose from a number of different instance sizes provided by Amazon.
- “NAT Gateway” – This service is configured in the VPC, and has features such as high availability, higher bandwidth capabilities, and less administrative overhead (this method is recommended by Amazon).
I found the NAT Instance method very easy to set up, and the VPC wizard does a good job of updating the VPC “Routing Tables” so that traffic flows in the right directions. You do, however, have to update the Security Groups around the “NAT Instance” to allow it to send and receive traffic, just like any other instance really.
The NAT Gateway method is a tiny bit trickier to set up and, critically, is not a Free Tier service (though, to be fair, neither really is the NAT Instance). When you create a NAT Gateway, you associate it with one of the public subnets inside a VPC and assign an Elastic IP to it. You do have to manually update the routing tables for the affected (or should that be afflicted?) subnets before traffic flows. The easiest thing is to set up the VPC first, so you can then attach the NAT Gateway to the appropriate public subnet. There are other ways (in terms of the order of the process) to do this, but I found this the easiest and the most logical for my brain to wrap its head round.
The NAT Gateway is created within a particular “Availability Zone” (AZ) and is implemented with redundancy within that zone, and I think it’s for this reason that Amazon recommends it. A NAT Gateway’s AZ is set by the public subnet it’s associated with, so it is possible to create more than one NAT Gateway, each associated with a public subnet in a different AZ. Amazon’s documentation contains this statement:
“If you have resources in multiple Availability Zones and they share one NAT gateway, in the event that the NAT gateway’s Availability Zone is down, resources in the other Availability Zones lose Internet access. To create an Availability Zone-independent architecture, create a NAT gateway in each Availability Zone and configure your routing to ensure that resources use the NAT gateway in the same Availability Zone.”
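To make the “one NAT gateway per AZ” advice concrete, here is a minimal sketch of the routing decision it implies: each private subnet is paired with the NAT gateway in its own AZ, and a missing gateway is treated as an error rather than silently falling back to another AZ. The subnet IDs, NAT gateway IDs and AZ names are hypothetical placeholders, and the function is an illustration, not an AWS API.

```python
from typing import Dict, List

def plan_private_routes(nat_by_az: Dict[str, str],
                        private_subnets: List[Dict[str, str]]) -> Dict[str, str]:
    """Pick, for each private subnet, the NAT gateway that lives in the same AZ.

    Returns a mapping of subnet ID -> NAT gateway ID; each subnet's route table
    would then send 0.0.0.0/0 to that gateway. Raises KeyError rather than
    falling back to a NAT gateway in a different AZ.
    """
    plan: Dict[str, str] = {}
    for subnet in private_subnets:
        az = subnet["az"]
        if az not in nat_by_az:
            raise KeyError(f"no NAT gateway in {az} for {subnet['subnet_id']}")
        plan[subnet["subnet_id"]] = nat_by_az[az]
    return plan

# Hypothetical IDs and AZ names, for illustration only.
nat_by_az = {"us-east-1a": "nat-aaaa", "us-east-1b": "nat-bbbb"}
private_subnets = [
    {"subnet_id": "subnet-private-a", "az": "us-east-1a"},
    {"subnet_id": "subnet-private-b", "az": "us-east-1b"},
]
print(plan_private_routes(nat_by_az, private_subnets))
```

If a subnet’s AZ has no gateway, the sketch fails loudly. That mirrors the quote: sharing one gateway across AZs is exactly the configuration that loses Internet access when that gateway’s AZ goes down.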
And here are some other nuggets and facts worth highlighting:
- A NAT Gateway supports up to 10Gbps of bandwidth
- You can’t swap the Elastic IP on an existing NAT Gateway – you have to delete and re-create the gateway to change the IP
- Although you can’t wrap a Security Group around a NAT Gateway, you can use network ACLs on its subnet to restrict the traffic it will pass
- Finally, NAT Gateways cannot be used with EC2-Classic instances linked via ClassicLink. However, this is really a legacy issue and would only affect customers who have been using Amazon AWS for some time.
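Because network ACLs are the main filtering tool left once Security Groups are off the table, it helps to remember how they are evaluated. Rules are checked in ascending rule-number order, the first match decides, and anything unmatched hits the final catch-all deny. The sketch below models only that evaluation order. It is a simplified, stateless model, not an AWS API, and the rule numbers, CIDRs and ports are hypothetical examples.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class NaclRule:
    number: int                       # lower numbers are evaluated first
    cidr: str
    protocol: str                     # "tcp", "icmp", or "-1" for all traffic
    ports: Optional[Tuple[int, int]]  # inclusive port range; None = any / not applicable
    allow: bool

def evaluate(rules: List[NaclRule], src_ip: str, protocol: str,
             port: Optional[int] = None) -> bool:
    """Return True if the first matching rule (by rule number) allows the packet."""
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if addr not in ipaddress.ip_network(rule.cidr):
            continue
        if rule.protocol not in ("-1", protocol):
            continue
        if rule.ports is not None and (port is None or not rule.ports[0] <= port <= rule.ports[1]):
            continue
        return rule.allow
    return False  # the implicit "*" rule denies anything that matched nothing

# Hypothetical inbound rules for the NAT Gateway's subnet.
rules = [
    NaclRule(100, "10.2.2.0/24", "tcp", (80, 80), True),
    NaclRule(110, "10.2.2.0/24", "icmp", None, True),
    NaclRule(90, "10.2.2.99/32", "-1", None, False),  # block one host outright
]
print(evaluate(rules, "10.2.2.246", "tcp", 80))   # allowed by rule 100
print(evaluate(rules, "10.2.2.246", "tcp", 443))  # falls through to the deny
print(evaluate(rules, "10.2.2.99", "tcp", 80))    # rule 90 matches first
```

Real NACLs are stateless, so a working setup also needs matching outbound rules (and ephemeral-port rules for return traffic). The model leaves that out to keep the ordering logic visible.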
The Practical Stuff
NAT Instance Method:
In the VPC wizard you specify the usual suspects:
- VPC IPv4 CIDR
- VPC Name
- Subnet Names (One Public and One Private)
- Subnet IP address range
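The CIDR and subnet ranges entered in the wizard can be sanity-checked with Python’s standard `ipaddress` module. This sketch assumes a VPC CIDR of 10.2.0.0/16 with Public01 as 10.2.1.0/24 and Private01 as 10.2.2.0/24. Those values are inferred from the 10.2.x.y addresses later in this post, not taken from the wizard itself. It also reflects the fact that AWS reserves the first four addresses and the last address of every subnet, with the base address + 1 acting as the VPC router (the default gateway).

```python
import ipaddress
from typing import Dict

def plan_subnets(vpc_cidr: str) -> Dict[str, ipaddress.IPv4Network]:
    """Carve two /24 subnets (assumed Public01/Private01 layout) out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=24))
    return {"Public01": blocks[1], "Private01": blocks[2]}

def vpc_router(subnet: ipaddress.IPv4Network) -> ipaddress.IPv4Address:
    # AWS reserves the first four and the last address in every subnet;
    # base + 1 is the VPC router, which instances use as their default gateway.
    return subnet.network_address + 1

def usable_hosts(subnet: ipaddress.IPv4Network) -> int:
    # Total addresses minus the five that AWS reserves.
    return subnet.num_addresses - 5

plan = plan_subnets("10.2.0.0/16")
for name, net in plan.items():
    print(name, net, "gateway", vpc_router(net), "usable", usable_hosts(net))
```

Under these assumptions the private subnet’s gateway works out to 10.2.2.1, which matches the default gateway seen on the test instance later on.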
The UI allows you to click “Use a NAT Instance instead”. This expands the UI to offer different sized NAT Instances, which naturally cost varying amounts of money depending on their size.
All this option does is create an instance with a public and a private interface, which then shows up in the EC2 view.
My next step was to deploy a “jumpbox” instance to the Public01 subnet and assign an Elastic IP to it, and at the same time deploy an instance into the Private01 subnet. With the appropriate firewall and Security Group settings I was able to ping from the “jumpbox” to this “test” instance. However, I was worried this wasn’t really testing anything. Within a VPC, subnets are routed to each other anyway, so you might expect this to work whether or not a NAT was involved.
I decided that testing the private side of the NAT configuration would be more realistic and interesting. Initially, my private instance couldn’t tracert to a public IP address or reach the Internet. This was because the NAT Instance created by the VPC wizard just uses the “Default” Security Group, and so it was unable to forward packets anywhere. Once I wrapped a Security Group around it that allowed HTTP on port 80 (for outbound web access) and ICMP (for my ping and tracert tests), I could confirm the instance was going via the NAT Instance.
The tracert output showed that the instance with the IP of 10.2.2.246 and a default gateway of 10.2.2.1 first communicates with 10.2.1.249, which is my NAT Instance.
I found ping -t on the public IP address 75.x.y.z worked, and if I powered the NAT Instance off, hey presto, the ping stopped as well.
After a bit of tracert and ping testing I was able to reassure myself that things were working to my satisfaction, and that it wasn’t just dumb luck that made it work. 🙂
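The Security Group fix can also be scripted. This is a minimal sketch, assuming a boto3 EC2 client (`boto3.client("ec2")`) is passed in. Its `authorize_security_group_ingress` call takes a `GroupId` and a list of `IpPermissions`. The group ID and the 10.2.2.0/24 private range are assumptions, not values from the wizard. The rules admit HTTP on port 80 and all ICMP from the private subnet, matching the two kinds of traffic described above.

```python
from typing import Any, Dict, List

def nat_instance_ingress(private_cidr: str) -> List[Dict[str, Any]]:
    """Ingress rules letting the private subnet send HTTP and ICMP to the NAT instance."""
    ranges = [{"CidrIp": private_cidr, "Description": "private subnet via NAT"}]
    return [
        # HTTP on port 80 for outbound web access from the private subnet.
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80, "IpRanges": ranges},
        # For ICMP, FromPort/ToPort of -1 means all ICMP types and codes (ping, tracert).
        {"IpProtocol": "icmp", "FromPort": -1, "ToPort": -1, "IpRanges": ranges},
    ]

def open_nat_instance_sg(ec2: Any, group_id: str, private_cidr: str) -> None:
    # ec2 is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=nat_instance_ingress(private_cidr)
    )
```

A call would look like `open_nat_instance_sg(boto3.client("ec2"), "sg-0123456789abcdef0", "10.2.2.0/24")`, where the group ID is a hypothetical placeholder. Security Groups are stateful, so replies come back without extra inbound rules. A newly created group also starts with an allow-all outbound rule, which is what lets the NAT Instance forward traffic onwards.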
There are a couple of things I feel uncomfortable about with this “NAT Instance” configuration. Firstly, my private instances’ Internet access depends on the power state of a single instance in my EC2 view. It feels “too easy” for some head-banger (like me!) to turn it off, or heck, even terminate it, and all this good work would be undone. Admittedly, you could fix this by using an auto-scaling group to bring up a new NAT instance should your existing one fail. However, that’s additional configuration, and if you were that concerned about availability, you could just opt for the NAT Gateway configuration.
Secondly, it isn’t horribly clear what direction the traffic is going in and how it’s getting there. I had to do some digging in the “Routing Tables” to make it clear in my head. Looking at the routing table for the East US (Test) VPC, I found the relevant entries.
So here the routing table is saying that anything destined for 10.2.x.y goes to “local”, and everything else (the 0.0.0.0/0 route, which only applies when no more specific route matches) goes via a target called eni-5f7a4b4b / i-00430de9e1b05a0d3.
And yes, eni-5f7a4b4b / i-00430de9e1b05a0d3 happens to be the network interface and instance ID of my NAT Instance.
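The reason 10.2.x.y stays “local” even though 0.0.0.0/0 technically matches every address is that VPC route tables pick the most specific matching route (longest prefix match). The sketch below models that lookup with the two routes described above. The 10.2.0.0/16 CIDR is an assumption inferred from the 10.2.x.y addresses, and the model is an illustration, not an AWS API.

```python
import ipaddress
from typing import List, Tuple

# Mirrors the wizard-generated route table described above. The 10.2.0.0/16
# CIDR is an assumption inferred from the 10.2.x.y addresses.
ROUTES: List[Tuple[str, str]] = [
    ("10.2.0.0/16", "local"),
    ("0.0.0.0/0", "eni-5f7a4b4b"),  # the NAT Instance's network interface
]

def lookup(dest: str, routes: List[Tuple[str, str]] = ROUTES) -> str:
    """Return the target of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {dest}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]

for dest in ("10.2.1.249", "10.2.2.246", "8.8.8.8"):
    print(dest, "->", lookup(dest))
```

Both VPC addresses resolve to “local”, while a public address such as 8.8.8.8 only matches the 0.0.0.0/0 route and is handed to the NAT Instance.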
So that’s all good then. I decided to blow this method away and see what the other approach was like.
NAT Gateway Method:
In fairness to Amazon, they don’t really recommend the NAT Instance approach. In their documentation they state:
“You can also use the VPC wizard to configure a VPC with a NAT instance; however, we recommend that you use a NAT gateway. For more information, see NAT Gateways.”
There isn’t really a “wizard” way of doing this as such. I created a standard VPC with a single public subnet, and then added a private subnet to that VPC. This is essentially the NAT configuration, minus the NAT device itself. I created the VPC with precisely the same settings as before, and did this before creating the NAT Gateway.
Next I went through the VPC options to create the NAT Gateway itself. First, I needed to select the correct public subnet in the right VPC. In my case this was the Public01 subnet in the US East Test VPC.
…and then assign a new Elastic IP to it.
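The same two console steps, allocating an Elastic IP and creating the gateway in the public subnet, can be scripted. This is a minimal sketch, assuming a boto3 EC2 client is passed in. It uses `allocate_address(Domain="vpc")`, `create_nat_gateway(SubnetId=..., AllocationId=...)` and the `nat_gateway_available` waiter. The subnet ID is a hypothetical placeholder, and error handling is left out.

```python
from typing import Any

def create_nat_gateway(ec2: Any, public_subnet_id: str) -> str:
    """Allocate an Elastic IP and create a NAT gateway in the given public subnet.

    ec2 is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the new NAT gateway's ID once it reports as available.
    """
    eip = ec2.allocate_address(Domain="vpc")
    resp = ec2.create_nat_gateway(SubnetId=public_subnet_id,
                                  AllocationId=eip["AllocationId"])
    nat_id = resp["NatGateway"]["NatGatewayId"]
    # The gateway starts in the "pending" state; block until it is usable.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    return nat_id
```

Usage would be something like `create_nat_gateway(boto3.client("ec2"), "subnet-0123456789abcdef0")`, with a hypothetical subnet ID. As in the console, the routing tables still need updating before any traffic flows through the new gateway.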
After clicking the “Create a NAT Gateway” button, I was presented with the option to either “View NAT Gateways” or “Edit Routing Tables”. You do need to update the routing tables manually for the private subnets, so they know to use the NAT Gateway for all traffic bound for 0.0.0.0/0 (the public subnet keeps its route to the Internet Gateway). This routing table update is not unlike the changes that the wizard made for the “NAT Instance” method we saw previously.
I must admit I struggled with the UI here somewhat – finding the right route table to modify isn’t always clear to me.
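Finding the right route table can also be done programmatically. A subnet uses the route table explicitly associated with it, or the VPC’s main route table if it has no explicit association. This minimal sketch assumes a boto3 EC2 client and uses `describe_route_tables` with the `association.subnet-id`, `vpc-id` and `association.main` filters, then `create_route` to send 0.0.0.0/0 to the NAT Gateway. The VPC, subnet and gateway IDs are hypothetical, and the sketch assumes a matching route table exists.

```python
from typing import Any

def route_table_for_subnet(ec2: Any, vpc_id: str, subnet_id: str) -> str:
    """Return the ID of the route table a subnet actually uses."""
    resp = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = resp["RouteTables"]
    if not tables:
        # No explicit association: the subnet falls back to the VPC's main route table.
        resp = ec2.describe_route_tables(Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "association.main", "Values": ["true"]},
        ])
        tables = resp["RouteTables"]
    return tables[0]["RouteTableId"]

def send_default_route_to_nat(ec2: Any, vpc_id: str, subnet_id: str,
                              nat_gateway_id: str) -> str:
    # Point the private subnet's default route (0.0.0.0/0) at the NAT Gateway.
    rtb = route_table_for_subnet(ec2, vpc_id, subnet_id)
    ec2.create_route(RouteTableId=rtb, DestinationCidrBlock="0.0.0.0/0",
                     NatGatewayId=nat_gateway_id)
    return rtb
```

Only private subnets should be passed in here. If a private subnet shares the main route table with the public subnet, this would redirect the public subnet’s default route too. In that case the public subnet needs its own table with the Internet Gateway route.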
In the NAT Gateway part of the VPC management I was able to find my NAT Gateway and see both the external IP address as well as the internal IP address configured for the public subnet.
So… (drumroll) my next task was to configure a jumpbox and test instances on my network and see how the comms worked out. This worked first time without any monkeying around with Security Groups for the NAT, as it resides outside their scope; just the usual work with Security Groups for the jumpbox/test instances themselves.
Both VPCs were tests, so to clean up I terminated the instances and deleted the VPC. This should also delete any associated NAT Gateway at the same time. However, I did get timeout errors during this deletion process.
Clicking Retry did eventually clean everything up. As ever with Amazon AWS, objects are marked for deletion, and the UI eventually clears itself of terminated instances and deleted NAT Gateways.
You can see why, IF you need this configuration, a NAT Gateway is the way to go. There’s no need for Security Groups, coupled with the kind of availability options you would expect from Amazon AWS. I also like the fact that the NAT Gateway configuration is held separately from EC2 and resides alongside other networking components such as the routing tables and Internet Gateways. To me it’s the logical way of doing it.
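The “marked for deletion, then eventually gone” behaviour is easier to live with in a script that polls rather than relying on a single call. This is a minimal sketch, assuming a boto3 EC2 client. It deletes a NAT Gateway with `delete_nat_gateway`, polls `describe_nat_gateways` until the state is `deleted`, and only then releases the Elastic IP with `release_address`. The timeout and polling interval are arbitrary choices, and the IDs are hypothetical.

```python
import time
from typing import Any, Callable

def delete_nat_gateway_and_eip(ec2: Any, nat_gateway_id: str, allocation_id: str,
                               timeout_s: float = 600, poll_s: float = 15,
                               sleep: Callable[[float], None] = time.sleep) -> None:
    """Delete a NAT gateway, wait for it to reach "deleted", then free its Elastic IP."""
    ec2.delete_nat_gateway(NatGatewayId=nat_gateway_id)
    waited = 0.0
    while True:
        resp = ec2.describe_nat_gateways(NatGatewayIds=[nat_gateway_id])
        state = resp["NatGateways"][0]["State"]
        if state == "deleted":
            break
        if waited >= timeout_s:
            raise TimeoutError(f"{nat_gateway_id} still {state} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
    # Release the Elastic IP only once the gateway no longer holds it.
    ec2.release_address(AllocationId=allocation_id)
```

The `sleep` parameter exists so the function can be exercised without real delays. In normal use the default `time.sleep` is fine.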