From Heroku to AWS With SaltStack (Part 2)

This is the follow-up post describing how I moved a client off of Heroku and onto AWS using SaltStack. The first post can be found here. In this post I will describe how I managed the actual EC2 instances using salt-cloud and AWS autoscaling.

AWS Setup

The first step in getting SaltStack set up to work with any cloud provider is to create a configuration file on the salt master that defines the necessary credentials. For AWS that file looks like this:

aws_provider:
  minion:
    master: {{ salt_master_domain }}

  ssh_interface: private_ips # Master is hosted in the same VPC. Otherwise public_ips
  ssh_username:
    - ubuntu
    - ec2-user
    - root
  keyname: salt_master
  private_key: /etc/salt/keys/salt_master
  delete_sshkeys: True

  id: {{ aws.key }}
  key: {{ aws.secret_key }}


  {% if aws_region -%}
  region: {{ aws_region }}
  {% endif -%}
  provider: ec2
  rename_on_destroy: True

The first section allows for setting key/value configuration parameters on the minions that are initialized via salt-cloud; in this case it tells each minion which IP address or URL to use for communicating with the master. The next segment tells the salt master how to connect to the minions over SSH. Since this deployment is inside an Amazon VPC, it uses the private interface to connect; it also tells Salt which usernames to attempt while connecting. The keyname tells AWS which SSH key pair to use when creating the EC2 instance, and private_key is the path on the master node to the corresponding private key file.

Next are the IAM keys that you would like to use for managing your compute nodes. The values for the keys themselves are passed in as context values from the file.managed state. Following that is the configuration for which AWS region you wish to use.

The provider configuration is what tells SaltStack which cloud provider you are targeting. In this instance we are using the AWS EC2 API. Finally, rename_on_destroy will cause Salt Cloud to rename the EC2 instance when it is destroyed so that subsequent creation of instances will not result in a name collision.

Now that we have a connection to AWS, we need to define each of the instance types that we are interested in creating. To do that we provide a profile that specifies the various values that EC2 needs to be able to spin up a node.

mongodb-server:
  provider: aws_provider
  size: m3.medium
  image: ami-<some_id>
  ssh_interface: private_ips
  ssh_username: ubuntu
  block_device_mappings:
    - DeviceName: /dev/sda1
      Ebs.VolumeSize: 50
  subnetid: subnet-<some_id>

  grains:
    roles:
        - mongodb
        - mongodb-server
    env: prod

This creates a new EC2 instance of size m3.medium with a 50GB EBS volume mounted at /dev/sda1 and assigns it to a given subnet of the VPC. It also sets the roles and env grains to be used for targeting within Salt. A similar profile needs to be defined for each type of instance that you are going to deploy.
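Once a profile like this exists you can launch a single instance with salt-cloud -p mongodb-server <instance_name>, or build several at once from a cloud map file. A minimal sketch of such a map (the file path and instance names here are hypothetical) would be:

# /etc/salt/cloud.maps.d/mongodb.map (hypothetical path)
# Each name listed under a profile becomes an EC2 instance built from that profile.
mongodb-server:
  - mongodb-prod-1
  - mongodb-prod-2
  - mongodb-prod-3

Running salt-cloud -m /etc/salt/cloud.maps.d/mongodb.map -P would then build the listed instances in parallel and register each of them as a minion with the master.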

Building the Environment

Now that SaltStack knows how to connect to AWS and build a node we can start telling it how to configure things like auto-scaling, security groups and load-balancing. For the following, you can refer to the collection of formulas found here.

The first thing that is needed is to set up the security groups that will be used for the different server types that we are going to deploy. The formula is driven entirely by pillar data. One thing that I like about this pattern is that it allows for a more data-driven infrastructure. An example pillar file might look like this:

  security_groups:
    - name: webhost
      description: Web server security group
      vpc_id: vpc-1234567
      rules:
        - ip_protocol: tcp
          from_port: 80
          to_port: 80
          cidr_ip:
            - 0.0.0.0/0
        - ip_protocol: tcp
          from_port: 443
          to_port: 443
          cidr_ip:
            - 0.0.0.0/0
        - ip_protocol: tcp
          from_port: 22
          to_port: 22
          cidr_ip: default
    - name: mongodb
      description: MongoDB Security Group
      vpc_id: vpc-1234567
      rules:
        - ip_protocol: tcp
          from_port: 27017
          to_port: 27017
          cidr_ip: webhost
        - ip_protocol: tcp
          from_port: 22
          to_port: 22
          cidr_ip: default

Fortunately, SaltStack allows the name of an as-yet-undefined security group to be passed as the cidr_ip of another group, which greatly simplifies creating multiple groups at once. From this pillar data it is easy to see that we are creating two security groups, both of which allow SSH connections from the default security group.
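The linked formula does the real work of turning that pillar data into security groups (including the group-name-as-cidr_ip convenience just described), but to give a rough idea of its shape, a state consuming this data might look something like the sketch below. It assumes the boto_secgroup state module and a hypothetical pillar key named aws_profile containing the keyid, key and region for boto to use:

{% for group in salt['pillar.get']('security_groups', []) %}
# Ensure each security group from pillar exists in the VPC with its rules.
security_group_{{ group.name }}:
  boto_secgroup.present:
    - name: {{ group.name }}
    - description: {{ group.description }}
    - vpc_id: {{ group.vpc_id }}
    - rules: {{ group.rules | yaml }}
    - profile: aws_profile
{% endfor %}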

Now that we have defined the network restrictions for our application and database tiers, it is necessary to build the load-balancer that will proxy the traffic for the application servers. Again, the ELB formula is entirely data driven.

  elb:
    name: MY-ELB
    listeners:
      - elb_port: 443
        instance_port: 443
        certificate: 'arn:aws:iam::<your_account_number>:server-certificate/<your_certificate_name>'
        elb_protocol: HTTPS
        instance_protocol: HTTPS
      - elb_port: 80
        instance_port: 80
        elb_protocol: HTTP
        instance_protocol: HTTP
    subnets:
      - subnet-<vpc_subnet1>
      - subnet-<vpc_subnet2>
      - subnet-<vpc_subnet3>
    health_check:
      target: 'HTTPS:443/check.html'
    attributes:
      cross_zone_load_balancing:
        enabled: True
    security_groups:
      - default
      - webhost

This creates an Elastic Load Balancer that listens on ports 80 and 443 and proxies those requests to instances across each of the subnets in the target VPC.
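As with the security groups, the formula turns that pillar data into a boto-backed state. A stripped-down sketch, again assuming the boto_elb state module and the same hypothetical aws_profile credentials key, might look like this:

{% set elb = salt['pillar.get']('elb', {}) %}

# Ensure the load balancer exists with the listeners, subnets, health check
# and attributes defined in pillar.
elb_{{ elb.name }}:
  boto_elb.present:
    - name: {{ elb.name }}
    - listeners: {{ elb.listeners | yaml }}
    - subnets: {{ elb.subnets | yaml }}
    - security_groups: {{ elb.security_groups | yaml }}
    - health_check: {{ elb.health_check | yaml }}
    - attributes: {{ elb.attributes | yaml }}
    - profile: aws_profile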

Now that the ELB has been created, we can create the autoscale group for the application tier. To make this work, we need to define two sections of pillar data. First the settings for the autoscale group itself.

  autoscale:
    groups:
      - name: app-hosts
        desired_capacity: 2
        max_size: 10
        min_size: 2
        default_cooldown: 1500
        vpc_zone_identifiers:
          - subnet-<vpc_subnet1>
          - subnet-<vpc_subnet2>
          - subnet-<vpc_subnet3>
        load_balancers:
          - MY-ELB
        launch_config_name: webhost
        launch_config:
          - image_id: ami-1234567
          - key_name: salt_master
          - instance_type: t2.small
          - security_groups:
              - webhost
              - default
          - region: us-east-1
        scaling_policies:
          - name: webhost_scale_up
            adjustment_type: ChangeInCapacity
            as_name: app-hosts
            scaling_adjustment: 1
          - name: webhost_scale_down
            adjustment_type: ChangeInCapacity
            as_name: app-hosts
            scaling_adjustment: -1

This will scale up and down by one node at a time when the associated CloudWatch alarms are triggered. Instances that are added to the autoscale group will also be registered with the ELB that we created previously. The settings for the CloudWatch alarms that trigger the autoscaling are shown here:

  cloudwatch:
    alarms:
      - name: webhost_cpu_scale_up
        attributes:
          metric: CPUUtilization
          namespace: 'AWS/EC2'
          statistic: Average
          comparison: ">="
          threshold: 70.0
          period: 60
          evaluation_periods: 5
          description: 'Web hosts scale up alarm'
          alarm_actions:
            - arn:aws:sns:us-east-1:<account_id>:alert_salt_web_scale_up:<UUID for sns event>
            - scaling_policy:app-hosts:webhost_scale_up
          insufficient_data_actions: []
          ok_actions: []
          dimensions:
            AutoScalingGroupName:
              - app-hosts
      - name: php_cpu_scale_down
        attributes:
          metric: CPUUtilization
          namespace: 'AWS/EC2'
          statistic: Average
          comparison: "<="
          threshold: 40.0
          period: 60
          evaluation_periods: 5
          description: 'PHP hosts scale down alarm'
          alarm_actions:
            - arn:aws:sns:us-east-1:<account_id>:alert_salt_php_scale_down:<UUID for sns event>
            - scaling_policy:app-hosts:webhost_scale_down
          insufficient_data_actions: []
          ok_actions: []
          dimensions:
            AutoScalingGroupName:
              - app-hosts

This will trigger the creation of a new node when the average CPU usage of the group stays above 70% for 5 minutes, and remove a node once the usage drops back below 40% for 5 minutes. This setup does require a manual component for creating the SNS notifications that send a request to the Salt API. The SNS request triggers an event in the reactor system, which we will use to register the minions that are created by the autoscaling.
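Before moving on to the reactor, the states that consume the two pillar structures above might look roughly like the following. This is only a sketch, assuming the boto_asg and boto_cloudwatch_alarm state modules and the same hypothetical aws_profile credentials key used in the earlier examples:

{% for group in salt['pillar.get']('autoscale:groups', []) %}
# Create the autoscale group, its launch configuration and scaling policies.
autoscale_group_{{ group.name }}:
  boto_asg.present:
    - name: {{ group.name }}
    - launch_config_name: {{ group.launch_config_name }}
    - launch_config: {{ group.launch_config | yaml }}
    - min_size: {{ group.min_size }}
    - max_size: {{ group.max_size }}
    - desired_capacity: {{ group.desired_capacity }}
    - default_cooldown: {{ group.default_cooldown }}
    - vpc_zone_identifier: {{ group.vpc_zone_identifiers | yaml }}
    - load_balancers: {{ group.load_balancers | yaml }}
    - scaling_policies: {{ group.scaling_policies | yaml }}
    - profile: aws_profile
{% endfor %}

{% for alarm in salt['pillar.get']('cloudwatch:alarms', []) %}
# Create the CloudWatch alarms that drive the scaling policies above.
cloudwatch_alarm_{{ alarm.name }}:
  boto_cloudwatch_alarm.present:
    - name: {{ alarm.name }}
    - attributes: {{ alarm.attributes | yaml }}
    - profile: aws_profile
{% endfor %}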

This reactor function registers the minions with the salt master. It is adapted from the reactor formula found here.

#!py
# The ``#!py`` shebang selects Salt's pure Python renderer for this SLS file.
import pprint
import os
import time
import json
import requests
import binascii
import M2Crypto
import salt.utils.smtp as smtp
import salt.config as config


def run():
    '''
    Run the reactor.

    The ``data`` and ``tag`` variables used below, along with ``__opts__``,
    are injected into reactor modules by Salt rather than defined in this
    file.
    '''
    sns = data['post']

    if 'SubscribeURL' in sns:
        # This is just a subscription notification
        msg_kwargs = {
            'smtp.subject': 'EC2 Autoscale Subscription (via Salt Reactor)',
            'smtp.content': '{0}\r\n'.format(pprint.pformat(sns)),
        }
        smtp.send(msg_kwargs, __opts__)
        return {}

    url_check = sns['SigningCertURL'].replace('https://', '')
    url_comps = url_check.split('/')
    if not url_comps[0].endswith('.amazonaws.com'):
        # The expected URL does not seem to come from Amazon, do not try to
        # process it
        msg_kwargs = {
            'smtp.subject': 'EC2 Autoscale SigningCertURL Error (via Salt Reactor)',
            'smtp.content': (
                'There was an error with the EC2 SigningCertURL. '
                '\r\n{1} \r\n{2} \r\n'
                'Content received was:\r\n\r\n{0}\r\n').format(
                    pprint.pformat(sns), url_check, url_comps[0]
                ),
        }
        smtp.send(msg_kwargs, __opts__)
        return {}

    if 'Subject' not in sns:
        sns['Subject'] = ''

    pem_request = requests.request('GET', sns['SigningCertURL'])
    pem = pem_request.text

    str_to_sign = (
        'Message\n{Message}\n'
        'MessageId\n{MessageId}\n'
        'Subject\n{Subject}\n'
        'Timestamp\n{Timestamp}\n'
        'TopicArn\n{TopicArn}\n'
        'Type\n{Type}\n'
    ).format(**sns)

    cert = M2Crypto.X509.load_cert_string(str(pem))
    pubkey = cert.get_pubkey()
    pubkey.reset_context(md='sha1')
    pubkey.verify_init()
    pubkey.verify_update(str_to_sign.encode())

    decoded = binascii.a2b_base64(sns['Signature'])
    result = pubkey.verify_final(decoded)

    if result != 1:
        msg_kwargs = {
            'smtp.subject': 'EC2 Autoscale Signature Error (via Salt Reactor)',
            'smtp.content': (
                'There was an error with the EC2 Signature. '
                'Content received was:\r\n\r\n{0}\r\n').format(
                    pprint.pformat(sns)
                ),
        }
        smtp.send(msg_kwargs, __opts__)
        return {}

    message = json.loads(sns['Message'])
    instance_id = str(message['EC2InstanceId'])

    if 'launch' in sns['Subject'].lower():
        vm_ = __opts__.get('ec2.autoscale', {}).get(str(tag.split('/')[-2]))
        vm_['reactor'] = True
        vm_['instances'] = instance_id
        vm_['instance_id'] = instance_id
        vm_list = []
        for key, value in vm_.items():
            if not key.startswith('__'):
                vm_list.append({key: value})
        # Fire off an event to wait for the machine
        ret = {
            'ec2_autoscale_launch': {
                'runner.cloud.create': vm_list
            }
        }
    elif 'termination' in sns['Subject'].lower():
        ret = {
            'ec2_autoscale_termination': {
                'wheel.key.delete': [
                    {'match': instance_id},
                ]
            }
        }
    else:
        # The subject was neither a launch nor a termination notification,
        # so there is nothing for the reactor to do.
        ret = {}

    return ret

This will accept requests from the AWS SNS service to the Salt API, verify the validity of the message, and then dispatch an action to either register a new minion or remove one, depending on the subject of the message. Because SNS requires verification of new URL endpoints, it is necessary to configure the SMTP system in Salt as defined here. By adding these lines to the reactor configuration on the master, we register the endpoints that SNS will use when sending the autoscale messages:

reactor:
  - 'salt/netapi/hook/autoscale/web/up':
      - /srv/lta/reactor/ec2-autoscale.sls
  - 'salt/netapi/hook/autoscale/web/down':
      - /srv/lta/reactor/ec2-autoscale.sls
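For these event tags to ever fire, the Salt API has to expose a webhook endpoint that SNS can post to. A minimal rest_cherrypy section in the master configuration might look like this (the port and certificate paths are placeholders; webhook_disable_auth is used because SNS cannot authenticate against Salt's external auth, so the signature check in the reactor script is what protects the endpoint):

rest_cherrypy:
  port: 8000
  ssl_crt: /etc/pki/tls/certs/salt-api.crt
  ssl_key: /etc/pki/tls/private/salt-api.key
  webhook_url: /hook
  webhook_disable_auth: True

With this in place, an SNS subscription pointing at https://<salt-api-host>:8000/hook/autoscale/web/up will produce events tagged salt/netapi/hook/autoscale/web/up, matching the first reactor entry above.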

The specifications for the cloud instance are set in the master configuration as shown here.

ec2.autoscale:
  web:
    provider: aws_provider
    ssh_interface: private_ips
    ssh_username: ubuntu
    minion:
      startup_states: highstate
    grains:
      env: prod
      roles: web-host

This will preseed the Salt master with a key for the new minion and register a trigger in the Salt Cloud system to wait for the new minion to come online. Once the master is able to connect to the minion, it will execute the bootstrap routine, setting the minion grains and telling it to run a highstate once the minion daemon starts up. Because the reactor script calls tag.split('/')[-2] when looking up which ec2.autoscale configuration to use, we can easily have multiple autoscale configurations triggered depending on the URL that the request is received on. For instance, by changing the configuration in the reactor file to 'salt/netapi/hook/autoscale/database/up', the definition of the instance would be:

ec2.autoscale:
  database:
    ...

There are any number of other things that you may want to do with AWS, and SaltStack does a good job of making them easy to automate. With a few changes to pillar data, we can now rapidly and repeatably build out an autoscaled and load-balanced application tier inside a VPC.
