Automate to the max: instant Ubuntu Server setup with Chef

You've started developing a new app and need a server to deploy it to. You can choose a hosting platform like Heroku or Shelly, which may turn out to be quite expensive if you want to host multiple apps. You can also set up your own server. The latter option can be quite time-consuming, especially if system administration is not your main responsibility and you have multiple servers to provision. In that case automation beyond simple Bash scripts is a must - time to meet Chef.

What is Chef?

Chef is an automation framework that uses a Ruby DSL and helps with provisioning new servers by automating the whole process. We are going to concentrate on Chef Solo, where we define all roles (like a PostgreSQL server) on the local machine and apply them to our server. This is in contrast to Chef Server, a hub for configuration data that is automatically applied to all nodes (servers) connected to the central server, which is quite useful when you need to manage a large number of servers.

We will also be using some other utilities: Knife (with knife-solo), a command-line utility that helps us interact with the server, and Berkshelf, a Bundler-like utility for Chef cookbooks.

I'm not going to write another tutorial explaining every possible detail of the entire Chef DSL. The documentation is pretty good and there are also plenty of other resources you can learn from. I'd rather explain the most important terms, show a basic configuration, demonstrate how to write a very simple recipe and, at the end, introduce my own cookbook that I use for setting up servers with Ubuntu Server 14.04. Why Ubuntu? Well, system administration is not my main responsibility and it's much easier to find solutions (or Chef cookbooks) for Ubuntu than for any other distribution.

After reading this post you should be able to set up any server quickly and have a basic understanding of what's going on.

Why use Chef?

You may wonder what the benefits of using Chef over Bash scripts are. Firstly, Chef provides an extremely expressive DSL. Just take a look at the code below:

template "/etc/nginx/nginx.conf" do
  owner "root"
  group "root"
  mode "0644"
  source "nginx.conf.erb"
end

You can get a general idea of what it does, even if you don't know Chef. That makes it quite easy to find reusable recipes that you can customize to your requirements, and to write your own solutions.

Another huge benefit is idempotence: you can apply the same recipes multiple times on your server and the state of the server will be exactly the same as after running them for the first time. If you change something, e.g. in configuration files, only these changes will be applied. Try achieving the same using shell scripts only ;).

Once you understand Chef, provisioning new servers will be extremely easy and fast - it can even be limited to 2 commands if you use the same configuration for all servers.
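To make the idea of idempotence more concrete, here is a minimal sketch in plain Ruby. It is not how Chef is implemented; ensure_file is a hypothetical helper invented for this illustration. It touches the file only when the desired content differs from what is already on disk, so running it again changes nothing:

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Chef): writes `content` to `path` only
# when the file is missing or differs, and reports whether anything changed.
def ensure_file(path, content)
  return false if File.exist?(path) && File.read(path) == content

  File.write(path, content)
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'nginx.conf')
  first  = ensure_file(path, "worker_processes 4;\n") # file is created
  second = ensure_file(path, "worker_processes 4;\n") # nothing to do
  third  = ensure_file(path, "worker_processes 2;\n") # content is updated
  puts [first, second, third].inspect
end
```

A Chef resource such as template behaves in a similar spirit: it describes the desired state and acts only when the current state differs.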

Chef basics

Terminology

Node

A server (machine) we are going to set up and run Chef on.

Recipe

Basic unit in Chef for installing one thing, like PostgreSQL, ImageMagick, etc.

Cookbook

Collection of recipes, e.g. PostgreSQL cookbook.

Role

Combination of recipes that fulfill a specific "feature" (or role) - a PostgreSQL server role, besides Postgres itself, may also require Monit configuration.

Data bags

Files with data that may be required by some recipes, e.g. ssh keys that will be added to the authorized keys for deploy user.

Getting started

Let's start by creating a directory for a really simple Chef recipe:

mkdir chef-simple-recipe
cd chef-simple-recipe

and create a Gemfile with the chef, knife-solo and berkshelf gems:

source 'https://rubygems.org'

gem 'knife-solo'
gem 'chef'
gem 'berkshelf'

Now we can initialize our project using the Knife utility:

knife solo init .

You should be familiar with the generated directory structure after going through the terminology part. You may wonder what the difference between the cookbooks and site-cookbooks directories is: cookbooks is for storing, well, cookbooks installed by Berkshelf, and site-cookbooks is for our own cookbooks.

The Berksfile closely resembles a Gemfile: to add a new cookbook, just specify its name:

site :opscode

cookbook 'build-essential'

In most cases I also specify the git repository to have a quick reference to the cookbook:

site :opscode

cookbook 'build-essential', git: 'https://github.com/opscode-cookbooks/build-essential'

To install all cookbooks use Berkshelf:

berks install

If you want the cookbooks to be extracted to the cookbooks directory:

berks install --path cookbooks

Writing first Chef role - Nginx server

To understand the general idea behind Chef we are going to write a pretty simple role for Nginx. It will take care of installing Nginx with some specified attributes in the configuration file and of setting up monitoring with Monit. We also need to check if everything works. Fortunately, we don't need a real server: Vagrant will be perfect for setting up a virtual environment. Just download it and follow the instructions below.

Firstly, let's create a directory for our server:

mkdir ubuntu-server-14-04
cd ubuntu-server-14-04

and install Ubuntu Server 14.04:

vagrant init ubuntu-server-14-04 https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box

As you can see, the Vagrantfile was created. You may check it out but it's not necessary.

To run Vagrant:

vagrant up

To check if everything works, ssh into the virtual machine with Ubuntu Server:

vagrant ssh

Now that we have our node, let's prepare it for running Chef (the command below must be run within the main Chef project directory):

knife solo prepare vagrant@127.0.0.1 -p 2222 -i ~/.vagrant.d/insecure_private_key

If you spell out the full path to the key instead of using ~, remember to use your real system user name.

Our virtual node can now run the Chef client. The command above has also generated the nodes/127.0.0.1.json configuration file. Let's investigate its content: run_list is a list of all roles and recipes that will be applied on the node. You can add new roles/recipes just by specifying the name, but it is a good idea to be more explicit and specify whether it's a role or a recipe to avoid name collisions. In our case it will be:

{
  "run_list": [
    "role[nginx]"
  ],
  "automatic": {
    "ipaddress": "127.0.0.1"
  }
}
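To illustrate why the explicit form avoids collisions, here is a small plain-Ruby sketch of how run_list entries can be read. parse_run_list_item is a hypothetical helper written for this post, not Chef's own parser; it assumes that an entry without a qualifier is treated as a recipe:

```ruby
# Hypothetical helper: splits a run_list entry into its type and name.
# Entries without a role[...] or recipe[...] qualifier are treated as recipes.
def parse_run_list_item(item)
  if (match = item.match(/\A(role|recipe)\[(.+)\]\z/))
    { type: match[1].to_sym, name: match[2] }
  else
    { type: :recipe, name: item }
  end
end

puts parse_run_list_item('role[nginx]').inspect
puts parse_run_list_item('recipe[nginx]').inspect
puts parse_run_list_item('nginx').inspect
```

With both a role and a cookbook called nginx, the bare entry nginx would point at the recipe, so role[nginx] has to be written out in full.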

Note: If you want to play a bit with the entire configuration discussed in this blog post, you may check it out here.

Let's create our nginx role:

touch roles/nginx.json

Here's a basic template for roles:

{
  "name": "nginx",
  "description": "Nginx server with Monit configuration",
  "default_attributes": {},
  "json_class": "Chef::Role",
  "run_list": [],
  "chef_type": "role"
}

The content is pretty self-explanatory: we need to specify the name of the role and mark it as a role, so we use Chef::Role as the json_class and role as the chef_type. We can also provide a description. And what about default_attributes? That's where we will put any configuration-related parameters. Once we have set up the basic template, you will see how it works.

Let's create our custom cookbook for Nginx in the site-cookbooks directory. It will consist of:
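A typo in a role file is easy to miss, so a quick sanity check before cooking can save a run. The snippet below is only a sketch: role_problems is a hypothetical helper, and the list of keys it checks is taken from the template above rather than from any official schema.

```ruby
require 'json'

# Hypothetical sanity check (not a Chef API): returns a list of problems
# found in a role definition given as JSON text.
def role_problems(json_text)
  expected_keys = %w[name json_class chef_type run_list]
  # create_additions: false keeps JSON from trying to build a Chef::Role object.
  role = JSON.parse(json_text, create_additions: false)

  problems = expected_keys.reject { |key| role.key?(key) }
                          .map { |key| "missing key: #{key}" }
  problems << 'json_class should be Chef::Role' unless role['json_class'] == 'Chef::Role'
  problems << 'chef_type should be role' unless role['chef_type'] == 'role'
  problems << 'run_list should be an array' unless role['run_list'].is_a?(Array)
  problems
end

role_json = <<~ROLE
  {
    "name": "nginx",
    "description": "Nginx server with Monit configuration",
    "default_attributes": {},
    "json_class": "Chef::Role",
    "run_list": [],
    "chef_type": "role"
  }
ROLE

puts role_problems(role_json).inspect
```

For the template above the list of problems comes back empty.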

metadata.rb file - where we can specify some details like dependencies, supported operating systems, author of the cookbook etc.
recipes directory - with default.rb file which will contain all the commands that need to be executed to install Nginx.
templates directory - place for configuration files etc., we will put nginx.conf.erb template in default subdirectory.

Why the default.rb file name? You can write multiple recipes and refer to them by name, e.g. monit-configuration::postgres or monit-configuration::nginx, but in our case we need just one recipe, so default.rb will be sufficient. Using nginx::default and nginx won't make any difference in this case. The same applies to the template files; that's why nginx.conf.erb is located in templates/default/nginx.conf.erb, not templates/nginx.conf.erb.

Our metadata.rb can look like this:
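The naming rule can be sketched in a few lines of plain Ruby. recipe_path is a hypothetical helper made up for this post, and the site-cookbooks prefix is just the directory we use for our own cookbooks:

```ruby
# Hypothetical helper: maps a recipe name to the file that defines it.
# A bare cookbook name falls back to the default recipe.
def recipe_path(name)
  cookbook, recipe = name.split('::', 2)
  recipe ||= 'default'
  File.join('site-cookbooks', cookbook, 'recipes', "#{recipe}.rb")
end

puts recipe_path('nginx')
puts recipe_path('nginx::default')
puts recipe_path('monit-configuration::postgres')
```

The first two calls resolve to the same file, which is why nginx and nginx::default are interchangeable.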

# site-cookbooks/nginx/metadata.rb
name              "nginx"
maintainer        "Karol Galanciak"
maintainer_email  "karol.galanciak@gmail.com"
description       "Installs Nginx"
version           "0.0.1"

recipe "nginx", "Installs Nginx"

supports "ubuntu"

What about the configuration file? We need a basic template and have to decide which attributes will be hardcoded and which ones we want to be able to customize within our roles. Here's an example, based on that one:

user www-data;
worker_processes 4;

pid /var/run/nginx.pid;

events {
  worker_connections 768;
}

http {

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;
  server_names_hash_bucket_size  64;

  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  access_log /var/log/nginx/access.log;
  error_log /var/log/nginx/error.log;

  gzip on;
  gzip_disable "msie6";

  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}

Just to keep it as simple as possible, let's assume that we want only the user and worker_processes attributes to be customizable, with www-data as the default user and 4 worker processes by default. We can achieve that using erb templates and specifying default attributes for our default.rb recipe. Using attributes is pretty straightforward: we need to create a file matching our recipe in the attributes directory (in this case default.rb) and use hash syntax on the default object, where nginx will be our namespace:

# attributes/default.rb
default['nginx']['user']      = 'www-data'
default['nginx']['worker_processes'] = '4'

And how do we access these values within our nginx.conf.erb template? The same way, with hash syntax, but on the node object. So our template will look like this:

# templates/default/nginx.conf.erb
user <%= node['nginx']['user'] %>;
worker_processes <%= node['nginx']['worker_processes'] %>;

pid /var/run/nginx.pid;

events {
  worker_connections 768;
}

http {

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;
  server_names_hash_bucket_size  64;

  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  access_log /var/log/nginx/access.log;
  error_log /var/log/nginx/error.log;

  gzip on;
  gzip_disable "msie6";

  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}
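If you'd like to see what the erb part produces without running Chef, Ruby's standard ERB library is enough for a rough preview. This is a sketch only: the node variable below is a plain Hash standing in for Chef's node object, and only the two customizable lines of the template are rendered.

```ruby
require 'erb'

# Stand-in for Chef's node object: a plain Hash with the same keys.
node = { 'nginx' => { 'user' => 'www-data', 'worker_processes' => '4' } }

template = <<~TEMPLATE
  user <%= node['nginx']['user'] %>;
  worker_processes <%= node['nginx']['worker_processes'] %>;
TEMPLATE

# `binding` exposes the local `node` variable to the template.
rendered = ERB.new(template).result(binding)
puts rendered
```

Each tag is replaced by the value looked up in the hash, so the rendered lines match the hardcoded configuration we started from.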

We will be able to customize these attributes from our role and/or node definition.

So we are left with the last part of installing Nginx: the recipe itself. Let's think about how we want to do this: we want to install Nginx (before that we may add the ppa:nginx/stable repository to get the latest version), render our template for the configuration file and restart Nginx to use the new configuration. Fortunately, it looks very similar in the Chef DSL:

# recipes/default.rb
bash 'add repo for Nginx' do
  user 'root'
  code <<-CODE
    add-apt-repository -y ppa:nginx/stable
    apt-get update
  CODE
end

package "nginx"

template "/etc/nginx/nginx.conf" do
  owner "root"
  group "root"
  mode "0644"
  source "nginx.conf.erb"
  notifies :run, "execute[nginx-restart]", :immediately
end

execute "nginx-restart" do
  command "/etc/init.d/nginx restart"
  action :nothing
end

What's going on here?

We start by adding the Nginx repository using the bash resource, which is used for executing Bash scripts (as the name implies). We want to run it as the root user, and the script for adding the repository goes in the code body.
In the next step we tell Chef to install Nginx itself using the package manager.
Then we want Chef to use our template file for the configuration. Most of the parameters are pretty self-explanatory: the root user will be the owner of the file, the file will belong to the root group, we set permissions for the file, specify the source in the templates directory, and use a notification to take some action immediately (the other option is delayed, which takes the action at the end of the chef-client run). To specify which action should take place we use the resource[name] syntax.
In the last step we define our action for restarting Nginx using the Chef execute resource: we specify the command to be run and the action, which can be :run (runs the command) or :nothing (prevents the command from running - we use :nothing here because the command is triggered by notifies in the template resource).

And that's it. We are left with the Monit config. We've already written our own recipe for Nginx, so let's use this recipe for Monit itself and that one for the Nginx configuration. Add these two cookbooks to the Berksfile:
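The difference between the two notification timings is easiest to see in the order in which things happen. The toy model below is plain Ruby written for this post and has nothing to do with Chef's real internals; ToyRun and simulate are made-up names. It assumes that an immediate notification fires right away, while a delayed one is queued once and fires at the end of the run:

```ruby
# Toy model of a Chef run (not Chef's implementation).
class ToyRun
  def initialize
    @log = []
    @delayed = []
  end

  # Records a resource being applied.
  def converge(step)
    @log << step
  end

  # :immediately runs the action now; anything else queues it once.
  def notify(action, timing)
    if timing == :immediately
      @log << action
    else
      @delayed << action unless @delayed.include?(action)
    end
  end

  # Runs the queued actions at the end and returns the full log.
  def finish
    @log.concat(@delayed)
    @delayed = []
    @log
  end
end

def simulate(timing)
  run = ToyRun.new
  run.converge('template nginx.conf')
  run.notify('nginx-restart', timing)
  run.converge('package monit')
  run.finish
end

puts simulate(:immediately).inspect
puts simulate(:delayed).inspect
```

With :immediately the restart lands between the two resources; with :delayed it moves to the very end of the run.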

cookbook 'monit_configs-tlq', git: 'git@github.com:TalkingQuickly/monit_configs-tlq.git', branch: 'master'
cookbook 'monit-tlq', git: 'git@github.com:TalkingQuickly/monit-tlq.git', branch: 'master'

And run:

berks install

Let's get back to our nginx.json role definition. We need to specify attributes for the nginx namespace: www-data as the default user is fine, so we will just set worker_processes to 2 and also add the Monit configuration for Nginx. In the end the role will look like this:

// roles/nginx.json
{
  "name": "nginx-server",
  "description": "Nginx server",
  "default_attributes": {
    "nginx": {
      "worker_processes": "2"
    }
  },
  "json_class": "Chef::Role",
  "run_list": [
    "nginx",
    "monit_configs-tlq::nginx"
  ],
  "chef_type": "role"
}
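How the role's default_attributes combine with the defaults from attributes/default.rb can be pictured as a nested merge. The sketch below uses plain Ruby hashes and a hand-written deep_merge helper; it only illustrates the outcome for this role and is not Chef's actual attribute machinery:

```ruby
# Recursively merges `override` into `base`; nested hashes are merged key by key.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Values from attributes/default.rb in our cookbook.
cookbook_defaults = { 'nginx' => { 'user' => 'www-data', 'worker_processes' => '4' } }
# Values from default_attributes in roles/nginx.json.
role_defaults = { 'nginx' => { 'worker_processes' => '2' } }

effective = deep_merge(cookbook_defaults, role_defaults)
puts effective.inspect
```

The role only overrides worker_processes, so user keeps its cookbook default of www-data.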

We will also need to install Monit itself. To check if everything works as it should, we will include email notifications. Let's define monit role:

// roles/monit.json
{
  "name": "monit",
  "description": "Monit",
  "default_attributes": {
    "monit": {
      "notify_emails" : ["email@example.com"],
      "enable_emails" : true,
      "mailserver" : {
        "host" : "smtp.gmail.com",
        "port" : "587",
        "username" : "email@example.com",
        "password" : "password",
        "hostname" : "hostname"
      }
    }
  },
  "json_class": "Chef::Role",
  "run_list": [
    "monit-tlq"
  ],
  "chef_type": "role"
}

Don't forget to put real data there ;). You may be wondering how I knew which attributes to specify - in most cases they are documented, but sometimes you will have to read the template files and check which attributes you can customize and which are hardcoded.

We have our roles defined; the last thing we need to do is to include them in the node definition:

// nodes/127.0.0.1.json
{
  "run_list": [
    "role[monit]",
    "role[nginx]"
  ],
  "automatic": {
    "ipaddress": "127.0.0.1"
  }
}
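Before cooking, it can help to picture what this run_list boils down to once the roles are unfolded. The sketch below is plain Ruby with a hypothetical expand_run_list helper; the roles hash simply repeats the run_list of each role defined earlier, and the real expansion is of course done by Chef itself:

```ruby
# Hypothetical helper: replaces role[...] entries with the recipes of that role.
def expand_run_list(run_list, roles)
  run_list.flat_map do |item|
    if (match = item.match(/\Arole\[(.+)\]\z/))
      expand_run_list(roles.fetch(match[1]), roles)
    else
      # Strip an optional recipe[...] wrapper and keep the recipe name.
      [item.sub(/\Arecipe\[(.+)\]\z/, '\1')]
    end
  end
end

# Role name => run_list, copied from roles/monit.json and roles/nginx.json.
roles = {
  'monit' => ['monit-tlq'],
  'nginx' => ['nginx', 'monit_configs-tlq::nginx']
}

recipes = expand_run_list(['role[monit]', 'role[nginx]'], roles)
puts recipes.inspect
```

The order is preserved, so Monit itself is installed before Nginx and its Monit configuration.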

So here is the final step - applying the recipes on our node:

knife solo cook vagrant@127.0.0.1 -p 2222 -i /Users/system_user_name/.vagrant.d/insecure_private_key

The great thing about Chef is that you can change the values of attributes, cook the node again and chef-client will pick that change up. Just change worker_processes to 3 and watch what happens - chef-client will update the value in the configuration file and restart Nginx.

Note: applying the cookbooks on a real server is almost the same as working with Vagrant:

knife solo prepare root@ip
knife solo cook root@ip

Setting up complete server for Rails apps with Chef

Composing Chef cookbooks for your server can take a long time: reading all the recipes / cookbooks, checking configuration etc. may be quite tedious, especially when doing it for the first time, so I decided to share my own configuration, which I'm going to describe in this section (heavily inspired by the recipes of Ben Dixon, author of Reliably Deploying Rails Applications).

Let's take a look at the node definition. We have some new parameters: environment set to production - I will explain in a minute what it is for - and the Debian platform_family. Next we've got some recipe-related attributes. They could also be put in role definitions, but I like keeping sensitive data in the node definition:

authorization - these attributes are related to the sudo recipe - we assume that we are going to use the deploy user, which is going to have sudo access enabled. Also, the entire sysadmin group is going to have sudo access. We set passwordless to false - the password will always be required.
monit - configuration for Monit concerning sending notifications and accessing via web interface. I would suggest having them enabled. However, if you decide not to enable them, just delete this section.
postgresql - you must specify the password hash for the postgres user. You can generate it easily using openssl: openssl passwd -1 "yourpassword"
security - we can set the ssh port here. The important thing is that you will have to restart the ssh service, even if you don't change the value. Restarting it using Chef caused some exceptions that I couldn't handle so far, so remember to restart the service after sshing into your server once Chef has run for the first time: /etc/init.d/ssh restart

And the last thing is run_list:

role[server] - responsible for basic server setup
role[postgres-server] - installs PostgreSQL and related stuff
role[rails-app] - installs Ruby / Rails related components, like RVM and Rubies
role[mongo-server] - installs MongoDB and sets up Monit monitoring
role[redis-server] - installs Redis and sets up Monit monitoring
role[memcached-server] - installs Memcached and sets up Monit monitoring
role[nginx] - installs Nginx with Passenger and sets up Monit monitoring.

If you don't want to install some components, simply remove them from run_list.

One more thing before we move to a more detailed description of the roles: the data_bags directory. It will be used for creating the user (deploy), setting up the password (again, the password hash, not the plain password) and uploading the ssh key. The deploy user is already specified in the deploy.json file, so just paste in your ssh key from id_rsa.pub and the password hash generated by:

openssl passwd -1 "yourpassword"

As the attributes set in the node definition take precedence over the ones defined in roles, the important part in the server role is the run_list:

openssl is responsible for managing passwords
build-essential installs the build-essential package
chef-solo-search - library related to data bags which helps with searching
sudo and users::sysadmins were already discussed - they are responsible for creating users with specified password and giving sudo access
ssh_key_gen generates an ssh key for the deploy user
basic-security-tlq - based on this recipe - deals with security. It installs the fail2ban, ufw (firewall) and unattended-upgrades packages and installs security updates automatically each day. It also modifies ssh settings (X11Forwarding is set to no, UsePAM to no and the ssh port to the value specified in the node definition). It also opens ports 22, 80 (for Nginx) and the specified ssh port in the firewall and blocks all others. You can add more firewall rules using the firewall_allow attributes in the following format: {"port": "x", "ip": "xxx.xxx.xxx.xxx"}
look-and-feel-tlq - based on this recipe (I had to comment out restarting the ssh service) - installs the htop, vim and unzip packages. Remember that environment parameter? If set to production, it will display a beautiful "PRODUCTION" banner when you ssh in ;).
monit-tlq - installs Monit
monit_configs-tlq::system - sets up the Monit configuration for the system. You should check it out and decide if you want to include it: you may receive occasional alert emails on, e.g., small VPS instances, which is not a good sign. Monit notifications shouldn't be neglected.

Let's move to another role - the Postgres server: basically it installs PostgreSQL (9.3), changes the pg_hba.conf configuration using the specified attributes, uses the pgtune utility to provide better configuration parameters (based on the hardware) and thus better performance, and sets up Monit monitoring. If you need more customization, refer to the docs.

The next role deals with the Ruby and Rails environment. The most important thing here is that it installs system-wide RVM, which is in my opinion much more convenient to work with on servers (a development machine is a different story). Next, the deploy user is added to the rvm group, the default Ruby version is specified and Rubies are installed. We also install the Bundler and Passenger gems. You can check the docs if you need further customization. And what about the rails_gem_dependencies-tlq recipe? It installs some packages that you will probably need: curl, libcurl3, libcurl3-dev, imagemagick, libmagickwand-dev and nodejs. And one more thing: there might be a problem with RVM permissions (at the time of writing this post, this pull request hasn't been merged yet), so it may be a good idea to run rvm fix-permissions system when sshing into your server for the first time.

Another role is the Mongo server - it installs MongoDB, sets up Monit monitoring and uses the /home/data/mongodb directory for the db. You may delete that setting if you want the default value.

The next three roles are quite similar: they install Redis, Memcached and Elasticsearch and set up the Monit configuration for each of them. Also, in the case of Elasticsearch, it installs OpenJDK and makes it possible to customize the amount of allocated memory - you will probably want to remove it; I keep it in the template just to remember that it's a customizable attribute.

And the last role: the Nginx role, which installs Nginx with Passenger and sets up monitoring with Monit. There were some problems with using RVM Ruby when dealing with Passenger, so it required a helper recipe for the rake package. There are a lot of hardcoded values (Ruby version, Passenger version), so make sure they match the ones specified in the Rails App role. If you want more customization, refer to the docs.

Wrapping up

That was a pretty quick introduction to Chef and there might be a lot of things that weren't made perfectly clear. Again, the purpose of this post was not to explain every possible detail but to give you the general idea. I hope that after reading this blog post you will have a basic understanding of how Chef and the related utilities work, how to write your own recipes, how to fork other cookbooks and modify them to your taste, and that you will never again do a manual server setup. You also have a pretty nice starting point - just clone the repo of my server template and apply it to your nodes ;).
