
PostgreSQL HA setup using ansible


A declarative approach with the most popular automation tool.

By Kostas Koutsogiannopoulos

Introduction

Quite early on when working with automation tools like ansible, you realise the real need they satisfy. Yes, they help you configure a system repeatedly, fast and without errors -like a script would- and sure, they describe that configuration in a more human-readable yaml form. But the real power of these tools is that they give you the ability to be declarative about the configurations, installations and topologies that you deliver in the end.

Think of an installation that includes multiple nodes and multiple layers of services, with a heavy integration layer. Add some availability and security policies. Also keep in mind that every system needs to be maintainable, expandable on demand and recoverable from unpredictable failures. Before configuration management tools, all of this required manual changes to the system by multiple specialists over time. After a while, the system barely resembles what was initially delivered and there is no way to deliver the mature system again.

This phenomenon, described as “configuration drift”, is the problem that all these configuration management tools try to resolve.

Does using configuration management tools make you declarative? Not necessarily. For example, you can use the “command” module of ansible in a way that the tool cannot recognize whether your command changed the remote system or not. In general, the configuration management / automation tool needs to be aware of the changes made to the system.
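As a minimal sketch (borrowing the initdb command we will run later), a bare "command" task always reports "changed", while adding a "creates" guard lets ansible report "ok" when there is nothing left to do; the playbooks below achieve the same effect with a "stat" check and a "when" condition:

- name: Initialize the master database
  command: /usr/pgsql-12/bin/postgresql-12-setup initdb
  args:
    creates: /var/lib/pgsql/12/data/PG_VERSION   # skipped (reported as "ok") if the cluster is already initialized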

And this is the article’s purpose: Demonstrate the need and respond with a fully functional example.

What you will learn

As you continue reading you will see some things that are not so intuitive when working with ansible:

  • How grouping (or tagging on cloud systems) in the inventory helps you describe your topology
  • How you can use the system's environment variables as ansible variables (useful for passwords or other sensitive values)
  • How you can use “handlers” to execute tasks only if another task made a change
  • How you can access variables (or “facts”) of one host from other hosts in order to declare the integration between them

The system to deliver

We will create a master PostgreSQL database with two hot replicas. By "hot" we mean that the data is replicated live, one-way only, from the master to the slave databases. We can use the master database to read and write data, while the slaves remain up at any time for reading (select) data only.

We will not set up automatic failover for the master database (for now). Instead, we will write our ansible playbook with the ability to intentionally promote a server from slave to master (without data loss) and start replicating data from the new "master" to the other two servers.

Software Versions

Note that version 12 of PostgreSQL no longer uses the recovery.conf file for standby servers; its settings have moved into the main configuration, with a standby.signal file marking a server as a standby. If you are planning to use versions older than 12, you need to make some changes to the playbooks.

  • Ansible 2.8.5 (python version = 3.6.8)
  • PostgreSQL V12.0
  • CentOS 8
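Since we are on version 12, a standby is marked by an empty standby.signal file in the data directory, and the connection settings that used to live in recovery.conf end up in postgresql.auto.conf (pg_basebackup -R, which we use later, generates both). As a quick manual check on a standby, the first command below should find the file and the second should show a connection string pointing at the master:

$ ls /var/lib/pgsql/12/data/standby.signal
$ grep primary_conninfo /var/lib/pgsql/12/data/postgresql.auto.conf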


 

Inventory

 hosts

[db_hosts]
pgdb1
pgdb2
pgdb3

[db_master]
pgdb1


[db_slaves]
pgdb2
pgdb3

 

The inventory above implies that we will create some tasks for all the database servers, some tasks for the master database only and some tasks for the slaves. Of course, all hostnames are resolvable by DNS and our public ssh key is installed for the privileged user 'ansible' on every host.
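Before going further, a quick sanity check (an ad-hoc command, not part of the playbook) confirms that name resolution, the ssh key and privilege escalation work on every host:

$ ansible -i hosts db_hosts -m ping -u ansible --become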

But let's go straight to our playbook.

The playbook

 

 pg_setup.yml

---
- hosts: db_hosts
  vars:
    repl_user: replusr
    repl_passwd: "{{ lookup('env', 'PG_REPL_PASSWD') }}"
  remote_user: ansible
  become: True
  tasks:
  - name: Check if database is standing by
    stat:
      path: '/var/lib/pgsql/12/data/standby.signal'
    register: standby_status
  - name: Execute prequisites on all hosts
    import_tasks: pg_prequisites.yml
    when: inventory_hostname in groups['db_hosts']
  - name: Configure master node
    import_tasks: pg_conf_master.yml
    vars:
      standby_status: "{{ standby_status }}"
    when: inventory_hostname in groups['db_master']
    become_user: postgres
  - name: Configure slave nodes
    import_tasks: pg_conf_slave.yml
    when: inventory_hostname in groups['db_slaves']
    become_user: postgres
    vars:
      master_host: "{{ groups.db_master[0] }}"
  handlers:
  - name: Restart Linux server
    reboot:
  - name: Restart database
    systemd:
      state: restarted
      name: postgresql-12

Our playbook needs a username (repl_user) for the slave databases to connect to the master and start replicating data. It also needs a password for this user, but we pass it as an environment variable because we do not want our playbook to contain sensitive data.

So, in order to run it we need:

$ export PG_REPL_PASSWD='**********'

 

... and then execute with:

$ ansible-playbook -i hosts pg_setup.yml

 

Notes:

  1. We do not have the whole task sequence here. Instead, we are using "import_tasks" to execute the pg_prequisites.yml, pg_conf_master.yml and pg_conf_slave.yml files that follow.
  2. We built our playbook to survive changes in the inventory file. For example, we can add a new slave, or move the master node to the "db_slaves" group and "promote" a slave to master. All we need to do is make the changes to the inventory and run the same playbook again (see the example inventory after this list).
  3. Because of point 2, ansible needs to know whether a node already has PostgreSQL installed and whether it was a master or a slave before the execution. So we are feeding pg_conf_master.yml and pg_conf_slave.yml with the "standby_status" and "master_host" variables respectively. The tasks will act slightly differently depending on those variables, as we will see.
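For example, after deciding to promote pgdb2, the inventory would simply become the following; re-running the same playbook then promotes pgdb2 and re-initializes the other two nodes as its slaves:

[db_hosts]
pgdb1
pgdb2
pgdb3

[db_master]
pgdb2

[db_slaves]
pgdb1
pgdb3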

 

The tasks

 

 pg_prequisites.yml

---
- name: Install PostgreSQL 12 packages
  dnf:
    name:
    - postgresql12
    - postgresql12-server
    - postgresql12-contrib
    - python3-psycopg2
    state: present
- name: Enable database service on boot
  systemd:
    name: postgresql-12
    enabled: yes
- name: Open postgresql service on firewall
  firewalld:
    source: "{{ hostvars[item]['ansible_ens3']['ipv4']['address'] }}"
    zone: internal
    state: enabled
    service: postgresql
    permanent: yes
  with_items: "{{ groups['db_hosts'] }}"
- name: Reload firewalld
  systemd:
    state: reloaded
    name: firewalld

Notes:

  1. The "python3-psycopg2" is required for the pg_* ansible modules to work.
  2. The firewall will remain active, so we need to permit connections on postgresql service only from the cluster members. Check how we are looping over magic variable "groups" in order to feed the task "Open postgresql service on firewall" with the ipv4 addresses.
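To verify it by hand on any node, the internal zone should now list the postgresql service together with one source address per cluster member:

$ sudo firewall-cmd --zone=internal --list-all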

 

 pg_conf_master.yml

---
- name: Check if database is already initialized
  stat:
    path: '/var/lib/pgsql/12/data/PG_VERSION'
  register: init_status
- name: Initialize the master database
  shell: "/usr/pgsql-12/bin/postgresql-12-setup initdb"
  when: init_status.stat.exists == False
  become_user: root
- name: Promote slave database to master
  shell: "/usr/pgsql-12/bin/pg_ctl promote -D /var/lib/pgsql/12/data"
  when: init_status.stat.exists == True and standby_status.stat.exists == True
- name: Start database
  systemd:
    state: started
    name: postgresql-12
  become_user: root
- name: Create replication user, set MD5-hashed password, grant privs
  postgresql_user:
    name: replusr
    password: "{{ 'md5' + ((repl_passwd + repl_user) | hash('md5')) }}"
    role_attr_flags: REPLICATION
    priv: ALL
    db: postgres
- name: Configure DBs to listen on ip address
  postgresql_set:
    name: listen_addresses
    value: 'localhost,{{ ansible_ens3.ipv4.address }}'
- name: Configure wal_level parameter
  postgresql_set:
    name: wal_level
    value: 'replica'
- name: Configure wal_log_hints parameter
  postgresql_set:
    name: wal_log_hints
    value: 'on'
- name: Configure max_wal_senders parameter
  postgresql_set:
    name: max_wal_senders
    value: '8'
- name: Configure wal_keep_segments parameter
  postgresql_set:
    name: wal_keep_segments
    value: '8'
- name: Configure hot_standby parameter
  postgresql_set:
    name: hot_standby
    value: 'on'
- name: Enable replication user to login
  blockinfile:
    path: /var/lib/pgsql/12/data/pg_hba.conf
    block: |
      {% for host in groups['db_hosts'] %}
      host    replication     replusr     {{ hostvars[host]['ansible_facts']['ens3']['ipv4']['address'] }}/32    md5
      {% endfor %}
- name: Restart database
  systemd:
    state: restarted
    name: postgresql-12
  become_user: root
  when: standby_status.stat.exists == False

Notes:

  1. We depend on the existence of the "/var/lib/pgsql/12/data/PG_VERSION" file to determine whether the database is installed and initialized.
  2. On the task "Create replication user, set MD5-hashed password, grant privs" you can see how you can calculate the md5 hash of a password inside a playbook.
  3. On the task "Enable replication user to login" you can see how you can loop over the "groups" magic variable, generating one line per host in the pg_hba.conf file.
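To make note 2 concrete, the value PostgreSQL stores is simply "md5" followed by the md5 hash of the password concatenated with the username, which you can reproduce in a shell:

$ echo -n "${PG_REPL_PASSWD}replusr" | md5sum

And for note 3, the rendered block in pg_hba.conf ends up with one line per cluster member, something like this (the addresses here are just placeholders):

# BEGIN ANSIBLE MANAGED BLOCK
host    replication     replusr     192.0.2.11/32    md5
host    replication     replusr     192.0.2.12/32    md5
host    replication     replusr     192.0.2.13/32    md5
# END ANSIBLE MANAGED BLOCK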

 pg_conf_slave.yml

---
- name: pgpass for non interactive logins to the other node
  blockinfile:
    path: /var/lib/pgsql/.pgpass
    create: yes
    mode: 0600
    block: |
      # hostname:port:database:username:password
      {% for host in groups['db_hosts'] %}
      {{ hostvars[host]['inventory_hostname'] + ':5432:replication:' + repl_user + ':' + repl_passwd}}
      {% endfor %}
- name: Stop database
  systemd:
    state: stopped
    name: postgresql-12
  become_user: root
- name: Remove data
  file:
    state: absent
    path: "/var/lib/pgsql/12/data/"
- name: Check if database is already initialized
  stat:
    path: '/var/lib/pgsql/12/data/PG_VERSION'
  register: init_status
- name: Initialize the slave database
  shell: "pg_basebackup -D /var/lib/pgsql/12/data -h {{ master_host }} -X stream -c fast -U {{ repl_user }} -w -R"
  when: init_status.stat.exists == False
- name: Start database
  systemd:
    state: started
    name: postgresql-12
  become_user: root
- name: Wait for postgres to listen on port 5432
  wait_for:
    port: 5432
    delay: 2
- name: Configure recovery_target_timeline parameter
  postgresql_set:
    name: recovery_target_timeline
    value: 'latest'
  notify: Restart database
- name: Configure DBs to listen on ip address
  postgresql_set:
    name: listen_addresses
    value: 'localhost,{{ ansible_ens3.ipv4.address }}'
  notify: Restart database

Notes:

  1. On the task "pgpass for non interactive logins to the other node" you can see how you can loop over the "groups" magic variable, generating one line per host in the .pgpass file.
  2. On the step "Initialize the slave database" we are loading all the data from the beginning by connecting to the master database. This is not smart for large databases; you could use WAL archiving instead, but we tried to keep a certain level of simplicity for this presentation.
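After the playbook finishes, two quick manual checks (run as the postgres user) confirm the topology: pg_is_in_recovery() should return true on each slave, and pg_stat_replication on the master should show one row per connected standby:

$ psql -c "SELECT pg_is_in_recovery();"
$ psql -c "SELECT client_addr, state FROM pg_stat_replication;"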

 

Conclusion

The setup above assumes that the applications can handle database connections themselves (read/write, read only, load balancing, stale connections etc.). In the future we may add a service that handles connections transparently, but this is out of the article's scope.

 

