If you want to ensure the high availability of your mail server (Postfix, Dovecot, Rspamd), you will usually turn to cluster stacks such as Pacemaker and Corosync. This involves complex cluster setups and specialized mechanisms such as STONITH fencing and DRBD replicated storage, and it significantly increases the maintenance effort. That effort only pays off at a large scale; for small companies, organizations, or individuals it would be overkill.
However, these smaller operators also need a highly available mail server. One way to achieve this without a cluster stack is DNS-based mail server failover.
Our mail server consists of Postfix as the MTA and Dovecot as the MDA. We also protect ourselves against spam using Rspamd. In addition, we need a web server (Apache2) and a database server (MariaDB), both for administering the mail accounts with PostfixAdmin and for web-based access to the mailboxes (e.g., with Roundcube). For encrypted mail transport and access to the web interfaces, we use the free Let's Encrypt service to obtain the SSL/TLS certificates for the server.
Failover and redundancy mean having a second mail server that is identical to the first except for its IP address and hostname. Both mail servers therefore always share the same Postfix configuration, apart from hostname and IP address. The same applies to the Dovecot configuration, the MariaDB databases, and even the Rspamd rules, all of which must be identical on both servers.
It should also be mentioned that, for redundancy, the two mail servers (in our case, VMs on two separate KVM hosts) run on different server hardware.
Another challenge is the Let's Encrypt certificate, which must be completely identical on both servers. Let's assume our main mail server listens on smtp.example.com, imap.example.com, pop3.example.com, and mail.example.com, as well as smtp.example2.com, imap.example2.com, pop3.example2.com, mail.example2.com, and mx1.example.com, while the failover mail server initially listens only on mx2.example.com. If mx1.example.com fails, mx2.example.com must immediately be able to take over all of the domains and subdomains just listed. Let's Encrypt's conventional HTTP challenge does not work here: it validates each name by fetching a file over HTTP from the host that name currently resolves to, so a single certificate for names that point to different servers, or that may be switched to the other server at any time, cannot reliably be obtained from one machine. The DNS challenge, by contrast, proves control of the domain through a TXT record in DNS, regardless of which host the names point to. It is also important that the TTL (Time To Live) of these DNS records is set to 60 seconds.
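To make the DNS challenge more concrete, the following Python sketch collects the host names from the example above and assembles a single certbot call for all of them. It assumes the certbot-dns-rfc2136 plugin, which lets certbot publish the TXT record through a dynamic DNS update on your own authoritative server; the certificate name, the credentials path, and including both mx1 and mx2 in the same certificate are illustrative assumptions, not part of the setup described here.

```python
# Minimal sketch (assumptions labeled): build one certbot call whose
# certificate covers every host name either mail server may have to serve.
from typing import List

SERVICE_PREFIXES = ["smtp", "imap", "pop3", "mail"]
MAIL_DOMAINS = ["example.com", "example2.com"]
# Assumption: the shared certificate also names both physical hosts.
HOST_NAMES = ["mx1.example.com", "mx2.example.com"]


def certificate_names() -> List[str]:
    """Return all names that must appear in the shared certificate."""
    service_names = [
        f"{prefix}.{domain}" for domain in MAIL_DOMAINS for prefix in SERVICE_PREFIXES
    ]
    return service_names + HOST_NAMES


def certbot_command(names: List[str]) -> List[str]:
    """Build a certbot argv that validates every name via DNS TXT records."""
    cmd = [
        "certbot", "certonly", "--non-interactive",
        "--cert-name", "mailcluster",  # hypothetical certificate name
        # Assumption: the certbot-dns-rfc2136 plugin is installed and the
        # authoritative DNS server accepts TSIG-signed dynamic updates.
        "--dns-rfc2136",
        "--dns-rfc2136-credentials", "/etc/letsencrypt/rfc2136.ini",  # hypothetical path
    ]
    for name in names:
        cmd += ["-d", name]
    return cmd


if __name__ == "__main__":
    print(" ".join(certbot_command(certificate_names())))
```

Because the validation happens entirely in DNS, the same command works no matter whether the service names currently point to mx1 or mx2.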
Why such a short TTL? With TTL=60, clients and resolvers (such as an ISP's) cache the DNS entry for at most 60 seconds, so a changed record takes effect within about a minute. The price is more DNS queries and a higher load on the authoritative DNS server: the gain in flexibility comes at some expense of performance, stability, and server load. However, our goal here is a redundant mail server system, and a mail server failure without any backup is far more critical than a somewhat higher DNS load.
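The trade-off can be put into numbers. The Python sketch below computes how many lookups a single resolver may send per day for one name, and a worst-case switchover time made up of failure detection plus cache expiry. The 30-second check interval and the two failed checks are hypothetical values chosen for illustration, and resolvers that enforce their own minimum TTL are ignored.

```python
# Back-of-the-envelope numbers for the TTL trade-off. The health-check
# interval and failure threshold are hypothetical values, not taken from
# any particular failover tool.
SECONDS_PER_DAY = 24 * 60 * 60


def refreshes_per_day(ttl: int) -> int:
    """Upper bound on lookups one resolver sends per day for a single name
    (reached only if the name is requested again as soon as it expires)."""
    return SECONDS_PER_DAY // ttl


def worst_case_switchover(ttl: int, check_interval: int, failures_needed: int) -> int:
    """Seconds until every TTL-honoring resolver sees the new record:
    time to detect the failure plus the time an old cache entry may live."""
    return check_interval * failures_needed + ttl


if __name__ == "__main__":
    for ttl in (60, 3600):
        print(f"TTL {ttl:>4} s: up to {refreshes_per_day(ttl):>4} lookups/day per resolver, "
              f"worst-case switchover {worst_case_switchover(ttl, 30, 2)} s")
```

With these assumptions, a TTL of 60 seconds keeps the worst case at about two minutes, while a one-hour TTL stretches it to just over an hour; in exchange, each busy resolver asks up to 1,440 times a day instead of 24 times.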
It is also worth mentioning that a seemingly simple idea, having users configure two mail server addresses for sending and receiving in their mail client and use both, is unfortunately far too complicated for most users, even though it would technically be the simplest solution. That is why we pursue DNS-based mail server redundancy instead.
The mail server system used here runs on a current Debian release and consists of the following server components and software:
Server / software | Purpose and description |
---|---|
Postfix | MTA for sending and receiving mail via SMTP (with TLS) |
Dovecot | MDA for storing mail and delivering it to clients via IMAP and POP3 |
MariaDB | MySQL-compatible database server for the mail domains and mail accounts as well as the webmailer database |
Rspamd | Mail filter system for blocking spam |
Apache2 | Web server for the administration GUI and the webmailer |
Let's Encrypt | Free certificate authority for SSL/TLS certificates |
PostfixAdmin | Web GUI for user-friendly administration of mail accounts |
Roundcube | Webmailer for the mail accounts |
As you can see, there are many server and software components here that must be absolutely identical on both mail servers (main server and failover server). Regardless of which of the two sends or receives an email, it must be available on the other Dovecot server (MDA) within a very short time (a few minutes). Users usually access their mail with mail clients, but also through the webmail interfaces on both mx1 and mx2.
While the configuration of the two Postfix servers remains virtually unchanged (both are identical except for the hostname), the situation is different for Dovecot, the MariaDB databases, the Rspamd rules, and the Let's Encrypt certificate, whose contents change during operation. The certificate is checked for renewal against the certificate authority once a week and, after a successful renewal, synchronized to the failover mail server mx2.
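Even for the "virtually unchanged" Postfix configuration, it pays to verify now and then that both servers really differ only in the hostname. The Python sketch below assumes you have captured the output of `postconf -n` on both hosts (for example via SSH) and reports every parameter that differs; treating only `myhostname` as an expected difference is an assumption you may need to extend, for example with IP-related parameters.

```python
# Sketch: compare `postconf -n` output captured on mx1 and mx2 and list
# every parameter that differs, except those expected to differ.
from typing import Dict, Iterable, List, Tuple

# Assumption: only the hostname is supposed to differ; add IP-related
# parameters here if your main.cf contains any.
EXPECTED_DIFFERENCES = {"myhostname"}


def parse_postconf(output: str) -> Dict[str, str]:
    """Parse the 'name = value' lines printed by `postconf -n`."""
    params: Dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            params[name.strip()] = value.strip()
    return params


def config_drift(
    a: Dict[str, str],
    b: Dict[str, str],
    ignore: Iterable[str] = EXPECTED_DIFFERENCES,
) -> List[Tuple[str, str, str]]:
    """Return (parameter, value on a, value on b) for unexpected differences."""
    ignored = set(ignore)
    drift = []
    for name in sorted(set(a) | set(b)):
        if name not in ignored and a.get(name) != b.get(name):
            drift.append((name, a.get(name, "<unset>"), b.get(name, "<unset>")))
    return drift


if __name__ == "__main__":
    mx1 = parse_postconf("myhostname = mx1.example.com\nmessage_size_limit = 52428800\n")
    mx2 = parse_postconf("myhostname = mx2.example.com\nmessage_size_limit = 10240000\n")
    print(config_drift(mx1, mx2))
```

In the usage example, the differing hostnames are ignored and only the mismatched `message_size_limit` (an arbitrary illustrative value) is reported.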
The challenge is therefore to synchronize the databases involved and all emails regularly and securely, so that both mail servers present the same state to users.
Both Dovecot and MariaDB offer built-in replication features for exactly this purpose: Dovecot can replicate mailboxes between two servers using dsync, and MariaDB can replicate databases from one server to another.
We should also consider that we may not have a direct connection between the two root servers (crossover cable), but rather the second KVM server with the redundant VMs involved may be located on a different network. In this case, a VPN (Virtual Private Network) should definitely be set up between the servers involved, allowing data to be synchronized in both directions.
In addition to the identical Let's Encrypt certificate, which the main mail server renews regularly, the Rspamd rules are also important. These can be maintained on both the main and the failover server. Synchronization tools like Rsync immediately spring to mind, but Unison proves much more practical in this scenario. Rsync is primarily designed for one-way synchronization; two-way synchronization with Rsync is possible, but not trivial, because Rsync keeps no record of the previous synchronization state and therefore cannot tell whether a file missing on one side was deleted there or newly created on the other.
In contrast, Unison was designed from the ground up for bidirectional synchronization.
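The difference can be illustrated in a few lines. The following Python sketch is a deliberately simplified model of two-way reconciliation for a single file, not Unison's actual implementation: it compares each replica with the state recorded at the last successful sync, which is the kind of record Unison keeps in its archive files and a plain Rsync run does not have.

```python
# Simplified model of two-way reconciliation for one file. Hashes stand in
# for file contents; None means "file does not exist". Not Unison's code.
from typing import Optional


def reconcile(base: Optional[str], a: Optional[str], b: Optional[str]) -> str:
    """Decide what to do with one file.

    base -- hash recorded at the last successful sync (None = did not exist)
    a    -- current hash on replica A (None = missing)
    b    -- current hash on replica B (None = missing)
    """
    a_changed = a != base
    b_changed = b != base
    if not a_changed and not b_changed:
        return "nothing"
    if a_changed and not b_changed:
        return "propagate A -> B"  # also covers a deletion on A
    if b_changed and not a_changed:
        return "propagate B -> A"  # also covers a new file on B
    if a == b:
        return "nothing"  # both sides made the same change
    return "conflict"


if __name__ == "__main__":
    # Same current state (missing on A, present on B), different history:
    print(reconcile("h1", None, "h1"))  # deleted on A -> delete on B too
    print(reconcile(None, None, "h2"))  # created on B -> copy to A
```

The two calls in the usage example see exactly the same files on both replicas; only the recorded base state tells them apart. A one-way tool without that history has to guess, and guessing wrong either resurrects deleted rules or deletes new ones.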
Therefore, Unison is used to synchronize both the Rspamd rules and the SSL/TLS certificates, each controlled by its own cron job.
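A minimal sketch of such cron jobs, written in Python like the other examples here, generates `/etc/cron.d` lines that call Unison over SSH. It assumes compatible Unison versions on both hosts and key-based root SSH from mx1 to mx2; the paths, schedules, and conflict preferences are hypothetical and should be adapted to the real layout.

```python
# Sketch: generate /etc/cron.d entries that run Unison between mx1 and mx2.
# Paths, schedules, and preference rules are hypothetical examples.
import shlex
from typing import List

REMOTE_HOST = "mx2.example.com"


def unison_argv(path: str, prefer: str, remote: str = REMOTE_HOST) -> List[str]:
    """Two-way sync of an absolute path with the same path on the remote host."""
    return [
        "unison", path,
        f"ssh://{remote}/{path}",  # "ssh://host//abs/path" addresses an absolute path
        "-batch",                  # never ask questions (required under cron)
        "-times",                  # propagate modification times
        "-prefer", prefer,         # how to resolve conflicting changes
    ]


def cron_line(schedule: str, argv: List[str], user: str = "root") -> str:
    """Format one line in /etc/cron.d syntax (schedule, user, command)."""
    return f"{schedule} {user} {shlex.join(argv)}"


if __name__ == "__main__":
    # Rspamd rules may be edited on either server: the newer change wins.
    print(cron_line("*/15 * * * *", unison_argv("/etc/rspamd/local.d", "newer")))
    # Certificates are renewed on mx1, so mx1's copy wins conflicts.
    print(cron_line("30 4 * * 1", unison_argv("/etc/letsencrypt", "/etc/letsencrypt")))
```

The two jobs use different conflict rules on purpose: Rspamd rules may legitimately be edited on either server, whereas the certificate has a single authoritative source, the main mail server.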
This means we have two virtually identical mail server systems at all times – apart from the IP addresses and hostnames.
This architecture dispenses with traditional MX priorities. Instead, the failover program "DNS_Failover" ensures that relevant DNS CNAME records are dynamically updated in the event of a mail server failure, seamlessly redirecting mail traffic to the available server. This is an ideal approach for small infrastructures where resilience is just as critical as for large ones.
The failover program "DNS_Failover", installed directly on the authoritative DNS server, automatically updates the CNAME records for receiving and sending mail as soon as a mail server or KVM host is no longer reachable.
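The internals of DNS_Failover are not described here, so the following Python sketch only illustrates the general detection idea under assumptions of our own: the primary is probed on the SMTP port and must answer with a 220 greeting, a configurable number of consecutive failed probes triggers the switch, and a single successful probe switches back. The names `smtp_reachable` and `FailoverState` are hypothetical.

```python
# Hypothetical sketch of failover detection; not DNS_Failover's actual code.
import socket
from dataclasses import dataclass


def smtp_reachable(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """True if the host accepts a TCP connection and sends an SMTP 220 greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(512)
        return banner.startswith(b"220")
    except OSError:  # refused, unreachable, or timed out
        return False


@dataclass
class FailoverState:
    primary: str
    backup: str
    failures_needed: int = 3
    consecutive_failures: int = 0
    active: str = ""

    def __post_init__(self) -> None:
        if not self.active:
            self.active = self.primary

    def update(self, primary_ok: bool) -> str:
        """Feed one probe result; return the host the CNAMEs should point to."""
        if primary_ok:
            self.consecutive_failures = 0
            self.active = self.primary  # simplification: fail back immediately
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_needed:
                self.active = self.backup
        return self.active


if __name__ == "__main__":
    state = FailoverState("mx1.example.com", "mx2.example.com")
    print(state.update(smtp_reachable("mx1.example.com")))
```

Requiring several consecutive failures avoids flapping because of a single lost packet. The immediate failback is a simplification; a production tool might wait until mailbox and database replication has caught up before switching back.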
This allows switching to the functioning server almost immediately, limited essentially by the 60-second TTL, without the delays caused by MX retry intervals or long DNS cache lifetimes in traditional setups.
An SMTP proxy, such as HAProxy, inevitably brings with it additional dependencies, configuration effort, potential sources of error, and possibly its own failover mechanism.
The DNS-based approach, on the other hand, completely eliminates a central SMTP entry point, making the system simpler, leaner, and less maintenance-intensive – perfectly suited to small IT teams.
While a proxy always represents a potential single point of failure – unless it is set up redundantly with keepalived or load balancing – this approach is based on DNS-based redirection, which reacts directly at the infrastructure level.
There is no central bottleneck; control is carried out by the DNS logic itself.
Minimizing infrastructure is crucial, especially for small businesses. The approach used here requires no additional hardware or VMs, but instead uses the existing DNS server as the control center.
This solution is therefore completely software-based and resource-efficient, making it ideal for lean IT setups.
The dynamic switching of DNS CNAME records is carried out using the specially developed open source tool "DNS_Failover", which serves as transparent and lightweight failover management for mail servers.
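How DNS_Failover itself writes the new records is not shown here. One common way to change records on a BIND-style authoritative server at runtime is a dynamic update (RFC 2136) sent with `nsupdate`. The Python sketch below assumes that mechanism, a TSIG key file, and the example zone from above, so treat it as an illustration rather than DNS_Failover's code. Keep in mind that RFC 2181 (section 10.3) does not allow an MX record to point to a name that is a CNAME, so the sketch only switches the client-facing service names.

```python
# Sketch: build an nsupdate(1) script that repoints CNAME records to the
# surviving mail server, then feed it to nsupdate with a TSIG key file.
import subprocess
from typing import Iterable


def nsupdate_script(server: str, zone: str, names: Iterable[str],
                    target: str, ttl: int = 60) -> str:
    """Return nsupdate commands that replace each name's CNAME with `target`."""
    target_fqdn = target.rstrip(".") + "."
    lines = [f"server {server}", f"zone {zone}"]
    for name in names:
        fqdn = name.rstrip(".") + "."
        lines.append(f"update delete {fqdn} CNAME")           # drop the old alias
        lines.append(f"update add {fqdn} {ttl} CNAME {target_fqdn}")
    lines.append("send")  # all changes are sent as one update message
    return "\n".join(lines) + "\n"


def apply_update(script: str, keyfile: str) -> None:
    """Send the update; nsupdate reads its commands from standard input."""
    subprocess.run(["nsupdate", "-k", keyfile], input=script, text=True, check=True)


if __name__ == "__main__":
    print(nsupdate_script("127.0.0.1", "example.com",
                          ["smtp.example.com", "imap.example.com",
                           "pop3.example.com", "mail.example.com"],
                          "mx2.example.com"))
```

The second domain (example2.com) is a separate zone and would get its own script. Sending the delete and the add in one update message means that resolvers never see the name without a record.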
DNS_Failover is free and open source software.
The solution remains completely under your control, is auditable, and can be flexibly adapted to technical and organizational circumstances – a decisive advantage over commercial "black box" failover systems.
A highly available mail server system is technically demanding—too complex to be fully described here.
With IT-LINUXMAKER, we are at your side for expert advice and implementation.