04. March 2012

reclustering ejabberd

The Jabber service has been a bit unstable these days, for two reasons:

  • Our storage is giving us serious trouble. We use iSCSI, and the server that exports the volumes reboots at regular intervals for no apparent reason. This results in inconsistent or read-only file systems and leaves almost all services in a half-working state. We are working on a solution.
  • The clustering of the ejabberd nodes apparently never worked properly, so it seems we had an interdependent rather than a redundant setup. I have collected below what needs to be done to get proper fail-over. These settings make sense to us, but of course we cannot guarantee that they work :) For documentation purposes the following part is in English.

Assume we have a 2-node setup (vm-jabber{0,1}) whose replication scheme is broken, and we start over by purging vm-jabber1 completely. Since ejabberd 2.1.x there is a nice way to remove a db node from a setup.

On our master server (vm-jabber0): make sure the following entry is included in the modules section of ejabberd.cfg:

{modules,
 [
[...]
  {mod_admin_extra, []},
[...]
 ]}.

After this, restart the ejabberd process and run:

ejabberdctl remove_node 'ejabberd@vm-jabber1'

In a debug shell (or the web interface) confirm that the node has been purged:

$ ejabberdctl debug
Attaching Erlang shell to node ejabberd@vm-jabber0.
To detach it, press: Ctrl+G, q, Return

Erlang R14A (erts-5.8) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [kernel-poll:false]

Eshell V5.8  (abort with ^G)
(ejabberd@vm-jabber0)1> mnesia:info().
// SNIP //
running db nodes   = ['ejabberd@vm-jabber0']
stopped db nodes   = [] 
master node tables = []
// SNIP //
// Hit Ctrl-C twice to abort the debug shell

On the purged node, stop ejabberd, remove all database files, and get a fresh copy of ejabberd.cfg from the master. We also need the master's Erlang cookie so the two nodes can authenticate with each other.

/etc/init.d/ejabberd stop
rm -rf /var/lib/ejabberd/*
scp root@vm-jabber0:/etc/ejabberd/ejabberd.cfg /etc/ejabberd/
chown root:ejabberd /etc/ejabberd/ejabberd.cfg
chmod 640 /etc/ejabberd/ejabberd.cfg
scp root@vm-jabber0:/var/lib/ejabberd/.erlang.cookie /var/lib/ejabberd/
chown ejabberd:ejabberd /var/lib/ejabberd/.erlang.cookie
chmod 440 /var/lib/ejabberd/.erlang.cookie
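
An optional sanity check that the copied cookie really matches the master's (same paths as above; the process substitution needs bash):

diff <(ssh root@vm-jabber0 cat /var/lib/ejabberd/.erlang.cookie) /var/lib/ejabberd/.erlang.cookie # no output means the cookies are identical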

When that is done we have to rebuild the mnesia database, i.e. import the schema (to disc) and fetch copies of all tables from the master. So we start a bare Erlang node rather than ejabberd, since starting ejabberd would recreate the database for a fresh local setup.

su - ejabberd -c bash
erl -sname ejabberd@vm-jabber1 -mnesia dir '"/var/lib/ejabberd/"' \
  -mnesia extra_db_nodes "['ejabberd@vm-jabber0']" -s mnesia
[...]
(ejabberd@vm-jabber1)1> mnesia:change_table_copy_type(schema, node(), disc_copies).
// submit and hit ctrl-c twice to exit or check the newly populated db with mnesia:info().
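
If you want to double-check the conversion before leaving the shell, mnesia:table_info/2 reports the local storage type of the schema (the prompt number below is just illustrative):

(ejabberd@vm-jabber1)2> mnesia:table_info(schema, storage_type).
disc_copies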

Now you can fire up the second ejabberd node on vm-jabber1. But there is still work to do. ejabberd makes some odd decisions about where to store its data. Basically we want to keep as much shared data as possible in RAM AND on disc, so that the slave node can start ejabberd on its own because it has a copy of everything on disc. Of course, some tables are not required to start the Jabber server; session or s2s data, for example, can be stored in RAM only. The important thing is to eliminate, or at least reduce, the number of "remote copy" entries, since those can block failover. Memory-hungry tables like offline_msg can be skipped if there is not enough RAM to begin with. I found it very handy to use the web admin to go through the replication type of each table; here is a reminder on how to tunnel it through to your client (we do not forward port 5280 here):

ssh vm-jabber0 -L 8000:localhost:5280 # and fire up a browser to http://localhost:8000/admin

First go through the table list on the master and make sure every table has a sane storage type:

  • you need a disc copy if the node has to start on its own!

Here are the basic rules we implemented (a debug-shell sketch follows below):

  • default to a RAM and disc copy on both nodes
  • if a table is machine dependent, use a RAM copy on both nodes
  • use a disc-only copy for memory-hungry tables on the master and a remote copy on the slave
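
For reference, the same can be done from a debug shell on the master instead of the web admin, using the standard mnesia calls. This is only a sketch: the three table names below (roster, session, offline_msg) stand in for the rules above and assume the tables are not already of the target type.

$ ejabberdctl debug
// default: keep a RAM and disc copy on both nodes
(ejabberd@vm-jabber0)1> mnesia:add_table_copy(roster, 'ejabberd@vm-jabber1', disc_copies).
// session data only needs to live in RAM, so a RAM copy on the slave is enough
(ejabberd@vm-jabber0)2> mnesia:add_table_copy(session, 'ejabberd@vm-jabber1', ram_copies).
// memory-hungry tables: disc-only on the master, no local copy (remote copy) on the slave
(ejabberd@vm-jabber0)3> mnesia:change_table_copy_type(offline_msg, 'ejabberd@vm-jabber0', disc_only_copies).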

That's it. Good Luck :)

Tags: ejabberd jabber