VM setup using LXC

Using Ubuntu 16.04. Follow server setup first to configure the server.

Following the LXC Server Guide, but modified for VLAN bridging:

sudo apt install lxc

# Add a range of UIDs to be used by containers run by the "inst" user
# (see https://bugs.launchpad.net/serverguide/+bug/1571135 for details):
echo 'inst:200000:65536' | sudo tee -a /etc/subuid
echo 'inst:200000:65536' | sudo tee -a /etc/subgid

mkdir -p ~/.config/lxc
LXC_DEFAULTS=~/.config/lxc/default.conf
echo "lxc.id_map = u 0 200000 65536" > $LXC_DEFAULTS
echo "lxc.id_map = g 0 200000 65536" >> $LXC_DEFAULTS
echo "lxc.network.type = veth" >> $LXC_DEFAULTS
echo "lxc.network.link = br0" >> $LXC_DEFAULTS
echo "lxc.start.auto = 1" >> $LXC_DEFAULTS

# Limit RAM used by containers:
echo 'lxc.cgroup.memory.limit_in_bytes = 512M' >> $LXC_DEFAULTS
echo 'lxc.cgroup.memory.memsw.limit_in_bytes = 1G' >> $LXC_DEFAULTS

# Allow up to 60 unprivileged users to use br0 as a veth (bridged network) device:
echo "$USER veth br0 60" | sudo tee -a /etc/lxc/lxc-usernet

Disable the lxc-net dnsmasq to stop it binding to port 67, preventing our own dnsmasq from doing the same, by editing /etc/default/lxc-net and setting:

USE_LXC_BRIDGE="false"

Create /etc/dnsmasq.d/afnog with the following contents:

server=<your host IP address>
interface=br0
dhcp-range=196.200.219.20,196.200.219.80,12h

Edit /etc/default/grub and set:

GRUB_CMDLINE_LINUX_DEFAULT="cgroup_enable=memory swapaccount=1"

as seen here, otherwise the lxc.cgroup.memory.memsw.limit_in_bytes setting will not work, and will prevent you from starting any LXC containers.

For this to work, you also need to edit /etc/pam.d/common-session*, find the lines for pam_cgfs.so and add ,devices to the end, as described here, like this:

session optional        pam_cgfs.so -c freezer,memory,name=systemd,devices

Then sudo update-grub and sudo reboot to activate swap accounting.

Create a gold master guest image. According to the LXC documentation:

most distribution templates simply won’t work with (unprivileged containers). Instead you should use the “download” template which will provide you with pre-built images of the distributions that are known to work in such an environment.

lxc-create -t download -n debian8 -- --dist debian --release jessie --arch i386
lxc-ls --fancy
chmod a+x .local
chmod a+x .local/share
lxc-start --name debian8
lxc-attach --name debian8

Follow guest setup to configure the gold master guest.

Add the following line to /etc/sysctl.conf on the host, to ensure that the host kernel allows sufficient AIO handles for all the guests to run MySQL InnoDB:

fs.aio-max-nr = 1000000

Edit your user’s crontab and add the following line to make your containers auto-start:

@reboot lxc-autostart

Ensure that systemd on the host gives sufficient tasks to LXC containers started by at and cron, by editing /etc/systemd/system.conf, uncommenting DefaultTasksMax and setting it to at least 12288. See here for more information on this problem.

The following commands are useful for dealing with systemd and control groups:

You may also have issues logging into sshd with password authentication due to this issue. (This appears to have been fixed by 2017). The solution is to edit /etc/pam.d/sshd and /etc/pam.d/cron in the guest, find the line that says:

session    required     pam_loginuid.so

and change it to:

session    optional     pam_loginuid.so

And there is an issue with systemd, dbus and OOM_ADJUST in the container which requires us to comment-out the OOMScoreAdjust=-900 line in /lib/systemd/system/dbus.service. The dbus package is not installed by default, so this file does not exist, but installing certain applications (e.g. mutt) will install it and break the system. Even a simple apt install dbus hangs during package installation, and it can’t be safely removed either.

Since we cannot preconfigure dbus (because dbus.service is overwritten on package installation) and we cannot easily fix the problem in systemd (as the actual change that fixes the issue has not been identified), we prevent the installation of dbus in all containers instead, by creating /etc/apt/preferences.d/no-dbus.pref (in the gold image) with these contents:

# https://github.com/systemd/systemd/issues/719#issuecomment-223057529
# http://askubuntu.com/questions/75895/how-to-forbid-a-specific-package-to-be-installed

Package: dbus 
Pin: version *
Pin-Priority: -1

Package: dbus:amd64
Pin: version *
Pin-Priority: -1

Stop the container and make a lot of copies:

lxc-stop --name debian8 -t 30
NUM_PCS=40
LXC_ROOT=/home/inst/.local/share/lxc
for i in `seq 1 $NUM_PCS`; do
	hostname=pc$i
	domainname=$hostname.sse.ws.afnog.org
	lxc-copy --name debian8 --newname $hostname
	macaddr=`openssl rand -hex 4 | sed -e 's/^\(..\)\(..\)\(..\)\(..\).*/52:56:\1:\2:\3:\4/'`
	echo "lxc.network.hwaddr = $macaddr" >> $LXC_ROOT/pc$i/config
	echo $domainname > $LXC_ROOT/pc$i/rootfs/etc/hostname
done
lxc-autostart

To run a command on all containers (for example hostname):

for i in `seq 1 $NUM_PCS`; do
	hostname=pc$pc
	domainname=$hostname.sse.ws.afnog.org
	lxc-attach -n $hostname -- hostname
	lxc-attach -n $hostname -- sh -c "echo $domainname > /etc/hostname"
done

Give them all unique IP addresses by doing this with:

lxc-attach -n $hostname -- sed -i -e "s/196.200.219.100/196.200.219.$[$i+100]/" /etc/network/interfaces

And then reboot all the containers:

lxc-autostart -r

If you need to destroy all the containers, so you can recreate them, you can do this:

for i in `seq 1 $NUM_PCS`; do lxc-stop --name pc$i -k; lxc-destroy --name pc$i; done

Optional: time how long it takes for them all to start completely (enough to get an IP address):

lxc-autostart -k -t 5
lxc-autostart & time while lxc-ls --fancy | awk '{ print $5 }' | grep -q -- -; do sleep 1; done

And try to reduce it with unionfs mounts (experimental). This caused some issues: in particular locking in /var/mail did not work with Dovecot, and modifying the underlying filesystem after creating the union mounts (removing a package) resulted in inconsistencies between the package database and the files visible in the cloned PCs, so best avoided. It’s probably worth checking out overlayfs and btrfs subvolumes for future deployments.

lxc-autostart -k -t 5
LXC_ROOT=/home/inst/.local/share/lxc
for i in `seq 1 $NUM_PCS`; do
	hostname=pc$pc
	domainname=$hostname.sse.ws.afnog.org
	sudo umount $LXC_ROOT/$hostname/rootfs
	test -d $LXC_ROOT/$hostname/rootfs.orig || mv $LXC_ROOT/$hostname/rootfs{,.orig}
	mkdir -p $LXC_ROOT/$hostname/rootfs{,.rw}

	echo "none $LXC_ROOT/$hostname/rootfs" \
		"aufs br=$LXC_ROOT/$hostname/rootfs.rw=rw:$LXC_ROOT/debian8/rootfs=ro 0 0" \
	| sudo tee -a /etc/fstab

	sudo mount $LXC_ROOT/$hostname/rootfs
	sudo sed -i -e "s/100/$[100+$i]/" $LXC_ROOT/$hostname/rootfs/etc/network/interfaces
	echo $domainname | sudo tee $LXC_ROOT/$hostname/rootfs/etc/hostname

	# Dovecot has problems locking the mailbox when hosted on AUFS. Work around it
	# by mounting a ramdisk for /var/mail in all containers:
	echo 'none /var/mail tmpfs defaults 0 0' | sudo tee -a $LXC_ROOT/$hostname/rootfs/etc/fstab
	lxc-start -n $hostname
done
lxc-autostart