Serverless Autoscaling for Jitsi on Azure

As many businesses decentralised their operations during the pandemic, conferencing tools exploded in popularity. Alongside the well-known paid players, some open source underdogs emerged as victors. At Airwalk we needed to build a self-hosted, scalable, customisable and secure solution that met the strict needs of our highly regulated environments while maintaining control over cost. The tool that ticked all those boxes turned out to be Jitsi Meet. Although deploying a solution on Azure is nowadays a piece of cake, there are still some caveats to making it truly scalable. Follow along to see how we built it.


What’s Jitsi?

Jitsi Meet is a conferencing solution that combines multiple components. A bare-bones setup includes the following:

  • Prosody, a well-known Extensible Messaging and Presence Protocol (XMPP) server
  • Jicofo, a service to manage the mapping between users and conferences
  • Jitsi Videobridge (JVB) as the video relay backend
  • Jitsi Meet as the web client to gather all the functionality under a single webapp

Aside from the above, it supports plug-ins and additional external components that add capabilities like transcription, mail integration and file sharing.

How does it scale?

At the time this solution was built, Jitsi was advertised as scalable, but not out of the box; some additional setup needs to happen for that to be the case:

  • We wanted not only a scalable solution, but also a highly available one: at least two stack replicas (shards) continuously active across different availability zones.
  • It requires HAProxy fronting the web service with the stick-tables feature configured. This ensures users are routed to the correct shard based on the room ID passed as a URL parameter when a user joins a specific conference on a specific shard.
  • The videobridges need to be able to scale up and down based on network load without interrupting sessions (a conference cannot span multiple videobridges/shards) and to register themselves with Jicofo/Prosody (luckily this is almost out of the box with the latest JVB versions and the correct Prosody config).
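The stick-table routing above can be sketched in a few lines of HAProxy config. This is a minimal illustration rather than our production config: the shard hostnames and table sizing are placeholders, and `room` is the URL parameter Jitsi Meet passes when joining a conference.

```
backend jitsi_shards
    balance roundrobin
    # Pin every request carrying ?room=<id> to the shard that first served that room
    stick-table type string len 128 size 200k expire 12h
    stick on urlp(room)
    server shard-0 shard-0.internal:443 check
    server shard-1 shard-1.internal:443 check
```

Once a room lands on a shard, every subsequent participant joining with the same room ID is routed to the same backend until the table entry expires.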

HA and infrastructure setup

As with most cool kids nowadays, a great deal of our infrastructure lives on Kubernetes, the vast majority of it managed through Helm and Terraform. Jitsi provides Docker images of all the components, with guides to deploy via docker-compose, via Helm and on VMs. We initially decided to build the entire system on k8s; however, once we started considering the potentially heavy network requirements of the JVBs when used by hundreds of users, we took a hybrid approach and split the setup across AKS and Azure VM Scale Sets (VMSS). Just another Terraform module 😉.

Simplified hybrid architecture

We chose Standard_F8s_v2 instances for the VMSS, which support a maximum of 12,000 Mbps per NIC, and load testing with jitsi-meet-torture (over 30 video participants per call with 3 simultaneous conferences) never saturated the machines, with CPU usage peaking at 80% under load. We also applied configuration settings to limit frame rate and resolution, as the default is 720p.
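For illustration, the resolution cap lives in jitsi-meet's config.js; the exact keys vary between releases, so treat this as a sketch of the idea rather than a drop-in config:

```js
// jitsi-meet config.js overrides (key names vary by release; values are illustrative)
config.resolution = 360; // preferred capture height instead of the 720p default
config.constraints = {
  video: {
    height: { ideal: 360, max: 480, min: 180 },
  },
};
```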

Worth noting that each VMSS shard is hooked up to zone-specific public IP prefixes (global prefixes don’t yet work with VMSS), so that each JVB gets assigned a public IP on launch and can interact directly with users via UDP, or TCP as a fallback (in restricted environments). Initial testing of the TCP/443 fallback resulted in persistent failures when forcing the system to switch from UDP to TCP mid-conference. Jitsi provides configuration details in this guide to configure the desired behaviour; however, we banged our heads without much success until we switched our base image from OpenJDK 8 to 11 and upgraded to the latest jitsi-videobridge release.

Scaling approach

The Kubernetes side of things is fairly straightforward: we had a multi-availability-zone cluster to start with, deployed the Jitsi services per shard with a node affinity to the zone we were interested in, and defined an HPA for the components to handle scaling load. Internal services for Prosody and Jicofo were exposed via NodePorts (an internal load balancer would also do) to enable connections from the JVBs.
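An HPA for one of the per-shard deployments might look like the sketch below; the names and thresholds are illustrative, not our actual manifests:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jitsi-web-shard0
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jitsi-web-shard0
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```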

The VMSS setup is where the tricky bit resides. Base Ubuntu images were built with a specific version of the jitsi-videobridge component and a cloud-init-triggered set of scripts that perform the following on boot:

  • Extract the machine’s own public IP to register it with Prosody.
  • Configure the XMPP domain, port, secret and certificate.
  • Enable the Colibri API to allow for log collection and monitoring.
  • Start the JVB service and connect to the private service in AKS in order to register itself with Prosody.
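The boot-time registration steps can be sketched in Python. The IMDS endpoint is Azure's real Instance Metadata Service, but the JVB property keys written out below are illustrative — the exact names depend on the jitsi-videobridge version baked into the image:

```python
import urllib.request

# Azure Instance Metadata Service: returns this VM's public IP as plain text
IMDS_URL = (
    "http://169.254.169.254/metadata/instance/network/interface/0"
    "/ipv4/ipAddress/0/publicIpAddress?api-version=2021-02-01&format=text"
)

def fetch_public_ip() -> str:
    """Ask IMDS for the public IP assigned from the zonal prefix at launch."""
    req = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode().strip()

def jvb_properties(public_ip: str, xmpp_host: str, secret: str) -> str:
    """Render the properties the JVB needs to advertise its public address
    and register with Prosody (key names are illustrative)."""
    return "\n".join([
        f"org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS={public_ip}",
        f"org.jitsi.videobridge.xmpp.user.shard.HOSTNAME={xmpp_host}",
        f"org.jitsi.videobridge.xmpp.user.shard.PASSWORD={secret}",
    ])
```

On boot, cloud-init would write the rendered properties to the JVB config directory and start the service, at which point the bridge connects to the NodePort-exposed Prosody in AKS.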

Scale-out was set up through an Azure Monitor rule based on network load: when the network load on a machine exceeded 70% of the available bandwidth (for the specific NIC), a new VM would be added to the pool. The minimum machine count was set to 1.

As a fallback, a CPU-based rule was also set.

Scale-in is where things get interesting. To provide a smooth service, we wanted to avoid shutting down instances that had active conferences, and also to avoid a possible split-brain problem from running fewer than 3 instances at any given time, had the scaling logic been handled from within the machines. Since the JVBs expose conference stats through the Colibri API, we exposed the internal API service to an Azure Function App hosted in the VNET, coupled with a simple state management solution (to keep tabs on machine age and perform usage analytics).

Scale-in / out solution.

On a cron schedule, the serverless function checks for idle JVBs, then taints and destroys a machine if the following conditions are all met per machine/shard:

healthy_jvb_count > 1
AND jvb.active_conferences == 0
AND jvb.uptime_minutes > 55
AND oldest(jvbs) == jvb
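Those conditions translate directly into code. Below is a minimal sketch of the decision; the type and field names are our illustration, not the actual Function App:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Jvb:
    """Per-bridge stats as gathered from the Colibri API (field names illustrative)."""
    instance_id: str
    healthy: bool
    active_conferences: int
    uptime_minutes: int

def pick_scale_in_candidate(jvbs: list) -> Optional[Jvb]:
    """Return the single JVB that is safe to taint and destroy, or None.

    Mirrors the conditions above: never drop below one healthy bridge,
    only touch idle bridges past the 55-minute mark, and always recycle
    the oldest instance first.
    """
    healthy = [j for j in jvbs if j.healthy]
    if len(healthy) <= 1:
        return None
    idle = [j for j in healthy
            if j.active_conferences == 0 and j.uptime_minutes > 55]
    if not idle:
        return None
    oldest = max(healthy, key=lambda j: j.uptime_minutes)
    # Only destroy when the oldest bridge is itself idle; otherwise wait a cycle
    return oldest if oldest in idle else None
```

Because only the oldest instance is ever a candidate, at most one machine is removed per cron tick, which keeps scale-in gradual even when several bridges go idle at once.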

The rule enforcing destruction based on machine age was coupled with a nightly Azure Monitor rule that, at 4am, would increase the minimum count by 1 and reset it to 1 a few minutes later, force-recycling the machines from oldest to newest and ensuring they would always be running on a freshly patched base image if a new one had been made available.


Jitsi has proven to be a well-integrated suite of projects. Its extensibility and configurability enable integration with most enterprise systems and building a fully custom experience with your own frontend via its iframe client embedding. The amazing community support around the open source project, its docs and its forum have enabled us to build a great platform that meets all of our strict needs.



Written by Adrian Fernandez, Senior Consultant, Engineering.

Contact us if you’d like to discuss this article further with Adrian.