When building Confluence Data Center on AWS, I wondered how Confluence Data Center manages its index files. Since we run the Confluence cluster in an auto-scaling group, Confluence nodes come and go (though not that frequently, as Confluence does not handle dynamic scaling well; scaling is mostly schedule based). A newly launched instance gets a fresh local home folder on EBS storage. My concern was that with a very large dataset, Confluence would take a long time to rebuild the index when it is the only node in the cluster, which would delay Confluence startup.
I figured it out after talking to a technical person from Atlassian. This is what he explained:
Hazelcast plays a role in this scaling up and down, alongside the database and the shared home folder location. Index snapshots are taken on a schedule and written to the shared home (EFS in our case). So when a new node is added to the cluster, a few things happen:
– Confluence checks whether the index is current
– The index snapshot is tried first (the default timeout is 120 seconds; if the snapshot file is big, consider increasing the timeout via the JVM property confluence.cluster.index.recovery.generation.timeout)
– Hazelcast is tried second (the index is copied from an online/existing node)
– If both options fail, a full re-index occurs
– If an existing node fails and recovers, the journal is used to bring the index up to date
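The recovery order above can be sketched as a simple fallback chain. This is only an illustration of the sequence the Atlassian engineer described; the function and parameter names are hypothetical, not Confluence's actual internals:

```python
# Hypothetical sketch of the node-join index recovery order described above.
# The real logic lives inside Confluence's index recovery service.

SNAPSHOT_TIMEOUT_SECONDS = 120  # default for confluence.cluster.index.recovery.generation.timeout


def recover_index(snapshot_restored: bool, peer_copy_succeeded: bool) -> str:
    """Return which recovery path a newly joined node ends up taking."""
    if snapshot_restored:
        # 1. Restore the latest index snapshot from the shared home (EFS).
        return "snapshot"
    if peer_copy_succeeded:
        # 2. Copy the index from an existing node via Hazelcast.
        return "hazelcast-copy"
    # 3. Last resort: a full re-index from the database.
    return "full-reindex"
```

If snapshot restores keep timing out on a large index, the timeout can be raised by passing -Dconfluence.cluster.index.recovery.generation.timeout=&lt;seconds&gt; as a JVM argument at startup.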
I also asked whether Jira Data Center follows the same logic. Here is the answer:
In a sense, yes, but the clustering is quite different. Jira uses Ehcache and RMI, which behaves more like a distributed queue. There is some similar logic, but it works on an eventual-consistency model that catches up over time, borrowing from all nodes.
An important consideration is that this approach is very disk intensive, versus memory intensive for Confluence. Fast disks and a good amount of CPU suit Jira well. The more nodes added to a Jira cluster, the more intensive this type of clustering becomes. The sweet spot is usually around 2-3 nodes for the sizing you're looking for.