I have been working on a Confluence space archiving project recently. Here is my design, which is already in production and works like a charm 🙂
I will go a bit deeper into the two major automations:
Confluence Online Archiving Automation
Online archiving does three things:
- Set the space status to Archived. This excludes the space from search by default.
- Remove all user access from the space. This prepares the space for offline archiving.
- Add the ‘online-archived’ category to the space. This is how the offline archiving automation finds its candidate spaces.
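To make the three steps concrete, here is a minimal sketch in Python. It does not call the real Confluence API: `Space` and `archive_online` are hypothetical names that operate on an in-memory stand-in, so treat it as an illustration of the order of operations rather than the production code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ONLINE_ARCHIVED = "online-archived"  # category name used in this post


@dataclass
class Space:
    """Hypothetical in-memory stand-in for a Confluence space."""
    key: str
    status: str = "current"
    permissions: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)


def archive_online(space: Space) -> Space:
    """Apply the three online archiving steps to a space."""
    # 1. Mark the space as archived so it is excluded from search by default.
    space.status = "archived"
    # 2. Remove all user access from the space.
    space.permissions.clear()
    # 3. Tag the space so the offline automation can find it later.
    space.categories.add(ONLINE_ARCHIVED)
    return space
```

In a real automation each step would be a call to Confluence instead of a field update, but the sequence stays the same.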
Confluence Offline Archiving Automation
The offline archiving automation checks the spaces that are categorised as ‘online-archived’. If a space was last modified more than 6 months ago, it kicks off the offline archiving: the space is exported as an XML zip file and uploaded to an S3 bucket. The S3 bucket has a lifecycle policy that moves files older than 6 months into Glacier.
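The candidate check and the lifecycle rule could look roughly like the sketch below. The function name, the rule ID, and the approximation of "6 months" as 182 and 180 days are my assumptions for illustration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` expects for its `LifecycleConfiguration` argument, but it is not the exact policy from production.

```python
from datetime import datetime, timedelta, timezone

ONLINE_ARCHIVED = "online-archived"
SIX_MONTHS = timedelta(days=182)  # assumption: "6 months" approximated in days


def is_offline_candidate(categories, last_modified, now):
    """Return True if a space is tagged for archiving and has been idle long enough."""
    return ONLINE_ARCHIVED in categories and (now - last_modified) > SIX_MONTHS


# Assumed lifecycle rule: move every object to Glacier about 6 months after creation.
GLACIER_LIFECYCLE = {
    "Rules": [
        {
            "ID": "move-space-exports-to-glacier",  # hypothetical rule name
            "Filter": {"Prefix": ""},  # empty prefix applies to all objects
            "Status": "Enabled",
            "Transitions": [{"Days": 180, "StorageClass": "GLACIER"}],
        }
    ]
}

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    old = now - timedelta(days=200)
    print(is_offline_candidate({ONLINE_ARCHIVED}, old, now))
```

Passing `now` in explicitly keeps the check easy to test against fixed dates.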
I have open-sourced the space export tool. Check it out if you are interested: https://www.npmjs.com/package/confluence-space-exporter