In short: has anyone successfully run the OrientDB community version in production at scale? If so, is there a guide that describes how to do it?
We have been running in distributed mode, and I’ve read through the official documentation, including this guide: https://orientdb.com/docs/2.2.x/Distributed-Configuration.html, but we keep running into various stability issues. Shutting down the entire cluster and starting one node at a time usually fixes them, but that still means the system is down for up to an hour. We’re running the Docker image 2.2.37 with 3 nodes, and the database is about 18 GB. We primarily use the REST API.
Here are some issues we’ve been unable to solve:
- While one node is starting up, a second node goes into backup mode, which leaves the third node without a quorum for writes. So if any single node has issues, the entire database is effectively down.
- If a node goes down once it’s been added to the cluster, it will usually not come back up without restarting the whole cluster.
- One node (that we don’t write to) keeps getting corrupted in a way that it cannot rejoin. We have to restore from another node or a backup.
- Crashes and memory errors sometimes corrupt the database and it needs to be exported and imported to recover (especially in our standalone testing databases).
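
To illustrate the first issue, here is a minimal sketch of the quorum arithmetic as I understand it. It assumes the write quorum is set to a majority of the configured nodes and that a node which is starting up or in backup mode does not count toward that quorum. The function names are made up for illustration and are not part of any OrientDB API.

```python
def majority_quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def writes_possible(total_nodes: int, available_nodes: int) -> bool:
    """True if enough nodes are available to satisfy a majority write quorum."""
    return available_nodes >= majority_quorum(total_nodes)


# Our 3-node cluster: one node starting up, one in backup mode,
# so only one node is actually available (assumption: the other
# two do not count toward the quorum while in those states).
total = 3
available = total - 2

print("quorum needed:", majority_quorum(total))
print("writes possible:", writes_possible(total, available))
```

With 3 nodes the majority is 2, so a single available node cannot accept writes. That matches what we see whenever one node restarts and pulls a second one into backup mode.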