Adding an edge between two existing vertices decreases throughput by ~80%.
We are inserting vertices into a graph from Java.
We chose the TinkerPop Gremlin API so that our code stays graph-database agnostic.
Our code looks approximately like this:
    Graph graph = new OrientGraphFactory(databaseURL, user, password).getTx();

    // First transaction: create the new Event vertex.
    graph.tx().open();
    GraphTraversal<Vertex, Vertex> vertexTraversal = graph.traversal()
        .addV("Event");
    Vertex newVertex = vertexTraversal.next();
    graph.tx().commit();

    // Second transaction: look up (or create) each related vertex and link it to the new one.
    graph.tx().open();
    event.getRelations()
        .forEach(rel -> {
            Vertex relatedVertex = EntityVertex.getOrAddVertex(graph, rel).getVertex();
            newVertex.addEdge(
                rel.getName(),
                relatedVertex,
                "created_at", LocalDateTime.now().toString());
        });
    graph.tx().commit();
We have indexes on all fields used to query the graph.
If we disable the call to .addEdge(), the throughput is fairly high (about 1200 events/sec). Note that in this case we still loop through each relation and get or create its vertex.
If we keep the call, the throughput drops to just over 200 events/sec.
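To put those two measurements in per-event terms, here is a small sketch of the arithmetic. It assumes events are processed one at a time, so that throughput translates directly into time per event; the class and method names are made up for illustration and are not part of our code base.

```java
public class EdgeOverhead {
    // Average wall-clock time per event, in milliseconds, at a given throughput.
    static double millisPerEvent(double eventsPerSecond) {
        return 1000.0 / eventsPerSecond;
    }

    // Extra time per event that shows up once the slower variant is enabled.
    static double addedMillisPerEvent(double fastPerSec, double slowPerSec) {
        return millisPerEvent(slowPerSec) - millisPerEvent(fastPerSec);
    }

    // Relative throughput drop, as a percentage of the faster figure.
    static double dropPercent(double fastPerSec, double slowPerSec) {
        return 100.0 * (fastPerSec - slowPerSec) / fastPerSec;
    }

    public static void main(String[] args) {
        double withoutEdges = 1200.0; // events/sec with addEdge() disabled
        double withEdges = 200.0;     // events/sec with addEdge() enabled ("just over 200", rounded)

        System.out.printf("throughput drop: %.1f%%%n", dropPercent(withoutEdges, withEdges));
        System.out.printf("time per event without edges: %.2f ms%n", millisPerEvent(withoutEdges));
        System.out.printf("time per event with edges: %.2f ms%n", millisPerEvent(withEdges));
        System.out.printf("added by the addEdge() calls: %.2f ms%n",
                addedMillisPerEvent(withoutEdges, withEdges));
    }
}
```

Under that assumption, the addEdge() calls add roughly 4 ms to each event, on top of a little under 1 ms for everything else. That figure covers all edges of one event combined, since the number of relations per event varies.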
Is there a faster way to insert edges into the graph than the code above?
We are running a remote OrientDB server v3.0.27 and using tinkergraph-gremlin v3.4.4.
The server is configured as follows:
    orientdb:
      image: orientdb:3.0.27
      command: >
        /orientdb/bin/server.sh -Xmx800m -Xms800m -Dstorage.diskCache.bufferSize=7200 -Dstorage.useWAL=false
        -Dobject.saveOnlyDirty=true -Dtx.useLog=false -Drecord.downsizing.enabled=false -Dstorage.wal.syncOnPageFlush=false
        -Ddb.pool.min=10 -Ddb.pool.max=6000 -DridBag.embeddedToSbtreeBonsaiThreshold=-1 -Dclient.channel.minPool=15 -Dclient.channel.maxPool=6000
        -Dstorage.openFiles.limit=4096