I am trying to use apoc.periodic.iterate to reduce heap usage when running very large transactions against a Neo4j database. I've been following the advice given in this presentation, but my results differ from those shown in the slides.
First, some notes on my setup:
- Using Neo4j Desktop, graph version 4.0.3 Enterprise, with APOC 4.0.0.10
- I'm calling queries using the .NET Neo4j Driver, version 4.0.1.
- neo4j.conf values:
  - dbms.memory.heap.initial_size=2g
  - dbms.memory.heap.max_size=4g
  - dbms.memory.pagecache.size=2g
Here is the cypher query I'm running:
CALL apoc.periodic.iterate(
"UNWIND $nodes AS newNodeObj RETURN newNodeObj",
"CREATE(n:MyNode)
SET n = newNodeObj",
{batchSize:2000, iterateList:true, parallel:false, params: { nodes: $nodes_in } }
)
And the line of C#:
var createNodesResCursor = await session.RunAsync(createNodesQueryString, new { nodes_in = nodeData });
where createNodesQueryString is the query above, and nodeData is a List<Dictionary<string, object>> where each Dictionary has just three entries: 2 strings, 1 long.
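For context, here is a minimal sketch of how a call like this might be wired up with the 4.0 .NET driver. It is not the real code: the connection URI, the credentials, and the property names (name, category, value) are placeholders, and the inner Cypher statements use single quotes only so they fit inside a C# verbatim string.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Neo4j.Driver;

public static class CreateNodesSketch
{
    // Same query as above; inner statements are single-quoted so the
    // whole thing can live in one C# verbatim string.
    private const string CreateNodesQueryString = @"
CALL apoc.periodic.iterate(
  'UNWIND $nodes AS newNodeObj RETURN newNodeObj',
  'CREATE (n:MyNode) SET n = newNodeObj',
  {batchSize: 2000, iterateList: true, parallel: false, params: {nodes: $nodes_in}}
)";

    public static async Task Main()
    {
        // Placeholder data: three entries per dictionary (2 strings, 1 long).
        // The property names are assumptions made up for this sketch.
        var nodeData = new List<Dictionary<string, object>>();
        for (long i = 0; i < 1_300_000; i++)
        {
            nodeData.Add(new Dictionary<string, object>
            {
                ["name"] = "node-" + i,
                ["category"] = "example",
                ["value"] = i
            });
        }

        // Placeholder URI and credentials.
        var driver = GraphDatabase.Driver(
            "bolt://localhost:7687", AuthTokens.Basic("neo4j", "password"));
        var session = driver.AsyncSession();
        try
        {
            // The entire list is sent as a single parameter in one request.
            var createNodesResCursor = await session.RunAsync(
                CreateNodesQueryString, new { nodes_in = nodeData });

            // apoc.periodic.iterate returns one summary row.
            var summary = await createNodesResCursor.SingleAsync();
            Console.WriteLine(
                $"batches={summary["batches"].As<long>()}, total={summary["total"].As<long>()}");
        }
        finally
        {
            await session.CloseAsync();
            await driver.CloseAsync();
        }
    }
}
```

Note that in this shape all 1.3 million dictionaries travel to the server as one parameter value, regardless of the batchSize passed to apoc.periodic.iterate.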
When I run this to create 1.3 million nodes, I observe the heap usage (via JConsole) climbing all the way to the 4 GB available and then bouncing between roughly 2.5 GB and 4 GB. Reducing the batch size makes no discernible difference, and raising dbms.memory.heap.max_size just causes the heap usage to climb to almost the new limit. It's also really slow, taking 30+ minutes to create those 1.3 million nodes.
Does anyone have any idea what I may be doing wrong, or differently from the linked presentation? I understand that my query does a CREATE, whereas the presentation only updates an already loaded dataset, but I can't imagine that's the reason my heap usage is so high.
Thanks