Writing documents in batches
The BatchWriter library was created primarily for applications using marklogic-spring-batch, but it can be used in any environment. It provides the following features:
- Uses Spring's TaskExecutor abstraction to parallelize writes
- Supports writes via the REST API, XCC, or DMSDK
This capability is similar to what MLCP and DMSDK use under the hood, but it is intended to be usable in any context, with either the REST API or XCC as the API for connecting to MarkLogic.
Via Spring's TaskExecutor abstraction, you can throw an effectively unbounded number of documents at this interface. The library defaults to a sensibly configured ThreadPoolTaskExecutor, but you can override that with any TaskExecutor implementation you like.
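To picture the asynchronous model, here is a minimal sketch using only the JDK's ExecutorService as a conceptual stand-in for the TaskExecutor that the library manages for you. None of these class or method names come from the library itself; they only illustrate the submit-then-await pattern:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncWriteSketch {

    // Submits each batch asynchronously and waits for all of them to finish,
    // mirroring the write(...) / waitForCompletion() lifecycle
    static int runBatches(List<String> batches) {
        // A fixed thread pool plays the role of the default ThreadPoolTaskExecutor
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger written = new AtomicInteger();

        for (String batch : batches) {
            executor.execute(() -> {
                // A real implementation would write the batch to MarkLogic here
                written.incrementAndGet();
            });
        }

        // Analogous to waitForCompletion(): stop accepting work, wait for the pool to drain
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written.get();
    }

    public static void main(String[] args) {
        System.out.println(runBatches(List.of("batch1", "batch2", "batch3"))); // 3
    }
}
```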
Note that you still need to do the batching yourself (which Spring Batch handles nicely): if you pass in a list of 1 million documents, this library will try to write them all in one request. The intent of this library is to write each batch asynchronously, using a round-robin approach to choose the host the documents are written to. Breaking a very large list of documents into smaller lists is outside its scope (again, because it was first written in the context of Spring Batch, which handles batching).
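If you are not using Spring Batch, splitting a large list into batches is straightforward with List.subList. The helper below is purely illustrative and not part of the library:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    // Split a list into consecutive sublists of at most batchSize elements
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = List.of(1, 2, 3, 4, 5, 6, 7);
        // Each sublist would then be passed to a separate write(...) call
        System.out.println(partition(docs, 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```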
Here's a sample using two DatabaseClient instances:
```java
// This is all basic Java Client API stuff
DatabaseClient client1 = DatabaseClientFactory.newClient("host1", ...);
DatabaseClient client2 = DatabaseClientFactory.newClient("host2", ...);
DocumentWriteOperation doc1 = new DocumentWriteOperationImpl("test1.xml", ...);
DocumentWriteOperation doc2 = new DocumentWriteOperationImpl("test2.xml", ...);

// Here's how BatchWriter works
BatchWriter writer = new RestBatchWriter(Arrays.asList(client1, client2));
writer.initialize();
writer.write(Arrays.asList(doc1, doc2));
writer.waitForCompletion();
```
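The round-robin host selection mentioned earlier can be modeled as a rotating index over the list of clients. In this sketch, strings stand in for DatabaseClient instances; it illustrates the idea rather than reproducing the library's actual code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {

    private final List<String> clients; // stand-ins for DatabaseClient instances
    private final AtomicInteger index = new AtomicInteger();

    RoundRobinSketch(List<String> clients) {
        this.clients = clients;
    }

    // Each batch gets the next client in rotation, spreading load across hosts
    String nextClient() {
        return clients.get(Math.floorMod(index.getAndIncrement(), clients.size()));
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch(List.of("host1", "host2"));
        System.out.println(rr.nextClient()); // host1
        System.out.println(rr.nextClient()); // host2
        System.out.println(rr.nextClient()); // host1
    }
}
```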