Parallel Processing

The batch-queue processor processes images in parallel for high throughput. An image together with its processing workflow (a set of scripts) defines a batch. Batches are added to the processor queue in the order in which they are issued and are processed on a ‘first come, first served’ basis, i.e., the oldest batch is processed first. Alternatively, a batch can be added to the top of the queue if the user wants to see the result for the most recently recorded image as soon as possible.
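
The queueing policy described above can be sketched in Python as a double-ended queue: normal submissions are appended at the tail, and a priority submission is pushed onto the head. This is a minimal illustration, not the software's implementation; the names `Batch`, `BatchQueue`, and the script names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Hypothetical batch: one image plus the workflow (ordered script names) to run on it."""
    image: str
    scripts: list[str] = field(default_factory=list)


class BatchQueue:
    """First-come, first-served queue that also allows jumping to the top."""

    def __init__(self) -> None:
        self._queue: deque[Batch] = deque()

    def add(self, batch: Batch) -> None:
        # Normal submission: append at the tail, so the oldest batch stays in front.
        self._queue.append(batch)

    def add_to_top(self, batch: Batch) -> None:
        # Priority submission: this batch becomes the next one to be processed.
        self._queue.appendleft(batch)

    def next_batch(self) -> Batch | None:
        # Remove and return the batch at the head, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


if __name__ == "__main__":
    q = BatchQueue()
    q.add(Batch("image_001", ["unpack", "drift_correct"]))
    q.add(Batch("image_002", ["unpack", "drift_correct"]))
    q.add_to_top(Batch("image_003", ["unpack", "drift_correct"]))
    print([q.next_batch().image for _ in range(len(q))])
    # → ['image_003', 'image_001', 'image_002']
```

A deque is used because both appending at the tail and pushing onto the head are constant-time operations.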

Processing batches can be submitted by the Import Tool, so that a set of tasks such as unpacking, Fourier cropping, drift correction, CTF fitting, and particle picking is carried out on each freshly imported image. Processing batches can also be submitted manually from the Project Library by selecting a number of images and launching specific tasks for all of them. This is needed, for example, when images that have already been imported and processed must be processed again with different parameter settings. If necessary, the order in which scheduled batches are processed can be changed manually.
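
Manual submission from a library selection, and manual reordering of pending batches, can be illustrated with a plain Python list. The helpers `submit_selection` and `move_batch`, and the assumption that a pending batch is identified by its image name, are hypothetical; this is only a sketch of the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Batch:
    """Hypothetical batch: one image plus the scripts to run on it."""
    image: str
    scripts: list[str] = field(default_factory=list)


def submit_selection(queue: list[Batch], images: list[str], scripts: list[str],
                     to_top: bool = False) -> None:
    """Create one batch per selected image, all running the same scripts."""
    new = [Batch(img, list(scripts)) for img in images]
    if to_top:
        queue[0:0] = new        # insert before all pending batches, keeping selection order
    else:
        queue.extend(new)       # append in selection order (first come, first served)


def move_batch(queue: list[Batch], image: str, new_index: int) -> None:
    """Manually reorder the queue by moving the pending batch for `image`."""
    for i, batch in enumerate(queue):
        if batch.image == image:
            queue.insert(new_index, queue.pop(i))
            return
    raise ValueError(f"no pending batch for {image!r}")


if __name__ == "__main__":
    pending: list[Batch] = []
    submit_selection(pending, ["img_01", "img_02", "img_03"], ["ctf_fit"])
    move_batch(pending, "img_03", 0)
    print([b.image for b in pending])  # → ['img_03', 'img_01', 'img_02']
```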

The scripts that define a workflow can be selected from all scripts available at the image level. These scripts are divided into categories such as “Prepare Stack”, “Drift Correct”, and “Calculate Statistics”. The selected scripts are always executed in a fixed order that cannot be changed manually; this order follows the processing logic that a workflow must observe. For example, the CTF is estimated on the drift-corrected image stack, so the CTF script cannot run before drift correction. All calculated values are stored in the image-level parameter file.
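
The fixed execution order can be modelled as sorting the selected scripts by a predefined category rank, so that the order in which the user selects them has no effect. In this sketch the list `CATEGORY_ORDER` is an assumption: only “Prepare Stack”, “Drift Correct”, and “Calculate Statistics” come from the text, while “Fit CTF” and “Pick Particles” and their positions are illustrative.

```python
from __future__ import annotations

# Hypothetical canonical order of script categories; the real list and its
# ordering are defined by the software, not by this sketch.
CATEGORY_ORDER = [
    "Prepare Stack",
    "Drift Correct",
    "Calculate Statistics",
    "Fit CTF",
    "Pick Particles",
]
_RANK = {name: i for i, name in enumerate(CATEGORY_ORDER)}


def execution_order(selected: list[str]) -> list[str]:
    """Return the selected categories sorted into the fixed processing order.

    The order in which the user ticks the scripts is ignored, so CTF fitting
    always runs after drift correction.
    """
    unknown = [s for s in selected if s not in _RANK]
    if unknown:
        raise ValueError(f"unknown script categories: {unknown}")
    return sorted(selected, key=_RANK.__getitem__)


if __name__ == "__main__":
    print(execution_order(["Fit CTF", "Prepare Stack", "Drift Correct"]))
    # → ['Prepare Stack', 'Drift Correct', 'Fit CTF']
```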

Batches can be dispatched concurrently to the computing resources available on the system. When this setting is enabled, several processors are created internally and each processor is supplied with a different batch. When a processor starts running the scripts of its batch on a particular image, it is marked as ‘busy’ and becomes available again only after all the scripts in the batch have been run. A log table, sortable by processor ID, image ID, or log time, is maintained so that processing progress can be monitored and errors detected.
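
Concurrent dispatch can be sketched with Python threads: each thread plays the role of one processor, takes a whole batch from a shared queue, stays busy until every script in that batch has run, and writes entries to a shared log that can then be sorted by processor ID, image ID, or time. This is an illustration under assumptions, not the software's actual scheduler; script execution is replaced by a caller-supplied `run_script` function, and all names (`run_batches`, `sort_log`, `LogEntry`) are hypothetical.

```python
from __future__ import annotations

import queue
import threading
import time
from collections.abc import Callable
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class LogEntry:
    """One row of the (hypothetical) log table."""
    processor_id: int
    image_id: str
    log_time: float
    message: str


def _noop_script(image_id: str, script: str) -> None:
    """Placeholder: a real system would run `script` on `image_id` here."""


def run_batches(
    batches: list[tuple[str, list[str]]],
    n_processors: int,
    run_script: Callable[[str, str], None] = _noop_script,
) -> list[LogEntry]:
    """Dispatch (image_id, scripts) batches to n_processors worker threads."""
    pending: queue.Queue = queue.Queue()
    for batch in batches:               # FIFO: batches are taken in submission order
        pending.put(batch)

    log: list[LogEntry] = []
    log_lock = threading.Lock()
    busy = [False] * n_processors       # busy[pid] is True while processor pid holds a batch

    def write(pid: int, image_id: str, message: str) -> None:
        with log_lock:
            log.append(LogEntry(pid, image_id, time.time(), message))

    def processor(pid: int) -> None:
        while True:
            try:
                image_id, scripts = pending.get_nowait()
            except queue.Empty:
                return                  # queue drained: this processor stops
            busy[pid] = True
            for script in scripts:
                try:
                    run_script(image_id, script)
                    write(pid, image_id, f"finished {script}")
                except Exception as exc:  # log the error and continue with the next script
                    write(pid, image_id, f"ERROR in {script}: {exc}")
            busy[pid] = False           # available again only after the whole batch

    workers = [threading.Thread(target=processor, args=(pid,)) for pid in range(n_processors)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log


def sort_log(log: list[LogEntry], by: str) -> list[LogEntry]:
    """Sort the log table by 'processor_id', 'image_id' or 'log_time' (ties broken by time)."""
    return sorted(log, key=attrgetter(by, "log_time"))


if __name__ == "__main__":
    jobs = [(f"image_{i:03d}", ["unpack", "drift_correct", "ctf_fit"]) for i in range(4)]
    for entry in sort_log(run_batches(jobs, n_processors=2), "image_id"):
        print(entry.processor_id, entry.image_id, entry.message)
```

Threads are used here only to keep the sketch short. Because CPU-bound pure-Python code is limited by the global interpreter lock, a real dispatcher would more likely launch each script as a separate process.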

A typical processor is shown below: