EnSight can be run in parallel in two ways: using multiple threads, and using multiple parallel cores (Servers).
Threads: nothing is required on the user's side. EnSight automatically detects the number of threads available at start-up and uses up to 8 threads (4 for version 10.0 or older) with a standard license, or an unlimited number of threads with a gold (or higher) license.
Parallel cores: EnSight can read data and execute calculations on it in parallel through the SOS (Server of Servers) capability. The first step is to assign a section of the data to each core (or Server). There are four possible ways to do this:
1. Manual decomposition. You move separate, spatially decomposed data onto separate platforms and use a .sos file to point to them. This is rarely used, and it requires that the data already be spatially decomposed.
2. Reader decomposition. Some readers (e.g., CTH, Exodus, Plot3D) decompose the data automatically.
3. Server (or auto) decomposition. This happens at read time on the EnSight server.
4. External decomposition (or partitioned data). This is similar to the first option, but the decomposition is done with the partition101 utility (found in the EnSight installation directory) on Case Gold files.
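For reference, the .sos file mentioned in option 1 is a small text file that lists the Servers and tells each one which casefile to load. The sketch below shows the general shape only; the hostnames and paths are made-up placeholders, and you should consult the EnSight documentation for the exact keyword set supported by your version:

```
FORMAT
type: master_server gold

SERVERS
number of servers: 2

#Server 1
machine id: node1
executable: ensight_server
data_path: /data/run1
casefile: part_0.case

#Server 2
machine id: node2
executable: ensight_server
data_path: /data/run1
casefile: part_1.case
```

Each #Server block points one Server at one piece of the spatially decomposed data.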
In this article we will discuss the two most common options, 3 and 4.
The first difference between the two options is how they decompose the dataset. External decomposition runs outside of EnSight, prior to your EnSight session: partition101 divides the single case file into multiple case files, and when EnSight is run those case files are loaded separately into the servers. Server decomposition occurs at run time, dividing the elements of a single case among the servers as the data is loaded. Server decomposition is a connectivity decomposition: it assigns an element to a server based on the element's position in the element connectivity list. External decomposition, on the other hand, is a geometry (or spatial) decomposition: it slices the geometry into chunks along the x, y, and z axes.
Example: let's say you have a dataset with 3000 elements, from element ID 1 to element ID 3000, inside a cubic mesh with x_min = 0.0 and x_max = 3.0. If you apply server decomposition using 3 Servers, elements with IDs 1 through 1000 will be assigned to Server1, elements with IDs 1001 through 2000 to Server2, and elements with IDs 2001 through 3000 to Server3. If, on the other hand, you apply external decomposition along the x axis, Server1 will contain all elements where Coordinates[X] is between 0.0 and 1.0, Server2 all elements with Coordinates[X] between 1.0 and 2.0, and Server3 all elements with Coordinates[X] between 2.0 and 3.0.
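The two assignment schemes in the example above can be sketched in plain Python. This is an illustration of the idea only, not EnSight code, and the coordinate assignment used for the demo is made up:

```python
# Illustration (not EnSight code) of the two decomposition styles:
# 3000 elements, x coordinates from 0.0 to 3.0, split across 3 servers.

def connectivity_decomposition(element_ids, n_servers):
    """Server (auto) decomposition: assign elements to servers in
    contiguous blocks of the connectivity list, ignoring geometry."""
    chunk = len(element_ids) // n_servers
    return {s: element_ids[s * chunk:(s + 1) * chunk]
            for s in range(n_servers)}

def spatial_decomposition(elements, n_servers, x_min, x_max):
    """External decomposition: slice the geometry into equal slabs
    along x and assign each element to the slab holding its x value."""
    width = (x_max - x_min) / n_servers
    servers = {s: [] for s in range(n_servers)}
    for eid, x in elements:
        s = min(int((x - x_min) / width), n_servers - 1)
        servers[s].append(eid)
    return servers

ids = list(range(1, 3001))
by_id = connectivity_decomposition(ids, 3)          # blocks of 1000 IDs

# Hypothetical coordinates just so every element has an x position.
elems = [(eid, (eid % 3000) / 1000.0) for eid in ids]
by_x = spatial_decomposition(elems, 3, 0.0, 3.0)    # slabs of width 1.0
```

With this toy input both schemes happen to give each server 1000 elements, but which elements land together differs: by ID order in the first case, by spatial proximity in the second.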
Depending on how your dataset is structured and what kind of analysis you want to apply to it, one method or the other may be more useful.
The second major difference between the options is that external decomposition pays the cost of decomposition up front: you must physically break the dataset into smaller datasets before loading them into EnSight. Server decomposition requires no extra work prior to opening EnSight, but it runs each time you reload your data, for all data types; and if your data has changing connectivity, the decomposition is repeated at every change of timestep. This implies that, if you have a dataset that you need to analyze in EnSight multiple times, you probably want to consider option 4, as it saves you the computational cost of recomputing the decomposition each time you open the dataset. On the other hand, if you need to analyze a dataset only once, you probably want to consider option 3, as it has similar overall time performance without requiring any extra step on the user's side.
Much of the performance depends on how many cells each server holds. Too few, and too much time is spent communicating between servers; too many, and each server is overloaded. As you add more servers, the performance gains diminish, and SOS can actually slow performance down if too few cells are assigned to each server. How many servers do you need for optimal performance? Unfortunately there is no single answer: it must be determined empirically and depends on your cell types.
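There is no universal formula, but one simple sanity check is to compute the cells per server for each candidate server count and discard configurations that fall below some empirical floor. A minimal sketch; the 100,000-cell floor here is purely an assumed placeholder that you would tune for your own data and hardware:

```python
def candidate_server_counts(total_cells, max_servers,
                            min_cells_per_server=100_000):
    """Keep only server counts that leave each server enough cells.
    min_cells_per_server is an assumed placeholder threshold, not an
    EnSight-documented value: tune it empirically for your data."""
    return [n for n in range(1, max_servers + 1)
            if total_cells // n >= min_cells_per_server]

# For a 1M-cell dataset, counts above 10 would drop each server
# below the (assumed) 100k-cell floor.
print(candidate_server_counts(1_000_000, 16))
```

The point is only that the candidate list shrinks as the floor rises: past some server count, splitting further costs more in communication than it gains in parallelism.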
Finally, the user should ideally run the Servers where the data is located (to avoid delays due to I/O over the network), and the Client on a desktop with a high-end CAE/CAD graphics card.