
EnSight in parallel

EnSight can be run in parallel in two ways: with multiple threads and with multiple parallel cores (Servers).

Threads: nothing is required on the user's side. EnSight automatically detects the number of threads available at start-up and uses up to 8 threads (4 for version 10.0 or older) with a standard license, or an unlimited number of threads with a gold (or higher) license.

Parallel cores: EnSight can read data and execute calculations on it in parallel through the SOS (Server of Servers) capability. The first step is to assign a section of the data to each core (or Server). There are four possible ways to do this:

1. Manual decomposition. You move the separate spatially decomposed data onto separate platforms and use a .sos file to point to them. This is very rare and requires that the user already has the data spatially decomposed (see the hedged sketch after this list for what such a .sos file might look like).

2. Reader decomposition. Some readers (e.g., CTH, Exodus, Plot3D) automatically decompose the data.

3. Server (or auto) decomposition. This happens at read-time on the EnSight server.

4. External decomposition (or partitioned data). This is similar to the first option, but is done by running the partition101 routine (found in the EnSight installation directory) on Case Gold files.
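
To make option 1 more concrete, below is a minimal Python sketch that writes a .sos file pointing two Servers at pre-decomposed case files. The hostnames, paths, and casefile names are placeholders, and the keywords in the template follow the general SOS casefile layout; verify them against the EnSight documentation for your version before relying on them.

    # Hypothetical sketch: generate a minimal .sos file for manually decomposed data.
    # Hostnames, paths, and casefile names are placeholders; the keywords in the
    # template follow the general SOS casefile layout and should be checked against
    # the EnSight documentation for your version.

    servers = [
        {"machine": "node1", "path": "/scratch/model/part1", "case": "model_part1.case"},
        {"machine": "node2", "path": "/scratch/model/part2", "case": "model_part2.case"},
    ]

    lines = [
        "FORMAT",
        "type: master_server gold",
        "",
        "SERVERS",
        f"number of servers: {len(servers)}",
        "",
    ]
    for i, s in enumerate(servers, start=1):
        lines += [
            f"#Server {i}",
            f"machine id: {s['machine']}",
            "executable: ensight_server",
            f"data_path: {s['path']}",
            f"casefile: {s['case']}",
            "",
        ]

    with open("model.sos", "w") as f:
        f.write("\n".join(lines))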

In this article we will discuss the two most common options, options 3 and 4.

The first difference between the two options is how they decompose the dataset. External decomposition is run outside of EnSight, prior to your EnSight session, and uses partition101 to divide the single case file into multiple case files; EnSight is then run and the case files are loaded into separate Servers. Server decomposition occurs at run-time and divides the elements of a single case among the Servers when the data is loaded. Server decomposition uses a connectivity decomposition via the connectivity matrix, meaning that it assigns an element to a Server based on its position in the element connectivity list. External decomposition, on the other hand, uses a geometry (or spatial) decomposition, meaning that it slices the geometry into chunks along the x, y, and z axes.

Example: let's say you have a dataset with 3000 elements, from element ID 1 to element ID 3000, and that the elements are inside a cubic mesh with x_min = 0.0 and x_max = 3.0. If you apply server decomposition using 3 Servers, elements with IDs between 1 and 1000 will be assigned to Server1, elements with IDs between 1001 and 2000 will be assigned to Server2, and elements with IDs between 2001 and 3000 will be assigned to Server3. If, on the other hand, you apply external decomposition along the x axis, Server1 will contain all elements where Coordinates[X] is between 0.0 and 1.0, Server2 will contain all elements with Coordinates[X] between 1.0 and 2.0, and Server3 will contain all elements with Coordinates[X] between 2.0 and 3.0.
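
The difference can also be summarized in a short illustrative sketch (plain Python, not EnSight code): the element-to-Server mapping for the example above under each strategy.

    # Illustrative sketch only (not EnSight code): how the 3000-element example
    # above might be split among 3 Servers under each strategy.

    num_elements = 3000
    num_servers = 3

    # Server (connectivity) decomposition: assign by position in the element list.
    def server_by_id(elem_id):
        # IDs 1-1000 -> Server 1, 1001-2000 -> Server 2, 2001-3000 -> Server 3
        return (elem_id - 1) * num_servers // num_elements + 1

    # External (spatial) decomposition: assign by the element's x coordinate,
    # with the mesh spanning x = 0.0 to x = 3.0.
    def server_by_x(x, x_min=0.0, x_max=3.0):
        width = (x_max - x_min) / num_servers
        return min(int((x - x_min) // width) + 1, num_servers)

    print(server_by_id(1500))  # -> 2 (assigned by ID)
    print(server_by_x(2.7))    # -> 3 (assigned by x position)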

Depending on how your dataset is designed and what kind of analysis you want to apply to it, one method or the other may be more useful.

The second major difference between the options is that External decomposition pays the cost of decomposition up-front: you need to physically break the dataset into smaller datasets before loading them into EnSight. Server decomposition, on the other hand, does not require any extra work prior to opening EnSight, but the decomposition occurs each time you reload your data, for all data types. Also, if your data has changing connectivity, the decomposition occurs at every change of timestep. This implies that, if you have a dataset that you need to analyze in EnSight multiple times, you probably want to consider option 4, as it will save you the computational time of calculating the decomposition each time you open the dataset. On the other hand, if you need to analyze a dataset only once, you probably want to consider option 3, as it will have similar time performance without requiring any extra step on the user's side.

A lot of performance depends on how many cells each server has. Too few and you spend too much time communicating; too many and the server is overloaded. As you add more servers, you get diminishing performance gains. How many servers do you need to get optimal performance? Unfortunately there isn't a unique answer, as it must be determined empirically and depends on your cell type. SOS can actually slow your performance down if you assign too few cells to each server.
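
As a rough intuition for the diminishing returns (a toy Amdahl-style model with an assumed parallel fraction, not an EnSight measurement): if only part of the per-timestep work parallelizes across servers while communication and client-side work does not, each additional server buys less than the previous one.

    # Toy Amdahl-style model (assumed 90% parallel fraction, not an EnSight
    # measurement): speedup flattens out as servers are added.

    def speedup(n_servers, parallel_fraction=0.9):
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_servers)

    for n in (1, 2, 4, 8, 16):
        print(n, round(speedup(n), 2))
    # 1 1.0, 2 1.82, 4 3.08, 8 4.71, 16 6.4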

So how should you proceed? One way is to choose your number of servers based on memory: plan to load enough data to fill each server's memory about half full. If you are not the only one using the servers, take the existing load into account when adding up the number of servers that you need.
First, assess your data: how many cells does it contain? For Hex8 cells, use 200 MB per million cells; use somewhat less for simpler cells and more for complicated cells. For polyhedral cells, use 1.2 GB per million cells.
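
As a back-of-the-envelope sizing sketch using the per-cell rates quoted above and the half-full rule (the 24 GB server RAM figure below is just an example value):

    # Back-of-the-envelope sizing based on the rules of thumb above. The per-cell
    # rates are the ones quoted in this article; "half full" means planning to use
    # only about 50% of each server's RAM. The 24 GB of server RAM is an example.
    import math

    MB_PER_MILLION_CELLS = {"hex8": 200.0, "polyhedral": 1200.0}

    def estimate_servers(n_cells, cell_type, server_ram_gb, fill_fraction=0.5):
        data_gb = (n_cells / 1e6) * MB_PER_MILLION_CELLS[cell_type] / 1000.0
        usable_gb = server_ram_gb * fill_fraction
        return data_gb, max(1, math.ceil(data_gb / usable_gb))

    # 64 million Hex8 cells on servers with 24 GB of RAM each:
    data_gb, n = estimate_servers(64e6, "hex8", server_ram_gb=24)
    print(f"~{data_gb:.1f} GB of data -> {n} server(s)")  # ~12.8 GB -> 2 server(s)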
 
If you have a large-memory machine, try loading the data on your machine just running EnSight (which will run both the client and one server on your local machine). If you can load the dataset onto your machine with sufficient memory to spare, and the time required to change timesteps with several variables active is reasonable, then you do not need SOS. If it is slow to change time, then try running client-server, with your server running on the machine where the data is located. If that doesn't speed up time changes sufficiently, then you can try SOS.
 
If your machine is sluggish when rotating the model, try to save some client memory by loading the larger parts as non-visual if they represent fluid parts. This will keep the polygons off your client, save client memory, and perhaps speed up your client. If your graphics are still too slow, consider upgrading your hardware graphics card.
 
On the other hand, if changing timesteps is taking too long, add two servers (on two different machines) and see whether, and by how much, that speeds up changing timesteps. This will give you an idea of the performance gain you can expect from adding four servers.
 
Example: I have a dataset with 64 million hex cells. I calculate this to require 12.8 GB of computer memory. My local machine has 70 GB. It takes 10 min just to load the data. I close EnSight. I try to move the 12 GB file to my local disk drive; it takes 30 min just to move the data file, which tells me that I have a very slow network. So, I start a server on the Linux machine which has 24 GB and is co-located with the data, and then start a client on my local machine. The data loads in a minute and it takes about 1 minute to change each timestep. I'm happy and I stop here.
 
However, if I have only 12 GB on each Linux machine, the data will not fit on one Linux machine and it's too slow over my network. So, I restart EnSight and load the data with 2 servers. It takes 10 min to load the data (because the decomposition is being done on each server). Then, it takes about 40 seconds to change each timestep, because the geometry is static and doesn't need to be decomposed at each timestep. I'm happy and I stop here.

Finally, you should ideally run your Servers where the data is located (to avoid delays due to network I/O), and the Client on your desktop with a high-end CAE/CAD hardware graphics card.
