Parallel Computations Efficiency: Abaqus, Ansys and Simmakers

Hardware based on parallel computing architectures has been steadily gaining popularity in high-performance computing.

The efficiency of parallel processing hardware in engineering problem solving such as the computer simulation of physical processes is not directly dependent on the number of processors: four CPU cores do not in fact provide a fourfold speed increase in solving complex engineering problems over one CPU core. Similarly, the transfer of computation to graphics cards with hundreds of cores cannot provide a hundredfold increase in speed.

First of all, parallel computation acceleration is limited by the computational algorithms themselves; running algorithms with a low degree of parallelization on supercomputers and high-performance workstations is wasteful. The notion of "efficiency of parallelization" is explained by Amdahl's law, according to which, if at least 1/10 of the program is executed sequentially, the acceleration cannot exceed 10 times the original speed regardless of the number of cores employed.
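The bound can be checked numerically. The sketch below uses the usual textbook form of Amdahl's law, S(N) = 1 / (s + (1 - s) / N), where s is the sequential share of the run time and N is the number of cores. The function name `amdahl_speedup` and the core counts in the loop are illustrative choices, not something taken from any of the packages discussed here.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup predicted by Amdahl's law.

    serial_fraction: share of the run time that cannot be parallelized (0..1).
    cores: number of processing units working on the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Serial share of 1/10, as in the text; the core counts are illustrative.
    for cores in (1, 4, 448, 2688, 1_000_000):
        print(f"{cores:>9} cores: {amdahl_speedup(0.1, cores):.2f}x")
```

With a sequential share of 1/10 the result stays below 10 however large the core count becomes, which is the limit stated above.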

Telling examples of the limited effectiveness of algorithm parallelization for solving engineering problems are provided in the relatively weak results of worldwide leaders in computer-aided engineering (CAE) software - Abaqus and Ansys.

In SIMULIA's Abaqus, moving a computation from 2 CPU cores to 4 CPU cores gave a speedup factor of 1.7. Porting the same algorithms to the CUDA architecture, with the 448 cores of an Nvidia Tesla C2075 working alongside 4 CPU cores, resulted in a speedup of only 3.5 times [Source].


*SIMULIA’s Abaqus performance acceleration when transferring from 2 to 4 CPU cores*

*SIMULIA’s Abaqus performance acceleration when using 4 CPU cores and 448 GPU cores*

Ansys achieved a parallelization efficiency comparable to that of Abaqus. When the number of CPU cores was increased from two to eight, the processing speed of the Ansys Mechanical 15.0 package tripled. Two CPU cores working together with the 2880 cores of an Nvidia Tesla K40 accelerator were 3.5 times faster than the 2 CPU cores alone [Source].

Ansys Mechanical 15.0 performance acceleration with parallel processing
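As a rough illustration, the CPU scaling figures quoted above can be turned into an implied sequential share by inverting Amdahl's law. This is a sketch under a strong assumption: that the law holds exactly, with no communication or memory-bandwidth overhead, which real solvers do not satisfy. The function names are hypothetical, and only the CPU-to-CPU figures (1.7x from 2 to 4 cores, 3x from 2 to 8 cores) are used; the GPU results are not modeled.

```python
def relative_speedup(serial_fraction: float, cores_before: int, cores_after: int) -> float:
    """Speedup predicted by Amdahl's law when going from cores_before to cores_after."""
    t_before = serial_fraction + (1.0 - serial_fraction) / cores_before
    t_after = serial_fraction + (1.0 - serial_fraction) / cores_after
    return t_before / t_after


def implied_serial_fraction(cores_before: int, cores_after: int, measured_speedup: float) -> float:
    """Sequential share s that reproduces a measured speedup between two core counts.

    Obtained by solving relative_speedup(s, n1, n2) = r for s.
    """
    if not 1 <= cores_before < cores_after:
        raise ValueError("need 1 <= cores_before < cores_after")
    if not 1.0 <= measured_speedup <= cores_after / cores_before:
        raise ValueError("speedup is outside the range Amdahl's law can represent")
    r = measured_speedup
    numerator = 1.0 / cores_before - r / cores_after
    denominator = r - 1.0 - r / cores_after + 1.0 / cores_before
    return numerator / denominator


if __name__ == "__main__":
    # (label, cores before, cores after, speedup quoted in the text)
    cases = [("Abaqus", 2, 4, 1.7), ("Ansys Mechanical 15.0", 2, 8, 3.0)]
    for label, n1, n2, r in cases:
        s = implied_serial_fraction(n1, n2, r)
        print(f"{label}: implied sequential share {s:.3f}, ceiling about {1.0 / s:.1f}x")
```

Under this idealized model both packages come out with a sequential share somewhere between 5% and 10%, which is consistent with the modest gains reported above but should be read as an order-of-magnitude estimate only.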


The mathematical solvers embedded in the «Frost 3D Universal» software parallelize their computational algorithms more effectively and make more efficient use of parallel architectures.

A computer model of production wells was used to compare the parallel computing speed on CPUs and GPUs.

Soil thermal field distribution over 5 years in the XZ plane

The hardware was selected from widely available user computing resources such as the Intel Core i7 CPU and the Nvidia Titan graphics card.

Specification	Intel Core i7-3770	Nvidia GeForce GTX Titan
Cores	4	2688
Base clock	3.4 GHz	836 MHz
Boost clock	3.9 GHz	876 MHz
Power	77 W	250 W
Recommended price	$305	$1080

The three-dimensional model was discretized with different spatial steps, giving meshes with the following numbers of nodes: ~2 million, 4 million, 8 million and 16 million. Each mesh was solved on 1 core of the Intel Core i7, on 4 cores of the Intel Core i7 and on the GeForce GTX Titan video card. The table below gives the computation times for a two-year simulation forecast.

Number of nodes	Processing time, 1 core of Intel Core i7	Processing time, 4 cores of Intel Core i7	Processing time, GeForce GTX Titan	Speedup, 4 cores vs. 1 core of Intel Core i7	Speedup, GeForce GTX Titan vs. 4 cores of Intel Core i7	Speedup, GeForce GTX Titan vs. 1 core of Intel Core i7
2,000,000	9.62 h (34,632 s)	5.97 h (21,504 s)	34.11 min (2,047 s)	1.61x	10.50x	16.91x
4,000,000	18.16 h (65,388 s)	10.63 h (38,287 s)	57.65 min (3,459 s)	1.70x	11.06x	18.90x
8,000,000	34.33 h (123,600 s)	19.22 h (69,221 s)	1.62 h (5,844 s)	1.78x	11.84x	21.14x
16,000,000	61.14 h (220,104 s)	32.98 h (118,736 s)	2.62 h (9,456 s)	1.85x	12.55x	23.27x
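The speedup columns follow directly from the processing times in seconds. The short sketch below recomputes them from the table and also derives the parallel efficiency on the CPU (speedup divided by the number of cores), a figure the table does not list. The names `TIMINGS`, `speedups` and `CPU_CORES` are illustrative. Recomputed values can differ from the published ones in the last digit, since the table appears to truncate rather than round.

```python
# (nodes, seconds on 1 CPU core, seconds on 4 CPU cores, seconds on the GPU),
# copied from the table above.
TIMINGS = [
    (2_000_000, 34_632, 21_504, 2_047),
    (4_000_000, 65_388, 38_287, 3_459),
    (8_000_000, 123_600, 69_221, 5_844),
    (16_000_000, 220_104, 118_736, 9_456),
]

CPU_CORES = 4  # cores used in the multi-core CPU runs


def speedups(t_1core: float, t_4cores: float, t_gpu: float) -> tuple:
    """Return (4 cores vs. 1 core, GPU vs. 4 cores, GPU vs. 1 core)."""
    return (t_1core / t_4cores, t_4cores / t_gpu, t_1core / t_gpu)


if __name__ == "__main__":
    for nodes, t1, t4, tg in TIMINGS:
        cpu4_vs_1, gpu_vs_4, gpu_vs_1 = speedups(t1, t4, tg)
        efficiency = cpu4_vs_1 / CPU_CORES  # fraction of ideal 4x scaling
        print(
            f"{nodes:>10,} nodes: 4 cores {cpu4_vs_1:.2f}x "
            f"(efficiency {efficiency:.0%}), "
            f"GPU vs. 4 cores {gpu_vs_4:.2f}x, GPU vs. 1 core {gpu_vs_1:.2f}x"
        )
```

All three ratios grow with the mesh size, so the larger meshes make better use of both the four CPU cores and the GPU.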

Computation acceleration chart

The performance of 1 core of Intel Core i7 corresponds to a speedup factor of 1x.

It should be noted that, when comparing the computational speed on multi-core architectures, the following model parameters have a significant impact on the acceleration:
- the number of materials;
- the number of boundary conditions;
- mesh uniformity;
- whether the number of mesh nodes is a multiple of the number of computational cores;
- conformity of thermo-physical properties of materials.
This means that the maximum acceleration on parallel architectures is achieved on the simplest models, with a uniform computational mesh and a minimal number of materials and boundary conditions. In practice, however, computational models are more complicated, so our speed analysis was based on the production wells simulation model to obtain more representative results.

Conclusions:

The use of computational algorithms with a low degree of parallelization is inefficient on multi-core processors and video accelerators.
The major engineering analysis software packages on the market contain a high proportion of serial code, which significantly limits the acceleration achievable with parallel computing. This is largely due to dated mathematical solver algorithms that were developed before technologies such as CUDA existed and were therefore not designed to take advantage of them.
Mathematical algorithms in the latest generation of CAE software are designed around parallel processing technology. In the tests above, this gave speedups of roughly 17 to 23 times when moving computation from one CPU core to a multi-core graphics accelerator.
