1. INTRODUCTION
This article describes the performance of calculations on video cards (using CUDA) for modeling physical processes and phenomena based on the solution of the three-dimensional heat equation via the Douglas-Rachford scheme (ADI method). A comparative analysis of the calculation speeds of the central processor (CPU) and the graphics processor (GPU) was conducted.
2. DOUGLAS-RACHFORD SCHEME DESCRIPTION
For the mathematical simulation of heat distribution, accounting for filtration and phase transition, the following heat equation is applied:

$$c_{ef}(T)\,\frac{\partial T}{\partial t} = \operatorname{div}\bigl(\lambda_{ef}(T)\,\operatorname{grad} T\bigr) - c_w\bigl(\bar{v}\cdot\operatorname{grad} T\bigr) + f \qquad (1)$$

where the effective heat capacity $c_{ef}$ accounts for the latent heat of the phase transition through the frozen water content $w(T)$, which depends on a fitted coefficient $\kappa$ and the phase-transition temperature $T_*$.
A description of the coefficients is given in Table 1.
Table 1: The heat equation coefficients

| Coefficient | Description |
| $c_{ef}$ | effective heat capacity |
| $w$ | frozen water content |
| $\kappa$ | fitted coefficient |
| $T_*$ | temperature of the phase transition |
| $T$ | temperature |
| $C$ | heat capacity of the ground |
| $C_f$ | heat capacity of frozen ground |
| $C_t$ | heat capacity of thawed ground |
| $t$ | time |
| $\lambda_{ef}$ | effective thermal conductivity of the ground |
| $\lambda_f$ | thermal conductivity of frozen ground |
| $\lambda_t$ | thermal conductivity of thawed ground |
| $c_w$ | water heat capacity |
| $\bar{v}$ | filtration velocity vector |
| $f$ | heat source |
Equation (1) is a partial differential equation of the parabolic type. Since the computational domain of equation (1) is arbitrary, numerical methods are used to solve it.
Alternating direction schemes form a class of numerical methods for solving parabolic partial differential equations [1]. The Douglas-Rachford scheme is a widely used method [1, 2], briefly described below.
Let us consider a homogeneous heat equation in three dimensions, which is a simplified version of equation (1):

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \qquad (2)$$
The computational domain is represented by a parallelepiped with the spatial mesh
$$\omega_h = \{(x_i, y_j, z_k):\; x_i = i h_x,\; i = 0,\dots,N_x;\; y_j = j h_y,\; j = 0,\dots,N_y;\; z_k = k h_z,\; k = 0,\dots,N_z\}$$
and the time mesh $\omega_\tau = \{t_n = n\tau,\; n = 0, 1, \dots\}$, where $h_x$, $h_y$, $h_z$ are the spatial mesh steps and $\tau$ is the time step. The temperature field at time $t_n$ is known, and at node $(x_i, y_j, z_k)$ the temperature is equal to $u^n_{ijk}$. To determine the temperature field at the next time layer, equation (2) is approximated by the finite-difference equations (3)-(5) of the Douglas-Rachford scheme, as follows:
$$\frac{u^{n+1/3}_{ijk} - u^n_{ijk}}{\tau} = \Lambda_x u^{n+1/3}_{ijk} + \Lambda_y u^n_{ijk} + \Lambda_z u^n_{ijk} \qquad (3)$$

$$\frac{u^{n+2/3}_{ijk} - u^{n+1/3}_{ijk}}{\tau} = \Lambda_y \bigl(u^{n+2/3}_{ijk} - u^n_{ijk}\bigr) \qquad (4)$$

$$\frac{u^{n+1}_{ijk} - u^{n+2/3}_{ijk}}{\tau} = \Lambda_z \bigl(u^{n+1}_{ijk} - u^n_{ijk}\bigr) \qquad (5)$$
where $\Lambda_x$, $\Lambda_y$, $\Lambda_z$ are the second-order finite-difference derivatives along the $x$, $y$ and $z$ axes respectively, e.g.
$$\Lambda_x u^n_{ijk} = \frac{u^n_{i-1,jk} - 2u^n_{ijk} + u^n_{i+1,jk}}{h_x^2}.$$
At every fixed pair of indices along the $y$ and $z$ axes (the number of such pairs is $N_y N_z$), the equations (3) are gathered together and form a tridiagonal system of linear equations. Solving a system of $N$ linear equations with a tridiagonal matrix by the tridiagonal matrix algorithm requires $O(N)$ operations.
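The $O(N)$ tridiagonal solve referred to above is usually performed with the Thomas algorithm. The following Python sketch is illustrative only (it is not the article's implementation; NumPy is used here just for array storage):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve the tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    Entries a[0] and c[-1] are unused. The cost is O(N): one forward
    elimination sweep followed by one back-substitution sweep.
    """
    n = len(d)
    cp = np.empty(n)   # modified upper-diagonal coefficients
    dp = np.empty(n)   # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, for the classic 1D Laplacian system with diagonals $(-1, 2, -1)$ and right-hand side $(1, 0, 0, 1)$, the solver returns the all-ones vector.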
Since, for each fixed pair of indices $(j,k)$ (respectively $(i,k)$ and $(i,j)$), the systems arising from equation (3) (respectively (4) and (5)) are solved independently of each other, the Douglas-Rachford scheme possesses natural parallelism, which allows the scheme to be parallelized effectively with CUDA.
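This natural parallelism can be seen in a CPU sketch of one implicit sweep along the $x$ axis: the elimination sweeps below carry the whole $(N_y, N_z)$ slab at once, because no $(j,k)$ line ever reads another line's data; a CUDA implementation maps the same structure to one thread per line. The sketch is illustrative, not the article's code; the mesh sizes and the coefficient $r = \tau/h_x^2$ in the test are chosen arbitrarily.

```python
import numpy as np

def x_sweep(u, r):
    """Solve (I - r*Lambda_x) u_new = u along x for every (j, k) line at once.

    u: array of shape (Nx, Ny, Nz); r = tau / h_x**2.
    Interior rows: -r*u_new[i-1] + (1 + 2r)*u_new[i] - r*u_new[i+1] = u[i];
    the boundary rows are identity (boundary values are held fixed).
    The Thomas sweeps are vectorized over the trailing (Ny, Nz) axes,
    since every (j, k) line is an independent tridiagonal system.
    """
    n = u.shape[0]
    a = np.full(n, -r)
    b = np.full(n, 1.0 + 2.0 * r)
    c = np.full(n, -r)
    b[0] = b[-1] = 1.0          # identity boundary rows
    c[0] = a[-1] = 0.0
    a[0] = c[-1] = 0.0          # unused entries
    cp = np.empty(n)            # scalar per row: same matrix for all lines
    dp = np.empty_like(u)       # carries the whole (Ny, Nz) slab
    cp[0] = c[0] / b[0]
    dp[0] = u[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (u[i] - a[i] * dp[i - 1]) / m
    x = np.empty_like(u)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The same pattern repeated along $y$ and $z$ (with the right-hand sides prescribed by equations (4) and (5)) gives one full time step of the scheme.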
3. BENCHMARKS
Calculations have been performed to demonstrate the performance acceleration for a software-implemented Douglas-Rachford scheme using CUDA technology, compared with the implementation of the scheme on a CPU.
Consider the computational domain $P = \{(x,y,z):\; 0 \le x, y, z \le 10\}$ (see Figure 1), which has the shape of a cube with a side of 10 meters. An SCD (seasonal cooling device) $S$ is placed in the domain $P$ parallel to the $z$ axis. The lowest point of the SCD $S$ is located at the coordinates (5.0, 5.0, 1.0), and its upper point is at (5.0, 5.0, 9.0).
The SCD has the following design parameters: the radius of the evaporator tube is 16.85 mm, the area of the evaporator is 1.0 m², and the area of the condenser is 2.0 m².
The computational domain is filled with a homogeneous material whose volumetric heat capacity takes the same value in the thawed and frozen states; its thermal conductivity in the thawed and frozen states is likewise the same.
Fig. 1: Computational domain
A boundary condition of the second type (zero flux) is specified at the boundary of the computational domain $P$, and a boundary condition of the third type is specified on the surface of the SCD $S$, with an ambient temperature of -20.0 °C and a fixed heat transfer coefficient. The initial temperature of the material occupying the computational domain is -1.0 °C.
In the test, the thermal field distribution is calculated every 30 days over a period of 360 days.
Four pairs of calculations were performed with different levels of computational domain discretization: the first calculation of each pair was run on a single CPU core, and the second on a video card using CUDA technology.
For the first pair of calculations, a spatial step of 100 mm was used in all three directions. For the second pair, the spatial step along the $x$ and $y$ axes is 100 mm, and 50 mm along the $z$ axis. For the third pair, the spatial step along the $z$ axis is 100 mm, and 50 mm along the $x$ and $y$ axes. Finally, for the last pair of calculations a spatial step of 50 mm was used in all three directions. Thus, in the first case, the domain $P$ was meshed into 1,000,000 nodes. In the second, third and fourth cases, the domain $P$ was divided into 2,000,000, 4,000,000 and 8,000,000 nodes, respectively.
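The node counts follow directly from the 10 m cube side and the chosen steps. A quick check in plain Python (illustrative only; it treats the counts as cells per axis, i.e. side/h, which is what reproduces the article's figures):

```python
SIDE_MM = 10_000  # cube side: 10 m expressed in millimetres

def nodes(hx, hy, hz):
    """Mesh size for spatial steps hx, hy, hz (in mm) along x, y, z."""
    return (SIDE_MM // hx) * (SIDE_MM // hy) * (SIDE_MM // hz)

print(nodes(100, 100, 100))  # first pair:  1,000,000
print(nodes(100, 100, 50))   # second pair: 2,000,000
print(nodes(50, 50, 100))    # third pair:  4,000,000
print(nodes(50, 50, 50))     # fourth pair: 8,000,000
```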
Note that this article does not analyze the accuracy of the calculations or the dependence of the accuracy of the obtained solution on the level of computational domain discretization. All of the calculations produced identical heat distributions; the results are shown in Figures 2 (a, b).
Fig. 2a: heat distribution shown as a color fill
Fig. 2b: heat distribution shown as isolines
The calculations were performed on the processors listed in Table 2.
Table 2. Computational units
Processor (CPU) | Intel Core i7 | 4 cores |
Video card (GPU) | GeForce GTX TITAN | 2688 cores |
The computational time depending on the number of nodes in the computational domain is given in Table 3 and Figure 3.
Table 3. Computational time
Number of nodes | Computational time on Intel Core i7 (to the nearest second) | Computational time on GeForce GTX TITAN (to the nearest second) | GPU acceleration factor (rounded to 2 decimal places) |
1,000,000 | 1,962 | 63 | 31.14 |
2,000,000 | 16,296 | 524 | 31.10 |
4,000,000 | 30,654 | 1,013 | 30.26 |
8,000,000 | 63,858 | 2,079 | 30.72 |
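The acceleration factors in Table 3 are simply the CPU-to-GPU time ratios; recomputing them from the table data:

```python
# (nodes, CPU seconds, GPU seconds) taken from Table 3
runs = [
    (1_000_000, 1962, 63),
    (2_000_000, 16296, 524),
    (4_000_000, 30654, 1013),
    (8_000_000, 63858, 2079),
]
for n, cpu, gpu in runs:
    # acceleration factor rounded to 2 decimal places, as in the table
    print(f"{n:>9,} nodes: x{cpu / gpu:.2f}")
```

The ratios stay close to 31 across an 8x range of problem sizes, which is consistent with the roughly constant speedup reported in the article.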
Fig. 3: Computation speedup
4. CONCLUSIONS
On average, the GPU version of the ADI solver, implemented with CUDA technology on the Nvidia GTX Titan video card, ran 30 times faster than the same calculations on a single core of the Intel Core i7 CPU. Note that the tests involved a relatively simple task (a small number of fixed boundary conditions and a small number of materials). When solving more complex computational tasks, the acceleration factor of the parallel implementation is expected to decrease.
As shown in this article, CUDA technology can be successfully used on personal computers to solve problems that require large computational resources, such as the modeling of physical processes and phenomena.
5. REFERENCES
1. N.N. Yanenko, The Method of Fractional Steps: The Solution of Problems of Mathematical Physics in Several Variables, 1st ed., Springer-Verlag (1971).
2. J.W. Thomas, Numerical Partial Differential Equations: Finite Difference Methods, Springer, USA (1995), 437 p.