\chapter{Results}

\section{Simulation Performance}

\subsection{Domain Decomposition}
To evaluate the performance gained from domain decomposition, simulations with different numbers of particles were run over increasing numbers of sectors, with each simulation consisting of $10^5$ time steps. The $N^2$ scaling of the original algorithm came from the need to evaluate the distance between each of the $N$ particles and the other $N-1$ particles. With domain decomposition, each of the $N$ particles only needs to be evaluated against particles in its own sector and the $8$ neighbouring sectors ($9$ in total). Since each sector holds $N/S$ particles on average, the number of interaction calculations scales as $N\left(9\frac{N}{S}\right)$. The wall time should grow similarly, plus the time taken to iterate over the sectors, which scales as $S$. The expected scaling of the wall time under domain decomposition is then
\begin{equation}
    W \propto 9\frac{N^2}{S} + S.
\end{equation}
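The neighbour search behind this scaling can be sketched as follows. This is a minimal Python illustration rather than the project's Fortran code, and all names in it are hypothetical: particles are binned into square sectors with side at least the interaction cutoff, so each particle is only tested against candidates in the $9$ sectors around it, with periodic (minimum-image) distances.

```python
import math
from collections import defaultdict

def neighbor_pairs_cell_list(positions, box, cutoff):
    """Find interacting pairs in a periodic 2D box using a cell list.

    Each particle is binned into a square sector of side >= cutoff, so any
    interaction partner must lie in the particle's own sector or one of the
    8 surrounding ones (9 in total).
    """
    n_cells = max(1, int(box // cutoff))   # sectors per side
    cell_size = box / n_cells
    cells = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        cells[(int(x / cell_size) % n_cells,
               int(y / cell_size) % n_cells)].append(i)

    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbors = cells.get(((cx + dx) % n_cells,
                                       (cy + dy) % n_cells), [])
                for i in members:
                    for j in neighbors:
                        if i < j:
                            # minimum-image separation under periodic boundaries
                            ddx = (positions[i][0] - positions[j][0] + box / 2) % box - box / 2
                            ddy = (positions[i][1] - positions[j][1] + box / 2) % box - box / 2
                            if math.hypot(ddx, ddy) < cutoff:
                                pairs.add((i, j))
    return pairs
```

Each particle is still compared against roughly $9N/S$ candidates, but never against the whole system, which is exactly the $N(9N/S)$ term above.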

\begin{figure}[h]
    \centering
    %\includegraphics[width=0.5\linewidth]{Figures/time_vs_N.png}
    \input{Figures/fig41.tex}
    \caption{Wall times versus the number of particles for different numbers of sectors. }
    \label{fig:time_vs_N}
\end{figure}


\subsubsection{Optimal Number of Sectors}
The previous result showed that the number of sectors $S$ minimizing the wall time changes with the number of particles $N$, suggesting that there is an optimal number of sectors for a given number of particles, set by the density of particles per sector, $\rho$. Here, we investigate this relationship and seek to identify the optimal choice of $S$ for a given $N$. 

The existence of an optimal number of sectors is not surprising. When the number of sectors is small, and thus each sector large, particles that occupy the same sector can still often lie outside the interaction range, resulting in many wasted distance calculations. Adding sectors at this stage reduces the search radius for interactions, leading to fewer unnecessary distance calculations and a shorter wall time. Increasing the number of sectors further will eventually leave approximately one particle per sector on average. Beyond this point, adding more sectors \textit{increases} the number of wasted evaluations, this time through iterating over empty sectors. Based on this, we can expect the optimal number of sectors to be that which yields a density of one particle per sector. 
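This trade-off can be made quantitative with a minimal cost model. Writing the wall time as the interaction term plus the sector-iteration term, with hypothetical per-operation cost constants $a$ and $b$ (not measured here), and minimizing over $S$:
\begin{align}
    W(S) &= 9a\frac{N^2}{S} + bS,\\
    \frac{\mathrm{d}W}{\mathrm{d}S} &= -9a\frac{N^2}{S^2} + b = 0
    \quad\Longrightarrow\quad
    S^{*} = 3N\sqrt{a/b} \;\propto\; N.
\end{align}
Under this model the optimum is proportional to $N$, i.e. a constant, order-one number of particles per sector, consistent with the measurement below.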

To measure the dependence of wall time on sector density, simulations were run for $N$ ranging from a single particle to $N=1024$, over increasing numbers of sectors. In Figure \ref{fig:time_vs_density}, the measured wall times are plotted as a function of the number of particles per sector. For every particle count, the shortest wall time is achieved when the sector density is $\rho = 1.0$, and the wall time increases with any deviation from this density, as expected. Thus, the optimal number of sectors for minimizing the wall time is $S=N$. 

\begin{figure}[H]
    \centering
    %\includegraphics[width=0.6\linewidth]{Figures/times_vs_density.png}
    \input{Figures/fig42.tex}
    \caption{Wall times versus the density of particles per sector for different $N$.}
    \label{fig:time_vs_density}
\end{figure}


\subsection{Parallelization}

\begin{figure}[H]
    \centering
    %\includegraphics[width=0.9\linewidth]{Figures/parallelization_results.pdf}
    \input{Figures/fig43.tex}
    \caption{Wall times versus the number of threads for different values of $N$.}
    \label{fig:parallelization_results}
\end{figure}

At lower particle counts ($N = 1024$), thread management overhead dominates and parallelization provides no meaningful benefit. At larger counts ($N = 2048$ and $N = 4096$), speedup is observed going from 2 to 4 threads, but gains diminish beyond that. The maximum observed speedup is approximately $2.6\times$ at $N = 4096$ with 8 threads, considerably below the ideal $4\times$ speedup.
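One hedged way to read the $2.6\times$ figure is through Amdahl's law, assuming the baseline is a single thread and that the shortfall comes from a serial fraction of the work (thread management, sector bookkeeping) rather than, say, memory bandwidth. The Python sketch below inverts the law for the implied parallel fraction:

```python
def amdahl_speedup(f, p):
    """Amdahl's law: ideal speedup on p threads when a fraction f of the work parallelizes."""
    return 1.0 / ((1.0 - f) + f / p)

def parallel_fraction(speedup, p):
    """Invert Amdahl's law: the parallel fraction implied by a measured speedup on p threads."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / p)

f = parallel_fraction(2.6, 8)   # measured 2.6x speedup on 8 threads
```

Under these assumptions roughly 70\% of the work parallelizes, which would cap the attainable speedup at about $3.4\times$ regardless of thread count.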


\section{Langevin Dynamics}

\subsection{Root Mean Square Displacement}
The RMSD scales as $t$ in the ballistic regime, as $t^{1/2}$ in the diffusive regime, and should eventually saturate as the system equilibrates (particles begin to reach the boundaries of the box). The RMSD is computed by averaging the squared Euclidean displacement of each relevant particle from its origin (birth position) and taking the square root. In the developed Fortran program, a ``relevant'' particle is one which has not escaped the box. The particles are typically in motion, and therefore the RMSD will vary with the simulation time. This is expressed in mathematical form as:
\begin{equation}
    \mathrm{RMSD}(t) = \sqrt{
        \frac{1}{N_{\mathrm{in}}(t)}
        \sum_{i\,\in\,\mathrm{inside}}
        \left|\mathbf{x}_i(t) - \mathbf{x}_i(0)\right|^2
    },
    \label{eq:rmsd}
\end{equation}
where $N_{\mathrm{in}}(t)$ is the number of particles remaining inside the box at time $t$. Various runs were conducted to investigate the behaviour of the RMSD for the system. The transition time $\tau_p$ that separates the ballistic and diffusive regimes can be expressed analytically as:
\begin{equation}
    \tau_p = \frac{m}{\gamma}
\end{equation}
where $m$ is the mass of the particle and $\gamma$ is the drag coefficient (also known as the Stokes friction coefficient) \cite{ma2012}. It is worth noting that this boundary does not mark a `hard' transition from a perfectly $t$-proportional regime to a $t^{1/2}$-proportional one; rather, the curve tends towards the correct profile as $\left| t-\tau_p \right| \rightarrow \infty$. The behaviour is shown in Figure \ref{fig:rmsd} for various values of the friction coefficient $\gamma$:
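Equation \ref{eq:rmsd} reduces to a short routine. The sketch below is illustrative Python rather than the project's Fortran, and the bookkeeping of which particles remain inside the box is a hypothetical stand-in for the program's own:

```python
import math

def rmsd(positions, origins, inside):
    """Root mean square displacement over particles still inside the box.

    positions / origins: lists of (x, y) tuples giving current and birth
    positions; inside: indices of the particles that have not escaped
    (hypothetical bookkeeping mirroring the 'relevant particle' definition).
    """
    if not inside:
        return 0.0
    total = 0.0
    for i in inside:
        dx = positions[i][0] - origins[i][0]
        dy = positions[i][1] - origins[i][1]
        total += dx * dx + dy * dy     # squared displacement from birth position
    return math.sqrt(total / len(inside))
```

Escaped particles are simply excluded from the average, matching the definition of $N_{\mathrm{in}}(t)$ above.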

\begin{figure}[H]
    \centering
    \hspace{-1cm}
    \input{Figures/rmds.tex}
    \caption{Smoothed RMSD as a function of simulation time and the drag coefficient $\gamma$. The relevant transition boundary $\tau_p$ is plotted for each curve, and a sample $f(t)\propto t$ is provided (dashed black line). The deviation from $t$-proportionality is clearly seen as the transition point moves closer and closer to the starting time $t=0$.}
    \label{fig:rmsd}
\end{figure}

Note that in the figure above, a `smoothed' view of the RMSD is shown in solid lines. The raw output from the code contains fluctuations that make the trends difficult to visualize, so smoothing is applied strictly during post-processing using \texttt{cspline} in Gnuplot. For clarity, the data points are also presented for each curve. The point density increases considerably at later times because data is sampled at regular intervals, so the growing number of `outlier' points is expected. 

Despite the fluctuations, a clear trend with respect to the analytical values of $\tau_p$ is still visible. The coefficient $\gamma=0.5$ yields a curve that is mostly linear until the regime boundary, at which point there is a clear deviation. Systems that transition sooner (e.g., $\gamma = 1000$) deviate from the linear trend immediately.

\subsection{System Energy}
The total system energy was investigated as part of the testing methodology for the boundary conditions. Significant fluctuations were found only when the stochastic and drag forces are included. In the billiards-like case, where particles only bounce off the walls and interact with one another, the system's kinetic energy remained constant except during particle-particle interactions. When particles interact, kinetic energy is transferred into the Lennard-Jones potential energy between the two particles, producing localized valleys in the system's kinetic energy over time.
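The billiards limit of this check can be sketched with a single particle in a 1D box under an Euler--Maruyama Langevin update: setting $\gamma = 0$ and $k_BT = 0$ recovers the deterministic bounce-only case, where the kinetic energy should stay exactly constant. This is an illustrative Python sketch, not the Fortran implementation, and the noise amplitude is assumed to follow the standard fluctuation--dissipation form.

```python
import math
import random

def kinetic_energy_series(n_steps, dt, gamma=0.0, kT=0.0, m=1.0, box=10.0, seed=0):
    """Kinetic energy over time for one particle with Euler--Maruyama Langevin
    updates and elastic wall bounces.  With gamma = 0 and kT = 0 the dynamics
    reduce to a billiards-like system and the kinetic energy stays constant."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma * kT * dt) / m   # assumed fluctuation-dissipation amplitude
    x, v = 2.5, 1.3
    energies = []
    for _ in range(n_steps):
        v += (-gamma * v / m) * dt + sigma * rng.gauss(0.0, 1.0)
        x += v * dt
        if x < 0.0:            # elastic bounce off the left wall
            x, v = -x, -v
        elif x > box:          # elastic bounce off the right wall
            x, v = 2.0 * box - x, -v
        energies.append(0.5 * m * v * v)
    return energies
```

Wall bounces only flip the sign of the velocity, so they leave $\frac{1}{2}mv^2$ untouched; any drift or fluctuation in the energy series therefore isolates the effect of the drag and stochastic terms.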