authorConnor Moore <connor@hhmoore.ca>2026-02-04 21:03:03 -0500
committerConnor Moore <connor@hhmoore.ca>2026-02-04 21:03:03 -0500
commit9899da194522f93be153e0a0d19c690505c3ca1c (patch)
tree101db3f8b806c6e4847b647f303ec9f67670d96e
parent1a29d3af67135587125230ecd4d515e1fd00dca5 (diff)
Added more to the LaTeX document and references library.
-rw-r--r--report/refs.bib44
-rw-r--r--report/report.tex77
2 files changed, 119 insertions, 2 deletions
diff --git a/report/refs.bib b/report/refs.bib
new file mode 100644
index 0000000..3320ee2
--- /dev/null
+++ b/report/refs.bib
@@ -0,0 +1,44 @@
+@book{Robey2021,
+ author = {Robey, Robert and Zamora, Yuliana},
+ date = {2021},
+ pages = {704},
+ publisher = {Manning Publications},
+ tags = {mcsc-6030g-p1},
+ title = {Parallel and High Performance Computing},
+ url = {https://openlibrary.org/books/OL34760653M/Parallel_and_High_Performance_Computing},
+ year = {2021}
+}
+@book{Watkins2010,
+ author = {Watkins, David S.},
+ date = {2010},
+ publisher = {Wiley},
+ tags = {mcsc-6030g-p1},
+ title = {Fundamentals of Matrix Computations},
+ url = {https://openlibrary.org/books/OL24019701M/Fundamentals_of_matrix_computations},
+ year = {2010}
+}
+@book{Curcic2020,
+ author = {Curcic, Milan},
+ date = {2020},
+ pages = {500},
+ publisher = {Manning Publications},
+ tags = {mcsc-6030g-p1},
+ title = {Modern Fortran},
+ url = {https://openlibrary.org/books/OL29840260M/Modern_Fortran},
+ year = {2020}
+}
+@manual{Intel2025,
+ address = {Santa Clara, CA},
+ author = {{Intel Corporation}},
+ edition = {2025.03},
+ title = {Intel Fortran Compiler Developer Guide and Reference},
+ year = {2025}
+}
+@manual{GCC2024,
+ address = {Boston, MA},
+ author = {Stallman, Richard and {Free Software Foundation}},
+ edition = {14.2.0},
+ organization = {Free Software Foundation},
+ title = {Using the GNU Compiler Collection},
+ year = {2024}
+}
diff --git a/report/report.tex b/report/report.tex
index 95703b9..0b4018b 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -6,16 +6,89 @@
\usepackage{xcolor}
\usepackage{graphicx}
+\usepackage{hyperref}
+\hypersetup{colorlinks=true,
+ linkcolor=black,
+ urlcolor=blue,
+ citecolor=black,
+ pdftitle={High-Performance Square Matrix-Matrix Multiplication},
+ pdfpagemode=FullScreen}
-\title{High-Performance Matrix-Matrix Multiplication}
+\usepackage{algpseudocode}
+
+\title{High-Performance Square Matrix-Matrix Multiplication}
\author{Connor Moore, 100826701}
\begin{document}
\maketitle
+
\section{Introduction}
+Computers can produce results far faster than humans, and this speed underpins much of modern science. Although today's computing power provides great flexibility in what can be computed, there is continuing merit in writing software as efficiently as possible. \emph{High-performance computing} is computing with a focus on extreme performance \cite{Robey2021}, and is the discipline that enables large-scale simulation efforts across science and engineering \cite{Curcic2020}.
+
+Matrices appear throughout scientific computing. Because of their widespread use, it is beneficial to implement matrix operations as efficiently as possible, especially for very large systems. This report surveys several ways of computing a square matrix-matrix product with the aim of minimizing wall time. The impact of different solution techniques, compilers, compiler flags, and matrix sizes is investigated. The candidate solutions were implemented in Fortran and timed programmatically.
+
+\section{Matrix-Matrix Multiplication Implementations}
+When considering two $n\times n$ matrices $a$ and $b$, their product $c$ can be calculated using:
+\begin{equation}
+ c_{ij}=\sum_{k=1}^{n}a_{ik}b_{kj}
+ \label{eq:matproduct}
+\end{equation}
+where the subscripts $i$ and $j$ index the rows and columns of the respective matrices \cite{Watkins2010}.
+
+\subsection{Triple-loop Techniques}
+It is trivial to implement (\ref{eq:matproduct}) in most programming languages. The general algorithm for computing the product is:
+\begin{algorithmic}
+\State $c \gets 0$
+\For{$i=1,n$}
+ \For{$j=1,n$}
+ \For{$k=1,n$}
+ \State $c(i,j)=c(i,j)+a(i,k)\times b(k,j)$
+ \EndFor
+ \EndFor
+\EndFor
+\end{algorithmic}
+where the subscript notation has been replaced by Fortran-like array notation. This algorithm performs $2n^3$ floating-point operations (FLOPs), i.e.\ its complexity is $\mathcal{O}(n^3)$ \cite{Watkins2010}.
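+
+In Fortran the algorithm can be written directly. A minimal sketch (array names follow the notation above, and are assumed to be declared and allocated elsewhere):
+\begin{verbatim}
+c = 0.0d0
+do i = 1, n
+   do j = 1, n
+      do k = 1, n
+         c(i,j) = c(i,j) + a(i,k)*b(k,j)
+      end do
+   end do
+end do
+\end{verbatim}
+Because Fortran stores arrays in column-major order, reordering the loops so that the first array index varies fastest (e.g.\ making $i$ the innermost loop) typically improves cache behaviour.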
+\subsection{Fortran's \texttt{matmul} Function}
+Fortran, like many other languages, provides an intrinsic function for matrix-matrix multiplication. It is invoked as \texttt{c = matmul(a, b)}, using the same notation as above. The exact implementation of \texttt{matmul}, and whether or not it calls an external library, is compiler-dependent.
+
+\subsection{BLAS Routines}
+Several libraries provide high-performance matrix-matrix operations. The Basic Linear Algebra Subprograms (BLAS) library is a collection of subroutines organized into three ``levels'': Level 1 contains vector-vector routines, Level 2 matrix-vector routines, and Level 3 matrix-matrix routines. BLAS provides a Level 3 routine for double-precision general matrix-matrix multiplication, \texttt{DGEMM}, which can be used to form the product.
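+
+A representative \texttt{DGEMM} call for the square case (array names assumed from the notation above) is:
+\begin{verbatim}
+call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
+\end{verbatim}
+The two \texttt{'N'} arguments request no transposition, \texttt{1.0d0} and \texttt{0.0d0} are the scalars $\alpha$ and $\beta$ in the operation $C \gets \alpha AB + \beta C$, and \texttt{n} is passed as each leading dimension.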
+
+For the binaries compiled with \texttt{gfortran}, OpenBLAS 0.3.31 was used. OpenBLAS is a modern implementation of BLAS with kernels optimized for recent hardware. For binaries compiled with the Intel oneAPI suite, the BLAS implementation in Intel's Math Kernel Library (MKL) was used. Library linkage was verified for each binary using \texttt{ldd} on Linux.
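+
+For example (the binary name here is illustrative), the linked BLAS implementation can be confirmed with:
+\begin{verbatim}
+ldd ./matrixproduct | grep -i blas
+\end{verbatim}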
+
+\subsection{Compiler and Flag Considerations}
+Two Fortran compilers were tested: \texttt{gfortran} from the GNU Compiler Collection, and \texttt{ifx} from Intel's oneAPI suite. Both compilers support several levels of optimization via the \texttt{-O0}, \texttt{-O1}, \texttt{-O2}, \texttt{-O3}, and \texttt{-Ofast} flags. Each of these flags was tested to quantify its impact on wall time.
+
+Additional flags were specified to improve performance. By default, compilers keep binaries `portable' by avoiding instruction sets or niche optimizations that may be unavailable on other hardware. Because the tests are only performed locally, both compilers were instructed to target the host machine's full instruction set: with \texttt{gfortran} this is done via the \texttt{-march=native} flag \cite{GCC2024}, and with \texttt{ifx} via the \texttt{-xHost} flag \cite{Intel2025}.
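+
+Representative compile commands (source and output names illustrative, not the exact invocations used):
+\begin{verbatim}
+gfortran -O3 -march=native matrixproduct.f90 -o mp_gnu -lopenblas
+ifx      -O3 -xHost        matrixproduct.f90 -o mp_intel -qmkl
+\end{verbatim}
+Here \texttt{-lopenblas} links OpenBLAS and \texttt{-qmkl} links Intel MKL.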
+
+\section{Comparisons and Results}
+Runs were conducted parametrically and driven by a GNU Makefile.
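+
+A sketch of the kind of target used (target and parameter values here are illustrative, not the actual Makefile contents):
+\begin{verbatim}
+tests: mp_gnu mp_intel
+	./mp_gnu   100 1000 100 yes > results_gnu.dat
+	./mp_intel 100 1000 100 yes > results_intel.dat
+\end{verbatim}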
+
+\subsection{Matrix Size}
+As expected, increasing the matrix size produced a superlinear increase in wall time, consistent with the $\mathcal{O}(n^3)$ cost of the computation. This trend held across all compilers, flags, and techniques.
+
+\subsection{Compiler}
+
+\subsubsection{Optimization Flags}
+
+\subsection{Parallelism}
+
+\section{Conclusions}
+
+\bibliographystyle{ieeetr}
+\bibliography{refs}
+
+\clearpage
+\appendix
-This is the introduction!
+\section{Compiling and Testing the Code}
+The git repository is hosted at \url{https://git.hhmoore.ca/mcsc-6030g/p1-matrix-product} and contains various files. All files related to the report, including exported figures, are kept within the \texttt{report} subdirectory. An explanation is provided for everything else:
+\begin{description}
+ \item[\texttt{matrixproduct.f90}] is the Fortran program that computes the matrix products. It does so in three major ways: via triple loops, using Fortran's intrinsic \texttt{matmul} function, and with OpenBLAS. Two variants of the triple-loop technique are provided, one looping in row-major order and the other in column-major order. The compiled binary expects command-line arguments giving the start, stop, and step of the matrix sizes to run properly, followed by a \texttt{yes}/\texttt{no} string that toggles the triple-loop solutions on or off; they are turned off for ``long'' runs because they are prohibitively slow.
+ \item[\texttt{Makefile}] is a GNU Make script that drives the compilation and testing process. Several targets are provided; to produce a full set of results, simply run \texttt{make tests}. The default target, invoked by calling \texttt{make} alone, only compiles the binaries.
+ \item[\texttt{plots.gnu}] is a Gnuplot script that generates the plots for the report; the corresponding target can be run with \texttt{make plots}. This produces plots in hybrid \texttt{.pdf}/\texttt{.tex} form that embed cleanly in the \LaTeX{} document.
+\end{description}
\end{document}