Merge pull request #96 from jzarnett/FP64
Add note about FP64
jzarnett authored Feb 4, 2024
2 parents 2e2e103 + d8592aa commit 7eafd05
Showing 4 changed files with 59 additions and 1 deletion.
11 changes: 10 additions & 1 deletion lectures/459.bib
Expand Up @@ -1355,4 +1355,13 @@ @misc{parler
year = {2021},
url = {https://www.wired.com/story/parler-hack-data-public-posts-images-video/},
note = {Online; accessed 2023-10-14}
}

@misc{fp3264,
author = {{JeGX}},
title = {AMD Radeon and NVIDIA GeForce FP32/FP64 GFLOPS Table},
year = {2014},
url = {https://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/},
note = {Online; accessed 2024-02-04}
}
37 changes: 37 additions & 0 deletions lectures/L22-slides.tex
Expand Up @@ -291,5 +291,42 @@

\end{frame}

\begin{frame}{Trading Accuracy for Performance?}

One more item from previous ECE 459 student Tony Tascioglu.


A crowd favourite in ECE 459 is trading accuracy for performance.


NVIDIA GeForce gaming GPUs don't natively support FP64 (double).
\begin{itemize}
\item Native FP64 typically requires \$\$\$ datacentre GPUs.
\item FP64 used to be locked in software, now missing in HW.
\item Emulated using FP32 on gaming and workstation cards.
\end{itemize}

\end{frame}

\begin{frame}{Trading Accuracy for Performance?}

Using 32-bit floats rather than 64-bit doubles is typically a 16$\times$, 32$\times$, or even 64$\times$ speedup, depending on the GPU!

Even more: 16-bit floats instead of 32-bit are typically another 2$\times$ faster.

For many applications, double precision isn't necessary!

\end{frame}

\begin{frame}{Trading Accuracy for Performance?}

How dramatic is the difference?

\begin{center}
\includegraphics[width=\textwidth]{images/gpu-fp32-fp64-table.png}
\end{center}

\end{frame}

\end{document}

12 changes: 12 additions & 0 deletions lectures/L22.tex
Expand Up @@ -272,6 +272,18 @@ \subsection*{N-Body Host Code}

The full version of the improved code is in the course repository as \texttt{nbody-cuda-grid}. But what you want to know is, did these changes work? Yes! It sped up the calculation to about 1.65 seconds (still with 100~000 points, still on the same server). Now that's a lot better! We are finally putting the parallel compute power of the GPU to good use and it results in an excellent speedup.

\paragraph{Trading Accuracy for Performance?}
Thanks to previous ECE 459 student Tony Tascioglu, who contributed this section. We've seen on numerous occasions that trading accuracy for performance is often a worthwhile endeavour. You might even say it's a crowd favourite. It's an instructor favourite, at least.

Most of the gaming-oriented NVIDIA GeForce GPUs don't natively support FP64 (double-precision floating point numbers). Native support requires expensive datacentre GPUs: on consumer cards FP64 used to be locked in software, and in more modern cards it is missing from the hardware. Instead of running in hardware, the 64-bit operations are emulated in software, and that is significantly slower. How much slower? Using 32-bit floats rather than 64-bit doubles is typically a 16$\times$, 32$\times$, or even 64$\times$ speedup, depending on the GPU! We can push that a bit further, because using 16-bit floats is typically another 2$\times$ faster. For many applications (gaming?) double precision isn't necessary.

How dramatic is the difference? See this table from~\cite{fp3264}. Although its date says 2014, it has clearly been updated since then: the GeForce RTX 3080 did not come out until September 2020.

\begin{center}
\includegraphics[width=\textwidth]{images/gpu-fp32-fp64-table.png}
\end{center}
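
The table covers the performance side of the trade; the accuracy side can be seen with a small experiment. The following is a minimal sketch, not taken from the course repository, that repeatedly adds the same value using FP32 rounding and using FP64, so that the accumulated error can be compared. The helper names \texttt{to\_f32}, \texttt{sum\_f32} and \texttt{sum\_f64} are made up for this illustration. It runs on the CPU, so it says nothing about GPU speed; it only shows how rounding error grows at lower precision.

```python
import ctypes


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return ctypes.c_float(x).value


def sum_f32(value: float, count: int) -> float:
    """Add `value` to an accumulator `count` times, rounding to FP32 each step."""
    v = to_f32(value)
    total = 0.0
    for _ in range(count):
        # Rounding the FP64 sum of two FP32 values back to FP32
        # mimics a single-precision addition.
        total = to_f32(total + v)
    return total


def sum_f64(value: float, count: int) -> float:
    """Same accumulation, but kept in FP64 (Python's native float)."""
    total = 0.0
    for _ in range(count):
        total += value
    return total


if __name__ == "__main__":
    count = 1_000_000
    expected = count / 10  # summing 0.1 a million times should give 100000
    err32 = abs(sum_f32(0.1, count) - expected)
    err64 = abs(sum_f64(0.1, count) - expected)
    print(f"FP32 absolute error: {err32}")
    print(f"FP64 absolute error: {err64}")
```

Whether the FP32 error is acceptable depends on the application: a long-running accumulation like this one is close to a worst case, whereas many short, independent calculations (typical of graphics workloads) lose far less.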


\input{bibliography.tex}

\end{document}
Binary file added lectures/images/gpu-fp32-fp64-table.png