Second generation GPU client on ATI hardware (GPU2) FAQ

Table of Contents

- A Brief History of FAH: From Tinker to Gromacs and the power of the GPU
- Introduction
- Folding@home debuts with the Tinker core (October 2000)
- A major step forward: the Gromacs core (May 2003)
- The next major step forward: Streaming Processor cores (September 2006)
- The second-generation GPU core, aka GPU2, for ATI hardware (April 2008)
- The second-generation GPU core for NVIDIA (June 2008)
- General instructions
- Frequently Asked Questions (common to both ATI and NVIDIA GPU2 clients)
- My points per day (PPD) varies significantly from project to project
- What about visualization?
- What OSs does the new client/core support?
- Can I run the GPU as a service?
- Can I use my CPU to do calculations too?
- How do I use flags with the GPU client?
- What about multi-GPU support?
- How do you decide the credit value of GPU work units?
- Why is the GPU client important?
- What's different between the GPU1 (first generation) and the GPU2 (second generation) client?
- Can I still use my GPU when the client is running?
- Troubleshooting
- The client was working, but now all I'm getting are Early Unit Ends (EUE's). How can I fix this?
- My client gives an UNSTABLE_MACHINE error and is going to sleep for 24 hours! What should I do?
- Issues specific to GPU2/ATI
- I'm having problems with Vista/Win7, any ideas?
- Does the new GPU client run the same WUs?
- What about multi-gpu support and the -gpu switch?
- What's different between the old and the new FAH GPU client?
- Can I still use my GPU when the client is running?
- What hardware does the new client/core support?
- What OSs does the new client/core support?
- What about hardware clocks?
- How about AGP vs PCIe slots?
- The client displays an error saying that I do not have a supported GPU, but I do!!!
- The core can't find the DLL's!
- A DLL error dialog box is popping up -- what's up with that?
- What's with all the new DLL's anyway?
- Who did all of this anyway?


A Brief History of FAH: From Tinker to Gromacs and the power of the GPU

Introduction

Since 2000, Folding@home (FAH) has led to a major jump in the capabilities of molecular simulation. By joining together hundreds of thousands of PCs throughout the world, calculations which were previously considered impossible have now become routine. FAH has targeted the study of protein folding and protein folding diseases, and numerous scientific advances have come from the project.

In 2006, we began looking forward to another major advance in capabilities. This advance utilizes the new, high-performance Graphics Processing Units (GPUs) from ATI to achieve performance previously only possible on supercomputers. With this new technology, as well as the new Cell processor in Sony's PlayStation 3, we sought to attain performance on the scale of 100 gigaflops per computer. With this new software and hardware, we pushed Folding@home a major step forward.

Our goal is to apply new technology to dramatically advance the capabilities of Folding@home, applying our simulations to further study of protein folding and related diseases, including Alzheimer's disease, Huntington's disease, and certain forms of cancer. With your help, coupled with new simulation methodologies to harness the new techniques, we will be able to address questions previously considered impossible to tackle computationally, and make even greater impacts on our knowledge of folding and folding related diseases.

Folding@home debuts with the Tinker core (October 2000)

In October 2000, Folding@home was officially released. The main software core engine was the Tinker molecular dynamics (MD) code. Tinker was chosen as the first scientific core due to its versatility and well laid out software design. In particular, Tinker was the only code to support a wide variety of MD force fields and solvent models. With the Tinker core, we were able to make several advances, including the first folding of a small protein starting purely from sequence (subsequently published in Nature).

A major step forward: the Gromacs core (May 2003)

After many months of testing, Folding@home officially rolled out a new core based on the Gromacs MD code in May 2003. Gromacs is the fastest MD code available, and likely one of the most optimized scientific codes in the world. By using hand-tuned assembly code and utilizing new hardware in many PCs and Intel-based Macs (the SSE instructions), Gromacs was faster than most MD codes by a factor of about 10x, and approximately 20x to 30x faster than Tinker (which was written for flexibility and functionality, but not for speed).

In 2003, Gromacs had limits to what it could do, and did not support many implicit solvent models, which played a key role in our folding simulations with Tinker. Thus, while Gromacs significantly sped up certain calculations, it was not a replacement for Tinker, and so the Tinker core continued to play an important role in the science of Folding@home. For these reasons, points for Gromacs WUs were set to be consistent with points for Tinker WUs. Moreover, we switched the benchmark machine to a 2.8 GHz Pentium 4 (from a 500 MHz Celeron) in order to allow us to fairly benchmark these types of WUs (as the benchmark machine needed to have hardware support for SSE).

The next major step forward: Streaming Processor cores (September 2006)

Much like the Gromacs core greatly enhanced Folding@home by a 20x to 30x speed increase via a new utilization of hardware (SSE) in PCs, in 2006, we developed a new streaming processor core to utilize another new generation of hardware: GPUs with programmable floating-point capability. By writing highly optimized, hand-tuned code to run on ATI X1900 class GPUs, the science of Folding@home will see another 20x to 30x speed increase over its previous software (Gromacs) for certain applications. This great speed increase is achieved by running essentially the complete molecular dynamics calculation on the GPU; while this is a challenging software development task, it appears to be the way to achieve the highest speed improvement on GPUs.

In addition, through collaboration with the Pande Group, Sony has developed an analogous core for the PS3's Cell processor (another streaming processor), which should see a significant speed increase for the science over the types of calculations we could previously do on an x86/SSE Gromacs core as well. Following what we did with the introduction of Gromacs, we will now switch benchmark machines and include an ATI X1900XT GPU in order to be able to benchmark streaming WUs (which cannot be run on non-GPU machines). This machine will also benchmark CPU units (which continue to be of value since GPUs work only for certain simulations) without using its GPU.

The second-generation GPU core, aka GPU2, for ATI hardware (April 2008)

After running the original GPU core for quite some time and analyzing its results, we have learned a lot about running GPGPU software. For example, it has become clear that a GPGPU approach via DirectX (DX) is not sufficiently reliable for what we need to do. We've also learned a great deal about GPU algorithms and improvements. One of the really exciting aspects of GPUs is that not only can they accelerate existing algorithms significantly, they can also open doors to new algorithms that we would never think to run on CPUs at all (because they are very slow on CPUs, but not on GPUs).

After much effort, we took all we learned about GPUs from the first-generation client and produced a second-generation client, GPU2. This core was much more technically sophisticated than the original, and it was faster, more reliable, easier to use, and far more capable scientifically. The results from it were very exciting.

The second-generation GPU core for NVIDIA (June 2008)

In collaboration with NVIDIA, we released a GPU2 core for NVIDIA hardware.

[Image: ATI Radeon™ 3870 X2]


General instructions

This web page will serve as the FAQ and Release Notes for this new client, and we will update this page as more information becomes available.

The FAH GPU Client installer should do everything one needs. It installs the new v6.x SysTray style client, as well as DLL files used by this new client. Download the client from the High Performance Client Download Page for folding experts. If you need a guide to help you get through the installation process, it can be found here.

Basic Requirements:

  • 2xxx/3xxx/4xxx/5xxx ATI Video Card, or newer
  • ATI driver v8.1 or newer (v8.3 or newer recommended), up to v9.2; v9.3 is not supported yet. Do not use OEM drivers.
  • 5xxx cards: v9.10 driver or newer, and you MUST use the -forcegpu ati_r700 switch (with client v6.23)
  • AGP GPU aperture size in the BIOS must be set to 128 MB or larger
  • Microsoft .NET Framework 2.0, with updates recommended
  • Windows operating system, XP or newer

This is a beta release and we expect there will be bugs, flaws, problems, etc. To minimize problems, we have been testing the client and cores extensively in house and they run well there. However, it's our experience that running in the controlled setup in our lab and running "out in the wild" are very different situations.

As in the use of any beta software, please make sure to back up your hard drive, and do not run this client on any machine which cannot tolerate even the slightest instability or problems.


Frequently Asked Questions (common to both ATI and NVIDIA GPU2 clients)

My points per day (PPD) varies significantly from project to project

GPUs differ from one another in many ways, and this leads to big swings in PPD when proteins of different sizes are simulated. Because we benchmark on a given machine, we can ensure that a machine similar to the benchmark machine sees no fluctuation in PPD. Machines that are very different from the benchmark machine can see big swings (33% is not unheard of, considering the large differences in hardware, such as the number of shaders, from GPU to GPU). This is particularly true for NVIDIA cards, which do very well on small proteins compared to the benchmark machine, but not nearly as well on larger proteins.
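As a rough illustration of where these swings come from, the sketch below assumes the points formula given later in this FAQ (Points = 1500 * DaysPerWU on the benchmark machine). The function name and all timings are hypothetical and were chosen only to show the arithmetic; they are not measurements from any real card.

```python
def ppd(benchmark_days, donor_days, points_per_benchmark_day=1500.0):
    """Points per day a donor machine earns on one work unit.

    benchmark_days: time the benchmark machine needed for the WU (days).
    donor_days:     time the donor's machine needs for the same WU (days).
    """
    points = points_per_benchmark_day * benchmark_days
    return points / donor_days


# Hypothetical donor card: 2x the benchmark speed on a small protein,
# but only about 1.33x the benchmark speed on a large one.
small = ppd(benchmark_days=0.10, donor_days=0.05)
large = ppd(benchmark_days=0.40, donor_days=0.30)

print(round(small), round(large))          # 3000 2000
print(f"{(small - large) / small:.0%}")    # 33%
```

A machine that matches the benchmark machine has equal benchmark and donor times for every WU, so under this formula its PPD stays at the same value from project to project.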

What about visualization?

In 2012 we completed visualization for the V7 client. It now can display a live 3D rendering of the protein you are simulating, which is animated using a series of snapshots. Click the "View" button at the top of the application. You can rotate around the image using your mouse, and zoom in and out using the mousewheel. Click the lifesaver ring on the right-hand side for a list of further controls.

This visualization is based on the Folding@home PS3 client. Thanks to ATI and NVIDIA for their help with this visualization, and to Adam Beberg for the main engine behind it. Joseph Coffland addressed a number of lingering significant issues with the viewer and brought the final product out in the V7 client.

What OSs does the new client/core support?

GPU folding is supported only in Microsoft Windows for now. Native support for Linux and OS X may be a possibility in the future, and it's something we're looking into.

Can I run the GPU as a service?

The service installation is not currently supported. Windows Vista and 7 do not present the driver interface to the service, so it would take a significant effort to make that work. Microsoft also has some security features in place that prevent applications, including Folding@home, from using the GPU on startup as a service. This is not the case for the CPU.

However, by default the V7 installer adds a link to the user's Startup folder, so the V7 client launches automatically when the user logs in. This can allow GPU folding to start automatically, although it's not a service.

Can I use my CPU to do calculations too?

For now, the GPU2 core uses the CPU a bit in addition to heavy use of the GPU. However, we hope to offload the calculation completely to the GPU in the future. In general, the CPU and GPU can each be given a separate WU to process.

We also have plans to develop a hybrid CPU-GPU core using OpenMM and new Long Timestep Molecular Dynamics (LTMD). See this blog post for more information.

How do I use flags with the GPU client?

The v6 and V7 clients both support flags. However, adding a flag works quite differently in V7. See the Configuration FAQ for more information on how to do this. Settings for both clients are persistent, so they will be preserved in the configuration settings for you.

What about multi-GPU support?

Yes, multiple GPU cards can be utilized for Folding@home. Each GPU card will be given a separate Work Unit to process. In the older v6 client, adding the "-gpu N" flag (N starts at 0) would tell the client which GPU to use for folding. In the newer V7 client, the installer should automatically detect a multi-GPU setup and configure itself to use all of the GPUs. Sometimes it has difficulties with mixed brands of GPUs. When this happens, our Guide pages may be helpful.

To adjust the configuration, change V7 to advanced mode using the drop-down menu in the upper right-hand corner, then under Configure -> Slots, edit the GPU slot and adjust the indices. If you need help, consider posting in our Folding Support Forum.

How do you decide the credit value of GPU work units?

Points are determined by the performance of a given machine relative to a benchmark machine, similar to the CPU client benchmark process. Before releasing any new project (series of work units), we benchmark it on a dedicated computer with an ATI Radeon 3850 GPU (512 MB, 320 Stream Processors), running in a Dell Inspiron 531, with a 2.16 GHz dual core AMD 64 X2 4000+.

We plug the results of this benchmark test into the following formula:

Points = 1500 * (DaysPerWU)

where DaysPerWU is the number of days it took the benchmark hardware to complete the work unit. Note that the GPU client still relies on a fast CPU, so the CPU is an important part of this. The Points Per Day (PPD) given here assumes that a CPU is heavily needed, with a larger PPD to compensate for the use of that CPU.
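As a worked example of the formula above, here is a minimal sketch. The function name and the six-hour benchmark time are made up for illustration; only the constant 1500 and the formula itself come from this FAQ.

```python
POINTS_PER_BENCHMARK_DAY = 1500  # constant from the formula above


def work_unit_points(benchmark_hours):
    """Credit for one WU, given how long the benchmark machine took (hours)."""
    days_per_wu = benchmark_hours / 24.0
    return POINTS_PER_BENCHMARK_DAY * days_per_wu


# Hypothetical WU that took the benchmark machine 6 hours (0.25 days).
print(work_unit_points(6.0))  # 375.0
```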

Please note that the very concept of a reference machine means that the performance of some WUs on your machine will differ from their performance on the reference machine. Even between various GPU models, there are significant differences in architectures and memory speeds. Moreover, there are variations between WUs within a given project which can lead to speed differences.

Our goal is consistency within a given definition of a reference machine setup (described above), but beyond that, the natural variation from machine to machine and WU to WU will never allow any point system to perfectly predict what you get on your machine.

Why is the GPU client important?

The purpose of the GPU client is twofold: to take advantage of the high-performance capabilities of Stream Processing, and to help develop a simulation architecture that will become one of the dominant FAH computing paradigms as multi-processor GPUs become an industry standard over the next several years. High-performance clients enable us to run types of calculations that would be impractical on our standard architecture--calculations that enhance our scientific capabilities, and your scientific contributions, significantly.

High-performance clients often require more computing resources. GPU clients typically run on dedicated systems, 24 hours a day, and use more processing power, more disk space, more network resources, more system memory, etc. Also, a major part of the scientific benefit is dependent on rapid turnaround of work units; hence we assign short deadlines for GPU work units. To reward those contributors for donating resources beyond the typical CPU client, for completing these work units very quickly within the short deadlines, and for contributing to the development of our next-generation capabilities, we currently set a benchmark value proportional to these demanding GPU work units. Without the GPU clients and your additional contributions, we would not be able to complete many important projects.

What's different between the GPU1 (first generation) and the GPU2 (second generation) client?

GPU1 taught us a great deal about running GPUs "in the wild," which is very different than in the lab, and what we've learned has gone into GPU2. GPU2 is more reliable scientifically and has more advanced algorithms. It's also significantly easier to run than GPU1.

Actions such as fast-user switching or locking your computer have no effect on GPU processing. Remote desktop does still affect the GPU client and will cause the FahCore to fail when a connection is initiated. VNC does not have the same problem and can be used as an alternative.

Note: The interaction of the GPU and the Operating System changes from one version of Windows to the next. So while the GPU client seems unaffected by Fast User Switching in Windows XP, the same is not true for newer versions of Windows. This may change again as drivers are updated, Operating Systems are patched, and/or the client eventually moves to OpenCL.

Can I still use my GPU when the client is running?

Yes. We have taken steps to try to prevent the GPU client from interfering with operations that use the GPU, such as watching videos or playing games, but this may not work to your satisfaction. The very process of trying to make the GPU client use all of the GPU's resources can cause video lag, because current-generation drivers send data to the GPU on a first-come, first-served basis rather than by priority, as would happen on a CPU. If you experience significant system impact, and you would like help resolving this, please visit our Folding Support Forum.

Troubleshooting

The client was working, but now all I'm getting are Early Unit Ends (EUE's). How can I fix this?

We've seen cases where playing GPU intensive games can leave the GPU in a weird state, leading to consistent EUE's (Early Unit End error messages). Restarting the computer has worked to resolve this problem. We are looking into a better solution.

My client gives an UNSTABLE_MACHINE error and is going to sleep for 24 hours! What should I do?

This error occurs once 10 EUEs have occurred. A machine that is rapidly hitting EUEs is a sign that the client needs some donor intervention to fix it. Please check out the FAQ below as well as the Folding Support Forum for details about how to fix a misconfigured client. This error typically results from a problem with drivers. Please see the instructions above for which drivers you should use for your hardware. Unfortunately, we cannot give more information from the client, since all the client knows is that it can't run CUDA; there are many possible reasons why, and there is currently no way for the core to detect them.

If your client has worked before, try restarting your machine, as that has also been shown to help. Restarting the client will reset the EUE counter.
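The behavior described in this answer can be modeled roughly as follows. This is only a sketch of the rule as stated here (10 EUEs trigger a 24-hour sleep, and restarting the client resets the counter); the class and method names are hypothetical, and the real client's internal logic, including whether anything else resets the counter, is not documented in this FAQ.

```python
from dataclasses import dataclass

EUE_LIMIT = 10     # from this FAQ: UNSTABLE_MACHINE appears after 10 EUEs
PAUSE_HOURS = 24   # from this FAQ: the client then sleeps for 24 hours


@dataclass
class EueTracker:
    """Hypothetical model of the EUE counter; not the client's real code."""
    count: int = 0

    def record_eue(self):
        """Count one Early Unit End; return True if the client would now sleep."""
        self.count += 1
        return self.count >= EUE_LIMIT

    def restart_client(self):
        """Restarting the client resets the EUE counter."""
        self.count = 0


tracker = EueTracker()
paused = [tracker.record_eue() for _ in range(EUE_LIMIT)]
print(paused[-1])      # True: the tenth EUE triggers the 24-hour sleep
tracker.restart_client()
print(tracker.count)   # 0
```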



Issues specific to GPU2/ATI

I'm having problems with Vista/Win7, any ideas?

We have seen reports that the GPU client only works well in Vista/Win7 when run in "XP Compatibility Mode" or with "Run as Admin." We suggest trying this out if you are having problems with the ATI client in Vista/Win7. We are investigating workarounds.

Does the new GPU client run the same WUs?

No, this new second generation GPU client will run a different set of WUs specially constructed for the Fahcore_11.exe functionality. Fahcore_11 will not run with the first generation GPU client, and Fahcore_10 will not run with this new client.

What about multi-gpu support and the -gpu switch?

Running multiple GPU2 clients, one client each on multiple GPU cards, is supported through the -gpu x command line switch. The setup is similar to running multiple SysTray CPU clients.

  • Copy your \Application Data\Folding@home-gpu folder to a new folder \Folding@home-gpu2 (\AppData\Roaming\Folding@home-gpu in Vista/Win7)
  • Create a new shortcut for the first client, and be sure to use the correct Target: and Start In: information. Note that one has to be very careful with shortcuts, and in particular, make sure that the "Start in:" field is set correctly. If you are having problems with automatic core upgrades, it is likely that your shortcut is not set up correctly.
  • Edit the shortcut properties to add the -gpu 0 switch to the end of the Target: field.
  • Create a new shortcut for the second client, and be sure to use the correct Target: and Start In: information.
  • Edit the shortcut properties to add the -gpu 1 switch to the end of the Target: field.

Except for the different -gpu x switch, the Target: field in both shortcuts will point to the same FAH executable. The Start In: field for each client will point to its own \Apps Data\FAH folder. The Target: and Start In: fields for a SysTray client are explained in more detail below.

The display must be active on the GPU card you plan to use. The -gpu 0 switch selects the first board, -gpu 1 the second board, -gpu 2 the third board, and so on. You will need to extend the desktop for multiple boards to be detected. You will also need to use a different Machine ID for each client. Currently, only one client is supported on a 3850X2 or 3870X2.
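The shortcut setup described above can be sketched as follows. The snippet only builds the text for the Target: and Start In: fields; it does not create shortcuts or start the client. The executable name and both folder paths are placeholders (assumptions), so substitute the ones from your own installation. The -gpu N switch is the only part taken from this FAQ.

```python
def shortcut_fields(exe_path, work_dirs):
    """Return one (Target, Start In) pair per GPU client.

    Every client gets the same executable, its own folder, and its own
    -gpu index (0 for the first board, 1 for the second, and so on).
    """
    if len(set(work_dirs)) != len(work_dirs):
        raise ValueError("each client needs its own Start In folder")
    return [
        (f'"{exe_path}" -gpu {index}', folder)
        for index, folder in enumerate(work_dirs)
    ]


# Placeholder paths; the executable name here is an assumption.
EXE = r"C:\Program Files\Folding@home\Folding@home-gpu\Folding@home.exe"
DIRS = [
    r"C:\Users\donor\AppData\Roaming\Folding@home-gpu",
    r"C:\Users\donor\AppData\Roaming\Folding@home-gpu2",
]

for target, start_in in shortcut_fields(EXE, DIRS):
    print("Target:  ", target)
    print("Start In:", start_in)
```

Remember that each client also needs its own Machine ID, which this sketch does not set.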

More details can be found in the Windows GPU Guide.

What's different between the old and the new FAH GPU client?

Scientifically, the new client introduces several new advances which make it much more useful. It matches the advanced water models in the PS3 client and adds a new one (which will likely appear in a future PS3 client). These more advanced water models make this new GPU client very useful to us.

There are also many changes under the hood. The first generation client proved to be problematic due to GPU-specific issues, and we've fixed all of them (as far as we can tell) in this second generation client. An important part of these fixes is using ATI's CAL instead of DirectX (the previous generation GPU client highlighted several issues with using DirectX). A major upside to using CAL is that DirectX context switches no longer affect the client. Actions such as fast-user switching or locking your computer have no effect on GPU processing. Remote desktop does still affect the GPU client and will cause the FahCore to fail when a connection is initiated. VNC does not have the same problem and can be used as an alternative.

Note: The interaction of the GPU and the Operating System changes from one version of Windows to the next. So while the GPU client seems unaffected by Fast User Switching in Windows XP, the same is not true for newer versions of Windows. This may change again as drivers are updated, Operating Systems are patched, and/or the client eventually moves to OpenCL.

Initially, this new client will be a SysTray style client only. A console version may follow later.

Can I still use my GPU when the client is running?

Yes. Unlike the original GPU client, which interfered with many operations that used the GPU, the new CAL-based client does not. Playing videos and playing games either have no effect on the action of the GPU client other than a slow-down in processing, or cause a temporary suspension of folding. The new client will automatically back off whenever an application requests exclusive DirectX mode, although it is not reported in the client logfile. DirectX programs that do not request exclusive mode will cause the GPU client to slow down, and may in some instances have a detrimental effect on application performance. Full screen video is unaffected by the GPU client.

What hardware does the new client/core support?

The client runs on all hardware supported by the AMD Compute Abstraction Layer (CAL), i.e. R6xx hardware, Radeon 2400 and above. The 3870 X2 has not been tested with both cores active, but running on 1 core does currently work.

What OSs does the new client/core support?

The client runs on Windows XP/Windows 2003 32-bit/64-bit and Vista/Win7 32-bit/64-bit. Windows XP should have SP2 installed. Win XP/2003 vs. Vista/7 use different CAL DLLs (see below), but the installer should install the correct ones for you.

What about hardware clocks?

On 3xxx hardware or newer, 3D clocks will be set automatically when FAH runs, and you can adjust the clock rates for core and memory in the Overdrive panel of Catalyst Control Center. Note that stable clocks for graphics may not imply stable clocks for Folding; overclock at your own risk. On 2xxx hardware, switching to 3D clocks is not reliably automatic, so a third-party tool like ATI Tray Tool can be used to adjust clocks. Once again, user beware. The recommendation is to leave the settings alone and fold at the clocks set by the driver.

How about AGP vs PCIe slots?

Performance of the GPU client is best with the board in a PCIe x16 slot. An x8 or x4 slot will cause some degradation in performance, as the communication path between the CPU and GPU will be slower. AGP hardware is supported, but overall performance will be lower than with PCIe boards because of the slower CPU to GPU connection. PCIe v2.0 is only marginally faster than PCIe v1.x.

The client displays an error saying that I do not have a supported GPU, but I do!!!

When the client displays the error "At present, only ATI Radeon HD 2xxx/3xxx/4xxx/5xxx and ATI FireGL Vx6xx GPUs are supported", the client did not recognize your GPU card. If you have a supported 2xxx/3xxx/4xxx model GPU, this error is most likely caused by custom device drivers that do not work with the GPU2 client. Some OEMs and some laptop vendors modify the Catalyst drivers that ship with their cards. Download and install the latest ATI Catalyst drivers.

At present, the primary video device must also be a supported 2xxx/3xxx/4xxx/5xxx model, or you may see this error as well. The error "CoreStatus = FFFFFFFF (-1)" is also a symptom. Changing the ATI card to be the primary display device will resolve this.

In the 6.23 client version, 5xxx series GPUs must use a v9.10 driver or newer, and must use the -forcegpu ati_r700 switch.
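One easy mistake with this rule is comparing driver versions as decimal numbers. The sketch below treats a Catalyst version as a (major, minor) pair, so v9.10 counts as newer than v9.2. The function names are hypothetical, and the check encodes only the rule stated in this answer, nothing about how the client itself detects hardware.

```python
def parse_catalyst_version(text):
    """Turn 'v9.10' or '9.10' into the pair (9, 10)."""
    major, minor = text.lstrip("vV").split(".")
    return int(major), int(minor)


def driver_new_enough(driver, minimum="9.10"):
    """True if the driver version is at least the required minimum."""
    return parse_catalyst_version(driver) >= parse_catalyst_version(minimum)


def needs_forcegpu(gpu_series, client_version):
    """Per this FAQ: 5xxx cards on client 6.23 need -forcegpu ati_r700."""
    return gpu_series == "5xxx" and client_version == "6.23"


print(driver_new_enough("v9.2"))    # False: (9, 2) is older than (9, 10)
print(driver_new_enough("v9.10"))   # True
print(float("9.10") > float("9.2")) # False: why decimal comparison misleads

if needs_forcegpu("5xxx", "6.23"):
    print("add switch: -forcegpu ati_r700")
```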

The core can't find the DLL's!

We've been seeing some unusual behavior with virus scanners. We are looking into this. For now, give it a second try and it should work.

A DLL error dialog box is popping up -- what's up with that?

If the DLL error pops up, go to the installed location (C:\Program Files\Folding@home\Folding@home-gpu by default) and make sure that amdcalcl.dll and amdcalrt.dll ended up there alongside the FahCore_11.exe file. Then run the client from that location.
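If you would rather check this from a script than by eye, a minimal sketch follows. The file names and the default folder are the ones given in this answer; the function name is hypothetical, and the script only reports what is missing without changing anything.

```python
from pathlib import Path

# File names taken from this FAQ answer.
REQUIRED_FILES = ("amdcalcl.dll", "amdcalrt.dll", "FahCore_11.exe")


def missing_files(install_dir):
    """Return the required files that are not present in install_dir."""
    folder = Path(install_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Default install location from this FAQ; adjust if you installed elsewhere.
    default_dir = r"C:\Program Files\Folding@home\Folding@home-gpu"
    missing = missing_files(default_dir)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files are present.")
```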

What's with all the new DLL's anyway?

We're using a new system (CAL) which uses a few DLL's. We are looking into the possibility of statically linking the whole thing to avoid DLL issues, but for now we've got DLL's.


Who did all of this anyway?

In alphabetical order:

  • Adam Beberg (Pande Lab): client modifications, GPU's APIs under the hood
  • Dan Ensign (Pande Lab): server setup, science, testing
  • Mark Friedrichs (Pande Lab, Simbios): core science code updates, testing
  • Mike Houston (AMD): testing, problem solving, GPU tuning
  • Vijay Pande (Pande Lab): Project management, fitting square pegs through round holes, etc
  • We would also like to thank the Folding@home Community Forum moderators for their help with this FAQ and some early beta testing of the software.



Last Updated on August 10, 2012, at 09:09 PM