Message boards : Number crunching : New FERMI GPU, 4x more cores, more memory
Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0
Nvidia has now announced a new architecture with on-chip L1 and L2 cache memory to support more memory-intensive applications: FERMI
Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?
Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0
Here's a nice article talking about why protein folding is an important and challenging field of study and how some are using GPUs for such atomic modeling. Probing Biomolecular Machines with Graphics Processors. (be sure to click the settings item on the menu bar to format to a readable size)
zpm Joined: 21 Mar 09 Posts: 6 Credit: 349,801 RAC: 0
Here's a nice article talking about why protein folding is an important and challenging field of study and how some are using GPUs for such atomic modeling.
Drug Discovery@Home is in the process of trying to get GPUs (ATI and Nvidia) working, and hopefully multi-threading like AQUA@home has, but it's a slow work in progress with one man doing 95% of the work; Ageless is working on the ATI app. http://boinc.drugdiscoveryathome.com If you're interested in helping us, PM me for an invite code.
Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0
Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0
Ahhhh.... congratulations to you!
zpm Joined: 21 Mar 09 Posts: 6 Credit: 349,801 RAC: 0
Ahhhh.... college algebra is tough. Maybe you and I should hook up and see what/where we are... my quarter just began.
Joined: 16 Jun 08 Posts: 1235 Credit: 14,372,156 RAC: 382
Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?
Not much of a dent, since minirosetta currently requires about 500 MB of memory to run on just one processor, a few hundred times as much as that cache. If you plan to use all the cores in order to get the maximum speedup, multiply 500 MB by the number of cores to get the approximate amount of memory needed on the GPU card, at least without a major, and therefore rather slow, rewrite of the program. It should be easier to start with a version that runs minirosetta on only as many cores as there is enough memory for, with one more core used to combine the various data streams. Much less of a speedup, but still some.
Also, it looks to me like the compilers that allow the use of languages such as C++ and Fortran to prepare the computer code are likely to only prepare it for cards with the new GT300 series of GPU chips, and not the GPU cards sold in the past. Something to ask Nvidia about, at least.
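For illustration, a minimal sketch of the arithmetic described above. The 500 MB per-task figure, the card's memory size and the core count are assumptions taken from this discussion, not measured values:

```cpp
// Back-of-the-envelope estimate: how many independent minirosetta-style tasks
// fit in GPU memory if each needs its own ~500 MB working set?
// All figures below are illustrative assumptions, not measurements.
#include <cstdio>

int main() {
    const double mbPerTask   = 500.0;   // assumed per-simulation footprint
    const double gpuMemoryMb = 1536.0;  // hypothetical 1.5 GB Fermi-class card
    const int    gpuCores    = 512;     // Fermi-class core count

    int tasksThatFit = static_cast<int>(gpuMemoryMb / mbPerTask);
    printf("Cores available: %d\n", gpuCores);
    printf("Independent tasks that fit in memory: %d\n", tasksThatFit);
    printf("Memory needed to keep every core busy with its own task: %.0f MB\n",
           gpuCores * mbPerTask);
    return 0;
}
```

On those assumed numbers only about three independent tasks fit, which is why the posts in this thread suggest either sharing one copy of the data or using only a few of the cores.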
Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0
Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?
Wouldn't writing to RAM be useful?
Joined: 3 Nov 05 Posts: 1833 Credit: 120,031,044 RAC: 7,827
I believe the Baker Lab guys had to do some rewriting of Rosetta to get it working efficiently on Blue Waters (or was it a different machine?), which I would assume meant getting it to run a single task in parallel rather than having each CPU run a separate simulation (otherwise it wouldn't have made use of the fact that the CPUs could communicate with each other, which I believe was the whole point of using a supercomputer rather than BOINC). If I'm right (long odds!) then I'd guess that'd be the way to go for GPGPU as well, since you only need one copy of the protein in RAM then (rather than one per core). I can't begin to imagine where you'd start with getting the cores to work on the same task together, though.
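As a hedged illustration of that idea (this is not Rosetta's actual code; the energy function, sizes and names are made up), here is a toy CUDA kernel where many GPU threads cooperate on one scoring pass over a single shared copy of the coordinates, instead of each core holding its own full working set:

```cpp
// Toy sketch: many GPU threads score ONE conformation together, all reading
// the same copy of the atom coordinates in device memory.
// The "energy" is a made-up inverse-distance sum, purely illustrative.
// Note: atomicAdd on float needs a Fermi-class (sm_20) or newer GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void toyEnergy(const float3* coords, int nAtoms, float* energy) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nAtoms) return;

    float local = 0.0f;
    for (int j = i + 1; j < nAtoms; ++j) {   // each thread takes a slice of the pairs
        float dx = coords[i].x - coords[j].x;
        float dy = coords[i].y - coords[j].y;
        float dz = coords[i].z - coords[j].z;
        float r2 = dx * dx + dy * dy + dz * dz + 1e-6f;
        local += 1.0f / r2;                  // stand-in for a real score term
    }
    atomicAdd(energy, local);                // combine the partial sums
}

int main() {
    const int nAtoms = 1024;                 // hypothetical protein size
    float3* dCoords = nullptr;
    float*  dEnergy = nullptr;
    cudaMalloc(&dCoords, nAtoms * sizeof(float3));
    cudaMalloc(&dEnergy, sizeof(float));
    cudaMemset(dCoords, 0, nAtoms * sizeof(float3));  // dummy coordinates
    cudaMemset(dEnergy, 0, sizeof(float));

    toyEnergy<<<(nAtoms + 255) / 256, 256>>>(dCoords, nAtoms, dEnergy);

    float energy = 0.0f;
    cudaMemcpy(&energy, dEnergy, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Toy energy: %f\n", energy);

    cudaFree(dCoords);
    cudaFree(dEnergy);
    return 0;
}
```

The point is only that the coordinates are stored once and every thread reads them, so memory use doesn't multiply with the number of cores; the hard part, as the post says, is deciding how to split the real work between threads.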
Joined: 16 Jun 08 Posts: 1235 Credit: 14,372,156 RAC: 382
Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?
Yes, but just how useful is being able to write to an extra amount of memory that is less than one percent of what minirosetta needs to run on one processor, without a major rewrite of the program? The total memory I referred to IS RAM, with a significant slowdown if swapping to the hard drive is used instead. Reaching the hard drive typically takes over a hundred times as long as reaching RAM.
Sharing the sections of the database that contain the same values regardless of which GPU core uses them is a good second step, although I'd assume that's a rather small fraction of the total amount of memory each processor needs.
I believe the Milkyway@home project has found a way to get some GPU acceleration without a major rewrite of a program with high memory-per-processor requirements: just don't try to get the maximum speedup by using all the GPU cores. Instead, use only as many as there is enough graphics board memory for. Much less speedup, but it allows getting at least some with much less work for the project team.
Some people might consider rewriting a computer program in a different computer language, even without rearranging its database, to be a major rewrite. However, even though this is required with current versions of the compiler software, it should be less of a major rewrite than rewriting the program in a different language and making a drastic rearrangement of the database at the same time.
The new compilers Nvidia is planning to release in the next few weeks should reduce the amount of effort needed for such a rewrite; but Nvidia hasn't made it very clear whether those new compilers will work with the chips they've sold in the past as well as the new GT300, or only the new GT300 series chips.
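A minimal sketch of that Milkyway-style approach, assuming roughly 500 MB per simulation as discussed above: ask the CUDA runtime how much memory the card actually has free and size the number of concurrent tasks from that, rather than from the core count.

```cpp
// Size the number of concurrent simulations from free GPU memory,
// not from the number of GPU cores. The 500 MB figure is the
// per-task assumption used earlier in this thread.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        printf("No usable CUDA device found.\n");
        return 1;
    }

    const size_t bytesPerTask = 500ull * 1024 * 1024;   // assumed footprint
    size_t concurrentTasks = freeBytes / bytesPerTask;  // may well be 0 or 1

    printf("GPU memory: %zu MB total, %zu MB free\n",
           totalBytes >> 20, freeBytes >> 20);
    printf("Simulations that fit at once: %zu\n", concurrentTasks);
    return 0;
}
```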
mikey Joined: 5 Jan 06 Posts: 1896 Credit: 10,138,586 RAC: 25,558
The new compilers Nvidia is planning to release in the next few weeks should reduce the amount of effort needed for such a rewrite; but Nvidia hasn't made it very clear whether those new compilers will work with the chips they've sold in the past as well as the new GT300, or only the new GT300 series chips.
Maybe it's time to upgrade?! Actually, it may be time to have multiple versions available depending on which generation of GPU a user has. That could even lead to different types of units being made available depending on the GPU: high-end GPUs can crunch all units, lower-end ones can only crunch some units. In short, keep what they have and add to it, not replace it. Yes, that could be a whole lot more work down the road support-wise, but it should be a bit easier in the short term. And then as time goes on and more and better GPUs become available and more popular (the 400s?), the pre-300 ones could be dropped. Each project kind of does this already, although they do it with the CPU and type of OS, i.e. Mac, Linux, Windows, etc. Microsoft has always said that keeping Windows backwards compatible has been the sticking point to making Windows all it can be. Keep making units like they do now, just make the new version and new units just for it. Maybe even make the new units better and more detailed as far as the research end goes, to take advantage of the new cards' capabilities.