Rosetta@home using AVX / AVX2 ?

Message boards : Number crunching : Rosetta@home using AVX / AVX2 ?

Profile Dr. Merkwürdigliebe

Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 77541 - Posted: 4 Oct 2014, 9:21:47 UTC

Hi everybody,

I just wanted to ask if there are plans to use AVX or AVX2 or possibly even the coming AVX-512 in Rosetta?

I heard there is not much sense in using GPUs to crunch but AVXx could really speed things up.

It's certainly true that in order to gain the full speedup, you would need to rewrite parts of that program but compiling with the appropriate compiler flags should still give you some performance advantage without changing the code.

It's sort of sad to see that those instructions lie dormant and unused.

I think Folding@Home already supports AVX through gromacs. Why not Rosetta?
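To make the "just recompile with the right flags" idea concrete, here is a minimal sketch of the kind of loop a compiler can usually vectorize on its own. The function name `scaled_add` and the file name below are made up for illustration and have nothing to do with the Rosetta source. With GCC, something like `gcc -O3 -mavx2 -c scaled_add.c` asks for AVX2 code, and `-fopt-info-vec` should report which loops were vectorized. Whether real Rosetta loops are this friendly is an open question.

```c
#include <stddef.h>

/* y[i] += a * x[i]: a simple loop that compilers can usually auto-vectorize.
 * `restrict` promises that x and y do not overlap, which removes one common
 * obstacle to vectorization. (Illustrative function, not Rosetta code.) */
void scaled_add(double a, const double *restrict x, double *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

The source stays plain C; only the compiler flags decide whether SSE2, AVX or AVX2 instructions are emitted. A binary built this way will only run on CPUs that support the selected instruction set.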
ID: 77541

Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 77542 - Posted: 4 Oct 2014, 11:30:17 UTC - in response to Message 77541.  

Here is a conversation from a thread about Android development. David E K, one of the project administrators, doesn't talk about AVX but replies to a participant's query that updates to the server and the current application are the immediate priority.

David E K wrote:
Yes, there are definitely issues with android and boinc apps. The main issues now I believe are with the BOINC client and current android versions which put background processes to sleep. For now, I am not going to spend much time on our android version until they fix this issue. The motivation for an android arm version has come from BOINC and their partnership with HTC power to give. Samsung is also interested in running R@h on their phones.

VENETO boboviz wrote:
What's next? Update server side? Avx/Avx2? :-)

David E K wrote:
Probably server updates including software and hardware. Also, there's been some recent large scale code changes/refactoring of Rosetta so our next application update may not be trivial.

ID: 77542
Profile Dr. Merkwürdigliebe

Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 77543 - Posted: 4 Oct 2014, 14:16:12 UTC - in response to Message 77542.  

Here is a conversation from a thread about Android development. David E K, one of the project administrators, doesn't talk about AVX but replies to a participant's query that updates to the server and the current application are the immediate priority.


Thanks for the info. I figure the use of AVXx would be a nice task for ralph@home.

All they need to do is to provide a binary compiled with the appropriate flags.

It either works or it doesn't. ;-)

IMHO this should take much more precedence than getting Rosetta to work on Android.
ID: 77543
Profile Chilean

Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 77551 - Posted: 6 Oct 2014, 17:56:34 UTC

Good to hear they are thinking of updating their server code... because it is ANCIENT.
ID: 77551
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 77553 - Posted: 6 Oct 2014, 18:10:48 UTC - in response to Message 77543.  

Thanks for the info. I figure the use of AVXx would be a nice task for ralph@home.

All they need to do is to provide a binary compiled with the appropriate flags.

It either works or it doesn't. ;-)

IMHO this should take much more precedence than getting Rosetta to work on Android.

I'm not familiar with AVXx. I believe we'd have to upgrade our compiler versions which isn't much of an issue (depending on how well/easy Rosetta ports). But would it crash on non-compatible machines?
ID: 77553
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 2035
Credit: 10,415,785
RAC: 14,307
Message 77558 - Posted: 7 Oct 2014, 6:28:53 UTC - in response to Message 77553.  
Last modified: 7 Oct 2014, 6:29:42 UTC

I'm not familiar with AVXx.

On the Intel and AMD developer sites there are a lot of docs, examples, etc. :-)

But would it crash on non-compatible machines?

Why? Other projects use SSE/AVX with a scheduler that correctly assigns work based on CPU capabilities.
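On the "would it crash" question: a program can also ask the CPU at run time what it supports before choosing a code path. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available. The function name `best_simd_level` is made up for illustration, and on other compilers or architectures the sketch just reports "none".

```c
/* Returns the best SIMD level the running CPU reports, as a short string.
 * __builtin_cpu_supports is a GCC/Clang builtin for x86 targets; elsewhere
 * this sketch falls through and reports "none". */
const char *best_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) return "avx2";
    if (__builtin_cpu_supports("avx"))  return "avx";
    if (__builtin_cpu_supports("sse2")) return "sse2";
#endif
    return "none";
}
```

A server-side scheduler can make the same decision from the CPU features the client reports and only send AVX binaries to hosts that can run them. Either way an AVX build never has to execute on a machine without AVX.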
ID: 77558
Profile Dr. Merkwürdigliebe

Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 77559 - Posted: 7 Oct 2014, 18:11:46 UTC - in response to Message 77553.  

I'm not familiar with AVXx. I believe we'd have to upgrade our compiler versions which isn't much of an issue (depending on how well/easy Rosetta ports). But would it crash on non-compatible machines?

Hi there and thanks for your reply. I have to confess I'm not really an expert on these things.

I think you do have to upgrade your compiler to a fairly recent version in order to take advantage of the new extensions. Unless you specifically compile for a certain architecture (as in -march=core-avx2), the binary will just use a different code path, resulting in a larger binary.
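As far as I know, a plain `-march=...` build does not add a second code path by itself. A fallback path normally has to be asked for, either with compiler features such as function multiversioning or by hand. Here is a minimal hand-written sketch of "one binary, two code paths", assuming GCC or Clang on x86. The function names are hypothetical. It uses 256-bit AVX double-precision intrinsics and falls back to scalar code when the CPU lacks AVX.

```c
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1

/* AVX path: AVX code generation is enabled for this one function only,
 * so the rest of the binary still runs on older CPUs. */
__attribute__((target("avx")))
static void scale_avx(double *v, double s, size_t n)
{
    const __m256d vs = _mm256_set1_pd(s);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {            /* 4 doubles per iteration */
        __m256d x = _mm256_loadu_pd(v + i);
        _mm256_storeu_pd(v + i, _mm256_mul_pd(x, vs));
    }
    for (; i < n; ++i) v[i] *= s;           /* leftover elements */
}
#endif

/* Portable fallback path. */
static void scale_scalar(double *v, double s, size_t n)
{
    for (size_t i = 0; i < n; ++i) v[i] *= s;
}

/* Public entry point: picks a path at run time. */
void scale(double *v, double s, size_t n)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx")) {
        scale_avx(v, s, n);
        return;
    }
#endif
    scale_scalar(v, s, n);
}
```

The binary does get larger, since both versions of the hot loop are present, but only the hot loops need to be duplicated this way.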

But then again, I'm not sure. I'm also aware that in the past there were ludicrous expectations concerning these new CPU extensions, e.g. MMX and 3DNow!.

But I think this time with AVX2 it will be different.

If you have some time to spare, you should read the relevant thread on Anandtech. The user Benchpress goes to some length to explain what the use of AVX2 can do to the performance of your code.

Thread about AVX2

Again, this should have much more precedence than running rosetta@home on a tablet.
ID: 77559
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 2035
Credit: 10,415,785
RAC: 14,307
Message 77564 - Posted: 8 Oct 2014, 15:10:05 UTC - in response to Message 77559.  

If you have some time to spare, you should read the relevant thread on Anandtech. The user Benchpress goes to some length to explain what the use of AVX2 can do to the performance of your code.

Some programs are 20% faster with AVXx, others 40% (!!), others 10%; it depends on the code...
Here are some docs/tools about AVX/AVX2:
First program with Avx2
Processing arrays with Avx2
CodeXL benefits

As I said, there are a lot of tools, docs, and examples.
ID: 77564

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77835 - Posted: 17 Jan 2015, 1:50:15 UTC
Last modified: 17 Jan 2015, 2:05:23 UTC

yup i'd think AVX / AVX2 is a good thing, actually this is very similar (or of the same nature) to the GPU request threads, i.e. to exploit vectorized CPU or GPU functionality to significantly accelerate computations

the thing is that it may involve some code rewrites, which it seemed has been deemed 'hard to do'? :o lol

AVX / AVX2 can process 4 x 64-bit double precision floats with a single instruction; on a naive basis against non-vectorized code, it would imply up to 4 times the speedup per cpu core. but in practice i'd think the speedup may not really reach that scale, as many of today's CPUs are superscalar (they feature instruction level parallelism for non-vector code) and it's likely that not all pieces of code can be parallelized
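a rough way to put numbers on this is an Amdahl's-law style estimate: if only a fraction of the run time is in vectorizable code, the overall speedup is capped well below 4x. the sketch below is only that estimate. the function name and the example fractions are assumptions for illustration, not measurements of Rosetta.

```c
/* Amdahl-style estimate of overall speedup when a fraction of the run time
 * (vector_fraction, between 0 and 1) is sped up by a factor of `lanes`
 * (e.g. 4 for 4 doubles per AVX instruction) and the rest is unchanged.
 * Illustrative model only; real code rarely reaches the ideal lane count. */
double estimated_speedup(double vector_fraction, double lanes)
{
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes);
}
```

for example, if half of the time were spent in perfectly vectorized loops, `estimated_speedup(0.5, 4.0)` gives 1.6, and only `estimated_speedup(1.0, 4.0)` reaches the full 4.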

as for GPUs, the very *high end / expensive* cards are said to be able to process many times that. (unfortunately GPUs are not consistent in this respect: a lot of GPUs use software emulation for double precision float computation, which cuts that GPU prowess to 1/8 of it or worse). note also that desktop GPUs are normally clocked at about 1 GHz, which is some 1/3 of today's CPU clock frequencies (e.g. 3-4 GHz)

link to GPU thread discussion:
ID: 77835
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 2035
Credit: 10,415,785
RAC: 14,307
Message 77840 - Posted: 17 Jan 2015, 7:26:09 UTC - in response to Message 77835.  
Last modified: 17 Jan 2015, 7:27:00 UTC

the thing is that it may involve some code rewrites, which it seemed has been deemed 'hard to do'? :o lol

I know that Rosetta's admins haven't tried to use the AVX extensions.
I know they tried to use android and it was a waste of time.
So, why not try avx?

AVX / AVX2 can process 4 x 64-bit double precision floats with a single instruction; on a naive basis against non-vectorized code, it would imply up to 4 times the speedup per cpu core. but in practice i'd think the speedup may not really reach that scale

Even a simple 10% more per core is a BIG gain!! :-)
ID: 77840

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77842 - Posted: 18 Jan 2015, 4:21:40 UTC - in response to Message 77840.  
Last modified: 18 Jan 2015, 4:59:44 UTC

the thing is that it may involve some code rewrites, which it seemed has been deemed 'hard to do'? :o lol

I know that Rosetta's admins haven't tried to use the AVX extensions.
I know they tried to use android and it was a waste of time.
So, why not try avx?

AVX / AVX2 can process 4 x 64-bit double precision floats with a single instruction; on a naive basis against non-vectorized code, it would imply up to 4 times the speedup per cpu core. but in practice i'd think the speedup may not really reach that scale

Even a simple 10% more per core is a BIG gain!! :-)

actually that's *almost* the same as optimizing the programs for GPUs, as a common technology based on 'higher level' languages that's optimised to vector cpu computation be they AVX/AVX2 or vector GPU cores is OpenCL and CUDA.

the thing is that part of the rosetta commons code would need to be rewritten / redesigned to use OpenCL. And in addition, the *compiled* target binaries would certainly be *platform specific* (i.e. they differ between the Intel, AMD and Nvidia platforms). However, apparently OpenCL uses some just-in-time methods where the code is basically stored as text scripts and compiled at run time by the specific platform.

note the other issue is that there are specific bindings / libraries / SDKs for each platform, hence it may mean quite a lot more maintenance, as there would at least be a need to target the different runtime OpenCL platforms (and even the underlying hardware CPU/GPU platforms - they are different after all). it may mean needing to maintain multiple versions of the rosetta code even if OpenCL is used.
ID: 77842
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 2035
Credit: 10,415,785
RAC: 14,307
Message 77856 - Posted: 26 Jan 2015, 10:47:12 UTC

ID: 77856
Profile Chilean

Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 78060 - Posted: 24 Mar 2015, 11:03:08 UTC

Any news on this? 200 TFlops (which is probably a bad estimate) is starting to look a bit low!
ID: 78060
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 78061 - Posted: 24 Mar 2015, 17:54:13 UTC - in response to Message 77840.  

I know that Rosetta's admins haven't tried to use the AVX extensions.
I know they tried to use android and it was a waste of time.
So, why not try avx?

We do have a somewhat stable Android build, but Android 5 threw me a curve ball with the requirement of PIE, and unfortunately it's not so easy to build Rosetta with PIE even though they say it just requires the -fPIE -pie compile/link options etc... Yes, it compiles and links, but it segfaults and debugging has been tough. Such is the case sometimes when things are said to be easy but in practice it can be a different story.

It has been on the backburner as with avx etc due to other research related priorities, for example, we have been invited to write 3 papers for the CASP11 meeting and I'm also in the process of making the builds based on current Rosetta source.
ID: 78061
Profile Chilean

Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 78063 - Posted: 25 Mar 2015, 21:10:50 UTC - in response to Message 78061.  
Last modified: 25 Mar 2015, 21:11:55 UTC

We do have a somewhat stable Android build, but Android 5 threw me a curve ball with the requirement of PIE, and unfortunately it's not so easy to build Rosetta with PIE even though they say it just requires the -fPIE -pie compile/link options etc... Yes, it compiles and links, but it segfaults and debugging has been tough. Such is the case sometimes when things are said to be easy but in practice it can be a different story.

It has been on the backburner as with avx etc due to other research related priorities, for example, we have been invited to write 3 papers for the CASP11 meeting and I'm also in the process of making the builds based on current Rosetta source.

I was just watching the video posted on the front page, you're aging really well! (compared to the Rosetta@home promo video).

Is it possible to release the code as open-source and have two versions of it (one proprietary and one open-source)? Open-source development could really help with things like this, especially when you're short on coders and/or time.

EDIT: Profile pictures are not loading when updated :( (for example, mine)
ID: 78063
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 78068 - Posted: 27 Mar 2015, 19:48:58 UTC - in response to Message 78063.  

I was just watching the video posted on the front page, you're aging really well! (compared to the Rosetta@home promo video).

Is it possible to release the code as open-source and have two versions of it (one proprietary and one open-source)? Open-source development could really help with things like this, especially when you're short on coders and/or time.

EDIT: Profile pictures are not loading when updated :( (for example, mine)

I'll check up on this profile picture bug. Don't know why that's happening. This is David Kim not David Baker :)

The Rosetta source is freely available to academics. Source development, however, is limited to RosettaCommons developers/researchers; institutions/groups can join if they agree to the UW RosettaCommons terms and, I believe, share the same research interests. You can check out the site for more info.
ID: 78068
Profile Chilean

Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 78086 - Posted: 2 Apr 2015, 19:59:23 UTC - in response to Message 78068.  

I'll check up on this profile picture bug. Don't know why that's happening. This is David Kim not David Baker :)

The Rosetta source is freely available to academics. Source development, however, is limited to RosettaCommons developers/researchers; institutions/groups can join if they agree to the UW RosettaCommons terms and, I believe, share the same research interests. You can check out the site for more info.

Ah, well, sorry for the mix up.

It was just an idea to help boost R@H's FLOPS. Doubt it's that easy to implement AVX just like that though.

ID: 78086
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 2035
Credit: 10,415,785
RAC: 14,307
Message 78165 - Posted: 28 Apr 2015, 12:44:55 UTC - in response to Message 78086.  

Doubt it's that easy to implement AVX just like that though.

Yeap not easy, but there are tools/documentation that help, like this:
Intel Intrinsics Guide

ID: 78165

Joined: 22 Nov 10
Posts: 274
Credit: 23,407,489
RAC: 3,912
Message 78198 - Posted: 15 May 2015, 22:42:25 UTC - in response to Message 78165.  

Doubt it's that easy to implement AVX just like that though.

Yeap not easy, but there are tools/documentation that help, like this:
Intel Intrinsics Guide

The executing code seems to be compiled for an i386 and uses the 387 floating point 8-register stack model. The code (on my machine) spends about 5% of the time waiting for the "fmul st0,st1" ("====" below) to complete.


Rosetta instruction clip ...

address instruction
0x6b3d82 add ebx, ecx
0x6b3d84 lea ebx, ptr [edi+ebx*8]
0x6b3d87 fld st0, qword ptr [edi+eax*8]
0x6b3d8a mov eax, dword ptr [ebp-0x20]
0x6b3d8d mov edi, dword ptr [ebp-0x14]
0x6b3d90 fmul st0, st1
0x6b3d92 inc ecx =========================
0x6b3d93 add eax, 0x8
0x6b3d96 fsubr st0, qword ptr [ebx]
0x6b3d98 add edx, 0x8

All Intel CPUs from the Pentium 4 (Nov. 2000) onward, and AMD CPUs from the Athlon 64 onward, support the SSE2 register model. Simply adding the SSE2 target option to the builds would require the machines to be made this century but would use the SSE registers. The directly addressable XMM registers (8 in 32-bit code, 16 in 64-bit code) would reduce register spills to the stack and make code scheduling easier (less shuffling of data around and more computation).

A simple recompile should make a noticeable difference without any side effects. If you compile for anything newer than SSE2, or for GPUs, you have to start worrying about and managing the population of target machines you deliver workloads to.

Beyond that, the developers would need to look more closely at the code.
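For illustration, here is what the multiply-then-subtract pattern in the clip could look like with SSE2. This is a sketch under assumptions: I am guessing from the `fld` / `fmul` / `fsubr` sequence that the loop computes something like `base[i] - scale * x[i]`, and the function and parameter names are made up. With GCC, a 32-bit build can be pointed at the SSE registers with `-msse2 -mfpmath=sse` and no source changes; the intrinsics below just make the two-doubles-per-instruction form explicit. Without SSE2 enabled at compile time, the sketch falls back to the plain loop.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* out[i] = base[i] - scale * x[i]
 * Guessed shape of the loop in the disassembly above; names are hypothetical.
 * Processes two doubles per iteration when SSE2 is enabled at compile time. */
void sub_scaled(const double *base, const double *x, double scale,
                double *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128d vs = _mm_set1_pd(scale);
    for (; i + 2 <= n; i += 2) {
        __m128d prod = _mm_mul_pd(_mm_loadu_pd(x + i), vs);
        _mm_storeu_pd(out + i, _mm_sub_pd(_mm_loadu_pd(base + i), prod));
    }
#endif
    for (; i < n; ++i) {                 /* leftover (or all) elements */
        out[i] = base[i] - scale * x[i];
    }
}
```

One caveat worth checking before switching: x87 code can keep intermediates in 80-bit extended precision while SSE2 works in 64-bit doubles, so results may differ in the last bits between the two builds.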

ID: 78198

Joined: 10 Nov 13
Posts: 40
Credit: 397,847
RAC: 0
Message 78200 - Posted: 15 May 2015, 23:25:23 UTC - in response to Message 78198.  

The executing code seems to be compiled for an i386 and uses the 387 floating point 8-register stack model. The code (on my machine) spends about 5% of the time waiting for the "fmul st0,st1" ("====" below) to complete.


Rosetta instruction clip ...

address instruction
0x6b3d82 add ebx, ecx
0x6b3d84 lea ebx, ptr [edi+ebx*8]
0x6b3d87 fld st0, qword ptr [edi+eax*8]
0x6b3d8a mov eax, dword ptr [ebp-0x20]
0x6b3d8d mov edi, dword ptr [ebp-0x14]
0x6b3d90 fmul st0, st1
0x6b3d92 inc ecx =========================
0x6b3d93 add eax, 0x8
0x6b3d96 fsubr st0, qword ptr [ebx]
0x6b3d98 add edx, 0x8

All Intel CPUs from the Pentium 4 (Nov. 2000) onward, and AMD CPUs from the Athlon 64 onward, support the SSE2 register model. Simply adding the SSE2 target option to the builds would require the machines to be made this century but would use the SSE registers. The directly addressable XMM registers (8 in 32-bit code, 16 in 64-bit code) would reduce register spills to the stack and make code scheduling easier (less shuffling of data around and more computation).

A simple recompile should make a noticeable difference without any side effects. If you compile for anything newer than SSE2, or for GPUs, you have to start worrying about and managing the population of target machines you deliver workloads to.

Beyond that, the developers would need to look more closely at the code.

Interesting. Which tool did you use to get that info may I ask?
ID: 78200
©2025 University of Washington