Message boards : Number crunching : Report Problems with Rosetta Version 5.25
Author | Message |
---|---|
![]() Send message Joined: 9 Apr 06 Posts: 9 Credit: 372,288 RAC: 0 |
Still having issues with 5.25: the same problems I posted about before, which apparently many others are reporting too. All problems with previous versions were minor and few in number. I'm running it on 6 hosts, all with varying CPUs/OSes and all with plenty of RAM. Anxiously awaiting some fixes =) Welcome all newcomers. |
Tim Send message Joined: 8 Jul 06 Posts: 2 Credit: 52,584 RAC: 0 |
I have a graphical problem with all the workunits. I have a laptop with a widescreen display (1280*800), and when I view the graphics in full screen or use the screensaver, the image is cropped and the bottom line is missing. Here is a screenshot: This happens on Mac OS X too. It's related to widescreen displays. It may be related to using the default BOINC graphics code, which was written for normal aspect ratio displays. See http://www.ssl.berkeley.edu/pipermail/boinc_dev/2006-July/006034.html and the following messages in the thread. Tim |
Tim Send message Joined: 8 Jul 06 Posts: 2 Credit: 52,584 RAC: 0 |
Rosetta 5.25 on Mac OS X 10.4.7 incorrectly classifies the target t314__CASP7_FOLLOWUP_ABRELAX_SAVE_ALL_OUT_BARCODE_perfectss__1066_42278_0 as unknown, even though the display clearly shows it as known. Use the link to email me. |
![]() Send message Joined: 9 Apr 06 Posts: 9 Credit: 372,288 RAC: 0 |
Well, I've waited as long as I care to wait, spending my $ contributing. PS.... 5.25 has been the worst release; it has made me quit, when I spent most of my time trying to recruit people to Rosetta. I posted months ago on the issue... nothing. Now I retire Rosetta from my machines, and can enjoy my lower power bill too. 5 machines were eating plenty. |
tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
Well, I've waited as long as I care to wait, spending my $ contributing. That is sad. For most people, including me, 5.25 seems to work quite well. I had this "sit with 0% CPU usage" problem only over at Ralph, with some pre-5.25 version. Have you ever attached to RALPH to see whether you have the same problems there as here? I think we'll see new versions soon. Now that CASP is over, they will probably resume tweaking the app. |
jjgb10 Send message Joined: 29 Sep 05 Posts: 21 Credit: 6,152,959 RAC: 0 |
I have had ZERO problems with this release of Rosetta. I have Rosetta installed on 7 computers and it runs on every computer just fine with no problems. I am running the BOINC version 5.4.9. |
![]() Send message Joined: 9 Apr 06 Posts: 9 Credit: 372,288 RAC: 0 |
Perhaps all of you who think 5.25 is flawless need to read this thread. Nothing but posts of issues, all the same for every person. I posted a month+ ago about it, and nothing was said in any way. I posted again a few weeks ago, still nothing. Now I quit and people notice. I've added somewhere around 40 machines to this project because I think it's a great cause, a great idea, and a new path. So I lost who knows how many hours with 5.25 stopping at 99-100%, then dumping the WU on reboot. It's just frustrating. If I didn't have so many boxes to check all the time to make sure they were running, it wouldn't bother me. I realize the attention has been focused on CASP, but there are MORE than enough people bringing forth issues since 5.25's debut. And for the record, I had no problems UNTIL 5.25. |
![]() ![]() Send message Joined: 30 Apr 06 Posts: 115 Credit: 1,307,916 RAC: 0 |
I have had ZERO problems with this release of Rosetta. I have Rosetta installed on 7 computers and it runs on every computer just fine with no problems. I am running the BOINC version 5.4.9. I have to restart 1 or 2 machines each week. But that's not bad for 22. I check my stats every day to make sure all of them are connecting. When one of them gets late, it means I have to restart the software. If I didn't have to do that my efficiency would be higher, but not significantly. What can I provide to help figure out what is happening? Team Starfire World BOINC ![]() |
Evan Send message Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
I don't know what happened to resultid=32044316 but it failed, sped through 3 waiting work units at the speed of light, and thoroughly mixed up the computer. |
![]() ![]() Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Sangamon, I would think it would be helpful if you could note any patterns in your farm. Is it the same 2 or 3 boxes getting hung up? Or random? Do they all run the same BOINC version? Same OS updates? Same BOINC preferences and locations? You might make it easy on yourself: whenever a machine gets hung up, attach it to Ralph for something like a 10% resource share. Then you can just see whether the machine is already on Ralph, and start to get a feel for whether there are perfect machines in your farm or whether all are affected. Then scrutinize the Ralph work done closely. If failures occur on Ralph, more diagnostic data is returned to the project. Post on the boards there the specific WUs you see hanging up or crashing BOINC. If you have specific Rosetta WUs that caused problems, post their IDs here. Sometimes there are issues with how the WUs are created. Other times it's a specific random number for a model that uncovers a problem. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
![]() ![]() Send message Joined: 11 Oct 05 Posts: 153 Credit: 4,387,904 RAC: 23 |
The people who have had no problems, I believe, have been lucky and are probably running Windows. I have 4 computers at home: 2 run Windows and 2 run Linux. One of the Windows machines has had about 2 or 3 WUs lock up, but that is all. The 2 Linux machines have had numerous WUs lock up (around 2 dozen (24) at least). In all cases the "CPU time" and "To completion" times stop and the host CPU drops back to idle, but the client does not move on from that WU, saying it is running when in fact it is not, even after many hours to days. Suspending the WU and resuming has no effect, and restarting the BOINC Client/Manager has no effect; only a reboot will get the unit working again. More often than not, after restarting, the WU will error out anyway. I do not intend to reboot every time a WU stops, so at this stage I have been aborting the WUs in question to keep my computer doing something useful. It seems to be more a Linux problem than a Windows one, and mostly I get a "Segmentation Error" as the cause of the failure. The time to failure can be anywhere from a few minutes/percent to almost complete (80 to 90% done). It happens on both Rosetta@home and Ralph@home, with more failures on Ralph than on Rosetta. |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
The people who have had no problems, I believe, have been lucky and are probably running Windows. I have 4 computers at home: 2 run Windows and 2 run Linux. One of the Windows machines has had about 2 or 3 WUs lock up, but that is all. The 2 Linux machines have had numerous WUs lock up (around 2 dozen (24) at least). My Linux crunchers have been going 24/7 with no problems for a long time now. (I had 17 Linux crunchers going 100% on Rosetta during CASP, but have now dropped back to 13 Linux crunchers.) I'm using the standard BOINC 5.2.13. |
![]() Send message Joined: 9 Apr 06 Posts: 9 Credit: 372,288 RAC: 0 |
I've detached from Rosetta for the time being, and moved to 100% share on Ralph. I don't want to just give up; 5.25 is just making me loopy. |
![]() ![]() Send message Joined: 11 Oct 05 Posts: 153 Credit: 4,387,904 RAC: 23 |
Still getting the same problems on my 2 Linux machines; Windows machines OK. The WU says it is running but nothing is happening: no timers are moving and the CPU is idle. It will not switch to another project as per preferences, so my only solution is to abort, as I cannot be rebooting all the time to restart the WU. Getting "process exited with code 131" and "ERROR:SIGSEGV:segmentation violation": https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28130067 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28130003 (aborted) https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129983 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129982 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129975 (aborted) Plus these 2 the day before (12th): https://boinc.bakerlab.org/rosetta/workunit.php?wuid=27385306 (aborted) https://boinc.bakerlab.org/rosetta/workunit.php?wuid=27385307 (aborted) I have also been getting the same errors/problems on Ralph, again with Linux. |
[AF>Linux]Arnaud![]() Send message Joined: 17 Sep 05 Posts: 38 Credit: 10,490 RAC: 0 |
|
![]() ![]() Send message Joined: 6 Jun 06 Posts: 8 Credit: 5,771 RAC: 0 |
Hello Conan... I would appreciate knowing where you got the download for the core client 5.5.0 that you are using. You are getting the same credits in ROSETTA using that client as I did using 5.3.12, but not 5.4.11. Thanks, Stan |
![]() ![]() Send message Joined: 11 Oct 05 Posts: 153 Credit: 4,387,904 RAC: 23 |
Still getting same problems on my 2 Linux machines, Windows machines ok. A follow-up to the above: all of the new WUs that I have processed today have died, all with the same errors as above. Here are another 2: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28130084 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129985 Two have died at 3741 and 3758 seconds (WUs 28130067 and 28129983), with the others going between 5347 and 5400 seconds (about 1 1/2 hours). Coincidentally, my switch time between projects is 90 minutes (1 1/2 hours, i.e. 5400 seconds). The WUs either error out at the times mentioned or hang with nothing happening. This problem has been increasing over the last few days. The 2 Linux machines having most of the problems are both AMD, both with dual CPUs, one a 275 with 4 GB RAM and one an 848 with 2 GB RAM, both running Linux Fedora Core 3. |
NJMHoffmann Send message Joined: 17 Dec 05 Posts: 45 Credit: 45,891 RAC: 0 |
Rosetta 5.25 writes its checkpoints before calculating whether it should stop now. With the current betas of BOINC this leads to a funny situation: BOINC waits for a checkpoint before switching tasks, sees the checkpoint, and switches tasks, so Rosetta sits there at 100% done until the other projects have had their share, and only then is the result uploaded. Norbert |
TCU Computer Science Send message Joined: 7 Dec 05 Posts: 28 Credit: 12,861,977 RAC: 0 |
Another stuck work unit: 2f21X_BOINC_ABRELAX_SAVE_ALL_OUT_BARCODE__1075_31308 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=27912763 It has been running for more than 2 days but accumulated only 6 hours of CPU time and is stuck at 74.4%. When I stopped BOINC, the Rosetta processes did not terminate. I rebooted the machine and the WU immediately terminated with the error ERROR:: Exit at: initialize.cc line:1618 |
Tino Ruiz Send message Joined: 12 Oct 05 Posts: 13 Credit: 397,392 RAC: 0 |
Hi, I'm having the same problems with Rosetta@Home "hanging" (it shows "running" but the CPU is at 0%). Usually it occurs within 23%~26% of processing the unit. The same thing happens with World Community Grid as well, but I know that's another project. ALL other projects work fine. I'm on a P4C 2.6 GHz, 512 MB RAM running Xubuntu. Nothing is overclocked. The workunits below I know are "stuck": FRA_t370_CASPR_hom001_6_t370_4_2a2jA_IGNORE_THE_REST_223_1078_61_0 FRA_t322_CASPR_hom001_6_t322_3_1u1zA_IGNORE_THE_REST_17_1079_65_1 There are a lot more that I've had to abort over the weeks, but my log only goes so far. |
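
Norbert's point about checkpoint ordering can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `simulate`, its parameters, and the idea of a "switch pending from model N" are hypothetical names invented for illustration, and none of it is actual BOINC or Rosetta code. It assumes the client preempts a task as soon as it sees a checkpoint while a task switch is pending.

```python
# Toy model of the ordering described in Norbert's post.
# All names are hypothetical; this is not BOINC or Rosetta code.

def simulate(models, switch_pending_from, checkpoint_before_stop_check):
    """Run a task made of `models` work steps.

    switch_pending_from: first step after which the client wants to
        switch to another project (it waits for a checkpoint to do so).
    checkpoint_before_stop_check: True models the order described for
        5.25 (checkpoint first, then decide whether to stop).
    """
    for done in range(1, models + 1):
        switch_pending = done >= switch_pending_from
        finished = done == models
        percent = 100 * done // models

        if checkpoint_before_stop_check:
            # Checkpoint is written first, so a waiting client
            # preempts the task before it can notice it is finished.
            if switch_pending:
                return f"preempted at {percent}% done"
            if finished:
                return "finished and reported"
        else:
            # Stop check first: a finished task exits without writing
            # a checkpoint that would trigger the switch.
            if finished:
                return "finished and reported"
            if switch_pending:
                return f"preempted at {percent}% done"
    return None  # only reached when models < 1
```

With a switch that becomes pending exactly at the last step, `simulate(4, 4, True)` returns `"preempted at 100% done"`, while `simulate(4, 4, False)` returns `"finished and reported"`. In this toy model only the order of the two checks differs, which matches the "sits there with 100% done" behaviour described in the post.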
©2025 University of Washington
https://www.bakerlab.org