Message boards : Number crunching : Report problems with Rosetta version 5.32
Author | Message |
---|---|
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
Hmmm, it may just be my imagination, but it does seem that those using the BOINC Version 5.4.11 client are having more trouble with Rosetta 5.32 than those with BOINC 5.4.9. Also, FWIW I have a FRA 2rio WU running on an old laptop at ~400 MHz, 160 MB RAM, 900 MB swap space with Win2K SP4, BOINC 5.4.9, and Rosetta 5.32. Memory usage is 114 MB real, 229 MB virtual, and almost no hard page fault activity. Long-term average page fault rate is about 40 per second (mostly 'soft'). It has been running for about 12.5 CPU hours for 54,000 steps on the first model. Not fast, but it's persistent. :-) |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Quote: "Hmmm, it may just be my imagination, but it does seem that those using the BOINC Version 5.4.11 client are having more trouble with Rosetta 5.32 than those with BOINC 5.4.9." You must be near the end; my Celeron 500 took 10.82 hours for one of those. |
Bad_Wolf Joined: 31 Jul 06 Posts: 4 Credit: 191,553 RAC: 0 |
Not an error, but a strange thing: the new 5.32 works fine on my P4 1.8 GHz with 1 GB RAM (XP Pro SP2), but after the first hour of work it was suspended at 19.9%... When it restarted, it immediately jumped to 41.2%. Is this normal? |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Quote: "Not an error but strange thing:" It is normal. Each time it restarts, Rosetta adjusts its progress reporting based on the available CPU time and the time consumed per model. |
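Chu's explanation can be illustrated with a minimal sketch. The thread does not give Rosetta's actual progress formula, so everything here is an assumption: the function `estimate_progress` is hypothetical, and it assumes the task plans as many whole models as fit in `cpu_run_time_pref` (the run-time preference in CPU seconds that appears in the stderr logs in this thread) and re-estimates the time per model on every restart. With the same elapsed CPU time, a smaller per-model estimate after a restart is enough to make the reported progress jump.

```python
def estimate_progress(cpu_elapsed: float, cpu_run_time_pref: float,
                      sec_per_model: float) -> float:
    """Hypothetical progress estimate, recomputed whenever the task restarts.

    cpu_elapsed       -- CPU seconds already spent on the task
    cpu_run_time_pref -- preferred run time in CPU seconds
    sec_per_model     -- current estimate of CPU seconds per model
    """
    # Assumption: plan as many whole models as fit in the preferred
    # run time, but always at least one.
    planned_models = max(1, int(cpu_run_time_pref // sec_per_model))
    expected_total = planned_models * sec_per_model
    # Report the fraction of the expected total, capped at 100%.
    return min(1.0, cpu_elapsed / expected_total)


# Same elapsed time, two different per-model estimates (before/after a restart).
before = estimate_progress(3600, 10800, 18000)  # slow first guess
after = estimate_progress(3600, 10800, 7200)    # revised after restart
print(f"{before:.0%} -> {after:.0%}")
```

With these made-up numbers the printout is `20% -> 50%`; the actual jump Bad_Wolf saw (19.9% to 41.2%) depends on values not shown in the thread.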
Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Dunno what's wrong here, but it seems to have crunched fine, until Upload. Then I get this: Die 17 Okt 2006 00:19:30 CEST|rosetta@home|Resuming task 5croA_BOINC_ABRELAX_SAVE_ALL_OUT_truess__1259_1641_0 using rosetta version 532 It's this WU. I usually had no probs, and on other projects none as well. Here's the stderr.txt: stderr out <core_client_version>5.4.9</core_client_version> <stderr_txt> Graphics are disabled due to configuration... # random seed: 2366360 # cpu_run_time_pref: 43200 No heartbeat from core client for 31 sec - exiting Graphics are disabled due to configuration... # cpu_run_time_pref: 43200 Graphics are disabled due to configuration... # cpu_run_time_pref: 43200 SIGSEGV: segmentation violation SIGSEGV: segmentation violation Stack trace (15 frames): [0x895338b] [0x896decc] [0xffffe420] [0x8949656] [0x89494b4] [0x888f33f] [0x88923e7] [0x8832848] [0x88e5107] [0x833cff5] [0x806015d] [0x84137bb] [0x8514229] [0x89cd3b4] [0x8048111] Exiting... Stack trace (11 frames): [0x895338b] [0x896decc] [0xffffe420] [0x89ee663] [0x89bf7d1] [0x89c11f9] [0x8885461] [0x89d49bf] [0x8965a24] [0x896f135] [0x8a0096a] Exiting... Graphics are disabled due to configuration... # cpu_run_time_pref: 43200 ====================================================== DONE :: 1 starting structures built 131 (nstruct) times This process generated 131 decoys from 131 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> Any suggestions? |
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
Quote: "Hmmm, it may just be my imagination, but it does seem that those using the BOINC Version 5.4.11 client are having more trouble with Rosetta 5.32 than those with BOINC 5.4.9." Thanks for the info, mmciastro! Your Celeron is around three times faster by BOINC benchmark values, and IIRC, your WU runtime was set at three hours. Runtime preference for my micro-cruncher is set at 18 hours. It's still crunching away on the '2rio' WU, now around 31 CPU hours consumed with about 85,800 steps completed on Model #1. I'm guessing it will take a total of 50 to 60 CPU hours to complete the model. I just hope that it can finish before it's canceled by the watchdog timer at around 72 CPU hours! Ah, well, at least it's very frugal with electrical power... :-) |
Joined: 26 Sep 06 Posts: 7 Credit: 536,631 RAC: 0 |
I'm getting very strange behaviour, too. I have two machines "running" Rosetta. One never gets any work-units nowadays, and hasn't crunched for weeks - that's running Fedora. If I stop it running SETI, it just sits there doing nothing. I don't know what's preventing it getting work from Rosetta. The other one (this one) seems to get a heck of a lot of Error messages (Client Error/Compute Error) but I'm not sure what those are. I don't get them with SETI. A couple of days ago, ZoneAlarm warned me that an .exe file (with a name starting with Rosetta) was trying to run on this machine. Since I didn't know anything about it, I told ZoneAlarm to block it - whereupon the machine crashed. I had to reboot. The .exe file didn't reappear, and we seem to be crunching again. I'm running BOINC 5.4.9 on this machine - I don't know which version the Fedora box is using. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Two errors in a row. This is a first for me and my lowly P4. 10/16/2006 10:07:41 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 10/16/2006 10:07:41 AM|rosetta@home|Reason: Requested by user 10/16/2006 10:07:41 AM|rosetta@home|Reporting 1 tasks 10/16/2006 10:07:46 AM|rosetta@home|Scheduler request succeeded 10/16/2006 11:39:04 AM|rosetta@home|Unrecoverable error for result DOC_1EO8_pose_u_pert_with_bbmin_1282_856_0 ( - exit code -1073741819 (0xc0000005)) 10/16/2006 11:39:04 AM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds 10/16/2006 11:39:04 AM||Rescheduling CPU: application exited Probably graphics-related. 10/16/2006 7:39:49 PM|rosetta@home|Computation for task 1hz6A_BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_6609_0 finished 10/16/2006 7:39:49 PM|rosetta@home|Starting task 1di2__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1275_9516_0 using rosetta version 532 10/16/2006 7:39:51 PM|rosetta@home|Started upload of file 1hz6A_BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_6609_0_0 10/16/2006 7:40:08 PM||Project communication failed: attempting access to reference site 10/16/2006 7:40:08 PM|rosetta@home|Temporarily failed upload of 1hz6A_BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_6609_0_0: http error 10/16/2006 7:40:08 PM|rosetta@home|Backing off 1 minutes and 0 seconds on upload of file 1hz6A_BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_6609_0_0 10/16/2006 7:40:10 PM||Access to reference site succeeded - project servers may be temporarily down. 10/16/2006 7:41:09 PM|rosetta@home|Started upload of file 1hz6A_BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_6609_0_0 10/16/2006 7:41:18 PM|rosetta@home|Finished upload of file 1hz6A_BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_6609_0_0 10/16/2006 7:41:18 PM|rosetta@home|Throughput 26604 bytes/sec This WU produced a "Validate Error". Probably an upload error. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Keith (Jillings), your ZoneAlarm prompt was caused by the new Rosetta version (v5.32, which came out this past week). It was prompting you about a slightly different .exe than the one you had previously allowed (the version number is in the .exe file name). So, by denying that .exe permission to run, you've prevented Rosetta from operating normally. Open ZoneAlarm, click Program Control on the left, then click Programs at the top, and check what is displayed for the Rosetta entry with 5.32 in its file name. Rosetta Moderator: Mod.Sense |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Quote: "Two Errors in a row. This is a First for me and my lowly P4." From the stack trace, the first error was caught at a point in the code where Rosetta uses an external library to dynamically allocate memory for a Fortran-like array (now you know Rosetta is not a brand-new C++ program); it is not related to graphics at all. It does not seem to be a generic error, but I have seen quite a few of them (same place, same error) in earlier bug-reporting posts. We will look into it and hopefully come up with a solution. The second one was actually ended by the watchdog because the simulation got stuck and the score did not change for a long time. This could also be protein-specific; sometimes it is just hard to keep the simulation on track. |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Quote: "Dunno what's wrong here, but it seems to have crunched fine, until Upload. Then I get this:" I wish I could be more helpful on this, but from the log messages I cannot tell whether it is a Rosetta problem or not. It looks like Rosetta was restarted several times and then crashed. I hope that is just a one-time thing. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I just noticed this in your latest result. Is it a problem, or something new for this app? <core_client_version>5.4.9</core_client_version> <stderr_txt> # random seed: 1881751 # cpu_run_time_pref: 28800 # cpu_run_time_pref: 28800 # cpu_run_time_pref: 28800 WARNING! error deleting file .xx1r69.out ====================================================== DONE :: 1 starting structures built 50 (nstruct) times This process generated 51 decoys from 51 attempts |
Joined: 15 Jun 06 Posts: 9 Credit: 163,610 RAC: 0 |
I am getting so many timeout errors (they show up as validation errors) and computing errors that my machine is wasting half or more of its time on failed WUs. This seemed to start only with the new Rosetta 5.32: https://boinc.bakerlab.org/rosetta/results.php?hostid=296651 I'm running stock BOINC 5.4.11 and my PC is at stock speeds. I'm stumped and frustrated =/ |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Quote: "WARNING! error deleting file .xx1r69.out" Peter: I believe David Kim mentioned that the "can't delete *.out" error message appears because the file is being put into an archive (compressed) at that point. It's a new warning that can be ignored; the file is removed when the results are returned. |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Quote: "I am getting so many timeout errors (shows up as validation error) and computing errors that my machine is wasting half or more of its time on failed WUs. This seemed to start only with the new Rosetta 5.32" Hi TyroPyro, I found this message at the end of all your recent WUs with validator errors: ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! ********************************************************************** This means the Rosetta simulations somehow got stuck and were forced to stop by the watchdog. It looks random, because your PC has finished some WUs from the same batch successfully, but I admit that this kind of error seems to occur a bit more often than expected (on your host recently). The reason you started seeing this kind of error with 5.32 is that in this new application we added more functionality so that we can search even more complicated energy landscapes. For example, we can now vary protein bond lengths and angles in addition to torsion angles, and with these new degrees of freedom to search, the chance of running into such errors is higher. We are looking into them now and hopefully can uncover some bugs we are not aware of and address them in the next update. Thank you and everyone here for your generous support and contribution. P.S. If this error keeps happening on your computer, I would also suggest getting a fresh copy of the Rosetta application and database files, as they can sometimes get corrupted. |
Joined: 12 Nov 05 Posts: 360 Credit: 17,728,716 RAC: 9,289 |
Hmmm... I guess this is a good place to ask about this... Not sure if I can explain the problem as well as I'd like to, but I'll give it a try. :) This doesn't happen very often, but when the screen saver kicks in and I come back to the computer, the screen saver seems to freeze, and I just found out that when I stop the screen saver task, the system recovers from the error and sends an error report to Microsoft. I'm not sure this'll help, but I've copied the error message that the BOINC manager reported: 18/10/2006 1:20:13 PM|rosetta@home|Unrecoverable error for result 1dtj__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1275_19065_0 ( - exit code 1073807364 (0x40010004)) Any ideas as to how to correct this problem on my end? :) There are 10 types of people in the world: Those who understand binary, and those who don't. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
WU 37550474 *** Dump of the Worker thread (b34): *** - Information - Status: Ready, Base Priority: Above Normal, Priority: Above Normal, Kernel Time: 269375008.000000, User Time: 135861248000.000000, Wait Time: 8433195.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0076D5D2 read attempt to address 0x00000017 - Registers - eax=00000006 ebx=00000000 ecx=010846c8 edx=0e7d49c8 esi=0ede65d8 edi=0999dc38 eip=0076d5d2 esp=0999dc04 ebp=00b05b7c cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206 WU 37502803 <core_client_version>5.4.9</core_client_version> <stderr_txt> # cpu_run_time_pref: 28800 # random seed: 3284021 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -0.935522 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .xx1hz6.out </stderr_txt> |
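Chu's description of the watchdog, together with the "Stuck at score -0.935522 for 3600 seconds" line in the log above, suggests a rule of the form "end the run if the score has not changed for N seconds". The sketch below is a hypothetical illustration of that rule only: the class name `ScoreWatchdog`, its `check` method, and the exact-equality test on scores are assumptions, it is not Rosetta's watchdog code, and it does not model the separate "going too long" condition.

```python
class ScoreWatchdog:
    """Hypothetical stuck-score watchdog (illustration, not Rosetta's code)."""

    def __init__(self, stuck_limit_sec: float = 3600.0) -> None:
        self.stuck_limit_sec = stuck_limit_sec  # 3600 s, as in the log above
        self.last_score = None    # most recent distinct score seen
        self.last_change = None   # time (seconds) when that score first appeared

    def check(self, score: float, now: float) -> bool:
        """Record the current score; return True if the run should be ended."""
        if self.last_score is None or score != self.last_score:
            # Score moved (or first call): reset the stuck timer.
            self.last_score = score
            self.last_change = now
            return False
        # Score unchanged: end the run once it has been stuck long enough.
        return now - self.last_change >= self.stuck_limit_sec


wd = ScoreWatchdog(stuck_limit_sec=3600)
for t in (0, 1800, 3600):
    print(t, wd.check(-0.935522, now=t))
```

Here the first two checks print `False` and the third prints `True`, because the score has then been unchanged for the full 3600 seconds. A real implementation would more likely compare scores within a tolerance than test floats for exact equality.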
Joined: 4 Oct 05 Posts: 51 Credit: 96,906 RAC: 0 |
I got a new WU; at stage 6, near the end of the backbone search, it's crapping out on me, three times now. Anyone else? |
Mod.Tymbrimi Volunteer moderator Joined: 22 Aug 06 Posts: 148 Credit: 153 RAC: 0 |
|
Elgyn Joined: 15 Oct 06 Posts: 1 Credit: 62,124 RAC: 0 |
I've received two errors from Rosetta@home, both of which resulted in the Rosetta .exe crashing. I've reported both of them to Microsoft crash analysis, if that's of any use to you: 2006-10-17 06:55:32 [rosetta@home] Unrecoverable error for result 1dtj__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1275_8344_0 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)) 2006-10-18 10:22:08 [rosetta@home] Unrecoverable error for result 1dcj__BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT__1278_12985_0 ( - exit code -1073741819 (0xc0000005)) |
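The exit codes in these reports are printed twice, as a signed decimal and in hex, for example `-1073741819 (0xc0000005)`. Both are the same 32-bit value: when the high bit is set, the signed reading equals the unsigned value minus 2^32 (3221225477 - 4294967296 = -1073741819). 0xc0000005 is the Windows access-violation status, which matches the "Access Violation (0xc0000005)" line in Keith Akins's crash dump above. A small sketch for converting between the two forms follows; the function names `to_hex` and `to_signed` are illustrative and not part of BOINC.

```python
def to_hex(code: int) -> str:
    """Render a signed 32-bit exit code as an unsigned 8-digit hex string."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def to_signed(status: int) -> int:
    """Inverse: reinterpret an unsigned 32-bit status as a signed int."""
    status &= 0xFFFFFFFF
    # If bit 31 is set, the value is negative in two's complement.
    return status - (1 << 32) if status & 0x80000000 else status


# The three exit codes reported in this thread.
for code in (-1073741819, -2147483645, 1073807364):
    print(code, to_hex(code))
```

This prints `0xc0000005`, `0x80000003` and `0x40010004` next to the corresponding decimal codes, matching the hex values shown in the BOINC messages.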
©2025 University of Washington
https://www.bakerlab.org