Report Problems with Rosetta Version 5.25

Author	Message
Vester Joined: 2 Nov 05 Posts: 259 Credit: 4,625,443 RAC: 1	Message 23867 - Posted: 20 Aug 2006, 19:12:52 UTC Last modified: 20 Aug 2006, 19:21:09 UTC
This happened when I opened BOINC Manager 5.4.11 to run benchmarks. I have 550 MB of RAM free out of 1 GB installed and an AMD Barton core at 2113 MHz that has been stable for a long time without any errors. I am running Windows Vista Beta 2 build 5384.
8/20/2006 10:50:22 AM|rosetta@home|Resuming task 1wit__BOINC_BACKBONE_O_PENALTY_ABRELAX_SAVE_ALL_OUT__1176_756_0 using rosetta version 525
8/20/2006 2:52:41 PM|rosetta@home|Unrecoverable error for result 1wit__BOINC_BACKBONE_O_PENALTY_ABRELAX_SAVE_ALL_OUT__1176_756_0 ( - exit code -1073741819 (0xc0000005))
Here is the result: https://boinc.bakerlab.org/rosetta/result.php?resultid=33562744
My display didn't blink, and I would have missed the event had I not been looking for my unoptimized benchmarks.
Edit for clarification: This is a new installation that has never been optimized.
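The two numbers in the log line above are the same exit code written two ways: a signed 32-bit decimal and its unsigned hexadecimal form. The minimal Python sketch below shows the conversion. The helper name `to_ntstatus` and the one-entry lookup table are illustrative assumptions, not part of BOINC or Rosetta; 0xC0000005 is the standard Windows status code for an access violation.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Hypothetical lookup table holding only the code seen in this thread.
KNOWN_CODES = {"0xc0000005": "STATUS_ACCESS_VIOLATION"}

code = to_ntstatus(-1073741819)
print(code, KNOWN_CODES.get(code, "unknown"))
```

Masking with 0xFFFFFFFF is what turns the negative decimal value into the hex form that BOINC prints in parentheses.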

Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 0	Message 23984 - Posted: 21 Aug 2006, 1:22:42 UTC Last modified: 21 Aug 2006, 1:25:01 UTC
Just noticed this one. It happened about 24 hours ago: https://boinc.bakerlab.org/rosetta/result.php?resultid=33412986
-Charlie

Alan Roberts Joined: 7 Jun 06 Posts: 61 Credit: 6,901,926 RAC: 0	Message 23989 - Posted: 21 Aug 2006, 1:54:41 UTC
Hi,
Across the machines I have running Rosetta, I've seen a handful of failures recently. The majority have reported an "Incorrect function" error in dock_structure.cc, with at least one in pack.cc. This seems very similar to what others have reported recently. It is a bit surprising, since I don't remember 5.25 throwing any errors during CASP.
Is it valuable to document the WUs here? My first thought is that the project team must have tools to filter error results out of all the returned results for investigation, so documentation here wouldn't be necessary. If that is incorrect and it would help, I'll hang another reply onto the thread.
Cheers, Alan

Ethan Volunteer moderator Joined: 22 Aug 05 Posts: 286 Credit: 9,304,700 RAC: 0	Message 23990 - Posted: 21 Aug 2006, 2:02:38 UTC - in response to Message 23989.
> Is it valuable to document the WUs here?
Yes! See this thread: https://boinc.bakerlab.org/forum_thread.php?id=2144. Rhiju is the scientist who submitted the WU.

Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 0	Message 23991 - Posted: 21 Aug 2006, 2:06:59 UTC Last modified: 21 Aug 2006, 2:07:15 UTC
And another. Same error and (I think) a similar WU to the ones that I and others have reported. Could there be a bad batch of WUs?
https://boinc.bakerlab.org/rosetta/result.php?resultid=33514109
-Charlie

Alan Roberts Joined: 7 Jun 06 Posts: 61 Credit: 6,901,926 RAC: 0	Message 23994 - Posted: 21 Aug 2006, 2:25:24 UTC
Per Ethan's reply:
33559891 error in dock_structure.cc
33584990 error in dock_structure.cc
33513409 error in dock_structure.cc
32315514 error in dock_structure.cc
I'm sure I've seen pack.cc errors as well, but these seem to have fallen off the back end of the results list.
Cheers, Alan

BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0	Message 24010 - Posted: 21 Aug 2006, 5:15:41 UTC
1opd__BOINC_BACKBONE_HN_PENALTY_ABRELAX_SAVE_ALL_OUT__1175_174_0
<core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1669857
# cpu_run_time_pref: 86400
ERROR:: Exit at: .dock_structure.cc line:401
</stderr_txt>
33445584
Seems to be a few of them popping up.. (that's 3 of the 10 WUs that have run on this machine this last week.)

Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0	Message 24013 - Posted: 21 Aug 2006, 5:31:11 UTC
I got one too: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=29059205
Result: https://boinc.bakerlab.org/rosetta/result.php?resultid=33499101
stderr out
<core_client_version>5.5.13</core_client_version>
<![CDATA[
<message>
Forkert funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1690643
# cpu_run_time_pref: 10800
ERROR:: Exit at: .dock_structure.cc line:401
</stderr_txt>
]]>
"I'm trying to maintain a shred of dignity in this world." - Me
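Several reports in this thread share the same "Exit at" line in stderr. For anyone collecting their own failures, here is a minimal Python sketch that pulls the source file and line number out of a stderr excerpt. The function name `exit_location` and the regular expression are assumptions based only on the two excerpts posted above, not on any documented Rosetta log format.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the excerpts in this thread (an assumption):
# "ERROR:: Exit at: .dock_structure.cc line:401"
EXIT_RE = re.compile(
    r"ERROR::\s*Exit at:\s*\.?(?P<file>[\w.]+)\s+line:(?P<line>\d+)"
)


def exit_location(stderr_text: str) -> Optional[Tuple[str, int]]:
    """Return (source file, line number) from a stderr excerpt, or None."""
    match = EXIT_RE.search(stderr_text)
    if match is None:
        return None
    return match.group("file"), int(match.group("line"))


sample = (
    "# random seed: 1690643\n"
    "# cpu_run_time_pref: 10800\n"
    "ERROR:: Exit at: .dock_structure.cc line:401\n"
)
print(exit_location(sample))
```

Because the pattern keys on the "ERROR:: Exit at:" text rather than on the localized Windows message, it treats the English and Danish reports the same way.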

Jack Shaftoe Joined: 30 Apr 06 Posts: 115 Credit: 1,307,916 RAC: 0	Message 24110 - Posted: 21 Aug 2006, 15:52:38 UTC - in response to Message 24010. Last modified: 21 Aug 2006, 16:29:30 UTC
> Incorrect function. (0x1) - exit code 1 (0x1)
> Seems to be a few of them popping up.. (that's 3 of the 10 WUs that have run on this machine this last week.)
Me too. https://boinc.bakerlab.org/rosetta/results.php?hostid=288399
I got 5 of them on this host this morning. The new credit system gives me zero credit for them too. They all seem to have been resubmitted to other hosts. I am running 5.4.11 on this machine.
Edit to say that I have another machine with identical hardware running 5.4.11, and it has zero failed WUs. Maybe a config problem? I dunno..
Team Starfire World BOINC

AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0	Message 24121 - Posted: 21 Aug 2006, 17:07:10 UTC - in response to Message 24110.
> The new credit system gives me zero credit for them too.
You seem to be getting credit for them. The script only runs once a day, though, so you haven't gotten credit yet for the ones returned today. Also, the credit doesn't show in the list, but it does show if you look at the result.

Saenger Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0	Message 24123 - Posted: 21 Aug 2006, 17:14:22 UTC - in response to Message 24110.
>> Incorrect function. (0x1) - exit code 1 (0x1)
>> Seems to be a few of them popping up.. (that's 3 of the 10 WUs that have run on this machine this last week.)
> Me too. https://boinc.bakerlab.org/rosetta/results.php?hostid=288399
> I got 5 of them on this host this morning. The new credit system gives me zero credit for them too. They all seem to have been resubmitted to other hosts. I am running 5.4.11 on this machine.
> Edit to say that I have another machine with identical hardware running 5.4.11, and it has zero failed WUs. Maybe a config problem? I dunno..
Do I understand this right: in the old system you even got credit for invalid results? Why should this be?

NJMHoffmann Joined: 17 Dec 05 Posts: 45 Credit: 45,891 RAC: 0	Message 24142 - Posted: 21 Aug 2006, 18:31:34 UTC - in response to Message 24123. Last modified: 21 Aug 2006, 18:33:36 UTC
> Do I understand this right: in the old system you even got credit for invalid results? Why should this be?
Because here at Rosetta@home the software is being tested. Part of the data sent to you with a new WU is code to test, so bugs in the software should not affect credit.
Norbert
PS: It's not new. IIRC Seti has done this for years: aborted WUs get credit. At Seti it's corrupt data (or useless data) that causes the aborts (Error 9??).

Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0	Message 24143 - Posted: 21 Aug 2006, 18:33:17 UTC - in response to Message 24123.
> Do I understand this right: in the old system you even got credit for invalid results? Why should this be?
Saenger, just before Ralph started (and this is probably the reason for Ralph's existence), they were having many WUs fail most of the way through or get stuck at 1% for days. People were screaming about tying up their computers for that period and not getting some form of reward. They started handing out credit as claimed for the run time, up to a certain limit (I can't remember the limit; 300, I think). Now that Ralph is here, the incidence of failed WUs is very low, and I hope they stop that practice.
tony

Saenger Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0	Message 24145 - Posted: 21 Aug 2006, 18:36:32 UTC - in response to Message 24142. Last modified: 21 Aug 2006, 18:37:40 UTC
>> Do I understand this right: in the old system you even got credit for invalid results? Why should this be?
> Because here at Rosetta@home the software is being tested. Part of the data sent to you with a new WU is code to test, so bugs in the software should not affect credit.
> Norbert
> PS: It's not new. IIRC Seti has done this for years: aborted WUs get credit. At Seti it's corrupt data (or useless data) that causes the aborts (Error 9??).
OK, I understand. But those results are not marked "invalid"; that's the difference. The only other project I know of that grants anything for "invalid" results is LHC. There you get half of the credit granted to those with valid results. So I think the labelling should be changed, as it's also possible that a result is really invalid, for example when the hardware is faulty and delivers no useful results.

NJMHoffmann Joined: 17 Dec 05 Posts: 45 Credit: 45,891 RAC: 0	Message 24146 - Posted: 21 Aug 2006, 18:37:26 UTC - in response to Message 24143.
> Now that Ralph is here, the incidence of failed WUs is very low, and I hope they stop that practice.
No. (See my answer to Saenger for my argument.)
Norbert

carl.h Joined: 28 Dec 05 Posts: 555 Credit: 183,449 RAC: 0	Message 24147 - Posted: 21 Aug 2006, 18:37:28 UTC
I hope the practice continues. If the WU is what is wrong, and it has nothing to do with your system, and you have spent, say, 23 hours of a 24-hour unit working, why should you not get credits?
Not all Czech's bounce but I'd like to try with Barbar ;-) Make no mistake This IS the TEDDIES TEAM.

Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0	Message 24148 - Posted: 21 Aug 2006, 18:38:29 UTC - in response to Message 24142. Last modified: 21 Aug 2006, 18:39:14 UTC
>> Do I understand this right: in the old system you even got credit for invalid results? Why should this be?
> Because here at Rosetta@home the software is being tested. Part of the data sent to you with a new WU is code to test, so bugs in the software should not affect credit.
> Norbert
> PS: It's not new. IIRC Seti has done this for years: aborted WUs get credit. At Seti it's corrupt data (or useless data) that causes the aborts (Error 9??).
Norbert,
Yes, -9 "result overflow" is the ONLY error that will get credit at Seti. It just means there was too much RFI in the signal. There's a limit, and when it is reached the client aborts the WU and you get proportional credit for it. It's usually only a matter of a minute or two of runtime before it terminates, though. Here are 3 examples from my file:
361014709 86507692 1 Aug 2006 2:45:43 UTC 2 Aug 2006 19:13:29 UTC Over Success Done 138.39 0.11 0.11 2.8615 2.8615
361337990 86585100 1 Aug 2006 21:05:59 UTC 2 Aug 2006 19:13:29 UTC Over Success Done 139.64 0.12 0.12 3.0937 3.0937
360389054 86356170 30 Jul 2006 11:05:05 UTC 30 Jul 2006 17:05:44 UTC Over Success Done 58.78 0.11 0.11 6.7370 6.7370
As you can see, they only ran 138 seconds, 139 seconds, and 58 seconds respectively. Now, there are a few that run into hours before failing.
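The run times Astro quotes can be read straight out of the pasted rows. The Python sketch below assumes, following Astro's own reading, that the first number after the "Done" state is the CPU time in seconds; the meaning of the remaining columns is not stated in the thread, so the sketch ignores them. The function name `cpu_seconds` is a hypothetical helper.

```python
def cpu_seconds(row: str) -> float:
    """Return the first numeric field after the 'Done' state column.

    Assumption: that field is the CPU time in seconds, as Astro reads it.
    """
    tokens = row.split()
    return float(tokens[tokens.index("Done") + 1])


rows = [
    "361014709 86507692 1 Aug 2006 2:45:43 UTC 2 Aug 2006 19:13:29 UTC "
    "Over Success Done 138.39 0.11 0.11 2.8615 2.8615",
    "361337990 86585100 1 Aug 2006 21:05:59 UTC 2 Aug 2006 19:13:29 UTC "
    "Over Success Done 139.64 0.12 0.12 3.0937 3.0937",
    "360389054 86356170 30 Jul 2006 11:05:05 UTC 30 Jul 2006 17:05:44 UTC "
    "Over Success Done 58.78 0.11 0.11 6.7370 6.7370",
]

for row in rows:
    # Truncate to whole seconds, matching the figures quoted in the post.
    print(int(cpu_seconds(row)))
```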

Saenger Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0	Message 24149 - Posted: 21 Aug 2006, 18:39:00 UTC - in response to Message 24147.
> I hope the practice continues. If the WU is what is wrong, and it has nothing to do with your system, and you have spent, say, 23 hours of a 24-hour unit working, why should you not get credits?
That's right, and that's why they grant something over at LHC. But how is it determined that it was the software, and not the hardware?

anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0	Message 24150 - Posted: 21 Aug 2006, 18:40:20 UTC - in response to Message 24147. Last modified: 21 Aug 2006, 18:48:20 UTC
> I hope the practice continues. If the WU is what is wrong, and it has nothing to do with your system, and you have spent, say, 23 hours of a 24-hour unit working, why should you not get credits?
My opinion is that if the "decoys" are OK, you should get credit for them.
Anders n
Edit: I assume that if the computer has completed 5 decoys and fails on no. 6, it reports the 5 that were OK?!

NJMHoffmann Joined: 17 Dec 05 Posts: 45 Credit: 45,891 RAC: 0	Message 24151 - Posted: 21 Aug 2006, 18:43:08 UTC - in response to Message 24145.
> So I think the labelling should be changed, as it's also possible that a result is really invalid, for example when the hardware is faulty and delivers no useful results.
It would be difficult to decide: is the result invalid because the computer failed? Or is the result invalid because the combination of routines and parameters used doesn't work? The second is a very useful result for Rosetta.
Norbert