Message boards : Number crunching : Minirosetta 1.97
Author | Message |
---|---|
Send message Joined: 5 Jun 08 Posts: 9 Credit: 1,307,108 RAC: 0 |
Had a validate error again (https://boinc.bakerlab.org/rosetta/result.php?resultid=283360235), though the wu ran through smoothly. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I'm seeing several workunits with names like histone_loopbuild_run1_* (sample 283470540) fail with a validate error after about 20 minutes on Mac OS X 10.6.1, but there's nothing in the log to hint at the problem. Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. ====================================================== DONE :: 2 starting structures 1201 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I've had a few tasks bail out early with the new app when they could have run for hours and done more models. This one only ran 32 min. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=258477161 histone_loopbuild_run1_14925_27036_0 # cpu_run_time_pref: 14400 ====================================================== DONE :: 2 starting structures 1949.47 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== Got credit, by the way. |
Send message Joined: 5 Jun 06 Posts: 154 Credit: 279,018 RAC: 0 |
Seems like these "histone" ones are real trouble. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5081 |
Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,372,156 RAC: 313 |
Too many tasks are stalling. I'm switching my resources to World Community Grid for a while, hoping this will be fixed. The last I looked, Ralph@home was still testing 1.95 and hadn't started 1.96 or 1.97. Typical of several versions lately. I've decided to stop Rosetta@home and Ralph@home participation on my computer with the least memory per processor until the memory leak problem is fixed, and also put off even starting them on my new laptop, but continue them on a third computer. I've noticed one item that might have something to do with the memory leak - Windows Task Manager on the computer where I've participated the longest reports 485596 handles, on the second one where I started participating it reports 27185, and on my new laptop it reports 21283. The Vista help file says very little about handles, but if you search long enough, you'll eventually find a statement that ordinary users don't need to know what they are; they should talk to a programmer or administrator about any problems with them; and it does not offer any way to determine how many are attached to what program, or even a statement about whether they are normally attached to programs. Ignore the statement that the proper name for them is object handles, since that appears to be the only time the help file even mentions object handles if you haven't installed any software with more specific information about them. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi Robert. The way I understand it, when mini 1.95 was released here there was a bit of a foul-up with some of the files the server was sending out to us. The version number was then changed to 1.96, which caused some other problems, so it was changed again to 1.97 to go with all the new files. So the app is really the same as the 1.95 that is on Ralph; just the version number is different here. (Don't quote me ;) Hope that helps. |
Send message Joined: 5 Jun 06 Posts: 154 Credit: 279,018 RAC: 0 |
I have had another "histone" work unit error out on me. I'm just going to abort any of them that get sent my way. I don't feel like wasting time running junk work units. |
Send message Joined: 5 Aug 09 Posts: 5 Credit: 1,356,008 RAC: 0 |
"I have had another "histone" work unit error out on me. I'm just going to abort any of them that get sent my way. I don't feel like wasting time running junk work units." I'm doing the same thing. The histone units have been getting to around 4% done and then just stopping there. I can't view any of the graphics for these work units, and even though their original run time is set at 4 hours, I had two that had spent 10 hours and were still only at 4%. |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
At Mod.Sense's request, this is one of 3 errors I got yesterday. 283639845 258715590 26 Sep 2009 22:00:57 UTC 27 Sep 2009 8:19:32 UTC Over Validate error Done 1,157.33 --- --- The wingman also has a validate error for this. Sorry, I don't know how to make it "clickable". Greetings, TJ. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2219 Credit: 42,280,090 RAC: 24,002 |
At Mod.Sense's request, this is histone_loopbuild_run1_14925_69353_1, running on an Intel i7 920 with Vista Ultimate 64 SP2. As stated, the wingman also produced an error on a Phenom II 955 running Windows 7 64 bit, though both were given credit for their short runtimes and 2 decoys processed. There does seem to be something odd about this kind of job, but people aren't actually being penalised for it. More a problem for the project to work out. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Sid thanks for linking, and while you were editing, I was moving ;) Rosetta Moderator: Mod.Sense |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2219 Credit: 42,280,090 RAC: 24,002 |
^ I know - spooky! Anyway... I've come up with 2 errors in the last week, both coming from the same type of job. sel_core_5.0_low50_beta_low200_start_hb_t374__IGNORE_THE_REST_14879_226_1 sel_core_5.0_low50_beta_low200_start_hb_t374__IGNORE_THE_REST_14879_799_1 - Unhandled Exception Record - |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Just returned this. It finished short of my runtime and looks odd: no models. I've cut down the text from the result. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=258992201 symm_lr8_seq_score12_ss_1.7_rlbd_1t2i_IGNORE_THE_REST_DECOY_14923_1529_0 Continuing computation from checkpoint: chk_NoTag_SequenceRelax__chk46_fa ... success! ERROR: Could not find disulfide partner for residue 7 ERROR:: Exit from: src/core/scoring/disulfides/FullatomDisulfideEnergyContainer.cc line: 562 called boinc_finish </stderr_txt> ]]> Validate state Valid Claimed credit 20.3091706964679 Granted credit 17.1487999873651 application version 1.97 |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
I've had a number of "frb" WUs run out of disk space with multi-gigabyte stderr.txt files. Those files were full of "bounds error" statements. example: https://boinc.bakerlab.org/rosetta/result.php?resultid=284162008 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Why are these tasks that have failed on Ralph let loose here? It's the same type of error as seen over there. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=259185621 Tue 29 Sep 2009 19:31:59 EST|rosetta@home|Aborting task frb_0_8_mike_chosen_cst_oct09_hb_t313__IGNORE_THE_REST_1I5SA_2_14958_15_1: exceeded disk limit: 301.57MB > 286.10MB EDIT// Just to add, I happened to see it running and it was writing 1.3MB to disk every 5 sec. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2219 Credit: 42,280,090 RAC: 24,002 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=284162008 I'm sure it's unrelated, seeing as the wingman errored out with a more recent version (plus it's none of my business really), but why are you running BOINC 5.2.13? That can't be good, can it? |
288VKYUjwsXfAaTXn6SFJC4LVPRf Send message Joined: 16 Dec 05 Posts: 31 Credit: 153,110 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=284162008 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=259177853 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=259177853 2 WU's with same problem. Both frb_ . Should I allow more disk space for those WU's ? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2219 Credit: 42,280,090 RAC: 24,002 |
Over the last week, these two errored out with Server state Over frb_0_8_mike_chosen_cst_oct09_hb_t290__IGNORE_THE_REST_1XO7A_3_14951_1_0 frb_0_8_mike_chosen_cst_oct09_hb_t293__IGNORE_THE_REST_1NV8A_16_14952_18_0 While the following three errored out with - Unhandled Exception Record - frb_0_8_mike_chosen_cst_oct09_hb_t374__IGNORE_THE_REST_1TIQA_7_14969_28_0 frb_0_8_mike_chosen_cst_oct09_hb_t374__IGNORE_THE_REST_1Y9KA_7_14969_28_0 frb_0_8_mike_chosen_cst_oct09_hb_t374__IGNORE_THE_REST_1Z4EA_4_14969_28_0 |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2219 Credit: 42,280,090 RAC: 24,002 |
Over the last week I had no errors at all out of 150 WUs. Well done guys. |
Send message Joined: 10 Jan 06 Posts: 28 Credit: 139,737 RAC: 0 |
Had this message: 10/12/2009 12:49:02 PM|rosetta@home|Task abinitio_withrelax_homfrag_129_B_1ctf__SAVE_ALL_OUT_15148_499_0 exited with a DLL initialization error. It is still running, should finish in the next hour. |
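
Going back to the frb task that was aborted for exceeding its disk limit: a rough back-of-the-envelope check suggests why those tasks die so quickly. The sketch below pulls the two figures out of the quoted client message and estimates how long it takes to reach the limit at the observed rate of 1.3MB every 5 seconds. The function names are made up for illustration, and it assumes the task starts from an empty slot directory and writes at a constant rate, neither of which is confirmed anywhere in the thread.

```python
import re

# Message text copied from the client log quoted earlier in the thread.
LOG_LINE = (
    "Aborting task frb_0_8_mike_chosen_cst_oct09_hb_t313__IGNORE_THE_REST_1I5SA_2_14958_15_1: "
    "exceeded disk limit: 301.57MB > 286.10MB"
)


def parse_disk_limit(line):
    """Return (used_mb, limit_mb) from an 'exceeded disk limit' message, or None."""
    match = re.search(r"exceeded disk limit: ([\d.]+)MB > ([\d.]+)MB", line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def seconds_to_limit(limit_mb, mb_per_write, write_interval_s):
    """Seconds until limit_mb is reached, assuming a constant write rate from zero usage."""
    rate_mb_per_s = mb_per_write / write_interval_s
    return limit_mb / rate_mb_per_s


used_mb, limit_mb = parse_disk_limit(LOG_LINE)
eta_s = seconds_to_limit(limit_mb, mb_per_write=1.3, write_interval_s=5.0)
print(f"used {used_mb} MB, limit {limit_mb} MB, limit reached after about {eta_s / 60:.1f} minutes")
```

Under those assumptions the limit is hit in roughly 18 minutes, so raising the disk allowance would probably only delay the abort while stderr.txt keeps growing.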