Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Jean-David Beyer Joined: 2 Nov 05 Posts: 202 Credit: 6,913,506 RAC: 10,695 |
"In case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks." Mine look like this. (This is one of them.) Is it one of the ones to which you refer (the RosettaVS_ Tasks)? If not, how are the ones to which you refer identified? Application: Rosetta Beta 6.05 · Name: 7a_hal_l_hal_7aa_391_d694_ce_0001_SAVE_ALL_OUT_2977935_67 · State: Running · Received: Fri 26 Apr 2024 02:37:53 AM EDT · Report deadline: Mon 29 Apr 2024 02:37:53 AM EDT · Estimated computation size: 80,000 GFLOPs · CPU time: 05:15:37 · CPU time since checkpoint: 00:17:21 · Elapsed time: 05:19:11 · Estimated time remaining: 02:44:47 · Fraction done: 65.667% · Virtual memory size: 468.18 MB · Working set size: 364.18 MB · Directory: slots/11 · Process ID: 2777585 · Progress rate: 12.240% per hour · Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
"Mine look like this. (This is one of them.) Is it one of the ones to which you refer (the RosettaVS_ Tasks)? If not, how are the ones to which you refer identified?" Exactly the way I posted: they start with RosettaVS_. The one you posted starts with 7a_hal_l_hal_. Grant Darwin NT |
kotenok2000 Joined: 22 Feb 11 Posts: 277 Credit: 523,512 RAC: 500 |
Mine look like this. (This is really one of them.)
|
Jean-David Beyer Joined: 2 Nov 05 Posts: 202 Credit: 6,913,506 RAC: 10,695 |
OK. I now have three of the RosettaVS_ Tasks and they are as you say. Since I have 128 GBytes of RAM, I do not expect problems. Application: Rosetta Beta 6.05 · Name: RosettaVS_SAVE_ALL_OUT_NOJRAN_KCa2_homology_fulldb_IGNORE_THE_REST_vF8nFW_8_1999_2977959_2 · Estimated computation size: 80,000 GFLOPs · Virtual memory size: 1.19 GB · Working set size: 1.03 GB · Progress rate: 10.440% per hour · Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu |
Joined: 1 Dec 05 Posts: 2030 Credit: 10,121,026 RAC: 12,565 |
The validation server is down... |
mrchips Joined: 11 Nov 09 Posts: 10 Credit: 15,306,930 RAC: 4,549 |
issues State: All (3339) · In progress (163) · Validation pending (154) · Validation inconclusive (0) · Valid (2933) |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
"The validation server is down..." Not again... At least the rest are still up (for now). Yep, boinc-process is down again. It wouldn't be a big ask to run a cron job on a system remote from the servers to check whether they're up and running, and to send an email and a text to someone if they've gone MIA... Looking at the hardware list, it is getting on (and the OS is 8 years old!). Even a single-socket, mid-range CPU from the lower end of the EPYC range could replace all of the existing systems, with not only significantly more performance but also far lower power use. Price-wise they're a bargain for what they can do, but they're still not exactly cheap in absolute terms. Grant Darwin NT |
Joined: 1 Dec 05 Posts: 2030 Credit: 10,121,026 RAC: 12,565 |
"Yep, boinc-process is down again." BOINC project server setup should require a MANDATORY e-mail address to use for emergencies (daemon crash, problems with queues, etc.). But I think that needs to be done by the BOINC developers... "Looking at the hardware list, it is getting on (and the OS is 8 years old!)." I also noticed that the OS and hardware are old. Another volunteer suggested to me that perhaps the server status page is out of date and the hardware and OS have actually been updated. I don't think so. P.S. Now over 200k WUs are pending validation!! |
Joined: 1 Dec 05 Posts: 2030 Credit: 10,121,026 RAC: 12,565 |
"P.S. Now, over 200k WUs pending validation!!" Now 270k. And no news from the admins. |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
Server is still dead. Grant Darwin NT |
Jean-David Beyer Joined: 2 Nov 05 Posts: 202 Credit: 6,913,506 RAC: 10,695 |
"Server is still dead." It seems mostly up for me. top - 20:51:09 up 2 days, 12:17, 2 users, load average: 13.33, 13.65, 13.72 Tasks: 474 total, 14 running, 460 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.9 us, 0.2 sy, 80.3 ni, 18.4 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st MiB Mem : 128074.1 total, 33544.1 free, 6219.7 used, 88310.2 buff/cache MiB Swap: 15992.0 total, 15992.0 free, 0.0 used. 120200.2 avail Mem PID PPID USER PR NI S RES %MEM %CPU P TIME+ COMMAND 469545 2039 boinc 39 19 R 1.4g 1.2 98.8 15 287:51.62 ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-li+ 504299 2039 boinc 39 19 R 444456 0.3 98.8 5 26:25.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+ 482867 2039 boinc 39 19 R 213072 0.2 98.6 13 208:50.81 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+ 504592 2039 boinc 39 19 R 212384 0.2 99.1 6 24:10.34 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+ 2039 1 boinc 30 10 S 73336 0.1 0.1 6 44900:08 /usr/bin/boinc |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
"Server is still dead." "It seems mostly up for me." Nope. The boinc-process server is still dead, according to the Server Status page and the number of Tasks piling up waiting for Validation & Assimilation. Waiting for Validation is over 325,000 now. That's why, even though people are returning work, their Credit isn't increasing and their RAC is going down. Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
I don't want to tempt fate, but the boinc-process server appears to be alive again (at least for now). Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
I really wish they'd fix the application error handling, or at least the data they send out to process. Got a bunch of Tasks that have errored out. ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT. *deep sigh* Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
"I don't want to tempt fate, but the boinc-process server appears to be alive again (at least for now)." And the backlog has cleared. Grant Darwin NT |
Joined: 18 May 16 Posts: 2 Credit: 5,562,366 RAC: 778 |
I am receiving a constant error message via BOINC re Rosetta@Home and I am not sure how to resolve it. The message (relating solely to Rosetta@Home) is: "Could not determine location of executable. Could not find database. Either specify -database or set variable ROSETTA3_db" Can someone advise where in user files (I assume) a configuration file relating to BOINC and Rosetta@Home needs modification? Many thanks, Chris Raisin |
Joined: 16 Jun 08 Posts: 1235 Credit: 14,372,156 RAC: 1,028 |
"I am receiving a constant error message via BOINC re Rosetta@Home and I am not sure how to resolve it." I've seen that message many times. Until those workunits get some hard-to-guess change, expect many more workunits running under Windows to have the same problem. |
Joined: 28 Mar 20 Posts: 1759 Credit: 18,534,891 RAC: 318 |
"I am receiving a constant error message via BOINC re Rosetta@Home and I am not sure how to resolve it." Where are those error messages being shown? Looking at your results, there are only 2 that have errored out: ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT. That has been an issue with some Tasks for ages now. Apart from what appears to be a heavily loaded system (11.5 hours to do 8 hours of work, 4 hrs 15 min to do 3 hrs of work) and the 2 errored Tasks (due to a configuration issue with the Tasks themselves), all the others have processed & Validated without issue. Grant Darwin NT |
Joined: 1 Dec 05 Posts: 2030 Credit: 10,121,026 RAC: 12,565 |
"Where are those error messages being shown?" It seems to be the screensaver's message... |
Bryn Mawr Joined: 26 Dec 18 Posts: 412 Credit: 12,566,785 RAC: 12,602 |
A strange error; sadly I can only give a sketchy report, but I hope it’s enough: Host = https://boinc.bakerlab.org/rosetta/results.php?hostid=6231982 Boinc 7.24.1, Ubuntu 22.04.4. I allowed Ubuntu to update and then rebooted. After this, Boinc Manager disconnected after running for about a minute; the event log showed a Rosetta task restarting and Boinc immediately closing, having received signal 15. This would repeat each time I restarted the host and the Boinc service restarted. I have now aborted all of the Rosetta tasks and this behaviour has stopped. (How) can a Rosetta task kill Boinc? Just a notification, as I’ve never heard this described before. |
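
On the "signal 15" in the last report: on Linux, signal 15 is SIGTERM, the ordinary termination request that `kill` sends by default and that service managers typically send when stopping a service. The number alone does not say what sent it. A minimal Python sketch for turning a signal number from a log into its name (the helper `describe_signal` is a hypothetical name used only for illustration):

```python
import signal


def describe_signal(number):
    """Return the symbolic name for a signal number, or a fallback string."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known on this platform.
        return f"unknown signal {number}"


print(describe_signal(15))  # SIGTERM
```

Because SIGTERM can be caught and handled, a client that logs it and then exits is shutting down on request rather than crashing, so the next thing worth checking is what issued the request.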
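
The remote check suggested earlier in the thread (a cron job on a machine away from the servers that alerts someone when a server goes missing) could be sketched roughly as below. This is only an illustration under assumptions: `STATUS_URL` is a placeholder, the function names are hypothetical, and the check only confirms that a page answers with HTTP 200. It does not parse the status page, so it would not by itself notice a stopped boinc-process daemon behind a healthy web server.

```python
import urllib.error
import urllib.request

# Placeholder (assumption): replace with the page you actually want to watch.
STATUS_URL = "https://example.org/server_status"


def fetch_status(url, timeout=10.0):
    """Return (reachable, http_status); http_status is None on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return True, exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False, None


def check_and_alert(url, fetch=fetch_status, notify=print):
    """Call notify(message) and return False unless the URL answers with 200."""
    reachable, status = fetch(url)
    if reachable and status == 200:
        return True
    notify(f"ALERT: {url} looks down (reachable={reachable}, status={status})")
    return False


# Uncomment when the script is run from cron:
# check_and_alert(STATUS_URL)
```

The `notify` argument is where an email or text hook would go (for example a small function built on `smtplib`); it defaults to `print` so that cron's own mail-on-output behaviour can act as the alert. Passing `fetch` and `notify` in as parameters also lets the logic be exercised without touching the network.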
©2025 University of Washington
https://www.bakerlab.org