Problems and Technical Issues with Rosetta@home

Jean-David Beyer

Joined: 2 Nov 05
Posts: 202
Credit: 6,913,506
RAC: 10,695
Message 109203 - Posted: 28 Apr 2024, 4:31:24 UTC - in response to Message 109199.  

in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks.


Mine look like this. (This is one of them.) Is it one of the RosettaVS_ Tasks to which you refer? If not, how are the ones to which you refer identified?

Application: Rosetta Beta 6.05
Name: 7a_hal_l_hal_7aa_391_d694_ce_0001_SAVE_ALL_OUT_2977935_67
State: Running
Received: Fri 26 Apr 2024 02:37:53 AM EDT
Report deadline: Mon 29 Apr 2024 02:37:53 AM EDT
Estimated computation size: 80,000 GFLOPs
CPU time: 05:15:37
CPU time since checkpoint: 00:17:21
Elapsed time: 05:19:11
Estimated time remaining: 02:44:47
Fraction done: 65.667%
Virtual memory size: 468.18 MB
Working set size: 364.18 MB
Directory: slots/11
Process ID: 2777585
Progress rate: 12.240% per hour
Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109205 - Posted: 28 Apr 2024, 6:01:03 UTC - in response to Message 109203.  

Mine look like this. (This is one of them.) Is it one of the RosettaVS_ Tasks to which you refer? If not, how are the ones to which you refer identified?
Exactly the way I posted: they start with RosettaVS_
The one you posted starts with 7a_hal_l_hal_
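
For anyone sorting through a long task list, the naming rule above can be sketched in a few lines of Python. Both task names are copied from this thread; the helper name `is_rosetta_vs` is made up for illustration and is not part of BOINC.

```python
# Task names as shown in BOINC Manager (both taken from this thread).
task_names = [
    "7a_hal_l_hal_7aa_391_d694_ce_0001_SAVE_ALL_OUT_2977935_67",
    "RosettaVS_SAVE_ALL_OUT_NOJRAN_KCa2_homology_fulldb_IGNORE_THE_REST_vF8nFW_8_1999_2977959_2",
]


def is_rosetta_vs(name: str) -> bool:
    """Hypothetical helper: True when the task name starts with 'RosettaVS_'."""
    return name.startswith("RosettaVS_")


# Keep only the tasks that belong to the RosettaVS_ batch.
vs_tasks = [name for name in task_names if is_rosetta_vs(name)]
print(len(vs_tasks), "of", len(task_names), "tasks are RosettaVS_ tasks")
```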


Grant
Darwin NT

kotenok2000
Joined: 22 Feb 11
Posts: 277
Credit: 523,512
RAC: 500
Message 109206 - Posted: 28 Apr 2024, 9:46:30 UTC - in response to Message 109205.  

Mine look like this. (This is really one of them.)


Project Rosetta@home

Name RosettaVS_SAVE_ALL_OUT_NOJRAN_CHIP_8EHZ_fulldb_IGNORE_THE_REST_9GjqZI_5_4100_2977977_2_0

Application Rosetta Beta 6.05
Workunit name RosettaVS_SAVE_ALL_OUT_NOJRAN_CHIP_8EHZ_fulldb_IGNORE_THE_REST_9GjqZI_5_4100_2977977_2
State Running
Received 4/28/2024 8:42:31 AM
Report deadline 5/1/2024 8:42:30 AM
Estimated app speed 2.78 GFLOPs/sec
Estimated task size 80 000 GFLOPs
CPU time at last checkpoint 00:00:00
CPU time 02:43:13
Elapsed time 02:46:22
Estimated time remaining 06:19:37
Fraction done 20.911%
Virtual memory size 2 200.04 MB
Working set size 1 973.22 MB
Directory slots/4
Process ID 193635

Debug State: 2 - Scheduler: 2

Jean-David Beyer
Joined: 2 Nov 05
Posts: 202
Credit: 6,913,506
RAC: 10,695
Message 109207 - Posted: 28 Apr 2024, 12:06:35 UTC - in response to Message 109205.  

OK. I now have three of the RosettaVS_ Tasks and they are as you say.
Since I have 128 GBytes of RAM, I do not expect problems.
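
As a rough check on that expectation, here is a back-of-the-envelope sketch in Python. The per-task figures come from this thread (roughly 1 GB, and about 2 GB in the Windows report above); the function name and the reserve value are assumptions for illustration only, and real memory use varies from task to task.

```python
def max_tasks_by_ram(total_ram_gb: float, per_task_gb: float,
                     reserve_gb: float = 0.0) -> int:
    """Hypothetical helper: how many tasks of a given size fit in RAM.

    reserve_gb is memory held back for the OS and other programs.
    """
    usable = max(total_ram_gb - reserve_gb, 0.0)
    return int(usable // per_task_gb)


# 128 GB host, about 1 GB per RosettaVS_ task, nothing held back.
print(max_tasks_by_ram(128, 1.0))
# Same host, pessimistic 2 GB per task and 8 GB held back (assumed values).
print(max_tasks_by_ram(128, 2.0, reserve_gb=8))
```

Either way the limit is far above the number of CPU threads on a typical host, so RAM should not be the bottleneck on a machine like this one.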

Application: Rosetta Beta 6.05
Name: RosettaVS_SAVE_ALL_OUT_NOJRAN_KCa2_homology_fulldb_IGNORE_THE_REST_vF8nFW_8_1999_2977959_2

Estimated computation size: 80,000 GFLOPs

Virtual memory size: 1.19 GB
Working set size: 1.03 GB

Progress rate: 10.440% per hour
Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu




Mine look like this. (This is one of them.) Is it one of the RosettaVS_ Tasks to which you refer? If not, how are the ones to which you refer identified?

Exactly the way I posted: they start with RosettaVS_
The one you posted starts with 7a_hal_l_hal_

Application: Rosetta Beta 6.05
Name: 7a_hal_l_hal_7aa_391_d694_ce_0001_SAVE_ALL_OUT_2977935_67

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,121,026
RAC: 12,565
Message 109211 - Posted: 1 May 2024, 10:52:06 UTC - in response to Message 109207.  

The validation server is down...

mrchips
Joined: 11 Nov 09
Posts: 10
Credit: 15,306,930
RAC: 4,549
Message 109212 - Posted: 1 May 2024, 20:16:42 UTC

issues

State: All (3339) · In progress (163) · Validation pending (154) · Validation inconclusive (0) · Valid (2933)

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109214 - Posted: 2 May 2024, 0:28:14 UTC - in response to Message 109211.  

The validation server is down...
Not again...
At least the rest are still up (for now).

Yep, boinc-process is down again.
It wouldn't be a big ask to run a cron job on a system remote from the servers to check if they're there & running or not, and send an email and text to someone to let them know if they've gone MIA...
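
A minimal sketch of that kind of remote check, in Python, using only the standard library. Everything configurable here is an assumption: the status URL is a guess at the standard BOINC status page path, the e-mail addresses are placeholders, and the SMTP host assumes a local mail relay. It only tests whether the page answers at all; spotting a single dead daemon such as boinc-process would need the page contents to be read as well, which is not attempted here.

```python
import smtplib
import urllib.error
import urllib.request
from email.message import EmailMessage

# Placeholder settings (assumptions, not real project configuration).
STATUS_URL = "https://boinc.bakerlab.org/rosetta/server_status.php"
ALERT_FROM = "monitor@example.org"
ALERT_TO = "admin@example.org"
SMTP_HOST = "localhost"


def http_ok(url: str, timeout: float = 30.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def build_alert(url: str) -> EmailMessage:
    """Build the e-mail that tells someone the check failed."""
    message = EmailMessage()
    message["Subject"] = f"ALERT: {url} is not responding"
    message["From"] = ALERT_FROM
    message["To"] = ALERT_TO
    message.set_content(f"The remote check could not reach {url}.")
    return message


def run_check(url: str = STATUS_URL, probe=http_ok, send=None) -> bool:
    """Probe the URL once and send an alert if it is down. Returns True if up."""
    if probe(url):
        return True
    alert = build_alert(url)
    if send is None:
        # Default delivery: hand the message to the assumed local mail relay.
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(alert)
    else:
        send(alert)
    return False
```

A crontab entry on the remote machine would then run a small script that calls `run_check()` every few minutes. The `probe` and `send` parameters exist so the logic can be tried without touching the network or a mail server.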


Looking at the hardware list, it is getting on (and the OS is 8 years old!).
Even a single-socket, mid-range CPU from the lower-end EPYC systems could replace all of the existing systems, with significantly more performance while using way, way, way less power.
Price wise they're a bargain for what they can do, but they're still not exactly cheap in absolute terms.
Grant
Darwin NT

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,121,026
RAC: 12,565
Message 109215 - Posted: 2 May 2024, 10:15:03 UTC - in response to Message 109214.  
Last modified: 2 May 2024, 10:16:12 UTC

Yep, boinc-process is down again.
It wouldn't be a big ask to run a cron job on a system remote from the servers to check if they're there & running or not, and send an email and text to someone to let them know if they've gone MIA...


The BOINC project server setup could require a MANDATORY e-mail address, entered during server creation/configuration, to be used for emergencies (daemon crash, problems with queues, etc.).
But I think that needs to be done by the BOINC developers...


Looking at the hardware list, it is getting on (and the OS is 8 years old!).

I also noticed that the OS and hardware are old.
Another volunteer told me that the server status page may simply be out of date, and that the hardware and OS may have been updated since.
I don't think so.


P.S. Now, over 200k wus pending validation!!

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,121,026
RAC: 12,565
Message 109221 - Posted: 2 May 2024, 18:46:59 UTC - in response to Message 109215.  

P.S. Now, over 200k wus pending validation!!


Now 270k
And no news from admins

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109223 - Posted: 2 May 2024, 21:51:14 UTC

Server is still dead.
Grant
Darwin NT

Jean-David Beyer
Joined: 2 Nov 05
Posts: 202
Credit: 6,913,506
RAC: 10,695
Message 109224 - Posted: 3 May 2024, 0:53:43 UTC - in response to Message 109223.  

Server is still dead.

It seems mostly up for me.

top - 20:51:09 up 2 days, 12:17,  2 users,  load average: 13.33, 13.65, 13.72
Tasks: 474 total,  14 running, 460 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  0.2 sy, 80.3 ni, 18.4 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
MiB Mem : 128074.1 total,  33544.1 free,   6219.7 used,  88310.2 buff/cache
MiB Swap:  15992.0 total,  15992.0 free,      0.0 used. 120200.2 avail Mem 

    PID    PPID USER      PR  NI S    RES  %MEM  %CPU  P     TIME+ COMMAND                                                                   
 469545    2039 boinc     39  19 R   1.4g   1.2  98.8 15 287:51.62 ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-li+ 
 504299    2039 boinc     39  19 R 444456   0.3  98.8  5  26:25.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+ 
 482867    2039 boinc     39  19 R 213072   0.2  98.6 13 208:50.81 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+ 
 504592    2039 boinc     39  19 R 212384   0.2  99.1  6  24:10.34 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+ 
   2039       1 boinc     30  10 S  73336   0.1   0.1  6  44900:08 /usr/bin/boinc   

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109225 - Posted: 3 May 2024, 2:26:32 UTC - in response to Message 109224.  

Server is still dead.
It seems mostly up for me.
Nope.
The boinc-process server is still dead, according to the Server Status page & the number of Tasks that are piling up waiting for Validation & Assimilation.
Waiting for Validation is over 325,000 now.

That's why even though people are returning work, their Credit isn't increasing & their RAC is going down.
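
To put a number on the RAC drop: BOINC's Recent Average Credit is an exponentially decaying average with a half-life of one week, so while no new Credit is granted it just shrinks. The sketch below is a simplified, decay-only model in Python; the function name is made up, and the real BOINC update also adds in newly granted Credit, which is left out here.

```python
HALF_LIFE_DAYS = 7.0  # RAC half-life in BOINC (one week)


def rac_after_idle(rac: float, days_without_credit: float) -> float:
    """Hypothetical helper: RAC after a stretch with no Credit granted.

    Decay-only model: the value halves every HALF_LIFE_DAYS.
    """
    return rac * 0.5 ** (days_without_credit / HALF_LIFE_DAYS)


# Example: a host with a RAC of 10,000 while validation is stalled.
for days in (1, 2, 7):
    print(days, "day(s):", round(rac_after_idle(10_000, days)))
```

That works out to a bit over 9% lost per day of stalled validation. Once the backlog is validated and the pending Credit is granted, RAC climbs back up.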
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109226 - Posted: 3 May 2024, 4:21:57 UTC

I don't want to tempt fate, but the boinc-process server appears to be alive again (at least for now).
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109227 - Posted: 3 May 2024, 4:25:08 UTC

I really wish they'd fix the application error handling, or at least the data they send out to process. Got a bunch of Tasks that have errored out.

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
*deep sigh*
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109228 - Posted: 3 May 2024, 11:06:28 UTC - in response to Message 109226.  

I don't want to tempt fate, but the boinc-process server appears to be alive again (at least for now).
And the backlog has cleared.
Grant
Darwin NT

Chris Raisin
Joined: 18 May 16
Posts: 2
Credit: 5,562,366
RAC: 778
Message 109234 - Posted: 7 May 2024, 18:49:49 UTC

I am receiving a constant error message via BOINC re Rosetta@Home and I am not sure how to resolve it.

The message (relating solely to Rosetta@Home) is:

"Could not determine location of executable.
Could not find database. Either specify -database or set variable ROSETTA3_db"

Can someone advise where in user files (I assume) a configuration file relating to BOINC and Rosetta@Home needs modification?

Many thanks, Chris Raisin

robertmiles
Joined: 16 Jun 08
Posts: 1235
Credit: 14,372,156
RAC: 1,028
Message 109235 - Posted: 7 May 2024, 19:14:52 UTC - in response to Message 109234.  

I am receiving a constant error message via BOINC re Rosetta@Home and I am not sure how to resolve it.

The message (relating solely to Rosetta@Home) is:

"Could not determine location of executable.
Could not find database. Either specify -database or set variable ROSETTA3_db"

Can someone advise where in user files (I assume) a configuration file relating to BOINC and Rosetta@Home needs modification?

Many thanks, Chris Raisin


I've seen that message many times. Until those workunits get some hard-to-guess change, expect many more workunits running under Windows to have the same problem.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109236 - Posted: 7 May 2024, 21:43:28 UTC - in response to Message 109234.  

I am receiving a constant error message via BOINC re Rosetta@Home and I am not sure how to resolve it.

The message (relating solely to Rosetta@Home) is:

"Could not determine location of executable.
Could not find database. Either specify -database or set variable ROSETTA3_db"

Can someone advise where in user files (I assume) a configuration file relating to BOINC and Rosetta@Home needs modification?

Many thanks, Chris Raisin
Where are those error messages being shown?
Looking at your results, there are only 2 that have errored out,
ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
Which has been an issue with some Tasks for ages now.

Apart from what appears to be a heavily loaded system (11.5 hours to do 8 hours of work, 4 hrs 15 min to do 3 hrs of work) and the 2 errored Tasks (due to a configuration issue with the Tasks themselves), all the others have processed & Validated without issue.
Grant
Darwin NT

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,121,026
RAC: 12,565
Message 109237 - Posted: 8 May 2024, 7:40:35 UTC - in response to Message 109236.  

Where are those error messages being shown?
Apart from what appears to be a heavily loaded system (11.5 hours to do 8 hours of work, 4 hrs 15 min to do 3 hrs of work) and the 2 errored Tasks (due to a configuration issue with the Tasks themselves), all the others have processed & Validated without issue.


It seems to be the message from the screensaver...

Bryn Mawr
Joined: 26 Dec 18
Posts: 412
Credit: 12,566,785
RAC: 12,602
Message 109250 - Posted: 15 May 2024, 8:12:34 UTC

A strange error, sadly I can only give a sketchy report but I hope it’s enough :-

Host = https://boinc.bakerlab.org/rosetta/results.php?hostid=6231982

Boinc 7.24.1, Ubuntu 22.04.4

I allowed Ubuntu to update and then rebooted. After that, Boinc Manager disconnected after running for about a minute: the event log showed a Rosetta task restarting and Boinc immediately closing, having received signal 15. This repeated each time I restarted the host and the Boinc service restarted.
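
For reference, signal 15 on Linux is SIGTERM, the ordinary "please terminate" request that `kill` sends by default, as opposed to a crash signal such as SIGSEGV. A quick way to confirm the number, sketched in Python; what actually sent the signal in this case is not known from the report.

```python
import signal

# Look up the symbolic name for signal number 15.
sig = signal.Signals(15)
print(sig.name)

# SIGTERM can be caught by the receiving process, so a client that logs
# "received signal 15" is reporting a request to shut down, not a fault.
print(sig == signal.SIGTERM)
```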

I have now aborted all of the Rosetta tasks and this behaviour has now stopped.

(How) can a Rosetta task kill Boinc?

Just a notification as I’ve never heard this described before.



©2025 University of Washington
https://www.bakerlab.org