Problems and Technical Issues with Rosetta@home

Author	Message
Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112628 - Posted: 6 May 2025, 9:50:06 UTC - in response to Message 112625. However, when you later write... I have created a profile with the longer runtime. I will let it run for a while. And then probably revert to the 8 hour profile. Your cache seems to hold 870 tasks (including running tasks). The way Rosetta works is it initially tells Boinc tasks will take 8hrs, even if you've adjusted tasks to have a 24hr runtime, 870 tasks at 8hrs on a 128 thread server will take ~2.25days to complete - within the 3-day deadline. But if they end up running 24hrs, they'll take ~6.75days to complete - ALL missing deadline. Then your earlier unstarted tasks will get cancelled for not starting before deadline, while simultaneously grabbing more tasks because Rosetta (not Boinc) is misleading Boinc as to how big your cache is. Any cache of tasks larger than 3days*128threads=384 running 24hrs each will miss deadline The longest runtime your current cache size can successfully complete inside deadline is 10hrs - not notably different to the default 8hrs The point being, with a fixed 3day deadline, if you treble runtime you have to reduce your cache-size an equivalent amount to continue to meet that hard deadline Yeah, this is actually happening as predicted above. You currently have 1100 errored tasks, largely comprising "Not started by deadline - canceled" plus "Timed out - no response" for those tasks that have started. And for those you have returned they have been awarded credited, which is fortunate because all your 24hrs tasks missed deadline by up to a day. And you seem to have tried reducing your runtime to 22hrs or 20hrs and it's not producing any better outcomes. The other thing we can say is that while each task is getting credited more, you're not noticeably getting any better credit/hr much as predicted again, so it's a futile exercise. 
It's perfectly legitimate to want to run longer runtimes, if you're happy with the risk of tasks crashing in that extra time and not being rewarded with any credits, but that requires a maximum number of tasks in the 300-350 range when using 24hr runtimes - and, importantly, <waiting> for your cache to actually reduce to 300-350 <before> increasing runtime from 8 to 24hrs in order to avoid these timeouts and cancellations. By all means confirm that for yourself - your tasklist looks like a bit of a warzone atm with all its red warning messages If you want to keep your settings as they currently are I think you may squeeze through with 12hr runtimes as a certain number of tasks are crashing out of their own accord in the current batch (project-related, not user-related) I'm using 12hr runtimes quite successfully atm (albeit with a smaller cache). My personal view is that it will be a workable compromise setting for you of longer runtime vs completion by deadline. YMMV Checking in on this (because I have nothing better to do) I did note the cache had reduced from 870 to 700ish at the time of my previous post, but didn't know if that was just a random fluctuation. Now I can see runtime has been knocked back to 8hrs, cache is down to 367, deadlines were only being missed by 7hrs rather than a day and there was a delay in downloading fresh tasks of over a day so that deadlines will now start to be hit. Also there's a further delay in downloading currently going on so that (I speculate) in about a day's time, runtime can be increased to 24hrs again. If I've got that right, and Tom hasn't come back to confirm it yet, that's all exactly the right thing to do. Good job ID: 112628 · Rating: 0 · rate: / Reply Quote
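The cache arithmetic in the post above can be checked with a short sketch. The function names are hypothetical and the model is idealised: it assumes all 128 threads stay busy and every task runs for exactly the configured runtime, which real Rosetta tasks will not do. The post rounds the results to ~2.25 days, ~6.75 days and 10hrs.

```python
# Idealised check of the figures quoted in the post above.
# Function names are illustrative only; nothing here is a BOINC or Rosetta API.
DEADLINE_DAYS = 3
THREADS = 128

def days_to_clear(tasks, runtime_hours, threads=THREADS):
    """Days needed to run `tasks` tasks of `runtime_hours` each, `threads` at a time."""
    return tasks * runtime_hours / threads / 24

def max_tasks(runtime_hours, deadline_days=DEADLINE_DAYS, threads=THREADS):
    """Largest cache (running tasks included) that can finish inside the deadline."""
    return int(deadline_days * 24 / runtime_hours * threads)

def max_runtime_hours(tasks, deadline_days=DEADLINE_DAYS, threads=THREADS):
    """Longest runtime at which `tasks` tasks can still all finish inside the deadline."""
    return deadline_days * 24 * threads / tasks

print(round(days_to_clear(870, 8), 2))   # 2.27 days: inside the 3-day deadline
print(round(days_to_clear(870, 24), 2))  # 6.8 days: well past the deadline
print(max_tasks(24))                     # 384 tasks at 24hr runtimes
print(round(max_runtime_hours(870), 1))  # 10.6 hours for a cache of 870
```

The 300-350 range suggested in the post sits below the 384 ceiling, which leaves some headroom for tasks that overrun.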

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112629 - Posted: 6 May 2025, 14:41:30 UTC - in response to Message 112628. Last modified: 6 May 2025, 14:41:59 UTC Your cache seems to hold 870 tasks (including running tasks). The way Rosetta works is it initially tells Boinc tasks will take 8hrs, even if you've adjusted tasks to have a 24hr runtime, 870 tasks at 8hrs on a 128 thread server will take ~2.25days to complete - within the 3-day deadline. But if they end up running 24hrs, they'll take ~6.75days to complete - ALL missing deadline. Then your earlier unstarted tasks will get cancelled for not starting before deadline, while simultaneously grabbing more tasks because Rosetta (not Boinc) is misleading Boinc as to how big your cache is. Any cache of tasks larger than 3days*128threads=384 running 24hrs each will miss deadline The longest runtime your current cache size can successfully complete inside deadline is 10hrs - not notably different to the default 8hrs The point being, with a fixed 3day deadline, if you treble runtime you have to reduce your cache-size an equivalent amount to continue to meet that hard deadline Yeah, this is actually happening as predicted above. You currently have 1100 errored tasks, largely comprising "Not started by deadline - canceled" plus "Timed out - no response" for those tasks that have started. And for those you have returned they have been awarded credited, which is fortunate because all your 24hrs tasks missed deadline by up to a day. And you seem to have tried reducing your runtime to 22hrs or 20hrs and it's not producing any better outcomes. The other thing we can say is that while each task is getting credited more, you're not noticeably getting any better credit/hr much as predicted again, so it's a futile exercise. 
It's perfectly legitimate to want to run longer runtimes, if you're happy with the risk of tasks crashing in that extra time and not being rewarded with any credits, but that requires a maximum number of tasks in the 300-350 range when using 24hr runtimes - and, importantly, <waiting> for your cache to actually reduce to 300-350 <before> increasing runtime from 8 to 24hrs in order to avoid these timeouts and cancellations. By all means confirm that for yourself - your tasklist looks like a bit of a warzone atm with all its red warning messages If you want to keep your settings as they currently are I think you may squeeze through with 12hr runtimes as a certain number of tasks are crashing out of their own accord in the current batch (project-related, not user-related) I'm using 12hr runtimes quite successfully atm (albeit with a smaller cache). My personal view is that it will be a workable compromise setting for you of longer runtime vs completion by deadline. YMMV Checking in on this (because I have nothing better to do) I did note the cache had reduced from 870 to 700ish at the time of my previous post, but didn't know if that was just a random fluctuation. Now I can see runtime has been knocked back to 8hrs, cache is down to 367, deadlines were only being missed by 7hrs rather than a day and there was a delay in downloading fresh tasks of over a day so that deadlines will now start to be hit. Also there's a further delay in downloading currently going on so that (I speculate) in about a day's time, runtime can be increased to 24hrs again. If I've got that right, and Tom hasn't come back to confirm it yet, that's all exactly the right thing to do. Good job I know I'm obsessing over this, but I'm at a loose end, so why not... In progress tasks are down to 286 All "Not started by deadline - canceled" and "Timed out - no response" error messages have disappeared. 
Errored tasks are down by 200 with no new ones being added And task returns are already beating deadlines by as much as 1 day 10hrs, not risking not getting credit and not causing resends to other users who later find them cancelled by the server All problem issues are solved, and with quite some headroom. With a 128-thread server I wouldn't reduce the cache size any further - some might already consider that number to be on the low side, especially when tasks ready to send are so hand-to-mouth. I'd also increase task runtime from 8 to 12hrs, which I personally consider to be a sweeter spot for longer runtimes than 24hrs, reduced server hits compared to 8hrs, less problematic Boinc scheduling and all the other vagaries we have to contend with here. It all looks neatly balanced atm, with that option to slightly increase runtime as well without recreating problems. IMO ID: 112629 · Rating: 0 · rate: / Reply Quote

Tom M Send message Joined: 20 Jun 17 Posts: 178 Credit: 37,552,020 RAC: 6	Message 112630 - Posted: 6 May 2025, 15:11:14 UTC - in response to Message 112629. Last modified: 6 May 2025, 16:07:39 UTC If I've got that right, and Tom hasn't come back to confirm it yet, that's all exactly the right thing to do. Good job I know I'm obsessing over this, but I'm at a loose end, so why not... In progress tasks are down to 286 All "Not started by deadline - canceled" and "Timed out - no response" error messages have disappeared. Errored tasks are down by 200 with no new ones being added And task returns are already beating deadlines by as much as 1 day 10hrs, not risking not getting credit and not causing resends to other users who later find them cancelled by the server All problem issues are solved, and with quite some headroom. With a 128-thread server I wouldn't reduce the cache size any further - some might already consider that number to be on the low side, especially when tasks ready to send are so hand-to-mouth. I'd also increase task runtime from 8 to 12hrs, which I personally consider to be a sweeter spot for longer runtimes than 24hrs, reduced server hits compared to 8hrs, less problematic Boinc scheduling and all the other vagaries we have to contend with here. It all looks neatly balanced atm, with that option to slightly increase runtime as well without recreating problems. IMO It looks like I have switched back to the 8 hour profile overnight. I will change the 22-24 hour profile to 12. Boincmgr is set to 0.1/0.01 right now. Do we have any idea what the computation errors are triggered by? I would like to lower my computation errors if possible. I am getting them on both of my systems. A Ryzen 3700x cpu and the Epyc CPU system. Thank you. ===edit=== Bumped the cache from 0.1 to 0.2 The profile is now set to 12 hours. Proud member of the O.F.A. (Old Farts Association) ID: 112630 · Rating: 0 · rate: / Reply Quote

Tom M Send message Joined: 20 Jun 17 Posts: 178 Credit: 37,552,020 RAC: 6	Message 112631 - Posted: 6 May 2025, 16:25:52 UTC - in response to Message 112630. Do we have any idea what the computation errors are triggered by? I would like to lower my computation errors if possible. I am getting them on both of my systems. A Ryzen 3700x cpu and the Epyc CPU system. Apparently everyone is dying on line 2798 of the Beta tasks. "...ERROR: Error in simple_cycpep_predict app! The imported native pose has a different number of residues than the sequence provided...." Proud member of the O.F.A. (Old Farts Association) ID: 112631 · Rating: 0 · rate: / Reply Quote

Tom M Send message Joined: 20 Jun 17 Posts: 178 Credit: 37,552,020 RAC: 6	Message 112632 - Posted: 6 May 2025, 16:31:59 UTC - in response to Message 112630. ===edit=== Bumped the cache from 0.1 to 0.2 The profile is now set to 12 hours. And I have started my polling script again. Proud member of the O.F.A. (Old Farts Association) ID: 112632 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112633 - Posted: 6 May 2025, 22:31:52 UTC - in response to Message 112630. Last modified: 6 May 2025, 23:03:49 UTC If I've got that right, and Tom hasn't come back to confirm it yet, that's all exactly the right thing to do. Good job I know I'm obsessing over this, but I'm at a loose end, so why not... In progress tasks are down to 286 All "Not started by deadline - canceled" and "Timed out - no response" error messages have disappeared. Errored tasks are down by 200 with no new ones being added And task returns are already beating deadlines by as much as 1 day 10hrs, not risking not getting credit and not causing resends to other users who later find them cancelled by the server All problem issues are solved, and with quite some headroom. With a 128-thread server I wouldn't reduce the cache size any further - some might already consider that number to be on the low side, especially when tasks ready to send are so hand-to-mouth. I'd also increase task runtime from 8 to 12hrs, which I personally consider to be a sweeter spot for longer runtimes than 24hrs, reduced server hits compared to 8hrs, less problematic Boinc scheduling and all the other vagaries we have to contend with here. It all looks neatly balanced atm, with that option to slightly increase runtime as well without recreating problems. IMO It looks like I have switched back to the 8 hour profile overnight. I will change the 22-24 hour profile to 12. Boincmgr is set to 0.1/0.01 right now. Do we have any idea what the computation errors are triggered by? I would like to lower my computation errors if possible. I am getting them on both of my systems. A Ryzen 3700x cpu and the Epyc CPU system. Thank you. ===edit=== Bumped the cache from 0.1 to 0.2 The profile is now set to 12 hours. On the computation errors, this comes from the project, not from any of us. 
The last I heard, in the days when someone at Rosetta was speaking to me, was that it was easier to let those tasks error out after very few seconds than to try to extract them from the queue of tasks, which would take a lot of good tasks out as well as the bad. If that view holds, it's something we're going to continue to suffer, unfortunately. Not ideal but pragmatic.
On the cache size, I do agree with Grant's view that it should be kept low BUT only if there's a constant supply of tasks for us. For some months now we <haven't> had a constant supply ready to send to us. And this is only made worse by all the tasks that error out. As such I can't agree with the cache only being 0.1 or 0.2 + 0.01.
With the number of threads you have, the hand-to-mouth supply of tasks and the regular computation errors, I would aim for a cache size somewhere between 0.5 and 1.0 + 0.01. That strikes me as the right ballpark for safety & reliability within the deadline, but tweak it to your own view of each of those competing issues within those bounds. Any less and I can see you regularly having threads free without work. Supply isn't trustworthy enough and, with all the computation errors, even what you do get you can't entirely rely on. Having a 12hr runtime rather than 8hrs gives you that little bit more time to get good tasks through - that's one of its pluses IMO.
Fwiw on my own machines, I've now settled on a 12hr runtime with a cache of 0.4 + 0.1 which works pretty well with just 16 threads on my main PC (and 6 threads on another and 8 on my work PC), though I run 2 other low-priority projects as a backup in case of unforeseen eventualities while they're unattended.
Edit: I see you're down to just 149 tasks now, which will be your 128 threads and only 21 tasks waiting to start for when others complete. This is way too tight. It looks like you're likely asking for tasks already but the project hasn't got them to send you.
If you were asking for tasks with 0.5days worth left, rather than only 0.1 or 0.2, you'd stand a much better chance of getting some in time. Even 0.5days may not be enough time tbh. You can only see how this goes. It's no good swinging from having too many tasks to complete by deadline all the way to not having enough tasks to keep all your threads occupied. There's a balance somewhere between the two to find. Edit 2: Go straight to 1.0 + 0.01 - even if Rosetta had them all to send you it'd only be ~300 including running tasks which is far from excessive on a 128-thread server. It'd still be nearly 600 fewer than you were stockpiling before ID: 112633 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112634 - Posted: 7 May 2025, 1:23:19 UTC - in response to Message 112633. Edit: I see you're down to just 149 tasks now, which will be your 128 threads and only 21 tasks waiting to start for when others complete. This is way too tight. It looks like you're likely asking for tasks already but the project hasn't got them to send you. If you were asking for tasks with 0.5days worth left, rather than only 0.1 or 0.2, you'd stand a much better chance of getting some in time. Even 0.5days may not be enough time tbh. You can only see how this goes. It's no good swinging from having too many tasks to complete by deadline all the way to not having enough tasks to keep all your threads occupied. There's a balance somewhere between the two to find. Edit 2: Go straight to 1.0 + 0.01 - even if Rosetta had them all to send you it'd only be ~300 including running tasks which is far from excessive on a 128-thread server. It'd still be nearly 600 fewer than you were stockpiling before I think I panicked. Your tasks dropped to 147 (down to 19 waiting to start) - I didn't know when it was going to stop going down. Then your 12hr runtime kicked in and increased cache from 0.1 to 0.2 as well, and tasks are already up to 160 (32 waiting to start) Still increase your cache, but I now think 0.5 + 0.01 will be enough - no need to go all the way to 1.0 + 0.01 This time tomorrow I expect your cache to be close to 240 12hr tasks, which should be comfortable from every perspective ID: 112634 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 112635 - Posted: 7 May 2025, 4:38:03 UTC - in response to Message 112633. As such I can't agree with the cache only being 0.1 or 0.2 +0.01 Why? The idea of a cache is to keep your system busy if there is a lack of work/issues contacting the servers. If you run just one project and, like Rosetta, it's poorly managed or not managed at all, then having a cache will help keep your system busy when the project's having issues. If you're running a single project with plenty of work and good admin support, keep a few hours' worth if you feel the need. But running multiple projects? No cache is really necessary or desirable, let alone multiple days' worth. Rosetta isn't the only project he's participating in, so there's no need for a cache at all to keep his systems busy. 0.1 days and 0.01 days means your system will report work pretty much as it's done, and will have some Tasks on hand ready to go as others finish processing, so the system isn't waiting to download work before starting on new work once a Task finishes, even if you get a few that might error out as soon as they start. Grant Darwin NT ID: 112635 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 112636 - Posted: 7 May 2025, 4:44:50 UTC - in response to Message 112631. Do we have any idea what the computation errors are triggered by? I would like to lower my computation errors if possible. I am getting them on both of my systems. A Ryzen 3700x cpu and the Epyc CPU system. Apparently everyone is dying on line 2798 of the Beta tasks. "...ERROR: Error in simple_cycpep_predict app! The imported native pose has a different number of residues than the sequence provided...." It's been an issue that has been reported for years. No action has been taken by the project to improve either the BOINC science application to better handle the error or the ones they use to create work. So every so often you can get a batch of work with a few Tasks here & there that error out, or a batch where a huge percentage of them just error out. And there's another common error that's been around even longer that can error out at any time - right from the start of processing, all the way till just before it's ready to report. 8 (or more) hours of work down the toilet just like that. Grant Darwin NT ID: 112636 · Rating: 0 · rate: / Reply Quote

angel Send message Joined: 14 Jun 22 Posts: 2 Credit: 105,815 RAC: 0	Message 112637 - Posted: 7 May 2025, 7:14:47 UTC Hello, for a few weeks now Rosetta has not been running, with the message "feeder is not running". I installed Rosetta again and I still have the problem. Any idea of a solution? Thank you / angel (France) ID: 112637 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 112638 - Posted: 7 May 2025, 8:31:54 UTC - in response to Message 112637.
Hello, for a few weeks now Rosetta has not been running, with the message "feeder is not running". I installed Rosetta again and I still have the problem. Any idea of a solution? Thank you / angel (France)
From a previous post by Greg_BE:
It's an IPv6 address error. A server went crazy, so we use this workaround to solve that. Here's how you do it:
Open File Explorer, go to C:\Windows\System32\drivers\etc and find the hosts file.
Press the Windows key.
Type Notepad in the search field.
In the search results, right-click Notepad and select Run as administrator.
Open the hosts file, then paste the following two lines:
128.95.160.156 boinc-files.bakerlab.org
128.95.160.156 bwsrv1.bakerlab.org
Save the file.
If it gives you any trouble, then rename the original hosts file with a .old extension and create a new one, but make sure you do not give it any extension. So make sure to 'save as' with 'all files' selected in file type. Windows needs to see it as a 'file'.
Grant Darwin NT ID: 112638 · Rating: 0 · rate: / Reply Quote
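A quick way to confirm the workaround took effect is to check the hosts file contents for the two mappings. This is only a sketch: the helper name `missing_hosts_entries` is made up for illustration, and the IP address and host names are simply the ones quoted in the post above. It works on text you pass in, so reading the real file from C:\Windows\System32\drivers\etc\hosts is left to the caller.

```python
# Mappings taken from the post above; the helper itself is hypothetical.
REQUIRED = {
    "boinc-files.bakerlab.org": "128.95.160.156",
    "bwsrv1.bakerlab.org": "128.95.160.156",
}

def missing_hosts_entries(hosts_text, required=REQUIRED):
    """Return the 'IP name' lines that still need adding to the hosts file text."""
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            # Keep the first mapping seen for each host name.
            found.setdefault(name.lower(), ip)
    return [f"{ip} {name}" for name, ip in required.items() if found.get(name) != ip]

sample = """
# Copyright (c) Microsoft Corp. sample hosts file
127.0.0.1 localhost
128.95.160.156 boinc-files.bakerlab.org
"""
for entry in missing_hosts_entries(sample):
    print("still missing:", entry)
```

With the sample text above, only the bwsrv1.bakerlab.org line is reported, since the other mapping is already present.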

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112642 - Posted: 7 May 2025, 12:40:22 UTC - in response to Message 112635. As such I can't agree with the cache only being 0.1 or 0.2 + 0.01 Why? The idea of a cache is to keep your system busy if there is a lack of work/issues contacting the servers. For the specific reasons detailed prior to the words "as such" That's what "as such" means. (Lol) Last I looked last night, Tom's cache increased from 147 (128 running + 19 waiting) to 190 (128 running + 62 waiting) This morning it was 126 - all running + 2 threads idle. Exactly as I feared and anticipated, because the 0.1 + 0.01 didn't provide a sufficient buffer to cover for the failure of the project to supply. Which is something we've known about for months now. As I said in what I wrote prior to "as such" I completely agree with you if we have reliability of supply from the project, but we've all known, every single day for literally months, we don't have reliability of supply. I don't know what kind of extra hint there needs to be. 0.1 + 0.01 didn't survive 1 day. We don't know how long 0.2 + 0.01 will survive, but the project's reliability doesn't make me think it'd be much more than a week (I'm guessing obvs, but one borne of experience) I'm speculating that 0.5 + 0.01 will be sufficient to cover a continuation of what we've seen this year, while not hoarding an excess of tasks and not risking a failure to meet deadline. If that turns out not to be the case it'll be for a reason we can't predict right now and can revisit if it arises. There is zero risk of 0.5 + 0.01 being too large a cache, even with a 12hr runtime. One day's worth of tasks (cache plus runtime) with a 3day deadline ensures a speedy return of tasks and <no chance whatsoever> of missing deadline. The <only> risk is that the cache is too small due to the project's inability to supply and threads go unutilised. 
At 0.1 supply reliability makes that risk high (almost guaranteed). At 0.2 it's likely but at an unknown frequency. At 0.5 I speculate most irregularity of supply is covered unless queued tasks (front page) drops to zero for an extended period, in which case no cache size will be enough. Your preference is what 'should' happen. Mine is what we know actually happened over the last ~3 months. ID: 112642 · Rating: 0 · rate: / Reply Quote
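The "cache plus runtime" reasoning in the post above can be put as a rough sketch. Everything here is an assumption for illustration: the function names are invented, and the scaling by `estimated_hours` follows the claim made earlier in the thread that Rosetta initially tells Boinc each task takes 8hrs whatever runtime is configured. It is not a model of Boinc's actual scheduler.

```python
DEADLINE_DAYS = 3  # Rosetta deadline as stated in the thread

def worst_case_turnaround_days(cache_days, extra_days, runtime_hours, estimated_hours=8):
    """Rough upper bound on days between downloading a task and returning it.

    The cache is filled using `estimated_hours` per task, so the queued work
    really takes runtime_hours / estimated_hours times longer to clear, and
    the task itself then runs for runtime_hours.
    """
    queue_days = (cache_days + extra_days) * runtime_hours / estimated_hours
    return queue_days + runtime_hours / 24

def fits_deadline(cache_days, extra_days, runtime_hours, deadline_days=DEADLINE_DAYS):
    """True if the rough worst case still lands inside the deadline."""
    return worst_case_turnaround_days(cache_days, extra_days, runtime_hours) <= deadline_days

print(fits_deadline(0.5, 0.01, 12))  # True: the setting recommended in the post
print(fits_deadline(2.0, 0.01, 24))  # False: a hypothetical large cache with 24hr runtimes
```

Passing `estimated_hours` equal to the runtime reduces the first function to the plain sum used in the post: 0.5 + 0.01 days of cache plus a 12hr runtime is about one day.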

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112644 - Posted: 7 May 2025, 12:47:31 UTC - in response to Message 112636. And there's another common error that's been around even longer that can error out at any time- right from the start of processing, all the way till just before it's ready to report. 8 (or more) hours of work down the toilet just like that. The other day I got 3 consecutive "Validation error"s. No error in the task, but failed validation. 3x12hrs down the pan. Highly annoyed at the time. The overnight job that used to run daily to credit those tasks with just a validation error can't come back soon enough <sigh> ID: 112644 · Rating: 0 · rate: / Reply Quote

Tom M Send message Joined: 20 Jun 17 Posts: 178 Credit: 37,552,020 RAC: 6	Message 112646 - Posted: 7 May 2025, 15:00:36 UTC I just bumped the cache up to 0.5 days. And set the CPU limit (a Pandora parameter) to 300. The polling script is still running. I do have several projects set for "0" resources. So I should have other tasks to process if Rosetta were to burp again. Right now the Epyc system is running 100% Rosetta with a couple of GPU tasks from Einstein at home. I am hoping that yesterday's Free-DC result is a reliable signal of good things to come. Thank you for your discussion and guidance! Respectfully, Proud member of the O.F.A. (Old Farts Association) ID: 112646 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112647 - Posted: 7 May 2025, 16:28:11 UTC - in response to Message 112646. I just bumped the cache up to 0.5 days. And set the CPU limit (a Pandora parameter) to 300. The polling script is still running. I do have several projects set for "0" resources. So I should have other tasks to process if Rosetta were to burp again. Right now the Epyc system is running 100% Rosetta with a couple of GPU tasks from Einstein at home. I am hoping that yesterday's Free-DC result is a reliable signal of good things to come. Thank you for your discussion and guidance! Respectfully I noticed. Your cache shot right up to 300 around the time you posted (570 lower than when this exercise started). It looks bang on to me with your 12hr runtimes coming through. If it ever goes wrong from there, it's my fault. I don't expect it to, short of the project itself having a major extended issue. Good stuff. ID: 112647 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 112649 - Posted: 8 May 2025, 5:32:27 UTC - in response to Message 112642. As such I can't agree with the cache only being 0.1 or 0.2 + 0.01 Why? The idea of a cache is to keep your system busy if there is a lack of work/issues contacting the servers. ..... Your preference is what 'should' happen. Mine is what we know actually happened over the last ~3 months. Unfortunately your response wasn't to what i actually posted. You are talking about cache settings for doing work for a single project that has frequent issues And as i pointed out in what i posted- if you do work for a single project that has frequent issues, then a small cache will help keep your system busy. But- and this is a big but- but if you are running multiple projects, such as Tom is, then a cache isn't necessary in order to keep the system busy. So no cache is best. No deadline issues. No issues with the system filling up with work from one or more projects when your preferred project(s) go down, then having to wait for that work to clear before being able to get more work for your preferred project(s) when they have work available again. Grant Darwin NT ID: 112649 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112651 - Posted: 8 May 2025, 10:43:03 UTC - in response to Message 112649. But - and this is a big but - but if you are running multiple projects, such as Tom is, then a cache isn't necessary in order to keep the system busy. So no cache is best. No deadline issues. No issues with the system filling up with work from one or more projects when your preferred project(s) go down, then having to wait for that work to clear before being able to get more work for your preferred project(s) when they have work available again. I invite you to visit Tom's account, go to his tasks and, in the header, view the (currently) 773 Error'd tasks (down from 1100 3 days ago) that haven't aged out just yet - out of 3134 (All) - 267 (in progress) = 2867 completed tasks (27%) Because little of what you've said is true and everything I've said is true. In an ideal environment what you've said would be right, and I'd pay a lot of attention to it, but if we know nothing else, we know we're in far from an ideal environment. I won't go further as I've made myself clear and you've made yourself clear. ID: 112651 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 112653 - Posted: 8 May 2025, 12:10:49 UTC - in response to Message 112651. Last modified: 8 May 2025, 12:17:29 UTC I invite you to visit Tom's account, go to his tasks and, in the header, view the (currently) 773 Error'd tasks (down from 1100 3 days ago) that haven't aged out just yet - out of 3134 (All) - 267 (in progress) = 2867 completed tasks (27%) You are talking about how much cache is too much- and that just boils down to missed deadlines or not. He's not missing deadlines anymore, so it's not too big. But it doesn't change the fact that he doesn't need a cache at all in order to keep his system busy. Because little of what you've said is true Everything i have said is factually correct- Please, don't just say what i have said is not true- tell me what isn't, and why. I'll split it up and summarise it to make it easier. 1 The reason for a cache is to keep a system busy. 2 A single reliable project has no need for work to be cached to keep the system busy. 3 When running multiple projects (unless all of the projects are unreliable), there is no need for a cache to keep the system busy. 4 A cache is only necessary if the person is running a single project, and that project has frequent work supply/reporting issues. 5 A cache is only necessary if the person has connectivity issues to the net in general. So which of those points do you consider not true, and why? and everything I've said is true. Just not relevant to what i was actually posting about- whether or not Tom needs a cache to keep his system busy, nothing at all about how big it is. In an ideal environment what you've said would be right, and I'd pay a lot of attention to it No, not an ideal environment. One where the user is doing work for multiple projects, which Tom is. but if we know nothing else, we know we're in far from an ideal environment. 
And once again, that is only the case if the person is doing a single unreliable project, which Tom isn't doing. I won't go further as I've made myself clear and you've made yourself clear. If things were clear, then you wouldn't be saying i'm wrong (when everything i've posted is factually correct). But as always, we appear to be talking around each other- no matter how hard i try to do otherwise. Grant Darwin NT ID: 112653 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 112654 - Posted: 9 May 2025, 0:32:37 UTC - in response to Message 112653. I won't go further as I've made myself clear and you've made yourself clear. If things were clear, then you wouldn't be saying I'm wrong (when everything i've posted is factually correct). Several things you posted point away from the primary issue. Being 'factually correct' in the wrong direction is of neither interest nor help. Alternate solutions that rely on the failure of the primary task aren't solutions. Zero share projects are ones people don't want to run except for medium or long term shutdowns of the primary project. They aren't to cover for the intermittency/irregularity of supply we see here. I think you've even said this yourself. But then you go on to recommend/demand others use parameters inappropriate for the latter and of little use for the former either. It's not that I don't understand your points or sidestep them. It's that I do understand them, their relevance and appropriateness. As I keep saying in every one of my posts, your suggestion is very good for situations that haven't applied here for months. The moment the situation becomes appropriate to use those settings... well, I won't use them then either tbh. Sorry. Project task supply reliability simply doesn't warrant doing so. ID: 112654 · Rating: 0 · rate: / Reply Quote

mikey Send message Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 2	Message 112655 - Posted: 9 May 2025, 2:44:53 UTC - in response to Message 112646. I just bumped the cache up to 0.5 days. And set the CPU limit (a Pandora parameter) to 300. The polling script is still running. I do have several projects set for "0" resources. So I should have other tasks to process if Rosetta were to burp again. Right now the Epyc system is running 100% Rosetta with a couple of GPU tasks from Einstein at home. I am hoping that yesterdays Free-DC result is a reliable signal of good things to come. Thank you for your discussion and guidance! Respectfully, One problem you may be having is the older versions of Boinc that you are running on the Epyc, if you update it MAY help with the other filling of the cache. BUT you have to be very careful as it's all about the math and if a Project thinks a task will take 30 minutes and your cache is set for 1.5 days be prepared for a boatload of tasks. They keep tweaking things with all the new releases. ID: 112655 · Rating: 0 · rate: / Reply Quote