Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2331 Credit: 44,194,027 RAC: 27,408 |
However, when you later write... Checking in on this (because I have nothing better to do): I did note the cache had reduced from 870 to 700ish at the time of my previous post, but didn't know if that was just a random fluctuation. Now I can see runtime has been knocked back to 8hrs, the cache is down to 367, deadlines were only being missed by 7hrs rather than a day, and there was a delay of over a day in downloading fresh tasks, so deadlines will now start to be hit. There's also a further download delay currently going on so that (I speculate) in about a day's time, runtime can be increased to 24hrs again. If I've got that right, and Tom hasn't come back to confirm it yet, that's all exactly the right thing to do. Good job |
Sid Celery Joined: 11 Feb 08 Posts: 2331 Credit: 44,194,027 RAC: 27,408 |
Your cache seems to hold 870 tasks (including running tasks). I know I'm obsessing over this, but I'm at a loose end, so why not... In-progress tasks are down to 286. All "Not started by deadline - canceled" and "Timed out - no response" error messages have disappeared. Errored tasks are down by 200 with no new ones being added. And task returns are already beating deadlines by as much as 1 day 10hrs, so there's no risk of missing credit and no resends going to other users who later find them cancelled by the server. All the problems are solved, and with quite some headroom. With a 128-thread server I wouldn't reduce the cache size any further - some might already consider that number to be on the low side, especially when tasks ready to send are so hand-to-mouth. I'd also increase task runtime from 8 to 12hrs, which I personally consider to be a sweeter spot for longer runtimes than 24hrs: reduced server hits compared to 8hrs, less problematic Boinc scheduling and all the other vagaries we have to contend with here. It all looks neatly balanced atm, with that option to slightly increase runtime as well without recreating problems. IMO |
Tom M Joined: 20 Jun 17 Posts: 126 Credit: 27,939,808 RAC: 104,344 |
It looks like I have switched back to the 8 hour profile overnight. I will change the 22-24 hour profile to 12. Boincmgr is set to 0.1/0.01 right now. Do we have any idea what the computation errors are triggered by? I would like to lower my computation errors if possible. I am getting them on both of my systems: a Ryzen 3700X CPU and the Epyc CPU system. Thank you. ===edit=== Bumped the cache from 0.1 to 0.2. The profile is now set to 12 hours. Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Tom M Joined: 20 Jun 17 Posts: 126 Credit: 27,939,808 RAC: 104,344 |
Apparently everyone is dying on line 2798 of the Beta tasks. "...ERROR: Error in simple_cycpep_predict app! The imported native pose has a different number of residues than the sequence provided...." Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Tom M Joined: 20 Jun 17 Posts: 126 Credit: 27,939,808 RAC: 104,344 |
And I have started my polling script again. Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Sid Celery Joined: 11 Feb 08 Posts: 2331 Credit: 44,194,027 RAC: 27,408 |
If I've got that right, and Tom hasn't come back to confirm it yet, that's all exactly the right thing to do. Good job. On the computation errors, this comes from the project, not from any of us. The last I heard, in the days when someone at Rosetta was speaking to me, was that it was easier to let those tasks error out after very few seconds than to try to extract them from the queue of tasks, which would take a lot of good tasks out as well as the bad. If that view holds, it's something we're going to continue to suffer, unfortunately. Not ideal but pragmatic. On the cache size, I do agree with Grant's view that it should be kept low BUT only if there's a constant supply of tasks for us. For some months now we <haven't> had a constant supply ready to send to us. And this is only made worse by all the tasks that error out. As such I can't agree with the cache only being 0.1 or 0.2 + 0.01. With the number of threads you have, the hand-to-mouth supply of tasks and the regular computation errors, I would aim for a cache size somewhere between 0.5 and 1.0 + 0.01. That strikes me as the right ballpark for safety & reliability within the deadline, but tweak it to your own view of each of those competing issues within those bounds. Any less and I can see you regularly having threads free without work. Supply isn't trustworthy enough and, with all the computation errors, even what you do get you can't entirely rely on. Having a 12hr runtime rather than 8hrs gives you that little bit more time to get good tasks through - that's one of its pluses IMO. Fwiw on my own machines, I've now settled on a 12hr runtime with a cache of 0.4 + 0.1, which works pretty well with just 16 threads on my main PC (and 6 threads on another and 8 on my work PC), though I run 2 other low-priority projects as a backup in case of unforeseen eventualities while they're unattended. 
Edit: I see you're down to just 149 tasks now, which will be your 128 threads and only 21 tasks waiting to start for when others complete. This is way too tight. It looks like you're likely asking for tasks already but the project hasn't got them to send you. If you were asking for tasks with 0.5 days' worth left, rather than only 0.1 or 0.2, you'd stand a much better chance of getting some in time. Even 0.5 days may not be enough time tbh. You can only see how this goes. It's no good swinging from having too many tasks to complete by deadline all the way to not having enough tasks to keep all your threads occupied. There's a balance somewhere between the two to find. Edit 2: Go straight to 1.0 + 0.01 - even if Rosetta had them all to send you it'd only be ~300 including running tasks, which is far from excessive on a 128-thread server. It'd still be nearly 600 fewer than you were stockpiling before |
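
For anyone wanting to sanity-check the numbers in this thread, here is a rough back-of-envelope sketch in Python. It is not how the BOINC client actually schedules work. It assumes every task runs for exactly the target runtime, that all threads stay busy, and that the client fills the buffer to "threads x buffer days" of work. The function names are made up for illustration.

```python
import math


def estimated_cache_tasks(threads, buffer_days, runtime_hours):
    """Rough count of tasks (running + waiting) needed to fill the buffer.

    Assumption: each thread is given buffer_days of work, and each task
    takes exactly runtime_hours.
    """
    hours_per_thread = buffer_days * 24.0
    return math.ceil(threads * hours_per_thread / runtime_hours)


def hours_to_drain(tasks, threads, runtime_hours):
    """Wall-clock hours until the last cached task finishes.

    Assumption: tasks run in full "rounds" of one task per thread.
    """
    rounds = math.ceil(tasks / threads)
    return rounds * runtime_hours


if __name__ == "__main__":
    # 1.0 + 0.01 days of cache on 128 threads with a 12hr runtime
    print(estimated_cache_tasks(128, 1.01, 12))  # 259

    # The old 870-task stockpile at 24hr vs 8hr runtime
    print(hours_to_drain(870, 128, 24))  # 168 (7 days)
    print(hours_to_drain(870, 128, 8))   # 56

    # The 149 tasks seen at the time of the edit, at 12hr runtime
    print(hours_to_drain(149, 128, 12))  # 24
```

Under those assumptions a 1.0 + 0.01 cache comes out at roughly 260 tasks for a 12hr runtime, the same ballpark as the ~300 quoted above, while 870 tasks at 24hrs would take about a week to clear. Compare the drain time against whatever deadline the tasks actually carry.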
©2025 University of Washington
https://www.bakerlab.org