Problems and Technical Issues with Rosetta@home

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109178 - Posted: 25 Apr 2024, 11:28:30 UTC - in response to Message 109173.  

At some point somewhere - and quite recently - Rosetta's default appears to have changed to 3hrs, meaning tasks get completed and used up far more quickly than intended.
And I'm not sure about this, but I think Boinc is forced to assume and schedule Rosetta tasks to run for 8hrs, which is now not right.
The default Runtime is still 8 hours.
Rosetta 4.20 Tasks generally still take that long. However, Rosetta Beta Tasks generally only require 3 hours.


The initial Estimated completion time for Rosetta has been broken ever since i joined Rosetta.
When i joined, the initial Estimated Completion time was way, way, way less than the actual Runtime, and people would download hundreds (even thousands for the huge multicore systems) of Tasks and most would time out, but eventually the Estimated Completion time would reflect the actual Runtime.

The best fix would have been to use the mechanism that every other project uses- an Initial Estimated completion time based on the Estimated amount of work to do, but Rosetta doesn't work that way.
The next best fix would have been to use the average Runtime for all tasks for a given application (or the previous application, or that new application from the Ralph Runtimes) for the Initial Estimated Completion time, which would eventually end up matching the actual Runtime.
The next best fix would have been to set the Initial Estimated Completion time to match the Target CPU Runtime set by each cruncher for their systems, and which would eventually end up matching the actual Runtime.
The next best fix would have been to set the Initial Estimated Completion time to match the project's default Target CPU time, and it would eventually end up matching the actual Runtime.
The next best fix would have been to set the Initial Estimated Completion time to match the Target CPU Runtime set by each cruncher for their systems, and not update it using their actual Runtimes.
The next best fix would have been to set the Initial Estimated Completion time to match the project's default Target CPU time, and not update it using their actual Runtimes. And that's what we ended up with.
While it was a huge improvement over what was used before, it was nowhere near as good as it could have been.
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109182 - Posted: 25 Apr 2024, 23:39:03 UTC - in response to Message 109174.  

It sets 8 hours for 4.20 and 3 for 6.05

Oh! I didn't even think of that.
I assume this is for Target CPU Runtime = "Not Selected".

So, does Boinc assume they're all 8hr tasks before they run, then rapidly reduce Remaining Time as the 6.04/6.05 task works its way through?

So that Boinc schedules the same as tasks run, I'd set Target CPU Runtime explicitly to 8hrs, not "Not Selected"

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109183 - Posted: 25 Apr 2024, 23:57:10 UTC - in response to Message 109175.  
Last modified: 25 Apr 2024, 23:57:45 UTC

Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution.
Eliminating the reason for the panic mode is the entire solution, everything else is a workaround, which might fail as soon as something changes (new WU type, new project, whatever) or even before.

The root cause reason for panic mode is holding too large an offline cache.
Aside from the number of days you chose to hold, if Rosetta actively misleads Boinc on top of that, which it certainly does, then that's what has to be resolved before anything else.

It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
It's not just not pretty; highly overcommitting the system might slow down the overall production. In particular with hyperthreading CPUs, many people leave 1-2 threads for non-BOINC stuff.

Might it? Does it? To what extent?
I don't leave any cores/threads spare.
Occasionally I get WCG GPU tasks for Open Pandemics and genuinely don't notice any deterioration/extension of wall clock times as result.
There may be some, but not so that I notice.
Other GPU tasks with other projects may be different, but I don't run them.
I understand the theoretical point. I just don't see any practical difference.

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109184 - Posted: 26 Apr 2024, 0:44:31 UTC - in response to Message 109176.  

It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
No one in their right mind would think taking 12 hours to do 6 hours work is good (which is double the time required- on another project it's taking them 4 times as long).

But that's not what's happening, is it.
It's taking 12hrs to do 6hrs work because it's taking that same 12hrs to run 6hrs of non-Boinc work. So in 12hrs it's doing 6+6hrs work=12hrs.
This isn't a problem, because Adrian (in this case) said both projects are important to him.

Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms.
First, I dispute the sharing of cores & threads is a) a problem and b) one that needs fixing.
You may dispute that, but it doesn't make it any less true.
And it needs fixing because the poster keeps complaining about it. If they don't complain about it, then no it doesn't need fixing.

It's not true at all because you're only counting Boinc work as work.

If the user is happy for more tasks to be running simultaneously, outside of their individual planned time, but still within the overall deadline, that's entirely up to them.
Yep.
But in this case it may cause problems with deadlines, resulting in Panic Mode, which the poster has an issue with, so it is an issue that should be addressed.
Why fix the symptom, when fixing the problem would result in more work being done? Even with fewer cores/threads available to BOINC, the amount of work done for BOINC would be almost triple what it presently is.

Because there's more than one problem and you're not acknowledging the significance of the first of them.
Unfortunately (because it's boring) I'm going to have to spell it out.

I hold a 0.5 + 0.1day cache and deliberately run Rosetta for 12hrs rather than 8. (like Adrian unwittingly does/did)
When the website came back up, a load of tasks came down and started at the same time.

Boinc saw them as 8hr (0.33day) tasks so downloaded 1 more task per core (=0.66 days for Boinc, but Target runtime 1.0 days)

After about 2hrs, the total of tasks drops below 0.6 days according to Boinc (but is actually 22hrs according to my target runtime), so Boinc downloads another task per core.
Boinc sees this as 3 8hr tasks per core, minus 2hrs, which equals 22hrs of work.
My CPU runtime settings mean it's actually 36(-2)=34hrs work.
That's 12hrs difference already because of what Rosetta does, and Boinc won't recognise it until 6hrs in, when it tells me there's 4hrs remaining (still 2hrs short), or 7hrs in, when it says there's 3h45m remaining (still 1h15m short), until at 8hrs in Boinc finally realises there's still 4hrs remaining. Plus 2 lots of 8hrs per core that are really 12hrs per core.
So at the start, Boinc saw 3*8hrs of work, whereas after 8hrs of processing there's actually still 28hrs left.
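The arithmetic in this walkthrough can be checked with a short sketch. It is only an illustration of the numbers quoted above: the function name `queued_hours` and the constants are made up for the example, and it assumes every task on a core is counted at one fixed length, which is a simplification of whatever the BOINC client really does.

```python
def queued_hours(tasks_per_core: int, hours_per_task: float, elapsed: float) -> float:
    """Hours of work left on one core if every task is counted at a fixed length."""
    return tasks_per_core * hours_per_task - elapsed


BOINC_ASSUMED = 8.0  # hours BOINC assumes per Rosetta task (figure from the post)
TARGET = 12.0        # Target CPU Runtime actually set by the user (figure from the post)

# Two hours in, just after a third task per core has been downloaded.
boinc_view = queued_hours(3, BOINC_ASSUMED, 2.0)  # what BOINC believes is queued
actual = queued_hours(3, TARGET, 2.0)             # what the 12hr target really implies
gap = actual - boinc_view

# Eight hours in, when the first task "should" have finished by BOINC's reckoning.
actual_at_8 = queued_hours(3, TARGET, 8.0)

print(f"BOINC sees {boinc_view}h, actual {actual}h, gap {gap}h")
print(f"Actual work left after 8h: {actual_at_8}h")
```

With these inputs the sketch gives 22hrs as Boinc's view, 34hrs actual, a 12hr gap, and 28hrs still left after 8hrs of processing, which matches the figures in the post.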

And all that's without Folding at home messing things up.

So while I entirely take your point about what Folding@home does to Boinc project runtimes, there's a massive elephant in the room to deal with <first> when using non-standard Rosetta runtimes.

Which is why I went on to ask what a default runtime currently means because it's making a right mess of Boinc's scheduling with some very weird and unexpected consequences

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109185 - Posted: 26 Apr 2024, 1:02:07 UTC - in response to Message 109177.  

Greetings,

Well I have 3 systems running at the moment, all using the default location in preferences with target cpu set to 2hrs

Welcome.
All that sounds good except for your Target CPU runtime.
The default is supposed to be 8hrs.
By setting target CPU runtime at 2hrs, you're throwing away 6hrs of results per task for the project AND throwing away 6/8ths of the credit you could be earning, while messing up Boinc's scheduling on your PCs, and also making tasks less available to others while we're a bit hand to mouth for task availability in recent months.
Everything else is fine just as you prefer it, but if you could consider changing that runtime, it would be appreciated.

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109186 - Posted: 26 Apr 2024, 1:09:19 UTC - in response to Message 109178.  

At some point somewhere - and quite recently - Rosetta's default appears to have changed to 3hrs, meaning tasks get completed and used up far more quickly than intended.
And I'm not sure about this, but I think Boinc is forced to assume and schedule Rosetta tasks to run for 8hrs, which is now not right.
The default Runtime is still 8 hours.
Rosetta 4.20 Tasks generally still take that long. However, Rosetta Beta Tasks generally only require 3 hours.

The initial Estimated completion time for Rosetta has been broken ever since i joined Rosetta.
When i joined, the initial Estimated Completion time was way, way, way less than the actual Runtime, and people would download hundreds (even thousands for the huge multicore systems) of Tasks and most would time out, but eventually the Estimated Completion time would reflect the actual Runtime.

It wasn't an issue for me at the time, so I kind of glossed over the reasoning, but yes I think it was to do with estimated runtimes for new users being way out of kilter that caused what we've got right now.

My main PC had a major problem the other month (RAM failure causing endless blue screens) and I had to reinstall everything and my first Rosetta task runtimes were still all over the place before the 8hr thing finally cut in.
Same applied to WCG tbf.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109188 - Posted: 26 Apr 2024, 5:03:25 UTC - in response to Message 109182.  

So that Boinc schedules the same as tasks run, I'd set Target CPU Runtime explicitly to 8hrs, not "Not Selected"
It doesn't work that way.
The project has hard coded 8 hours as the default Estimated time remaining, regardless of how long they run or what you have set your Target CPU time to.
As they run, the Estimated Completion time will go up or down as necessary till it eventually comes close to what the actual Runtime ends up being, but when the next Task starts, its Estimated Completion time will always be 8 hours.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109189 - Posted: 26 Apr 2024, 5:51:31 UTC - in response to Message 109184.  
Last modified: 26 Apr 2024, 6:00:31 UTC

It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
No one in their right mind would think taking 12 hours to do 6 hours work is good (which is double the time required- on another project it's taking them 4 times as long).
But that's not what's happening, is it.
Yes, YES, YES, that is exactly what is happening, as i pointed out in one of my earlier posts, that you even quoted in one of yours.

And the same issue is happening with your other projects.
Asteroids- 2hrs Runtime, 1hr CPU time.
SIdock- 31.5hrs Runtime, 27hrs 40min CPU time.
Denis- 3hr 40min Runtime, 1hr CPU time.

For reference- CPU time is the amount of time spent by the CPU processing the Task. Runtime- that is the time (think of a clock on the wall) it actually takes to process the Task. From the time it starts running, to the time it finishes & uploads the result.
So for Denis, on his system, Tasks that should take 1 hour to process, it actually takes 3hrs 40 min from the time it starts to the time it ends. 220min for something that should take 60min.
That is exactly what is happening.
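To put a number on the gap between the two times, here is a minimal sketch that divides Runtime by CPU time for the three figures listed above. The helper name `slowdown` is just for illustration; a value of 1.0 would mean the Task had a core/thread to itself the whole time.

```python
def slowdown(runtime_min: float, cpu_min: float) -> float:
    """Wall-clock minutes spent per minute of CPU time actually received."""
    return runtime_min / cpu_min


# (Runtime, CPU time) in minutes, taken from the figures quoted in this post.
tasks = {
    "Asteroids": (2 * 60, 1 * 60),
    "SIdock": (31.5 * 60, 27 * 60 + 40),
    "Denis": (3 * 60 + 40, 1 * 60),
}

for name, (run, cpu) in tasks.items():
    print(f"{name}: {slowdown(run, cpu):.2f}x")
```

On those figures Asteroids comes out at 2.00x, SIdock at about 1.14x and Denis at about 3.67x.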



It's taking 12hrs to do 6hrs work because it's taking that same 12hrs to run 6hrs of non-Boinc work. So in 12hrs it's doing 6+6hrs work=12hrs.
This isn't a problem, because Adrian (in this case) said both projects are important to him.
It is a problem because that isn't what is happening. It is only taking 2 to 4 times as long to process BOINC Tasks, because they run at a very low priority.
Folding@home runs at a much higher priority, so it isn't affected in any way shape or form.

For Folding it would take 1 hour to do 1 hour's worth of work (ie CPU Time= Runtime), whereas here at BOINC it's taking from 31.5 hrs to do 27hrs 40min of SIdock work, to taking twice as long to do Rosetta & Asteroids work, to taking almost 4 times as long to do Denis work.



Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms.
First, I dispute the sharing of cores & threads is a) a problem and b) one that needs fixing.
You may dispute that, but it doesn't make it any less true.
And it needs fixing because the poster keeps complaining about it. If they don't complain about it, then no it doesn't need fixing.
It's not true at all because you're only counting Boinc work as work.
Not at all- Folding is work, watching movies on the computer is work, doing emails is work, editing photos is work, transcoding movies is work, and as far as the Computer is concerned- playing games is work, surfing the internet is work.
And each and every one of these things requires CPU time in order to do it.

In the case of the vast majority of those tasks the amount of CPU time they require is negligible.
In the case of Folding, and transcoding, it is not negligible & it requires a full core/thread (or more) to do. If it is being fully used by those programmes, then BOINC trying to make use of it as well will result in that BOINC work running much, much, much slower than it would if it wasn't sharing that core/thread (as the times i reposted above show- up to almost 4 times slower in some cases).



Because there's more than one problem and you're not acknowledging the significance of the first of them.
It's not a case of not acknowledging the significance of it, because it's not that significant.
What you are proposing will only fix the High Priority issue with Rosetta (and as Link pointed out, it could easily occur if they add or remove a Project, a new application is released here or on another project.)
What i propose fixes the High priority issue with Rosetta, and it also fixes the ridiculously long Run times here & on the other BOINC projects as well.
An issue that affects multiple projects compared to one that affects only a single project is much more significant IMHO. And so something that fixes all of the issues is much more significant than one which doesn't even fix one issue; it only fixes the symptom, not the underlying cause.



Unfortunately (because it's boring) I'm going to have to spell it out.
No you don't.
I am fully aware of what the effect of having a fixed Initial Estimated Completion time is when the actual Runtime isn't necessarily anywhere near that.
What you don't seem to be appreciating is that by having the system overcommitted, there will still be situations where problems arise because the Estimated Completion time can never, ever match the actual Runtime, because the Runtime (how long it actually takes) is so far out from the CPU time (how long it should actually take).

Just look at Rosetta 4.20 Tasks. They take 8 hours- which is the default CPU Target time. But because the system is overcommitted it will take 16 hours to actually process that Task.
The Initial Estimated completion time & the actual CPU time match perfectly, but problems will still occur because neither of those times comes close to the actual Runtime- all because the system is over committed.
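A rough sketch of why a correct CPU-time estimate can still miss a deadline when Runtime is stretched. Everything here is an assumption made for illustration: the function name, the idea that tasks run in equal "rounds" across all threads, and in particular the 72 hour deadline, which is a placeholder and not a figure taken from this thread.

```python
import math


def finishes_in_time(queued_tasks: int, threads: int, cpu_hours: float,
                     slowdown: float, deadline_hours: float) -> bool:
    """True if the last queued task completes before the deadline.

    slowdown is Runtime divided by CPU time (1.0 means no contention).
    """
    wall_per_task = cpu_hours * slowdown        # wall-clock hours per task
    rounds = math.ceil(queued_tasks / threads)  # batches needed to drain the queue
    return rounds * wall_per_task <= deadline_hours


DEADLINE = 72.0  # hypothetical deadline in hours, for illustration only

# 40 queued 8hr tasks on 8 threads: fine when CPU time equals Runtime...
print(finishes_in_time(40, 8, 8.0, 1.0, DEADLINE))
# ...but not when each 8hr task takes 16hrs of wall-clock time.
print(finishes_in_time(40, 8, 8.0, 2.0, DEADLINE))
```

In this toy case the same queue needs 40hrs without contention and 80hrs with a 2x slowdown, so only the first case fits inside the assumed deadline even though the 8hr estimate itself was correct.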
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109190 - Posted: 26 Apr 2024, 7:06:25 UTC - in response to Message 109183.  

I don't leave any cores/threads spare.
And you don't really need to as your non-BOINC CPU usage (or GPU usage requiring CPU support) is generally quite light, but still heavier than mine.
eg- My two systems
Run time 7 hours 30 min 42 sec
CPU time 7 hours 29 min 37 sec

Run time 6 hours 54 min 13 sec
CPU time 6 hours 52 min 53 sec
My CPU Time and Run times are very close, there is a slightly bigger difference on the system that i make use of daily. The other is a cruncher only (unless the main system dies, then i've got a spare to use).

Your two systems
Run time 12 hours 12 min 6 sec
CPU time 11 hours 59 min 57 sec

Run time 12 hours 20 min 23 sec
CPU time 11 hours 59 min 56 sec
A bigger gap between CPU time and Run time, but still not large. Which indicates the systems are getting some non-BOINC use, but not a lot. Or they're running a GPU application (BOINC or otherwise), that doesn't require very much CPU support.

And from one of the systems in the top 10 of the Top Hosts list.
Run time 8 hours 0 min 44 sec
CPU time 7 hours 59 min 51 sec
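For anyone wanting to compare their own Tasks the same way, this is a small sketch that parses the "H hours M min S sec" strings as they are written above and reports the gap between Run time and CPU time. The function names are invented for the example, and it assumes the text always has exactly that shape.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a string like '7 hours 30 min 42 sec' to seconds."""
    match = re.fullmatch(r"(\d+) hours (\d+) min (\d+) sec", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


def overhead(run_time: str, cpu_time: str) -> tuple[int, float]:
    """Return (gap in seconds, gap as a percentage of Run time)."""
    run_s, cpu_s = to_seconds(run_time), to_seconds(cpu_time)
    gap = run_s - cpu_s
    return gap, 100 * gap / run_s


for run, cpu in [
    ("7 hours 30 min 42 sec", "7 hours 29 min 37 sec"),
    ("12 hours 20 min 23 sec", "11 hours 59 min 56 sec"),
    ("8 hours 0 min 44 sec", "7 hours 59 min 51 sec"),
]:
    gap, pct = overhead(run, cpu)
    print(f"{run}: {gap}s not spent on the Task ({pct:.2f}%)")
```

For the pairs shown, the gaps work out to 65 seconds, 1227 seconds (about 20 minutes) and 53 seconds respectively.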




Occasionally I get WCG GPU tasks for Open Pandemics and genuinely don't notice any deterioration/extension of wall clock times as result.
There may be some, but not so that I notice.

1 It's a BOINC project, so it shares its time & processing with all the other BOINC projects.
2 It's a GPU application, and the amount of CPU support required can vary hugely between applications, from almost none at all, to needing a full CPU core/thread for each running GPU Task.



Other GPU tasks with other projects may be different, but I don't run them.
I understand the theoretical point. I just don't see any practical difference.
The practical difference is if a GPU application needs a full CPU core/thread for each running Task, and it has to share that core/thread with another Task being processed on the CPU, not only will the CPU processing times suffer, but the GPU output can tank massively.
I can't remember the actual numbers, but a GPU that can process a Task in 4 min with a full core/thread supporting it (if it needs it of course), may take 40min (or more) if it has to share that core/thread with another CPU heavy load.
Only doing 360 Tasks per day when over 3,600 is possible is a pretty poor choice to make.
That is the practical difference.
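The per-day comparison can be sketched as below. The 4 min and 40 min figures are the rough ones quoted above, and `concurrent` (how many GPU Tasks run at once) is an assumed parameter that is not stated in the post. With one Task at a time the totals are 360 vs 36 per day; ten at a time would give 3,600 vs 360. Either way the ratio is the same factor of ten.

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(minutes_per_task: float, concurrent: int = 1) -> float:
    """Tasks completed in 24 hours at a steady per-task time."""
    return concurrent * MINUTES_PER_DAY / minutes_per_task


dedicated = tasks_per_day(4)   # GPU Task has a CPU core/thread to itself
shared = tasks_per_day(40)     # GPU Task shares its core/thread with a CPU-heavy load

print(f"dedicated: {dedicated:.0f}/day, shared: {shared:.0f}/day, "
      f"ratio: {dedicated / shared:.0f}x")
```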
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109191 - Posted: 26 Apr 2024, 10:59:21 UTC - in response to Message 109188.  

So that Boinc schedules the same as tasks run, I'd set Target CPU Runtime explicitly to 8hrs, not "Not Selected"
It doesn't work that way.
The project has hard coded 8 hours as the default Estimated time remaining, regardless of how long they run or what you have set your Target CPU time to.
As they run, the Estimated Completion time will go up or down as necessary till it eventually comes close to what the actual Runtime ends up being, but when the next Task starts, its Estimated Completion time will always be 8 hours.

I have to tell you, I'm absolutely amazed that you think Boinc scheduling being wrong by 50% one way or 2-300% the other way for the bulk of the time a task is processing - and 100% of the time it's sitting waiting in the cache - is no kind of problem, but losing the odd few seconds or minutes during processing is a big issue. (Talking about my PCs here).

Your standards compared to mine are on different planets.

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109192 - Posted: 26 Apr 2024, 11:42:28 UTC - in response to Message 109189.  

It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
No one in their right mind would think taking 12 hours to do 6 hours work is good (which is double the time required- on another project it's taking them 4 times as long).
But that's not what's happening, is it.
Yes, YES, YES, that is exactly what is happening, as i pointed out in one of my earlier posts, that you even quoted in one of yours.

And the same issue is happening with your other projects.
Asteroids- 2hrs Runtime, 1hr CPU time.
SIdock- 31.5hrs Runtime, 27hrs 40min CPU time.
Denis- 3hr 40min Runtime, 1hr CPU time.

For reference- CPU time is the amount of time spent by the CPU processing the Task. Runtime- that is the time (think of a clock on the wall) it actually takes to process the Task. From the time it starts running, to the time it finishes & uploads the result.
So for Denis, on his system, Tasks that should take 1 hour to process, it actually takes 3hrs 40 min from the time it starts to the time it ends. 220min for something that should take 60min.
That is exactly what is happening.

It's taking 12hrs to do 6hrs work because it's taking that same 12hrs to run 6hrs of non-Boinc work. So in 12hrs it's doing 6+6hrs work=12hrs.
This isn't a problem, because Adrian (in this case) said both projects are important to him.
It is a problem because that isn't what is happening. It is only taking 2 to 4 times as long to process BOINC Tasks, because they run at a very low priority.
Folding@home runs at a much higher priority, so it isn't affected in any way shape or form.

For Folding it would take 1 hour to do 1 hour's worth of work (ie CPU Time= Runtime), whereas here at BOINC it's taking from 31.5 hrs to do 27hrs 40min of SIdock work, to taking twice as long to do Rosetta & Asteroids work, to taking almost 4 times as long to do Denis work.

First thing to say is I didn't appreciate Folding runs at a different (normal compared to low I assume) priority to Rosetta or other projects. I assumed they were all low priority.
But to hear folding runs at a higher priority - nominally 1 to 1 CPU to wallclock time - makes me think that's massively better than I thought.
Yes, Denis is particularly bad, Asteroids isn't great - but their tasks are very short so bygones - but Sidock looks pretty good by my standards in that context. If I was getting 1-to-1 for Folding on top of that, I'd be pretty happy.
On the proviso they all meet their respective deadlines.

Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms.
First, I dispute the sharing of cores & threads is a) a problem and b) one that needs fixing.
You may dispute that, but it doesn't make it any less true.
And it needs fixing because the poster keeps complaining about it. If they don't complain about it, then no it doesn't need fixing.
It's not true at all because you're only counting Boinc work as work.
Not at all - Folding is work, watching movies on the computer is work, doing emails is work, editing photos is work, transcoding movies is work, and as far as the Computer is concerned- playing games is work, surfing the internet is work.
And each and every one of these things requires CPU time in order to do it.

In the case of the vast majority of those tasks the amount of CPU time they require is negligible.
In the case of Folding, and transcoding, it is not negligible & it requires a full core/thread (or more) to do. If it is being fully used by those programmes, then BOINC trying to make use of it as well will result in that BOINC work running much, much, much slower than it would if it wasn't sharing that core/thread (as the times i reposted above show- up to almost 4 times slower in some cases).

This is all self-evident. But you've missed out where the problem is.
All the things you've pointed out are things you've chosen to do.
And from the outset we all understand that Boinc runs in the gaps when we're not fully utilising our computers, not ever 100% of the time.
And if you chose to do one thing you're prioritising that over Boinc.
Personally I insist on that because if I ever got bogged down in writing or viewing or whatever I'd consider that a big problem.
So if I <only> got the "losses" in task processing time that you later point me to, the first thing I'd think is I'm wasting my time having a computer because I'm not doing anything with it but donating it to distributed computing.
Frankly, I'm not that rich nor generous.
I like distributed computing, but not that much. If I didn't already have a computer for my own needs, I wouldn't buy one to run Boinc (or non-Boinc) tasks.

Unfortunately (because it's boring) I'm going to have to spell it out.
No you don't.
I am fully aware of what the effect of having a fixed Initial Estimated Completion time is when the actual Runtime isn't necessarily anywhere near that.
What you don't seem to be appreciating is that by having the system overcommitted, there will still be situations where problems arise because the Estimated Completion time can never, ever match the actual Runtime, because the Runtime (how long it actually takes) is so far out from the CPU time (how long it should actually take).

Just look at Rosetta 4.20 Tasks. They take 8 hours - which is the default CPU Target time. But because the system is overcommitted it will take 16 hours to actually process that Task.
The Initial Estimated completion time & the actual CPU time match perfectly, but problems will still occur because neither of those times comes close to the actual Runtime- all because the system is over committed.

I think our difference here is in your definition of over-committing.
There are two causes for this imo. One is over-scheduling (my issue) and the other is under-processing <compared to the ideal> (your issue).
The first is a Boinc issue, the second a user issue.
Rosetta actively misleads Boinc when using non-standard runtimes (and, we now discover, standard/default runtimes).
If that's what the user intended, fine but account for it in the way I said because Boinc is slow to adapt during processing and is prevented from adapting for the cache.
Regarding under-processing, first the user expects a certain amount of that (at my level) and it may only be a consequence of the conflict between Boinc and non-Boinc processing so CPUs are fully utilised.
Imo you only need to 'take a view' on that to ensure your offline cache is small enough to account for the discrepancy - even if it's a big one - because none of it matters, within tasks, as long as deadlines are met.
Any non-processing of particular tasks is taken up either by non-Boinc processing or what you've chosen to do as a user, all of which the user actively prioritises ahead of low-priority Boinc processing and by definition is not a problem.

That's my view. I'm clear it's not yours.

Sid Celery
Joined: 11 Feb 08
Posts: 2199
Credit: 41,992,905
RAC: 18,065
Message 109193 - Posted: 26 Apr 2024, 12:27:26 UTC - in response to Message 109190.  
Last modified: 26 Apr 2024, 12:31:22 UTC

On my systems, yes but it's a little deceptive.
I have the 5800X at home which is my only PC now. I'm there half the week and do my daily stuff with Boinc in the gaps in the background. When I'm away it runs Boinc 100% which is what you're looking at.
When I'm away (eg right now) I have the i5-9600K at the place I stay. It's 100% Boinc when I'm at home or work, but when I'm using it at night it usually takes 12h30 to 13h to run a 12hr task. I'm fine with that.
And I've set up another PC at work with a different user name as part of my team - an old i7-4770 that does what it does and gets turned off each night.

Losing 20, 30mins per task is fine by me. Boinc is, by definition, an occasional background job making use of downtime, not hogging uptime.
Only losing seconds per task is mental. Losing 20/30/50mins per task shows they're working machines - just as it should be.

Other GPU tasks with other projects may be different, but I don't run them.
I understand the theoretical point. I just don't see any practical difference.
The practical difference is if a GPU application needs a full CPU core/thread for each running Task, and it has to share that core/thread with another Task being processed on the CPU, not only will the CPU processing times suffer, but the GPU output can tank massively.
I can't remember the actual numbers, but a GPU that can process a Task in 4 min with a full core/thread supporting it (if it needs it of course), may take 40min (or more) if it has to share that core/thread with another CPU heavy load.
Only doing 360 Tasks per day when over 3,600 is possible is a pretty poor choice to make.
That is the practical difference.

Only in the self-contained context of that individual task.
If another task is also completing work over those same cores/threads in the same time, you have to add their processing together, not view them separately as, say, two badly running tasks.
Thousands of tasks only sounds bad, because they're 4mins v 40mins each.
I'd want to know the system-wide totals over 24hrs, with contention and without, to know if there were any losses at all.

Are you trying to tell me it's 90% losses? Because I don't believe that at all.
Is it rather single-digit %? I'd guess that's much nearer the mark.
If someone worked it out and it was 1% losses or less, because every second of the day is doing something, I wouldn't automatically disbelieve it before going through the workings.

In short, I think you're making a mountain out of a molehill.
I wouldn't take an entire core out of Boinc processing to give CPU support to a GPU task, because that's 12.5% of an 8C/T or 6.25% of a 16C/T machine and I wouldn't guess the losses from contention would be as high as that.
I'd let them all run - overcommit the CPU in your terms - and let the PC fight it out, knowing it'll do as much as it possibly can without me making assumptions about what it can or can't do that I'll never know in advance.
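The trade-off being argued here can be written down as a simple break-even check. This is only a sketch: `contention_loss_fraction` is a hypothetical input standing for the share of total output lost when everything fights over the same threads, which is exactly the number nobody in this thread has measured.

```python
def reserved_core_cost(threads: int) -> float:
    """Fraction of BOINC CPU capacity given up by leaving one thread free."""
    return 1 / threads


def worth_reserving(threads: int, contention_loss_fraction: float) -> bool:
    """True if contention costs more than the thread that would be set aside."""
    return contention_loss_fraction > reserved_core_cost(threads)


print(f"8 threads:  {reserved_core_cost(8):.2%} given up")
print(f"16 threads: {reserved_core_cost(16):.2%} given up")

# Illustrative guesses only, not measurements.
print(worth_reserving(8, 0.05))  # single-digit losses: let everything run
print(worth_reserving(8, 0.90))  # 90% losses: reserving a thread wins easily
```

The first two lines reproduce the 12.5% and 6.25% figures above; which side of the break-even a real system lands on would need system-wide totals over 24hrs with and without contention.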
ID: 109193 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 2030
Credit: 10,121,026
RAC: 12,565
Message 109194 - Posted: 26 Apr 2024, 18:54:04 UTC - in response to Message 109193.  

The screensaver of "ROSETTAVS_SAVE_ALL_OUT" WUs crashes every time on my Win11 machines...
ID: 109194
kotenok2000
Joined: 22 Feb 11
Posts: 277
Credit: 523,512
RAC: 500
Message 109195 - Posted: 26 Apr 2024, 18:56:20 UTC - in response to Message 109194.  

It resolves the database path incorrectly.
ID: 109195
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109196 - Posted: 26 Apr 2024, 23:08:03 UTC - in response to Message 109191.  
Last modified: 26 Apr 2024, 23:46:53 UTC

I have to tell you, I'm absolutely amazed that you think Boinc scheduling being wrong by 50% one way or 2-300% the other way for the bulk of the time a task is processing - and 100% of the time it's sitting waiting in the cache - is no kind of problem,
And i am absolutely amazed & astounded you would think something that at no stage have i ever said or suggested.

Nowhere have i said it is not a problem.
What i have said is that it is not as big a problem as you make it out to be. What i have said is that it is not the root cause of the High Priority issues. It contributes to it, but it is not the cause.
How on earth do you turn "it is not as big a problem as you make it out to be" into "is no kind of problem"?
Seriously? How on earth can you think that???

It is a problem for Scheduling.
But as i keep on repeating because you don't appear to be listening, it's not the cause of the High Priority issue. It's a contributing factor, but not the cause. The cause is the huge discrepancy between CPU time and Run time.
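
To put a number on that discrepancy, one option is the ratio of CPU time to Run time for a finished Task. A minimal sketch follows; the function name is hypothetical and the example values are made up for illustration, not taken from anyone's systems.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float) -> float:
    """Fraction of a Task's wall-clock Run time that was actual CPU time.

    1.0 means the Task had a core/thread to itself the whole time;
    0.5 means it took twice as long as the CPU work required.
    """
    if run_time_s <= 0:
        raise ValueError("run_time_s must be positive")
    return cpu_time_s / run_time_s


HOUR = 3600
print(cpu_efficiency(8 * HOUR, 8 * HOUR))    # Run time matches CPU time
print(cpu_efficiency(8 * HOUR, 16 * HOUR))   # Task ran twice as long as needed
```

The lower the ratio, the further the actual Run time is from what the scheduler would expect from CPU time alone.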



but losing the odd few seconds or minutes during processing is a big issue. (Talking about my PCs here).
Again, seriously??? Did you actually read what i posted there? I'll repost it.


Your two systems
...
A bigger gap between CPU time and Run time, but still not large. Which indicates the systems are getting some non-BOINC use, but not a lot.
How the hell does "but still not large" become "a big issue"? Seriously- how???




Please do not try putting words in my mouth or attributing to me things that i have not said in any way, shape or form.
Grant
Darwin NT
ID: 109196
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109197 - Posted: 26 Apr 2024, 23:43:35 UTC - in response to Message 109192.  
Last modified: 26 Apr 2024, 23:49:26 UTC

First thing to say is I didn't appreciate Folding runs at a different (normal compared to low I assume) priority to Rosetta or other projects. I assumed they were all low priority.
But to hear folding runs at a higher priority - nominally 1 to 1 CPU to wallclock time - makes me think that's massively better than I thought.
Yes, Denis is particularly bad, Asteroids isn't great - but their tasks are very short so bygones - but Sidock looks pretty good by my standards in that context. If I was getting 1-to-1 for Folding on top of that, I'd be pretty happy.
On the proviso they all meet their respective deadlines.

This is all self-evident. But you've missed out where the problem is.
All the things you've pointed out are things you've chosen to do.
And from the outset we all understand that Boinc runs in the gaps when we're not fully utilising our computers, not ever 100% of the time.
And if you chose to do one thing you're prioritising that over Boinc.
Personally I insist on that because if I ever got bogged down in writing or viewing or whatever I'd consider that a big problem.
So if I <only> got the "losses" in task processing time that you later point me to, the first thing I'd think is I'm wasting my time having a computer because I'm not doing anything with it but donating it to distributed computing.
Frankly, I'm not that rich nor generous.
I like distributed computing, but not that much. If I didn't already have a computer for my own needs, I wouldn't buy one to run Boinc (or non-Boinc) tasks.
I hereby give up- as you've said above, how efficient your system is (ie how many Tasks are done each day) is of no importance. All that matters is not missing the deadlines.
It's obvious you don't understand what i'm saying, no matter how many ways i try to present it. And what i do say, as other posts you have quoted show, you misinterpret (i for the life of me cannot understand how "it is not as big a problem as you make it out to be" could be interpreted to mean it "is no kind of problem", or how "but still not large" becomes "a big issue").





But for anyone else that's been reading these posts-

BOINC makes use of unused computing time.
Running other heavy CPU usage programmes isn't an issue, and if you set BOINC to recognise that there are other heavy usage processes running it won't have an adverse impact on your BOINC processing.
If you limit the number of cores/threads available to BOINC, you will maximise your BOINC processing. You will get the maximum possible amount of work done each day that your system is capable of, you won't have issues with deadlines (unless of course you have inappropriate cache settings), or Panic Mode or any of those types of issues.

So whether you have 2 cores/threads or 256, if you're running other CPU intensive Tasks then set your "Use at most 100 % of the CPUs" to an appropriate value.
If you've got 2 cores/threads, set it to 50%, 256 cores/threads set it to 0.5% (or 1% if it won't accept 0.5), 7% if you have 16 cores/threads. It's not hard to work out.
Then your Tasks will run for only as long as they need to- ie Run time will match (or be damn close to) CPU time, rather than being 1.5, 2, 4 or more times longer than necessary.
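
The percentages above line up with the share of the machine that one core/thread represents (100 divided by the thread count, rounded up to something the preference will accept). A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the second function reflects one reading of the advice, namely that the "Use at most" value ends up being 100% minus whatever you want to keep free. How the BOINC client rounds a fractional result is not covered here.

```python
def one_thread_share_pct(total_threads: int) -> float:
    """Percentage of the CPU that a single core/thread represents."""
    return 100.0 / total_threads


def use_at_most_pct(total_threads: int, reserved_threads: int = 1) -> float:
    """'Use at most N % of the CPUs' value that leaves reserved_threads free.

    Assumption: the preference is a plain percentage of all threads.
    """
    if not 0 <= reserved_threads < total_threads:
        raise ValueError("reserved_threads must be between 0 and total_threads - 1")
    return 100.0 * (total_threads - reserved_threads) / total_threads


for n in (2, 16, 256):
    print(n, one_thread_share_pct(n), use_at_most_pct(n))
```

For 16 threads one thread is 6.25% of the machine, which is where the rounded 7% figure comes from.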

If you don't do much CPU heavy stuff with your system, then there's no need to reserve some cores/threads.
If you're doing considerable non-BOINC work, and how efficient your system is at doing BOINC work is of no importance at all (ie how many Tasks you actually process each day), along with the occasional missed deadline, then don't bother with reserving any cores/threads for non-BOINC work.
Grant
Darwin NT
ID: 109197
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109198 - Posted: 26 Apr 2024, 23:47:29 UTC

And now i've done all that, i'll just wait for the next Rosetta server crash to occur.
Grant
Darwin NT
ID: 109198
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109199 - Posted: 27 Apr 2024, 1:04:27 UTC
Last modified: 27 Apr 2024, 1:04:55 UTC

Oh, and in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks.
So those with large multicore/thread systems & low amounts of system RAM may have some issues if they get a full load of them.
Grant
Darwin NT
ID: 109199
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 109200 - Posted: 27 Apr 2024, 9:43:19 UTC - in response to Message 109183.  

Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution.
Eliminating the reason for the panic mode is the entire solution, everything else is a workaround, which might fail as soon as something changes (new WU type, new project, whatever) or even before.

The root cause reason for panic mode is holding too large an offline cache.
The root cause for the panic mode is a highly misconfigured client; too large a cache is just a small part of it.


This isn't a problem, because Adrian (in this case) said both projects are important to him.
Then he should configure BOINC properly so it can coexist with Folding without any issues; currently it seems he doesn't really care if BOINC works properly or not.
.
ID: 109200
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1759
Credit: 18,534,891
RAC: 318
Message 109201 - Posted: 28 Apr 2024, 0:16:35 UTC - in response to Message 109199.  

Oh, and in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks.
So those with large multicore/thread systems & low amounts of system RAM may have some issues if they get a full load of them.
Getting a few of those Tasks using 2GB of RAM each.
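
For anyone wanting to check whether a full load of these would fit, here is a minimal Python sketch. The function name, the 32-thread/16GB example machine and the idea of setting aside RAM for everything else are all assumptions for illustration; the only figures taken from above are the roughly 1GB and 2GB per Task.

```python
def max_concurrent_tasks(system_ram_gb: float, gb_per_task: float,
                         threads: int, reserved_gb: float = 0.0) -> int:
    """How many Tasks fit in RAM at once, capped at the number of threads.

    reserved_gb is a hypothetical allowance for the OS and other programs.
    """
    usable = max(system_ram_gb - reserved_gb, 0.0)
    return min(threads, int(usable // gb_per_task))


# Hypothetical 32-thread machine with 16GB of RAM.
print(max_concurrent_tasks(16, 1, threads=32))  # at roughly 1GB per Task
print(max_concurrent_tasks(16, 2, threads=32))  # at roughly 2GB per Task
```

If the result is lower than the thread count, a full load of RosettaVS_ Tasks would not fit in RAM on that machine.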
Grant
Darwin NT
ID: 109201



©2025 University of Washington
https://www.bakerlab.org