Our ML server is hosted on a machine with 16 physical cores with
hyperthreading, so it appears to have 32 cores in total.

Our main ML app is configured to use 32 threads with a backlog of 256.
Should we stick to this setting, or can we bump up the number of threads
to handle more requests simultaneously?
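
To make the setup concrete, here is a rough sketch of what I mean by "threads" and "backlog". This is not our actual code: the names `make_pool`, `submit`, `WORKER_THREADS` and `BACKLOG` are made up for illustration, and I am assuming a simple model where a fixed number of worker threads pull requests from a bounded queue and new requests are rejected once the queue is full.

```python
import os
import queue
import threading

WORKER_THREADS = 32  # our current thread setting (hypothetical name)
BACKLOG = 256        # max queued requests before rejecting (hypothetical name)

# Number of logical CPUs the OS reports; with hyperthreading this counts
# hardware threads, not physical cores.
LOGICAL_CPUS = os.cpu_count()


def make_pool(handler, workers=WORKER_THREADS, backlog=BACKLOG):
    """Start `workers` threads that pull requests from a bounded queue."""
    requests = queue.Queue(maxsize=backlog)

    def worker():
        while True:
            item = requests.get()
            if item is None:  # sentinel: stop this worker
                requests.task_done()
                break
            try:
                handler(item)
            finally:
                requests.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    return requests, threads


def submit(requests, item):
    """Queue a request; return False instead of blocking when the backlog is full."""
    try:
        requests.put_nowait(item)
        return True
    except queue.Full:
        return False
```

In this model, raising the worker count lets more requests run at once, while the backlog only controls how many can wait. My question is about the first number.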

My colleague thinks we should match the number of ML threads to the number
of threads accepted by the application servers that call the ML services.
I, however, think we can't and shouldn't go beyond 32 because that's the
number of logical cores (hardware threads) our machine has.

Would appreciate any advice on this matter.

