Standalone implementation
Standalone crawler implementation
Coordination mechanisms between multiple threads:
sleep: Pause for a random interval, then check again whether the resource is available.
condition variable: A waiting thread is notified as soon as another thread releases the resource, so it can take it without polling.
semaphore: Allows a fixed number of threads to occupy a resource simultaneously. With the count set to 1, it behaves like a lock.
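The semaphore entry above can be sketched with Python's `threading.Semaphore`. This is a minimal illustration, not the crawler's actual code: `MAX_CONCURRENT`, `worker`, and the `time.sleep` stand-in for real work are all assumptions made for the example.

```python
import threading
import time

MAX_CONCURRENT = 2  # hypothetical limit, chosen only for illustration
slots = threading.Semaphore(MAX_CONCURRENT)
counter_lock = threading.Lock()
active = 0  # threads currently holding a slot
peak = 0    # highest value of `active` observed


def worker():
    global active, peak
    with slots:  # blocks while MAX_CONCURRENT threads already hold a slot
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for using the shared resource
        with counter_lock:
            active -= 1


threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds MAX_CONCURRENT
```

Setting `MAX_CONCURRENT` to 1 makes the semaphore act as a mutual-exclusion lock.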
Note: More threads do not necessarily mean better performance. The number of useful threads on a single machine is limited by:
Context switch cost (limited number of CPUs)
Thread number limitation
TCP/IP limitation on the number of concurrent connections, which caps how many threads can fetch at once
Network bottleneck for single machine
Problematic implementation with a lock
Problems of this implementation:
Consumers cannot detect that the queue is empty, so they keep running and try to consume from it.
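The original code for this section is not shown here, so the following is only an assumed reconstruction of the kind of lock-only implementation that has this problem. The names `produce`, `consume`, and `consumer_thread` are hypothetical.

```python
import threading

queue = []               # plain list shared between threads
lock = threading.Lock()
errors = []              # records failures raised inside the consumer thread


def produce(item):
    with lock:
        queue.append(item)


def consume():
    with lock:
        # The lock prevents concurrent modification, but nothing here
        # checks for, or waits on, an item being present.
        return queue.pop(0)


def consumer_thread():
    try:
        consume()
    except IndexError as exc:  # raised by pop(0) on an empty list
        errors.append(exc)


# Start a consumer before anything has been produced.
t = threading.Thread(target=consumer_thread)
t.start()
t.join()

print(len(errors))  # 1
```

The lock keeps the list consistent, but the consumer still fails because it has no way to wait for the producer.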
Correct behavior:
When there is nothing in the queue, the consumer should stop and wait instead of continuing to consume from the queue.
Once the producer adds something to the queue, it needs a way to notify the consumer that an item is available.
First workable solution with Condition
Use case of Condition: A Condition object allows one or more threads to wait until notified by another thread.
Consumer should wait when the queue is empty and resume only when it gets notified by the producer.
Producer should notify only after it adds something to the queue.
Internal mechanism of Condition: A Condition uses a lock internally.
A Condition has acquire() and release() methods that call the corresponding methods of the associated lock.
The consumer must wait on, and the producer must notify through, the same Condition instance.
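The points above can be sketched with `threading.Condition`. This is a minimal sketch with one producer and one consumer; the `urls` list and the function names are assumptions made for illustration.

```python
import threading

queue = []
condition = threading.Condition()  # creates its own lock internally
consumed = []


def producer(items):
    for item in items:
        with condition:          # acquires the underlying lock
            queue.append(item)
            condition.notify()   # notify only after the item is added


def consumer(count):
    for _ in range(count):
        with condition:
            while not queue:     # re-check the queue after every wake-up
                condition.wait() # releases the lock while waiting
            consumed.append(queue.pop(0))


urls = ["url-1", "url-2", "url-3"]  # hypothetical work items
c = threading.Thread(target=consumer, args=(len(urls),))
p = threading.Thread(target=producer, args=(urls,))
c.start()
p.start()
p.join()
c.join()

print(consumed)  # ['url-1', 'url-2', 'url-3']
```

Waiting inside a `while not queue` loop rather than a single `if` guards against a wake-up that arrives when the queue is already empty again, which matters once there is more than one consumer.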
Threadsafe queue
Queue encapsulates the behaviour of Condition: it handles wait(), notify(), acquire(), etc. internally, so producers and consumers only call put() and get().
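As a sketch of the same producer and consumer built on the standard library's `queue.Queue`: the `SENTINEL` shutdown marker and the `urls` list are assumptions for this example, not part of the original notes.

```python
import queue
import threading

tasks = queue.Queue()
SENTINEL = None  # hypothetical marker telling the consumer to stop
results = []


def producer(items):
    for item in items:
        tasks.put(item)   # locking and notification happen inside put()
    tasks.put(SENTINEL)


def consumer():
    while True:
        item = tasks.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        results.append(item)


urls = ["url-1", "url-2", "url-3"]  # hypothetical work items
c = threading.Thread(target=consumer)
p = threading.Thread(target=producer, args=(urls,))
c.start()
p.start()
p.join()
c.join()

print(results)  # ['url-1', 'url-2', 'url-3']
```

No explicit lock or Condition appears in the calling code; the blocking `get()` replaces the manual wait loop.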