Standalone implementation

Standalone crawler implementation

  • Coordination mechanisms between threads:

    • sleep: Pause for a random interval, then check again whether the resource is available.

    • condition variable: A waiting thread is woken as soon as another thread releases the resource, so it does not have to poll.

    • semaphore: Allows up to a fixed number of threads to hold a resource at the same time. With the count set to 1, it behaves like a mutual-exclusion lock.

  • Note: More threads do not necessarily mean better performance. The number of useful threads on a single machine is limited by:

    • Context-switching cost (bounded by the number of CPUs)

    • Limits on the number of threads

      • TCP/IP limits: each crawling thread holds a connection, and the number of concurrent connections is finite

    • Network bottleneck on a single machine
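
As a rough illustration of the semaphore item above, the sketch below caps how many threads may "download" at once. The `crawl` and `fetch` names, the URLs, and the `max_concurrent` value are assumptions made for this example, and the download is simulated with a short sleep.

```python
import threading
import time


def crawl(urls, max_concurrent=3):
    """Start one thread per URL, but let at most max_concurrent run fetch() at once.

    Returns the highest number of threads observed inside the guarded section.
    """
    slots = threading.Semaphore(max_concurrent)
    state_lock = threading.Lock()  # protects the two counters below
    active = 0
    peak = 0

    def fetch(url):
        # Hypothetical stand-in for downloading a page.
        nonlocal active, peak
        with slots:  # blocks while max_concurrent threads are already inside
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # simulate network latency
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With `max_concurrent=1` the semaphore acts like a plain lock, which matches the "count set to 1" case in the list.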

Problematic implementation with Lock

  • Problem with this implementation:

    • Consumers cannot detect that the queue is empty, so they keep trying to consume from it.

  • Correct behavior:

    • When the queue is empty, the consumer should stop and wait instead of continuing to consume from it.

    • Once the producer adds something to the queue, it needs a way to notify the consumer that an item is available.
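
The page's original code for this section is not shown here, so the following is only an assumed reconstruction of a lock-only design that exhibits the problem. The shared list `queue` and the `produce` and `consume` helpers are illustrative names.

```python
import threading

queue = []  # plain list shared between threads (illustrative)
lock = threading.Lock()


def produce(item):
    with lock:
        queue.append(item)


def consume():
    # The lock prevents concurrent modification, but it carries no
    # information about whether the queue holds any items. If the queue
    # is empty, pop(0) raises IndexError instead of waiting.
    with lock:
        return queue.pop(0)
```

Calling `consume()` before any `produce()` fails with `IndexError`, which is the "consumer keeps running on an empty queue" behavior described above.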

First working solution with Condition

  • Use case of Condition: A Condition object allows one or more threads to wait until they are notified by another thread.

    • Consumer should wait when the queue is empty and resume only when it gets notified by the producer.

    • Producer should notify only after it adds something to the queue.

  • Internal mechanism of Condition: A Condition uses a lock internally.

    • A Condition has acquire() and release() methods that call the corresponding methods of the associated lock.

    • The consumer must wait on, and the producer must notify through, the same Condition instance.
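
The points above can be sketched with Python's `threading.Condition`. This is a minimal example under assumed names (`queue`, `produce`, `consume`), not the page's original code.

```python
import threading

queue = []  # shared list (illustrative)
condition = threading.Condition()  # creates and wraps its own lock


def produce(item):
    with condition:  # acquires the underlying lock
        queue.append(item)
        condition.notify()  # notify only after the item is in the queue


def consume():
    with condition:
        # Re-check the queue after every wakeup; wait() releases the lock
        # while blocked and reacquires it before returning.
        while not queue:
            condition.wait()
        return queue.pop(0)
```

The `while` loop, rather than a single `if`, keeps the consumer correct when it is woken but another consumer has already taken the item.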

Thread-safe queue

  • Queue encapsulates the Condition handling (wait(), notify(), acquire(), and so on), so callers only use put() and get().
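
A short sketch of the same producer/consumer pattern using the standard library's `queue.Queue`. The `run` helper, the worker count, and the use of `None` as a shutdown sentinel are assumptions made for this example.

```python
import queue
import threading


def run(items, num_workers=2):
    """Push items through a Queue to worker threads and return what they saw."""
    q = queue.Queue()
    seen = []
    seen_lock = threading.Lock()  # protects the result list

    def worker():
        while True:
            item = q.get()  # blocks while the queue is empty
            if item is None:  # sentinel: assumed shutdown convention
                q.task_done()
                break
            with seen_lock:
                seen.append(item)
            q.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:
        q.put(item)
    for _ in workers:
        q.put(None)  # one sentinel per worker
    q.join()  # returns once every put() has a matching task_done()
    for w in workers:
        w.join()
    return seen
```

No explicit lock or Condition is needed around the queue itself; `get()` does the waiting and `put()` does the notifying.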
