python|August 02, 2019|4 min read

Multithreading in Python: Let's Clear the Confusion Between Multithreading and Multiprocessing

TL;DR

Python's GIL lets only one thread execute Python code at a time, so threads help mainly with I/O-bound tasks; use multiprocessing for CPU-bound work, since each process gets its own memory space and its own GIL.

So, you want to run your code in parallel so that it finishes faster, or so that you get better performance out of it.

Python provides two ways to achieve this:

  1. Multithreading
  2. Multiprocessing

The basic difference between a thread and a process is that a process has a completely isolated memory space of its own. By default, no two processes share memory (neither heap nor stack). Threads, on the other hand, share the heap memory of their process but have separate stacks. So chances are higher that they try to access the same variables or data at the same time. This is where the operating-system concepts of locks and semaphores come in.
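To make the shared-heap point concrete, here is a minimal sketch of several threads updating one object while a lock protects the update. The Counter class and the run_counter helper are hypothetical names made up for this illustration.

```python
from threading import Thread, Lock


class Counter:
    """A counter shared by several threads (hypothetical example class)."""

    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self, times):
        for _ in range(times):
            # The lock makes the read-modify-write step safe across threads.
            with self._lock:
                self.value += 1


def run_counter(num_threads=4, times=10_000):
    counter = Counter()  # one object on the heap, visible to every thread
    threads = [Thread(target=counter.increment, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_counter())  # 40000
```

Without the lock, two threads could read the same old value and one of the increments could get lost.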

Python GIL

Many other languages, like Java, have great support for multithreading and provide locking mechanisms. But in Python (specifically CPython, the standard interpreter) there is the GIL (Global Interpreter Lock), which allows only one thread at a time to run Python code, even if you have a multi-core CPU. So you will not get a real speedup from multithreading for CPU-heavy code. But hold on: in simpler terms, the GIL only means that one thread can be interpreted at a time. A thread that is waiting on I/O releases the GIL, so other threads can run in the meantime.
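If you want to see the effect of the GIL yourself, a rough timing sketch like the one below can help. The function names (count_down, time_sequential, time_threaded) are made up for this example, and the numbers depend on your machine and Python build. On a standard CPython build, the threaded run is usually no faster than the sequential one.

```python
import time
from threading import Thread


def count_down(n):
    # Pure-Python loop: CPU-bound, so the thread needs the GIL the whole time.
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats=2):
    threads = [Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```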

Multithreading and Multiprocessing

So the Python developers provided another way to achieve parallelism: multiprocessing. It lets you create multiple processes from your program and gives you behavior similar to multithreading. Since multiple processes are running, each process has its own GIL.
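Because each process has its own memory, a change made in a child process does not show up in the parent. Here is a small sketch of that; the counter variable and the run_child helper are hypothetical names for this illustration.

```python
from multiprocessing import Process

counter = 0  # lives in the memory of whichever process loads this module


def increment():
    global counter
    counter += 1
    print('child sees counter =', counter)


def run_child():
    p = Process(target=increment)
    p.start()
    p.join()
    return counter  # the parent's own copy, untouched by the child


if __name__ == '__main__':
    print('parent sees counter =', run_child())
```

The child prints 1, while the parent still sees 0, since the child only changed its own copy.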

Multithreading OR Multiprocessing - Which one to choose

This is the million-dollar question, and it is quite easy to answer.

  • If your program is CPU bound, then you should go for multiprocessing.
  • If your program is IO bound, then you should go for multithreading. IO-bound work includes waiting for a file transfer, making HTTP calls and waiting for the result, etc.
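To illustrate the IO-bound case, the sketch below uses time.sleep as a stand-in for a blocking network or file call. The fake_download function and the two run_ helpers are hypothetical, and the timings are only indicative.

```python
import time
from threading import Thread


def fake_download(item, delay=0.2):
    # time.sleep stands in for a blocking IO call;
    # the GIL is released while the thread waits.
    time.sleep(delay)
    return item


def run_sequential(items):
    start = time.perf_counter()
    results = [fake_download(item) for item in items]
    return results, time.perf_counter() - start


def run_threaded(items):
    results = [None] * len(items)

    def worker(index, item):
        results[index] = fake_download(item)

    threads = [Thread(target=worker, args=(i, item))
               for i, item in enumerate(items)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = ['a', 'b', 'c', 'd']
    print('sequential:', run_sequential(items))
    print('threaded:  ', run_threaded(items))
```

With four items, the sequential run waits for each delay in turn, while the threaded run waits for all of them at the same time.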

Pros and Cons

Multiprocessing Pros

  • Each process has a fully separate heap space
  • Code is understandable and simple
  • Since each process has its own GIL, the GIL is not a bottleneck.
  • Can take advantage of multiple CPU cores.
  • You can see each process with the ps command, and can kill those processes too.

Multiprocessing Cons

  • Creating a process is heavy duty and requires more resources than creating a thread
  • This results in higher memory consumption for your program overall
  • Communicating between your main process and the forked processes is usually a bit tedious and complex.
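As a rough idea of what that communication looks like, here is a minimal sketch that sends results from a child process back to the parent through a multiprocessing.Queue. The square_worker and squares_via_queue names are made up for this example.

```python
from multiprocessing import Process, Queue


def square_worker(numbers, results):
    # Runs in the child process; results travel back through the queue.
    for n in numbers:
        results.put((n, n * n))


def squares_via_queue(numbers):
    results = Queue()
    p = Process(target=square_worker, args=(numbers, results))
    p.start()
    # Read everything before join() so the child never blocks on a full pipe.
    collected = dict(results.get() for _ in numbers)
    p.join()
    return collected


if __name__ == '__main__':
    print(squares_via_queue([1, 2, 3]))  # {1: 1, 2: 4, 3: 9}
```

Everything put on the queue is pickled and copied between the processes, which is part of why this is heavier than sharing a variable between threads.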

Multithreading Pros

  • Low memory usage, since everything is in the same process
  • All threads share the heap memory, and hence can access the state of the program. Note: This can be a disadvantage too
  • Great for IO bound processing

Multithreading Cons

  • Because of the GIL, threads do not run Python code in parallel, so CPU-bound work sees little speedup.
  • Novice programmers find it hard to write thread-safe code.
  • Threads are not killable from outside
  • Synchronization issues and deadlocks can happen
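Since a thread cannot be killed from outside, a common workaround is to ask it to stop and let it exit on its own. The sketch below does that with threading.Event; the worker and run_and_stop functions are hypothetical names for this illustration.

```python
import threading
import time


def worker(stop_event, ticks):
    # Check the flag on every iteration instead of being killed from outside.
    while not stop_event.is_set():
        ticks.append(time.monotonic())
        stop_event.wait(0.05)  # pauses, but wakes up early once the event is set


def run_and_stop(run_for=0.2):
    stop_event = threading.Event()
    ticks = []
    t = threading.Thread(target=worker, args=(stop_event, ticks))
    t.start()
    time.sleep(run_for)
    stop_event.set()   # ask the thread to finish
    t.join(timeout=1)  # wait for it to actually exit
    return t.is_alive(), len(ticks)


if __name__ == "__main__":
    alive, tick_count = run_and_stop()
    print('still alive:', alive, '| ticks recorded:', tick_count)
```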

Multithreading Code

from threading import Thread
from time import sleep

def foo(n):
    for i in range(n):
        print('foo ', i)
        sleep(1)

def bar(n):
    for i in range(n):
        print('bar ', i)
        sleep(1)

t1 = Thread(target=foo, args=(3,))
t2 = Thread(target=bar, args=(5,))

t1.start()
t2.start()

t1.join()
print('foo finish')
t2.join()
print('bar finish')

Output

foo  0
bar  0
foo  1
bar  1
foo  2
bar  2
bar  3
foo finish
bar  4
bar finish

Note: If you do not use the join() method, you will see the foo finish and bar finish statements printed early, before the threads are done. join() is used to wait for a thread to finish processing its task.
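A small sketch of the difference join() makes: it records the order of events with and without waiting for the thread. The events list and the run helper are hypothetical, and the short sleep only stands in for real work.

```python
from threading import Thread
from time import sleep

events = []


def worker():
    sleep(0.2)
    events.append('worker done')


def run(wait):
    events.clear()
    t = Thread(target=worker)
    t.start()
    if wait:
        t.join()  # block here until the worker has finished
    events.append('main continues')
    t.join()      # always wait before returning, so the list is complete
    return list(events)


if __name__ == "__main__":
    print(run(wait=True))   # ['worker done', 'main continues']
    print(run(wait=False))  # ['main continues', 'worker done']
```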

Problem with Many threads

The above code is simple when you are dealing with 1 or 2 threads. But what if you are playing around with 10 or 20 threads? You would probably keep the thread instances in a list and then call join() on each one individually. It can become complex.
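For comparison, this is roughly what the manual approach looks like with 10 threads kept in a list. The task and run_many names are made up for this sketch.

```python
from threading import Thread
from time import sleep


def task(task_id, results):
    sleep(0.1)  # pretend to do some work
    results[task_id] = task_id * 2


def run_many(count=10):
    results = {}
    # Keep every thread instance so each one can be joined later.
    threads = [Thread(target=task, args=(i, results)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_many())
```

It works, but you have to manage the list, the start and join loops, and the way results are collected yourself.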

Thread Pool - ThreadPoolExecutor

For reference visit: https://docs.python.org/3/library/concurrent.futures.html

Python provides a manager-style class that manages a pool of n threads, so you only have to manage a single instance of the pool executor.


import time
from concurrent.futures import ThreadPoolExecutor as Executor

def square(a):
    time.sleep(1)
    return a*a

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

with Executor(max_workers=8) as workers:
    res = workers.map(square, data)
    print(list(res))

print('Finish')

Output

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Finish
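map() returns results in input order. If you would rather handle each result as soon as it is ready, the same module also offers submit() and as_completed(). The sketch below is one possible way to use them; the squares_as_completed helper is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(a):
    time.sleep(0.1)
    return a * a


def squares_as_completed(data, max_workers=8):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which input each future belongs to.
        futures = {pool.submit(square, n): n for n in data}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(squares_as_completed([1, 2, 3, 4, 5]))
```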

Multiprocessing

from multiprocessing import Process
from time import sleep

def foo(n):
    for i in range(n):
        print('foo ', i)
        sleep(1)

def bar(n):
    for i in range(n):
        print('bar ', i)
        sleep(1)

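if __name__ == '__main__':
    # Guard needed where child processes re-import this module (e.g. Windows, macOS)
    p1 = Process(target=foo, args=(3,))
    p2 = Process(target=bar, args=(5,))

    p1.start()
    p2.start()

    p1.join()
    print('foo finish')
    p2.join()
    print('bar finish')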

This code is almost the same as the thread code. The only difference is that we have used the multiprocessing module.

ProcessPoolExecutor

In the thread pool code above, just replace ThreadPoolExecutor with ProcessPoolExecutor.

import time
from concurrent.futures import ProcessPoolExecutor as Executor

def square(a):
    time.sleep(1)
    return a*a

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

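if __name__ == '__main__':
    # Guard needed where worker processes re-import this module (e.g. Windows, macOS)
    with Executor(max_workers=8) as workers:
        res = workers.map(square, data)
        print(list(res))

    print("Done")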

The resulting list will be the same. It is just that multiple processes are launched instead of threads.
