A Starter's Guide to Parallel and Multiprocessing Python

A Starter Guide to Parallelism

Parallelism is an important idea in computer science, and it's worth covering before we start writing Python code.

Most of the time, when you run a Python script, the code becomes a process, and that process runs on a single core of your CPU. Modern computers, however, have multiple cores. What if you could spread the work across more cores? Your computations would finish sooner.

For now, let's take that as a general rule. We'll see later in the article that it isn't always true.

To sum up, parallelism means writing your code so that it can use more than one CPU core.

Let’s look at an example to help you understand.

Computing in Parallel and Serial

You're on your own, and you have a big job to do: find the square root of eight different numbers. What do you do? You don't have many options. You start with the first number, work out the answer, and then move on to the next ones.

What if three maths-savvy friends offered to help? Each of them would find the square root of two numbers, making your job easier because the work is split evenly among you. Your problem gets solved faster.

See the idea? Each friend in this story stands for a CPU core. In the first scenario, you carry out the whole job one step at a time: this is known as serial computing. In the second scenario, four of you work at once, so you are using parallel computing. Parallel computing means splitting the work into processes that run at the same time on different processing units.
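To make the analogy concrete, here is a minimal sketch, with an illustrative square_root helper and made-up numbers, that computes the same eight square roots serially and then with a pool of four worker processes, one per "friend":

```python
from multiprocessing import Pool
import math

NUMBERS = [4, 9, 16, 25, 36, 49, 64, 81]  # eight numbers, as in the story

def square_root(x):
    return math.sqrt(x)

if __name__ == "__main__":
    # Serial computing: one "person" works through all eight numbers in order.
    serial = [square_root(n) for n in NUMBERS]

    # Parallel computing: four "friends" (worker processes) split the work.
    with Pool(4) as pool:
        parallel = pool.map(square_root, NUMBERS)

    print(serial == parallel)  # same answers, potentially less wall-clock time
```

Both versions produce identical results; only the wall-clock time can differ.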

Parallel Programming Models

Now that we know what parallel programming is, let's look at how to use it. We said earlier that parallel computing means running multiple jobs on different processor cores at the same time. Before reaching for parallelisation, though, there are a few things to consider. For instance, are there other optimisations that could speed up our calculations?

Let's assume for now that parallelisation is the right choice for you. There are three main models of parallel computing:

  • Perfectly parallel. The jobs can be done independently, and they don't need to communicate with each other.
  • Shared-memory parallelism. All processes (or threads) communicate through a shared global address space.
  • Message passing. Processes exchange messages whenever they need to communicate.
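As a small taste of the third model, message passing can be sketched with multiprocessing.Queue; the worker function and queue names below are made up for illustration:

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Receive one message, process it, and send the result back.
    number = inbox.get()
    outbox.put(number * number)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put(7)         # send a message to the child process
    print(outbox.get())  # block until the reply arrives
    p.join()
```

The two processes share nothing; they communicate only through the queues.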

In this article we'll demonstrate the first model, which is also the simplest.

Process-Based Parallelism in Python with the multiprocessing Module

One way to achieve parallelism in Python is the multiprocessing module. It lets you create multiple processes, each with its own Python interpreter. This is why we say that Python multiprocessing enables process-based parallelism.

You may have heard of other modules, such as threading, which is also part of the standard library. There is an important difference between them: the multiprocessing module creates new processes, while the threading module creates new threads.

We'll go over the advantages of multiprocessing in the next section.

Advantages of Multiprocessing

Here are some of the advantages of multiprocessing:

  • better CPU utilisation for CPU-bound tasks
  • more control over child processes than you get with threads
  • easy to code

The first advantage is about performance. Because multiprocessing creates new processes, you can split your calculations among all of your CPU's cores to get the most out of its computing power. Most computers these days have more than one core, so if you optimise your code to run computations in parallel, you can save a lot of time.

The second advantage is about multiprocessing compared with multithreading. Threads are not processes, and this has consequences: once you start a thread, you cannot interrupt or kill it the way you can a regular process. Comparing multiprocessing with multithreading in depth is beyond the scope of this article, so I encourage you to read more about it.
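To illustrate the point, here is a small sketch (the endless_task function is invented for the example) showing that a Process can be forcefully stopped with terminate(), something the threading module deliberately does not offer for threads:

```python
from multiprocessing import Process
import time

def endless_task():
    # Simulates a job that never finishes on its own.
    while True:
        time.sleep(0.1)

if __name__ == "__main__":
    p = Process(target=endless_task)
    p.start()
    time.sleep(0.5)
    p.terminate()      # forcefully stop the child process
    p.join()
    print(p.exitcode)  # non-zero: the process did not exit cleanly
```

A long-running thread, by contrast, has to check a flag and shut itself down cooperatively.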

Besides these advantages, multiprocessing is also very simple to set up, as long as the job you want to run can be parallelised.

How to Start Multiprocessing with Python

Now we can write some Python code!

We'll start with a very simple example that shows the essential elements of Python multiprocessing. In this example there will be two processes:

  • The parent process. There is only one parent process, and it can have multiple children.
  • The child process. This one is created by the parent. Each child can in turn have new children.

The child process will be used to run a certain piece of code, while the parent keeps running its own code.

A simple example of multiprocessing in Python

In this case, we’ll use the following code:

from multiprocessing import Process

def bubble_sort(array):
    check = True
    while check:
        check = False
        for i in range(0, len(array) - 1):
            if array[i] > array[i + 1]:
                check = True
                # swap the two out-of-order elements
                array[i], array[i + 1] = array[i + 1], array[i]
    print("Array sorted: ", array)

if __name__ == '__main__':
    p = Process(target=bubble_sort, args=([1, 9, 4, 5, 2, 6, 8, 4],))
    p.start()
    p.join()

In this snippet we define a function called bubble_sort(array). The function implements Bubble Sort, a really simple sorting algorithm. Don't worry if you're not familiar with it: all you need to know is that it's a function that does some work.

The Process class

We import the Process class from multiprocessing. This class represents an activity that will be run in a separate process. Notice that we passed it a couple of arguments:

  • target=bubble_sort means our new process will run the bubble_sort function.
  • args=([1,9,4,5,2,6,8,4],) is the argument that is passed to the target function.

After creating an instance of the Process class, we only need to start the process. This is done by writing p.start(). At this point, the process starts running.

Since we want to wait for the child process to finish its work before exiting, we call join(). The join() method waits for the process to terminate.

In this example we've created just one child process. As you might guess, we can create more child processes by making more instances of the Process class.
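For instance, a common pattern, sketched here with a made-up greet function, is to build several Process instances in a loop, start them all, and then join them all:

```python
from multiprocessing import Process

def greet(name):
    print("Hello from the child process for {}!".format(name))

if __name__ == "__main__":
    names = ["Alice", "Bob", "Charlie"]
    processes = [Process(target=greet, args=(name,)) for name in names]
    for p in processes:
        p.start()   # launch every child first...
    for p in processes:
        p.join()    # ...then wait for all of them to finish
```

Starting all the children before joining any of them lets the three processes run at the same time instead of one after another.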

The Pool class

What if we need to create many processes to handle CPU-intensive jobs? Do we have to start each one and wait for it to finish by hand? The Pool class is the answer here.

The Pool class lets you set up a pool of worker processes. In the next example we'll see how to use it. Here's our new example:

from multiprocessing import Pool
import math

N = 5000000

def cube(x):
    # despite its name, this function returns the square root of x
    return math.sqrt(x)

if __name__ == "__main__":
    with Pool() as pool:
        result = pool.map(cube, range(10, N))
    print("Program finished!")

This snippet defines a function called cube(x) which (despite its name) takes a number and returns its square root. Simple, right?

Then we create a Pool object without passing it any arguments. By default, the Pool class creates one process per CPU core. Next, we call the map method with a couple of arguments.

The map method applies the cube function to every item of an iterable. In this case, the iterable is the range of all the numbers from 10 to N.

This is extremely useful, because the computations on the range are carried out in parallel!
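One detail worth knowing: map returns its results in the same order as the inputs, even though the workers run concurrently. A minimal sketch (the square_root name is illustrative):

```python
from multiprocessing import Pool
import math

def square_root(x):
    return math.sqrt(x)

if __name__ == "__main__":
    with Pool(processes=2) as pool:  # explicitly ask for two workers
        results = pool.map(square_root, [4, 9, 16, 25])
    # Results come back in input order, regardless of which worker finished first.
    print(results)  # [2.0, 3.0, 4.0, 5.0]
```

This makes map a drop-in replacement for a serial loop when order matters.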

How to Get the Most Out of Python Multiprocessing

Creating multiple processes and doing computations in parallel is not always faster than doing the computations serially. For tasks that are not CPU-intensive, serial computation can beat parallel computation. This is why it's important to know when to use multiprocessing: it depends on the kind of work you're doing.

Here’s a simple example that will show you what I mean:

from multiprocessing import Pool
import time
import math

N = 5000000

def cube(x):
    # despite its name, this function returns the square root of x
    return math.sqrt(x)

if __name__ == "__main__":
    # first way, using multiprocessing
    start_time = time.perf_counter()
    with Pool() as pool:
        result = pool.map(cube, range(10, N))
    finish_time = time.perf_counter()
    print("Program finished in {} seconds - using multiprocessing".format(finish_time - start_time))
    print("---")
    # second way, serial computation
    start_time = time.perf_counter()
    result = []
    for x in range(10, N):
        result.append(cube(x))
    finish_time = time.perf_counter()
    print("Program finished in {} seconds".format(finish_time - start_time))

This snippet builds on the previous one. It solves the same problem, computing the square root of N numbers, in two ways: the first uses Python multiprocessing, while the second doesn't. To measure the elapsed time, we use the perf_counter() function from the time module.

This is what I get on my laptop:

> python code.py
Program finished in 1.6385094 seconds - using multiprocessing
---
Program finished in 2.7373942999999996 seconds

As you can see, the difference is more than one second. In this case, multiprocessing is clearly the better choice.

Let's change something in the code: the value of N. Let's set N to 100,000 and see what happens.

Now this is what I get:

> python code.py
Program finished in 0.3756742 seconds - using multiprocessing
---
Program finished in 0.005098400000000003 seconds

What happened? Multiprocessing no longer looks like a good idea. Why?

The overhead of splitting the computations among the processes is too high compared to the amount of work each process does. You can see how big the difference in timings is.
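One knob that can reduce this overhead, not used in the example above, is the chunksize parameter of Pool.map: it batches many inputs into each message sent to a worker, so the inter-process communication cost is paid per chunk rather than per item. A hedged sketch:

```python
from multiprocessing import Pool
import math

def cube(x):
    # despite its name, this function returns the square root of x
    return math.sqrt(x)

if __name__ == "__main__":
    numbers = range(10, 100_000)
    with Pool() as pool:
        # Each worker receives batches of 10,000 inputs at a time
        # instead of one input per message.
        result = pool.map(cube, numbers, chunksize=10_000)
    print(len(result))
```

Whether this helps depends on the workload; for very cheap functions like this one, serial computation may still win.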

In conclusion

In this article we looked at how Python multiprocessing can be used to speed up Python code.

We began with a quick overview of what parallel computing is and the main models for achieving it. Then we introduced multiprocessing and its advantages. Finally, we saw that running multiple processes isn't always the best idea: the multiprocessing module is suited to CPU-bound tasks. It's always important to consider the specific problem you're facing and weigh the pros and cons of each solution.

I hope it was helpful for you to learn about Python multiprocessing.

What You Need to Know About Multiprocessing and Parallel Programming in Python

What's the main advantage of using multiprocessing in Python?

The main benefit of using multiprocessing in Python is that it lets you run more than one process at the same time. This is especially helpful for CPU-intensive jobs, because the programme can use multiple CPU cores, making it much faster and more efficient. Moreover, unlike threading, multiprocessing in Python is not constrained by the Global Interpreter Lock (GIL): each process runs in its own interpreter, independently of the others. This makes multiprocessing a great way to run Python programmes in parallel.

How does Python's multiprocessing module work?

Python's multiprocessing module creates a new process for each job that needs to run concurrently. Because each process has its own memory space and its own Python interpreter, it can run without interference from other processes. The module provides many classes and methods for setting up and controlling these processes: the Process class is used to create a new process, while the Pool class manages a group of worker processes.

What's the Difference Between Multiprocessing and Multithreading in Python?

In Python, multiprocessing and multithreading differ in how they handle work. Multiprocessing creates a new process for each job, while multithreading creates a new thread inside the same process. Multiprocessing can use all of your CPU cores at once, but multithreading is limited by Python's Global Interpreter Lock (GIL), which allows only one thread to execute Python code at a time. Even so, multithreading can still be useful for I/O-bound jobs, where the programme spends most of its time waiting for operations to finish.

How can I share data between Python processes?

The multiprocessing module in Python provides shared-memory mechanisms that let processes share data. These include the Value and Array classes, which let you create shared variables and shared arrays, respectively. Remember that each process normally has its own memory space, so changes made to ordinary objects in one process will not show up in other processes. Shared Value and Array objects are the exception, but access to them should be synchronised using locks or the other synchronisation primitives provided by the multiprocessing module.
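As an illustrative sketch, with a made-up add_100 helper, a Value can hold a C integer in shared memory, and its built-in lock guards concurrent increments:

```python
from multiprocessing import Process, Value

def add_100(counter):
    for _ in range(100):
        # get_lock() guards the read-modify-write step so that
        # two processes cannot interleave their updates.
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)  # "i" means a C int living in shared memory
    workers = [Process(target=add_100, args=(counter,)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 200: both processes saw the same shared int
```

Without the lock, concurrent increments could be lost to race conditions.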

What are some drawbacks of using multiprocessing in Python?

Multiprocessing can make your programme faster and more efficient, but it also brings problems you need to manage. One of the big ones is added complexity: keeping track of multiple processes can be harder than managing a single-threaded programme, especially when it comes to sharing data and keeping processes in sync. Also, starting a new process uses more resources than starting a new thread, which can lead to higher memory usage. Lastly, not every job benefits from parallelisation, and sometimes the overhead of setting up and managing multiple processes outweighs the potential performance gains.

How do I handle exceptions when using multiprocessing in Python?

Handling exceptions can be tricky with multiprocessing, because exceptions raised in child processes don't automatically propagate to the parent process. However, the multiprocessing module gives you several ways to deal with errors. The is_alive() method of the Process class tells you whether a process is still running; if it returns False, an exception may have caused the process to end. Another option is the exitcode attribute of the Process class, which can give you more details about why a process terminated.
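A minimal sketch of this pattern (failing_task is invented for the example):

```python
from multiprocessing import Process

def failing_task():
    raise ValueError("something went wrong in the child")

if __name__ == "__main__":
    p = Process(target=failing_task)
    p.start()
    p.join()
    print(p.is_alive())  # False: the process has ended
    print(p.exitcode)    # non-zero: the child did not exit cleanly
```

For richer error reporting, Pool.apply_async and concurrent.futures.ProcessPoolExecutor re-raise a child's exception in the parent when you fetch the result.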

Can I use multiprocessing together with other Python libraries?

Yes, you can combine multiprocessing with other Python libraries. Keep in mind, though, that not all libraries are designed to work across multiple processes. Some may not be safe to use concurrently, or may not support running multiple tasks at the same time. So, before using a library this way, you should check its documentation to see whether it supports multiprocessing.

How do I debug a Python programme that uses multiprocessing?

Debugging a multiprocessing programme can be difficult, because many debugging tools don't work well across process boundaries. There are, however, several approaches. You can use print statements or the logging module to trace how your programme executes. You can also call set_trace() from the pdb module to step through your code. In addition, the multiprocessing module itself offers debugging helpers: for example, its log_to_stderr() function writes information about what your processes are doing to standard error.

Can I use multiprocessing in Python across different computers?

The multiprocessing module is part of the standard Python library, so it runs on any platform that supports Python. On its own, though, it coordinates processes on a single machine; spreading work across several machines requires extra tooling, such as the managers in multiprocessing.managers or a dedicated distributed-computing library. Also, different operating systems handle processes slightly differently, so the module may not behave identically everywhere. Because of this, you should always test your programme on the target OS to make sure it works as planned.

What are some good ways to use Python’s multiprocessing?

Here are some best practices for multiprocessing in Python:
– When possible, avoid sharing state between processes. Shared state can cause synchronisation problems that are hard to debug.
– Use the Pool class to manage your worker processes. It provides a higher-level interface that makes it easier to set up and run processes.
– Always call the join() method of the Process class when you are done with a process. This makes sure the process has finished before the programme moves on.
– Handle errors correctly so your programme doesn't crash without warning.
– Test your programme thoroughly to make sure it works correctly in a multiprocessing setting.