Intro to Parallel

parallel makes the process of writing parallel code simple and enjoyable, bringing parallelism closer to mainstream developers. parallel is NOT a pipeline library (check Dask or Luigi for that).

parallel is inspired (and perfectly summarized) by Scala's parallel collections:

An effort to facilitate parallel programming by sparing users from low-level parallelization details, meanwhile providing them with a familiar and simple high-level abstraction.

The hope was, and still is, that implicit parallelism behind a high-level abstraction will bring reliable parallel execution one step closer to the workflow of mainstream developers.

Installation (keep on reading for examples):

$ pip install python-parallel

parallel makes it extremely simple to parallelize functions with features like:

  • Multithreading and multiprocessing support
  • *args, **kwargs, default arguments and extras
  • Error handling and silencing
  • Automatic retries
  • Non-blocking API
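One of the bullets above, automatic retries, can be sketched in plain Python. This is an illustration of the concept only, not parallel's actual implementation; the `with_retries` helper and `retries=3` default are made up for the example:

```python
# Conceptual sketch of "automatic retries": re-invoke a failing
# task up to `retries` times before giving up. NOT parallel's code.
def with_retries(fn, retries=3):
    def wrapper(*args, **kwargs):
        last_exc = None
        for _ in range(retries):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc
    return wrapper

attempts = []

@with_retries
def flaky():
    # fails twice, then succeeds on the third attempt
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError('transient')
    return 'ok'
```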

Here's a quick look at what you can achieve with parallel:

import requests
import parallel

def download_and_store(url):
    resp = requests.get(url)
    result = store_in_db(resp.json())  # store_in_db defined elsewhere
    return result

urls = [
    'https://python.org',
    'https://python-requests.com',
    'https://rmotr.com'
]

# instant parallelism (Threads used by default)
results = parallel.map(
    download_and_store,  # the function to invoke
    urls,                # parameters to parallelize
    timeout=5,           # timeout per thread
    max_workers=4        # max threads
)

# results preserve the order of the input parameters:
# results[0] is the return value for 'https://python.org',
# results[1] for 'https://python-requests.com', and so on
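The ordering guarantee resembles the stdlib's `concurrent.futures`, whose `Executor.map` also returns results in input order. A minimal stdlib sketch (the `square` function is a stand-in for the example above):

```python
# Stdlib analogue of parallel.map's semantics: results come back
# in the same order as the inputs, regardless of completion order.
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(square, [1, 2, 3], timeout=5))
```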

parallel is designed to transparently provide support for multithreading and multiprocessing. Switch between executors with just an argument:

# Using multi-threading
results = parallel.thread.map(download_and_store, urls)

# Using multi-processing
results = parallel.process.map(download_and_store, urls)
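The thread/process split mirrors `concurrent.futures`, where the same call shape works with either executor class. A hedged sketch with a stand-in `work` function (threads shown; swapping in `ProcessPoolExecutor` gives the multiprocessing variant):

```python
# ThreadPoolExecutor suits I/O-bound tasks (network, disk);
# ProcessPoolExecutor handles CPU-bound tasks with the identical
# call shape -- the split parallel exposes as parallel.thread.map
# and parallel.process.map.
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n + 1

with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(work, range(3)))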

If you find yourself parallelizing the same function repeatedly, you can decorate it:

@parallel.decorate
def download_and_store(url):
    ...

results = download_and_store.map(  # parallelize the function directly
    urls,                          # parameters to parallelize
    timeout=5,                     # timeout per thread
    max_workers=4                  # max threads
)

and the function can still be used normally:

res = download_and_store('https://rmotr.com')
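A decorator like this can be approximated in a few lines of stdlib Python. The sketch below is illustrative only, not parallel's implementation; `decorate` and `double` are invented names:

```python
# Minimal sketch of how a decorator could attach a .map method
# while leaving the function callable normally. NOT parallel's code.
from concurrent.futures import ThreadPoolExecutor

def decorate(fn):
    def map_(params, timeout=None, max_workers=None):
        with ThreadPoolExecutor(max_workers=max_workers) as ex:
            return list(ex.map(fn, params, timeout=timeout))
    fn.map = map_  # attach the parallel entry point as an attribute
    return fn

@decorate
def double(n):
    return n * 2
```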

All the functionality of parallel comes in both a blocking version (the default) and a non-blocking one:

def download_and_store(url):
    pass

urls = ['https://python.org', 'https://rmotr.com', '...']

# instant parallelism (Threads used by default)
with parallel.async_map(download_and_store, urls, timeout=5) as ex:
    # do something else while parallel processes
    values = db.read_data()
    # access results when needed (this might block)
    results = ex.results()

# resources are cleaned up when the context manager exits

Blocking vs Non-blocking

Please don't confuse non-blocking with asynchronous execution (provided by the async module). Non-blocking in this context means that your threads (or processes) run in the background while you keep working in the main thread.
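The same non-blocking pattern can be seen with stdlib futures: submit work, keep going on the main thread, and block only at the point where you actually need the result. The `slow_add` function is a stand-in:

```python
# Non-blocking pattern with concurrent.futures: the submit call
# returns immediately; only .result() may block.
from concurrent.futures import ThreadPoolExecutor

def slow_add(a, b):
    return a + b

ex = ThreadPoolExecutor(max_workers=2)
future = ex.submit(slow_add, 2, 3)   # returns immediately

other_work = sum(range(10))          # main thread keeps working

result = future.result()             # blocks here only if still running
ex.shutdown()
```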

If you have multiple different functions that can be parallelized, you can use the parallel.par function:

def fetch_album_info(album): ...
def fetch_artist_info(artist): ...
def fetch_song_list(album, prefetch_mp3=False, page_size=None): ...

results = parallel.par({
    'album': (fetch_album_info, 'Are You Experienced'),
    'artist': parallel.future(fetch_artist_info, artist='Jimi Hendrix'),
    'songs': parallel.future(fetch_song_list, 'Are You Experienced', page_size=10),
})

results['songs']  # the return value of fetch_song_list
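A stdlib approximation of `parallel.par`: run several different functions concurrently and collect their results keyed by name. The two fetch functions below are simplified stand-ins for the ones sketched above:

```python
# Run heterogeneous functions concurrently, results keyed by name --
# roughly what parallel.par provides. Stand-in functions for the demo.
from concurrent.futures import ThreadPoolExecutor

def fetch_album_info(album):
    return {'album': album}

def fetch_artist_info(artist):
    return {'artist': artist}

with ThreadPoolExecutor() as ex:
    futures = {
        'album': ex.submit(fetch_album_info, 'Are You Experienced'),
        'artist': ex.submit(fetch_artist_info, artist='Jimi Hendrix'),
    }
    results = {name: f.result() for name, f in futures.items()}
```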

Next steps

Check out the Quickstart for more examples. If you'd like to contribute, head over to the GitHub repo or submit an issue.