Intro to Parallel

`parallel` makes the process of writing parallel code simple and enjoyable, bringing parallelism closer to mainstream developers. `parallel` is NOT a pipeline library (check out Dask or Luigi for that).
`parallel` is inspired (and perfectly summarized) by Scala's parallel collections:

> An effort to facilitate parallel programming by sparing users from low-level parallelization details, meanwhile providing them with a familiar and simple high-level abstraction. The hope was, and still is, that implicit parallelism behind a high-level abstraction will bring reliable parallel execution one step closer to the workflow of mainstream developers.
Installation (keep on reading for examples):

```
$ pip install python-parallel
```
`parallel` makes it extremely simple to parallelize functions, with features like:

- Multithreading and multiprocessing support
- `*args`, `**kwargs`, default arguments and extras
- Error handling and silencing
- Automatic retries
- Non-blocking API
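For intuition, the core behavior, an ordered parallel map over threads, resembles the standard library's `concurrent.futures`. This is an illustrative sketch of the pattern, not `parallel`'s implementation (the `square` function is a made-up stand-in for an I/O-bound task):

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # stand-in for an I/O-bound task such as an HTTP request
    return n * n

with ThreadPoolExecutor(max_workers=4) as ex:
    # map() preserves input order regardless of completion order
    results = list(ex.map(square, [1, 2, 3, 4], timeout=5))

results  # [1, 4, 9, 16]
```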
Here's a quick look at what you can achieve with `parallel`:

```python
import parallel
import requests

def download_and_store(url):
    resp = requests.get(url)
    result = store_in_db(resp.json())
    return result

urls = [
    'https://python.org',
    'https://python-requests.com',
    'https://rmotr.com',
]

# instant parallelism (threads used by default)
results = parallel.map(
    download_and_store,  # the function to invoke
    urls,                # parameters to parallelize
    timeout=5,           # timeout per thread
    max_workers=4        # max threads
)

# results are ordered as you'd expect:
# results[0] corresponds to urls[0], and so on
```
`parallel` is designed to transparently support both multithreading and multiprocessing. Switching between executors is trivial:

```python
# using multithreading
results = parallel.thread.map(download_and_store, urls)

# using multiprocessing
results = parallel.process.map(download_and_store, urls)
```
If a certain function needs recurrent parallelization, you can choose to decorate it:

```python
@parallel.decorate
def download_and_store(url):
    ...

results = download_and_store.map(  # parallelize the function directly
    urls,          # parameters to parallelize
    timeout=5,     # timeout per thread
    max_workers=4  # max threads
)
```
and the function can still be used normally:

```python
res = download_and_store('https://rmotr.com')
```
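To see how a decorator can attach a `.map` method while leaving normal calls untouched, here is a minimal stdlib sketch. It assumes nothing about `parallel`'s internals; `with_map` and `double` are hypothetical names for illustration:

```python
import functools
from concurrent.futures import ThreadPoolExecutor

def with_map(fn):
    # hypothetical stand-in for a decorator like parallel.decorate:
    # wraps the function and attaches a .map attribute to it
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    def _map(params, timeout=None, max_workers=None):
        with ThreadPoolExecutor(max_workers=max_workers) as ex:
            return list(ex.map(fn, params, timeout=timeout))

    wrapper.map = _map
    return wrapper

@with_map
def double(n):
    return n * 2

double(3)           # a normal call still works
double.map([1, 2])  # parallelized over the inputs
```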
All of `parallel`'s functionality comes in a blocking version (the default) and a non-blocking one:

```python
def download_and_store(url):
    pass

urls = ['https://python.org', 'https://rmotr.com', '...']

# instant parallelism (threads used by default)
with parallel.async_map(download_and_store, urls, timeout=5) as ex:
    # do something else while parallel works in the background
    values = db.read_data()

    # access results when needed (this might block)
    results = ex.results()

# resources are cleaned up when the context manager exits
```
Blocking vs Non-blocking
Please don't confuse non-blocking with asynchronous execution (provided by the `async` module). Non-blocking in this context means that your threads (or processes) run in the background, and you can keep working in the main thread in the meantime.
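The same non-blocking pattern can be sketched with the stdlib: submit the work, keep going on the main thread, and block only when you collect the results. This is an illustration of the concept, not `parallel`'s API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_double(n):
    time.sleep(0.1)  # simulate I/O
    return n * 2

ex = ThreadPoolExecutor(max_workers=4)
futures = [ex.submit(slow_double, n) for n in [1, 2, 3]]

# the main thread is free to do other work here

# .result() blocks only if a value is not ready yet
results = [f.result() for f in futures]
ex.shutdown()

results  # [2, 4, 6]
```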
If you have multiple different functions that can be parallelized, you can use the `parallel.par` function:

```python
def fetch_album_info(album):
    ...

def fetch_artist_info(artist):
    ...

def fetch_song_list(album, prefetch_mp3=False, page_size=None):
    ...

results = parallel.par({
    'album': (fetch_album_info, 'Are You Experienced'),
    'artist': parallel.future(fetch_artist_info, artist='Jimi Hendrix'),
    'songs': parallel.future(fetch_song_list, 'Are You Experienced', page_size=10),
})
results['songs']
```
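For intuition, running several different functions in parallel and collecting their results by key can be sketched with the stdlib. The fetch functions below are dummy stand-ins, and this illustrates the pattern only, not how `parallel.par` is implemented:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_album_info(album):
    # dummy stand-in for a real fetch
    return {'album': album}

def fetch_artist_info(artist):
    # dummy stand-in for a real fetch
    return {'artist': artist}

with ThreadPoolExecutor() as ex:
    # submit each distinct function, keyed by name
    futures = {
        'album': ex.submit(fetch_album_info, 'Are You Experienced'),
        'artist': ex.submit(fetch_artist_info, artist='Jimi Hendrix'),
    }
    # collect results by the same keys (blocks until all are done)
    results = {key: f.result() for key, f in futures.items()}

results['artist']  # {'artist': 'Jimi Hendrix'}
```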
Next steps

Check out the Quickstart for more examples. If you want to contribute, please head to the GitHub repo or submit an issue.