A Hitchhiker’s Guide to Asynchronous Programming

BiteCode_dev · on Feb 29, 2020

I get that this guide tries to ease you into low-level concurrency concepts.

However, if you are just trying to get work done in Python, this is not what you want.

Don't do threads/processes yourself, use pools:

    import random
    import time
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def hello():
        seconds = random.randint(0, 5)
        print(f"Start blocking for {seconds}s")
        time.sleep(seconds)
        print(f"Stopped blocking after {seconds}s")
        return seconds

    if __name__ == "__main__":

        with ProcessPoolExecutor(max_workers=2) as executor:

            a = executor.submit(hello)
            b = executor.submit(hello)

            for future in as_completed((a, b)):
                print(future.result())

And don't manage the loop yourself. Use Python 3.7+, and replace:

    loop = asyncio.get_event_loop()
    loop.run_until_complete(loop.create_task(foo()))
    loop.close()

With: asyncio.run(foo())

The code is not just shorter, it is way, wayyyyyyyyyyyy, more correct.
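A minimal sketch of the replacement, assuming a hypothetical coroutine `foo()` that just sleeps briefly and returns a value (it is not taken from the article):

```python
import asyncio


async def foo():
    # Hypothetical coroutine used only for illustration.
    await asyncio.sleep(0.01)
    return "done"


def main():
    # asyncio.run() creates a fresh event loop, runs foo() to completion,
    # then cancels leftover tasks and closes the loop.
    return asyncio.run(foo())


if __name__ == "__main__":
    print(main())
```

The cleanup that the three-line version forgets (cancelling pending tasks, shutting down async generators) is handled for you.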

Also don't program asyncio by hand. Use a lib. E.g.: wanna do HTTP, use aiohttp.

This is Python, don't suffer more than you need to.

crazyguitar · on Feb 29, 2020

I agree with you. We should use a reliable library, as you said. The primary purpose of this article is to help people understand what a coroutine and an event loop are, so that programmers can use the asyncio API fluently without misusing it.

However, I don't think using threads/processes directly is a bad idea. A pool puts a constraint on how you use threads/processes, but sometimes we may want to adjust the number of threads/processes based on system load. Under this circumstance, using a pool is not the best choice.

heavenlyblue · on Feb 29, 2020

You’re sending the poor newbie on a journey of self-discovery of picklable vs. non-picklable objects, passing of arguments back and forth, working with Queues (which have well-known yet undocumented race conditions), missing exception stack traces due to dead processes and all sorts of useless garbage they don’t need to know about.

Also what exactly would that newbie be building that starts and stops threads depending on the system load? What kind of a contraption is that? What are you doing?

Finalizing all of the above: under the circumstance you mentioned, you should check whether you have just seriously over-architected the solution.

crazyguitar · on March 1, 2020

I understand you are worried about newbies misusing APIs. You remind me that I should add a warning that the sample code in this article should not be used in real programs. Thanks.

Also, I did not advocate that a newbie should start and stop threads by themselves. I want to say I agree that we should use high-level APIs in most cases, but, in some cases, we may need to use low-level APIs to achieve our goals. I am unwilling to limit which kind of APIs people should use. In my opinion, as you said ("you have just seriously over-architected the solution"), we should be careful about how we use APIs. Even though high-level APIs are safer, programmers may still misuse them.

BiteCode_dev · on Feb 29, 2020

Just create a new pool and close the previous one, depending on the system load.

crazyguitar · on Feb 29, 2020

I think this is not an excellent solution. You have to wait for all threads to finish. I agree that we should use high-level APIs most of the time, but, in some cases, we still need low-level APIs to reach some goals.

skrtskrt · on Feb 29, 2020

+1 for this, Aio libs make all of this so effortless - aiohttp, aiopg, aiomysql, aiobotocore, I’m about to try aiokafka next.

Just an effortless programming experience.

rhizome31 · on Feb 29, 2020

Also, if you do a lot of concurrent programming, you should consider platforms with lightweight processes (as provided by Erlang/Elixir, among others). I find code based on this paradigm much easier to write and debug than async code and it comes with additional benefits such as error isolation and the ability to parallelize CPU-bound tasks.

leetrout · on Feb 29, 2020

> Obviously, A coroutine is just a term to represent a task that is scheduled by an event-loop in a program instead of operating systems.

This is full of less-than-ideal technical writing like this example.

crazyguitar · on Feb 29, 2020

I understand. I will keep reviewing my content. BTW, if you are available, could you give me some writing tips? Thank you so much.

leetrout · on Feb 29, 2020

https://styleguide.mailchimp.com/voice-and-tone/ (previously voiceandtone.com)

https://mkaz.blog/misc/notes-on-technical-writing/

https://spin.atomicobject.com/2014/09/09/never-use-the-word-...

https://developers.google.com/tech-writing/overview

crazyguitar · on Feb 29, 2020

Awesome! Thank you

tylerl · on Feb 29, 2020

I think the problem they're pointing out isn't the writing, it's the content. Coroutines and event loops are two independent concepts. And scheduling tasks cooperatively in userspace is another concept still. You've got three different concepts: coroutines, event loops, and cooperative multitasking; and you're saying they're the same thing.

crazyguitar · on Feb 29, 2020

Oh! I understand. You're right. Some descriptions mix coroutines and event loops together. thx

sk0g · on Feb 29, 2020

Maybe "for Python" appended to the title might be handy.

crazyguitar · on Feb 29, 2020

I agree.

Myrmornis · on Feb 29, 2020

Here are two good reads on asynchronous programming

http://krondo.com/an-introduction-to-asynchronous-programmin...

https://nullprogram.com/blog/2019/03/10/

crazyguitar · on Feb 29, 2020

Nice. Thanks

dang · on Feb 29, 2020

Please don't put "Show HN" on reading material. It's against the rules (https://news.ycombinator.com/showhn.html) because if it were allowed, everyone would put Show HN on everything.

crazyguitar · on Feb 29, 2020

Oh! sorry! thx

hnews_account_1 · on Feb 29, 2020

This is only marginally related to the article in general, but python's implementation of concurrency and multi threading is fantastic in my experience. It took me literally a full 3 hours to get the basic hang of it, and I went from that to writing embarrassingly parallel code to do very large data operations in a matter of weeks.

Not to sound ignorant, but I had zero idea about semaphores and locking even a month into using their implementation and my code worked perfectly. Big fan of that library since my work involves both querying REST APIs for data and doing computationally intensive operations on it. My cloud system is very low grade but with GIL, what now takes 12 minutes to complete on a good day would've taken literal hours to finish if I'd written it serially.

RossBencina · on Feb 29, 2020

I thought that Python's GIL (Global Interpreter Lock) precluded implementing parallel code in Python. Has something changed recently?

BiteCode_dev · on Feb 29, 2020

Python always could use multiprocessing to do parallel processing leveraging several CPUs.

However, it became especially easy with Python 3.2 (10 years ago) which introduced the ProcessPoolExecutor (https://docs.python.org/dev/library/concurrent.futures.html#...):

    import random
    import time
    from concurrent.futures import ProcessPoolExecutor, as_completed


    def hello():
        seconds = random.randint(0, 5)
        print(f"Start blocking for {seconds}s")
        time.sleep(seconds)
        print(f"Stopped blocking after {seconds}s")
        return seconds

    if __name__ == "__main__":

        with ProcessPoolExecutor(max_workers=2) as executor:

            a = executor.submit(hello)
            b = executor.submit(hello)

            for future in as_completed((a, b)):
                print(future.result())

The exact same API exists for threads BTW. I don't think a tutorial should introduce you to concurrency using manual process/thread management any more. It makes no sense to me.

You may note that multiprocessing still eats more RAM than typical Go/Rust code since max_workers=n means n+1 Python VMs spawning, but on modern servers you don't really feel it. That's what most WSGI setups do anyway.

Now, before 2019, there was one more use case that wasn't served well: how do you share some computation between isolated processes that need to communicate to do their work? The typical use case was people using numpy or pandas crunching numbers that depended on each other. Indeed, communicating between processes using pipes is expensive given the cost of message passing serialization.
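For the record, a minimal sketch of the thread flavour. `hello` takes its delay as an argument here and `run_all` is a hypothetical helper; both are my own tweaks to keep the example deterministic:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def hello(seconds):
    # Blocking call; sleeping releases the GIL, so the threads overlap here.
    time.sleep(seconds)
    return seconds


def run_all(delays):
    results = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(hello, d) for d in delays]
        # as_completed yields each future as soon as it finishes.
        for future in as_completed(futures):
            results.append(future.result())
    return results


if __name__ == "__main__":
    print(run_all([0.2, 0.1]))
```

Only the executor class changes; submit(), as_completed() and result() work the same way.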

However, in the previous Python release (3.8), we introduced a mechanism to share memory for almost free (https://docs.python.org/3/library/multiprocessing.shared_mem...):

    from multiprocessing.managers import SharedMemoryManager

    with SharedMemoryManager() as smm:
        # The manager picks a unique name for the shared block.
        sl = smm.ShareableList(range(2000))

The sl object can then contain int, float, bool, str, bytes and None and its reference can be shared among processes. Each item can be replaced in place, but the list cannot change length. You can also get hold of sl without the reference at hand by passing its name (sl.shm.name) as the name argument of multiprocessing.shared_memory.ShareableList.

There is a raw SharedMemory object for stuff like numpy/pandas array buffers if this is your main concern.
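A minimal sketch of attaching to a shared list by name, using `ShareableList` directly from `multiprocessing.shared_memory` (Python 3.8+). Both handles live in the same process here only to keep it short, which is my simplification; in practice the second handle would be opened in another process:

```python
from multiprocessing.shared_memory import ShareableList


def demo():
    # Create a shared list; the underlying block gets an auto-generated name.
    original = ShareableList([0, 1.5, "text", None])
    try:
        # Attach a second handle by name, as another process could do.
        attached = ShareableList(name=original.shm.name)
        attached[0] = 42        # replace an item in place
        value = original[0]     # the change is visible through the first handle
        attached.shm.close()
        return value
    finally:
        original.shm.close()
        original.shm.unlink()   # free the block once nobody needs it


if __name__ == "__main__":
    print(demo())
```

Without the manager you are responsible for close() and unlink() yourself, hence the try/finally.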

hnews_account_1 · on Feb 29, 2020

Thank you for this. I did not know of this change in implementation in Python 3.8.

Are you a core dev btw? I have a complaint about single threads and processes and how they have absolutely no way to return values back to the main thread except through some shared memory object. Am I too ignorant to understand how big of a challenge it is to do this?

  from threading import Thread

  def func1(*args):
     #something

  def main():
     new_thread, return_val = Thread(target=func1, args=(1, 2))
     new_thread.start()
     new_thread.join()
     print(return_val)

Instead to implement this, I keep having to use like single process pools that already have return mechanisms encoded. All I need is for a way of starting off a thread (or process) and joining it once my main thread / process is done and retrieving its return value for my use (assuming there is one).

BiteCode_dev · on March 1, 2020

Not a core dev.

The clean way to return a value is to pass it to a Queue (https://docs.python.org/fr/3/library/queue.html), and this is what the executor does, but for your use case, it's overkill.
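A rough sketch of the Queue approach, assuming a hypothetical `func1` that adds its two arguments:

```python
import queue
import threading


def func1(a, b, results):
    # Hypothetical worker: put the value on the queue instead of returning it.
    results.put(a + b)


def main():
    results = queue.Queue()
    worker = threading.Thread(target=func1, args=(1, 2, results))
    worker.start()
    worker.join()
    # Does not block: the item was put before join() returned.
    return results.get()


if __name__ == "__main__":
    print(main())
```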

For very simple use cases, you can inherit from Thread and force join() to return the value:

    from threading import Thread

    class SimpleReturningThread(Thread):
        result = None

        def run(self):
            # Same as Thread.run(), except the return value is kept.
            # Relies on private attributes of CPython's Thread.
            try:
                if self._target:
                    self.result = self._target(*self._args, **self._kwargs)
            finally:
                del self._target, self._args, self._kwargs

        def join(self, *args, **kwargs):
            super().join(*args, **kwargs)
            return self.result

Personally, I'd stick to using a pool with only one worker in it. It's not worth the trouble of doing all this work. Wrap it in a few functions if you do the same thing repeatedly and call it a day, it's unlikely to be what most of your code is about anyway.
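Something like this is what I mean by wrapping it. `call_in_thread` is a hypothetical helper name, and this sketch blocks until the result is ready, which matches the start/join/print pattern from the question:

```python
from concurrent.futures import ThreadPoolExecutor


def call_in_thread(func, *args, **kwargs):
    """Run func in a one-worker pool and return its result."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(func, *args, **kwargs)
        # result() waits for completion and re-raises any exception from func.
        return future.result()


if __name__ == "__main__":
    print(call_in_thread(divmod, 7, 2))
```

If you want to do other work before collecting the value, return the future instead and call result() later.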

RossBencina · on March 1, 2020

Thanks for the detailed answer.

Correct me if I'm wrong, but from what you've said I take it that the GIL issue has not been resolved (i.e. the Python interpreter is not thread-safe, Python still can't have a single process, with a single Python interpreter, running multiple CPU-bound threads all doing useful compute in Python, running against a single unified local memory space containing all of the objects, as you would in say C or C++). The only option is to resort to more elaborate schemes involving multiple processes, multiple interpreters, and explicitly managed shared memory windows -- but all of these options are in place now, so there is at least a solution to most common use cases, even if it's not as simple as in other languages with proper support for threads and shared local memory.

BiteCode_dev · on March 1, 2020

There have been many attempts to create a different design without the GIL. None of them made it into CPython.

The whole language assumes it's safe to mutate things around. E.g.: list.append() is assumed to be thread safe.

ndr · on Feb 29, 2020

GIL is a problem only if you're CPU bound. For I/O bound problems Python can get you a long way.

RossBencina · on March 1, 2020

That may be so. I was responding to the claim that "python's implementation of concurrency and multi threading is fantastic". Under such circumstances I'd expect to be able to do multi-threaded compute.

hguant · on Feb 29, 2020

TL;DR - implementing parallel code in Python means using multiprocessing as opposed to multithreading, but it depends on if you're CPU bound or I/O bound.

The GIL prevents more than one thread from executing Python bytecode at a time _per process_. This effectively means you can only have one Python thread per process running at a time, which is frustrating because that kinda defeats the point of threading. However, most python programs aren't CPU bound, but I/O bound. For those programs, using the python threading abstraction is fine; CPUs are fast enough to do context switches while one thread waits for data, etc. For CPU bound tasks, the solution is to use the multiprocessing module; each process spun up has its own interpreter and its own GIL. Things have changed somewhat with Python 3, where the internal scheduling of threads is more robust (the GIL was reworked in 3.2), but the rule of thumb still stays the same.
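A small sketch of the I/O bound case, with `time.sleep` standing in for a blocking network or disk call (an assumption made to keep it self-contained; `fake_io` and `timed_run` are hypothetical names):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Like real blocking I/O, time.sleep releases the GIL while waiting.
    time.sleep(delay)
    return delay


def timed_run(delays):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_io, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run([0.2] * 4)
    print(f"{len(results)} waits of 0.2s took {elapsed:.2f}s in total")
```

The four waits overlap, so the total should be much closer to 0.2s than to 0.8s. Swap the sleep for a tight pure-Python loop and the threads would no longer overlap that way.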

kabacha · on Feb 29, 2020

I've been digging through asyncio for a few weeks now and I actually really didn't like this article.

The thing that finally made asyncio click for me was the simple explanation that coroutines are "pausable functions", plus a few hello world/sleep examples. This article instead goes into servers, threading and all sorts of overly complex and long explanations.

For some people this might be more approachable but I don't see anything "hitchhiker's guide" about this article in particular.
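The kind of hello world I mean looks roughly like this (a sketch; `worker` and the log list are just for illustration):

```python
import asyncio


async def worker(name, log):
    log.append(f"{name} start")
    await asyncio.sleep(0)      # pause here and let other coroutines run
    log.append(f"{name} resume")


async def main():
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['a start', 'b start', 'a resume', 'b resume']
```

Each worker pauses at its await, the other one gets a turn, and then both pick up where they left off.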

clarry · on Feb 29, 2020

All these intros to asynchronous programming fail to address the most interesting (and arguably most important yet also most difficult) case, which is asynchronicity on a modern multi-core server. Instead, threads and event loops are presented as mutually exclusive strategies. Naively using one thread for every connection doesn't scale, and naively using event loops means you're stuck running it all on one core, which doesn't scale either.
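In Python terms, the simplest way to mix the two is to let the event loop hand work to a pool. A minimal sketch with hypothetical names (`blocking_work`, `handle_all`), using a thread pool for brevity; in CPython, CPU-bound pure-Python work would need a ProcessPoolExecutor passed in instead to actually use several cores:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_work(n):
    # Stand-in for work that would otherwise stall the event loop.
    return sum(range(n))


async def handle_all(executor, sizes):
    loop = asyncio.get_running_loop()
    # Each call runs in the pool; the loop stays free to serve other tasks.
    jobs = [loop.run_in_executor(executor, blocking_work, n) for n in sizes]
    return await asyncio.gather(*jobs)


def main(sizes):
    with ThreadPoolExecutor(max_workers=4) as executor:
        return asyncio.run(handle_all(executor, sizes))


if __name__ == "__main__":
    print(main([10, 100]))
```

This is far from a multi-core event loop design, but it shows the two strategies are not mutually exclusive.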

RossBencina · on Feb 29, 2020

I guess that's because they are "intros," and often multi-core event loops are implemented by the language runtime or some threadpool library. This is a good watch:

Dmitry Vyukov — Go scheduler: Implementing language with lightweight concurrency (Oct 14, 2019)

https://www.youtube.com/watch?v=-K11rY57K7k

jojo14 · on Feb 29, 2020

I for one have always thought that keeping things synchronous based on select() is a better intellectual discipline and less error prone. Only in a few cases are you forced to use asynchronous programming. However, that's not why I'm commenting here.

I just want to inform that "yield from" and "@coroutine" are now deprecated. So the article needs a bit of an update:

Note: Support for generator-based coroutines is deprecated and is scheduled for removal in Python 3.10.

References:

https://docs.python.org/3/library/asyncio-task.html#generato...

https://docs.python.org/3/whatsnew/3.7.html

https://bugs.python.org/issue36921

crazyguitar · on Feb 29, 2020

Yeah! I know that @coroutine and yield from are deprecated. This article focuses on how coroutines cooperate with event loops in Python. You remind me that I should add a warning about this. Thanks.

nurettin · on Feb 29, 2020

what if we yield from an async function without the @coroutine decorator? That is also an AsyncGenerator. Or is the scope of deprecation solely concerned with the @coroutine decorator?

crazyguitar · on Feb 29, 2020

I think the `yield from` syntax and `@coroutine` are two separate things. `yield from` means we delegate from one generator to another generator. Inside `async def`, a plain `yield` declares an asynchronous generator function; `yield from` itself is not allowed there (it is a SyntaxError).

However, using `@coroutine` + `yield from` means we transform a generator into a generator-based coroutine. Because a generator is a form of coroutine, in Python 3.4, `@coroutine` turns a function or a future into a generator function. Note that if a function is already a generator function, `@coroutine` does not do anything. According to the documentation, Python recommends using `async def` instead of `@coroutine` to declare a coroutine because `@coroutine` will be removed in Python 3.10.

nurettin · on Feb 29, 2020

we can transform async generators into coroutines (async functions) by creating a new async function that simply starts iterating over the async generators, so every name is overloaded which makes communication kinda hard. I will just assume the happy case of @coroutine getting a downgrade. Don't use it anyway.
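To make the naming concrete, a minimal sketch (`ticker` and `collect` are hypothetical names): an async generator, and a plain coroutine that drains it.

```python
import asyncio


async def ticker(n):
    # `yield` inside `async def` makes this an asynchronous generator.
    for i in range(n):
        await asyncio.sleep(0)
        yield i


async def collect(n):
    # A plain coroutine (no yield of its own) that iterates the async generator.
    return [i async for i in ticker(n)]


if __name__ == "__main__":
    print(asyncio.run(collect(3)))
```

Neither function needs `@coroutine` or `yield from`.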

vips7L · on Feb 29, 2020

I wish more languages took the Go/Zig approach for async/await and didn't introduce colored functions [0] that pollute your whole scope.

[0] http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...

32gbsd · on Feb 29, 2020

> Python introduced a concept, async/await, to help developers write understandable code with high performance

Is it actually better or is it just a new kind of threading/loop? Being understandable is second fiddle.

gigatexal · on Feb 29, 2020

I think C# had it long before python did...

crazyguitar · on Feb 29, 2020

async/await is a better design pattern because you don't have to deal with low-level APIs such as epoll. It is not a new kind of threading/loop. Python provides a user-level scheduler for developers, so they don't need to implement their own scheduler from scratch. In my opinion, the reason an event loop can achieve better performance is that it reduces how often critical sections need to be locked. Also, this pattern can increase the cache hit rate and reduce the frequency of CPU context switches.
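A small sketch of the locking point, with a hypothetical shared counter: because coroutines on one event loop only switch at an `await`, the read-modify-write below needs no lock.

```python
import asyncio


async def bump(counter, times):
    for _ in range(times):
        # No await between the read and the write, so no other coroutine
        # can run in the middle of this update.
        counter["value"] += 1
        await asyncio.sleep(0)   # switches can only happen at an await


async def main(workers=10, times=100):
    counter = {"value": 0}
    await asyncio.gather(*(bump(counter, times) for _ in range(workers)))
    return counter["value"]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This only holds within a single event loop; as soon as threads or an executor touch the same data, locks are needed again.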

signa11 · on Feb 29, 2020

fwiw, Dave Beazley's writings on similar topics (and everything else as well) are excellent.