Python and gevent

Let’s talk about event loops

The easiest way to make your code run faster is to do less. At some point, though, you don’t want to do less. Maybe you want to do more, without it being any slower. Maybe you want to make what you have fast, without cutting out any of the work. What then? In this enlightened age, the answer is easy — parallelize it! Threads are always a good choice, but without careful consideration, it’s easy to create all manner of strange race conditions or out-of-order data access. So today, let’s talk about a different route: event loops.

Event whats?
If you’re not familiar with evented code, it’s a way to parallelize execution, similar to threading or multiprocessing. Unlike threads, though, evented code is typically cooperative: each execution path must voluntarily give up control. Each of these execution units actually runs in serial, and when finished, returns control to the main loop. The parallelization gain comes from cleverly dividing the work so that when a unit makes a blocking call (e.g., a DB call, HTTP request, or disk access), it gives up control, letting the main event loop run other functions while it waits for the call to return. This is perfect for cases where you do a lot of I/O and relatively little work in the evented thread itself.
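To make the cooperative part concrete, here is a toy round-robin loop built on plain Python generators. It is only an illustration of the idea, not how gevent or any real event loop is implemented, and the names `task` and `run_loop` are made up for this sketch.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one step of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntarily hand control back to the loop

def run_loop(tasks):
    """Round-robin loop: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next yield
        except StopIteration:
            continue               # task finished; drop it
        ready.append(current)      # not finished; put it back in the queue

log = []
run_loop([task("a", 2, log), task("b", 2, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Only one task runs at any moment, and the interleaving happens purely because each task chooses to yield.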

In my case, I’m interested in doing a number of heterogeneous, but related, data lookups in response to a web request. We’re running all of this behind our data access layer, which is an evented Thrift server. I have a number of functions (with a common API) I’m interested in running, and a naive implementation would look like this:

Python-Gevent-1
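The original listing was an image and is not reproduced here. As a rough stand-in, a naive serial version might look something like the sketch below. The lookup names, the `make_lookup` helper, and the delays are all hypothetical, and only three functions are shown rather than the seven in the traces.

```python
import time

def make_lookup(name, delay):
    """Build a stand-in lookup that blocks for `delay` seconds (hypothetical)."""
    def lookup(request):
        time.sleep(delay)  # placeholder for a DB call or HTTP request
        return (name, request)
    return lookup

# Hypothetical lookups sharing a common API: lookup(request) -> result
LOOKUPS = [
    make_lookup("users", 0.01),
    make_lookup("orders", 0.01),
    make_lookup("events", 0.01),
]

def run_serial(request, lookups=LOOKUPS):
    """Call each lookup one after another and collect the results."""
    return [lookup(request) for lookup in lookups]

results = run_serial("req-1")
```

The total time here is the sum of the individual delays, since each call has to return before the next one starts.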

What does that do?

If we run this on a machine with TraceView installed, we’ll see the following request structure:

Pretty predictable. We called into each function serially, which is exactly what we said we’d do. We can also look at the raw events TraceView collected, and they tell a similar story:

All together now!

This seems like a good baseline, but let’s see what happens when we parallelize it. Let’s use Python’s gevent, which has two major selling points. First, it implements an event loop based on libevent (releases from gevent 1.0 onward use libev instead), which means we won’t have to worry about actually implementing the event loop. We can just spawn separate greenlet (i.e., non-native) tasks, and gevent will handle all the scheduling. The other big advantage is that gevent knows about and can monkeypatch existing synchronous libraries to cede control when they block. This means that outside of the actual event spawning, we can leave our existing code untouched, and our external calls will just do the right thing.
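As a small illustration of the monkeypatching point, the sketch below patches only the `time` module so that an ordinary `time.sleep` call yields to the event loop. In a real application you would more likely call `monkey.patch_all()` as early as possible; the narrow `patch_time()` is an assumption made here to keep the demo contained, and `blocking_lookup` is a made-up stand-in for a library call.

```python
from gevent import monkey
monkey.patch_time()  # narrow patch for this demo; patch_all() is the usual broad form

import time
import gevent

def blocking_lookup(name):
    time.sleep(0.2)   # looks like a normal blocking call, but now yields to the loop
    return name

start = time.time()
tasks = [gevent.spawn(blocking_lookup, n) for n in ("a", "b", "c")]
gevent.joinall(tasks)
elapsed = time.time() - start
values = [t.value for t in tasks]

# The three sleeps overlap, so elapsed should be near 0.2s rather than 0.6s.
print(values, round(elapsed, 2))
```

Note that `blocking_lookup` itself never mentions gevent, which is the sense in which existing code can stay untouched.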

That just leaves the question of how to break up the work into parallel coroutines. It seems natural to give each of these functions its own, and then have our “main thread” wait for them all to finish. We do this by firing off each function in a separate task and collecting them in a list. We then wait for all of those tasks to finish and collect the results. Easy! Here’s what the same function as above looks like, but evented:

Python-Gevent-2
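This listing was also an image in the original. A sketch of the spawn-and-collect pattern described above might look like the following. As before, the lookups and the `make_lookup` helper are hypothetical, and `gevent.sleep` stands in for a blocking call that has been made cooperative.

```python
import gevent

def make_lookup(name, delay):
    """Build a stand-in lookup that yields while it 'waits' (hypothetical)."""
    def lookup(request):
        gevent.sleep(delay)  # placeholder for a monkeypatched DB or HTTP call
        return (name, request)
    return lookup

# Hypothetical lookups sharing a common API: lookup(request) -> result
LOOKUPS = [
    make_lookup("users", 0.01),
    make_lookup("orders", 0.01),
    make_lookup("events", 0.01),
]

def run_evented(request, lookups=LOOKUPS):
    """Spawn one greenlet per lookup, wait for all, then collect the values."""
    tasks = [gevent.spawn(lookup, request) for lookup in lookups]
    gevent.joinall(tasks)
    return [task.value for task in tasks]

results = run_evented("req-1")
```

Results come back in the order the tasks were spawned, not the order they finished, because the list of tasks preserves that order.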

This calling change is all* that’s necessary to parallelize these functions! The next question is, did that help? Let’s look at the same graphs we had before, but now for the evented case:

Definitely different! Instead of running everything sequentially, we can see all seven functions running at the same time. As we’d hoped, this has a major impact on our total response time, as well. It’s 500ms faster — a speedup of 2x!

*Caveats
OK, so it’s not as perfectly simple as this example. There are a few “gotchas” that are worth bearing in mind when you start to use this in a real example.

The first one is that gevent mimics separate threads for each coroutine. This means that if you’re storing global state in thread-local objects, a newly spawned coroutine won’t see it. Notably, Pylons/Pyramid uses thread-safe object proxies to store global request state, which means that new coroutines will hide that information from you. In our production version of this code, we explicitly pass that state from caller to callee, then set it in the global “pylons.request” object before running the function. It lets us seamlessly mix evented and non-evented functions, while only thinking about the details of gevent in one place.
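The same effect can be reproduced without Pylons. In the sketch below, `request_state` is a `gevent.local.local` object used as a hypothetical stand-in for a thread-local request proxy, and `with_state` is a made-up wrapper showing the "pass it in, then set it" approach. It is not the production code described above.

```python
import gevent
from gevent.local import local

# Greenlet-local storage; a hypothetical stand-in for a proxy like pylons.request
request_state = local()

def read_user():
    """Read the current user from the 'global' state, if this greenlet has one."""
    return getattr(request_state, "user", None)

def with_state(user, func, *args):
    """Install the caller's state in this greenlet, then run func."""
    request_state.user = user
    return func(*args)

request_state.user = "alice"   # set in the calling greenlet

bare = gevent.spawn(read_user)                                   # state not passed
wrapped = gevent.spawn(with_state, request_state.user, read_user)  # state passed in
gevent.joinall([bare, wrapped])

bare_value, wrapped_value = bare.value, wrapped.value
print(bare_value, wrapped_value)  # None alice
```

Keeping the hand-off inside one wrapper means the lookup functions themselves don’t need to know whether they are running evented or not.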

The second big gotcha is error handling. Since these aren’t normal function calls, exceptions don’t propagate to the caller. They must be explicitly checked for on the task and re-raised, if appropriate. This sort of error-case-checking is familiar to any C programmer, but it’s different from the normal Python idiom, so it’s worth thinking about.
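One way to do that check is sketched below, assuming you want the first failure re-raised in the caller. The `collect` helper and the two sample functions are hypothetical. gevent also offers `task.get()`, which re-raises the task’s exception, and `gevent.joinall(tasks, raise_error=True)`.

```python
import gevent

def ok():
    return 42

def broken():
    raise ValueError("lookup failed")

tasks = [gevent.spawn(ok), gevent.spawn(broken)]
gevent.joinall(tasks)   # returns normally even though one task failed

def collect(tasks):
    """Return task values, re-raising the first failure in the caller."""
    results = []
    for task in tasks:
        if not task.successful():
            raise task.exception   # surface the error where the caller can see it
        results.append(task.value)
    return results

try:
    collect(tasks)
except ValueError as exc:
    print("caught:", exc)
```

By default gevent also prints the failed task’s traceback to stderr, so the error isn’t entirely silent, but nothing reaches the calling code unless you check for it.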

Another caveat is that spawning multiple events doesn’t actually get you code-level parallelization. It runs blocking calls in parallel, but you still only get one interpreter thread to run your Python (no magic GIL sidestep here!). If you’re looking to speed up heavy computations or other CPU-intensive work, check out the multiprocessing module. Eventing really shines when the majority of your work is database calls, file access, or other blocking, out-of-process work.
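A quick way to see this is to log the order in which two greenlets do their work. In the sketch below (all names are made up for illustration), a worker that never blocks or yields runs to completion before the next one starts, while a worker that yields explicitly with `gevent.sleep(0)` lets the other one take turns.

```python
import gevent

def worker(name, log, cooperative):
    for i in range(3):
        log.append((name, i))        # stand-in for a chunk of CPU work
        if cooperative:
            gevent.sleep(0)          # explicit yield back to the event loop

def run(cooperative):
    """Run two workers and return the order in which their steps happened."""
    log = []
    tasks = [gevent.spawn(worker, n, log, cooperative) for n in ("a", "b")]
    gevent.joinall(tasks)
    return log

cpu_bound = run(cooperative=False)   # all of "a", then all of "b"
yielding = run(cooperative=True)     # steps from "a" and "b" take turns

print(cpu_bound)
print(yielding)
```

Even in the yielding case only one greenlet executes Python at a time; the yields change the ordering, not the amount of CPU available.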

Finally, if you’re looking to trace these kinds of calls with TraceView (like we did here), it’s pretty straightforward. The only thing to remember is to wrap your evented function calls using “oboe.log_method”, and pass “entry_kvs={'Async': True}”. This ensures that we calculate the timing information properly for all your parallel work.

But that’s it! You can use this technique to speed up existing projects, or build something entirely new with gevent at the core. What are you planning on doing with it?

And, of course, if you want to instrument your evented project now, sign up for a 14-day trial of TraceView today!

More Stories By TR Jordan

A veteran of MIT’s Lincoln Labs, TR is a reformed physicist and full-stack hacker – for some limited definition of full stack. After a few years as Software Development Lead with Thermopylae Science and Technology, he left to join Tracelytics as its first engineer. Following Tracelytics’ merger with AppNeta, TR was tapped to run all of its developer and market evangelism efforts. TR still harbors a not-so-secret love for Matlab-esque graphs and half-baked statistics, as well as elegant and highly-performant code. Read more of his articles at www.appneta.com/blog or visit www.appneta.com.
