Asynchronous HTTP Requests in Python
Feb 25, 2017
Eli Uriegas
3 minute read

Overview

Asynchronous programming is a new concept for most Python developers (or maybe it’s just me), so utilizing the new asynchronous libraries that are coming out can be difficult, at least from a conceptual point of view.

The library I’ll be highlighting today is aiohttp. If you’re familiar with the popular Python library requests, you can think of aiohttp as the asynchronous version of requests.

Usage is very similar to requests, but the potential performance benefits are, in some cases, absolutely insane.
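
To make that concrete, here’s a minimal comparison (using example.com as a placeholder URL, not part of the NBA example) of fetching a page with requests versus aiohttp. The main difference is that the aiohttp call has to live inside a coroutine and be driven by an event loop.

import asyncio

import aiohttp
import requests

url = 'https://example.com'  # placeholder URL, just for illustration

# requests: a single blocking call
print(requests.get(url).status_code)


# aiohttp: the same request, awaited inside a coroutine
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return resp.status

print(asyncio.get_event_loop().run_until_complete(fetch(url)))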

I’ll be taking you through an example using the NBA’s statistics API (a notoriously slow API) to show you the performance benefits of asynchronous HTTP requests.

Example Explanation

So for our example we will be sending requests to the NBA’s statistics API to gather information about common player statistics like points per game, rebounds per game, etc. We would also like to store this data in local JSON files so that we can do repeated analysis on it without having to re-hit the API.

If you haven’t used the NBA’s statistics API, you should know that it can be extremely slow. Calls can take upwards of 5-6 seconds, and collecting data from the API can be a major pain.

The Non-Asynchronous Way

import requests

base_url = 'http://stats.nba.com/stats'
HEADERS = {
    'user-agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/45.0.2454.101 Safari/537.36'),
}


def get_players(player_args):
    endpoint = '/commonallplayers'
    params = {'leagueid': '00', 'season': '2016-17', 'isonlycurrentseason': '1'}
    url = f'{base_url}{endpoint}'
    print('Getting all players...')
    resp = requests.get(url, headers=HEADERS, params=params)
    data = resp.json()
    # pull the (player_id, player_name) pairs out of the response's rowSet
    player_args.extend(
        [(item[0], item[2]) for item in data['resultSets'][0]['rowSet']])


def get_player(player_id, player_name):
    endpoint = '/commonplayerinfo'
    params = {'playerid': player_id}
    url = f'{base_url}{endpoint}'
    print(f'Getting player {player_name}')
    resp = requests.get(url, headers=HEADERS, params=params)
    print(resp)
    data = resp.text
    with open(f'{player_name.replace(" ", "_")}.json', 'w') as file:
        file.write(data)


player_args = []
get_players(player_args)
for args in player_args:
    get_player(*args)

So what does it do?

The function get_players calls out to the API to gather all of the player IDs along with their names and extends a list in-place to store them (I know, a terrible pattern, but I wanted to keep it close to the async example).
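
If the in-place mutation bothers you, a return-based version is a small change. Here’s a sketch (not part of the original script) of how get_players could simply return the list instead; the rest of the post sticks with the in-place version so the synchronous and asynchronous scripts line up.

def get_players():
    endpoint = '/commonallplayers'
    params = {'leagueid': '00', 'season': '2016-17', 'isonlycurrentseason': '1'}
    resp = requests.get(f'{base_url}{endpoint}', headers=HEADERS, params=params)
    # return the (player_id, player_name) pairs instead of extending a shared list
    return [(item[0], item[2])
            for item in resp.json()['resultSets'][0]['rowSet']]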

After gathering the player IDs and player names, the program synchronously fetches each player’s information and stores it in files with the format FIRSTNAME_LASTNAME.json.

It’s a fairly straightforward program and takes around 12 minutes of total time.
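
If you want to reproduce that number yourself, one simple approach (mine, not part of the original script) is to wrap the main loop at the bottom of the script with time.perf_counter():

import time

start = time.perf_counter()
player_args = []
get_players(player_args)
for args in player_args:
    get_player(*args)
print(f'Took {time.perf_counter() - start:.1f} seconds')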

The Asynchronous Way

import asyncio
import aiofiles
import aiohttp

base_url = 'http://stats.nba.com/stats'
HEADERS = {
    'user-agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/45.0.2454.101 Safari/537.36'),
}

async def get_players(player_args):
    endpoint = '/commonallplayers'
    params = {'leagueid': '00', 'season': '2016-17', 'isonlycurrentseason': '1'}
    url = f'{base_url}{endpoint}'
    print('Getting all players...')
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=HEADERS, params=params) as resp:
            data = await resp.json()
    # pull the (player_id, player_name) pairs out of the response's rowSet
    player_args.extend(
        [(item[0], item[2]) for item in data['resultSets'][0]['rowSet']])

async def get_player(player_id, player_name):
    endpoint = '/commonplayerinfo'
    params = {'playerid': player_id}
    url = f'{base_url}{endpoint}'
    print(f'Getting player {player_name}')
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=HEADERS, params=params) as resp:
            print(resp)
            data = await resp.text()
    async with aiofiles.open(
            f'{player_name.replace(" ", "_")}.json', 'w') as file:
        await file.write(data)

loop = asyncio.get_event_loop()
player_args = []
# gather all of the (player_id, player_name) pairs first...
loop.run_until_complete(get_players(player_args))
# ...then fire off all of the per-player requests concurrently
loop.run_until_complete(
    asyncio.gather(
        *(get_player(*args) for args in player_args)
    )
)

So what does it do?

So this, like the synchronous version, follows pretty much the same steps and provides the same output, but execution time is about 22 seconds, which in my opinion is a ridiculous performance boost.
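
To see where that speed-up comes from, here’s a tiny standalone sketch (nothing to do with the NBA API) of what asyncio.gather is doing: three one-second waits, stand-ins for slow HTTP calls, run concurrently and finish in roughly one second total instead of three.

import asyncio
import time


async def fake_request(i):
    await asyncio.sleep(1)  # stand-in for a slow HTTP call
    return i


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_request(i) for i in range(3)))
    print(results, f'{time.perf_counter() - start:.2f}s')  # roughly 1s, not 3s


asyncio.get_event_loop().run_until_complete(main())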

Conclusion

So 22 seconds versus 12 minutes is a huge difference in performance, and it should be noted that the outcomes of the two programs are exactly the same.

So while asynchronous programming may seem new and mysterious to a lot of developers out there, the performance benefits are very real and tangible right now.

Have any questions? ping me over at @_seemethere