Motivation

I need some processing for JSON files on a server. The files are not small, but also not too big (on the order of tens of megabytes).

The tool is not for production use, so the first approach I thought of was something quick and dirty, like PUTting the file into cloud storage and creating a lambda function to perform the operation.

But at this moment those functions only support JavaScript (OK, Node.js). You can set up notifications to build a pipeline using the Google Cloud Pub/Sub service, but that is too much work for the job.

So, it looks like a good opportunity to play with, and get to know a little bit more about, Python's aiohttp library.

Using the official docs

At first sight, something that catches my attention is that to declare a route, the method name encodes the HTTP verb to use:

app = web.Application()
app.router.add_get('/path', handler_func)

but we can also use the more generic method, passing '*' as the HTTP verb:

app.router.add_route('*', '/path', handler_func)
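
For reference, a minimal, self-contained version of the snippets above would look something like this (the handler body is just illustrative):

from aiohttp import web

async def handler_func(request):
    # every handler is a coroutine that receives the request
    # and returns a Response
    return web.Response(text='pong')

app = web.Application()
app.router.add_get('/path', handler_func)

# one simple way to run it directly from Python
web.run_app(app, host='0.0.0.0', port=8080)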

Run the server from the command line:

python -m aiohttp.web -H 0.0.0.0 -P 8080 datafetch.api.server.endpoints:run_service --sqlite=sample.db

We can run the server easily, but a way to 'hot reload' the code would be nice (especially when toying with it). So we have the aiohttp-devtools package, which gives us exactly that.

pip install aiohttp-devtools

but we have to create a special function that returns the created app, and its signature does not allow access to extra arguments:

def create_app(loop):
    app = web.Application()
    app.router.add_get('/', ping)
    app.router.add_post('/update_tile', update_tile)
    return app
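
With that factory in place we can launch the dev server with hot reload (assuming the factory lives in a file called app.py; adapt the path to your project):

adev runserver app.py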

There is also the option of using Gunicorn, with the --reload option.
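
Something along these lines should work, using the Gunicorn worker class that aiohttp ships (the module path and the app attribute name here are just an illustration):

gunicorn datafetch.api.server.endpoints:app --bind 0.0.0.0:8080 --worker-class aiohttp.GunicornWebWorker --reload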

To test the endpoint, we can POST a sample JSON file with curl:

curl -X POST -d @temp/deleteme.json http://localhost:8080/update_tile

Once we have the basic request working, we can directly use the reference to the request content.

Given the size of the JSON we are posting, we can read it all into memory:

async def update_tile(request):
    print('received request')
    # read the whole body into memory (fine for files of a few tens of MB)
    content = await request.content.read()
    content = str(content, 'utf-8')
    # ... process the JSON here ...
    return web.Response(text='ok')
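
As an aside, since we are going to parse the body as JSON anyway, aiohttp can do the decoding for us:

    # inside the handler, instead of reading and decoding manually
    data = await request.json()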

Going the dirty way

For each request we should create a DB session / connection that could be rolled back, like most frameworks do (or, better said, provide a connection from a DB connection pool). However, given that this is only a small tool, I'm going the dirty way: I will share a single connection to the DB and reuse it for each request :/.

We can store "global vars" (yeah, that's ugly) in the app directly, using it like a dictionary:

    app['session'] = create_session()

(Here we should put a reference to a function that creates the session, instead of directly putting the created session.)
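
If we do share a single connection, we can at least close it cleanly when the app shuts down, using aiohttp's cleanup signal. A minimal sketch, assuming create_session and a session object with a close() method come from our DB layer:

async def close_session(app):
    # called by aiohttp when the application is shutting down
    app['session'].close()

app['session'] = create_session()
app.on_cleanup.append(close_session)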

The same can be done for a request: we can hold per-request 'context' using the same dictionary syntax. So the clean way to do it would be:

def setup_app(args):
    db_session_factory = create_db_session_factory_from_args(args)
    app = web.Application()
    app['db_session_factory'] = db_session_factory
    return app

async def handler_func(request):
    db_session_factory = request.app['db_session_factory']
    db_session = db_session_factory()
    # stash the session in the request so downstream code can use it
    request['db_session'] = db_session
    return await call_whatever_function(request)

Actually, the handler function should be more like a decorator that wraps all the other handler functions; that way we would provide DB access per request. For that, aiohttp gives us web middlewares, sketched below.
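
A minimal sketch of such a middleware, using the middleware-factory style and assuming the db_session_factory from the snippet above (plus a session object with a close() method):

async def db_session_middleware(app, handler):
    # wraps every handler so each request gets its own DB session
    async def middleware_handler(request):
        request['db_session'] = app['db_session_factory']()
        try:
            return await handler(request)
        finally:
            request['db_session'].close()
    return middleware_handler

app = web.Application(middlewares=[db_session_middleware])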

Deploying the app

We can run the server from the command line, as shown earlier.
There is official documentation about aiohttp deployment.

Testing in a local environment

Creating the VMs

We could actually use Vagrant to spin up a new VM; however, I have a reference VM that I clone to create new instances. It only requires having the proper public key for the root user to log in through ssh.

This is the script:

export SRCNAME=debian9base
export SRCIP=192.168.56.180
export DSTIP=192.168.56.70

if [ $# -gt 0 ]
then
    export DSTNAME=$1
fi

if [ $# -gt 1 ]
then
    export DSTIP=$2
fi

echo "Creating $DSTNAME from $SRCNAME at $DSTIP"

VBoxManage clonevm $SRCNAME --mode machine --name $DSTNAME --register
VBoxManage startvm --type headless $DSTNAME

sleep 10

ssh -T -l root $SRCIP bash -c "'
echo $DSTNAME > /etc/hostname
sed s/$SRCIP/$DSTIP/ /etc/network/interfaces > tst
cp tst /etc/network/interfaces
rm tst
shutdown -h now
'"

echo "Created VM $1 at $2"

What the script does:

  • clone the reference virtual machine: VBoxManage clonevm $SRCNAME --mode machine --name $DSTNAME --groups /AIOHTTP --register
  • start the new VM: VBoxManage startvm --type headless $DSTNAME
  • since it is a clone of the reference VM, log in and change its hostname and static IP. We could create an Ansible playbook, but that is too much work for this simple 'setup' of our clean install machine
  • to delete a VM later: VBoxManage unregistervm AIOHTTPServer --delete

Then add a friendly name to your local /etc/hosts file:

192.168.56.90   server.aiohttp.lh
192.168.56.91   client.aiohttp.lh

Deploying a client process

We could use a cron job to run a script every X minutes to do the work, but instead of that I prefer to run a process that keeps its own schedule.
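
A sketch of such a long-running client, using aiohttp's client side (the URL, payload and interval are made up; the real job goes where the comment says):

import asyncio

import aiohttp

SERVER_URL = 'http://server.aiohttp.lh:8080/update_tile'  # illustrative

async def client_loop():
    async with aiohttp.ClientSession() as session:
        while True:
            # do whatever job produces the JSON payload here
            payload = {'dummy': 'data'}
            async with session.post(SERVER_URL, json=payload) as resp:
                print('server answered with status', resp.status)
            # the process keeps its own schedule
            await asyncio.sleep(60)

loop = asyncio.get_event_loop()
loop.run_until_complete(client_loop())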

To be sure that it is always running, I'm going to use supervisord.
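
A minimal supervisord program entry for it could look like this (the paths and program name are made up for the example):

[program:aiohttp_client]
command=/usr/bin/python3 /opt/datafetch/client.py
directory=/opt/datafetch
autostart=true
autorestart=true
stdout_logfile=/var/log/aiohttp_client.log
redirect_stderr=true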
