Scaling Web Applications with AWS

Simeon Franklin

Feb 2013

Getting Started

About the Instructor

@simeonfranklin / http://simeonfranklin.com/
Marakana Instructor (Python, Javascript, Django, jQuery)
Long time web application developer
Lots of other languages and technologies (webapps in Perl. ASP. C, PHP, Javascript)
Large web apps for various values of large: terabyte datasets, high traffic, complex data models, large codebases

Class Objectives

In this class we will explore:

common performance limitations encounted in scaling web applications
a particular web application (Python/Flask)
a particular set of solutions (AWS)
emphasizing general concepts/patterns that are broadly applicable

By the end of this class you will:

understand the difference between performance and scaling
understand common bottlenecks in scaling web applications
understand how to talk about and measure capacity limits
understand typical cloud based solutions to scaling problems

Due to the short nature of the class we won’t cover in detail:

HTTP, HTML, CSS/Javascript
Python (the language of implementation for our sample web app)
Details of using AWS services, API’s, and tools

The Plan

I will present:

Lots of theoretical material to cover. Interspersed with hands-on measurements, interactive API
explorations, quiz questions in short essay form.

I hope you will:

Take the time to play with the examples. Try to answer the quiz questions. No grading - the
important thing is being able to reason about the environment and concepts.

Please interrupt! Ask questions.

Resources

Most of the work we’ll be doing will be on remote computers - you’ll need to have an SSH client (Putty on windows, standard openssh client on Mac/Linux) and basic familiarity with a command line environment.

None of the following resources are necessary for the class - but if you’re interested in learning more about AWS:

start at http://aws.amazon.com/documentation/
Boto is a python library to access/use a variety of AWS. You can find out more at https://boto.readthedocs.org/en/latest/
I really appreciate the new unified AWS cli app to control various servies http://aws.amazon.com/cli/

Sample Web Application

Lesson Objectives

review a simple web application written in Python/Flask
understand the way the typical parts of many web applications interact
url routing to pages
HTML generation and serving associated resources
data layer
business logic
deploy our web application to a typical single-server LAMP stack

InstaClone

See http://github.com/simeonf/instaclone for the source

http://instaclone.simeonfranklin.com/

Upload jpeg’s, apply a filter from a dropdown to get a new image.

Let’s try it out.

Now that we’ve explored what our web application does let’s take a look at the major components of this or any other web application.

URL Routing

The process of mapping the path portion of a url to a specific part of our code.

How do we know what to do when the user clicks on a link like /images/1?

leave it up to the web server (eg Apache maps urls to .php files)
delegate paths we care about to our application

webapp.py

@app.route('/') # 
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images
                               order by id desc limit 5""")
    rows = cur.fetchall()
    return render_template('front_page.html', rows=rows, front_page=True)

@app.route is a decorator provided by our framework that ties a particular path to a particular python function.

Pages

Most web applications are at least partially "page" oriented due to the nature of the web.

The page is what I get back when I go to a particular url. A typical page is composed of HTML
containing our application's information and associated UI resources like Javascript, CSS and
media like images or movies.

Asking for a url in our browser generates a cascade of requests.

The python function front_page corresponds to the "/" url. It looks like this:

webapp.py

@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images
                               order by id desc limit 5""")
    rows = cur.fetchall() # 
    return render_template('front_page.html', rows=rows, front_page=True) #

	Fetches dynamic data from the database
	Uses a template to render the data into an HTML page

HTML Generation

Most web applications produce HTML. (Some might produce only structured data like XML or JSON)
Most web application frameworks use templating libraries

Syntax may vary between templating systems but most are essentially static content with placeholder substitution. Templates are code too!

front_page.html

{% extends "base.html" %} 
{% block main %}
<script src="/static/jquery.cycle2.min.js"></script>
<div class="cycle-slideshow" style="width:80%">
  {% for row in rows %} 
    <a href="/image/{{ row.id }}"><img src="{{ row|image_path }}" data-title="{{ row.name }}" alt=""></a>
  {% endfor %}
</div>

<p>Add beautiful effects to your pictures and share them with friends...</p>
{% endblock %}

	template tag that uses inheritance to link child template to parent
	template tag that does repeated substitution and generation

Data Layer

Many web applications use a relational database to store at least some of their data.

This is so common you may see LAMP - Linux/Apache/MySql/P(erl|ython|HP) - assumed to be the default stack upon which to write web applications.

MySQL/PostgreSQL/Oracle, etc Separate program that is solid, mature,
performant, multi-user, comes with Structured Query Language support

Let’s look at our function again:

webapp.py

@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images
                               order by id desc limit 5""") # 
    rows = cur.fetchall() # 
    return render_template('front_page.html', rows=rows, front_page=True) #

	select * from ... is an sql query - code the database executes
	rows will be basically an array of arrays
	and we pass data to the template to render to HTML

Relational databases: oriented around tables, columns, and rows. Succint language for expressing concepts about entities made up of structured data and relationships between the entities.

schema.sql

CREATE TABLE "instaclone_likes" (
       "id" INTEGER PRIMARY KEY AUTOINCREMENT,
       "image" integer REFERENCES "instaclone_images" ("id"),
       "dt" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE "instaclone_images" (
       "id" INTEGER PRIMARY KEY AUTOINCREMENT,
       "parent_id" integer REFERENCES "instaclone_images" ("id"),
       "name" varchar(255) NOT NULL,
       "image" varchar(255) NOT NULL,
       "dt" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Business Logic

All this is peripheral in some ways to our web application.
We do need to generate html, store data, manage urls, etc.
None of this is our application: applying filters to images
uses PIL (Python Imaging Library) to apply transforms to images
Nothing specifically "web" about this

What we do is apply filters to images.

filters.py

def putpalette(image, palette):
    # convert to grayscale
    orig_mode = image.mode
    if orig_mode != "L":
        image = image.convert("L")
    # apply palette - adjusting balance of colors in image
    image.putpalette(palette)
    # convert back to its original mode
    if orig_mode != "L":
        image = image.convert(orig_mode)
    return image

Lab: getting set up on EC2

For this class I’ve already installed the web application on LAMP boxes that are actually virtual machines running on Amazon’s hosted infrastructure.

EC2 - Elastic Compute Cloud - allows us to create virtual computers and run them on Amazon’s infrastructure.

Each student will have their own machine to work on. For convenience all instances have the same security credentials. Please see http://simeonfranklin.com/scalability/ to download the public key and sign up to reserve your machine.

You can now ssh to your box with:

$ ssh -i templatekey.pem ec2-user@ec2-54-241-193-255.us-west-1.compute.amazonaws.com # but use your machine name!

Your machine has already been set up by running the deployment script against it.

Go ahead and explore the web application (upload a jpg, apply a filter to an existing image, etc). Log in to your box via ssh and take a minute to explore the code in /var/www/instaclone.

Sample Web Application Follow-up

In this section you saw a high level overview of some of the common functionality many web applications provide. You also got to explore the sample web application and see some of the code responsible for generating it. We’re now ready to consider the question of scaling our web application.

Bottlenecks in Scaling Web Applications

Lesson Objectives

At the conclusion of this lesson you will:

Understand the difference between the ideas of performance and scalability
Review the areas in which we can bump into limits when scaling our application

Performance/Scalability

Definitions can be complicated.

the ability to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth

Speed.

Web Application Performance

Is the result of many complex factors from network latency to HTML and Javascript structure. It implies questions like:

What is the total amount of time between the start of the request
and application readiness?

How soon after hitting refresh can I use the application?

Web Application Scalability

A scalable web application might be slower than a non-scalable one.

Scalability questions are about capacity. Consider our very simple sample application:

What if I had 10 simultaneous requests? 100? 1000?
What if I had 1,000 images? 100,000? 1 billion?
What if an image had 1,000 likes? What if every image had 10,000 likes.

Possible areas of difficulty

You can figure out some of the limits of our sample application. We’re currently running on an AWS Micro instance with:

8 Gb hard drive space
613 MB memory
High IO latency
Poor sustained CPU performance

We can easily scale vertically thanks to the ability of AWS EC2 to move to more capable virtual hardware. We could get more disk space, more memory, better IO latency and a beefier CPU. At some point, however, we’ll run into limits. Let’s calculate some of those limits on our current virtual hardware.

Quiz

Some of the limits for a particular application and environment are obvious and easy to (roughly) calculate:

If we limit the largest image upload to 1Mb what would be the (approximate) minimum upper bound on the number of images we can host.
If a process running our application takes approximately 40Mb of memory to serve a single request, each process can handle one request at a time, and our OS leaves us about 300Mb of memory, approximately how many simultaneous requests can we serve?
If our maximum network throughput is limited to 10Mb/s and an average response returns approximately 1Mb over 2 seconds, approximately how many simultaneous requests can we serve?
More genererally: How do performance and scalability differ? How are they related? Describe a scenario in which slow performance could cause an application to exceed its scalability limits. (Hint: consider an interaction between latency and maximum number of requests.)

Common Scalability Problems and solutions

Fortunately we don’t have to figure out all the possible bottlenecks and come up with solutions ourselves. Web applications tend to hit the same bottlenecks and the solutions are common enough to describe generally even if the implementation for any given application may be specific.

We’ll look in more detail at the following common areas of that may limit the scalability of our application. In reviewing each area we’ll discuss several strategies for improving the scalability of our application.

database limits
Serving static resources
CPU and memory bound web framework code
CPU/Memory/IO bound application code

Bottlenecks Follow-up

In this section you learned about the difference between performance and scalability. You also saw a high level overview of some of the common limits web applications may run into when trying to scale to larger amounts of data or users.

Scaling the Database Layer

Lesson Objectives

At the conclusion of this lesson you will:

Understand scalability limits common to relational databases
Understand a variety of ways to help relational databases scale
Understand a variety of alternatives with better scaling characteristics that can replace or augment relational databases

What Limits Relational Database Scalability?

The fundamental features of relational database systems.

based on set-style logic.
tables, columns, rows
Structured Query Language

ACID

A: Atomic - transactions suceed completely or fail completely
C: Consistent - transactions bring the database from one consistent state to another
I: Isolation - transactions are isolated so that simultaneous transactions must have the same affect as though executed serially
D: Durability - once a transaction is committed it will remain so even in the event of power loss, system crash, etc

(see http://en.wikipedia.org/wiki/ACID for more details)

> These properties require that writes to the datastore will have to go back onto disk (a durable datastore) even if all the data can fit in memory.

Disk latency is likely to be a limiting factor in both the performance and scalability of databases.

Irrespective of the performance characteristics of individual querie (how long does it take to execute a single query) there are also scalability limitations frequently expressed as a maximum # of read and write transactions/sec.

Example of performance limits on beefy hardware

Source: http://locklessinc.com/articles/mysql_performance/

Additionally different databases may have absolute and relative limits on the size of datasets

absolute file size limits base on OS (FAT32 limited to < 4GB files)
MyISAM storage engine on MySQL has a limit of approximately 4 billion rows/table.
Relational databases maintain indexes (pre-computed lookup structures) and query caches to maintain go performance on reads. Larger datasets may not be able to fit useful caches or complete indexes in memory.
And larger datasets make writing gradually more expensive as database indexes to be maintained also grow larger

How can we overcome these limits?

Typical strategies for scaling the database layer include:

Master/Slave or data replication database configuration

a main database handles all writes
data is replicated (perhaps asyncronously) to subsidiary databases
all reads are directed to the subsidiary databases
scaling to handle additional reads means adding additional read databases

There are drawbacks!

not much use if reads do not outweigh writes
asyncronous data replication is fast but may mean inconsistent reads
syncronous writes usually have scaling/performance limitations
typically much more complicated configuration and management

Database Sharding

partition data between multiple database servers
enforce limits on the number of records per database
scale by adding additional servers
may be manual process or supported by database server

There are drawbacks!

adding database routing to our application may decrease performance and add complexity
managing sharding may become complicated - all related records must live in the same partition or joins are impossible

Maybe we shouldn’t use (just) Relational Databases!

In response to these performance limitations many alternative data stores can be used.

Typically you give up ACID attributes or the ability to reason relationally in exchange for different performance and scaling characteristics.

in-memory only key value stores (memcached)
eventually-durable key-value stores (Redis)
eventually consistent distributed stores (Cassandra)
stores designed for sharding of data and scaling of computation (MongoDB)

And many others generally lumped together as NoSql datastores.

Not necessarily a replacement, but perhaps an augmentation to the relational database as a data store. Typical patterns:

Most data in an RDBMS but some data (typically large/growing non-relational datasets) stored in a distributed keyvalue store instead (eg storing notification messages or logging info).
All data in an RDBMS but (expensively calculated non-critical data like aggregation results) cached in volatile in-memory datastore.

Quiz

Using a cache to store the results of expensively calculated for future lookup sounds like a performance optimisation. Explain how caching long running queries may enahnce the scalability of your application (remember the interplay between latency and throughput).
Login to your EC2 instance. The mysqlslap tool can be used to stress test a MySql database. See http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html for documentation but try:
```
$ mysqlslap --create-schema="instaclone"
            --query="select * from instaclone_images order by dt DESC"
            --concurrency=100 --number-of-queries=10000
```
What does the output tell you about the performance abilities of our micro instances? Can you see any effect when ramping up concurrency?
Consider our sample application with minimal data. What techniques could we use to make sure our application scales? Consider at least two techniques and cite at least one drawback for any change we might make.

Database Follow-up

In this section you learned about some of the specific issues associated with scaling the database layer. You also learned about some techniques commonly used to scale the database layer. While thoroughly understanding all the options available for dynamic data storage is beyond the scope of this class you should at least understand some of the desirable performance and scaling characteristics available in non-relational data stores as well as some of the potential drawbacks.

Scaling Static Resources

Lesson Objectives

In this lesson we will:

review the potential scalability limitations associated with static media
discuss possible approaches to scaling our static resources

Static Resource Scalability Limits

We’ve already discussed the most obvious limit associated with static resources:

dynamic/static resources (eg media dynamically specified)
maximum number of records == Total Storage / Avg size of Media

Seems easy to scale vertically.

Our 8Gb instance might only be able to store ~14,000 images if the average image "weighs" about 1/2Mb and we have most of the disk to allocate to media storage.

But it’s trivial nowadays to get 500Gb or 1Tb of storage. That would take us up to 1-2 million images which seems like a lot.

imgur gets ~24 million new images each month as of late 2012 (see http://en.wikipedia.org/wiki/Imgur)! Even simple storage might be a problem.

Serving Static Resources

We might also run into limitations (including some suprises) in the process of serving static resources.

Assuming a 100Mb/s network connection how many images per second can we serve?

Simple math suggests we could serve 100 1Mb images in a single second, saturating our network connection.

Of course the answer ends up depending on client bandwidth as well as server bandwidth - we can’t stream files faster than the client can download. If clients can support on average 300Kb/s throughput we might be able to allow 300 simultaneous connections that all take 3 seconds to stream a 1Mb image, still arriving at 100 1Mb images in a single second.

However: each transfer requires a separate web server process or thread and each process or thread may take 60Mb of memory… We may not have enough memory (18Gb) to create 300 processes.

Common Solutions

Fortunately there are relatively simple steps we can take to solve the problem of scaling static assets.

separate web servers for dynamic and static content may mean we can afford more of the numerous static http processes (alternative lightweight httpd like nginx, lighttpd, etc)
physically separate web servers (a dedicated media server) help us vertically scale
adding additional media servers (partioning/sharding our media or using SAN) helps us scale horizontally

Drawbacks

There are always tradeoffs! (But I’ll let you figure them out yourself.) And there are performance complexities not yet addressed - latencies based on geographical location, cache management, etc.

Why not let somebody else handle this?

A content delivery network or content distribution network (CDN) is a large distributed system of servers deployed in multiple data centers in the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance.

-- http://en.wikipedia.org/wiki/Content_delivery_network

The most common solution to scaling static media is to outsource it to someone else who specializes in scalable data storage and service. If not a full blown CDN at least an http accessible datastore.

Akamai (CDN)
Amazon CloudFront (CDN)
Amazon S3 (http accessible datastore)

Quiz

Can you describe some potential drawbacks to common approaches to scaling media? What might be difficult about sharding media servers?
Using cloud infracture avoids some drawbacks - but might introduce others. Can you think of any cases where using a remote CDN or network file store might increase our scalability but decrease our performance?

Static Resources Follow-up

In this section you learned about some of the potential bottlenecks associated with scaling static resources - both obvious limitations like total disk space and less straightforward limitations like the potential interactions between network speed and memory usage. You also learned about some of the techniques used to alleviate scaling issues surrounding static resources.

Scaling the "Web" part of our application

Lesson Objectives

Every web application must, at a minimum, talk HTTP. Most applications will also perform routine tasks not directly related to our application’s business logic. Web stuff - like generating html.

In this lesson we will learn:

how to use HTTP to affect both our performance and our scalability.
possible approaches to limiting the number of requests required for our application
possible approaches for improving the performance of individual requests and how to scale these approaches
how to scale the number of concurrent requests we can handle

Web applications are all the same

No really! If it’s web, it supports HTTP!

Understanding HTTP can have serious impacts on both performance and scalability.

Web applications are mostly the same

Most web applications are going to generate HTML to present the application specific information.

This can be a CPU and memory intensive process.

The answer to performance problems is always

reduce requests
add caching
sometimes doing one accomplishes the other!

Caching

Web applications frequently can profit from caching added at the boundaries between layers.

HTTP Cache directives inform client caching

Expires and Cache-Control HTTP headers can tell the client (eg web browser) that the previously downloaded version is still valid.
If supplied, the Last-Modified and ETag http headers allow the client to do a conditional GET on expired resources, passing back to the server information about the version of the content the client has
A typical web application might have dynamic content that cannot be canched but also many resources that do not change or change infrequently

https://developers.google.com/speed/articles/caching

Wait a minute! I thought we were talking about scaling, not performance!

This is a performance optimization - but reducing unnecessary requests means load will increase more slowly as the number of concurrent clients increases.
Additionally we can scale read-heavy applications by putting a cache server in front our web server, using HTTP headers to communicate to the cache server.
In this case many consecutive reads from unrelated clients would result in a single request to the application with additional requests served by the cache server instead.
Further scaling can be achieved by adding additional cache servers behind a load balancer

See https://www.varnish-cache.org/ or http://www.squid-cache.org/

Client/Server key-value cache stores

The most used technique to improve the performance of web application code is to cache the results of expensive operations.

A distributed in-memory hash table server

Key/value oriented with simple operations: set, get, del

Look at all your layers for places where you need finer caching granularity than "the whole page". We could:

cache the results of expensive database queries
cache the results of html generation for a given data set
cache data read from disk
memoization of expensive functions

Wait a minute! I thought we were talking about scaling, not performance!

True, but we may need to scale our performance increases!

Obviously we can only continue to populate the cache while we have available memory in which to store data.

Memcache is a distributed system which can easily be set up to shard its data store across multiple machines. Adding additional "memory" to your cache can be easily accomplished by adding additional memcached servers.

Here at Facebook, we’re likely the world’s largest user of memcached. […] We use more than 800 servers supplying over 28 terabytes of memory to our users.

-- https://www.facebook.com/note.php?note_id=39391378919

Spreading requests across multiple servers

No matter how much caching we add, it is possible to saturate our web server with too many requests.

Source: http://wiki.dreamhost.com/File:Webserver_requests_graph.jpg

At some point we get too many requests per second to handle. The actual numbers aren’t important:

some web servers may be able to service 50,000 r/sec for small static files
but complex applications might only be able to handle a few hundred req/s
you can imagine writing an intensive enough application to max out your CPU at 1 req/s

Sometimes the only possible answer is to add additional servers. This can be done by adding load balancing. This could be implemented with a load balanced setup

Load balancers may be implemented by:

using dns to map a single name (www.example.com) to multiple IP addresses
using a load balancing server (Apache with mod_proxy_balancer, varnish) that relays all requests to a pool of web servers
handling more load means adding more servers

Be careful about your architecture! With multiple web servers local data may differ. But shared resources frequently form a scaling bottleneck - enough so that a frequent design pattern in building scalable web applications is to use a Shared Nothing Architecture.

Quiz

Explain the difference between http caching servers like varnish or squid and key/value servers like memcached. Why can’t we just use http caching servers? What sort of pages can’t be cached?
A frequently used technique to make static resources as cache-able as possible is to use "far-future" expires headers. So for my main style.css file I might make sure a header like
```
Expires: Sat, 1 Feb 2014 23:00:00 GMT
```
is issued. Are there any drawbacks to this policy? When will the client download a new version of my style.css?
How many requests/second does our http server on ec2 support for static files? How about for the front page of our application? Use the command line tool ab on our ec2 instance to test the performance of Apache. Try ab -h for help but
```
$ ab -n 1000 -c 5 http://localhost/
```
is the sort of thing we want to do. Try ramping up concurrency. What effects do you see on static files? How about against our application?

Web Requests Follow-up

In this lesson we reviewed some features of HTTP that allow us to implement scaling someplace other than our web application server. We saw that HTTP caching servers or memcache servers allow us to spread the network traffic or the memory usage across multiple servers. Sometimes, however, we are forced to scale our web application server directly and we reviewed using a load balancer to distribute an application across multiple web servers.

Scaling Our Application

Lesson Objectives

Every non-trivial application will eventually hit a performance bottleneck. The bottleneck might be CPU, IO, or memory availability but the solutions are typically the same. In this lesson we’ll explore the architecture of typical distributed task systems used by web applications.

Finding our limitations

Finding the bottleneck that limits the ability of our application is a matter of applying typical performance analysis techniques.

You may or may not already be familiar with profiling tools for investigating the performance of conventional applications - and teaching the usage of sophisticated profiling tools like Valgrind or Dtrace is definitely beyond the scope of the class.

Ideally you should be able to run the performance-critical parts of your application outside of a "web" context. We can frequently figure out the limiting factor in the performance of our application through simple measurements and some reasoning. Additionally we may be able to solve several different classes of performance bottlenecks by using the same basic techniques.

CPU Bound

It’s possible that our application is CPU-bound. This simply means that the factor limiting our performance is the CPU. Typically applications that are CPU bound do complicated computation on relatively small data sets.

We can roughly test our application performance by running a performance monitoring tool like top while simultaneously running our application continuously. If our application is CPU bound we should be capable of maxing out our CPU utilization.

Note: Look for telltales (like maxing out 1/4 of your total CPU utilization) that hint you could reduce your CPU bottleneck by increasing the concurrency of your application.

Memory Bound

It’s possible that the limiting factor in our performance is the amount of available memory. If our application isn’t maxing out the CPU this implies we should be able to increase performance by increasing concurrency. However if our program consumes all the available memory we are incapable of increasing concurrency.

Note: Of course modern systems don’t actually run out of memory - they can use swap space to emulate memory. Of course disk IO is orders of magnitude slower than memory, so we want to make sure we can stay in RAM. Use top to evaluate your program’s memory usage.

IO Bound

It’s possible that our application is IO bound. In, fact, for scalable web applications (which are likely to depend heavily on networked services) its very likely. It’s also possible to max out disk throughput if your application is reading/writing large data sets or many small

It can can be more difficult to measure our IO performance. One huge clue is the absence of other bottlenecks. If you run your application continously and are not maxing out either available memory or CPU utilization, it’s likely the performance limiting factor is IO of some kind.

Removing limitations

Fortunately, we can solve many different kinds of scaling problems with the same basic architecture.

Most scalable web applications end up using a distributed task queue architecture.

The basic idea is straightforward:

the performance critical portion of our application is wrapped up into a job or task that can run in isolation from our web application
our web server(s) creates jobs in response to our web application and place the jobs on a task queue
clients (frequently called workers) remove jobs from the queue and complete them
the workers could just be additional processes/threads on the same server or could be processes on other servers
scaling our application can be accomplished by adding additional workers

Details, Details

Of course the devil is always in the details. How do we transform code into tasks that can be serialized, shipped to a different environment, and executed? How do servers and workers all communicate with the same queue? What about details like priorities on tasks, success/error notification, canceling jobs, etc?

Fortunately these are solved problems in a variety of languages. Robust task queues have become a standard add-on to many web application frameworks and there’s even a protocol (AMQP) specifically for sending/queuing messages between distributed systems. RabbitMQ is a popular AMQP server and is widely used (see http://www.amqp.org/about/examples) by many tech giants.

RabbitMQ has API implementations in many languages. In the Python world Celery is a popular distributed task queue system that can use RabbitMQ as its message broker. Celery has good support for easily creating tasks from Python functions, running worker daemons, and can use alternative messages stores as necessary.

Obviously this could get a little complicated.

But it is necessary complexity to achieve a simple objective. We are circumventing our CPU, IO, or memory bottleneck by scaling it across multiple machines.

And as we will see - by using cloud based offerings instead of rolling our own we might not have to do all the devops work necessary to build our own cluster of workers.

Quiz

Explain the difference between CPU bound, memory bound, and IO bound problems. Which category is calculating digits of PI likely to be in? How about parsing and manipulating a few multi-gigabyte XML files? What about reading and parsing 1 million separate small XML records stored on the filesystem?

How about our application? What do you guess the bottleneck will be if we apply filters to many pictures? What if the pictures are large? What if we do many requests concurrently?

Let's test out your intuition. The
time command will run a script and report how much time
it took. You can run the `filter.py` file used by our web
application as a command line program. Log into your EC2 instance
and and run

----
$ time /var/www/instaclone/ENV/bin/python filters.py input.jpg output.jpg
----

Where `input.jpg` is an existing file. To help you out let me show
you two tricks I use to bulk produce sample data and to add
concurrency to testing. I put some images in the sample-images
folder and you can use the `xargs` command to make many copies:

----
$ cd sample-images
$ find *jpg | xargs -i{} cp {} 1{}
----

Now we can use the `xargs` command to run repeatedly for each file,
timing the whole run.

----
$ time find sample-images/*jpg | xargs -I{}  /var/www/instaclone/ENV/bin/python filters.py {} {}.del
----

How many images/second can you filter? Does it help to increase
concurrency (try adding a `-P 4` to the `xargs` command)? Use the
`top` command to look at general system activity (you might have to
login in a second window). What is the bottleneck in our
application?

Scaling Our Application Follow-up

In this lesson we’ve reviewed the major kinds of performance bottlenecks we might address via scaling. We also reviewed the typical architecture required to scale performance bottlenecks that are inherent to our application. And in a future lesson we’ll see that taking advantage of a distributed task queue may not be quite as complicated as it first appears!

← →