Scaling Web Applications with AWS

Simeon Franklin

Feb 2013

Getting Started

   About the Instructor

   Class Objectives

In this class we will explore:

   Class Objectives

By the end of this class you will:

   Class Objectives

Due to the short nature of the class we won’t cover in detail:

   The Plan

I will present:

Lots of theoretical material to cover. Interspersed with hands-on measurements, interactive API
explorations, quiz questions in short essay form.

I hope you will:

Take the time to play with the examples. Try to answer the quiz questions. No grading - the
important thing is being able to reason about the environment and concepts.
Please interrupt! Ask questions.

   Resources

Most of the work we’ll be doing will be on remote computers - you’ll need to have an SSH client (Putty on windows, standard openssh client on Mac/Linux) and basic familiarity with a command line environment.

None of the following resources are necessary for the class - but if you’re interested in learning more about AWS:

Sample Web Application

   Lesson Objectives

   InstaClone

See http://github.com/simeonf/instaclone for the source

http://instaclone.simeonfranklin.com/

   InstaClone

Simple Web App

   InstaClone

Upload jpeg’s, apply a filter from a dropdown to get a new image.

images/upload.jpg
images/filter.jpg

Let’s try it out.

   InstaClone

Now that we’ve explored what our web application does let’s take a look at the major components of this or any other web application.

      URL Routing

Note The process of mapping the path portion of a url to a specific part of our code.

How do we know what to do when the user clicks on a link like /images/1?

      URL Routing

      URL Routing

webapp.py
@app.route('/') # 1
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images
                               order by id desc limit 5""")
    rows = cur.fetchall()
    return render_template('front_page.html', rows=rows, front_page=True)
1 @app.route is a decorator provided by our framework that ties a particular path to a particular python function.

      Pages

Most web applications are at least partially "page" oriented due to the nature of the web.

The page is what I get back when I go to a particular url. A typical page is composed of HTML
containing our application's information and associated UI resources like Javascript, CSS and
media like images or movies.

      Pages

Asking for a url in our browser generates a cascade of requests.

images/firebug.jpg

      Pages

The python function front_page corresponds to the "/" url. It looks like this:

webapp.py
@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images
                               order by id desc limit 5""")
    rows = cur.fetchall() # 1
    return render_template('front_page.html', rows=rows, front_page=True) # 2
1 Fetches dynamic data from the database
2 Uses a template to render the data into an HTML page

      HTML Generation

Note Syntax may vary between templating systems but most are essentially static content with placeholder substitution. Templates are code too!

      HTML Generation

front_page.html
{% extends "base.html" %} 1
{% block main %}
<script src="/static/jquery.cycle2.min.js"></script>
<div class="cycle-slideshow" style="width:80%">
  {% for row in rows %} 2
    <a href="/image/{{ row.id }}"><img src="{{ row|image_path }}" data-title="{{ row.name }}" alt=""></a>
  {% endfor %}
</div>

<p>Add beautiful effects to your pictures and share them with friends...</p>
{% endblock %}
1 template tag that uses inheritance to link child template to parent
2 template tag that does repeated substitution and generation

      Data Layer

Many web applications use a relational database to store at least some of their data.

Note This is so common you may see LAMP - Linux/Apache/MySql/P(erl|ython|HP) - assumed to be the default stack upon which to write web applications.

      Data Layer

Let’s look at our function again:

webapp.py
@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images
                               order by id desc limit 5""") # 1
    rows = cur.fetchall() # 2
    return render_template('front_page.html', rows=rows, front_page=True) # 3
1 select * from ... is an sql query - code the database executes
2 rows will be basically an array of arrays
3 and we pass data to the template to render to HTML

      Data Layer

Relational databases: oriented around tables, columns, and rows. Succint language for expressing concepts about entities made up of structured data and relationships between the entities.

schema.sql
CREATE TABLE "instaclone_likes" (
       "id" INTEGER PRIMARY KEY AUTOINCREMENT,
       "image" integer REFERENCES "instaclone_images" ("id"),
       "dt" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE "instaclone_images" (
       "id" INTEGER PRIMARY KEY AUTOINCREMENT,
       "parent_id" integer REFERENCES "instaclone_images" ("id"),
       "name" varchar(255) NOT NULL,
       "image" varchar(255) NOT NULL,
       "dt" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

      Business Logic

      Business Logic

What we do is apply filters to images.

filters.py
def putpalette(image, palette):
    # convert to grayscale
    orig_mode = image.mode
    if orig_mode != "L":
        image = image.convert("L")
    # apply palette - adjusting balance of colors in image
    image.putpalette(palette)
    # convert back to its original mode
    if orig_mode != "L":
        image = image.convert(orig_mode)
    return image

   Lab: getting set up on EC2

For this class I’ve already installed the web application on LAMP boxes that are actually virtual machines running on Amazon’s hosted infrastructure.

EC2 - Elastic Compute Cloud - allows us to create virtual computers and run them on Amazon’s infrastructure.

Each student will have their own machine to work on. For convenience all instances have the same security credentials. Please see http://simeonfranklin.com/scalability/ to download the public key and sign up to reserve your machine.

You can now ssh to your box with:

$ ssh -i templatekey.pem ec2-user@ec2-54-241-193-255.us-west-1.compute.amazonaws.com # but use your machine name!

Your machine has already been set up by running the deployment script against it.

Go ahead and explore the web application (upload a jpg, apply a filter to an existing image, etc). Log in to your box via ssh and take a minute to explore the code in /var/www/instaclone.

   Sample Web Application Follow-up

In this section you saw a high level overview of some of the common functionality many web applications provide. You also got to explore the sample web application and see some of the code responsible for generating it. We’re now ready to consider the question of scaling our web application.

Bottlenecks in Scaling Web Applications

   Lesson Objectives

At the conclusion of this lesson you will:

   Performance/Scalability

Definitions can be complicated.

   Performance/Scalability

the ability to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth

   Performance/Scalability

Speed.

      Web Application Performance

Is the result of many complex factors from network latency to HTML and Javascript structure. It implies questions like:

What is the total amount of time between the start of the request
and application readiness?
How soon after hitting refresh can I use the application?

      Web Application Scalability

A scalable web application might be slower than a non-scalable one.

Scalability questions are about capacity. Consider our very simple sample application:

   Possible areas of difficulty

You can figure out some of the limits of our sample application. We’re currently running on an AWS Micro instance with:

   Possible areas of difficulty

We can easily scale vertically thanks to the ability of AWS EC2 to move to more capable virtual hardware. We could get more disk space, more memory, better IO latency and a beefier CPU. At some point, however, we’ll run into limits. Let’s calculate some of those limits on our current virtual hardware.

      Quiz

Some of the limits for a particular application and environment are obvious and easy to (roughly) calculate:

  1. If we limit the largest image upload to 1Mb what would be the (approximate) minimum upper bound on the number of images we can host.

  2. If a process running our application takes approximately 40Mb of memory to serve a single request, each process can handle one request at a time, and our OS leaves us about 300Mb of memory, approximately how many simultaneous requests can we serve?

  3. If our maximum network throughput is limited to 10Mb/s and an average response returns approximately 1Mb over 2 seconds, approximately how many simultaneous requests can we serve?

  4. More genererally: How do performance and scalability differ? How are they related? Describe a scenario in which slow performance could cause an application to exceed its scalability limits. (Hint: consider an interaction between latency and maximum number of requests.)

      Common Scalability Problems and solutions

Fortunately we don’t have to figure out all the possible bottlenecks and come up with solutions ourselves. Web applications tend to hit the same bottlenecks and the solutions are common enough to describe generally even if the implementation for any given application may be specific.

We’ll look in more detail at the following common areas of that may limit the scalability of our application. In reviewing each area we’ll discuss several strategies for improving the scalability of our application.

  1. database limits

  2. Serving static resources

  3. CPU and memory bound web framework code

  4. CPU/Memory/IO bound application code

   Bottlenecks Follow-up

In this section you learned about the difference between performance and scalability. You also saw a high level overview of some of the common limits web applications may run into when trying to scale to larger amounts of data or users.

Scaling the Database Layer

   Lesson Objectives

At the conclusion of this lesson you will:

   What Limits Relational Database Scalability?

The fundamental features of relational database systems.

   What Limits Relational Database Scalability?

ACID

   What Limits Relational Database Scalability?

A

Atomic - transactions suceed completely or fail completely

C

Consistent - transactions bring the database from one consistent state to another

I

Isolation - transactions are isolated so that simultaneous transactions must have the same affect as though executed serially

D

Durability - once a transaction is committed it will remain so even in the event of power loss, system crash, etc

(see http://en.wikipedia.org/wiki/ACID for more details)

   What Limits Relational Database Scalability?

> These properties require that writes to the datastore will have to go back onto disk (a durable datastore) even if all the data can fit in memory.

Disk latency is likely to be a limiting factor in both the performance and scalability of databases.

Irrespective of the performance characteristics of individual querie (how long does it take to execute a single query) there are also scalability limitations frequently expressed as a maximum # of read and write transactions/sec.

   What Limits Relational Database Scalability?

Example of performance limits on beefy hardware

Source: http://locklessinc.com/articles/mysql_performance/

   What Limits Relational Database Scalability?

Additionally different databases may have absolute and relative limits on the size of datasets

   How can we overcome these limits?

Typical strategies for scaling the database layer include:

   How can we overcome these limits?

Master/Slave or data replication database configuration

   How can we overcome these limits?

There are drawbacks!

   How can we overcome these limits?

Database Sharding

   How can we overcome these limits?

There are drawbacks!

   Maybe we shouldn’t use (just) Relational Databases!

In response to these performance limitations many alternative data stores can be used.

Typically you give up ACID attributes or the ability to reason relationally in exchange for different performance and scaling characteristics.

   Maybe we shouldn’t use (just) Relational Databases!

And many others generally lumped together as NoSql datastores.

Not necessarily a replacement, but perhaps an augmentation to the relational database as a data store. Typical patterns:

   Quiz

  1. Using a cache to store the results of expensively calculated for future lookup sounds like a performance optimisation. Explain how caching long running queries may enahnce the scalability of your application (remember the interplay between latency and throughput).

  2. Login to your EC2 instance. The mysqlslap tool can be used to stress test a MySql database. See http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html for documentation but try:

    $ mysqlslap --create-schema="instaclone"
                --query="select * from instaclone_images order by dt DESC"
                --concurrency=100 --number-of-queries=10000

    What does the output tell you about the performance abilities of our micro instances? Can you see any effect when ramping up concurrency?

  3. Consider our sample application with minimal data. What techniques could we use to make sure our application scales? Consider at least two techniques and cite at least one drawback for any change we might make.

   Database Follow-up

In this section you learned about some of the specific issues associated with scaling the database layer. You also learned about some techniques commonly used to scale the database layer. While thoroughly understanding all the options available for dynamic data storage is beyond the scope of this class you should at least understand some of the desirable performance and scaling characteristics available in non-relational data stores as well as some of the potential drawbacks.

Scaling Static Resources

   Lesson Objectives

In this lesson we will:

   Static Resource Scalability Limits

We’ve already discussed the most obvious limit associated with static resources:

   Static Resource Scalability Limits

Seems easy to scale vertically.

Our 8Gb instance might only be able to store ~14,000 images if the average image "weighs" about 1/2Mb and we have most of the disk to allocate to media storage.

But it’s trivial nowadays to get 500Gb or 1Tb of storage. That would take us up to 1-2 million images which seems like a lot.

imgur gets ~24 million new images each month as of late 2012 (see http://en.wikipedia.org/wiki/Imgur)! Even simple storage might be a problem.

      Serving Static Resources

We might also run into limitations (including some suprises) in the process of serving static resources.

Assuming a 100Mb/s network connection how many images per second can we serve?

Simple math suggests we could serve 100 1Mb images in a single second, saturating our network connection.

      Serving Static Resources

Of course the answer ends up depending on client bandwidth as well as server bandwidth - we can’t stream files faster than the client can download. If clients can support on average 300Kb/s throughput we might be able to allow 300 simultaneous connections that all take 3 seconds to stream a 1Mb image, still arriving at 100 1Mb images in a single second.

      Serving Static Resources

However: each transfer requires a separate web server process or thread and each process or thread may take 60Mb of memory… We may not have enough memory (18Gb) to create 300 processes.

   Common Solutions

Fortunately there are relatively simple steps we can take to solve the problem of scaling static assets.

      Drawbacks

There are always tradeoffs! (But I’ll let you figure them out yourself.) And there are performance complexities not yet addressed - latencies based on geographical location, cache management, etc.

Why not let somebody else handle this?

      Drawbacks

A content delivery network or content distribution network (CDN) is a large distributed system of servers deployed in multiple data centers in the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance.

-- http://en.wikipedia.org/wiki/Content_delivery_network

      Drawbacks

The most common solution to scaling static media is to outsource it to someone else who specializes in scalable data storage and service. If not a full blown CDN at least an http accessible datastore.

   Quiz

  1. Can you describe some potential drawbacks to common approaches to scaling media? What might be difficult about sharding media servers?

  2. Using cloud infracture avoids some drawbacks - but might introduce others. Can you think of any cases where using a remote CDN or network file store might increase our scalability but decrease our performance?

   Static Resources Follow-up

In this section you learned about some of the potential bottlenecks associated with scaling static resources - both obvious limitations like total disk space and less straightforward limitations like the potential interactions between network speed and memory usage. You also learned about some of the techniques used to alleviate scaling issues surrounding static resources.

Scaling the "Web" part of our application

   Lesson Objectives

Every web application must, at a minimum, talk HTTP. Most applications will also perform routine tasks not directly related to our application’s business logic. Web stuff - like generating html.

In this lesson we will learn:

   Web applications are all the same

No really! If it’s web, it supports HTTP!

Understanding HTTP can have serious impacts on both performance and scalability.

   Web applications are mostly the same

Most web applications are going to generate HTML to present the application specific information.

This can be a CPU and memory intensive process.

   Web applications are mostly the same

The answer to performance problems is always

   Caching

Web applications frequently can profit from caching added at the boundaries between layers.

      HTTP Cache directives inform client caching

https://developers.google.com/speed/articles/caching

      HTTP Cache directives inform client caching

Wait a minute! I thought we were talking about scaling, not performance!

See https://www.varnish-cache.org/ or http://www.squid-cache.org/

   Client/Server key-value cache stores

The most used technique to improve the performance of web application code is to cache the results of expensive operations.

A distributed in-memory hash table server

Key/value oriented with simple operations: set, get, del

   Client/Server key-value cache stores

Look at all your layers for places where you need finer caching granularity than "the whole page". We could:

   Client/Server key-value cache stores

Wait a minute! I thought we were talking about scaling, not performance!

True, but we may need to scale our performance increases!

Obviously we can only continue to populate the cache while we have available memory in which to store data.

Memcache is a distributed system which can easily be set up to shard its data store across multiple machines. Adding additional "memory" to your cache can be easily accomplished by adding additional memcached servers.

   Client/Server key-value cache stores

Here at Facebook, we’re likely the world’s largest user of memcached. […] We use more than 800 servers supplying over 28 terabytes of memory to our users.

-- https://www.facebook.com/note.php?note_id=39391378919

   Spreading requests across multiple servers

No matter how much caching we add, it is possible to saturate our web server with too many requests.

images/requests.jpg

Source: http://wiki.dreamhost.com/File:Webserver_requests_graph.jpg

   Spreading requests across multiple servers

At some point we get too many requests per second to handle. The actual numbers aren’t important:

   Spreading requests across multiple servers

Sometimes the only possible answer is to add additional servers. This can be done by adding load balancing. This could be implemented with a load balanced setup

images/load_balanced.png

   Spreading requests across multiple servers

Load balancers may be implemented by:

Be careful about your architecture! With multiple web servers local data may differ. But shared resources frequently form a scaling bottleneck - enough so that a frequent design pattern in building scalable web applications is to use a Shared Nothing Architecture.

   Quiz

  1. Explain the difference between http caching servers like varnish or squid and key/value servers like memcached. Why can’t we just use http caching servers? What sort of pages can’t be cached?

  2. A frequently used technique to make static resources as cache-able as possible is to use "far-future" expires headers. So for my main style.css file I might make sure a header like

    Expires: Sat, 1 Feb 2014 23:00:00 GMT

    is issued. Are there any drawbacks to this policy? When will the client download a new version of my style.css?

  3. How many requests/second does our http server on ec2 support for static files? How about for the front page of our application? Use the command line tool ab on our ec2 instance to test the performance of Apache. Try ab -h for help but

    $ ab -n 1000 -c 5 http://localhost/

    is the sort of thing we want to do. Try ramping up concurrency. What effects do you see on static files? How about against our application?

   Web Requests Follow-up

In this lesson we reviewed some features of HTTP that allow us to implement scaling someplace other than our web application server. We saw that HTTP caching servers or memcache servers allow us to spread the network traffic or the memory usage across multiple servers. Sometimes, however, we are forced to scale our web application server directly and we reviewed using a load balancer to distribute an application across multiple web servers.

Scaling Our Application

   Lesson Objectives

Every non-trivial application will eventually hit a performance bottleneck. The bottleneck might be CPU, IO, or memory availability but the solutions are typically the same. In this lesson we’ll explore the architecture of typical distributed task systems used by web applications.

   Finding our limitations

Finding the bottleneck that limits the ability of our application is a matter of applying typical performance analysis techniques.

You may or may not already be familiar with profiling tools for investigating the performance of conventional applications - and teaching the usage of sophisticated profiling tools like Valgrind or Dtrace is definitely beyond the scope of the class.

Ideally you should be able to run the performance-critical parts of your application outside of a "web" context. We can frequently figure out the limiting factor in the performance of our application through simple measurements and some reasoning. Additionally we may be able to solve several different classes of performance bottlenecks by using the same basic techniques.

      CPU Bound

It’s possible that our application is CPU-bound. This simply means that the factor limiting our performance is the CPU. Typically applications that are CPU bound do complicated computation on relatively small data sets.

We can roughly test our application performance by running a performance monitoring tool like top while simultaneously running our application continuously. If our application is CPU bound we should be capable of maxing out our CPU utilization.

Note: Look for telltales (like maxing out 1/4 of your total CPU utilization) that hint you could reduce your CPU bottleneck by increasing the concurrency of your application.

      Memory Bound

It’s possible that the limiting factor in our performance is the amount of available memory. If our application isn’t maxing out the CPU this implies we should be able to increase performance by increasing concurrency. However if our program consumes all the available memory we are incapable of increasing concurrency.

Note: Of course modern systems don’t actually run out of memory - they can use swap space to emulate memory. Of course disk IO is orders of magnitude slower than memory, so we want to make sure we can stay in RAM. Use top to evaluate your program’s memory usage.

      IO Bound

It’s possible that our application is IO bound. In, fact, for scalable web applications (which are likely to depend heavily on networked services) its very likely. It’s also possible to max out disk throughput if your application is reading/writing large data sets or many small

It can can be more difficult to measure our IO performance. One huge clue is the absence of other bottlenecks. If you run your application continously and are not maxing out either available memory or CPU utilization, it’s likely the performance limiting factor is IO of some kind.

   Removing limitations

Fortunately, we can solve many different kinds of scaling problems with the same basic architecture.

Most scalable web applications end up using a distributed task queue architecture.

The basic idea is straightforward:

      Details, Details

Of course the devil is always in the details. How do we transform code into tasks that can be serialized, shipped to a different environment, and executed? How do servers and workers all communicate with the same queue? What about details like priorities on tasks, success/error notification, canceling jobs, etc?

Fortunately these are solved problems in a variety of languages. Robust task queues have become a standard add-on to many web application frameworks and there’s even a protocol (AMQP) specifically for sending/queuing messages between distributed systems. RabbitMQ is a popular AMQP server and is widely used (see http://www.amqp.org/about/examples) by many tech giants.

RabbitMQ has API implementations in many languages. In the Python world Celery is a popular distributed task queue system that can use RabbitMQ as its message broker. Celery has good support for easily creating tasks from Python functions, running worker daemons, and can use alternative messages stores as necessary.

      Details, Details

Obviously this could get a little complicated.

Sample Celery Architecture

      Details, Details

But it is necessary complexity to achieve a simple objective. We are circumventing our CPU, IO, or memory bottleneck by scaling it across multiple machines.

And as we will see - by using cloud based offerings instead of rolling our own we might not have to do all the devops work necessary to build our own cluster of workers.

   Quiz

  1. Explain the difference between CPU bound, memory bound, and IO bound problems. Which category is calculating digits of PI likely to be in? How about parsing and manipulating a few multi-gigabyte XML files? What about reading and parsing 1 million separate small XML records stored on the filesystem?

  2. How about our application? What do you guess the bottleneck will be if we apply filters to many pictures? What if the pictures are large? What if we do many requests concurrently?

    Let's test out your intuition. The
    time command will run a script and report how much time
    it took. You can run the `filter.py` file used by our web
    application as a command line program. Log into your EC2 instance
    and and run
    ----
    $ time /var/www/instaclone/ENV/bin/python filters.py input.jpg output.jpg
    ----
    Where `input.jpg` is an existing file. To help you out let me show
    you two tricks I use to bulk produce sample data and to add
    concurrency to testing. I put some images in the sample-images
    folder and you can use the `xargs` command to make many copies:
    ----
    $ cd sample-images
    $ find *jpg | xargs -i{} cp {} 1{}
    ----
    Now we can use the `xargs` command to run repeatedly for each file,
    timing the whole run.
    ----
    $ time find sample-images/*jpg | xargs -I{}  /var/www/instaclone/ENV/bin/python filters.py {} {}.del
    ----
    How many images/second can you filter? Does it help to increase
    concurrency (try adding a `-P 4` to the `xargs` command)? Use the
    `top` command to look at general system activity (you might have to
    login in a second window). What is the bottleneck in our
    application?

   Scaling Our Application Follow-up

In this lesson we’ve reviewed the major kinds of performance bottlenecks we might address via scaling. We also reviewed the typical architecture required to scale performance bottlenecks that are inherent to our application. And in a future lesson we’ll see that taking advantage of a distributed task queue may not be quite as complicated as it first appears!

1 / 188

#