Lots of theoretical material to cover. Interspersed with hands-on measurements, interactive API explorations, quiz questions in short essay form.
In this class we will explore:
By the end of this class you will:
Due to the short nature of the class we won’t cover in detail:
I will present:
I hope you will:
Take the time to play with the examples. Try to answer the quiz questions. No grading - the important thing is being able to reason about the environment and concepts.
Please interrupt! Ask questions.
Most of the work we’ll be doing will be on remote computers - you’ll need to have an SSH client (PuTTY on Windows, the standard OpenSSH client on Mac/Linux) and basic familiarity with a command line environment.
None of the following resources are necessary for the class - but if you’re interested in learning more about AWS:
See http://github.com/simeonf/instaclone for the source
Upload JPEGs, then apply a filter from a dropdown to get a new image.
Let’s try it out.
Now that we’ve explored what our web application does let’s take a look at the major components of this or any other web application.
The process of mapping the path portion of a URL to a specific part of our code.
How do we know what to do when the user clicks on a link like /images/1?
----
@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images order by id desc limit 5""")
    rows = cur.fetchall()
    return render_template('front_page.html', rows=rows, front_page=True)
----
@app.route is a decorator provided by our framework that ties a particular path to a particular python function.
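Routing also handles dynamic paths like /images/1, where part of the URL becomes an argument to the function. Here is a toy dispatcher sketching the idea - this is an illustration of the mechanism, not Flask’s actual implementation:

```python
import re

# A toy route table: compiled pattern -> handler function.
routes = []

def route(pattern):
    """Register a handler for a path pattern, like Flask's @app.route."""
    def decorator(func):
        routes.append((re.compile(pattern + "$"), func))
        return func
    return decorator

@route(r"/")
def front_page():
    return "front page"

@route(r"/image/(?P<image_id>\d+)")
def image_detail(image_id):
    return "image %s" % image_id

def dispatch(path):
    """Find the first pattern matching the path and call its handler."""
    for regex, handler in routes:
        match = regex.match(path)
        if match:
            return handler(**match.groupdict())
    return "404 Not Found"

print(dispatch("/image/1"))  # the named group becomes a keyword argument
```

Real frameworks add niceties (type converters, reverse URL generation), but at bottom routing is exactly this: match the path, call the function.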
Most web applications are at least partially "page" oriented due to the nature of the web.
The page is what I get back when I go to a particular url. A typical page is composed of HTML containing our application's information and associated UI resources like Javascript, CSS and media like images or movies.
Asking for a url in our browser generates a cascade of requests.
The python function front_page corresponds to the "/" url. It looks like this:
----
@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images order by id desc limit 5""")
    rows = cur.fetchall()
    return render_template('front_page.html', rows=rows, front_page=True)
----
Fetches dynamic data from the database
Uses a template to render the data into an HTML page
Syntax may vary between templating systems but most are essentially static content with placeholder substitution. Templates are code too!
----
{% extends "base.html" %}
{% block main %}
<script src="/static/jquery.cycle2.min.js"></script>
<div class="cycle-slideshow" style="width:80%">
  {% for row in rows %}
  <a href="/image/{{ row.id }}"><img src="{{ row|image_path }}" data-title="{{ row.name }}" alt=""></a>
  {% endfor %}
</div>
<p>Add beautiful effects to your pictures and share them with friends...</p>
{% endblock %}
----
template tag that uses inheritance to link child template to parent
template tag that does repeated substitution and generation
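Jinja syntax aside, the core mechanism - static text with placeholders filled in from data - can be sketched with nothing but the standard library. This is an illustration of the concept, not how Jinja2 actually works:

```python
from string import Template

# A "template": static HTML with $-placeholders instead of {{ ... }}.
row_template = Template('<a href="/image/$id"><img src="$path" data-title="$name"></a>')

# Rows as they might come back from the database (made-up sample data).
rows = [
    {"id": 1, "path": "/media/1.jpg", "name": "Sunset"},
    {"id": 2, "path": "/media/2.jpg", "name": "Beach"},
]

# The {% for %} tag amounts to repeating the substitution for every row.
html = "\n".join(row_template.substitute(row) for row in rows)
print(html)
```

A real templating engine adds inheritance, filters, and escaping, but the heart of it is this substitution loop.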
Many web applications use a relational database to store at least some of their data.
This is so common you may see LAMP - Linux/Apache/MySql/P(erl|ython|HP) - assumed to be the default stack upon which to write web applications.
Let’s look at our function again:
----
@app.route('/')
def front_page():
    db = util.get_db()
    cur = db.execute("""select * from instaclone_images order by id desc limit 5""")
    rows = cur.fetchall()
    return render_template('front_page.html', rows=rows, front_page=True)
----
select * from ... is an SQL query - code the database executes
rows will be basically an array of arrays
and we pass data to the template to render to HTML
Relational databases: oriented around tables, columns, and rows. A succinct language for expressing concepts about entities made up of structured data and relationships between the entities.
----
CREATE TABLE "instaclone_likes" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "image" integer REFERENCES "instaclone_images" ("id"),
    "dt" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE "instaclone_images" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "parent_id" integer REFERENCES "instaclone_images" ("id"),
    "name" varchar(255) NOT NULL,
    "image" varchar(255) NOT NULL,
    "dt" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
----
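The schema above is SQLite syntax, so we can load it directly with Python’s sqlite3 module and run a relational query against it - the sample rows here are made up for illustration:

```python
import sqlite3

schema = """
CREATE TABLE instaclone_images (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id integer REFERENCES instaclone_images (id),
    name varchar(255) NOT NULL,
    image varchar(255) NOT NULL,
    dt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE instaclone_likes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image integer REFERENCES instaclone_images (id),
    dt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""

db = sqlite3.connect(":memory:")
db.executescript(schema)
db.execute("insert into instaclone_images (name, image) values (?, ?)", ("Sunset", "1.jpg"))
db.execute("insert into instaclone_likes (image) values (1)")
db.execute("insert into instaclone_likes (image) values (1)")

# A join expresses the relationship between the two tables concisely.
cur = db.execute("""select i.name, count(l.id)
                    from instaclone_images i
                    join instaclone_likes l on l.image = i.id
                    group by i.id""")
rows = cur.fetchall()
print(rows)
```

Counting likes per image takes one declarative query - this is the "succinct language" for relationships in action.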
What we do is apply filters to images.
----
def putpalette(image, palette):
    # convert to grayscale
    orig_mode = image.mode
    if orig_mode != "L":
        image = image.convert("L")
    # apply palette - adjusting balance of colors in image
    image.putpalette(palette)
    # convert back to its original mode
    if orig_mode != "L":
        image = image.convert(orig_mode)
    return image
----
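Here’s one way to exercise that filter. This assumes Pillow is installed; the sepia palette values are my own invention, and the in-memory image stands in for an uploaded jpeg:

```python
from PIL import Image  # assumes Pillow is available

def putpalette(image, palette):
    """The filter from above, repeated so this snippet runs standalone."""
    orig_mode = image.mode
    if orig_mode != "L":
        image = image.convert("L")
    image.putpalette(palette)
    if orig_mode != "L":
        image = image.convert(orig_mode)
    return image

def make_sepia_palette():
    """Build a 768-entry palette (256 RGB triples) with a sepia cast."""
    palette = []
    for i in range(256):
        palette.extend([min(255, int(i * 1.07)),  # boost red
                        int(i * 0.74),            # mute green
                        int(i * 0.43)])           # mute blue more
    return palette

# Stand-in for an uploaded jpeg; the real app would use Image.open(path).
img = Image.new("RGB", (8, 8), (200, 120, 60))
sepia = putpalette(img, make_sepia_palette())
print(sepia.mode, sepia.size)
```

Attaching a palette to an "L" image turns it into a "P" (palettized) image; converting back to the original mode bakes the palette colors in.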
For this class I’ve already installed the web application on LAMP boxes that are actually virtual machines running on Amazon’s hosted infrastructure.
EC2 - Elastic Compute Cloud - allows us to create virtual computers and run them on Amazon’s infrastructure.
Each student will have their own machine to work on. For convenience all instances have the same security credentials. Please see http://simeonfranklin.com/scalability/ to download the public key and sign up to reserve your machine.
You can now ssh to your box with:
$ ssh -i templatekey.pem ec2-user@ec2-54-241-193-255.us-west-1.compute.amazonaws.com # but use your machine name!
Your machine has already been set up by running the deployment script against it.
Go ahead and explore the web application (upload a jpg, apply a filter to an existing image, etc). Log in to your box via ssh and take a minute to explore the code in /var/www/instaclone.
In this section you saw a high level overview of some of the common functionality many web applications provide. You also got to explore the sample web application and see some of the code responsible for generating it. We’re now ready to consider the question of scaling our web application.
At the conclusion of this lesson you will:
Definitions can be complicated.
the ability to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth
Speed.
Is the result of many complex factors from network latency to HTML and Javascript structure. It implies questions like:
What is the total amount of time between the start of the request and application readiness?
How soon after hitting refresh can I use the application?
A scalable web application might be slower than a non-scalable one.
Scalability questions are about capacity. Consider our very simple sample application:
You can figure out some of the limits of our sample application. We’re currently running on an AWS Micro instance with:
We can easily scale vertically thanks to the ability of AWS EC2 to move to more capable virtual hardware. We could get more disk space, more memory, better IO latency and a beefier CPU. At some point, however, we’ll run into limits. Let’s calculate some of those limits on our current virtual hardware.
Some of the limits for a particular application and environment are obvious and easy to (roughly) calculate:
If we limit the largest image upload to 1Mb what would be the (approximate) minimum upper bound on the number of images we can host?
If a process running our application takes approximately 40Mb of memory to serve a single request, each process can handle one request at a time, and our OS leaves us about 300Mb of memory, approximately how many simultaneous requests can we serve?
If our maximum network throughput is limited to 10Mb/s and an average response returns approximately 1Mb over 2 seconds, approximately how many simultaneous requests can we serve?
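These estimates are straightforward arithmetic. Here is one possible back-of-envelope sketch - the figures are the rough ones quoted above, and the disk-space assumption (how much of the disk is free for media) is mine:

```python
# Rough capacity arithmetic for a small instance -- all figures approximate.
disk_free_mb = 6 * 1024          # suppose ~6Gb of an 8Gb disk is free for media
max_image_mb = 1                 # the upload limit from the question
max_images = disk_free_mb // max_image_mb

free_ram_mb = 300                # memory the OS leaves us
mb_per_process = 40              # per single-request process
max_mem_requests = free_ram_mb // mb_per_process

network_mb_per_sec = 10          # throughput cap
response_mb = 1                  # average response size
response_secs = 2                # streamed over two seconds
# each in-flight response consumes response_mb / response_secs of bandwidth
max_net_requests = int(network_mb_per_sec / (response_mb / response_secs))

print(max_images, max_mem_requests, max_net_requests)
```

The point is the method, not the exact numbers: each resource (disk, memory, network) yields its own ceiling, and the smallest ceiling wins.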
More generally: How do performance and scalability differ? How are they related? Describe a scenario in which slow performance could cause an application to exceed its scalability limits. (Hint: consider an interaction between latency and maximum number of requests.)
Fortunately we don’t have to figure out all the possible bottlenecks and come up with solutions ourselves. Web applications tend to hit the same bottlenecks and the solutions are common enough to describe generally even if the implementation for any given application may be specific.
We’ll look in more detail at the following common areas that may limit the scalability of our application. In reviewing each area we’ll discuss several strategies for improving the scalability of our application.
database limits
Serving static resources
CPU and memory bound web framework code
CPU/Memory/IO bound application code
In this section you learned about the difference between performance and scalability. You also saw a high level overview of some of the common limits web applications may run into when trying to scale to larger amounts of data or users.
At the conclusion of this lesson you will:
The fundamental features of relational database systems.
ACID
Atomic - transactions succeed completely or fail completely
Consistent - transactions bring the database from one consistent state to another
Isolation - transactions are isolated so that simultaneous transactions must have the same effect as though executed serially
Durability - once a transaction is committed it will remain so even in the event of power loss, system crash, etc
(see http://en.wikipedia.org/wiki/ACID for more details)
> These properties require that writes to the datastore will have to go back onto disk (a durable datastore) even if all the data can fit in memory.
Disk latency is likely to be a limiting factor in both the performance and scalability of databases.
Irrespective of the performance characteristics of individual queries (how long it takes to execute a single query), there are also scalability limits, frequently expressed as a maximum number of read and write transactions/sec.
Additionally, different databases may have absolute and relative limits on the size of datasets.
Typical strategies for scaling the database layer include:
Master/Slave or data replication database configuration
There are drawbacks!
Database Sharding
There are drawbacks!
In response to these performance limitations many alternative data stores can be used.
Typically you give up ACID attributes or the ability to reason relationally in exchange for different performance and scaling characteristics.
And many others generally lumped together as NoSql datastores.
Not necessarily a replacement, but perhaps an augmentation to the relational database as a data store. Typical patterns:
Using a cache to store the results of expensive calculations for future lookup sounds like a performance optimization. Explain how caching long-running queries may enhance the scalability of your application (remember the interplay between latency and throughput).
Login to your EC2 instance. The mysqlslap tool can be used to stress test a MySql database. See http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html for documentation but try:
$ mysqlslap --create-schema="instaclone" --query="select * from instaclone_images order by dt DESC" --concurrency=100 --number-of-queries=10000
What does the output tell you about the performance abilities of our micro instances? Can you see any effect when ramping up concurrency?
Consider our sample application with minimal data. What techniques could we use to make sure our application scales? Consider at least two techniques and cite at least one drawback for any change we might make.
In this section you learned about some of the specific issues associated with scaling the database layer. You also learned about some techniques commonly used to scale the database layer. While thoroughly understanding all the options available for dynamic data storage is beyond the scope of this class you should at least understand some of the desirable performance and scaling characteristics available in non-relational data stores as well as some of the potential drawbacks.
In this lesson we will:
We’ve already discussed the most obvious limit associated with static resources:
Seems easy to scale vertically.
Our 8Gb instance might only be able to store ~14,000 images if the average image "weighs" about 1/2Mb and we have most of the disk to allocate to media storage.
But it’s trivial nowadays to get 500Gb or 1Tb of storage. That would take us up to 1-2 million images which seems like a lot.
imgur gets ~24 million new images each month as of late 2012 (see http://en.wikipedia.org/wiki/Imgur)! Even simple storage might be a problem.
We might also run into limitations (including some surprises) in the process of serving static resources.
Assuming a 100Mb/s network connection how many images per second can we serve?
Simple math suggests we could serve 100 1Mb images in a single second, saturating our network connection.
Of course the answer ends up depending on client bandwidth as well as server bandwidth - we can’t stream files faster than the client can download. If clients can support on average 300Kb/s throughput we might be able to allow 300 simultaneous connections that all take 3 seconds to stream a 1Mb image, still arriving at 100 1Mb images in a single second.
However: each transfer requires a separate web server process or thread, and each process or thread may take 60Mb of memory… 300 processes would need about 18Gb - far more memory than we have.
Fortunately there are relatively simple steps we can take to solve the problem of scaling static assets.
There are always tradeoffs! (But I’ll let you figure them out yourself.) And there are performance complexities not yet addressed - latencies based on geographical location, cache management, etc.
Why not let somebody else handle this?
A content delivery network or content distribution network (CDN) is a large distributed system of servers deployed in multiple data centers in the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance.
-- http://en.wikipedia.org/wiki/Content_delivery_network
The most common solution to scaling static media is to outsource it to someone else who specializes in scalable data storage and service. If not a full blown CDN at least an http accessible datastore.
Can you describe some potential drawbacks to common approaches to scaling media? What might be difficult about sharding media servers?
Using cloud infrastructure avoids some drawbacks - but might introduce others. Can you think of any cases where using a remote CDN or network file store might increase our scalability but decrease our performance?
In this section you learned about some of the potential bottlenecks associated with scaling static resources - both obvious limitations like total disk space and less straightforward limitations like the potential interactions between network speed and memory usage. You also learned about some of the techniques used to alleviate scaling issues surrounding static resources.
Every web application must, at a minimum, talk HTTP. Most applications will also perform routine tasks not directly related to our application’s business logic - web stuff, like generating HTML.
In this lesson we will learn:
No really! If it’s web, it supports HTTP!
Understanding HTTP can have serious impacts on both performance and scalability.
Most web applications are going to generate HTML to present the application specific information.
This can be a CPU and memory intensive process.
The answer to performance problems is almost always: caching.
Web applications frequently can profit from caching added at the boundaries between layers.
Wait a minute! I thought we were talking about scaling, not performance!
See https://www.varnish-cache.org/ or http://www.squid-cache.org/
The most used technique to improve the performance of web application code is to cache the results of expensive operations.
A distributed in-memory hash table server
Key/value oriented with simple operations: set, get, del
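Those three operations support the classic cache-aside pattern: check the cache first, fall back to the expensive query on a miss, and store the result for next time. A sketch using a plain dict in place of a real memcached client (the query and key names are made up):

```python
import time

cache = {}            # stand-in for a memcached client: key -> (value, expiry)
db_hits = []          # counts trips to the "database"

def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() > expires_at:   # honor the TTL, as memcached would
        del cache[key]
        return None
    return value

def cache_set(key, value, ttl=60):
    cache[key] = (value, time.time() + ttl)

def expensive_query():
    db_hits.append(1)              # e.g. the front page's select ... limit 5
    return [("image-1",), ("image-2",)]

def front_page_rows():
    """Cache-aside: check the cache, fall back to the database on a miss."""
    rows = cache_get("front_page_rows")
    if rows is None:
        rows = expensive_query()
        cache_set("front_page_rows", rows, ttl=30)
    return rows

front_page_rows()
front_page_rows()                  # second call is served from the cache
print(len(db_hits))                # only one database hit
```

The TTL is the tradeoff knob: a longer TTL means fewer database hits but staler data.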
Look at all your layers for places where you need finer caching granularity than "the whole page". We could:
Wait a minute! I thought we were talking about scaling, not performance!
True, but we may need to scale our performance increases!
Obviously we can only continue to populate the cache while we have available memory in which to store data.
Memcache is a distributed system which can easily be set up to shard its data store across multiple machines. Adding additional "memory" to your cache can be easily accomplished by adding additional memcached servers.
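The "distributed" part lives mostly in the client: it hashes each key to pick which server holds it, so adding servers adds cache capacity. A toy sketch of the idea - real clients use consistent hashing so that fewer keys move when servers are added or removed:

```python
import hashlib

# Stand-ins for three memcached servers, each with its own in-memory store.
servers = [{}, {}, {}]

def pick_server(key):
    """Hash the key to choose a shard -- naive modulo sharding."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

def mc_set(key, value):
    pick_server(key)[key] = value

def mc_get(key):
    return pick_server(key).get(key)

for i in range(100):
    mc_set("image:%d" % i, "data-%d" % i)

print([len(s) for s in servers])   # keys spread across the three shards
print(mc_get("image:42"))
```

Because every client computes the same hash, any client finds any key without the servers knowing about each other at all.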
Here at Facebook, we’re likely the world’s largest user of memcached. […] We use more than 800 servers supplying over 28 terabytes of memory to our users.
-- https://www.facebook.com/note.php?note_id=39391378919
No matter how much caching we add, it is possible to saturate our web server with too many requests.
Source: http://wiki.dreamhost.com/File:Webserver_requests_graph.jpg
At some point we get too many requests per second to handle. The actual numbers aren’t important:
Sometimes the only possible answer is to add additional servers, distributing requests among them with a load balancer.
Load balancers may be implemented by:
Be careful about your architecture! With multiple web servers local data may differ. But shared resources frequently form a scaling bottleneck - enough so that a frequent design pattern in building scalable web applications is to use a Shared Nothing Architecture.
Explain the difference between http caching servers like varnish or squid and key/value servers like memcached. Why can’t we just use http caching servers? What sort of pages can’t be cached?
A frequently used technique to make static resources as cache-able as possible is to use "far-future" expires headers. So for my main style.css file I might make sure a header like
Expires: Sat, 01 Feb 2014 23:00:00 GMT
is issued. Are there any drawbacks to this policy? When will the client download a new version of my style.css?
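For reference, generating such a header (and the versioned-URL trick commonly paired with it) might look like this - the file name and CSS content are made up:

```python
import hashlib
import time
from email.utils import formatdate

# Far-future Expires: one year from now, in the HTTP date format.
one_year = 365 * 24 * 3600
header = "Expires: " + formatdate(time.time() + one_year, usegmt=True)
print(header)

# A common escape hatch: change the URL when the file changes, e.g. by
# embedding a short content hash, so clients with the old cached copy
# simply fetch a brand-new URL instead of waiting for the cache to expire.
css = b"body { color: black }"
url = "/static/style.%s.css" % hashlib.md5(css).hexdigest()[:8]
print(url)
```
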
How many requests/second does our http server on ec2 support for static files? How about for the front page of our application? Use the command line tool ab on our ec2 instance to test the performance of Apache. Try ab -h for help but
$ ab -n 1000 -c 5 http://localhost/
is the sort of thing we want to do. Try ramping up concurrency. What effects do you see on static files? How about against our application?
In this lesson we reviewed some features of HTTP that allow us to implement scaling someplace other than our web application server. We saw that HTTP caching servers or memcache servers allow us to spread the network traffic or the memory usage across multiple servers. Sometimes, however, we are forced to scale our web application server directly and we reviewed using a load balancer to distribute an application across multiple web servers.
Every non-trivial application will eventually hit a performance bottleneck. The bottleneck might be CPU, IO, or memory availability but the solutions are typically the same. In this lesson we’ll explore the architecture of typical distributed task systems used by web applications.
Finding the bottleneck that limits our application’s performance is a matter of applying typical performance analysis techniques.
You may or may not already be familiar with profiling tools for investigating the performance of conventional applications - and teaching the usage of sophisticated profiling tools like Valgrind or DTrace is definitely beyond the scope of the class.
Ideally you should be able to run the performance-critical parts of your application outside of a "web" context. We can frequently figure out the limiting factor in the performance of our application through simple measurements and some reasoning. Additionally we may be able to solve several different classes of performance bottlenecks by using the same basic techniques.
It’s possible that our application is CPU-bound. This simply means that the factor limiting our performance is the CPU. Typically applications that are CPU bound do complicated computation on relatively small data sets.
We can roughly test our application performance by running a performance monitoring tool like top while simultaneously running our application continuously. If our application is CPU bound we should be capable of maxing out our CPU utilization.
Note: Look for telltales (like maxing out 1/4 of your total CPU utilization) that hint you could reduce your CPU bottleneck by increasing the concurrency of your application.
It’s possible that the limiting factor in our performance is the amount of available memory. If our application isn’t maxing out the CPU this implies we should be able to increase performance by increasing concurrency. However if our program consumes all the available memory we are incapable of increasing concurrency.
Note: Of course modern systems don’t actually run out of memory - they can use swap space to emulate it. But disk IO is orders of magnitude slower than RAM, so we want to make sure we stay in memory. Use top to evaluate your program’s memory usage.
It’s possible that our application is IO bound. In fact, for scalable web applications (which are likely to depend heavily on networked services) it’s very likely. It’s also possible to max out disk throughput if your application is reading/writing large data sets or many small files.
It can be more difficult to measure our IO performance. One huge clue is the absence of other bottlenecks. If you run your application continuously and are not maxing out either available memory or CPU utilization, it’s likely the performance-limiting factor is IO of some kind.
Fortunately, we can solve many different kinds of scaling problems with the same basic architecture.
Most scalable web applications end up using a distributed task queue architecture.
The basic idea is straightforward:
Of course the devil is always in the details. How do we transform code into tasks that can be serialized, shipped to a different environment, and executed? How do servers and workers all communicate with the same queue? What about details like priorities on tasks, success/error notification, canceling jobs, etc?
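The moving parts - a queue of tasks, workers pulling from it, results reported back - can be sketched in-process with the standard library. A real system like Celery puts the queue on a broker such as RabbitMQ so workers can run on other machines; the filter function here is a stand-in:

```python
import queue
import threading

task_queue = queue.Queue()   # stands in for a broker like RabbitMQ
results = {}                 # stands in for a result store

def worker():
    """Pull (task_id, func, args) off the queue and record the result."""
    while True:
        task_id, func, args = task_queue.get()
        if func is None:     # sentinel tells this worker to exit
            break
        results[task_id] = func(*args)

def apply_filter(name):      # stand-in for the CPU-heavy image filter
    return "filtered:" + name

# The web process enqueues work instead of doing it during the request...
for i in range(10):
    task_queue.put((i, apply_filter, ("image-%d.jpg" % i,)))

# ...and a pool of workers (in real systems, often on other machines) drains it.
workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()
for _ in workers:
    task_queue.put((None, None, None))
for t in workers:
    t.join()

print(len(results))
```

The hard parts a real task queue solves - serializing tasks across machines, retries, priorities, result delivery - are exactly the details this in-process toy gets to ignore.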
Fortunately these are solved problems in a variety of languages. Robust task queues have become a standard add-on to many web application frameworks and there’s even a protocol (AMQP) specifically for sending/queuing messages between distributed systems. RabbitMQ is a popular AMQP server and is widely used (see http://www.amqp.org/about/examples) by many tech giants.
RabbitMQ has API implementations in many languages. In the Python world Celery is a popular distributed task queue system that can use RabbitMQ as its message broker. Celery has good support for easily creating tasks from Python functions, running worker daemons, and can use alternative message stores as necessary.
Obviously this could get a little complicated.
But it is necessary complexity to achieve a simple objective. We are circumventing our CPU, IO, or memory bottleneck by scaling it across multiple machines.
And as we will see - by using cloud based offerings instead of rolling our own we might not have to do all the devops work necessary to build our own cluster of workers.
Explain the difference between CPU bound, memory bound, and IO bound problems. Which category is calculating digits of PI likely to be in? How about parsing and manipulating a few multi-gigabyte XML files? What about reading and parsing 1 million separate small XML records stored on the filesystem?
How about our application? What do you guess the bottleneck will be if we apply filters to many pictures? What if the pictures are large? What if we do many requests concurrently?
Let's test out your intuition. The time command will run a script and report how much time it took. You can run the `filters.py` file used by our web application as a command line program. Log into your EC2 instance and run
----
$ time /var/www/instaclone/ENV/bin/python filters.py input.jpg output.jpg
----
Where `input.jpg` is an existing file. To help you out let me show you two tricks I use to bulk produce sample data and to add concurrency to testing. I put some images in the sample-images folder and you can use the `xargs` command to make many copies:
----
$ cd sample-images
$ find *jpg | xargs -I{} cp {} 1{}
----
Now we can use the `xargs` command to run repeatedly for each file, timing the whole run.
----
$ time find sample-images/*jpg | xargs -I{} /var/www/instaclone/ENV/bin/python filters.py {} {}.del
----
How many images/second can you filter? Does it help to increase concurrency (try adding a `-P 4` to the `xargs` command)? Use the `top` command to look at general system activity (you might have to login in a second window). What is the bottleneck in our application?
In this lesson we’ve reviewed the major kinds of performance bottlenecks we might address via scaling. We also reviewed the typical architecture required to scale performance bottlenecks that are inherent to our application. And in a future lesson we’ll see that taking advantage of a distributed task queue may not be quite as complicated as it first appears!