Reverse Proxying of Web Services

You may wonder why anyone would run a web proxy on the server machine, since proxies are usually associated with client-side caching.

On a site that mixes static and interactive content, spends considerable time generating pages, and combines several technologies, a so-called reverse proxy solves several problems at once:

Imagine a resource-hungry server process that takes up a considerable amount of memory. AxKit1 web sites often fall into this category, as do many other Apache setups that combine multiple technologies and languages in one server. While such a process spends its time generating an HTML page from a database query, it is simply doing its job. But what if the content is already complete and just waiting to be transmitted over a slow connection? A big chunk of memory sits unused, while perhaps a second server process (with a similarly large memory footprint) handles the next query, which the first process could have handled had it not been waiting for the bits to arrive at the far end. Separating content generation from transmission helps greatly with server software that tends to be on the fat side memory-wise.
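
The effect can be estimated with Little's law (processes busy at once = request rate times the time each process is held). The following is only a back-of-the-envelope sketch in Python: every number in it is a made-up assumption, not a measurement, and it assumes the proxy accepts the finished page from the backend almost instantly over the local link.

```python
def busy_processes(requests_per_second: float, seconds_held: float) -> float:
    """Little's law: average number of backend processes busy at once."""
    return requests_per_second * seconds_held


# All figures below are hypothetical and chosen only for illustration.
RATE = 20.0       # incoming requests per second
GENERATE = 0.25   # seconds a process needs to generate one page
TRANSMIT = 2.0    # seconds to push that page to a slow client
PROCESS_MB = 40   # memory footprint of one backend process

# Without a proxy the process is held for generation plus transmission.
without_proxy = busy_processes(RATE, GENERATE + TRANSMIT)
# With a proxy in front, the process is held only while generating.
with_proxy = busy_processes(RATE, GENERATE)

print(f"without proxy: {without_proxy:.0f} processes, "
      f"{without_proxy * PROCESS_MB:.0f} MB")
print(f"with proxy:    {with_proxy:.0f} processes, "
      f"{with_proxy * PROCESS_MB:.0f} MB")
```

Under these assumed figures the backend needs 45 busy processes without a proxy and 5 with one. The slow transfers still happen, but they are handled by the proxy, which is far lighter per connection.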

Imagine a highly dynamic web site. No matter how dynamic, it will contain static elements. AxKit2 can serve static elements, and serve them very fast, but why waste its attention when it could be doing what it does best: driving web applications? A reverse proxy keeps often-used items in memory, eliminating all disk I/O for these files and reducing overall system load.

Finally, think of dynamic but cacheable content. Sure, AxKit2 has a cache plugin, and there are many situations where it is the best way to go. But there is one case where a web proxy won't be beaten so easily, because this is its main job: caching complete web pages with a given expiry date or timeout, the most effective way of reducing load.
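
For the proxy to cache a generated page, the backend has to label the response with an expiry. The sketch below shows the standard HTTP headers involved. It is written in Python purely for illustration (AxKit2 itself is Perl), and the function name and the 300-second lifetime are assumptions, not part of AxKit2 or Squid.

```python
from email.utils import formatdate


def expiry_headers(now: float, lifetime: int) -> dict:
    """Build headers that let a shared cache keep a page for `lifetime` seconds.

    `now` is a Unix timestamp. This helper is hypothetical; a real
    application would add these headers to its own response object.
    """
    return {
        "Date": formatdate(now, usegmt=True),
        # Absolute expiry date, understood by HTTP/1.0 caches.
        "Expires": formatdate(now + lifetime, usegmt=True),
        # Relative timeout, understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={lifetime}",
    }


if __name__ == "__main__":
    import time

    for name, value in expiry_headers(time.time(), 300).items():
        print(f"{name}: {value}")
```

Until the expiry passes, the proxy can answer repeated requests for the page itself and the backend never sees them.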

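Probably the most popular open source web proxy is Squid, which fully supports reverse proxying. Since its configuration file is rather large, here are the important settings for a plain reverse proxy. The directive names are those of Squid 2.5; later releases replaced the httpd_accel_* options with other directives, so check the documentation of your version: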

# We want squid to act as the main web server
http_port 80

# disable ICP, the protocol Squid uses to query neighbouring caches
icp_port 0

# never cache some files (cgi scripts, URLs with a query string)
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
# check out refresh_pattern as well

# tweak some of the memory settings
# (these depend on your site; the defaults may serve just as well)
maximum_object_size_in_memory 16 KB
cache_replacement_policy heap GDSF
memory_replacement_policy heap GDSF

# configuring access rules for reverse proxying
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255
acl manager proto cache_object
acl purge method PURGE
acl backend dst 1.2.3.4         # The IP address of your server
acl Safe_ports port 8000        # Where the main webserver resides

# mostly the recommended default configuration
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
# Security: Deny requests to unknown ports or machines
http_access deny !Safe_ports
http_access deny !backend
icp_access deny all
# Finally open it for public usage
http_access allow all

httpd_accel_port 8000          # Again, the main webserver
httpd_accel_host virtual
httpd_accel_uses_host_header on
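
The QUERY rule in the listing above decides which URLs are never cached. As a rough illustration, here is a Python sketch of that check; the function name is hypothetical, and Python's `re` module only approximates the regular expression library Squid uses.

```python
import re

# The two patterns from the "acl QUERY urlpath_regex cgi-bin \?" line.
# The ACL matches if either pattern is found anywhere in the URL path,
# which in Squid includes the query string.
QUERY_PATTERNS = [re.compile(r"cgi-bin"), re.compile(r"\?")]


def matches_query_acl(url_path: str) -> bool:
    """Return True if the request would be excluded from caching."""
    return any(p.search(url_path) for p in QUERY_PATTERNS)


if __name__ == "__main__":
    for path in ("/cgi-bin/test.pl", "/index.xml?style=print", "/images/logo.png"):
        verdict = "never cached" if matches_query_acl(path) else "cacheable"
        print(f"{path}: {verdict}")
```

If your dynamic pages carry proper expiry headers and use query strings, you may want to relax this rule so that they become cacheable.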

There are other settings to be customized, but most of the configuration file can stay at default settings.
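
The order of the http_access lines matters: Squid checks them from top to bottom, the first line whose ACLs all match decides the request, and a leading "!" negates an ACL. The following Python sketch models that evaluation for the rules above. It is a simplified model, not Squid's implementation; the `Request` class and its fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    src: str             # client IP address
    dst: str             # IP address the requested host resolves to
    port: int            # port in the (rewritten) request URL
    method: str = "GET"
    proto: str = "http"


# ACL definitions mirroring the "acl" lines of the configuration.
ACLS = {
    "all":        lambda r: True,
    "localhost":  lambda r: r.src == "127.0.0.1",
    "manager":    lambda r: r.proto == "cache_object",
    "purge":      lambda r: r.method == "PURGE",
    "backend":    lambda r: r.dst == "1.2.3.4",
    "Safe_ports": lambda r: r.port == 8000,
}

# The http_access lines, in the same order as in the configuration.
RULES = [
    ("allow", ["manager", "localhost"]),
    ("deny",  ["manager"]),
    ("allow", ["purge", "localhost"]),
    ("deny",  ["purge"]),
    ("deny",  ["!Safe_ports"]),
    ("deny",  ["!backend"]),
    ("allow", ["all"]),
]


def acl_matches(name: str, req: Request) -> bool:
    """Evaluate one ACL name, honouring a leading '!' as negation."""
    if name.startswith("!"):
        return not ACLS[name[1:]](req)
    return ACLS[name](req)


def http_access(req: Request) -> str:
    """Return 'allow' or 'deny' according to the first matching rule."""
    for action, names in RULES:
        if all(acl_matches(n, req) for n in names):
            return action
    # Not reached with these rules: "allow all" matches every request.
    return "deny"


if __name__ == "__main__":
    print(http_access(Request("10.0.0.5", "1.2.3.4", 8000)))           # ordinary visitor
    print(http_access(Request("10.0.0.5", "5.6.7.8", 8000)))           # foreign host
    print(http_access(Request("10.0.0.5", "1.2.3.4", 8000, "PURGE")))  # remote PURGE
```

In this model an ordinary visitor is allowed, while a request for a host other than the backend, a request to another port, or a PURGE from anywhere but localhost is denied. This is why the two security rules must come before the final "allow all".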

Once you have chosen to run a Squid reverse proxy, the next step is to support VirtualHosting through a few simple additions.