For admins, by admins.
  
 
  1. Next-Generation of Antiweb
  2. Server Design
  3. Memory Management
  4. Processes
  5. Confs and Xconfs
  6. Filesystem Layout
  7. Unicode
  8. Anti Webpages

Next-Generation of Antiweb

Antiweb is a webserver written in Common Lisp, C, and Perl by Hoytech. Antiweb is not a "proof of concept" and is not "exploratory code". We intend the core design of Antiweb (as described in this design paper) to be stable for the next 10+ years of use.

The two webservers that have had the largest influence on Antiweb4 are nginx and lighttpd. We took liberal advantage of studying these and other excellent servers while designing Antiweb4. Another, more obscure server that has influenced Antiweb is fhttpd.

Antiweb Tip
Did you know that Antiweb is older than both lighttpd and nginx? It used to be called awhttpd.

Why another webserver? In our opinion, the biggest problem with the above servers is that they aren't written in lisp. Many servers that we studied have grafted on extension languages (e.g., Perl for nginx and Lua for lighttpd). Antiweb is different. Instead of being a C program that uses some other language, Antiweb is a lisp program that uses C (and Perl).

Server Design

Like nginx, lighttpd, fhttpd, and Antiweb3, Antiweb4 is an asynchronous or event-based or non-blocking server, meaning that a single thread of control multiplexes multiple client connections. An Antiweb system is a collection of unix processes. Connections are transferred between processes with sendmsg(). When this happens, any data that was initially read from the socket is transferred along with the socket itself. The socket is always closed in the sending process.
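
The socket transfer described above can be sketched with sendmsg() and an SCM_RIGHTS control message. This is a minimal illustration, not Antiweb's actual code: the function names transfer_conn and receive_conn are hypothetical, and the sketch assumes a connected unix stream socket between the two processes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send fd, plus the bytes already read from it,
 * over a connected unix socket, then close fd in the sending process.
 * Returns 0 on success, -1 on error. */
static int transfer_conn(int unix_sock, int fd, const void *data, size_t len)
{
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(len > 0);  /* at least one data byte must carry the descriptor */

    iov.iov_base = (void *)data;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    if (sendmsg(unix_sock, &msg, 0) < 0)
        return -1;
    close(fd);  /* the socket now lives only in the receiving process */
    return 0;
}

/* Hypothetical receiving side: fills buf with the forwarded data, stores
 * the byte count in *len, and returns the received descriptor or -1. */
static int receive_conn(int unix_sock, void *buf, size_t cap, ssize_t *len)
{
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;
    int fd;

    iov.iov_base = buf;
    iov.iov_len = cap;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    n = recvmsg(unix_sock, &msg, 0);
    if (n <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    *len = n;
    return fd;
}
```

Sending the already-read bytes in the same message as the descriptor means the receiver never has to guess how much of the request was consumed before the hand-off.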

To multiplex connections inside a process, Antiweb uses a state machine data structure defined in src/libantiweb.h. Antiweb requires either the kqueue() or epoll() stateful event APIs in level-triggered mode.

  • On a 32 bit linux/CMUCL system, 10000 inactive keepalive connections consume about 3M of user-space memory (in addition to two lisp images).
  • The number of inactive keepalive connections has negligible performance impact on new connections.
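
To illustrate what level-triggered mode means for the event APIs mentioned above, here is a small epoll() sketch (Linux only; kqueue() is not shown). The helper names watch_readable and poll_once are hypothetical and are not taken from src/libantiweb.h.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for read readiness. Level-triggered is epoll's default:
 * leaving out EPOLLET means the fd keeps being reported for as long as
 * unread data remains. */
static int watch_readable(int epfd, int fd)
{
    struct epoll_event ev;

    assert(epfd >= 0 && fd >= 0);
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Poll once without blocking. Returns a ready fd, or -1 if none is ready. */
static int poll_once(int epfd)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, 0);

    return n == 1 ? ev.data.fd : -1;
}
```

With this registration, a descriptor that has two unread bytes is reported again after only one byte is read. An edge-triggered registration would report it once and then stay silent until new data arrived.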

There are three modes for sending files (medium, small, and large):

  1. Medium: These files are mmap()ed (memory mapped) to avoid copying the file's data into user-space. The data is copied directly from the filesystem to the kernel's socket buffer.
  2. Small: These files are read into a user-space buffer because a small read() is often cheaper than mmap()+munmap().
  3. Large: Antiweb uses a user-space buffer for large files. This is to avoid disk-thrashing when serving many large files to clients concurrently (idea from lighttpd) and to avoid running out of address space on 32 bit systems.
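
The choice between the three modes can be pictured as a simple size classifier. This is only a sketch: the enum, the function name choose_send_mode, and the idea of passing the two thresholds as parameters are assumptions for illustration. The thresholds Antiweb actually uses are not stated here.

```c
#include <assert.h>
#include <sys/types.h>

enum send_mode { SEND_SMALL, SEND_MEDIUM, SEND_LARGE };

/* Hypothetical classifier. small_max and medium_max are caller-supplied
 * size limits in bytes; they are placeholders, not Antiweb's real values. */
static enum send_mode choose_send_mode(off_t size, off_t small_max,
                                       off_t medium_max)
{
    assert(small_max <= medium_max);
    if (size <= small_max)
        return SEND_SMALL;   /* read() into a user-space buffer */
    if (size <= medium_max)
        return SEND_MEDIUM;  /* mmap() the file */
    return SEND_LARGE;       /* stream through a user-space buffer */
}
```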

Super-size it: Because Antiweb uses a 64 bit off_t type and lisp's unlimited precision integers on all systems, Antiweb can serve files of any size. It also supports download resuming for all three file send modes.

Antiweb's data structures are designed for pipelining. Antiweb uses vectored I/O (also known as scatter-gather I/O) nearly everywhere. Antiweb's internal message passing protocol uses pipelining also. For example, an HTTP connection that pipelines two requests for small files followed by one request for a medium file is responded to with a single writev() system call consisting of the following:

  • The HTTP headers and file contents for the first two small files
  • The HTTP headers for the medium file
  • As much of the memory mapped medium file as it takes to fill the kernel's socket buffer.
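
The single writev() call described above can be sketched as follows. The helper writev_strings and the MAX_PARTS limit are hypothetical, and for brevity every part is a C string. In a real server the last part would point into the memory-mapped file rather than at a string.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_PARTS 16  /* arbitrary limit for this sketch */

/* Gather n separate buffers (headers, small file bodies, the start of a
 * mapped file) into one writev() system call. Returns the number of bytes
 * written, which may be less than the total if the socket buffer fills;
 * a real caller would resume from that offset later. */
static ssize_t writev_strings(int fd, const char *const *parts, int n)
{
    struct iovec iov[MAX_PARTS];
    int i;

    assert(parts != NULL);
    if (n < 0 || n > MAX_PARTS)
        return -1;
    for (i = 0; i < n; i++) {
        iov[i].iov_base = (void *)parts[i];
        iov[i].iov_len = strlen(parts[i]);
    }
    return writev(fd, iov, n);
}
```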

Subsequently, all the generated log messages are written to the hub process with writev(). The hub will forward the log messages to the logger process with another writev(). Finally, the logger process will append the messages onto the axslog file.

To see the connection statistics of a worker process, use the -stats command:

# antiweb -stats /var/aw/example.conf
...
Keepalive Time: 65 seconds
Total Connections: 41  HTTP requests: 72  Avg reqs/conn: 1.8
File descriptor usage (estimate): 17/32767
Current Connections: 11
  Keepalives: 7  Sending files to: 2
  Proxy: Sources: 0  Sinks: 0  Idle: 0
  Timers: 0  Hub: 1  Unix Connections: 1
  Lingering: 0  Zombies: 0
...

Notice that in addition to the HTTP traffic, there is also a connection to the hub's unix socket that was opened on start-up, and one other open unix socket. That other unix socket is you. You created a supervisor connection while asking for stats info.

-stats will also tell you how hosts are mapped to directories on a worker:

# antiweb -stats /var/aw/example.conf
...
Host -> HTML root mappings:
  localhost -> /var/www/testing
  example.com -> /var/www/example.com
  www.example.com -> /var/www/example.com
...

Although usually we love it, sometimes pipelining is bad. Antiweb deliberately tears down persistent HTTP connections on certain responses:

  • 4XX and 5XX HTTP Errors - This is to prevent blind web vulnerability scanners like nikto from persisting or pipelining 95+ percent of their requests.
  • Directory Listings - To prevent pipelined recursive crawling.

When finished with a connection, Antiweb will shut down the write direction of the socket and linger as required by HTTP/1.1. Antiweb always gracefully degrades for HTTP/0.9 and HTTP/1.0 clients. Antiweb has first-class IPv6 support.

If you really do want to pipeline 4XX and 5XX errors, you have two options:

  1. Use Antiweb's rewrite module to change problematic requests into requests for existing files.
  2. Use Antiweb's fast-files module. This is a memory cache that supports accelerated static content, pre-generation of HTTP headers, negative caching, and persisting/pipelining 404 errors.
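
The shutdown-and-linger step mentioned above (half-closing the socket, then waiting for the client to finish) can be sketched like this. The function finish_conn is hypothetical, and it blocks in read() for simplicity. Antiweb itself is event-driven and would do the equivalent from its state machine, with a timeout.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical blocking sketch of a lingering close: stop sending,
 * discard whatever the peer still has in flight, and close only after
 * the peer signals end-of-file. Returns 0 on success, -1 on error. */
static int finish_conn(int fd)
{
    char discard[4096];
    ssize_t n;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) < 0)
        return -1;
    while ((n = read(fd, discard, sizeof(discard))) > 0)
        ;  /* throw away late data from the client */
    close(fd);
    return n < 0 ? -1 : 0;
}
```

Closing immediately instead could cause the kernel to reset the connection while unread client data is pending, which can destroy the tail of the response before the client has read it.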

Antiweb was designed with security in mind from the beginning. Here are some of the security decisions made during the Antiweb design process:

  • Virtual hosts are privilege-separated without proxying. Once the hub has determined which worker should handle a connection, it transfers the socket to the worker process and has nothing further to do with the connection. Worker processes run under different UIDs from the hub (and each other). Workers are optionally chroot()ed.
  • Workers have no access to log files: all log messages are sent to the hub over the unix socket. The hub then forwards them to the logger process. This means that a compromised worker process cannot steal previously created log messages or log messages created by other workers, and a compromised hub process cannot steal previously created log messages.
  • CGI processes can be restricted with resource limitations.
  • Even on lisps without unicode support, Antiweb4 guarantees all internal data and filenames are UTF-8 encoded. This includes verifying all code-points are in their shortest possible representation and that there are no otherwise invalid surrogates.
  • Antiweb processes never try to clean up or recover in the event of an unexpected condition. A process cannot do that because it has failed. Some other process that hasn't failed will clean up after it.
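
As an illustration of the UTF-8 guarantee in the list above, here is a minimal strict validator. It is a sketch of the general technique, not Antiweb's implementation: the name utf8_valid is hypothetical, and it simply rejects every surrogate code point, which may be stricter or looser than what Antiweb4 does.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if s[0..len) is well-formed UTF-8, else 0. Rejects overlong
 * (non-shortest) encodings, surrogates U+D800..U+DFFF, code points above
 * U+10FFFF, stray continuation bytes, and truncated sequences. */
static int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t need, j;

        if (c < 0x80) { i++; continue; }           /* plain ASCII byte */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else return 0;                             /* invalid lead byte */

        if (len - i - 1 < need)
            return 0;                              /* truncated sequence */
        for (j = 1; j <= need; j++) {
            if ((s[i + j] & 0xC0) != 0x80)
                return 0;                          /* not a continuation */
            cp = (cp << 6) | (s[i + j] & 0x3F);
        }
        /* Shortest-form check: each length has a minimum code point. */
        if (need == 1 && cp < 0x80) return 0;
        if (need == 2 && cp < 0x800) return 0;
        if (need == 3 && cp < 0x10000) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate */
        if (cp > 0x10FFFF) return 0;
        i += need + 1;
    }
    return 1;
}
```

The shortest-form check matters for security: without it, a byte pair such as 0xC0 0xAF would decode to "/" and could slip past a path filter that only looks for the single-byte form.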

Antiweb also includes an experimental new technology for constructing webpages called Anti Webpages. These are Perl-inspired programs that let you draw page layouts with significant whitespace, glue together HTML/CSS/Javascript, and more.

Antiweb was created for admins, by admins. Please let us know any ways you think it could be better.

Memory Management

The most important memory management system in Antiweb is lisp's garbage-collector. Many implementations of Common Lisp that Antiweb supports have excellent generational collectors so you will probably never notice pausing except in stressful benchmarks.

Outside of lisp, Antiweb also maintains two important data structures: conns and ioblocks. The definitions of these data structures are in the file src/libantiweb.h. Despite the extension, this is not a C header file. It is a special format that can be parsed by both lisp and a C compiler.

You can always get a break-down of the memory being used by an Antiweb process with the -room command. Here is what it might look like when you query a worker using CMUCL:

# antiweb -room /var/aw/example.conf

"---ANTIWEB MEMORY STATS---
Dynamic Space Usage:        1,946,272 bytes (out of  512 MB).
Read-Only Space Usage:     24,024,304 bytes (out of  256 MB).
Static Space Usage:         3,665,792 bytes (out of  256 MB).
Control Stack Usage:            1,636 bytes (out of  128 MB).
Binding Stack Usage:               88 bytes (out of  128 MB).
The current dynamic space is 0.
Garbage collection is currently enabled.

conns and ioblocks:
  Allocated conns:     2, 470 bytes
  Allocated ioblocks:  1, 4112 bytes, 14 in use, 99.7% overhead
  Free conns:          2, 470 bytes
  Free ioblocks:       514, 2113568 bytes
  Total:               2118620 bytes + malloc overhead
---END OF ANTIWEB MEMORY STATS---"

If you look through the file src/libantiweb.c, you will see that Antiweb malloc()s conn and ioblock structures but never free()s them. When Antiweb is done with a structure it pushes it onto a free-list. The next time it needs a structure it pops the most recently pushed one off the free-list. Antiweb can do this because conn and ioblock structures are always the same size.

Freeing memory considered harmful.

In the above -room output, the conn and ioblock measurements represent the high water mark. During periods of heavy traffic, more memory is allocated. When the traffic settles down, the pages at the bottom of the free-lists are swapped out by the kernel as needed.
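
The free-list scheme described above can be sketched in a few lines. The struct blk type and the function names are hypothetical stand-ins for the conn and ioblock structures; the real definitions live in src/libantiweb.h and are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size structure standing in for an ioblock. */
struct blk {
    struct blk *next;
    char payload[4096];
};

static struct blk *free_list = NULL;

/* Pop the most recently released block, or malloc() a new one if the
 * free-list is empty. Blocks are never passed to free(). */
static struct blk *blk_alloc(void)
{
    struct blk *b = free_list;

    if (b != NULL) {
        free_list = b->next;
        return b;
    }
    return malloc(sizeof(*b));
}

/* Push a finished block onto the front of the free-list. */
static void blk_release(struct blk *b)
{
    assert(b != NULL);
    b->next = free_list;
    free_list = b;
}
```

Because the list is last-in, first-out, the blocks most likely to still be resident in memory are reused first, while those at the bottom stay untouched and can be swapped out.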

Processes

For performance and security, Antiweb is a group of unix processes. This section describes the roles and responsibilities of the different processes.

The following figure illustrates the processes and responsibilities of a running Antiweb system.

[Figure: Antiweb4 Processes]

  1. Hub Process
  2. Logger process
  3. Worker Process(es)
  4. Supervisor Process(es)
  5. CGI Process(es)

Hub Process

The hub process is usually the busiest process in the Antiweb system. Here are its duties:

  • Accept new connections over its unix socket, which may either be connecting worker processes or supervisor connections.
  • Accept HTTP connections from the internet, decide based on the requested virtual host (vhost) which worker the connection applies to, then send the already read data from the connection along with the socket itself over the unix socket to the worker. The transfer closes the socket in the hub process so it is open only in the worker.
  • Accept all log messages from worker processes and forward them to the logger process.
  • Route inter-process messages (such as supervisor connections).

Some other details about the hub:

  • Runs as its own user or UID specified in the hub conf file.
  • chroot()ed to the empty directory, on which it has no permissions.

Logger process

The logger process is connected to the hub process over a unix socket. Its only job is to accept log messages from the hub process and write them to disk. When a worker creates a log message, it is routed through the hub to the logger process.

  • Runs as its own user or UID specified in the hub conf file.
  • chroot()ed to the aw_log directory, to which it has write permission.
  • It will chmod() the log files so they are never world readable/writeable.
  • Log files cannot be symlinks.

Worker Process(es)

The worker processes do the heavy lifting of processing HTTP requests.

  • Each runs under its own user or UID specified in the worker conf.
  • Optionally chroot()ed.
  • Because client sockets are transferred to it, the worker process handles all subsequent HTTP requests on these sockets. They do not go through the hub.

Antiweb Tip
Why would you want to run multiple workers?

  • Privilege separated vhosts
  • SMP/multi-core
  • Reduce disk latency

On start-up, each worker process reads its conf file and compiles a function for dispatching HTTP requests given the mount points and other features present. After it starts, a worker process will never re-open this conf file. The only way to provide it a new conf file is by message passing using a supervisor process. This is necessary to allow you to chroot() a worker to a root not containing the conf file.

After registering the vhosts it is interested in with the hub, a worker process will "lock" the hub connection so it can't register additional vhosts. To later add vhosts, the connection must be manually unlocked by a supervisor process (this happens behind the scenes when you -reload a worker conf; see one possible but unlikely attack related to this). If you try to -reload with a bad worker conf, Antiweb will not install the new conf and will continue with the original conf. If you -reload a worker conf and it compiles fine but throws some error while processing an HTTP request, Antiweb will kill the worker and the reason will be logged to syslog.

Privilege Separation

  • A compromised worker process cannot steal connections for vhosts that don't belong to it.
  • A compromised worker process cannot intercept log messages created by other workers.

Supervisor Process(es)

  • Supervisor processes are only intended for interactive use, not as unattended, long running processes.
  • A supervisor process never executes the eval message.

Supervisor processes do not run during normal Antiweb operation. The user must (as root) manually start a supervisor process. Typically, you will only run a supervisor process to attach to other processes for diagnostics/development/etc.

Antiweb Tip
Although you must start a supervisor process as root, root privileges are dropped once you have attached. They are dropped to the same UID/GID as the process you are attaching to.

Supervisor processes are created for almost every Antiweb operation, including -stats queries and adding or removing listeners. When you attach to a worker process, the hub passes the supervisor process's unix socket to the worker over the same unix socket that HTTP connections are transferred over.

CGI Process(es)

Antiweb Tip
The limit on the length of POST content is defined in src/libantiweb.h as AW_MAX_POST_LEN.

Antiweb is conditionally compliant with CGI/1.1. All useful environment variables are supported, as well as buffered POST content on standard input. PATH_INFO is supported and is implemented more efficiently than many other servers (including Antiweb3). Your CGI script may start receiving data before the entire POST content has been read by the worker. If the POST is terminated prematurely, the content may be cut short. Be sure to check you have read exactly $ENV{CONTENT_LENGTH} bytes.
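
For a CGI program written in C, the advice above about checking the POST length might look like the following sketch. The helper read_exactly is hypothetical. A CGI would call it on file descriptor 0 with the value of the CONTENT_LENGTH environment variable and treat a short return as a prematurely terminated POST.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until want bytes have arrived
 * or end-of-file is reached. Returns the number of bytes actually read
 * (less than want if the input was cut short), or -1 on error. */
static ssize_t read_exactly(int fd, void *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);

        if (n == 0)
            break;               /* EOF before the promised length */
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A single read() is not enough here because the worker may deliver the POST content in several pieces as it arrives from the client.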

  • CGI processes run under the same UIDs as the worker processes that spawn them. If you chroot your worker processes, expect using CGI to be more difficult.
  • CGI scripts have all file descriptors from their parent worker closed as well as the worker's epoll descriptor (kqueues don't transfer on fork()).
  • Requests to CGI scripts always close the HTTP connection. You don't need to send a "Connection: close" header; Antiweb will send that for you (unless you use naked CGI scripts).
  • CGI scripts are unable to issue Antiweb log messages or even indicate success/failure to Antiweb (it just ignores CGI process exit() codes). Zombies are reaped.

Confs and Xconfs

Antiweb is not written in Common Lisp for the sake of novelty. Common Lisp was selected because it is the best. Instead of writing our own config file parsers, string routines, memory allocators, condition systems, dynamic binding systems, just-in-time compilers, etc, etc., we rely heavily on many features of Common Lisp (but not its I/O or pathnames).

If you haven't used lisp before, Antiweb may require a major shift in thinking. Many things you may think are impossible are easy. Languages like C are designed to be easy to implement; Common Lisp is designed to be powerful to program. Even if you have programmed in lisp before, "The Antiweb Way" might still seem foreign. Just like Antiweb is not a conventional webserver, it is also not a conventional lisp program.

Antiweb Tip
The primary description of our lisp programming style is Doug Hoyte's book, Let Over Lambda.

This section describes the most important data-structures for installing, configuring, extending, and creating content for Antiweb: confs and xconfs.

  1. confs
  2. xconfs

confs

As an administrator, almost all Antiweb files you interact with are called conf files.

Conf files are text files filled with lisp forms. Every one of these lisp forms must be a list with the first element a symbol identifying the type of the form. This might be an example conf:

(worker blah)
(uid "my-user")

If a worker conf contains an inherit element, the corresponding file will also be parsed as a conf and its contents are appended to the current file's conf:

(inherit "/absolute/path/to/inherited.conf")

  • Circular inherits are detected and signal errors.

xconfs

The forms inside confs are called xconfs. They must begin with an identifying symbol, and then are optionally followed by arguments. The xconfs in the example confs from the previous section each had one mandatory argument, like this:

(name "value")

But here is an xconf with no mandatory arguments and one optional argument:

(name :arg "value")

Xconfs superficially resemble Common Lisp destructuring but are in fact very different. For instance, the same argument can be given multiple values in a single xconf:

(name :arg "value1" :arg "value2")

Depending on the type of xconf, Antiweb may use only "value1" or it may use both "value1" and "value2".

Xconfs can also have boolean arguments:

(name :arg "value1" :boolean-arg)
(name :boolean-arg :arg "value1")

The presence of the :boolean-arg keyword in the above examples turns on some functionality. In this respect, xconfs are closer to unix shell arguments than Common Lisp destructuring.

Confs and xconfs are designed to be intuitive and extremely flexible. The src/conf.lisp file has more details on how and why confs and xconfs work.

Filesystem Layout

Antiweb's installation procedure adds three (3) files to your filesystem. Their paths can be changed by editing build.lisp before compiling Antiweb. By default they are:

  • /usr/bin/antiweb - The perl launch script. All Antiweb processes are launched through this script interface.
  • /usr/lib/libantiweb32.so or /usr/lib/libantiweb64.so - The Antiweb shared library. Most of the code for accessing the network and filesystem is compiled and stored here. The name depends on your architecture.
  • /usr/lib/antiweb.cmu.image or /usr/lib/antiweb.clisp.image or /usr/lib/antiweb.ccl.image - A frozen image of the memory of an Antiweb lisp process. The name depends on your lisp environment.

You also need to create a hub directory. Inside this directory (typically /var/aw/):

  • hub.conf - Hub configuration. Includes which interfaces/ports to listen for connections on (though you can add more later without restarting), the user or UID the hub and logger processes should run as, and the file descriptor limit for the hub process.
  • hub.socket - Unix domain socket for connecting to the hub. It is owned by root and no other user can connect. All Antiweb administration, development, and inter-process communication is done through this socket.
  • empty/ - An empty directory. The hub process is chroot()ed to this directory. It should have no write permission to this directory.
  • aw_log/ - The log directory. The logger process is chroot()ed to this directory. This directory name starts with the "aw_" prefix because that is a special prefix and such directories will never be served by Antiweb through HTTP in case of an incorrectly configured HTML root. This convention dates back to Antiweb2. In any case, both of Antiweb's log files are owned and only readable by the logger user. The two log files are:
    • syslog - System log messages. Always check this file first if something isn't working.
    • axslog - Access logs. All valid HTTP requests are recorded here.
    Workers pass log messages to the hub after which they are passed to the logger process. There are two ways to make the logger reopen its log files.

There is one worker.conf file per worker process. These files can live anywhere but a good place is in the /var/aw/ hub directory. They should not be owned (or writeable) by any user that a worker or hub runs as.

Every worker that uses gzip content encoding, Anti Webpages, javascript minification or CSS minification must have a cache directory. This directory must be owned by the user the worker runs as. No other users should have write access to this directory. Cache directories should never be in /tmp/ and should never have sticky bits.

Unicode

Antiweb Tip
Although all data is internally stored as (and enforced to be) UTF-8, you can map static files to different charsets with the :mime-type parameter of the worker conf. Also, in CGIs you can send any mime-type you want.

Antiweb4 supports exactly one type of character data:

  • UTF-8 encoded unicode code points.

All filenames must be UTF-8. The HTML content in Anti Webpages is always UTF-8. Here are some technologies NOT supported:

  • ASCII
  • Latin-1
  • UTF-16
  • etc...

The plan is to store user data in Normalization Form C (but this is not implemented yet).

Anti Webpages

We love Perl. Anti Webpages are an attempt to bring the Perl spirit to the modern web and to make this style of programming extremely efficient. Anti Webpages are not idealist, they are pragmatic. The manual you are reading now is an Anti Webpage.

The Guiding Principles for Anti Webpages

  • Anti Webpages: When you just gotta git 'er done dammit.
  • If you want to write straight HTML or Javascript or CSS, go ahead and do it. Anti Webpages don't judge, they glue.
  • If you can't solve it with a regular expression, you aren't trying hard enough.
  • There's Even More Ways To Do It Than In Perl (TEMWTDITIP).
  • It doesn't matter how deep you go. There's no bottom.
  • Why duct-tape when you can super-glue?

See the Anti Webpages section of the manual for more details.