derpygamer2142.com

I LOVE C (my awesome web server)

this is my first blog post so be nice pretty please but also say if it is bad. for a less technical and nerdy overview, see the projects page.

seriously the amount of yap here is lethal. don't read this unless you're a giga nerd.


This code is written for Linux. I would rather eat glass than implement this in Windows. I do not apologize for any inconvenience this may cause. If you have any complaints, please breed and train a carrier pigeon before using the information on the contact page to mail them to me, where they can then be promptly burned.

All code in this article is licensed under MPL-2.0.

Once upon a time, I was sitting around thinking about how it had been an agonizingly long time (1 week) since I had last found an excuse to use C for something or other. I then got to pondering about how web servers work. I knew vaguely that you could communicate remotely through sockets or something because I had used those for my Buckshot Roulette project, but that was through a fairly abstracted Python library and no one really explained how the code worked, just that it did.

I began my exciting adventure by looking up "network programming in c" and came across Hands-On Network Programming with C by Lewis Van Winkle.

I highly recommend this book for an in-depth overview of socket programming in C on Windows, Linux, or MacOS. It also gave a jumping-off point for making a web server, which is what I was after.

Source code available here. Things may be omitted or modified for brevity.

    int create_socket(const char* host, const char* port) {
        struct addrinfo hints;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;
        
        struct addrinfo* bind_address;
        getaddrinfo(host, port, &hints, &bind_address);

        int socket_listen;
        socket_listen = socket(bind_address->ai_family,
            bind_address->ai_socktype, bind_address->ai_protocol);
        
        if (socket_listen < 0) {
            fprintf(stderr, "socket() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }


        if (bind(socket_listen, bind_address->ai_addr, bind_address->ai_addrlen)) {
            fprintf(stderr, "bind() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }
        freeaddrinfo(bind_address);

        if (listen(socket_listen, 64)) {
            fprintf(stderr, "listen() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }

        return socket_listen;
    }
    

Our code starts off pretty simple. Create an addrinfo struct to store hints about what we want our address to do, and another to store the actual address:

        struct addrinfo hints;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;

        struct addrinfo* bind_address;
    

The getaddrinfo(3) function takes in an argument for the host, the port, the hints about the address, and a location to store the output. We pass all of these, then use the resulting address to create a socket.

The ai_family, AF_INET, means that the address uses IPv4. An ai_socktype of SOCK_STREAM means that the socket will use TCP, the transport protocol that HTTP runs on top of. Finally, setting the AI_PASSIVE flag in ai_flags means that the address will be used to listen for connections via bind(2).

        getaddrinfo(host, port, &hints, &bind_address);

        int socket_listen;
        socket_listen = socket(bind_address->ai_family,
            bind_address->ai_socktype, bind_address->ai_protocol);
        
        if (socket_listen < 0) {
            fprintf(stderr, "socket() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }
    

I'll give a quick explanation of addresses and sockets, but I think this post is supposed to be more about the server and less about network programming as a whole. If you really care about that, please read Hands-On Network Programming with C or the man pages.

An address, described in the addrinfo struct of netdb.h, identifies an internet host and service in a way that can be used to either bind(2) (claim a local address to listen for incoming connections on) or connect(2) (create an outgoing connection).

A socket, described in sys/socket.h, is a file descriptor referencing a communications endpoint, and it's what allows you to send and receive data. This is still disconnected from the address until you bind(2) them, but they have the same protocol information.

        if (bind(socket_listen, bind_address->ai_addr, bind_address->ai_addrlen)) {
            fprintf(stderr, "bind() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }
        freeaddrinfo(bind_address);
    

We can bind the socket to the relevant address, and then free the address using the built-in freeaddrinfo(3) function because the data is no longer needed. The socket is now bound to that address, and is ready to listen.

        if (listen(socket_listen, 64)) {
            fprintf(stderr, "listen() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }

        return socket_listen;
    

We start listening for connections on that socket, with the second argument, 64, specifying the backlog: the maximum number of pending connections that can queue up while waiting for us to accept(2) them. It is not a cap on concurrent connections. We use 64 to leave some headroom for the inevitable onslaught of bots who just want to make our day worse by hammering the socket with connection attempts.
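
One thing create_socket() skips is checking the return value of getaddrinfo(3). Here is a minimal sketch of the same resolve, bind, and listen sequence with that check added. The name create_listener_sketch is made up for this example, it returns -1 instead of calling exit(), and the SO_REUSEADDR option is my own addition (an assumption, not something the server above does) so that restarting the server doesn't fail while old connections are still winding down.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of create_socket(): returns a listening fd, or -1 on failure. */
int create_listener_sketch(const char* host, const char* port) {
    assert(port != NULL);

    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    struct addrinfo* bind_address = NULL;
    int rc = getaddrinfo(host, port, &hints, &bind_address);
    if (rc != 0) {
        /* getaddrinfo() reports errors through its return value, not errno */
        fprintf(stderr, "getaddrinfo() failed. %s\n", gai_strerror(rc));
        return -1;
    }

    int fd = socket(bind_address->ai_family,
        bind_address->ai_socktype, bind_address->ai_protocol);
    if (fd < 0) {
        freeaddrinfo(bind_address);
        return -1;
    }

    /* Assumption: allow rebinding the port right after a restart */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    if (bind(fd, bind_address->ai_addr, bind_address->ai_addrlen) != 0
        || listen(fd, 64) != 0) {
        freeaddrinfo(bind_address);
        close(fd);
        return -1;
    }

    freeaddrinfo(bind_address);
    return fd;
}
```

Passing "0" as the port asks the kernel to pick any free port, which is handy for trying this out without root.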

        SSL_library_init();
        OpenSSL_add_all_algorithms();
        SSL_load_error_strings();

        ctx = SSL_CTX_new(TLS_server_method());
        if (!ctx) {
            fprintf(stderr, "SSL_CTX_new() failed\n");
            return 1;
        }
        if (!SSL_CTX_use_certificate_file(ctx, cert_path, SSL_FILETYPE_PEM)
        || !SSL_CTX_use_PrivateKey_file(ctx, key_path, SSL_FILETYPE_PEM)) {
            fprintf(stderr, "SSL_CTX_use_certificate_file() or SSL_CTX_use_PrivateKey_file() failed\n");
            ERR_print_errors_fp(stderr);
            return 1;
        }
    

OpenSSL provides some wonderful functions that let us set up SSL, so when we're ready we can upgrade connections from HTTP to HTTPS. These functions are fairly self explanatory, but they set up an SSL context with the relevant certificate and private key. Once again, if you care about the specifics of how SSL works, I don't get paid enough to explain it.

A bit of helper stuff before we start worrying about connections. I'll only show the function signatures because the implementations themselves don't matter here.

        // include/hashtable.h
        
        // Allocate and initialize a hash table of a given size, with 0 resulting in a default size
        HashTable* hash_table_init(unsigned int start_size);

        // Store a value at a given key in a hash table, where key is len bytes long. Returns the number of used entries.
        unsigned int hash_store(HashTable* table, char* key, int len, void* value);

        // Get the value stored at a given key of length len in a hash table.
        void* hash_get(HashTable* table, char* key, int len);

        // Free memory used by a given hash table, and optionally free() all values stored in hash table entries.
        void destroy_hash_table(HashTable* table, int free_entries);

        struct cached_response {
            time_t last_update;
            char* response;
            size_t length;
            size_t alloc_size;
        };

        // malloc and calloc, except they allocate memory in shared memory space using mmap(2)
        void* calloc_shm(size_t __nmemb, size_t __size);
        void* malloc_shm(size_t __size);
    

This is just a generic hash table structure, but the important part is that the table and its data are stored in shared memory. We'll use this hash table for caching responses. We also have a struct for the cached response, with the information we'll need to update the cache. Notably we store the allocated size of the response, for some shenanigans we'll talk about later.
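
The implementation of malloc_shm() isn't shown here, so the following is only a guess at the general technique: an anonymous mmap(2) mapping with MAP_SHARED stays shared across fork(2), which means a write made by the child is visible to the parent. The names shm_alloc_sketch and child_write_visible_sketch are invented for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for malloc_shm(): memory that survives fork() as shared. */
void* shm_alloc_sketch(size_t size) {
    void* p = mmap(NULL, size, PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Forks a child that writes 42 into shared memory, then returns what the parent sees. */
int child_write_visible_sketch(void) {
    int* shared = shm_alloc_sketch(sizeof(int));
    if (!shared) return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = 42; /* child: write into the shared mapping */
        _exit(0);
    }

    waitpid(pid, NULL, 0); /* parent: wait until the child has finished writing */
    int seen = *shared;
    munmap(shared, sizeof(int));
    return seen;
}
```

With regular malloc(3) memory the parent would still see 0, because each process gets its own copy of the heap after fork(2).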

        char* zlibGzip(char* input, unsigned int size, size_t* outSize);
    

Currently the only encoding method the web server supports is gzip, which uses zlib. It takes in the data to compress and its size, then returns the compressed data and writes the compressed size to outSize.

        #define MAX_REQUEST_SIZE 2048
        // Max time since last packet, in seconds
        #define CLIENT_TIMEOUT 8

        struct client_info {
            socklen_t address_length;
            struct sockaddr_storage address;
            int socket;
            char request[MAX_REQUEST_SIZE + 1];
            int received;
            struct client_info *next;
            SSL* ssl;
            int tls;
            time_t last_packet;
        };

        // Get a client_info struct given the socket by traversing the linked list
        // This also creates a new client_info if it doesn't exist.
        struct client_info* get_client(int socket);

        // Closes a socket and removes it from the client list
        void drop_client(struct client_info* client);
    

We store a bit of info about each client: the address and its size, used to accept connections; the socket itself, to send data; the request buffer and how many bytes they've sent so far; a pointer to the next client in the linked list; the SSL object for the connection, if applicable; whether or not the connection uses TLS; and the time of the last packet, used for timeouts. We also have some helper functions for those.
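
Since I only showed the signatures, here is a rough sketch of how a linked list with "find or create" and "remove" helpers can work. This is not the server's actual code: the struct is trimmed down to two fields, the _sketch names are made up, and the real drop_client() also closes the socket, which this version skips because the sockets here are fake.

```c
#include <assert.h>
#include <stdlib.h>

/* Trimmed-down stand-in for client_info */
struct client_node {
    int socket;
    struct client_node* next;
};

static struct client_node* client_list = NULL;

/* Find the node for a socket, or create one at the head of the list. */
struct client_node* get_client_sketch(int socket) {
    for (struct client_node* c = client_list; c; c = c->next) {
        if (c->socket == socket) return c;
    }

    struct client_node* n = calloc(1, sizeof(*n));
    if (!n) return NULL;
    n->socket = socket;
    n->next = client_list;
    client_list = n;
    return n;
}

/* Unlink a node from the list and free it. */
void drop_client_sketch(struct client_node* client) {
    struct client_node** p = &client_list;
    while (*p) {
        if (*p == client) {
            *p = client->next; /* point the previous link past this node */
            free(client);
            return;
        }
        p = &(*p)->next;
    }
}
```

Walking the list with a pointer-to-pointer means removing the head node doesn't need a special case.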

        // Represents an array of HTTP headers
        struct header_list {
            char** headers;
            int length;
        };

        // Returns an array of headers parsed from a request
        struct header_list* get_headers(char* request);

        // Free the header array
        void free_headers(struct header_list* headers);

        // Given a header key, find the value
        char* get_header_value(struct header_list* headers, char* header);

        // Represents a parsed HTTP request
        struct parsed_request {
            char* path;
            char* method;
            char* body;
            struct header_list* headers;
        };

        // Returns a mime type given a file path
        const char* get_content_type(const char* path);

        // Functions that send a generic response before dropping the client
        void send_400(struct client_info* client);
        void send_404(struct client_info* client);
        void send_501(struct client_info* client);

        // A function that directly sends a response to a client
        void send_uncached(struct client_info* client, char* response, int length);
    

That should cover all of the miscellaneous code we need for this.

        int https_server = create_socket("0.0.0.0", "443");
        int http_server  = create_socket("0.0.0.0", "80");

        responseCache = hash_table_init(0);
    

We initialize three things: the HTTPS socket on port 443, the HTTP server on port 80, and the response cache hash table. The hash table is a global variable so it can be accessed from all of the other functions.

        while (1) {
            fd_set reads;
            reads = wait_on_clients(https_server, http_server);

            ...
        }
    

In the main loop, we start off by declaring a file descriptor set that we will use to store the sockets that are ready to read, and then call a custom function to get those sockets.

        fd_set wait_on_clients(int https, int http) {
            fd_set reads;
            FD_ZERO(&reads);
            FD_SET(http,  &reads);
            FD_SET(https, &reads);
            int max_socket = http;
            if (https > max_socket) max_socket = https;


            struct client_info* ci = clients;
            while (ci) {
                FD_SET(ci->socket, &reads);
                if (ci->socket > max_socket) max_socket = ci->socket;
                ci = ci->next;
            }
            ...
        }
    

We zero the set and add the http and https sockets to it, but we also keep track of a max socket. select(2) needs this because its first argument is the highest-numbered file descriptor in any set plus one, and it only checks descriptors below that number, so it needs to be accurate. We also add all of the client sockets to the set, and check if max_socket needs to be updated.

select(2) can only monitor file descriptors numbered below FD_SETSIZE (usually 1024). This isn't a super big problem here, but it's bad practice. You should use poll(2).

        struct timeval timeout_struct;
        timeout_struct.tv_sec = CLIENT_TIMEOUT; // Timeout after CLIENT_TIMEOUT seconds
        timeout_struct.tv_usec = 0; // must be set too, otherwise the timeout contains garbage

        if (select(max_socket+1, &reads, 0, 0, &timeout_struct) < 0) {
            fprintf(stderr, "select() failed. %s (%d)\n", strerror(errno), errno);
            exit(1);
        }

        return reads;
    

We can call select(2), passing the maximum socket we want checked (plus one) as well as the reads set. This function will block until a socket in the set is ready or the timeout in the last argument expires, and since we only passed a set for readfds, it will only return early when a socket is ready to be read from. The set will be modified to contain only the sockets that are ready.
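
For comparison, this is roughly what the same wait looks like with poll(2), which takes an array of descriptors instead of a fixed-size bit set. It's a sketch, not what the server does: wait_readable_sketch is a made-up helper, and the limit of 64 descriptors is an arbitrary choice to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for any of the n descriptors.
   Sets ready[i] to 1 for each fds[i] that can be read (or has hung up).
   Returns poll()'s count of descriptors with events, 0 on timeout, -1 on error. */
int wait_readable_sketch(const int* fds, int* ready, size_t n, int timeout_ms) {
    assert(n <= 64);
    struct pollfd pfds[64];

    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN; /* we only care about readability */
        pfds[i].revents = 0;
    }

    int count = poll(pfds, (nfds_t) n, timeout_ms);
    if (count < 0) return -1;

    for (size_t i = 0; i < n; i++) {
        ready[i] = (pfds[i].revents & (POLLIN | POLLHUP)) != 0;
    }
    return count;
}
```

There is no max socket to track here, and the descriptor numbers can be as large as they like.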

        if (FD_ISSET(https_server, &reads)) {
            struct client_info* client = get_client(-1);

            client->socket = accept(https_server,
                (struct sockaddr*) &(client->address),
                &client->address_length
            );

            client->ssl = SSL_new(ctx);

            SSL_set_fd(client->ssl, client->socket);
            if (SSL_accept(client->ssl) <= 0) {
                fprintf(stderr, "SSL_accept() failed\n");
                ERR_print_errors_fp(stderr);


                drop_client(client);
            }
            else {
                client->tls = 1;
            }


        }
        ...
    

Jumping back to main, we check if the HTTPS server is in the set of sockets ready to read with FD_ISSET(3). If it is, we accept whatever new socket is waiting for us and store the data in a client struct.

Since this is the HTTPS server, it will be used for SSL stuff. We initialize an SSL structure with SSL_new(3), attach it to the client socket with SSL_set_fd(3), and finally accept the SSL connection with SSL_accept(3).

If all goes well, we enable the TLS flag on the client. Otherwise we drop the client, which is fine because the fallback is a plain HTTP request.

        if (FD_ISSET(http_server, &reads)) {
            struct client_info* client = get_client(-1);
            client->tls = 0;

            client->socket = accept(http_server,
                (struct sockaddr*) &(client->address),
                &client->address_length
            );
        }

    

If the HTTP server has any new sockets to accept, we just make a new client. Not too difficult. We do set the TLS flag to 0 to begin with just to ensure we don't have any problems.

        time_t current;
        time(&current);

        while (client) {

            struct client_info* next = client->next;
            if (FD_ISSET(client->socket, &reads)) {

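                if (MAX_REQUEST_SIZE == client->received) {
                    send_400(client);
                    client = next; // advance first: send_400() drops the client, and continue skips the update at the bottom
                    continue;
                }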

                int r;
                if (client->tls) r = SSL_read(client->ssl, client->request + client->received, MAX_REQUEST_SIZE-client->received);
                else r = recv(client->socket, client->request + client->received, MAX_REQUEST_SIZE-client->received, 0);

                ...


            }
            else {
                if (current - client->last_packet > CLIENT_TIMEOUT) {
                    drop_client(client);
                }
            }
            client = next;
        }
    

This code isn't very complicated, but I'm segmenting it for readability. We traverse the linked list, and for every client we check if it has data to read. If it does, we check if the request has already filled the buffer to prevent overflow. Then we either read the data through an OpenSSL wrapper (it will handle decryption for us) or the generic recv(2) function.

The output gets stored in the request field of the client, and we make sure to not exceed the maximum buffer size (which would be bad). Since TCP is a byte stream and doesn't guarantee we will get the full request in one read, we need to store the amount of data received so we can keep appending to the request across reads.

If the client doesn't have new data we check how long it's been since it last sent a packet. If it's more than the constant CLIENT_TIMEOUT, we close the connection.

        ...

        if (r < 1) {
            printf("Unexpected disconnect from %s\n", get_client_address(client));
            drop_client(client);
        }
        else {
            client->received += r;
            client->request[client->received] = 0;
            char* q = strstr(client->request, "\r\n\r\n");
            client->last_packet = current;
            if (q) {
                handle_request(client, client->request, SERVE_PATH);
            }

        }
    

A return value less than 1 means the client disconnected (0) or an error occurred (negative), so we drop the client. Otherwise we update the amount of data received. We then add a null terminator so we can search for the blank line (\r\n\r\n) that ends the HTTP request headers, and if it's present we handle the request.
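
To make the "keep appending until we see the blank line" idea concrete without needing a real socket, here is a small sketch that takes chunks as if they came from recv(2). The request_buffer struct and append_chunk_sketch are invented for this example; the real server keeps this state in client_info.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_REQUEST 2048

struct request_buffer {
    char data[SKETCH_MAX_REQUEST + 1]; /* +1 leaves room for the null terminator */
    int received;
};

/* Appends one chunk to the buffer.
   Returns 1 once the end of the headers has arrived, 0 if more data is needed,
   and -1 if the chunk doesn't fit. */
int append_chunk_sketch(struct request_buffer* buf, const char* chunk, int len) {
    assert(buf != NULL && chunk != NULL && len >= 0);
    if (len > SKETCH_MAX_REQUEST - buf->received) return -1;

    memcpy(buf->data + buf->received, chunk, (size_t) len);
    buf->received += len;
    buf->data[buf->received] = '\0'; /* terminate so strstr() is safe */

    return strstr(buf->data, "\r\n\r\n") != NULL;
}
```

The request can be split anywhere, even in the middle of the \r\n\r\n sequence, and the search still works because it always runs over everything received so far.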

        void handle_request(struct client_info* client, char* request, char* serve) {
            struct parsed_request* parsed = parse_request(client, request);
            if (!parsed) return;

            ...
        }
    

To start off, we need to parse the request. Very fun.

        struct parsed_request* parse_request(struct client_info* client, char* http_request) {
            struct parsed_request* parsed = (struct parsed_request*) calloc(1, sizeof(struct parsed_request));

            char* original = http_request;
            int space = strcspn(http_request, " \r\n"); // get the number of characters to the next space or newline
            // we check for newline as well to make sure we don't overflow to the next line
            if (space == strlen(http_request)) { // if there isn't any, malformed request
                send_400(client);
                free(parsed);
                return 0;
            }
            int nline = strcspn(http_request, "\r\n");
            if (nline == space) { // if we made it to the end of the line without seeing a space then it's malformed
                send_400(client);
                free(parsed);
                return 0;
            }
            char* method = malloc(space + 1);
            method[space] = '\0'; // terminate method string
            strncpy(method, http_request, space);

            ...
        }
    

The HTTP specification requires 3 things on the first line of a request: the method, the resource identifier (the path to request), and the protocol version. These are required to be separated by a single space, with the only new line coming after all of the required elements.

We store a pointer to the original string for later use, then get the number of characters to the next space or newline. If this is equal to the length of the request, it means there aren't any (or it's malformed to begin with). This is used to make sure the line has space characters, and we look for newline characters as well to prevent overflow to the next line. We also get the number of characters to the next newline, and if this is equal to the number of characters to the space or newline we know that there aren't any spaces.

Following all that verification, we can allocate some memory for the method string and then write it.

        ...

        http_request += space+1;
        space = strcspn(http_request, " \r\n"); // get the number of characters to the next space
        nline = strstr(http_request, "\r\n")-http_request; // number of characters to the first newline
        if (space == strlen(http_request)) { // if there isn't one, it's missing the path
            send_400(client);
            free(method);
            free(parsed);
            return 0;
        }
        if (nline == space) { // same as before, still on the same line
            send_400(client);
            free(method);
            free(parsed);
            return 0;
        }
        char* path = malloc(space + 1);
        strncpy(path, http_request, space);
        path[space] = '\0'; // terminate path string

        ...
    

We jump past the last space and do the same thing, except now grabbing the request path. All of this is about the same, but note how we get the newline position by taking the pointer to the next newline and subtracting the pointer to the start of the string, yielding the number of characters between the two pointers.

        ...
        
        http_request += space;
        char* next = strstr(http_request, "\r\n"); // get the next line in the request
        // can't use nline for this because it could be \r or \n
        if (next) http_request = next+2;
        if (http_request[0] == '\r' || http_request[0] == '\n' || http_request[0] == ' ' || http_request[0] == '\0') {
            parsed->headers = 0;
        }
        else {
            parsed->headers = get_headers(http_request);
        }

        // search from the end of the request line so a request with no headers still finds the blank line
        char* end = next ? strstr(next, "\r\n\r\n") : 0;
        parsed->body = end ? end+4 : 0;
        parsed->method = method;
        parsed->path = path;

        return parsed;
    

We once again jump past the space character, and now look for the next newline. We don't care about the HTTP protocol version (granted, we are assuming it's an HTTP request). We don't reuse the nline variable because we are looking for the string \r\n specifically.

If the character after the newline is whitespace or null then there are no headers, and we can just ignore that field of the request, otherwise we parse the headers (discussed below). We then assign the fields and return the parsed request.
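
If you want to poke at the request line logic on its own, here is a condensed sketch of the same strcspn(3) idea. It is not the parse_request() above: split_request_line_sketch is a made-up name, and it copies into caller-provided buffers instead of allocating.

```c
#include <assert.h>
#include <string.h>

/* Copies the method and path out of the first line of a request.
   Returns 0 on success, -1 if the line is malformed or a piece doesn't fit. */
int split_request_line_sketch(const char* request,
                              char* method, size_t method_size,
                              char* path, size_t path_size) {
    assert(request != NULL && method != NULL && path != NULL);

    /* characters up to the first space or line ending */
    size_t m = strcspn(request, " \r\n");
    if (m == 0 || request[m] != ' ' || m >= method_size) return -1;

    /* same again for the path, starting just past the space */
    const char* rest = request + m + 1;
    size_t p = strcspn(rest, " \r\n");
    if (p == 0 || rest[p] != ' ' || p >= path_size) return -1;

    memcpy(method, request, m);
    method[m] = '\0';
    memcpy(path, rest, p);
    path[p] = '\0';
    return 0;
}
```

Checking that the character we stopped on is a space covers both failure cases at once: hitting the end of the string and hitting the end of the line.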

        struct header_list* get_headers(char* request) {
            #define MAX_HEADERS 67

            char** headers = calloc(MAX_HEADERS, sizeof(char*));
            int i = 0;
            while (i < MAX_HEADERS) {
                char* next = strstr(request, "\r\n");
                // next newline is right after the current pointer, that's the start of the body
                // (also stop if the request ends without another newline)
                if (!next || next == request) break;

                char* temp = malloc(1 + next-request);
                temp[next-request] = '\0'; // terminate string

                strncpy(temp, request, next-request);
                headers[i] = temp;
                request = next+2;
                i++;
            }

            // also reached when MAX_HEADERS is hit, so every path returns a list
            struct header_list* retobj = calloc(1, sizeof(struct header_list));
            retobj->headers = headers;
            retobj->length = i;
            return retobj;
        }
    

The code for parsing headers is pretty simple. It allocates an arbitrary maximum number of headers, then it finds every newline and adds the string between the current position and the newline to the header list. It knows to stop when the next newline is on top of the current position, because that indicates that it reached two newlines in a row.
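
Once the headers are split into lines, looking one up is a loop over the array. I didn't show get_header_value(), so this is just one plausible way to do it, with a made-up name (find_header_sketch). Header names are case-insensitive in HTTP, which is why it uses strncasecmp(3).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Returns a pointer to the value of the named header inside its line,
   or NULL if the header isn't present. */
const char* find_header_sketch(char* const* headers, int count, const char* name) {
    assert(headers != NULL && name != NULL);
    size_t name_len = strlen(name);

    for (int i = 0; i < count; i++) {
        const char* line = headers[i];

        /* the name must match and be followed directly by a colon */
        if (strncasecmp(line, name, name_len) != 0 || line[name_len] != ':') continue;

        const char* value = line + name_len + 1;
        while (*value == ' ' || *value == '\t') value++; /* skip optional whitespace */
        return value;
    }
    return NULL;
}
```

The returned pointer points into the existing header line, so nothing extra needs to be freed.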

Now that we've talked about parsing the request, we can jump back to the handling of requests.

        void handle_request(struct client_info* client, char* request, char* serve) { // request is null terminated
            printf("Request %s\n", request);
            struct parsed_request* parsed = parse_request(client, request);
            if (!parsed) return;

            if (strncmp("/", parsed->path, 1)) { // not a path
                send_400(client);
            }
            else {
                ... (everything else)
            }

            
            free(parsed->path);
            free(parsed->method);
            free_headers(parsed->headers);
            free(parsed);
        
        }
    

That else {} is hiding quite a lot, but for the moment we'll just talk about this short snippet. The sole requirement for a path to be considered is that it starts with a slash, and if it doesn't we send a 400. We also free the parsed request after handling it.

        ...
        else {
            if (!strcasecmp(parsed->method, "GET") || !strcasecmp(parsed->method, "HEAD")) {
                if (strchr(parsed->path, ' ')) {
                    send_400(client); // fall through to the frees instead of returning early
                }
                else if (!serve_directory(serve, client, parsed->path, parsed)) {
                    send_404(client);
                }
            }
            else if (!strcasecmp(parsed->method, "OPTIONS")) {
                ...
            }
        }
        ...
    

Once again, I'm hiding that else {} block to keep things contained. The HTTP specification requires servers to provide the HEAD method for resources, so we accept that as well as GET. We don't allow spaces in the path, and in the event that one somehow got in we check for that and deny it if needed. The magical serve_directory() function will be shown later, but for now all you need to know is that it returns 0 if no response was sent, and 1 if it was.

        else if (!strcasecmp(parsed->method, "OPTIONS")) {
            if (!strcmp(parsed->path, "*")) {
                char wildcard_options[2048];
                sprintf(wildcard_options,
                    "HTTP/1.1 200 OK\r\n"
                    "Allow: OPTIONS, GET, HEAD\r\n"
                    "Cache-Control: max-age=%i\r\n"
                    "Content-Length: 0\r\n\r\n",
                    CACHE_LIFETIME
                );

                send_uncached(client, wildcard_options, strlen(wildcard_options));
            }
            else {
                char* filepath = sanitize_file_path(serve, parsed->path, client);
                if (!filepath) {
                    // invalid path: sanitize_file_path() already sent a 400
                }
                else if (access(filepath, F_OK) == 0) { // Check if the file path exists
                    char response[2048];
                    sprintf(response,
                        "HTTP/1.1 200 OK\r\n"
                        "Allow: OPTIONS, GET, HEAD\r\n"
                        "Cache-Control: max-age=%i\r\n"
                        "Content-Length: 0\r\n\r\n",
                        CACHE_LIFETIME
                    );
                    send_uncached(client, response, strlen(response));
                }
                else {
                    char response[2048];
                    sprintf(response,
                        "HTTP/1.1 404 Not Found\r\n"
                        "Allow: OPTIONS, GET, HEAD\r\n"
                        "Cache-Control: max-age=%i\r\n"
                        "Content-Length: 0\r\n\r\n",
                        CACHE_LIFETIME
                    );
                    send_uncached(client, response, strlen(response));
                }

                free(filepath);
            }
        }
        else {
            // we are only looking at GET, HEAD, and OPTIONS requests right now
            send_501(client);
        }
    

The HTTP specification also requires OPTIONS to be accepted, though in my (albeit minimal) testing I didn't see any browsers using it. There are three branches here:

The parsed path is a wildcard: "*". This requires the server to send general info about itself. I couldn't find a whole lot of information about what this was supposed to be, so we provide the cache lifetime (specified here in the CACHE_LIFETIME constant) as well as the methods that the server can handle. Note that we also send a Content-Length of 0, as this is required by the HTTP specification.

The parsed path isn't a wildcard. This means the client wants information about a specific resource, so the server can send the status as well as accepted methods. We use a magic path sanitization function that you can see if you behave. If the file exists, we send a 200 status.

If the file doesn't exist, we send a 404 status. That seems to be the only thing that really matters here, so that's the only thing that's different between the two. Since sanitize_file_path() returns an allocated piece of memory we need to free it.

Oh, and we also send a 501 for everything else.
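
The three OPTIONS responses only differ in the status line, so they could be built by one helper. Here is a sketch of that idea using snprintf(3), which refuses to write past the end of the buffer. build_options_response_sketch is a hypothetical name, the max-age value is passed in rather than read from CACHE_LIFETIME, and the header is spelled Cache-Control, which is the standard name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes an OPTIONS response into out.
   Returns the response length, or -1 if it didn't fit in out_size bytes. */
int build_options_response_sketch(char* out, size_t out_size, int found, int max_age) {
    assert(out != NULL && out_size > 0);

    int n = snprintf(out, out_size,
        "HTTP/1.1 %s\r\n"
        "Allow: OPTIONS, GET, HEAD\r\n"
        "Cache-Control: max-age=%d\r\n"
        "Content-Length: 0\r\n\r\n",
        found ? "200 OK" : "404 Not Found",
        max_age
    );

    /* snprintf returns the length it wanted to write, so n >= out_size means truncation */
    if (n < 0 || (size_t) n >= out_size) return -1;
    return n;
}
```

The return value doubles as the length to pass to whatever sends the response, so there's no need for a separate strlen() call.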

        char* sanitize_file_path(char* directory, char* path, struct client_info* client) {
            int shouldfree = 0;

            if (strcmp(path, "/") == 0) path = "/index.html";
            if (strrchr(path, '.') <= strrchr(path, '/')) {
                char* temppath = malloc(strlen(path) + strlen(".html") + 1);
                strcpy(temppath, path);
                strcat(temppath, ".html");
                path = temppath;
                shouldfree = 1;
            }
            int pathlen = strlen(path);
            if (path[pathlen-1] == '/') {
                path[pathlen-1] = '\0';
            }
            if (strlen(path) > 100) { // too long, ignore
                if (shouldfree) free(path);
                send_400(client);
                return 0;
            }
            if (strstr(path, "..")) { // cringe path traversal attempt
                if (shouldfree) free(path);
                send_400(client);
                return 0;
            }

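            // size the buffer from the actual strings so a long serve directory can't overflow it
            size_t full_len = strlen(directory) + strlen(path) + 1;
            char* full_path = malloc(full_len);
            snprintf(full_path, full_len, "%s%s", directory, path);
            if (shouldfree) free(path); // we don't need to know the path anymore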

            return full_path;
        }
    

This function has a lot of steps and is probably susceptible to all sorts of tomfoolery, but it works well enough. Silver star to the first person who reads outside of the served directory.

/ gets corrected to /index.html for convenience, if the last part of the path has no . in it we add a .html, if a path ends with a / we get rid of it, if the path is longer than 100 characters we reject it with a 400, and if the path contains .. we reject it too. After all that we write it to an allocated string and return it.
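
String checks like the ones above can't see symlinks. A different approach (not what this server does, just a sketch of an alternative) is to let realpath(3) resolve both the served directory and the requested file, and then check that one is a prefix of the other. path_is_inside_sketch is a made-up name, and note that realpath(3) only works on paths that actually exist.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if candidate resolves to something inside root, 0 if it doesn't,
   and -1 if either path can't be resolved (for example, it doesn't exist). */
int path_is_inside_sketch(const char* root, const char* candidate) {
    assert(root != NULL && candidate != NULL);

    /* passing NULL makes realpath() allocate the result for us */
    char* real_root = realpath(root, NULL);
    char* real_path = realpath(candidate, NULL);

    int result = -1;
    if (real_root && real_path) {
        size_t n = strlen(real_root);
        if (n == 1) {
            result = 1; /* root is "/", which contains everything */
        }
        else {
            /* the prefix must end on a path boundary so /srv/www doesn't match /srv/www2 */
            result = strncmp(real_root, real_path, n) == 0
                && (real_path[n] == '/' || real_path[n] == '\0');
        }
    }

    free(real_root);
    free(real_path);
    return result;
}
```

Because the comparison happens after resolution, a path with .. in it or a symlink pointing out of the directory both end up compared by where they really lead.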


Earlier I mentioned a serve_directory() function that handles GET and HEAD requests. That's what we're going to be going over now.

        int serve_directory(char* directory, struct client_info* client, char* path, struct parsed_request* request) {
            char* full_path = sanitize_file_path(directory, path, client);
            if (!full_path) return 1;

            FILE* fp = fopen(full_path, "rb");

            if (!fp) {
                printf("Not found: %s\n", path);
                free(full_path);
                return 0;
            }

            char* key = generate_hash_key(request);

            // if there is a cached response younger than CACHE_LIFETIME, send that
            // otherwise go through the normal stuff

            struct cached_response* cached = hash_get(responseCache, key, strlen(key));
            ...
        }
    

We first sanitize the file path into a full one, returning if it's invalid. Otherwise we open the file to prepare reading, and generate a hash key for the request. The generate_hash_key() function isn't really important, but it generates a unique and deterministic key depending on the resource and any transformative headers, such as deflation. We then use that key to get the cached response, which will be 0 if it hasn't been cached yet.

        if (!cached) {
            // initialize a cache entry to be updated by the child thread
            // this is done in the main thread to avoid problems with resizing memory on child threads
            // note: i don't think resizing the memory on the main thread will cause problems
            // because the memory should be preserved until all child threads are killed
            // but i'm not completely sure
            struct cached_response* entry = calloc_shm(1, sizeof(struct cached_response));
            entry->last_update = 0;
            entry->length = 0;
            entry->alloc_size = CACHE_REQUEST_ALLOC;
            entry->header_length = 0;
            entry->response = (char*) malloc_shm(CACHE_REQUEST_ALLOC);
            hash_store(responseCache, key, strlen(key), (void*) entry);
            cached = entry; // make it easier for the child thread to edit the entry
        }
        else {
            if (cached->length > cached->alloc_size) {
                munmap(cached->response, cached->alloc_size);
                cached->response = (char*) malloc_shm(cached->length);
                cached->alloc_size = cached->length;
            }
        }
        free(key);
    

If there isn't a cache entry, we allocate one in shared memory (so the child processes can access it), initialize it, and allocate a fixed amount of memory for the response before storing the new entry in the cache.

The CACHE_REQUEST_ALLOC part is a bit odd. Since the child process is the one that fills the cache entry, the parent doesn't know how much to allocate for the response. It allocates an arbitrary amount, and the child records the actual length of the response and only stores the response if there's enough space. If the same request gets hit again, the parent checks that the allocated memory is sufficient, and if it isn't, it reallocates the amount actually needed.

While this isn't ideal, it does work. Note that we also free the key because we don't need it anymore.

        if (!fork()) {
            ...
        }
        else {
            free(full_path);
            drop_client(client);
            return 1;
        }
    
This bit is why this program is for Linux/whatever else. Forking is super weird on Windows and I don't want to deal with it.

For the parent, fork(2) returns the process ID of the child, and for the child it returns 0. I'm hiding the child branch for now because it's a lot, but the parent process frees the path because it no longer needs it and then drops the client. Dropping the client is important because the socket stays open until every process holding the file descriptor has closed it, so the connection would survive even after the child closes its copy, and older browsers won't automatically close the connection.
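
To see the "every holder must close it" rule in isolation, here is a small sketch that uses a pipe instead of a socket (the same descriptor rules apply). The function name is made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns the number of bytes read before EOF, or -1 on error.
// The child writes one byte and exits, which closes its copies of the pipe.
// The parent only sees EOF because it also closes its own copy of the write end.
long fork_and_read_until_eof(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) { // child: fork() returned 0
        close(fds[0]);
        if (write(fds[1], "x", 1) != 1) _exit(1);
        close(fds[1]);
        _exit(0);
    }

    // parent: fork() returned the child's pid
    close(fds[1]); // without this, read() would block forever after the first byte

    long total = 0;
    char c;
    ssize_t n;
    while ((n = read(fds[0], &c, 1)) > 0) total += n;

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : total;
}
```

Removing the parent's close(fds[1]) makes the loop hang, which is the pipe version of a browser waiting on a connection the server never fully closed.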

        time_t timer;
        time(&timer);
        char* out = 0;

        if (timer - cached->last_update >= CACHE_LIFETIME) {
            ...
        }

        int send_head = !strcasecmp(request->method, "HEAD");
        int len = (int) (send_head ? cached->header_length : cached->length);

        if (client->tls) {
            SSL_write(client->ssl, out ? out : cached->response, len);
        }
        else {
            send(client->socket, out ? out : cached->response, len, 0);
        }
        if (out) free(out);

        fclose(fp);
        drop_client(client);
        free_headers(request->headers);
        free(request->path);
        free(request->method);
        free(request);
        free(full_path);
        exit(0);
    

I'm once again hiding the bulk of the code here; this time it's the part that handles updating the cache. If the request method is HEAD we only send the status line and headers to the client, which we already know the length of. Otherwise we send the full response. Either way it goes out through SSL or the raw socket, depending on the connection. We free a whole lot of stuff to prevent memory leaks (though the process is about to exit anyway) and drop the client. As mentioned before, if the child doesn't close the socket it will remain open.
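
One detail worth knowing about send(2): on a stream socket it is allowed to write fewer bytes than requested, so a single call isn't guaranteed to deliver a large response. A minimal sketch of a retry loop for the plain-socket branch follows; send_all is a hypothetical helper and not part of the server's source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical helper: keeps calling send() until len bytes have been written.
// Returns 0 on success, -1 on error.
int send_all(int fd, const char* data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR) continue; // interrupted by a signal, try again
            return -1;
        }
        sent += (size_t) n;
    }
    return 0;
}
```

It would be called in place of the bare send(), for example send_all(client->socket, cached->response, len).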

        if (timer - cached->last_update >= CACHE_LIFETIME) {
            fseek(fp, 0L, SEEK_END);
            size_t content_length = ftell(fp);
            rewind(fp);
            const char* content_type = get_content_type(full_path);
            char* filedata = malloc(content_length + 1);

            fread(filedata, content_length, 1, fp);
            char* accepted_encoding = get_header_value(request->headers, "Accept-Encoding");
            char* encoding = 0;

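            if (accepted_encoding && strstr(accepted_encoding, "gzip")) {
                size_t compressed_length = 0;
                char* compressed = zlibGzip(filedata, content_length, &compressed_length);
                if (compressed) { // fall back to the uncompressed body if compression failed
                    free(filedata);
                    filedata = compressed;
                    content_length = compressed_length;
                    encoding = "gzip";
                }
            }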
            

            ...
        }
    

This is the cache updating code, which runs if the cache entry has expired. Since the parent process initializes a new entry's update time to 0, it also runs for entries that have never been filled.

We use fseek(3) to move the file position to the end of the file, read the position with ftell(3) to get the file size, then rewind to the beginning to start reading. We read the file into a buffer of exactly that size (plus one byte), then get the Accept-Encoding header to determine how we can compress the response.

Currently the only supported encoding is gzip, which uses a wrapper function that returns the compressed data and stores its length in a given variable. Since we don't need the uncompressed file data anymore we free it, and we remember the encoding for later.
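
The file-reading part can be pulled into a small helper that also checks the return values. This is a sketch only; read_whole_file is a hypothetical name and the server does this inline.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads a whole file into a malloc'd buffer.
// Stores the size in *out_size and returns NULL on any failure.
char* read_whole_file(const char* path, size_t* out_size) {
    FILE* fp = fopen(path, "rb");
    if (!fp) return NULL;

    char* data = NULL;
    if (fseek(fp, 0L, SEEK_END) == 0) {
        long size = ftell(fp); // ftell returns -1 on error
        if (size >= 0) {
            rewind(fp);
            data = malloc((size_t) size + 1);
            if (data && fread(data, 1, (size_t) size, fp) == (size_t) size) {
                data[size] = '\0'; // convenience terminator, not counted in the size
                *out_size = (size_t) size;
            }
            else {
                free(data); // free(NULL) is fine if malloc failed
                data = NULL;
            }
        }
    }

    fclose(fp);
    return data;
}
```

The size is returned separately because the file can contain null bytes, so strlen() on the buffer would not be reliable.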

        if (...) {
            ...

            #define BSIZE 2048
            char headers[BSIZE] = "HTTP/1.1 200 OK\r\n"
                                "Connection: close\r\n";

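            // snprintf keeps each write inside the fixed-size buffer
            if (encoding) snprintf(headers+strlen(headers), BSIZE-strlen(headers), "Content-Encoding: %s\r\n", encoding); // write the encoding header if it was encoded
            snprintf(headers+strlen(headers), BSIZE-strlen(headers), "Content-Length: %zu\r\nContent-Type: %s\r\n\r\n", content_length, content_type); // %zu is the format for size_t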

            cached->header_length = strlen(headers);

            char* buffer = malloc(strlen(headers)+content_length+1);
            strcpy(buffer, headers);
            
            int length = strlen(buffer);
            // the body may contain null bytes, so copy by length instead of with strcpy
            memcpy(buffer+length, filedata, content_length);
            buffer[length+content_length] = '\0';

            if (cached->alloc_size >= length+content_length+1) {
                memcpy(cached->response, buffer, length+content_length+1);
                time(&cached->last_update);
                free(buffer);
            }
            else {
                out = buffer;
            }

            cached->length = length+content_length+1;
        }
    

We use a fixed size string for the headers, and if a content encoding was used we add that to the response. Then we add the Content-Length and Content-Type headers and copy it into one final perfectly sized buffer. We then copy the file data into the buffer and null terminate it.

Finally, we check if the amount of memory allocated for the response cache earlier was enough, and if it was we store it and the update time, otherwise we use a temporary variable, out, to store the response.
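
The header-building step can also be written as a single bounded snprintf(3) call. The sketch below is one possible way to do it, not the server's actual code; build_headers and its parameters are hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: writes the status line and headers into buf.
// Returns the header length, or -1 if buf is too small.
int build_headers(char* buf, size_t size, const char* content_type,
                  const char* encoding, size_t content_length) {
    int n = snprintf(buf, size,
        "HTTP/1.1 200 OK\r\n"
        "Connection: close\r\n"
        "%s%s%s" // optional Content-Encoding line
        "Content-Length: %zu\r\n"
        "Content-Type: %s\r\n\r\n",
        encoding ? "Content-Encoding: " : "",
        encoding ? encoding : "",
        encoding ? "\r\n" : "",
        content_length, content_type);

    // snprintf returns the length it wanted to write, so n >= size means truncation
    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The return value plays the role of cached->header_length, which is what lets a HEAD request reuse the same cached buffer.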



And that's it! Only around 900 lines of poorly formatted code and bad decisions, and we have an HTTP/1.1 compliant web server, complete with caching, compression and TLS!

Notes

This is most likely full of holes. Don't use this for anything important, and I'll be running it in a container just in case. The sanitize_file_path() function alone has my brain turning just enough to be concerning, and there's probably dozens of issues that I'm not seeing. Like I mentioned before, silver star to anyone who figures out how to read outside of the served directory. Gold star to anyone who can manipulate the star or get remote code execution.

Older browsers are picky about encoding, gzip/deflate specifically. Gzip includes some magic headers that modern browsers can do without, but older ones like Internet Explorer 6 need the headers spoon fed to them. You can't just run a response body through zlib's deflate(), you also have to use the correct arguments in deflateInit2(). You know what, since you were so well behaved and sat through my entire yap I'll show you my wrapper function:

        #define CHUNK 16384

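        // returns a malloc'd gzip buffer, or NULL if compression could not be set up
        char* zlibGzip(char* input, unsigned int size, size_t* outSize) {
            // based on https://zlib.net/zpipe.c
            z_stream stream;
            stream.zalloc = Z_NULL;
            stream.zfree  = Z_NULL;
            stream.opaque = Z_NULL;
            // windowBits of 15 + 16: 15 is the largest window, + 16 asks for a gzip header and trailer
            int ret = deflateInit2(&stream, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);

            if (ret != Z_OK) {
                fprintf(stderr, "Deflate init error\n");
                return NULL; // the stream is unusable, so don't keep going
            }

            // compressBound() is only documented for compress()/compress2(),
            // deflateBound() takes this stream's gzip wrapper into account
            size_t max = deflateBound(&stream, size);
            unsigned char* out = (unsigned char*)malloc(max);
            if (!out) {
                deflateEnd(&stream);
                return NULL;
            }

            stream.next_in = (unsigned char*)input;
            stream.avail_in = size;
            stream.next_out = out;
            stream.avail_out = max;

            // with enough output space, one Z_FINISH call compresses everything
            ret = deflate(&stream, Z_FINISH);

            *outSize = stream.total_out;
            deflateEnd(&stream);

            return (char*)out;
        }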
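
For reference, the "magic headers" are easy to check for by hand: a gzip stream starts with the bytes 0x1f 0x8b, followed by 0x08 for the deflate method. A quick sanity check for the wrapper's output could look like the sketch below; looks_like_gzip is a made-up helper name.

```c
#include <stddef.h>

// Returns 1 if data starts with the gzip magic bytes and the deflate method ID.
int looks_like_gzip(const unsigned char* data, size_t len) {
    return len >= 3
        && data[0] == 0x1f  // ID1
        && data[1] == 0x8b  // ID2
        && data[2] == 0x08; // CM = 8, deflate
}
```

Output produced with a plain windowBits of 15 starts with a zlib header (typically 0x78) instead, which is the difference between the two wrappers.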
    

That's it. If you really want to use this or read the full source, you can find it here.