Friday 25 November 2011

Apache Web Server



1. Introduction to Apache

The World Wide Web (WWW) is the Internet’s most successful application, and its most prominent component is the web server. A web server answers a user’s request by returning the requested web page. Two applications take part in processing such a request: a web server and a web client (typically a browser). They communicate using the Hypertext Transfer Protocol (HTTP).
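As a rough illustration of this request/response exchange, the Python sketch below starts a throwaway HTTP server on the loopback interface (standing in for a real web server such as Apache) and fetches one page with an HTTP client. The handler class name and the page text are made up for this example; only the standard-library modules http.server and http.client are used.

```python
import http.client
import http.server
import threading


class TestPageHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = f"<html><body>Test Page for {self.path}</body></html>".encode("utf-8")
        self.send_response(200)                      # status line: HTTP/1.0 200 OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                           # blank line ends the headers
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                         # keep the example quiet


def fetch_once(path="/"):
    """Serve exactly one request on 127.0.0.1 and fetch it with a client."""
    server = http.server.HTTPServer(("127.0.0.1", 0), TestPageHandler)  # port 0: any free port
    server.timeout = 5                               # don't wait forever for the request
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)                    # client sends "GET <path> HTTP/1.1" plus headers
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return status, body


if __name__ == "__main__":
    print(fetch_once("/index.html"))
```

Apache plays the server role in exactly this exchange; the browser plays the client role.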

According to Netcraft’s monthly secure server surveys, available at http://news.netcraft.com/, the Apache web server currently has a 68.01% market share, compared to its competitors Microsoft at 20.56% and Sun Microsystems at 2.47%.

The Apache HTTP Server is a project of the Apache Software Foundation, which also supports other open source projects, including Ant, SpamAssassin, Struts, and Tomcat. The version of the Apache web server used in this tutorial is 2.2.0. It can be downloaded from the official website at http://httpd.apache.org/download.cgi.

2. Installation

Apache comes pre-installed with most Linux distributions. Use the rpm -qa | grep httpd command to check whether it is installed. If Apache was installed from source code, this command will not produce any output; in that case, look for httpd, apache, or apache2 directories. If these directories exist on your system, Apache has already been installed.
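The two checks above can be scripted. The following Python sketch is only an illustration: it asks rpm whether the httpd package is installed using rpm -q httpd, a stricter form of the rpm -qa | grep httpd check that exits with a non-zero status when the package is missing, and, for source builds, looks for a list of candidate directories. The directory list is an assumption based on the paths used in this tutorial; adjust it for your system.

```python
import os
import shutil
import subprocess

# Assumed candidate locations; extend this list for your distribution.
CANDIDATE_DIRS = ["/etc/httpd", "/usr/local/apache2", "/etc/apache2"]


def rpm_has_httpd():
    """Return True if rpm reports the httpd package as installed."""
    if shutil.which("rpm") is None:          # not an rpm-based system
        return False
    result = subprocess.run(["rpm", "-q", "httpd"],
                            capture_output=True, text=True)
    return result.returncode == 0             # non-zero means "not installed"


def existing_apache_dirs(candidates=CANDIDATE_DIRS):
    """Return the candidate directories that actually exist."""
    return [path for path in candidates if os.path.isdir(path)]


if __name__ == "__main__":
    print("httpd rpm installed:", rpm_has_httpd())
    print("Apache directories found:", existing_apache_dirs())
```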

Apache can also be installed manually, either from an rpm package or from the source code. This tutorial demonstrates both methods.

2.1. Installing from the rpm

  1. Download Apache’s latest version from http://httpd.apache.org/download.cgi.        

# wget   \
        
2.   If you have already installed a previous version of Apache:

· From the rpm: uninstall it using the command:

# rpm -e httpd

· From a source installation: install the new rpm on a path different from that of the source installation. Apache’s rpm can be installed with the following command:

# rpm -ivh httpd-2.2.0-1.i386.rpm                                       

If you get any dependency errors regarding the Apache Portable Runtime (APR) packages, upgrade them to the version compatible with httpd-2.2.0-1, which is apr-1.2.2-1. It can be downloaded from http://apr.apache.org/.

3.   Verify the installation by running:

      # rpm -q httpd

Then browse to the "/etc/httpd" directory, where the rpm installs Apache’s configuration files.

2.2. Installing from the source

A number of options can be used to configure Apache. Customized installation will be discussed in the “References” section.

  • Download Apache from http://httpd.apache.org/download.cgi                

# wget http://apache.mirror99.com/httpd/httpd-2.2.0.tar.gz

  • Create an Apache directory in "/usr/local". This path is optional, and is being used for the purposes of this tutorial only.

Unpack the distribution and change to the source directory:

# tar zxvf httpd-2.2.0.tar.gz -C /usr/local
# cd /usr/local/httpd-2.2.0/

  • Run configure with the following options:

# ./configure --with-layout=Apache --prefix=/usr/local/apache2 \
      --enable-mods-shared=most

  • Run make to compile the distribution:

# make

  • Install Apache by running the following command:

# make install

3. Apache Configuration

If you are using the pre-installed Apache that comes with your distribution, it is probably installed in /etc/httpd. If you built it from source following the procedure in the previous section, the path is /usr/local/apache2. This tutorial uses $APACHE_HOME to refer to this installation path ("/etc/httpd" or "/usr/local/apache2").

Apache runs in the background as a daemon that handles requests continuously. By default, the Apache configuration file, httpd.conf, specifies port 80. Binding to port 80 requires root privileges, so start Apache as root with the following command:

# $APACHE_HOME/bin/apachectl start                                    

[With a pre-installed version of Apache, the bin directory might not be under $APACHE_HOME; on Red Hat-style systems, apachectl is usually in /usr/sbin.]

Other useful commands include:

# $APACHE_HOME/bin/apachectl stop                                    
# $APACHE_HOME/bin/apachectl restart
# $APACHE_HOME/bin/apachectl status

The start-up script /etc/init.d/httpd can also be used to start, stop, or restart the Apache web server:

            # /etc/init.d/httpd start

At start-up, Apache reads its main configuration file, httpd.conf, which contains the server’s configuration. Its location can be set at compile time, or specified at run time with the -f option:

# apachectl -f /path/to/config/file

This configuration file is divided into three sections:

  • Global Environment: This section defines parameters for the Apache server process as a whole, e.g. the path to the Apache configuration directory, the Apache pid file, and the paths to other configuration files.

  • Main Server Configuration: Apache can host multiple websites on a single host, each handled by its own virtual host entry. The main server configuration specifies the default settings for requests that are not handled by any virtual host.

  • Virtual Host: This section defines settings for virtual hosts, which are either IP-based or name-based.

Apache is configured by placing directives in the configuration file. Most directives have a global scope that applies to the entire server, but their scope can be narrowed by placing them inside container directives (sections) such as <Directory>, <DirectoryMatch>, <Files>, and <Location>.
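To make the idea of directive scope concrete, here is a deliberately simplified Python sketch (not Apache’s actual parser) that reads httpd.conf-style text and reports, for each directive, whether it sits at the global level or inside a container section. It skips comments, and it does not handle line continuations or Include files; the function name directive_scopes is hypothetical.

```python
import re

OPEN_SECTION = re.compile(r"^<(\w+)(\s[^>]*)?>$")     # e.g. <Directory /var/www/html>
CLOSE_SECTION = re.compile(r"^</(\w+)>$")             # e.g. </Directory>


def directive_scopes(conf_text):
    """Return (directive, scope) pairs; scope is 'global' or the innermost section."""
    stack = []
    scopes = []
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                                   # skip blanks and comments
        if CLOSE_SECTION.match(line):
            if stack:
                stack.pop()                            # leaving a section
            continue
        if OPEN_SECTION.match(line):
            stack.append(line)                         # entering a section
            continue
        directive = line.split()[0]
        scopes.append((directive, stack[-1] if stack else "global"))
    return scopes
```

Directives reported as "global" apply to the whole server; the others apply only to what their enclosing section matches.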

3.1. Running Apache

To check whether the web server configuration file is syntactically correct, run the command:

            # apachectl configtest

The output will display "Syntax OK" if everything is correct.

The Apache configuration file, httpd.conf, specifies the port on which the web server listens; it is 80 by default. If it has been changed, set it back to 80, restart the Apache web server, and browse to "http://localhost". If the configuration is correct, the browser will display Apache’s "Test Page".

Note: In Fedora Core 3, SELinux (Security-Enhanced Linux) can cause problems with Apache’s configuration. Ensure that it is disabled before testing the configuration, and then restart Apache.

4. Basics of Apache Configuration

Some common configuration tasks include server-wide configuration, site-specific configuration, virtual hosting, logging, access control and authentication.

4.1. Server-wide configuration

Basic server configuration specifies the following:
           
Server Name: This specifies the name and port that the server uses to identify itself. This is useful for redirection: for example, if the machine’s name is xyz.osrc.org.pk, but it has the DNS entry www.osrc.org.pk and you want the machine to identify itself by the latter, set "ServerName" as follows:
           
ServerName www.osrc.org.pk:80                                                         
           
Set the server name explicitly to avoid problems at start-up, such as Apache being unable to determine the server’s fully qualified domain name. This directive can also be used in the virtual host section.
                       
Listening Port: This specifies the port number, or the IP address and port number, on which the web server listens for incoming requests. If only a port is given, the server listens on that port on all interfaces; otherwise, it listens only on the specified IP address and port:

Listen 80
[Listens on port 80 and all available interfaces]

Listen 12.34.56.78:80
[Listens on port 80 and the IP 12.34.56.78 only]
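A small Python sketch of how the two Listen forms above can be interpreted: a bare port means all interfaces, while address:port restricts listening to one address. The function names are hypothetical, and this illustrates the rule described here rather than Apache’s own parsing code; it also accepts the bracketed [IPv6]:port form that Listen supports.

```python
def parse_listen(argument):
    """Split a Listen argument into (address, port); address None means all interfaces."""
    if argument.startswith("["):                       # bracketed IPv6, e.g. [::1]:8080
        address, _, port = argument[1:].partition("]:")
        return address, int(port)
    address, separator, port = argument.rpartition(":")
    if not separator:                                  # just a port number, e.g. 80
        return None, int(argument)
    return address, int(port)


def accepts(listen_argument, local_ip, port):
    """True if a socket opened for this Listen argument receives traffic for local_ip:port."""
    address, listen_port = parse_listen(listen_argument)
    return listen_port == port and (address is None or address == local_ip)
```

For example, accepts("80", "192.168.2.58", 80) is true, whereas accepts("12.34.56.78:80", "192.168.2.58", 80) is false.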

4.2. Site-specific configuration

Document Root: The default web folder for an rpm-based Apache installation is /var/www/html, where you can publish HTML documents (for a source build, it is $APACHE_HOME/htdocs). It can be changed with the DocumentRoot directive, which can also be used in the virtual host section:
           
DocumentRoot /var/www/html

Directory Index: If the requested URL refers to a directory, e.g. http://www.xyz.com/downloads/ (the trailing / indicates that "downloads" is a directory), this directive specifies which resources to look for, for instance index.html, index.php, etc. The order matters: the first resource found is returned:
                       
DirectoryIndex index.html index.php index.txt

The above configuration tells Apache to look for index.html in the "downloads" directory; if it is not there, to look for index.php, and then index.txt. If none of these resources exists, the behavior depends on whether the Indexes option is enabled for that directory with the Options directive: if it is, Apache generates a directory listing; if not, the request is refused with 403 Forbidden. This directive can also be used in the virtual host section.
           
Options Indexes: If this option is set for a directory and the requested URL maps to that directory, e.g. http://www.xyz.com/downloads/, but no DirectoryIndex is set or none of the resources named by DirectoryIndex can be found, Apache generates a formatted listing of the directory (this is done by the mod_autoindex module):

<Directory "/var/www/html">
Options Indexes
</Directory>

This configuration enables automatic index generation for the "html" directory and its sub-directories. This directive can also be used in the virtual host section.
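The lookup order described for DirectoryIndex, together with the Options Indexes fallback, can be modelled in a few lines of Python. This is a simplified sketch: it treats only plain file names and returns a tuple describing the outcome instead of building an HTTP response; the function name resolve_directory is hypothetical.

```python
import os


def resolve_directory(path, directory_index, indexes_enabled):
    """Model how a request for a directory URL is answered (simplified)."""
    for name in directory_index:                 # DirectoryIndex order: first existing file wins
        candidate = os.path.join(path, name)
        if os.path.isfile(candidate):
            return ("file", candidate)
    if indexes_enabled:                          # Options Indexes: generated listing
        return ("listing", sorted(os.listdir(path)))
    return ("forbidden", 403)                    # no index file and no Indexes option
```

With DirectoryIndex index.html index.php index.txt and a directory holding only index.php and index.txt, the sketch returns index.php, matching the order rule described above.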

4.3. Virtual Hosts

Virtual hosting allows more than one website to run on a single machine. Without it, an Apache server serves a single website. To run multiple websites, you can either run multiple Apache daemons, each handling one website, or configure Apache for virtual hosting. Running multiple daemons is inefficient and should be avoided. Virtual hosts can be:

4.3.1. IP-Based

This allows multiple websites, each with its own IP address, to run on a single machine. The addresses can come from multiple network connections or from virtual interfaces. A multi-homed machine, for example, can have two network cards with the IPs 192.168.2.58 and 10.10.10.100. You can configure the website http://www.xyz.com/accounts on 192.168.2.58 and http://www.xyz.com/hr on 10.10.10.100.

The following is a sample configuration of IP-based virtual hosts. The hostnames will be resolved to their respective IP addresses.

<VirtualHost www.example1.com>
        DocumentRoot /var/www/html/example1
</VirtualHost>

<VirtualHost www.example2.com>
        DocumentRoot /var/www/html/example2
</VirtualHost>

Ensure that the NameVirtualHost entry in the main section is commented out. With the above configuration, Apache resolves www.example1.com to its IP address (192.168.2.58 in this example) when it starts; a request that arrives on that address is answered with the contents of the directory specified by DocumentRoot.

Apache handles http://www.example2.com in the same way, using the IP address 10.10.10.100. These hostnames and their IP addresses should be listed in the "/etc/hosts" file on the web server machine, in addition to having entries in the DNS server. Otherwise, the virtual hosts cannot be set up correctly, and the client would have to request http://www.example1.com/example1 instead of just http://www.example1.com.

The above configuration depends on DNS name resolution, which makes Apache’s start-up depend on DNS being available and can slow it down. Please refer to http://httpd.apache.org/docs/2.2/dns-caveats.html for more information. The recommended practice is to specify IP addresses instead of hostnames in the virtual host section:

<VirtualHost 192.168.2.58>
        DocumentRoot /var/www/html/example1
        ServerName www.example1.com
</VirtualHost>

<VirtualHost 10.10.10.100>
        DocumentRoot /var/www/html/example2
        ServerName www.example2.com
</VirtualHost>

The additional ServerName directive tells Apache which name each virtual host should use to identify itself. If no ServerName is specified, Apache performs a reverse DNS lookup on the IP address to find the hostname.

4.3.2. Name-Based

Name-based virtual hosts allow multiple websites on a single IP address, in contrast to IP-based virtual hosts, which need an IP address for each website. IP-based virtual hosts rely on the IP address on which a request arrives to determine the correct virtual host. Name-based virtual hosts rely on the client to send the hostname in the HTTP Host header. Name-based virtual hosts are easy to configure and do not require multiple IP addresses, so they work well when IP addresses are scarce. Prefer name-based virtual hosting over IP-based virtual hosting unless you have a specific reason to do otherwise. The following is a sample configuration for name-based virtual hosts:

NameVirtualHost 192.168.2.58:80

<VirtualHost 192.168.2.58:80>
        DocumentRoot /var/www/html/example1
        ServerName www.example1.com
</VirtualHost>

<VirtualHost 192.168.2.58:80>
        DocumentRoot /var/www/html/example2
        ServerName www.example2.com
</VirtualHost>

The NameVirtualHost directive designates 192.168.2.58:80 as an address on which Apache performs name-based virtual hosting. Normally, you can use * here, but when a host mixes IP-based and name-based virtual hosts, you need to specify which IP address is used for name-based virtual hosting. If you plan to use multiple ports (for example, a separate port for SSL), specify the port here as well. The argument given to NameVirtualHost must match the argument of the <VirtualHost> sections for name-based virtual hosts:

NameVirtualHost *

<VirtualHost *>
        DocumentRoot /var/www/html/example1
        ServerName www.example1.com
</VirtualHost>

<VirtualHost *>
        DocumentRoot /var/www/html/example2
        ServerName www.example2.com
</VirtualHost>
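The selection logic behind both kinds of virtual hosts can be sketched in Python as follows. This is a simplified model of Apache 2.2 behaviour, not its implementation: it ignores ServerAlias, _default_ hosts, and wildcards other than *, and the VHost class and select_vhost function are hypothetical. For an address listed in NameVirtualHost, the Host header picks the virtual host, and the first matching <VirtualHost> section acts as the default when no ServerName matches; otherwise the address alone decides, and requests on addresses with no virtual host go to the main server.

```python
from dataclasses import dataclass


@dataclass
class VHost:
    address: str          # <VirtualHost> argument, e.g. "192.168.2.58:80", "10.10.10.100" or "*"
    server_name: str      # ServerName inside the section
    document_root: str    # DocumentRoot inside the section


def address_matches(vhost_address, local_ip, port):
    """True if a request that arrived on local_ip:port belongs to this <VirtualHost> address."""
    host, _, vhost_port = vhost_address.partition(":")
    port_ok = vhost_port in ("", "*") or int(vhost_port) == port
    return host in ("*", local_ip) and port_ok


def select_vhost(vhosts, name_vhost_addresses, local_ip, port, host_header, main_root):
    """Return the DocumentRoot that would serve the request (simplified model)."""
    candidates = [v for v in vhosts if address_matches(v.address, local_ip, port)]
    if not candidates:
        return main_root                               # no virtual host: main server answers
    if candidates[0].address in name_vhost_addresses:  # name-based: use the Host header
        wanted = (host_header or "").split(":")[0].lower()
        for vhost in candidates:
            if vhost.server_name.lower() == wanted:
                return vhost.document_root
    return candidates[0].document_root                 # IP-based, or no ServerName matched
```

Because the first matching section is the fallback for unknown Host headers, the order of <VirtualHost> sections in httpd.conf matters for name-based hosting.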

4.4. Authentication, Authorization and Access Control

Authentication is the verification of the identity of the requesting host and/or user, i.e. confirming that the user or host is actually who or what they claim to be.

Authorization is the process of granting a user access to the areas they are allowed to enter.

Access control is also a form of authorization, but it operates at another layer, i.e. it is based on an IP address, a hostname, or some other characteristic of the request.

Make sure that the requisite modules are installed and loaded in Apache beforehand. Please refer to http://httpd.apache.org/docs/2.2/howto/auth.html and http://httpd.apache.org/docs/2.2/howto/access.html for the list.

In order to implement such security mechanisms, you first need to understand how Apache’s configuration applies to directories. Apache is normally configured through the main httpd.conf file, whose configuration parameters apply to all published web folders. Sometimes you need to customize the configuration for specific directories, URLs, files, hosts, or locations. You might, for example, want to restrict a particular section of the website to a few users. Apache provides two options for this: use a <Directory> </Directory> section in the main configuration file httpd.conf, or place a special .htaccess file in that directory. Conceptually, there is no difference between the two methods, as both use the same syntax and have the same effect. The differences between <Directory>, <Files>, and <Location> sections are as follows:

<Directory /var/www/html/test>
            Order allow,deny
            Deny from all
</Directory>

This denies access to the directory test and all its sub-directories. So, if the URL http://www.test.com maps to the directory /var/www/html/test, access to it is denied, whereas a URL such as http://www.test.com/public that maps to a different directory, /var/www/html/all, is still allowed.

<Files private.html>
            Order allow,deny
            Deny from all
</Files>

This means that access to the file private.html located anywhere is denied.

<Location /private>
            Order allow,deny
            Deny from all
</Location>

This denies access to any URL whose path begins with /private, regardless of where the files are stored on disk. Access to http://www.test.com/private/public is denied, whereas access to http://www.test.com/public is allowed.
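The three kinds of sections differ in what they compare against: <Directory> against the file-system path, <Files> against the file name alone, and <Location> against the URL path. The Python sketch below models those three matching rules in simplified form (plain prefixes and names only, no wildcards or regular expressions, and a simplified treatment of path-segment boundaries); the function names are hypothetical.

```python
import posixpath


def in_directory(fs_path, directory):
    """<Directory>: the directory itself and everything below it on disk."""
    directory = directory.rstrip("/")
    return fs_path == directory or fs_path.startswith(directory + "/")


def matches_files(fs_path, file_name):
    """<Files>: only the final file name is compared, wherever the file lives."""
    return posixpath.basename(fs_path) == file_name


def in_location(url_path, location):
    """<Location>: URL paths at or below the given prefix, independent of the disk layout."""
    location = location.rstrip("/")
    return url_path == location or url_path.startswith(location + "/")
```

For example, in_location("/private/public", "/private") is true while in_location("/public", "/private") is false, mirroring the <Location /private> example above.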

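The .htaccess method is easy to configure, but the same directives can also be placed inside a <Directory> </Directory> section in the main configuration file, which is more efficient because Apache does not have to look for .htaccess files on each request.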

The name of the .htaccess file can be changed with the AccessFileName directive in the main configuration file. Apache must also be configured to allow such per-directory configuration files, using AllowOverride AuthConfig inside a <Directory> </Directory> section. For example, if you want to restrict the directory /var/www/html/public/restricted, you must allow the use of .htaccess files there by placing the following configuration in Apache’s main configuration file:

<Directory /var/www/html/public/restricted>
AllowOverride AuthConfig
</Directory>

Next, define the users who are granted access to the restricted area. These users and their passwords are stored in a special file, which should be placed somewhere that is not accessible from the web (outside the DocumentRoot). The file can be created with the htpasswd utility that comes with Apache. The -c option creates a new file; omit it when adding further users, because it overwrites an existing file:

# htpasswd -c /etc/httpd/conf/passwd user1
New password:
Re-type new password:
Adding password for user user1
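For reference, each line of the password file is simply user:hash. The Python sketch below builds an entry in the {SHA} format that htpasswd writes when given its -s option (a base64-encoded SHA-1 digest of the password). It is shown only to illustrate the file layout; unsalted SHA-1 is weak, so in practice prefer the htpasswd utility itself, e.g. with its salted MD5-based -m option. The function name is hypothetical.

```python
import base64
import hashlib


def htpasswd_sha_entry(user, password):
    """Build a password-file line in the '{SHA}' format written by `htpasswd -s`."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()      # unsalted SHA-1
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{user}:{{SHA}}{encoded}"


if __name__ == "__main__":
    print(htpasswd_sha_entry("user1", "password"))
    # user1:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=
```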

Create the .htaccess file in /var/www/html/public/restricted; Apache reads it to find the password file and the users who are allowed into the restricted area:
           
.htaccess
----------------------------------------------
AuthType Basic
AuthName "Restricted Files"
# Optional line: AuthBasicProvider file
AuthUserFile /etc/httpd/conf/passwd
Require user user1
----------------------------------------------
AuthType specifies the type of authentication. With Basic authentication, the password is sent base64-encoded, not encrypted, so anyone who can observe the traffic can recover it. AuthName specifies the realm, which is shown in the browser’s password prompt and lets the browser reuse the same credentials for other resources in that realm. AuthUserFile specifies the path of the password file, and Require user specifies the user to whom access is granted. Sometimes access needs to be granted to more than one user. This can be achieved with Require valid-user, which allows anyone listed in the password file into the restricted area. Please see the “References” section for more advanced techniques for configuring authentication and authorization using groups and databases.
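To see why Basic authentication is described as unencrypted, the Python sketch below builds the Authorization header a browser sends and decodes it again: the credentials are only base64-encoded, so no key is needed to read them. The function names are hypothetical; the header format follows the HTTP Basic authentication scheme.

```python
import base64


def basic_auth_header(user, password):
    """Build the header a client sends for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def decode_basic_auth(header_value):
    """Recover (user, password) from an Authorization header value; no key is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError(f"not a Basic credential: {scheme!r}")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("user1", "secret")
    print(header)
    print(decode_basic_auth(header["Authorization"]))   # ('user1', 'secret')
```

This is why a Basic-protected area should normally be served over SSL.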

Now consider restricting access based on hostnames, IP addresses, or other characteristics of the request. Please refer to http://httpd.apache.org/docs/2.2/howto/access.html for the modules that must be installed and loaded for this.

To control access based on hosts or IP addresses, use the Allow and Deny directives. The Order directive specifies the order in which these filters are applied. The syntax is:

Allow from HOST
Deny from HOST
Order Allow,Deny
Order Deny,Allow

Consider the examples given below:

  1. Allow from 192.168.2.100                  [Allow from this host only]
  2. Allow from 192.168.2.0/24                 [Allow from the network 192.168.2 only]
  3. Allow from 192.168.2.100 192.168.2.200    [Allow from these hosts only]
  4. Allow from my.host.com                    [Allow from this host name; requires a DNS lookup]

Order specifies the order of the filters, which can be:

Deny,Allow: The Deny directives are evaluated first, then the Allow directives. Access is allowed by default, meaning that any client that matches neither a Deny nor an Allow directive is allowed to access the server. A client that matches both is also allowed, because the Allow directives are evaluated last.

Allow,Deny: The Allow directives are evaluated first, then the Deny directives. Access is denied by default, meaning that any client that matches neither an Allow nor a Deny directive is denied access to the server. A client that matches both is also denied, because the Deny directives are evaluated last.

Consider a real example: a directory /var/www/html/localusers that only local users on the 192.168.2 network should be able to access. Use the following configuration:

<Directory /var/www/html/localusers>

        Order Allow,Deny
        Allow from 192.168.2.0/24

</Directory>

Consider the following configuration:

<Directory /var/www/html/localusers>

        Order Allow,Deny
        Allow from 192.168.2.0/24
        Deny from 192.168.2.178
</Directory>

This allows access to all hosts in the network 192.168.2.0/24 except 192.168.2.178. All other requests are denied by default. Changing the order from Allow,Deny to Deny,Allow would instead allow every client: 192.168.2.178 matches the Allow directive, which is evaluated last and overrides the Deny, and clients outside 192.168.2.0/24 are allowed by default.
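The evaluation rules for the two Order values can be captured in a short Python sketch. It is a simplified model of the Allow/Deny/Order logic, assuming address rules written as single IPs, CIDR networks, or all (host-name and partial-IP rules such as "Allow from 192.168.2" are not handled); the function names are hypothetical.

```python
import ipaddress


def matches_any(client_ip, rules):
    """True if client_ip matches one of the rules ('all', a single IP, or a CIDR network)."""
    address = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule == "all" or address in ipaddress.ip_network(rule, strict=False):
            return True
    return False


def is_allowed(client_ip, order, allow=(), deny=()):
    """Simplified model of Order/Allow/Deny evaluation."""
    allowed = matches_any(client_ip, allow)
    denied = matches_any(client_ip, deny)
    if order == "Allow,Deny":
        return allowed and not denied   # default deny; Deny (evaluated last) wins
    if order == "Deny,Allow":
        return allowed or not denied    # default allow; Allow (evaluated last) wins
    raise ValueError(f"unsupported Order value: {order!r}")
```

With the values from the example above, is_allowed("192.168.2.178", "Allow,Deny", ["192.168.2.0/24"], ["192.168.2.178"]) returns False, while the same call with "Deny,Allow" returns True.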

4.5. Logging

Apache logs provide comprehensive, customizable information for security analysis and troubleshooting. On rpm-based installations, they are located by default under the /var/log/httpd directory (for a source build, under $APACHE_HOME/logs).

There are two basic types of logs:

Error Log: This log records errors encountered while processing requests, for diagnostic purposes. Its location is controlled by the ErrorLog directive in the main configuration file. In Apache 2.2, the format of the error log cannot be customized.

Access Log: This log records information such as the client IP, the date and time, the resource requested, and client platform information. The access log can be customized: its location and content are controlled by the CustomLog directive.
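As an example of what the access log contains, the Python sketch below parses one line in Apache’s Combined Log Format, which the stock httpd.conf defines as LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined and which CustomLog can select. The regular expression and the sample lines in the usage note are illustrative assumptions; it also accepts Common Log Format lines (without the last two fields), but lines written with a custom LogFormat, or requests containing escaped quotes, need a different pattern.

```python
import re

# Matches Common Log Format lines, plus the two extra fields of the Combined format.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_access_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

For a made-up line such as 192.168.2.10 - user1 [25/Nov/2011:10:15:32 +0500] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0", the sketch returns the client IP, user, request line, status, and user agent as separate fields.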

5. An Example Set-up

Consider a real-world example to configure a static website. The configuration is given below:

Routable server IP        203.215.183.11
Non-routable IP           192.168.2.178
Domain name               www.testmachine.org
Host name                 osrc-test
FQDN                      osrc-test.testmachine.org
           
The machine’s name is osrc-test, but the DNS alias for this configuration is www.testmachine.org.

Steps

  • Open the Apache configuration file httpd.conf
  • Locate DocumentRoot and ensure that it is set to /var/www/html
  • Set ServerName to www.testmachine.org:80
  • Put your web-publishing directory directly under /var/www/html. If all the data to be published is under /home/user1/website, type:

# mv /home/user1/website/* /var/www/html

  • Save the Apache configuration file with new changes, exit, and restart the Apache service.

Ensure that valid DNS entries exist for www.testmachine.org, pointing to your machine’s routable IP (203.215.183.11).

Test the website by browsing to http://www.testmachine.org.
