Apache is controlled by a series of configuration files:
httpd.conf, access.conf, and srm.conf (there's actually also a mime.types file,
but you have to deal with that only when you're adding or removing MIME types
from your server, which shouldn't be too often). The files contain instructions,
called directives, that tell Apache how to run. Several companies offer
GUI-based Apache front-ends, but it's easier to
edit the configuration files by hand.
Remember to make backup copies of all your Apache
configuration files, in case one of the changes you make while experimenting
renders the Web server inoperable.
Also, remember that configuration changes you make don't take
effect until you restart Apache. If you've configured Apache to run as an inetd
server, then you don't need to worry about restarting, since inetd will do that
for you.
Download the reference card
As with other open-source projects, Apache users share a
wealth of information on the Web. Possibly the single most useful piece of
Apache-related information--apart from the code itself, of course--is
a two-page guide created by Andrew Ford.
Called the Apache Quick Reference Card, it's a PDF file
(also available in PostScript) generated from a database of Apache directives.
There are a lot of directives, and Ford's card gives you a handy reference to
them.
While this may not seem like a tip on how to run Apache, it
will make configuring Apache go much more smoothly, because you'll have the
directives in an easy-to-access format.
One quick note--we found that the PDF page was a bit larger
than the printable area of our printer (an HP LaserJet 8000 N). So we set the
Acrobat Reader to scale to fit, and the pages printed just fine.
Use one configuration file
The typical Apache user has to maintain three different
configuration files--httpd.conf, access.conf, and srm.conf. These files contain
the directives to control Apache's behavior.
The tips in this story keep the configuration files separate,
since it's a handy way to compartmentalize the different directives. But Apache
itself doesn't care--if you have a simple enough configuration or you just want
the convenience of editing a single file, then you can place all the
configuration directives in one file. That one file should be httpd.conf, since
it is the first configuration file that Apache interprets. You'll have to
include the following directives in httpd.conf:
AccessConfig /dev/null
ResourceConfig /dev/null
That way, Apache won't cough up an error message about the
missing access.conf and srm.conf files. Of course, you'll also need to copy the
directives from srm.conf and access.conf into your new httpd.conf
file.
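If you'd rather not copy the directives by hand, a small script can do the merge. The following Python sketch is only illustrative. The function names and the output file name httpd.conf.merged are hypothetical, and it assumes all three files sit in one conf directory. It appends srm.conf before access.conf, which is the order Apache 1.3 reads them in. It also writes to a new file, so your originals stay untouched until you've checked the result.

```python
from pathlib import Path


def merge_configs(httpd: str, srm: str, access: str) -> str:
    """Combine the text of the three config files into one httpd.conf body."""
    parts = [
        httpd.rstrip("\n"),
        "# Stop Apache from looking for separate srm.conf and access.conf files.",
        "AccessConfig /dev/null",
        "ResourceConfig /dev/null",
        "# --- Directives copied from srm.conf ---",
        srm.rstrip("\n"),
        "# --- Directives copied from access.conf ---",
        access.rstrip("\n"),
    ]
    return "\n".join(parts) + "\n"


def merge_config_dir(conf_dir: str) -> Path:
    """Read the three files from conf_dir and write a merged copy next to them."""
    d = Path(conf_dir)
    merged = merge_configs(
        (d / "httpd.conf").read_text(),
        (d / "srm.conf").read_text(),
        (d / "access.conf").read_text(),
    )
    out = d / "httpd.conf.merged"  # hypothetical name: review it, then rename it yourself
    out.write_text(merged)
    return out
```

Once you've looked over httpd.conf.merged, back up the old httpd.conf and move the merged file into its place.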
Restrict access
Say you have document directories or files on your Web server
that should be visible only to a select group of computers. One way to protect
those pages is by using host-based authentication. In your access.conf file, you
would add something like this:
<Directory /usr/local/apache/share/htdocs/protected>
order deny,allow
deny from all
allow from 10.10.64
</Directory>
The <Directory> directive
is what's called a sectional directive. It encloses a group of directives
that apply to the specified directory. The Apache Quick Reference Card includes
a listing of sectional directives.
The above case allows only computers with IP addresses
starting with 10.10.64 to access the pages in the given directory. You can use
a complete IP address, a partial address like the one shown here, or even DNS names.
For example, to allow only CNET computers to access a specific file, you might
do this in your access.conf file. Note that <Location> takes a URL path (relative
to the document root), not a file system path:
<Location /company/employees.html>
order deny,allow
deny from all
allow from .cnet.com
</Location>
It's important to include the leading period on the domain name; otherwise Apache
allows only the computer that exactly matches cnet.com. If that's
what you want, you can restrict access to individual IP addresses and fully qualified
domain names.
An interesting side effect of host-based authentication is that if you're using a
browser on the Web server machine itself and attempt to access the page through
localhost, you'll be denied permission. That's because the localhost IP address,
127.0.0.1, doesn't fall within the .cnet.com domain.
You can easily add localhost to the permission list by putting the appropriate
IP on the allow directive:
allow from .cnet.com 127.0.0.1
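To make the evaluation order concrete, here is a rough Python model of how order deny,allow treats the patterns used above. It is a sketch, not Apache's code. The function names are hypothetical, and the client's hostname is passed in instead of being looked up through DNS. Domain patterns follow the rule described above: a leading period matches any host in that domain, and otherwise the name must match exactly.

```python
from typing import Iterable, Optional


def pattern_matches(pattern: str, ip: str, hostname: Optional[str]) -> bool:
    """Does one allow/deny pattern match this client?"""
    if pattern == "all":
        return True
    if pattern[0].isdigit():
        # Full or partial IP address: compare whole dotted components,
        # so "10.10.64" matches 10.10.64.7 but not 10.10.640.1.
        wanted = pattern.rstrip(".").split(".")
        return ip.split(".")[: len(wanted)] == wanted
    if hostname is None:
        return False  # no resolved name, so domain patterns can't match
    host, pat = hostname.lower(), pattern.lower()
    if pat.startswith("."):
        return host.endswith(pat)  # ".cnet.com" matches www.cnet.com
    return host == pat  # no leading period: exact match only


def allowed(ip: str, hostname: Optional[str],
            deny: Iterable[str], allow: Iterable[str]) -> bool:
    """order deny,allow: check deny rules, then allow rules; default is allow."""
    denied = any(pattern_matches(p, ip, hostname) for p in deny)
    permitted = any(pattern_matches(p, ip, hostname) for p in allow)
    return permitted or not denied
```

For example, allowed("127.0.0.1", "localhost", ["all"], [".cnet.com"]) returns False, which is the localhost lockout described above. Adding "127.0.0.1" to the allow list makes it return True.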
The
majority of security measures you will need to take when running a publicly
accessible Web site will be set at the operating system level. You will want to
make sure write access is restricted in the directories where your Web pages are
stored to keep visitors from defacing your site.
Customize error messages
If
a user requests a page that doesn't exist or is in a protected directory, Apache
returns one of its built-in error messages that say things like Forbidden or Not
Found. That's accurate, but not very informative. You may want to give your
users more guidance as to what they did wrong, provide an alternative URL to get
them back in your site, or at least offer an error page that fits in with your
overall site design. With a bit of editing, you can make Apache return a custom
error page or run a script to handle the error.
Open
the srm.conf file and insert the following:
ErrorDocument 404 /error.html
Your
server will now return the error.html page whenever a user requests a page that
doesn't exist (which is what the 404 error code means--check out the Apache
Quick Reference Card for a list of other HTTP/1.1 status codes). In this
example, the destination of the directive is an HTML page, but you could also
point to a CGI or even a URL from a different Web site.
Unless
you include a full URL, the ErrorDocument directive
uses a path relative to the document root of your Web server. So in our example,
error.html must reside in the Apache document root. By default that document
root is /usr/local/apache/share/htdocs. Also, when Apache actually serves up
this error page it does so within the context of the erroneous URL. So if a user
requested a nonexistent page (http://www.dummydomain.com/one/two/none.html),
Apache returns error.html as if it resided in the /one/two directory. That means
you need to be careful to use absolute paths (starting with a slash) or full URLs
for any images or other pages referenced in error.html. Otherwise you might serve
an error page that itself contains errors.
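You can see the problem with Python's standard urllib.parse.urljoin, which resolves relative links the same way a browser does. The image paths below are hypothetical. The requested URL is the one from the example above.

```python
from urllib.parse import urljoin

# The browser believes error.html lives at the URL it originally asked for.
requested = "http://www.dummydomain.com/one/two/none.html"

# A relative image path in error.html is resolved against /one/two/ ...
relative = urljoin(requested, "images/logo.gif")
# ... while an absolute path always points at the same place.
absolute = urljoin(requested, "/images/logo.gif")

print(relative)  # http://www.dummydomain.com/one/two/images/logo.gif
print(absolute)  # http://www.dummydomain.com/images/logo.gif
```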
Support multiple languages
HTTP/1.1 formally specified a feature called content negotiation, which had
actually been around for a while in experimental servers, including early
versions of Apache. It's a way to present documents in different languages and
formats based on a user's browser configuration.
For
example, suppose you're a Canadian company that needs to serve both French and
English versions of your Web site. First, you must enable the feature by adding
the appropriate directive to your access.conf file.
Open
the access.conf file and find or create the appropriate <Directory> entry
where you plan to store the multilanguage pages. Then add the Options
MultiViews directive to that section. Remember that Options All does
not actually mean all--it doesn't turn on MultiViews support. So you must
explicitly declare your intention to use MultiViews. For example:
<Directory /usr/local/apache/share/htdocs/multi>
Options MultiViews
</Directory>
Next,
you need to edit your srm.conf file to include the languages you want to support
and the file extensions associated with each language. The Canadian example
calls for English and French, which have the standard
identifiers en and fr, respectively. Your srm.conf file should
already have these, but if not, add the appropriate lines:
AddLanguage en .en
AddLanguage fr .fr
LanguagePriority en fr
The LanguagePriority directive
is used when there's a tie during content negotiation. For example, if Apache
can't tell whether the browser prefers English or French, the LanguagePriority directive
tells Apache to serve the English version of the page.
For
Apache to recognize which pages it should serve, you have to include the proper
extension on your file names. If, for example, you want to offer a help file in
two languages, you'd create a help.html.en and help.html.fr file with the
appropriate language content. Then, when a user requests the
http://yourdomain.com/multi/help.html file, Apache will check the browser's
language preference and return the correct version.
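The following Python sketch gives a rough idea of how the browser's Accept-Language header and LanguagePriority interact. It simplifies Apache's real negotiation algorithm. It ignores language ranges such as en-us, character sets, and media types, and the function names are hypothetical.

```python
def parse_accept_language(header: str) -> dict:
    """Turn 'fr, en;q=0.5' into {'fr': 1.0, 'en': 0.5}."""
    prefs = {}
    for item in header.split(","):
        lang, _, params = item.strip().partition(";")
        if not lang:
            continue
        q = 1.0  # a language without a q-value is fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs[lang.strip().lower()] = q
    return prefs


def choose_language(header: str, available: list, priority: list) -> str:
    """Pick the best available language and break ties with the priority list."""
    prefs = parse_accept_language(header)
    best = max(prefs.get(lang, 0.0) for lang in available)
    tied = [lang for lang in available if prefs.get(lang, 0.0) == best]
    # Like LanguagePriority: the first tied language in the priority list wins.
    # If the browser names none of our languages, every variant ties.
    for lang in priority:
        if lang in tied:
            return lang
    return tied[0]
```

With the Canadian setup, choose_language("fr, en;q=0.5", ["en", "fr"], ["en", "fr"]) picks "fr", so a request for help.html would get help.html.fr. A browser that rates both languages equally gets "en".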
Configure for server-side includes
If
you want to take a small step beyond static HTML pages, but you aren't quite
ready to dive into writing your own Perl scripts, then you should try
server-side includes (SSI). With SSI turned on, Apache will preparse certain
HTML files before sending them out, looking for special embedded commands. These
commands allow you to do basic things like include the contents from another
file or print out an environment variable.
To enable SSI, you first need to make sure mod_include has been compiled into your
version of Apache. Go to the directory where your httpd executable resides,
typically /usr/local/apache/sbin, and type ./httpd -l. That should return a list of
all the modules included in your build of Apache. Hopefully mod_include.c is in
that list. If not, you'll have to rebuild Apache after uncommenting the mod_include
line in the Configuration.tmpl file.
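If you check several servers, or want to script the check, a few lines of Python can run httpd -l for you. This is a sketch with two assumptions. First, that the listing goes to standard output with one module source file per line (such as mod_include.c). Second, that the default path is the typical location mentioned above. Adjust both for your build.

```python
import subprocess


def compiled_modules(listing: str) -> set:
    """Collect module names such as 'mod_include.c' from `httpd -l` output."""
    return {line.strip() for line in listing.splitlines()
            if line.strip().endswith(".c")}


def httpd_has_module(module: str,
                     httpd: str = "/usr/local/apache/sbin/httpd") -> bool:
    """Run `httpd -l` and report whether `module` was compiled in."""
    result = subprocess.run([httpd, "-l"], capture_output=True,
                            text=True, check=True)
    return module in compiled_modules(result.stdout)
```

The same check works for the other modules in this story, for example httpd_has_module("mod_rewrite.c") or httpd_has_module("mod_status.c").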
Once
you've determined that mod_include is available, you have to allow the execution
of includes and map an appropriate filetype. As with all things Apache, there
are about a gazillion ways to do this. Probably the easiest is to enable all the
options in one place in your access.conf file:
<Directory /usr/local/apache/share/htdocs/include>
Options +Includes
AddType text/html .shtml
AddHandler server-parsed .shtml
</Directory>
All files in the /usr/local/apache/share/htdocs/include directory that have a
.shtml extension get parsed by Apache before being sent out to a browser.
In many instances, the AddType and AddHandler directives
are already in your srm.conf file, but they're commented out. So you could
uncomment those and, in your access.conf file, set the Options directive to allow
executing include commands. Note the use of the plus sign in the Options directive--that
tells Apache to add this option to any preceding options settings, rather than
overriding them. If you want to limit SSI support to prevent executing potentially
dangerous programs, you might want to use Options +IncludesNOEXEC instead.
To
test your settings, create a test.shtml file like this one:
<HTML>
<HEAD><TITLE>SSI Test</TITLE></HEAD>
<BODY bgcolor="white">
<H1>SSI Test</H1>
File last modified <!--#flastmod file="test.shtml" -->
<P><PRE>
<!--#printenv -->
</PRE>
<P><!--#exec cmd="/bin/date" -->
</BODY>
</HTML>
Apache
will attempt to parse any text that starts with a <!--#. The
example uses three SSI commands--flastmod, printenv, and exec (the complete
list of SSI commands is on the Apache Quick Reference Card). The flastmod command prints the
last-modified date for the specified file, printenv spits back
a list of environment variables and their values, and exec runs the
specified shell command. Note that if you've configured Options
+IncludesNOEXEC, then the exec command
returns an error message instead of the current date and time.
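To get a feel for what Apache looks for while preparsing, here is a toy Python scanner that picks out SSI commands of the form used in test.shtml. It only recognizes the simple name="value" syntax shown here and doesn't execute anything. The function name is hypothetical, and mod_include does not work this way internally.

```python
import re

# <!--#command attr="value" ... -->
SSI_DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
SSI_ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def find_ssi_directives(html: str) -> list:
    """Return (command, {attribute: value}) pairs in document order."""
    return [(m.group(1), dict(SSI_ATTRIBUTE.findall(m.group(2))))
            for m in SSI_DIRECTIVE.finditer(html)]


page = '''File last modified <!--#flastmod file="test.shtml" -->
<P><PRE>
<!--#printenv -->
</PRE>
<P><!--#exec cmd="/bin/date" -->'''

for command, attrs in find_ssi_directives(page):
    # Under Options +IncludesNOEXEC, the exec command would be refused.
    print(command, attrs)
```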
Configuring Apache for CGI
If
you've pushed server-side includes about as far as they can go, you might want
to try common gateway interface (CGI) scripts. CGI is a standard way for Web
servers to interact with other programs running on your computer. CGI scripts
are usually written as Unix shell scripts or in a scripting language such as
Perl.
Configuring
Apache to run CGI programs isn't that hard. First, you need to assign an alias
for your script directory.
You never want the directory containing CGI scripts to reside within the
normal document root of the server, because an intruder could get access and run
their own scripts. So you create a special location, called an alias, that maps a
URL path to the actual CGI directory. Edit your httpd.conf file and add the line
below:
ScriptAlias /cgi-bin/ /usr/local/apache/share/cgi-bin/
The
example uses the default directory for CGI programs, but you're free to use any
directory you want. Now, when someone requests a URL such as
http://mydomain.com/cgi-bin/test.cgi, the Apache Web server knows to look in the
/usr/local/apache/share/cgi-bin directory to find the test.cgi program.
However,
this does not configure Apache to run the programs it finds in the cgi-bin
directory. To actually execute programs, you need to edit the access.conf file
by adding a section like this:
<Directory /usr/local/apache/share/cgi-bin>
Options ExecCGI
AddHandler cgi-script .cgi .pl
</Directory>
The Options directive tells Apache to allow the execution of CGIs within the
specified directory, and the AddHandler directive tells Apache which file
extensions to associate with the cgi-script handler. The example uses the two
most common extensions, .cgi and .pl.
If
you're running Apache on Unix, make sure that the user account under which the
Web server runs has permission to execute the scripts in the directory.
Otherwise the OS won't let Apache run the scripts.
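Here is a minimal CGI program you can use to test the setup. Such scripts are usually Perl or shell, but this sketch uses Python, and the file name test.cgi is just the example from above. Save it in the cgi-bin directory with the .cgi extension so the AddHandler line applies. Then make it executable (for example, chmod 755 test.cgi) so the Web server's account can run it. The #! line assumes python3 is on the server's PATH.

```python
#!/usr/bin/env python3
"""test.cgi: a minimal CGI program (illustrative sketch)."""
import os
import sys


def cgi_response(environ) -> str:
    """Build a CGI response: headers, a blank line, then the body."""
    # Apache passes request details in environment variables such as QUERY_STRING.
    query = environ.get("QUERY_STRING", "")
    body = f"CGI is working. QUERY_STRING={query!r}\n"
    return "Content-Type: text/plain\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ))
```

Requesting http://mydomain.com/cgi-bin/test.cgi?name=value should then return a plain-text page that echoes the query string.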
Avoid unnecessary file lookups
There
are special files, called .htaccess, that reside within a directory and tell
Apache to provide special handling for the files in that directory. For example,
instead of enabling server-side includes in the access.conf file, you might
specify it within the actual directory by including the directive in a .htaccess
file.
By default, Apache's configuration files don't allow .htaccess overrides. If you
open up the access.conf file, you'll see that the AllowOverride None directive is
sprinkled liberally throughout the various <Directory> sections.
If you want to allow overrides, you might be tempted to change the directive at
the root level, like this:
<Directory />
AllowOverride All
</Directory>
Don't
do it. Whenever Apache handles a request, it would have to process .htaccess
files in the same directory as the file it is serving, and also in all the
parent directories up to the root. For instance, if you request the URL
/docs/about.html and your document root is /usr/local/apache/share/htdocs,
Apache tries to process .htaccess files in all these directories:
/
/usr
/usr/local
/usr/local/apache
/usr/local/apache/share
/usr/local/apache/share/htdocs
/usr/local/apache/share/htdocs/docs
Normally,
there are no .htaccess files above the document root, but Apache still checks
the file system to make sure. That's a lot of unnecessary file lookups. And if a
malicious hacker had managed to place an .htaccess file somewhere in this
document tree, it could pose a security risk to your site.
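The following Python sketch lists the directories Apache would have to check for a given request, which makes the cost easy to see. The function name and the override_root parameter are hypothetical. override_root stands for the highest directory where AllowOverride is turned on.

```python
from pathlib import PurePosixPath


def htaccess_dirs(document_root: str, url_path: str, override_root: str = "/") -> list:
    """Directories searched for .htaccess files when serving url_path."""
    directory = (PurePosixPath(document_root) / url_path.lstrip("/")).parent
    # Every ancestor from "/" down to the requested file's own directory.
    chain = list(directory.parents)[::-1] + [directory]
    top = PurePosixPath(override_root)
    # Only directories at or below override_root are checked.
    return [str(d) for d in chain if d == top or top in d.parents]


docroot = "/usr/local/apache/share/htdocs"
print(len(htaccess_dirs(docroot, "/docs/about.html")))  # 7
print(len(htaccess_dirs(docroot, "/docs/about.html", override_root=docroot)))  # 2
```

Limiting overrides to the document root cuts the lookups for this request from seven to two.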
Instead,
keep the AllowOverride
None directive for your root directory, and turn it on only for the
specific directories where you really want it. For example, to perform .htaccess
lookups starting in the document root, you'd modify your access.conf file like
this:
<Directory />
AllowOverride None
</Directory>
<Directory /usr/local/apache/share/htdocs>
AllowOverride All
</Directory>
The All can be replaced with only the override categories you want to permit:
AuthConfig, FileInfo, Indexes, Limit, or Options (you can list more than one). For
example, to let .htaccess files turn on server-side includes, you allow the Options
category:
<Directory /usr/local/apache/share/htdocs>
AllowOverride Options
</Directory>
A .htaccess file in that tree can then contain Options +IncludesNOEXEC to enable
SSI without the exec command. Keep in mind that the Options category also lets
.htaccess files switch on other options, such as ExecCGI, so grant it only for
directories whose .htaccess files you trust.
A
final note--the .htaccess file doesn't actually have to be called .htaccess.
Open the srm.conf file and find the AccessFileName directive, which lets you
change what the .htaccess file is called:
AccessFileName .my_htaccess_file
(Editor's
note: A version of this tip originally appeared in Apache Week. It appears here
with permission.)
Limit DNS overhead
To improve Apache's performance when restricting access with allow from or deny from,
use IP addresses where possible to limit the number of DNS lookups. Whenever an
allow from or deny from directive names a domain, Apache has to run a double
lookup: a reverse lookup to resolve the browser's IP address into a domain name,
followed by a forward lookup on that name to make sure the reverse result is not
being spoofed.
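The double lookup can be sketched in Python with the standard socket module. This illustrates the idea and is not Apache's resolver code. The function name is hypothetical, and the resolver functions are parameters so the logic can be tried without touching real DNS.

```python
import socket


def double_reverse_hostname(ip, reverse=socket.gethostbyaddr,
                            forward=socket.gethostbyname_ex):
    """Return the client's hostname only if the reverse and forward lookups agree."""
    try:
        hostname, _aliases, _ = reverse(ip)              # reverse: IP -> name
        _name, _aliases, addresses = forward(hostname)  # forward: name -> IPs
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    # If the name doesn't map back to the same IP, the reverse record may be forged.
    return hostname if ip in addresses else None
```

Each check costs two DNS round trips, which is why plain IP addresses in allow and deny rules are cheaper.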
You
can limit your DNS lookup overhead even further by restricting lookup to only
the files you need hostname lookups on, such as HTML or CGI. To do that, add
something like this in your configuration files:
HostnameLookups off
<Files ~ "\.(html|cgi)$">
HostnameLookups on
</Files>
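The ~ tells Apache to treat the quoted string as a regular expression. Python's re module handles this particular pattern the same way, so you can use it to check which file names would get hostname lookups. The file names below are made up.

```python
import re

# Same pattern as the <Files ~ ...> section above.
LOOKUP_FILES = re.compile(r"\.(html|cgi)$")

for name in ["index.html", "test.cgi", "logo.gif", "index.shtml", "page.html.bak"]:
    verdict = "lookup" if LOOKUP_FILES.search(name) else "no lookup"
    print(f"{name}: {verdict}")
```

Note that SSI pages ending in .shtml don't match. If they need hostname lookups too, widen the pattern, for example to \.(s?html|cgi)$.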
Check the timeout
Even
relatively simple Web pages can have a number of pieces. Previously, a browser
had to set up a new connection to the Web server to retrieve each piece--a
connection to retrieve the HTML and separate connections for each GIF. One page
with three images would require four connections. That's kind of expensive in
network traffic, and can really slow things down.
HTTP/1.1 made persistent connections, commonly called keep-alive, a standard
feature. This lets a Web server keep a connection open so the browser can send
multiple requests without having to set up a new connection for each one. In
Apache, keep-alives are controlled by three directives in httpd.conf: KeepAlive,
MaxKeepAliveRequests, and KeepAliveTimeout.
The KeepAlive directive determines whether to activate the keep-alive feature,
while MaxKeepAliveRequests determines how many requests the server will allow from
a browser during a single connection. And KeepAliveTimeout determines how long the
server will keep the connection open waiting for additional requests. So to turn
on keep-alives and allow 100 requests per connection with a 15-second timeout,
add the following lines to httpd.conf:
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
Your
server may already be configured this way, so double-check httpd.conf before
adding these lines.
If
you're a little paranoid, you may wonder whether changing these values actually
does anything. HTTP is a text protocol, so you can test the KeepAliveTimeout value
for yourself. First, you need to Telnet to your server at the appropriate port.
In our case, our Linux computer was mapped in our internal DNS tables as
builder-linux, and the Web server was running at the default port 80. So the
Telnet command we used was telnet builder-linux 80.
Now
type in the following, remembering to hit Enter twice after typing the second
line:
HEAD / HTTP/1.1
Host: builder-linux
This
retrieves the header information for the main index file of your server. Now
simply wait for the timeout--in our case the connection is terminated 15 seconds
after we send the HTTP query.
If
you change the KeepAliveTimeout,
you can use this Telnet trick to verify that the change took place.
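If you'd rather script the check than time it by hand in Telnet, the same test takes a few lines of Python. This is a sketch, and the function names are hypothetical. It sends the same HEAD request you typed into Telnet, then measures how long the server keeps the connection open after answering.

```python
import socket
import time


def build_head_request(host: str, path: str = "/") -> bytes:
    """The two lines typed into Telnet, plus the blank line from the second Enter."""
    return f"HEAD {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def seconds_until_close(host: str, port: int = 80, limit: float = 60.0) -> float:
    """Send one HEAD request and time how long the server keeps the connection open.

    Raises socket.timeout if the server stays silent for longer than `limit`.
    """
    with socket.create_connection((host, port), timeout=limit) as sock:
        sock.sendall(build_head_request(host))
        start = time.monotonic()
        # recv() returns b"" once the server closes its side of the connection.
        while sock.recv(4096):
            pass
        return time.monotonic() - start
```

With the settings above, seconds_until_close("builder-linux") should come back at roughly 15.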
Rotate log files
Whenever
a browser downloads information from an Apache Web site, the server stores
information about that access in a log file. You use these log files in
conjunction with other scripts or software to analyze the traffic on your site.
If you've got a busy Web site, that log file can grow quite large rather quickly
and become too unwieldy for easy analysis. The answer is log rotation.
Rotating
your log files means periodically creating a new log file, so that the older one
can be archived or sent to an analysis tool without disturbing the current log
file. Apache makes it easy with rotatelogs, a log rotation utility that comes with
the server.
To
use it, edit your httpd.conf file to include the TransferLog directive:
TransferLog "| /usr/local/apache/sbin/rotatelogs /usr/local/apache/var/log/access_log 86400"
The
example above gives TransferLog the
location of the rotatelogs program, as well as the location and name for the log
file. The number at the end sets the number of seconds between log rotations--in
this case 86,400 seconds, or one day.
Apache
generates a log file with a base name of access_log followed by a long numeric
extension. So you might have one called access_log.0904347600 and then a day
later you'd have another one that's got an extension value that's higher by
86,400.
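The naming scheme described above is easy to reproduce, for example to predict which file a log-analysis script should pick up. This Python sketch assumes the suffix is a ten-digit, zero-padded number of seconds, as in the access_log.0904347600 example. It also assumes the suffix grows by the rotation interval each time. The function name is hypothetical.

```python
def rotated_log_names(base: str, first_start: int, interval: int, count: int) -> list:
    """Names of `count` consecutive rotated logs, starting at first_start."""
    return [f"{base}.{first_start + i * interval:010d}" for i in range(count)]


for name in rotated_log_names("access_log", 904347600, 86400, 3):
    print(name)
```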
Block bad robots
Robots
are programs that automatically download pages from your Web site. A
well-behaved robot is supposed to read your robots.txt file to determine how to
crawl your site. But ill-behaved robots may ignore the file, potentially
distorting your Web site traffic and ad reports as well as stealing your network
bandwidth and slowing down your Web server.
If
you know the robot's IP address, you can use the Apache Deny directive to
restrict Web access from that IP address. For something more powerful, use the
Apache mod_rewrite module. It may not be part of your default Apache
configuration--you can check using the ./httpd -l command. If it's not there,
you'll have to edit the Configuration.tmpl file and recompile Apache.
Once
you've installed mod_rewrite, you can use it to restrict access to your server
based on any server or environment variable, including IP address, robot agent
name, and time of day.
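For example, adding the following directives to one of your configuration files
blocks all access from any robot whose HTTP User-Agent header begins with
"NameOfBadRobot". The RewriteEngine On line switches mod_rewrite on (it ignores
its rules otherwise), and the [F] flag answers matching requests with 403 Forbidden:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteRule ^/.* - [F]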
For
more on using the mod_rewrite module, read the documentation at the
Apache Web site.
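mod_rewrite conditions are regular expressions, and Python's re module treats this simple pattern the same way. You can use it to preview which User-Agent strings the condition would catch. The agent strings below are invented for illustration. Note the ^ anchor: it only catches agents that begin with NameOfBadRobot.

```python
import re

# Same pattern as the RewriteCond above; ^ anchors it to the start of the header.
BAD_ROBOT = re.compile(r"^NameOfBadRobot.*")

agents = [
    "NameOfBadRobot/1.0",
    "NameOfBadRobot",
    "Mozilla/4.0 (compatible; NameOfBadRobot)",
    "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)",
]
for agent in agents:
    action = "403 Forbidden" if BAD_ROBOT.search(agent) else "served"
    print(f"{agent!r}: {action}")
```

The third agent slips through because the robot's name isn't at the start of the string. Dropping the ^ from the RewriteCond pattern would catch it too.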
Diagnose your server
The
mod_status and mod_info modules let you analyze and debug your Web server from a
browser. First, make sure the modules are compiled in your version of Apache.
Then activate the modules and control access to this information.
The mod_status module gives you comprehensive Web server diagnostics such as
uptime, requests served, CPU usage, and the current activity of each server
process. (Note: using this module requires that Apache be running in standalone
mode, not as an inetd server.)
Add
the following to your access.conf file:
<Location /status>
SetHandler server-status
<Limit GET>
order deny,allow
deny from all
allow from .cnet.com
</Limit>
</Location>
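This configures the server-status handler to run on the /status virtual directory.
You'll notice the use of the <Limit> directive, which is another way of restricting
access to your site. <Limit> doesn't make Apache refuse other kinds of requests,
though; it applies the enclosed allow and deny rules only to the methods it lists.
In this example, GET requests are accepted only from computers within the cnet.com
domain, but requests using other methods aren't covered by those rules at all.
Since you normally want the restriction to apply no matter how the page is
requested, it's safer to drop the <Limit GET> and </Limit> lines and place the
order, deny, and allow directives directly inside the <Location> section.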
You can add a bit of extra interactivity by appending a refresh parameter to the
URL:
http://oscar.cnet.com/status?refresh=5
This
gives you the Apache Web server status for oscar.cnet.com every 5 seconds.
The next module is mod_info. Again, edit your access.conf with the following
section:
<Location /info>
SetHandler server-info
<Limit GET>
order deny,allow
allow from .cnet.com
deny from all
</Limit>
</Location>
Now,
a URL such as http://oscar.cnet.com/info will give detailed information about
the oscar.cnet.com server, such as running daemons, version, users and groups,
hosts, ports, and so forth. But mod_info also supplies a list of all the modules
that Apache is using, as well as stats about each module (such as the directives
being enforced). This can be very helpful when trying to debug the server.
Run on a Windows notebook
If
you need to run Apache on a Windows 95 or Windows 98 notebook computer, you will
probably have to change the server name. When a notebook PC is running without a
LAN card or active PPP/SLIP dialup connection, Windows will not load its TCP/IP
support, and Apache will return an error message because it can't determine its
network name.
Why
would you even want to run a Web server on a disconnected laptop? Well, you may
want to prototype and test things while on the road. Thankfully, it's a simple
problem to fix.
In
the httpd.conf file, you'll see the ServerName directive
commented out. Change it to match the Windows name you've assigned to the
laptop. In our case, the laptop was called Raptor in Windows networking, so the
line in our httpd.conf file read:
ServerName raptor
Now
Apache will run fine, whether you're networked or not.