
Universal Htaccess Protection


Manage Apache's directory-level configuration on numerous HTTP servers.

By Dimitris Dimitropoulos

The htaccess file, as used by the Apache web server, is an integral part of every website, providing URL parsing and redirection features, content management based on regular expressions, and protection of sensitive content. Unfortunately, managing several thousand htaccess files on large-scale web servers is very time-consuming, and this is where the Universal Htaccess Protection script comes into play.

The UHP script reads a common text file (htaccess.txt), splits the contents into sections and then scans the web server for possible htaccess files to update. The script only considers the “main” htaccess file, which is usually the first one accessed under the public_html directory.

The format of the htaccess.txt file uses a special starting and ending string and a set of strings that define separate sections. Below is a minimal htaccess.txt file:

 htaccess.txt

#####################UNIVERSALHTACCESSPROTECTION#####################

# [SITE: all]
# [MODULES: GLOBAL,CUSTOM]

# [GLOBAL]
RewriteEngine On
RewriteBase /
# [/GLOBAL]

#####################/UNIVERSALHTACCESSPROTECTION#####################

 

Special strings are defined as:

SITE = website domain name, for example: example.tld

MODULES = sections defined by the htaccess.txt file, for example: GLOBAL. The CUSTOM section is a special case: it is never defined by the htaccess.txt file, but it allows each website to keep its own set of rules within a CUSTOM section, which the UHP script will not overwrite.

The administrator may write many sections in the htaccess.txt file and choose which ones to use per website, since none of the sections is mandatory. Let's take a look at a more complete htaccess.txt file with more practical sections, defined as follows:
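
As a minimal sketch of how the two header strings can be read, the snippet below applies the same regular expressions that uhp.php uses further down to a small sample string. The function name parseUhpHeader and the trimming of each module name are assumptions made for illustration; they are not part of the original script.

```php
<?php
// Hypothetical helper (not part of uhp.php): read the SITE and MODULES
// header lines from a UHP-formatted string.
function parseUhpHeader(string $text): ?array
{
    // Same patterns the article's script uses for the two header lines.
    if (preg_match('%# \[SITE\: (.*?)\]%is', $text, $site) !== 1) {
        return null;
    }
    if (preg_match('%# \[MODULES\: (.*?)\]%is', $text, $mods) !== 1) {
        return null;
    }
    return [
        'site'    => trim($site[1]),
        // Assumption: trim each name so "GLOBAL, CUSTOM" and "GLOBAL,CUSTOM" behave alike.
        'modules' => array_map('trim', explode(',', $mods[1])),
    ];
}

$sample = "# [SITE: all]\n# [MODULES: GLOBAL,CUSTOM]\n";
$header = parseUhpHeader($sample);
echo $header['site'], ' => ', implode(' | ', $header['modules']), "\n";
```

Returning null when either header line is missing lets a caller skip a file instead of aborting the whole run.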

GLOBAL = enable the rewrite module

DENY IP = deny requests by IP matching

DENY DOMAIN = deny requests by domain matching

WWW = require www subdomain prefix

REQUEST METHOD = block requests by method

REQUEST PROTOCOL = block requests by protocol

REFERRER = block requests by referrer

USER AGENT = block requests by user agent

QUERY STRING = block requests by query string

CUSTOM = custom per-site section that the script never overwrites

 htaccess.txt

#####################UNIVERSALHTACCESSPROTECTION#####################

# [SITE: all]
# [MODULES: GLOBAL,DENY IP,DENY DOMAIN,WWW,REQUEST METHOD,REQUEST PROTOCOL,REFERRER,USER AGENT,QUERY STRING,CUSTOM]

# [GLOBAL]
RewriteEngine On
RewriteBase /
# [/GLOBAL]

# [DENY IP]
<RequireAll>
	Require all granted
	Require not ip 64.74.215.0/24 82.80.204.0/24 82.80.205.0/24
</RequireAll>
# [/DENY IP]

# [DENY DOMAIN]
<RequireAll>
	Require all granted
	Require not host clients.your-server.de amazonaws.com googleusercontent.com
</RequireAll>
# [/DENY DOMAIN]

# [WWW]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# [/WWW]

# [REQUEST METHOD]
RewriteCond %{REQUEST_METHOD} !(GET|POST|HEAD) [NC]
RewriteRule .* - [F]
# [/REQUEST METHOD]

# [REQUEST PROTOCOL]
RewriteCond %{THE_REQUEST} !HTTP/1\.[0-1]$
RewriteRule .* - [F]
# [/REQUEST PROTOCOL]

# [REFERRER]
RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
RewriteCond %{HTTP_REFERER} (adult|poker|porn)  [NC,OR]
RewriteCond %{HTTP_REFERER} \.(in|am|ru|tv|kz|ua|br)/?$  [NC]
RewriteRule .* - [F]
# [/REFERRER]

# [USER AGENT]
RewriteCond %{HTTP_USER_AGENT} ^$               [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$        [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE [1-8]\."
RewriteRule .* - [F]
# [/USER AGENT]

# [QUERY STRING]
RewriteCond %{QUERY_STRING} (javascript:).*(\;) [NC,OR]
RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC]
RewriteRule .* - [F]
# [/QUERY STRING]

#####################/UNIVERSALHTACCESSPROTECTION#####################

 

Now that htaccess.txt is fully defined with all the required sections, we proceed to the actual UHP script (uhp.php), which runs on the web server and scans for htaccess files to modify. We present the script in parts. The first part, below, loads only the section names:

 uhp.php

#!/usr/bin/php
<?php
/************** UNIVERSAL HTACCESS PROTECTION (UHP) **************/

echo "[".str_pad("", 57, "#")."]\n";
echo "[           UNIVERSAL HTACCESS PROTECTION (UHP)           ]\n";
echo "[".str_pad("", 57, "#")."]\n";

// Load source data
$srcfile = __DIR__.'/htaccess.txt';
if(!is_file($srcfile) || !is_readable($srcfile)) {
	echo "[UHP: Error reading file: htaccess.txt]\n";
	exit();
}
$src = file_get_contents($srcfile);

// Load source modules
$modules = "";
$rc = preg_match('%# \[MODULES\: (.*?)\]%is', $src, $modules);
if($rc!==1 || !isset($modules[1])) {
	echo "[UHP: Error parsing source modules from: htaccess.txt]\n";
	exit();
}
// Split the comma-separated list; trim so that "GLOBAL, CUSTOM" also works
$modules = array_map('trim', explode(",", $modules[1]));
$modules = array_fill_keys($modules, "");

 

The result is an array $modules which contains all the available section names, but not the content of each section. The next part of the script loads section content in the $modules array:

 uhp.php

echo str_pad("[UHP: Source modules loaded", 28, " ").": ".str_pad(sizeof($modules), 28, " ")."]\n";
echo str_pad("[UHP: Parse source data", 28, " ").": ";

// Parse source data
$stars = "";
foreach($modules as $key => $value) {

	// Special case for custom module
	if($key=="CUSTOM") {
		$stars .= "*";
		continue;		
	}

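	// Regex match strings (preg_quote guards against regex metacharacters in section names)
	$rc = preg_match('%# \['.preg_quote($key, '%').'\](.*?)# \[\/'.preg_quote($key, '%').'\]%is', $src, $modules[$key]);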
	if($rc!==1 || !isset($modules[$key][1])) {
		echo "[UHP: Error parsing source data]\n";
		exit();
	}
	$modules[$key] = $modules[$key][1];
	$stars .= "*";

}

echo str_pad($stars, 28, " ")."]\n";
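
The section extraction above can be tried in isolation. The sketch below wraps the same pattern in a function and runs it against a two-line sample; the name extractSection is hypothetical, and the use of preg_quote is an added precaution rather than something the original listing requires.

```php
<?php
// Hypothetical helper: return the text between "# [NAME]" and "# [/NAME]",
// or null when the section is not present in $src.
function extractSection(string $src, string $name): ?string
{
    // Quote the name so characters such as "." or "+" are matched literally.
    $q = preg_quote($name, '%');
    if (preg_match('%# \[' . $q . '\](.*?)# \[/' . $q . '\]%is', $src, $m) !== 1) {
        return null;
    }
    return $m[1];
}

$src = "# [GLOBAL]\nRewriteEngine On\nRewriteBase /\n# [/GLOBAL]\n";

// The captured body keeps its leading and trailing newline.
echo json_encode(extractSection($src, 'GLOBAL')), "\n";

// A section that does not exist yields null rather than an error.
echo json_encode(extractSection($src, 'WWW')), "\n";
```

Keeping the surrounding newlines in the captured body is what allows the script to write a section back as "# [NAME]" followed directly by the body and "# [/NAME]".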

 

The next step is to find htaccess files to modify, or to force the script to modify a single htaccess file given as a command-line argument. By default, the script expects a typical CentOS/RHEL and Apache layout: each hosted domain is represented by a system user under the home directory, with website files under the public_html subdirectory. This layout is also typical of servers managed by a hosting control panel. The expected structure is illustrated below:


/home
    /domain
        /public_html
            /.htaccess
        /www → symbolic link to public_html


 

There are three possible ways to execute the UHP script.

  1. Without parameters. By default the script will scan the /home/*/public_html/ structure for possible htaccess files.
  2. Passing a directory as a parameter. In this case, the script will scan this directory as an alternative home structure, like /somedir/*/public_html/.
  3. Passing a single file as a parameter. In this case, the script expects the file to be an htaccess file, thus it does not perform a scan, it only modifies that specific file.
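
The three cases above can be sketched with PHP's built-in glob() instead of a shell call. This is an alternative technique, not the approach the script itself takes, and the function name resolveTargets is hypothetical. Note that glob() does not match dot-directories with a plain `*`, and that glob metacharacters inside the home path are not escaped here.

```php
<?php
// Hypothetical helper: map an optional command-line argument to a list of
// .htaccess paths, following the three execution modes described above.
function resolveTargets(?string $arg, string $defaultHome = '/home'): array
{
    if ($arg !== null && is_file($arg)) {
        return [$arg];                      // case 3: a single file
    }
    if ($arg !== null && !is_dir($arg)) {
        return [];                          // neither a file nor a directory
    }
    // Case 1 (no argument) or case 2 (alternative home directory).
    $home  = rtrim($arg ?? $defaultHome, '/');
    $found = glob($home . '/*/public_html/.htaccess');
    return $found === false ? [] : $found;
}

// Demo on a throwaway directory tree so nothing under /home is touched.
$root = sys_get_temp_dir() . '/uhp_demo_' . uniqid();
$file = $root . '/example/public_html/.htaccess';
mkdir(dirname($file), 0777, true);
file_put_contents($file, "# demo\n");

print_r(resolveTargets($root));   // directory mode
print_r(resolveTargets($file));   // single-file mode
```

An empty array signals "no targets", which the caller can report in the same way the script does.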

 uhp.php

// Array of targets
if(isset($argv) && is_array($argv) && sizeof($argv)>1) {

	if(isset($argv[0]))
		unset($argv[0]);
	
	if(is_dir($argv[1])) {

		// Directory
		$targets = array_filter(explode("\n", shell_exec("ls -A -B --ignore=virtfs --ignore=cpanel-rpmstor --ignore=cpeasyapache --ignore=cprestore --ignore=MySQL-install --ignore='.cp*' --ignore=webadmin ".escapeshellarg($argv[1])."/*/public_html/.htaccess 2>/dev/null")));

	} elseif(is_file($argv[1])) {

		// File (use only the first argument, ignore any extra ones)
		$targets = array($argv[1]);

	} else {

		echo "[UHP: Error, input must be a home directory or an .htaccess file]\n";
		exit();

	}

} else {

	$targets = array_filter(explode("\n", shell_exec("ls -A -B --ignore=virtfs --ignore=cpanel-rpmstor --ignore=cpeasyapache --ignore=cprestore --ignore=MySQL-install --ignore='.cp*' --ignore=webadmin /home/*/public_html/.htaccess 2>/dev/null")));

}
if(!is_array($targets) || sizeof($targets)==0) {
	echo "[UHP: Error, did not find any targets]\n";
	exit();
}

echo str_pad("[UHP: Targets loaded", 28, " ").": ".str_pad(sizeof($targets), 28, " ")."]\n";

 

The final part of the script loops over all target htaccess files and modifies them based on their MODULES definition.

 uhp.php

// Loop targets
foreach($targets as $t) {

	echo "[".str_pad("", 57, "-")."]\n";

	// Load target data
	$tdata = file_get_contents($t);

	// Check for UHP tag
	$rc = mb_strpos($tdata, '#####################UNIVERSALHTACCESSPROTECTION#####################', 0, 'UTF-8');
	if($rc===false) {

		$t = str_replace('/.htaccess', '', $t);
		$len = (mb_strlen($t, 'UTF-8') - 28);
		if($len<0)
			$len = 0;
		echo str_pad("[UHP: Skipping target", 28, " ").": ".str_pad(mb_substr($t, $len, null, 'UTF-8'), 28, " ")."]\n";

		// UHP tag not found, continue to next target
		continue;

	}

	// Parse target data
	$site = "";
	$rc = preg_match('%# \[SITE\: (.*?)\]%is', $tdata, $site);
	if($rc!==1 || !isset($site[1])) {
		echo "[UHP: Error parsing target data from: ".$t."]\n";
		exit();
	}
	$site = $site[1];

	echo str_pad("[UHP: Processing target", 28, " ").": ".str_pad($site, 28, " ")."]\n";

	// Load target modules
	$tmodulesstr = "";
	$rc = preg_match('%# \[MODULES\: (.*?)\]%is', $tdata, $tmodulesstr);
	if($rc!==1 || !isset($tmodulesstr[1])) {
		echo "[UHP: Error parsing target modules from: ".$t."]\n";
		exit();
	}
	$tmodulesstr = $tmodulesstr[1];
	$tmodules = explode(",", $tmodulesstr);

	echo str_pad("[UHP: Target modules loaded", 28, " ").": ".str_pad(sizeof($tmodules), 28, " ")."]\n";

	// Load custom data
	$tcustom = "";
	if(mb_strpos($tdata, '# [CUSTOM]', 0, 'UTF-8')!==false) {
		$rc = preg_match('%# \[CUSTOM\](.*?)# \[\/CUSTOM\]%is', $tdata, $tcustom);
		if($rc!==1 || !isset($tcustom[1])) {
			echo "[UHP: Error parsing target custom data]\n";
			exit();
		}
		$tcustom = $tcustom[1];
	}

	echo str_pad("[UHP: Target custom data", 28, " ").": ".str_pad(mb_strlen($tcustom, 'UTF-8')." bytes", 28, " ")."]\n";

	// Prepare output
	$toutput = "";
	$rc = preg_match('%^(.*?)#####################UNIVERSALHTACCESSPROTECTION#####################(.*?)#####################/UNIVERSALHTACCESSPROTECTION#####################(.*?)$%is', $tdata, $toutput);
	if($rc!==1 || !isset($toutput[1], $toutput[2], $toutput[3])) {
		echo "[UHP: Error parsing target ignorable data]\n";
		exit();
	}
	$header = $toutput[1];
	$footer = $toutput[3];
	$toutput = $header."#####################UNIVERSALHTACCESSPROTECTION#####################\n\n# [SITE: ".$site."]\n# [MODULES: ".$tmodulesstr."]\n";

	// Loop modules
	$stars = "";
	foreach($tmodules as $tm) {

		echo str_pad("[UHP: Processing module", 28, " ").": ".str_pad($tm, 28, " ")."]\r";

		// Special case for custom module
		if($tm=="CUSTOM") {
			$toutput .= "\n# [".$tm."]".$tcustom."# [/".$tm."]\n";
			$stars .= "*";
			continue;
		}

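		// Add source module to target output (abort if the target names a module missing from htaccess.txt)
		if(!isset($modules[$tm])) {
			echo "\n[UHP: Error, unknown module '".$tm."' in: ".$t."]\n";
			exit();
		}
		$toutput .= "\n# [".$tm."]".$modules[$tm]."# [/".$tm."]\n";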
		$stars .= "*";

	}

	$toutput .= "\n#####################/UNIVERSALHTACCESSPROTECTION#####################".$footer."\n";

	echo str_pad("[UHP: Processing module", 28, " ").": ".str_pad($stars, 28, " ")."]\n";

	echo str_pad("[UHP: Target data collected", 28, " ").": ".str_pad(mb_strlen($toutput, 'UTF-8')." bytes", 28, " ")."]\n";

	// Remember correct ownership and permissions
	$fileattribs = array(fileowner($t), filegroup($t), fileperms($t));

	// Save target .htaccess file
	$rc = file_put_contents($t, $toutput, LOCK_EX);
	if($rc===false) {
		echo "[UHP: Error while saving to target .htaccess file: ".$t."]\n";
		exit();
	}

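	// Check for correct ownership and permissions
	// (clear PHP's stat cache first, otherwise the cached pre-write values are compared)
	clearstatcache(true, $t);
	if(fileowner($t)!==$fileattribs[0] || filegroup($t)!==$fileattribs[1] || fileperms($t)!==$fileattribs[2]) {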
		echo "[UHP: Error, .htaccess file attributes changed: ".$t."]\n";
		exit();
	}

}

echo "[".str_pad("", 57, "#")."]\n";
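
The merge performed inside the loop above can also be expressed as a pure function on strings, which makes it easy to check that the CUSTOM section survives untouched. The sketch below is an illustration under assumptions: the name buildUhpBlock is hypothetical, and unlike the listing it throws an exception when a target names a module that the source does not define.

```php
<?php
// Hypothetical helper: build the managed UHP block for one target from the
// source sections, the target's own module list and its CUSTOM body.
function buildUhpBlock(string $site, array $wanted, array $source, string $custom): string
{
    $tag = '#####################%sUNIVERSALHTACCESSPROTECTION#####################';
    $out = sprintf($tag, '') . "\n\n# [SITE: " . $site . "]\n# [MODULES: "
         . implode(',', $wanted) . "]\n";

    foreach ($wanted as $name) {
        if ($name === 'CUSTOM') {
            $body = $custom;                 // per-site rules are copied back as-is
        } elseif (isset($source[$name])) {
            $body = $source[$name];          // shared rules come from htaccess.txt
        } else {
            throw new InvalidArgumentException('Unknown module: ' . $name);
        }
        $out .= "\n# [" . $name . "]" . $body . "# [/" . $name . "]\n";
    }

    return $out . "\n" . sprintf($tag, '/') . "\n";
}

$source = ['GLOBAL' => "\nRewriteEngine On\nRewriteBase /\n"];
$custom = "\nDirectoryIndex index.php\n";
$block  = buildUhpBlock('domain.tld', ['GLOBAL', 'CUSTOM'], $source, $custom);
echo $block;
```

Because the function only concatenates strings, it can be exercised without touching any file on disk before the real script is run against live sites.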

 

Before we execute the UHP script, we have to prepare the target htaccess files. The target htaccess files will only be modified if they contain the proper structure as defined in the htaccess.txt file, otherwise they are completely ignored. For example, a minimal structure for a hosted WordPress website could be a file like: /home/domain/public_html/.htaccess

In this example file, we have changed the SITE name, defined a limited set of MODULES and added our custom rules in the CUSTOM section, which will not be affected by the UHP script:

 .htaccess

#####################UNIVERSALHTACCESSPROTECTION#####################

# [SITE: domain.tld]
# [MODULES: GLOBAL,REFERRER,USER AGENT,CUSTOM]

# [CUSTOM]
DirectoryIndex index.php

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
# [/CUSTOM]

#####################/UNIVERSALHTACCESSPROTECTION#####################

 

Once we execute the UHP script against the above htaccess file, we end up with the following output:

[#########################################################]
[           UNIVERSAL HTACCESS PROTECTION (UHP)           ]
[#########################################################]
[UHP: Source modules loaded : 12                          ]
[UHP: Parse source data     : ************                ]
[UHP: Targets loaded        : 1                           ]
[---------------------------------------------------------]
[UHP: Processing target     : domain.tld                  ]
[UHP: Target modules loaded : 4                           ]
[UHP: Target custom data    : 231 bytes                   ]
[UHP: Processing module     : ****                        ]
[UHP: Target data collected : 3571 bytes                  ]
[#########################################################]

 

So what have we achieved so far? First, we collected all our htaccess rules in a single text file (htaccess.txt), which helps with the overall management of multiple servers and multiple hosted domains. Second, we installed our changes automatically across multiple htaccess files with a single execution of the UHP script. Finally, we kept any custom rules intact.

What is missing is a method to distribute the htaccess.txt file across multiple servers, but that is left as an exercise for the reader.

 

