Is it possible to replace content on every page passed through a proxy similar to how mod_rewrite is used for URLs?

Solution 1

There's an Apache module called mod_substitute that can do this. Here's a short example:

<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s/uat.site.co.jp/jp.uat.site2uk.co.uk/ni"
</Location>

Or, when combined with mod_proxy:

ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/

AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|http://uat.site.co.jp/|http://jp.uat.site2uk.co.uk/|ni"

There's more information in the Apache documentation for mod_substitute.
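
The links in the question are protocol-relative (`//uat.site.co.jp/...`), so a pattern that starts with `http://` would not match them. Here is a minimal sketch of a pattern that does, assuming Apache 2.4; the module paths are assumptions and vary by distribution:

```apache
# Assumed module paths; adjust to your installation.
# mod_filter provides AddOutputFilterByType in Apache 2.4.
LoadModule filter_module modules/mod_filter.so
LoadModule substitute_module modules/mod_substitute.so

<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html
    # "n" treats the pattern as a fixed string, "i" ignores case.
    # Starting at "//" covers http://, https:// and protocol-relative links.
    Substitute "s|//uat.site.co.jp/|//jp.uat.site2uk.co.uk/|ni"
</Location>
```

Without the `n` flag the dots in the host name are regex wildcards. That still matches, but less strictly than intended.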

Solution 2

If you haven't restarted Apache since changing the configuration, do that first. If you already have, try a global output filter that runs a custom PHP script to do the replacement, just to see whether that works where Substitute does not.

EDIT: Based on your comment, Substitute may not be working because the content is compressed. To turn off compression, add these lines to your VirtualHost (they require mod_headers):

RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity

If that doesn't work, try the external filter instead. Add these lines to your configuration, updating the paths as needed:

# Add this outside of any VirtualHost block (requires mod_ext_filter)
ExtFilterDefine proxiedcontentfilter mode=output cmd="/usr/bin/php /var/www/proxyfilter.php"

# Add these inside your VirtualHost block
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
SetOutputFilter proxiedcontentfilter

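In proxyfilter.php, put code like the following:

#!/usr/bin/php
<?php
// Read the whole response body that Apache pipes in on stdin.
$html = file_get_contents('php://stdin');
// Case-insensitive literal replacement of the backend host name.
$html = str_ireplace('uat.site.co.jp', 'jp.uat.site2uk.co.uk', $html);
// Write the rewritten body back to Apache on stdout.
file_put_contents('php://stdout', $html);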

If this works, narrow the filter to just text/html content, as you have in your example.
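
One way to do that narrowing is the `intype` parameter of ExtFilterDefine, which restricts the filter to responses of a given content type. A sketch that reuses the filter name and paths from above (adjust them to your setup):

```apache
# Outside any VirtualHost block. intype limits the filter to HTML responses;
# other content types pass through untouched.
ExtFilterDefine proxiedcontentfilter mode=output intype=text/html \
    cmd="/usr/bin/php /var/www/proxyfilter.php"
```

This keeps images, CSS and other non-HTML responses from being piped through the PHP script.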

Solution 3

According to https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypassreverse, ProxyPassReverse rewrites only the response headers. For the HTML body, the documentation says:

"To rewrite HTML content to match the proxy, you must load and enable mod_proxy_html."
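
A rough sketch of what that could look like for the host names in the question, assuming Apache 2.4. The module paths and the short ProxyHTMLLinks list are assumptions for illustration, not a complete configuration:

```apache
# Assumed module paths; mod_proxy_html depends on libxml2, and mod_xml2enc
# handles character-set detection for it.
LoadModule proxy_html_module modules/mod_proxy_html.so
LoadModule xml2enc_module modules/mod_xml2enc.so

# Declare which element attributes carry URLs (a subset, for illustration;
# the proxy-html.conf shipped with Apache has a fuller list).
ProxyHTMLLinks a      href
ProxyHTMLLinks link   href
ProxyHTMLLinks img    src
ProxyHTMLLinks script src

<Location "/">
    ProxyPass        "http://uat.site.co.jp/"
    ProxyPassReverse "http://uat.site.co.jp/"

    # Ask the backend for uncompressed content so the HTML can be parsed.
    RequestHeader unset Accept-Encoding

    ProxyHTMLEnable On
    # Rewrite protocol-relative and absolute links to the proxy's host name.
    ProxyHTMLURLMap "//uat.site.co.jp/"      "//jp.uat.site2uk.co.uk/"
    ProxyHTMLURLMap "http://uat.site.co.jp/" "http://jp.uat.site2uk.co.uk/"
</Location>
```

Unlike mod_substitute, which does a textual replacement over the whole response, mod_proxy_html parses the HTML and rewrites only URLs in the declared attributes.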

Author by

ZZ9

Updated on September 18, 2022

Comments

  • ZZ9
    ZZ9 over 1 year

    Is it possible to replace content on every page passed through a proxy similar to how mod_rewrite is used for URLs? The documentation on substitute is not clear.

    I have some pages I am reverse proxying that have absolute paths. This breaks the site. They need replacing and tools like mod_rewrite are not picking them up as they are not URL requests.

    <VirtualHost *:80>
        ServerName  servername1
        ServerAlias servername2
    
        ErrorLog "/var/log/proxy/jpuat_prox_error_log"
        CustomLog "/var/log/proxy/jpuat_prox_access_log" common
    
        RewriteEngine on
        LogLevel alert rewrite:trace2
        RewriteCond %{HTTP_HOST} /uat.site.co.jp$ [NC]
        RewriteRule ^(.*)$ http://jp.uat.site2uk.co.uk/$1 [P]
    
        AddOutputFilterByType SUBSTITUTE text/html
        Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|i"
    
    
        ProxyRequests Off
    
        <Proxy *>
                Order deny,allow
                Allow from all
        </Proxy>
    
        ProxyPass / http://uat.site.co.jp/
        ProxyPassReverse / http://uat.site.co.jp/
    </VirtualHost>
    

    Neither of the above replaces the HTML string

    <link href="//uat.site.co.jp/css/css.css
    

    with

    <link href="//uat.site2uk.co.uk/css/css.css
    

    Conf after changes:

    <VirtualHost *:80>
        ServerName  jp.uat.site2uk.co.uk
        ServerAlias uat.site.co.jp
        ErrorLog "/var/log/proxy/jpuat_prox_error_log"
        CustomLog "/var/log/proxy/jpuat_prox_access_log" common
        ProxyRequests Off
        <Proxy *>
            Order deny,allow
            Allow from all
        </Proxy>
        ProxyPass / http://uat.site.co.jp/
        ProxyPassReverse / http://uat.site.co.jp/
        AddOutputFilterByType SUBSTITUTE text/html
        Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
    </VirtualHost>
    
    • GregL
      GregL about 9 years
      I'm confused. That looks like it's from an HTML a tag. Clicking on that link likely won't result in the web browser following the link, but rather a file browser (Windows Explorer) trying to open the UNC. Are you trying to replace that string in HTML text?
    • ZZ9
      ZZ9 about 9 years
      The site works correctly. However, once we put it behind a firewall we of course get 404s on a bunch of CSS and images. Normally everything gets a 200.
    • ZZ9
      ZZ9 about 9 years
      They are from link tags on an IIS server <link href="//fqdn/asset"
    • GregL
      GregL about 9 years
      I don't think you can provide UNC paths in link tags. If you can, I can't say it would be a good idea. In any event, that's not your question. According to the Apache docs, the Substitute directive is only valid inside Directory blocks or .htaccess files. Try creating a <Location> block (even if it's for /) and put the directive in there.
    • GregL
      GregL about 9 years
      Try a location block instead, or read about their differences and use whichever one is better.
    • Tero Kilkanen
      Tero Kilkanen about 9 years
      @GregL, this format of URL is a "protocol-relative" URL, it is perfectly valid way to link to pages, although it is not that commonly known. "//domain.com/path" makes the browser request the document with the same protocol that was used to request the page containing the link.
  • ZZ9
    ZZ9 about 9 years
    Hi, thanks for the suggestion, unfortunately I have not had much luck down this path. I have tested it outside of the proxy successfully though. It appears mod_proxy ignores it.
  • Jenny D
    Jenny D about 9 years
    I added some more info which may be helpful.
  • ZZ9
    ZZ9 about 9 years
    I get an HTTP 200 on the page, but the browser shows: "Content Encoding Error: The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression."
  • g491
    g491 about 9 years
    Ah, add these to your VirtualHost. RequestHeader unset Accept-Encoding and also RequestHeader set Accept-Encoding identity
  • g491
    g491 about 9 years
    I updated my answer with something to try to get your original substitute line working. I'd recommend trying that first as it's simpler to try and may be what's going on.
  • ZZ9
    ZZ9 about 9 years
    Upvote for a great answer, but I got the other answer working first.
  • ZZ9
    ZZ9 about 9 years
    Thanks a lot, this works. Turned out to be a glitch with Apache picking up backups of my files in /etc/httpd/conf.d/ that didn't end in .conf (vhost.bak).
  • user3071284
    user3071284 about 3 years
    This is true when ProxyPass and ProxyPassReverse are not used in <Location>
  • lorenzo-s
    lorenzo-s over 2 years
    Alternatively, you can decompress incoming content before substituting, and then compress it again afterwards, using just SetOutputFilter INFLATE;DEFLATE
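
The approach in the last comment can be sketched as follows, assuming mod_deflate is loaded (it provides both the INFLATE and DEFLATE filters). Naming SUBSTITUTE explicitly in the chain is an assumption of this sketch; it is used in place of AddOutputFilterByType so that the filter order is unambiguous:

```apache
<Location "/">
    # Decompress the backend response, run the substitution,
    # then compress again for the client.
    SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
    Substitute "s|//uat.site.co.jp/|//jp.uat.site2uk.co.uk/|ni"
</Location>
```

SetOutputFilter applies to every response under the Location, not just text/html, so the substitution also runs on other content types. The Accept-Encoding header sent to the backend is left untouched.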