Rewrite for robots.txt and favicon.ico

Solution 1

The problem I was having wasn't even where I was looking. It was this rule:

    RewriteCond     %{HTTP_HOST}            ^www.mysite.com$   [NC]
    RewriteRule     ^(.*)$                  - [L]

For requests to www.mysite.com, that rule matches every URL, and its [L] flag stopped mod_rewrite from processing any of the rules below it, which is where the favicon.ico rules were.

I got rid of that rule and its condition, and it worked like a charm.
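An alternative to deleting the passthrough is to handle the main site's fallback files before it, so the [L] no longer hides them. This is an untested sketch, assuming mod_rewrite in virtual-host context and that /misc/ sits under the DocumentRoot (/var/www/mysite/misc/). At this stage the URL-path has not been mapped to a file yet, so the -f test builds the filesystem path from %{DOCUMENT_ROOT} and %{REQUEST_URI}.

```apache
# Untested sketch. Assumes /misc/ lives under DocumentRoot (/var/www/mysite/misc/).
# Main site: serve the /misc/ copy only when the docroot has no file of that name.
# In virtual-host context the URL-path is not mapped to disk yet, so the -f test
# builds the filesystem path from DOCUMENT_ROOT + REQUEST_URI.
RewriteCond     %{HTTP_HOST}                    ^www\.mysite\.com$   [NC]
RewriteCond     %{DOCUMENT_ROOT}%{REQUEST_URI}  !-f
RewriteRule     ^/(favicon\.ico|robots\.txt|crossdomain\.xml)$   /misc/$1 [L]

# The original passthrough, now placed after the fallback rule.
RewriteCond     %{HTTP_HOST}                    ^www\.mysite\.com$   [NC]
RewriteRule     ^(.*)$                          - [L]
```

Because bare mysite.com is redirected to www.mysite.com further down, it ends up on this path as well.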

Solution 2

Directives of this form are gibberish, right?

   RewriteCond     ^(.+)favicon.ico        !-f 

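The first parameter is a variable-interpolated string, not a regex, so it never looks where you intend: `!-f` is tested against the literal path "^(.+)favicon.ico", which doesn't exist, so the condition is always true no matter whether the user has their own favicon.ico. Even with the regex syntax removed from the first parameter, you still need something like a backreference or a path prefix that tells Apache where to look for favicon.ico.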
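One way to supply that backreference is to test the rule's own capture. This is an untested sketch: it assumes it replaces the fallback block where it currently sits (after the rules that rewrite subdomain requests to /home/<user>/www/...) and that /misc/ is under the DocumentRoot. It only makes sense once the path has been rewritten into a filesystem path; for a plain URL-path such as /favicon.ico, -f would look at the filesystem root.

```apache
# Untested sketch. Goes where the original fallback block sits, i.e. after the
# rules that rewrite subdomain requests to /home/<user>/www/...
# $1 is the rule's own capture: the whole path as rewritten so far, e.g.
# /home/test/www/favicon.ico, which -f can test as a real file.
# $2 is just the file name, used to pick the matching /misc/ copy.
RewriteCond     $1      !-f
RewriteRule     ^(.*/(favicon\.ico|robots\.txt|crossdomain\.xml))$   /misc/$2 [L]
```

Since the RewriteRule pattern is matched before its conditions are evaluated, $1 and $2 are available to the RewriteCond, and the pattern itself replaces the separate %{REQUEST_URI} checks for all three files.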

Is that your only culprit?

Some of the RewriteRules also have bogus conditions baked into them:

    RewriteRule     ^(?:www\.)?([^.]+?)\.mysite\.com/media/(.*) ...

That optional "www." can never match there, since in virtual-host context RewriteRule always matches against a URL-path that starts with "/". It also means the character class [^.]+? will always match that leading slash.

Additionally, why would "mysite.com" be in the URL-path you're matching against? Do you need another RewriteCond to poke around in HTTP_HOST?
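An untested sketch of that approach reads the subdomain from %{HTTP_HOST} directly instead of copying the host into the URL-path. The /home/<user>/xxx/media/ and /home/<user>/www/ layout is taken from the question's vhost; as far as I know, a file test such as -d does not replace the %1 captured by the preceding regex condition, so the rule can reuse it.

```apache
# Untested sketch: read the subdomain from HTTP_HOST instead of copying the
# host into the URL-path. %1 is the subdomain captured by the first condition;
# the -d file test is assumed not to replace it, so the rule can reuse it.
RewriteCond     %{HTTP_HOST}    ^(?:www\.)?([^.]+)\.mysite\.com$   [NC]
RewriteCond     /home/%1/       -d
RewriteRule     ^/media/(.*)$   /home/%1/xxx/media/$1 [L]

# Conditions apply only to the next rule, so they are repeated here.
RewriteCond     %{HTTP_HOST}    ^(?:www\.)?([^.]+)\.mysite\.com$   [NC]
RewriteCond     /home/%1/       -d
RewriteRule     ^/(.*)$         /home/%1/www/$1
```

The last rule deliberately has no [L], like the original /home/$1/www/$2 rule, so fallback rules placed after it still run.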

This would all probably be a bit more apparent with a RewriteLog.
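For reference, rewrite logging is configured differently depending on the Apache version. The 2.2 lines match the ones already commented out in the question's vhost; use only the block for the version you run, since 2.4 no longer accepts RewriteLog.

```apache
# Apache 2.2: dedicated rewrite log
RewriteLog      "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 removed RewriteLog/RewriteLogLevel; rewrite tracing goes to the
# ErrorLog instead and is enabled per module through LogLevel:
# LogLevel warn rewrite:trace3
```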


Author: ParoX

Updated on September 17, 2022

Comments

  • ParoX
    ParoX over 1 year

    I have set up some rules so that subdomains (my users) fall back to the robots.txt, favicon.ico, and crossdomain.xml files I keep in /misc/ when they don't provide their own.

    So if a user creates a site, say testing.mywebsite.com, and doesn't make their own favicon.ico at testing.mywebsite.com/favicon.ico, the favicon.ico I have in /misc/favicon.ico is used instead.

    This works perfectly for subdomains, but not for the main website. If you go to mywebsite.com/favicon.ico, it checks whether "/" exists, which it does, and then never redirects to /misc/favicon.ico.

    How can I get it so both instances redirect to /misc/favicon.ico ?

    # If the crossdomain.xml (openxxx file), favicon, or robots.txt doesn't exist on their
    # side, fall back to the site's copy just to have something to go on.
    RewriteCond     %{REQUEST_URI}          crossdomain.xml$
    RewriteCond     ^(.+)crossdomain.xml    !-f
    RewriteRule     ^(.*)$                  /misc/crossdomain.xml [L]
    
    RewriteCond     %{REQUEST_URI}          favicon.ico$
    RewriteCond     ^(.+)favicon.ico        !-f 
    RewriteRule     ^(.*)$                  /misc/favicon.ico [L]
    
    RewriteCond     %{REQUEST_URI}          robots.txt$
    RewriteCond     ^(.+)robots.txt         !-f 
    RewriteRule     ^(.*)$                  /misc/robots.txt [L]
    

    Edit:

    Here is my full Vhost if it helps in diagnosing:

    <VirtualHost *>
            ServerName www.mysite.com
            ServerAdmin [email protected]
            DocumentRoot /var/www/mysite/
            <Directory /var/www/mysite/>
                    Options Indexes FollowSymLinks MultiViews
                    AllowOverride All
                    Order allow,deny
                    allow from all
            </Directory>
    
    
            ErrorLog /var/log/apache2/mysite_error.log
            LogLevel warn
    
            CustomLog /var/log/apache2/mysite_access.log combined
            ServerSignature on
    
            RewriteEngine   on        
    
            #RewriteLog     "/var/log/apache2/rewrite.log"
            #RewriteLogLevel 9
    
    
    
            RewriteCond     %{HTTP_HOST}            ^www.mysite.com$   [NC]
            RewriteRule     ^(.*)$                  - [L]
    
    
            RewriteCond     %{HTTP_HOST}            ^mysite.com$   [NC]
            RewriteRule     ^(.*)$                  http://www.mysite.com$1 [NC,L,R=302]
    
    
            # If there is a subdomain and the subdomain has a /home/ directory attached to it then
            # rewrite the HTTP_HOST into the URI so we can process it. If its /media/, go to their
            # media folder, otherwise to their www folder.
    
            RewriteCond     %{HTTP_HOST}                                 ^(?:www\.)?(.+?)\.mysite\.com$
            RewriteCond     /home/%1/                                    -d
            RewriteRule     ^(.+)                                        %{HTTP_HOST}$1
    
            RewriteRule     ^(?:www\.)?([^.]+?)\.mysite\.com/media/(.*) /home/$1/xxx/media/$2 [L]
            RewriteRule     ^(?:www\.)?([^.]+?)\.mysite\.com/(.*)       /home/$1/www/$2
    
    
    
    
            # If the crossdomain.xml (openpalace file), favicon, or robots.txt doesn't exist on their
            # side, fall back to the main site's copy just to have something to go on.
            RewriteCond     %{REQUEST_URI}          crossdomain.xml$
            RewriteCond     ^(.+)crossdomain.xml    !-f
            RewriteRule     ^(.*)$                  /misc/crossdomain.xml [L]
    
            RewriteCond     %{REQUEST_URI}          favicon.ico$
            RewriteCond     ^(.+)favicon.ico        !-f 
            RewriteRule     ^(.*)$                  /misc/favicon.ico [L]
    
            RewriteCond     %{REQUEST_URI}          robots.txt$
            RewriteCond     ^(.+)robots.txt         !-f 
            RewriteRule     ^(.*)$                  /misc/robots.txt [L]
    
    
    
    
            # Same as above, but for when the directory does not exist. Typically it would be
            # wise to put this into a skip to emulate an if/else but it would also match if the 
            # HTTP_HOST was anything. The skip If/Else method only works with 1 IF, 2
            # Don't put this as the last thing, there could be redirects later on that may not have a
            # /home/ dir but are just shortcuts to things, like inbox.mysite.com
    
            RewriteCond     %{HTTP_HOST}                                 ^(?:www\.)?(.+?)\.mysite\.com$
            RewriteCond     /home/%1/                                    !-d        
            RewriteRule     ^(.*)$                                       http://www.mysite.com$1 [R=302]
    
    
    
    
    
            # Extract the subdomain (if there is one), domain, and TLD. Check if the domain exists in
            #/home/. If it does then rewrite URL to http_host and then see if it can apply to media.
            # There have to be two pairs of RewriteCond because conditions only apply to the next rule.
    
                                                                          #www.   domain .   tld
            RewriteCond     %{HTTP_HOST}                                ^(?:.*\.)?([^.]+)\.(?:[^.]+)$
            RewriteCond     /home/%1/                                   -d
            RewriteRule     ^(.+)                                       %{HTTP_HOST}$1 [C]
            RewriteRule     ^(?:.*\.)?([^.]+)\.(?:[^.]+?)/media/(.*)$    /home/$1/xxx/media/$2 [L] 
    
                                                                           #www.   domain .   tld
            RewriteCond     %{HTTP_HOST}                                ^(?:.*\.)?([^.]+)\.(?:[^.]+)$
            RewriteCond     /home/%1/                                   -d
            RewriteRule     ^(?:.*\.)?([^.]+)\.(?:[^.]+?)/(.*)$          /home/$1/www/$2 [L] 
    
    
    </VirtualHost>
    
    • KM.
      KM. about 13 years
      What do your apache logs say? Do you have "mysite.com" defined the same way as testing.mysite.com? Does it work for www.mysite.com?
    • ParoX
      ParoX about 13 years
      I added the complete vhost
    • KM.
      KM. about 13 years
      Try adding "ServerAlias mysite.com" after "ServerName" line. Also, do you have a DNS entry for mysite.com -- same as www.mysite.com?
    • ParoX
      ParoX about 13 years
      Yes mysite.com and www.mysite.com are the same base IP. All the sites that need to be routed through this vhost range from that same IP to 50 others.
  • ParoX
    ParoX about 13 years
    However, regarding the optional www: the %{HTTP_HOST}$1 rule above prepends the HTTP_HOST to the path. So if someone requests www.test.mysite.com/media/picture.jpg, it extracts test as the subdomain and serves the file /home/test/xxx/media/picture.jpg. I guess I'm more confused by the way you word things; I would understand better with examples.
  • covener
    covener about 13 years
    That rule would almost make sense as a RewriteRule, but not at all as a RewriteCond. You're not comparing against the current URL at all with a RewriteCond.
  • covener
    covener about 13 years
    For the RewriteRule I commented on, it requires that "mysite.com" either be in the path component of the URL or have been added there by a prior rewrite. In other words, http://mysite.com/something/mysite.com/media... which I assume is not the case. RewriteRule matches against the path component of the URI, not the scheme/host/query/etc.
  • ParoX
    ParoX about 13 years
    Correct. My understanding is that a RewriteRule is only processed if the RewriteCond directives preceding it are met. So if the HTTP_HOST ends with mysite.com, extract the subdomain if possible; if that subdomain exists as a home directory, take the whole path component of the URI and rewrite it with the host included. So you are right: if the request was test.mysite.com/mystuff/mypage.html, it would be rewritten to test.mysite.com/test.mysite.com/mystuff/mypage.html, of which test.mysite.com/mystuff/mypage.html is now the path component...
  • ParoX
    ParoX about 13 years
    The point of doing that is so the following RewriteRules can match against the whole URL, including both the HTTP_HOST and the path component.
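The exchange above is easier to follow with one request traced through the rules it discusses. The directives below are copied from the question's vhost and only the comments are new; the request URL and the existence of /home/test/ are assumptions for illustration. The trace shows both points: the optional www. cannot match the original "/"-prefixed path, but it can once the host has been prepended.

```apache
# Worked trace (illustration only) for a request to
#   http://www.test.mysite.com/media/picture.jpg
# assuming /home/test/ exists. RewriteRule input starts as /media/picture.jpg.

# HTTP_HOST = www.test.mysite.com; the optional www. is consumed, %1 = test
RewriteCond     %{HTTP_HOST}    ^(?:www\.)?(.+?)\.mysite\.com$
# Tests the directory /home/test/
RewriteCond     /home/%1/       -d
# $1 = /media/picture.jpg, so the path becomes
#   www.test.mysite.com/media/picture.jpg
RewriteRule     ^(.+)           %{HTTP_HOST}$1

# Now the string starts with the host, so the optional www. CAN match:
# $1 = test, $2 = picture.jpg  ->  /home/test/xxx/media/picture.jpg
RewriteRule     ^(?:www\.)?([^.]+?)\.mysite\.com/media/(.*) /home/$1/xxx/media/$2 [L]
```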