How can I sanitize user input with PHP?

php security xss sql-injection user-input

546

Solution 1

It's a common misconception that user input can be filtered. PHP even has a (now deprecated) "feature", called magic-quotes, that builds on this idea. It's nonsense. Forget about filtering (or cleaning, or whatever people call it).

What you should do to avoid problems is quite simple: whenever you embed a piece of data within foreign code, you must treat it according to the formatting rules of that code. Such rules can be too complicated to follow manually, though. In SQL, for example, the rules for strings, numbers and identifiers are all different. For your convenience, in most cases there is a dedicated tool for such embedding. For example, when you need to use a PHP variable in an SQL query, use a prepared statement, which takes care of all the proper formatting/treatment.
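As a rough sketch of the prepared-statement approach, assuming PDO with the SQLite driver and a made-up `users` table (neither comes from the answer itself):

```php
<?php
// Hypothetical setup: an in-memory SQLite database with a made-up `users` table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

// Untrusted value, e.g. taken from $_POST['name'] in a real application.
$name = "Robert'); DROP TABLE users;--";

// The query text contains only a placeholder; the value is bound separately
// instead of being spliced into the SQL string.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

// Reading it back works the same way.
$select = $pdo->prepare('SELECT name FROM users WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn();
```

The value is stored and read back unchanged, quotes and all, because it never becomes part of the SQL text.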

Another example is HTML: if you embed strings within HTML markup, you must escape them with htmlspecialchars. This means that every single echo or print statement should use htmlspecialchars.
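A minimal sketch of what that could look like; the helper name `e` is only an illustration and not something the answer prescribes:

```php
<?php
// Hypothetical helper name `e`; not part of the original answer.
function e(string $value): string
{
    // ENT_QUOTES escapes both single and double quotes, which matters inside attributes.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Untrusted value, e.g. a comment submitted through a form.
$comment = '<script>alert("xss")</script>';
echo '<p>' . e($comment) . '</p>', PHP_EOL;
```

Wrapping the call in a short helper makes it more likely that every output statement actually uses it.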

A third example could be shell commands: if you are going to embed strings (such as arguments) in external commands and call them with exec, then you must use escapeshellcmd and escapeshellarg.
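A small sketch for the argument case, assuming a Unix-like shell; the helper name and the `ls` command are chosen purely for illustration:

```php
<?php
// Hypothetical helper: builds an `ls` command line for an untrusted file name.
function buildListCommand(string $fileName): string
{
    // escapeshellarg() quotes the value so the shell sees exactly one argument;
    // the `--` tells ls that no further options follow.
    return 'ls -l -- ' . escapeshellarg($fileName);
}

$command = buildListCommand('notes.txt; rm -rf /');
// The result could then be passed to exec($command, $outputLines, $exitCode).
```

The `; rm -rf /` part ends up inside the quoted argument, so the shell treats it as part of the file name rather than as a second command.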

Also, a very compelling example is JSON. The rules are so numerous and complicated that you would never be able to follow them all manually. That's why you should never ever create a JSON string manually, but always use a dedicated function, json_encode(), which will correctly format every bit of data.
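To illustrate, here is a short sketch with made-up sample data containing characters that are easy to get wrong by hand; it assumes PHP 7.3+ for JSON_THROW_ON_ERROR:

```php
<?php
// Data with characters that are easy to get wrong when building JSON by hand.
$data = [
    'name'  => "Line1\nLine2 \"quoted\" \\ backslash",
    'tags'  => ['a/b', 'ünïcödé'],
    'count' => 3,
];

// JSON_THROW_ON_ERROR (PHP 7.3+) raises a JsonException instead of returning false.
$json = json_encode($data, JSON_THROW_ON_ERROR);

// Decoding gives back the original structure.
$roundTrip = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
```

The newline, quotes, backslash and non-ASCII characters are all encoded by the function, and decoding restores them.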

And so on and so forth ...

The only case where you need to actively filter data is if you're accepting preformatted input, for example if you let your users post HTML markup that you plan to display on the site. However, you would be wise to avoid this at all costs, since no matter how well you filter it, it will always be a potential security hole.

Solution 2

Do not try to prevent SQL injection by sanitizing input data.

Instead, do not allow data to be used in creating your SQL code. Use prepared statements (i.e. parameters in a template query) with bound variables. It is the only way to be guaranteed against SQL injection.

Please see my website http://bobby-tables.com/ for more about preventing SQL injection.

Solution 3

No. You can't generically filter data without any context of what it's for. Sometimes you'd want to take a SQL query as input and sometimes you'd want to take HTML as input.

You need to filter input on a whitelist -- ensure that the data matches some specification of what you expect. Then you need to escape it before you use it, depending on the context in which you are using it.
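One place where a whitelist is hard to avoid is an ORDER BY column, which cannot be bound as a parameter. A minimal sketch, with a hypothetical function name and made-up column and table names:

```php
<?php
// Hypothetical example: choosing an ORDER BY column from untrusted input.
function pickSortColumn(string $requested): string
{
    $allowed = ['name', 'created_at', 'price'];   // made-up column names
    // Strict comparison: anything not on the list falls back to a safe default.
    return in_array($requested, $allowed, true) ? $requested : 'name';
}

$column = pickSortColumn('price; DROP TABLE products');
$sql = "SELECT * FROM products ORDER BY {$column}";
```

Only values from the fixed list can ever reach the query text; everything else is replaced by the default.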

The process of escaping data for SQL - to prevent SQL injection - is very different from the process of escaping data for (X)HTML, to prevent XSS.

Solution 4

PHP has the nice new filter_input functions now, which for instance liberate you from finding 'the ultimate e-mail regex', now that there is a built-in FILTER_VALIDATE_EMAIL type.
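A minimal sketch of the built-in e-mail check. It uses filter_var() on plain strings so it can run outside a request; filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL) would apply the same filter to a posted field. The helper name is hypothetical:

```php
<?php
// filter_var() returns the value when it is valid and false otherwise.
function isValidEmail(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}

$ok  = isValidEmail('user@example.com');
$bad = isValidEmail('blah@bla.');
```

Note that this only validates the format; the value still has to be escaped for whatever context it is later used in.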

My own filter class (uses JavaScript to highlight faulty fields) can be initiated by either an ajax request or normal form post. (see the example below)

<?php
/**
 * Pork Formvalidator. Validates fields by regexes and can sanitize them.
 * Uses PHP filter_var built-in functions and extra regexes.
 * @package pork
 */

/**
 *  Pork.FormValidator
 *  Validates arrays or properties by setting up simple arrays. 
 *  Note that some of the regexes are for dutch input!
 *  Example:
 * 
 *  $validations = array('name' => 'anything','email' => 'email','alias' => 'anything','pwd'=>'anything','gsm' => 'phone','birthdate' => 'date');
 *  $required = array('name', 'email', 'alias', 'pwd');
 *  $sanitize = array('alias');
 *
 *  $validator = new FormValidator($validations, $required, $sanitize);
 *                  
 *  if($validator->validate($_POST))
 *  {
 *      $_POST = $validator->sanitize($_POST);
 *      // now do your saving, $_POST has been sanitized.
 *      die($validator->getScript()."<script type='text/javascript'>alert('saved changes');</script>");
 *  }
 *  else
 *  {
 *      die($validator->getScript());
 *  }   
 *  
 * To validate just one element:
 * $validated = FormValidator::validateItem('blah@bla.', 'email');
 * 
 * To sanitize just one element:
 * $sanitized = FormValidator::sanitizeItem('<b>blah</b>', 'string');
 * 
 * @package pork
 * @author SchizoDuckie
 * @copyright SchizoDuckie 2008
 * @version 1.0
 * @access public
 */
class FormValidator
{
    public static $regexes = Array(
            'date' => "^[0-9]{1,2}[-/][0-9]{1,2}[-/][0-9]{4}\$",
            'amount' => "^[-]?[0-9]+\$",
            'number' => "^[-]?[0-9,]+\$",
            'alfanum' => "^[0-9a-zA-Z ,._\\s\?\!-]+\$",
            'not_empty' => "[a-z0-9A-Z]+",
            'words' => "^[A-Za-z]+[A-Za-z \\s]*\$",
            'phone' => "^[0-9]{10,11}\$",
            'zipcode' => "^[1-9][0-9]{3}[a-zA-Z]{2}\$",
            'plate' => "^([0-9a-zA-Z]{2}[-]){2}[0-9a-zA-Z]{2}\$",
            'price' => "^[0-9.,]*(([.,][-])|([.,][0-9]{2}))?\$",
            '2digitopt' => "^\d+(\,\d{2})?\$",
            '2digitforce' => "^\d+\,\d\d\$",
            'anything' => "^[\d\D]{1,}\$"
    );
    private $validations, $sanitations, $mandatories, $errors, $corrects, $fields;
    

    public function __construct($validations=array(), $mandatories = array(), $sanitations = array())
    {
        $this->validations = $validations;
        $this->sanitations = $sanitations;
        $this->mandatories = $mandatories;
        $this->errors = array();
        $this->corrects = array();
    }

    /**
     * Validates an array of items (if needed) and returns true or false
     *
     */
    public function validate($items)
    {
        $this->fields = $items;
        $havefailures = false;
        foreach($items as $key=>$val)
        {
            // Skip fields that are empty or have no validation rule, unless they are mandatory.
            if((strlen($val) == 0 || !array_key_exists($key, $this->validations)) && !in_array($key, $this->mandatories)) 
            {
                $this->corrects[] = $key;
                continue;
            }
            $result = self::validateItem($val, $this->validations[$key]);
            if($result === false) {
                $havefailures = true;
                $this->addError($key, $this->validations[$key]);
            }
            else
            {
                $this->corrects[] = $key;
            }
        }
    
        return(!$havefailures);
    }

    /**
     *
     *  Adds unvalidated class to those elements that are not validated. Removes them from classes that are.
     */
    public function getScript() {
        $output = '';
        if(!empty($this->errors))
        {
            $errors = array();
            foreach($this->errors as $key=>$val) { $errors[] = "'INPUT[name={$key}]'"; }

            $output .= '$$('.implode(',', $errors).').addClass("unvalidated");'; 
            $output .= "new FormValidator().showMessage();";
        }
        if(!empty($this->corrects))
        {
            $corrects = array();
            foreach($this->corrects as $key) { $corrects[] = "'INPUT[name={$key}]'"; }
            $output .= '$$('.implode(',', $corrects).').removeClass("unvalidated");';   
        }
        $output = "<script type='text/javascript'>{$output} </script>";
        return($output);
    }


    /**
     *
     * Sanitizes an array of items according to the $this->sanitations
     * sanitations will be standard of type string, but can also be specified.
     * For ease of use, this syntax is accepted:
     * $sanitations = array('fieldname', 'otherfieldname'=>'float');
     */
    public function sanitize($items)
    {
        foreach($items as $key=>$val)
        {
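            if(array_search($key, $this->sanitations) === false && !array_key_exists($key, $this->sanitations)) continue;
            // Use the type given as 'field' => 'type'; plain entries default to 'string'.
            $type = array_key_exists($key, $this->sanitations) ? $this->sanitations[$key] : 'string';
            $items[$key] = self::sanitizeItem($val, $type);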
        }
        return($items);
    }


    /**
     *
     * Adds an error to the errors array.
     */ 
    private function addError($field, $type='string')
    {
        $this->errors[$field] = $type;
    }

    /**
     *
     * Sanitize a single var according to $type.
     * Allows for static calling to allow simple sanitization
     */
    public static function sanitizeItem($var, $type)
    {
        $flags = 0; // filter_var() expects an int or array of options here, not NULL
        switch($type)
        {
            case 'url':
                $filter = FILTER_SANITIZE_URL;
            break;
            case 'int':
                $filter = FILTER_SANITIZE_NUMBER_INT;
            break;
            case 'float':
                $filter = FILTER_SANITIZE_NUMBER_FLOAT;
                $flags = FILTER_FLAG_ALLOW_FRACTION | FILTER_FLAG_ALLOW_THOUSAND;
            break;
            case 'email':
                $var = substr($var, 0, 254);
                $filter = FILTER_SANITIZE_EMAIL;
            break;
            case 'string':
            default:
                // Note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.
                $filter = FILTER_SANITIZE_STRING;
                $flags = FILTER_FLAG_NO_ENCODE_QUOTES;
            break;
             
        }
        $output = filter_var($var, $filter, $flags);        
        return($output);
    }
    
    /** 
     *
     * Validates a single var according to $type.
     * Allows for static calling to allow simple validation.
     *
     */
    public static function validateItem($var, $type)
    {
        if(array_key_exists($type, self::$regexes))
        {
            $returnval =  filter_var($var, FILTER_VALIDATE_REGEXP, array("options"=> array("regexp"=>'!'.self::$regexes[$type].'!i'))) !== false;
            return($returnval);
        }
        $filter = false;
        switch($type)
        {
            case 'email':
                $var = substr($var, 0, 254);
                $filter = FILTER_VALIDATE_EMAIL;    
            break;
            case 'int':
                $filter = FILTER_VALIDATE_INT;
            break;
            case 'boolean':
                $filter = FILTER_VALIDATE_BOOLEAN;
            break;
            case 'ip':
                $filter = FILTER_VALIDATE_IP;
            break;
            case 'url':
                $filter = FILTER_VALIDATE_URL;
            break;
        }
        // Parenthesized: PHP 8 rejects nested ternaries without explicit parentheses.
        return ($filter === false) ? false : (filter_var($var, $filter) !== false);
    }       
    


}

Of course, keep in mind that you need to do your SQL query escaping too, depending on what type of database you are using (mysql_real_escape_string() is useless for an SQL Server, for instance). You probably want to handle this automatically at the appropriate application layer, such as an ORM. Also, as mentioned above: for outputting to HTML, use PHP's other dedicated functions, such as htmlspecialchars ;)

To really allow HTML input with, say, stripped classes and/or tags, depend on one of the dedicated XSS validation packages. DO NOT WRITE YOUR OWN REGEXES TO PARSE HTML!

Solution 5

No, there is not.

First of all, SQL injection is an input filtering problem, and XSS is an output escaping one - so you wouldn't even execute these two operations at the same time in the code lifecycle.

Basic rules of thumb

For SQL queries, bind parameters (as with PDO) or use a driver-native escaping function for query variables (such as mysqli_real_escape_string(); the older mysql_real_escape_string() was removed in PHP 7)
Use strip_tags() to filter out unwanted HTML
Escape all other output with htmlspecialchars() and be mindful of the 2nd and 3rd parameters here.
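
As a small sketch of the last two rules, with a made-up input string: strip_tags() removes tags but keeps the text inside them and leaves attributes on allowed tags untouched, so its output still needs escaping when it is printed as text.

```php
<?php
// Made-up input mixing allowed and unwanted markup.
$input = '<p>Hello <b onclick="steal()">world</b><script>alert(1)</script></p>';

// Keep only <b>; every other tag is removed, but the text inside it stays.
$stripped = strip_tags($input, '<b>');

// Attributes on allowed tags survive, so the result is still untrusted.
// Second and third parameters: quote handling and character encoding.
$safeForHtml = htmlspecialchars($stripped, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
```

Here `$stripped` still contains the `onclick` attribute, which is one reason strip_tags() on its own is not a safe way to allow a subset of HTML.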


Author by

CStreel

Updated on July 08, 2022

Comments

CStreel almost 2 years

I would like to set the text on the MainPage of my app, based on the response of an Async call to Web Service.

I'm getting a "The application called an interface that was marshalled for a different thread" error, so I know that I need to execute

MainPage.TB_Response.text = response;

on the primary/main thread, but I am unsure how I would go about this.

Edit: Here is my Response Handler

    private void ReadResponse(IAsyncResult asyncResult)
    {
        System.Diagnostics.Debug.WriteLine("ReadResponse");
        try
        {
            // The downloaded resource ends up in the variable named content. 
            var content = new MemoryStream();

            // State of request is asynchronous.
            //RequestState myRequestState = (RequestState)asyncResult.AsyncState;
            HttpWebRequest myHttpWebRequest2 = (HttpWebRequest)asyncResult.AsyncState;
            HttpWebResponse response = (HttpWebResponse)myHttpWebRequest2.EndGetResponse(asyncResult);

            //do whatever
            using (Stream responseStream = response.GetResponseStream())
            {
                responseStream.CopyTo(content);
                byte[] data = content.ToArray();
                if (data.Length > 0)
                {
                    string temp = System.Text.Encoding.UTF8.GetString(data, 0, data.Length);
                    MainPage.TB_Reponse.Text = temp;
                    System.Diagnostics.Debug.WriteLine(temp);
                }
            }

        }
        catch (WebException e)
        {

            System.Diagnostics.Debug.WriteLine(e.Message);
        }
    }

Edit2: My MainPage Class

public sealed partial class MainPage : Page
{
    public static TextBlock TB_Reponse;
    public MainPage()
    {
        this.InitializeComponent();
        MainPage.TB_Reponse = this.TB_Response;
    }

    protected override void OnNavigatedTo(NavigationEventArgs e)
    {

    }

    private void BTN_Login_Click(object sender, RoutedEventArgs e)
    {
        ...
    }
}

Kirill Bestemyanov over 11 years

Could you show your asynchronous call?

paan over 15 years

There is no "best way" to do something like sanitizing input. Use a library; HTML Purifier is good. These libraries have been pounded on many times, so they are much more bulletproof than anything you can come up with yourself.
Bobby Jack over 15 years

"This means that every single echo or print statement should use htmlspecialchars" - of course, you mean "every ... statement outputting user input"; htmlspecialchars()-ifying "echo 'Hello, world!';" would be crazy ;)
Kornel over 14 years

There's one case where I think filtering is the right solution: UTF-8. You don't want invalid UTF-8 sequences all over your application (you might get different error recovery depending on code path), and UTF-8 can be filtered (or rejected) easily.
Amit Patil over 14 years

@porneL: Yes, and it can also be worthwhile to filter out control characters other than newline at this point. However given that most PHP apps can't even get the HTML-escaping right yet I'm not going to push the overlong UTF-8 sequence issue (they're only really an issue in IE6 pre-Service-Pack-2 and old Operas).
Axel M. Garcia almost 14 years

Although your answer is helpful, HTML can and is successfully filtered for XSS in numerous applications. E.g. Comment systems in blog software such as WordPress.
Axel M. Garcia almost 14 years

See also bioinformatics.org/phplabware/internal_utilities/htmLawed . From my understanding WordPress uses an older version, core.trac.wordpress.org/browser/tags/2.9.2/wp-includes/kses.php
David O. over 13 years

I realize this is an old question, but as of PHP 5.2.0 PHP has introduced Filters (php.net/manual/en/book.filter.php) and the function filter_var(), which when passed a value and an appropriate filter will either sanitize or validate the supplied user input.
troelskn over 13 years

@david 5.2.0 was out when I made this answer.
Jens Roland almost 13 years

Hi Troels -- Thanks for the answer. Most of the frameworks & languages I use every day, including PHP, Javascript and the Apache web server, have failed spectacularly at 'sanitizing' input in the past (magic quotes OMG!), and if they can't get it right, what chance do I have? Currently, I use prepared statements for all SQL I ever write, and for html I use a variant of htmlspecialchars, and frankly I'm still not 100% sure I'm not missing something. Security is HARD.
troelskn almost 13 years

@Jens Roland -- You're quite right; Security is hard. Trying to delegate it to a framework is probably not a good strategy. Much better to actually understand what we're dealing with. It sounds like you have the most common bases covered though.
rjmunro over 12 years

This looks like it might be a handy script for validating inputs, but it is completely irrelevant to the question.
jbyrd over 12 years

But will mysql_real_escape_string properly handle the following: $sub = mysql_real_escape_string("%something"); // still %something mysql_query("SELECT * FROM messages WHERE subject LIKE '{$sub}%'"); ...?
troelskn over 12 years

@jbyrd - no, LIKE uses a specialised regexp language. You will have to escape your input string twice - once for the regexp and once for the mysql string encoding. It's code within code within code.
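
A minimal sketch of the first of those two steps, using a hypothetical helper and assuming a backslash as the LIKE escape character; the second step is then handled by binding the result as a query parameter rather than escaping it by hand:

```php
<?php
// Hypothetical helper: escapes the LIKE wildcards so they match literally.
function escapeLike(string $value, string $escapeChar = '\\'): string
{
    // Escape the escape character itself first, then % and _.
    return str_replace(
        [$escapeChar, '%', '_'],
        [$escapeChar . $escapeChar, $escapeChar . '%', $escapeChar . '_'],
        $value
    );
}

// The leading % is now literal; only the trailing % acts as a wildcard.
$pattern = escapeLike('%something') . '%';
// $pattern would then be bound as a parameter of a prepared LIKE query.
```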
Jeff Brand over 11 years

You can also wrap the call with the CoreDispatcher.RunAsync method, which will put the delegate code onto the UI thread.
CStreel over 11 years

Are you referring to wrapping the response handler or the UI Interaction code with CoreDispatcher?
Jeff Brand over 11 years

Whatever code you want to have run on the UI thread - i.e., anything that touches UI controls
Robert Mark Bram over 11 years

So you only use strip_tags() or htmlspecialchars() when you know that the input has HTML that you want to get rid of or escape respectively - you are not using it for any security purpose right? Also, when you do the bind, what does it do for stuff like Bobby Tables? "Robert'); DROP TABLE Students;--" Does it just escape the quotes?
Ijas Ameenudeen about 11 years

before you use mysql_real_escape_string , you should be connected to a database.
Marcel Korpel almost 11 years

At this moment mysql_real_escape_string is deprecated. It's considered good practice nowadays to use prepared statements to prevent SQL injection. So switch to either MySQLi or PDO.
Duc Tran almost 11 years

I use $id = intval($id) instead :)
Qix - MONICA WAS MISTREATED about 10 years

I haven't touched PHP for a long while; coming back to it and seeing this answer about the deprecation of magic quotes makes me happy.
jbo5112 almost 10 years

While useful, this doesn't answer the actual question. They wanted to allow some HTML tags in the input. The only advice here on how to do that is to consider not allowing them by using htmlspecialchars. Supporting them may be some sort of customer requirement. I have seen a number of websites that support some HTML markup (e.g. slashdot.org) on input, so I can only assume it's possible.
jbo5112 almost 10 years

If you have user data that will go into a database and later be displayed on web pages, isn't it usually read a lot more than it's written? To me, it makes more sense to filter it once (as input) before you store it, instead of having to filter it every time you display it. Am I missing something or did a bunch of people vote for needless performance overhead in this and the accepted answer?
Jo Smo almost 10 years

@troelskn The only case where you need to actively filter data, is if you're accepting preformatted input. Eg. if you let your users post HTML markup, that you plan to display on the site. but if you don't sanitize the input (a $_POST for example), then a user can always input html right? So... Here you are saying that you should sanitize every user input, because a user can enter html code anywhere if you don't sanitize it. Or have i gotten this wrong somehow?
Jo Smo almost 10 years

Best answer for me. It's short and addresses the question well if you ask me. Is it possible to attack PHP somehow via $_POST or $_GET with some injection or is this impossible?
troelskn almost 10 years

@tastro No, you don't sanitize when data is input - you sanitize when it's used. E.g. as late as possible. That will give you the best level of security.
Jo Smo almost 10 years

@troelskn could you tell me why it will give me the best level of security if i do it as late as possible? Thanks!
troelskn almost 10 years

Because you limit the attack surface. If you sanitize early (when input), you have to be certain that there are no other holes in the application where bad data could enter through. Whereas if you do it late, then your output function doesn't have to "trust" that it is given safe data - it simply assumes that everything is unsafe.
a coder over 9 years

Or visit the official documentation and learn PDO and prepared statements. Tiny learning curve, but if you know SQL pretty well, you'll have no trouble adapting.
test over 9 years

Casting integer is a good way to ensure only numerical data is inserted.
Scott Arciszewski almost 9 years

For the specific case of SQL Injection, this is the correct answer!
Alan Mattano almost 9 years

@troelskn can you include "escape" code for each of the examples?
cryptic ツ almost 9 years

pg_escape_literal() is the recommended function to use for PostgreSQL.
Admin over 8 years

Question: Would you need to sanitize/validate user input if you were using that input to make a curl post request?
troelskn over 8 years

@sudosoul probably not. Depending on how you pass data to curl, you will need to serialize/encode it properly though.
Basic over 8 years

Note that prepared statements don't add any security, parameterised queries do. They just happen to be very easy to use together in PHP.
Ramon Bakker about 8 years

Its not the only guaranteed way. Hex the input and unhex in query will prevent also. Also hex attacks are not possible if you use hexing right.
mwfearnley almost 8 years

Should I infer from this answer that (arbitrary) input categorically cannot be filtered? I don't follow.
tereško over 7 years

@troelskn I was mostly thinking about the SQL part. As in: use of prepared statements instead of escaping.
vladkras over 7 years

$id = (int)$_GET['id'] and $que = sprintf('SELECT ... WHERE id="%d"', $id) is good too
Abraham Brookes over 7 years

What if you're inputting something specialized, like email addresses or usernames?
Si8 about 6 years

I did the following: <input type="hidden" name="my-val" value="<?= htmlspecialchars($_REQUEST['my-val']); ?>"/> and the Url is http://example.com?my-val=<test>this and value is still <test>this in the hidden statement
Teson almost 6 years

If you need a dynamic ORDER BY, PDO doesn't support it, so you still need to account for that with input validation.
oldboy over 4 years

so htmlspecialchars() and prepared statements are enough to sanitize user input from input elements that is inserted into the db?!
oldboy over 4 years

@Si8 apparently <?= has been removed from PHP 7+
drtechno over 4 years

oh yes, the $post and $get arrays accept all characters, but some of those characters can be used against you if the character is allowed to be enumerated in the posted php page. so if you don't escape encapsulating characters ( like ", ' and ` ) it could open up an attack vector. the ` character is often missed, and can be used to form command line execution hacks. Sanitation will prevent user input hacking, but will not help you with web application firewall hacks.
drtechno over 4 years

one problem with that is that its not always a database attack, and all user input should be protected from the system. not just one language type. So on your sites, when you enumerate your $_POST data, even with using binding, it could escape out enough to execute shell or even other php code.
drtechno over 4 years

could mb_encode_numericentity be used instead? Since it encodes everything?
drtechno over 4 years

The problem with wordpress is that its not necessarily a php-sql injection attack that causes database breaches. Miss programmed plugins that store data that an xml query reveals secrets is more problematic.
symcbean over 4 years

"its not always a database attack" : "The transforms you apply to data to make it safe for inclusion in an SQL statement are completely different from those...."
symcbean over 4 years

"all user input should be protected from the system" : no the system should be protected from user input.
webaholik over 4 years

@drtechno - mb_encode_numericentity is discussed in the htmlspecialchars link on #3 XSS
drtechno over 4 years

well I ran out of words, but yes the input needs to be prevented from affecting the system operation. to clarify this...
Loduwijk over 4 years

@troelskn "If you sanitize early you have to be certain that there are no other holes in the application where bad data could enter through" You have the same problem when doing it late: You have to be certain that there are no other holes in the application where you are using the data. Intuitively, it seems easier to me to miss a data-usage than a data-input.
Jsowa almost 4 years

Both input and output should be sanitized.
Jsowa almost 4 years

I understood. The form of input is determined of the purpose and we can't sanitize input to fit everywhere. But I don't agree with this practice because even with inserted input (i.e. in database) it can be vulnerable for processing it. We should ensure to prevent any vulnerability at every stage of our data processing. Popular custom is that data we store in database is safe and a lot of developers don't care about sanitization of 'input' in storage.
symcbean almost 4 years

You did not understand. You are describing OUTPUT from PHP not INPUT.
Abel LIFAEFI MBULA over 3 years

From what I know, XSS is an output concern, not an input one.
webaholik over 3 years

@bam - you are correct, just don't miss a spot! Luckily most frameworks will handle for us when used properly.
Your Common Sense over 3 years

@BobbyJack Hope in the course of those 12 years you've learned enough to disavow your comment. What would be indeed crazy is to distinctly judge every piece of data and decide whether it's pure enough to escape the common treatment, under a constant danger of making a human error in such a judgement.
Bobby Jack over 3 years

@YourCommonSense Thanks for the feedback, although it comes across as a little confrontational, especially for this site! I wouldn't disavow that comment, though. I don't think "echo 'hello world';" should be written "echo htmlspecialchars('hello world');". Obviously, I take your point, but I think it's taking defensive programming to an extreme. I'd be very interested to see an example of that approach in a significant codebase, though, if you have one.
Your Common Sense over 3 years

@BobbyJack I understand you didn't mean any harm and all you wanted was to make a quip. But people often underestimate the consequences. You mentioned "user input" in your comment, while this term is vague and uncertain, often being completely misunderstood, as is proven by the link I provided. That's why modern template engines are "taking defensive programming to an extreme" with their auto-escaping feature, completely disregarding the input source. Your distorted example is correct but your broader statement is not. That's the problem with information on Stack Overflow as a whole.
Your Common Sense over 3 years

Take, for example, the notorious SQL "escaping" case. In some narrow context it could help against SQL injection, BUT in the mass consciousness it was extended to being an all-embracing protection measure, with disastrous consequences. Even making technically correct statements that escaping only helps for "strings" doesn't help, as people just overlook this, having only a vague idea of what an SQL string is. It takes a helluvalot of explanation to show how SQL injection protection that involves escaping works. That's why there are parameterized queries, which give 100% protection when applicable.
mopsyd over 3 years

perhaps if (isset($_GET['id'])) { if (filter_var($_GET['id'], FILTER_VALIDATE_INT) === false) { throw new \InvalidArgumentException('Invalid page id format'); } /* use a prepared statement for insert here */ } might suit you. I prefer to make no database call at all if I can identify that a parameter is definitely not valid based on the known schema it is being handed to.
mercury over 2 years

I don’t agree with using ORM , it’s over engineering imo.
Reham Fahmy over 2 years

@PHP >= 8.0 gives error Parse error: syntax error, unexpected '->' (T_OBJECT_OPERATOR)
SchizoDuckie almost 2 years

@Reham Fahmy: This code is from 2008. It's 2022 now. Don't Use this. Use a framework.