How can I sanitize user input with PHP?
Solution 1
It's a common misconception that user input can be filtered. PHP even has a (now deprecated) "feature", called magic-quotes, that builds on this idea. It's nonsense. Forget about filtering (or cleaning, or whatever people call it).
What you should do, to avoid problems, is quite simple: whenever you embed a piece of data within foreign code, you must treat it according to the formatting rules of that code. But you must understand that such rules can be too complicated to follow manually. For example, in SQL, the rules for strings, numbers and identifiers are all different. For your convenience, in most cases there is a dedicated tool for such embedding. For example, when you need to use a PHP variable in an SQL query, you have to use a prepared statement, which will take care of all the proper formatting/treatment.
Another example is HTML: if you embed strings within HTML markup, you must escape them with htmlspecialchars. This means that every single echo or print statement should use htmlspecialchars.
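A minimal sketch of the HTML rule above. The helper name e() and the sample value are made up for illustration; htmlspecialchars() and the ENT_QUOTES flag are standard PHP.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute.
function e(string $value): string
{
    // ENT_QUOTES also converts single quotes; the encoding is stated explicitly.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Made-up sample value, standing in for anything that came from outside the script.
$comment = '<script>alert("hi")</script>';

// The markup is written by the programmer; the data goes through e().
echo '<p>' . e($comment) . '</p>', PHP_EOL;
```

Routing every echoed value through one small helper like this keeps the escaping decision out of each individual echo statement.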
A third example could be shell commands: if you are going to embed strings (such as arguments) in external commands, and call them with exec, then you must use escapeshellcmd and escapeshellarg.
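As a rough illustration of the shell case, the sketch below quotes a single argument with escapeshellarg(). The file name and the ls command are made-up examples, and the command is only printed here, not executed.

```php
<?php
// Made-up sample value containing shell metacharacters.
$filename = 'notes; rm -rf /tmp/x';

// escapeshellarg() quotes the value so the shell treats it as one argument.
$command = 'ls -l -- ' . escapeshellarg($filename);

// Printed for inspection; this is the string you would hand to exec().
echo $command, PHP_EOL;
```

escapeshellarg() is meant for individual arguments, while escapeshellcmd() operates on a whole command string.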
Also, a very compelling example is JSON. The rules are so numerous and complicated that you would never be able to follow them all manually. That's why you should never ever create a JSON string manually, but always use a dedicated function, json_encode(), which will correctly format every bit of data.
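A short sketch of the JSON case, assuming PHP 7.3 or later for the JSON_THROW_ON_ERROR flag. The sample data is made up and was chosen because quotes, newlines and slashes are easy to get wrong by hand.

```php
<?php
// Made-up sample data with characters that are awkward to format manually.
$data = [
    'name'  => "Robert \"Bobby\" Tables\n",
    'tags'  => ['<b>', 'a/b'],
    'count' => 3,
];

// JSON_THROW_ON_ERROR turns encoding failures into exceptions
// instead of a silent false return value.
$json = json_encode($data, JSON_THROW_ON_ERROR);

echo $json, PHP_EOL;
```

Decoding the result with json_decode($json, true) should give back the same array, which is a handy sanity check.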
And so on and so forth ...
The only case where you need to actively filter data is if you're accepting preformatted input, for example, if you let your users post HTML markup that you plan to display on the site. However, you would be wise to avoid this at all costs, since no matter how well you filter it, it will always be a potential security hole.
Solution 2
Do not try to prevent SQL injection by sanitizing input data.
Instead, do not allow data to be used in creating your SQL code. Use prepared statements (i.e. parameters in a template query) with bound variables. It is the only way to be guaranteed against SQL injection.
Please see my website http://bobby-tables.com/ for more about preventing SQL injection.
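A minimal sketch of a prepared statement with a bound parameter, using PDO. It assumes the pdo_sqlite extension is available; the in-memory database and the students table are made up so the example can run on its own.

```php
<?php
// Assumption: pdo_sqlite is installed. Database and table are illustrative only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)');

// A hostile-looking value, used purely as data.
$name = "Robert'); DROP TABLE students;--";

// The query text is fixed; the value travels separately as a bound parameter.
$insert = $pdo->prepare('INSERT INTO students (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$select = $pdo->prepare('SELECT name FROM students WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn();

// The value comes back unchanged and the table still exists.
echo $stored, PHP_EOL;
```

Because the value never becomes part of the SQL text, there is nothing in it for the database to interpret as code.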
Solution 3
No. You can't generically filter data without any context of what it's for. Sometimes you'd want to take a SQL query as input and sometimes you'd want to take HTML as input.
You need to filter input on a whitelist -- ensure that the data matches some specification of what you expect. Then you need to escape it before you use it, depending on the context in which you are using it.
The process of escaping data for SQL - to prevent SQL injection - is very different from the process of escaping data for (X)HTML, to prevent XSS.
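The two steps above (validate against a whitelist, then escape for the context of use) might look roughly like this. The column names, the users table and the helper pickSortColumn() are all hypothetical.

```php
<?php
// Hypothetical whitelist: the only sort columns this made-up page accepts.
const ALLOWED_SORT_COLUMNS = ['name', 'created_at'];

// Validate against the whitelist; fall back to a known-good default.
function pickSortColumn(string $requested): string
{
    return in_array($requested, ALLOWED_SORT_COLUMNS, true) ? $requested : 'name';
}

// Made-up input, standing in for something like $_GET['sort'].
$requested = 'name; DROP TABLE users';
$column = pickSortColumn($requested);

// SQL context: only a whitelisted identifier ever reaches the query text.
$sql = "SELECT * FROM users ORDER BY $column";

// HTML context: the same raw input is escaped with the HTML-specific function.
$html = '<p>You asked to sort by: '
    . htmlspecialchars($requested, ENT_QUOTES, 'UTF-8')
    . '</p>';

echo $sql, PHP_EOL, $html, PHP_EOL;
```

The same input is handled twice, once per destination, which is the point of the answer: the treatment depends on where the data is going, not on where it came from.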
Solution 4
PHP now has the nice filter_input functions, which for instance liberate you from finding 'the ultimate e-mail regex' now that there is a built-in FILTER_VALIDATE_EMAIL type.
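A small sketch of the built-in e-mail filter. It uses filter_var() rather than filter_input() so it runs without a real request; filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL) accepts the same filter constant. The sample addresses are made up.

```php
<?php
// Made-up sample values to run through the validator.
$candidates = ['alice@example.com', 'blah@bla.', 'not an email'];

foreach ($candidates as $candidate) {
    // filter_var() returns the value on success, or false when validation fails.
    $result = filter_var($candidate, FILTER_VALIDATE_EMAIL);
    echo $candidate, ' => ', ($result === false ? 'invalid' : 'valid'), PHP_EOL;
}
```

Note that this validates the format only; it says nothing about how the value must be escaped later for SQL or HTML.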
My own filter class (uses JavaScript to highlight faulty fields) can be initiated by either an Ajax request or a normal form post (see the example below).
<?php
/**
* Pork Formvalidator. Validates fields by regexes and can sanitize them. Uses PHP filter_var built-in functions and extra regexes.
* @package pork
*/
/**
* Pork.FormValidator
* Validates arrays or properties by setting up simple arrays.
* Note that some of the regexes are for dutch input!
* Example:
*
* $validations = array('name' => 'anything','email' => 'email','alias' => 'anything','pwd'=>'anything','gsm' => 'phone','birthdate' => 'date');
* $required = array('name', 'email', 'alias', 'pwd');
* $sanitize = array('alias');
*
* $validator = new FormValidator($validations, $required, $sanitize);
*
* if($validator->validate($_POST))
* {
* $_POST = $validator->sanitize($_POST);
* // now do your saving, $_POST has been sanitized.
* die($validator->getScript()."<script type='text/javascript'>alert('saved changes');</script>");
* }
* else
* {
* die($validator->getScript());
* }
*
* To validate just one element:
* $validated = FormValidator::validateItem('blah@bla.', 'email');
*
* To sanitize just one element:
* $sanitized = FormValidator::sanitizeItem('<b>blah</b>', 'string');
*
* @package pork
* @author SchizoDuckie
* @copyright SchizoDuckie 2008
* @version 1.0
* @access public
*/
class FormValidator
{
public static $regexes = Array(
'date' => "^[0-9]{1,2}[-/][0-9]{1,2}[-/][0-9]{4}\$",
'amount' => "^[-]?[0-9]+\$",
'number' => "^[-]?[0-9,]+\$",
'alfanum' => "^[0-9a-zA-Z ,._\\s\?\!-]+\$",
'not_empty' => "[a-z0-9A-Z]+",
'words' => "^[A-Za-z]+[A-Za-z \\s]*\$",
'phone' => "^[0-9]{10,11}\$",
'zipcode' => "^[1-9][0-9]{3}[a-zA-Z]{2}\$",
'plate' => "^([0-9a-zA-Z]{2}[-]){2}[0-9a-zA-Z]{2}\$",
'price' => "^[0-9.,]*(([.,][-])|([.,][0-9]{2}))?\$",
'2digitopt' => "^\d+(\,\d{2})?\$",
'2digitforce' => "^\d+\,\d\d\$",
'anything' => "^[\d\D]{1,}\$"
);
private $validations, $sanitations, $mandatories, $errors, $corrects, $fields;
public function __construct($validations = array(), $mandatories = array(), $sanitations = array())
{
$this->validations = $validations;
$this->sanitations = $sanitations;
$this->mandatories = $mandatories;
$this->errors = array();
$this->corrects = array();
}
/**
* Validates an array of items (if needed) and returns true or false
*
*/
public function validate($items)
{
$this->fields = $items;
$havefailures = false;
foreach($items as $key=>$val)
{
if((strlen($val) == 0 || !array_key_exists($key, $this->validations)) && array_search($key, $this->mandatories) === false)
{
$this->corrects[] = $key;
continue;
}
$result = self::validateItem($val, $this->validations[$key]);
if($result === false) {
$havefailures = true;
$this->addError($key, $this->validations[$key]);
}
else
{
$this->corrects[] = $key;
}
}
return(!$havefailures);
}
/**
*
* Adds the unvalidated class to those elements that are not validated. Removes it from those that are.
*/
public function getScript() {
// Start with an empty string so the .= below works when there are no errors.
$output = '';
if(!empty($this->errors))
{
$errors = array();
foreach($this->errors as $key=>$val) { $errors[] = "'INPUT[name={$key}]'"; }
$output = '$$('.implode(',', $errors).').addClass("unvalidated");';
$output .= "new FormValidator().showMessage();";
}
if(!empty($this->corrects))
{
$corrects = array();
foreach($this->corrects as $key) { $corrects[] = "'INPUT[name={$key}]'"; }
$output .= '$$('.implode(',', $corrects).').removeClass("unvalidated");';
}
$output = "<script type='text/javascript'>{$output} </script>";
return($output);
}
/**
*
* Sanitizes an array of items according to the $this->sanitations
* sanitations will be standard of type string, but can also be specified.
* For ease of use, this syntax is accepted:
* $sanitations = array('fieldname', 'otherfieldname'=>'float');
*/
public function sanitize($items)
{
foreach($items as $key=>$val)
{
if(array_search($key, $this->sanitations) === false && !array_key_exists($key, $this->sanitations)) continue;
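// Use the type given for this field in $sanitations, defaulting to 'string'.
$type = array_key_exists($key, $this->sanitations) ? $this->sanitations[$key] : 'string';
$items[$key] = self::sanitizeItem($val, $type);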
}
return($items);
}
/**
*
* Adds an error to the errors array.
*/
private function addError($field, $type='string')
{
$this->errors[$field] = $type;
}
/**
*
* Sanitize a single var according to $type.
* Allows for static calling to allow simple sanitization
*/
public static function sanitizeItem($var, $type)
{
// 0 means no extra flags; filter_var() expects an int or an array here.
$flags = 0;
switch($type)
{
case 'url':
$filter = FILTER_SANITIZE_URL;
break;
case 'int':
$filter = FILTER_SANITIZE_NUMBER_INT;
break;
case 'float':
$filter = FILTER_SANITIZE_NUMBER_FLOAT;
$flags = FILTER_FLAG_ALLOW_FRACTION | FILTER_FLAG_ALLOW_THOUSAND;
break;
case 'email':
$var = substr($var, 0, 254);
$filter = FILTER_SANITIZE_EMAIL;
break;
case 'string':
default:
// Note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.
$filter = FILTER_SANITIZE_STRING;
$flags = FILTER_FLAG_NO_ENCODE_QUOTES;
break;
}
$output = filter_var($var, $filter, $flags);
return($output);
}
/**
*
* Validates a single var according to $type.
* Allows for static calling to allow simple validation.
*
*/
public static function validateItem($var, $type)
{
if(array_key_exists($type, self::$regexes))
{
$returnval = filter_var($var, FILTER_VALIDATE_REGEXP, array("options"=> array("regexp"=>'!'.self::$regexes[$type].'!i'))) !== false;
return($returnval);
}
$filter = false;
switch($type)
{
case 'email':
$var = substr($var, 0, 254);
$filter = FILTER_VALIDATE_EMAIL;
break;
case 'int':
$filter = FILTER_VALIDATE_INT;
break;
case 'boolean':
$filter = FILTER_VALIDATE_BOOLEAN;
break;
case 'ip':
$filter = FILTER_VALIDATE_IP;
break;
case 'url':
$filter = FILTER_VALIDATE_URL;
break;
}
// Parenthesized: unparenthesized nested ternaries are a fatal error in PHP 8.
return ($filter === false) ? false : (filter_var($var, $filter) !== false);
}
}
Of course, keep in mind that you also need to do your SQL query escaping, depending on what type of database you are using (mysql_real_escape_string() is useless for SQL Server, for instance). You probably want to handle this automatically at the appropriate application layer, such as an ORM. Also, as mentioned above: for outputting to HTML, use PHP's dedicated functions such as htmlspecialchars ;)
To really allow HTML input with, for example, stripped classes and/or tags, depend on one of the dedicated XSS validation packages. DO NOT WRITE YOUR OWN REGEXES TO PARSE HTML!
Solution 5
No, there is not.
First of all, SQL injection is an input filtering problem, and XSS is an output escaping one - so you wouldn't even execute these two operations at the same time in the code lifecycle.
Basic rules of thumb
- For SQL queries, bind parameters (as with PDO) or use a driver-native escaping function for query variables (such as mysql_real_escape_string())
- Use strip_tags() to filter out unwanted HTML
- Escape all other output with htmlspecialchars() and be mindful of the 2nd and 3rd parameters here.
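The last two rules of thumb can be sketched as follows. The input string is made up; the second argument to strip_tags() and the second and third arguments to htmlspecialchars() are the standard parameters of those functions.

```php
<?php
// Made-up sample input mixing wanted and unwanted markup.
$input = '<b>Hello</b> <script>alert(1)</script> "world" & \'friends\'';

// strip_tags() removes tags; the second argument lists tags to keep.
// It does not remove attributes on the tags it keeps.
$stripped = strip_tags($input, '<b>');

// 2nd parameter: ENT_QUOTES escapes both double and single quotes.
// 3rd parameter: the encoding is stated explicitly rather than left to a default.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, PHP_EOL, $escaped, PHP_EOL;
```

Since strip_tags() leaves attributes on allowed tags untouched, it is not a complete HTML filter by itself; that is a job for a dedicated library.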
CStreel
Updated on July 08, 2022
Comments
-
CStreel almost 2 years
I would like to set the text on the MainPage of my app, based on the response of an async call to a web service. I'm getting "The application called an interface that was marshalled for a different thread". So I know that I need to execute MainPage.TB_Response.text = response; on the primary/main thread, but I am unsure how I would go about this.
Edit: Here is my Response Handler
private void ReadResponse(IAsyncResult asyncResult)
{
    System.Diagnostics.Debug.WriteLine("ReadResponse");
    try
    {
        // The downloaded resource ends up in the variable named content.
        var content = new MemoryStream();
        // State of request is asynchronous.
        //RequestState myRequestState = (RequestState)asyncResult.AsyncState;
        HttpWebRequest myHttpWebRequest2 = (HttpWebRequest)asyncResult.AsyncState;
        HttpWebResponse response = (HttpWebResponse)myHttpWebRequest2.EndGetResponse(asyncResult);
        //do whatever
        using (Stream responseStream = response.GetResponseStream())
        {
            responseStream.CopyTo(content);
            byte[] data = content.ToArray();
            if (data.Length > 0)
            {
                string temp = System.Text.Encoding.UTF8.GetString(data, 0, data.Length);
                MainPage.TB_Reponse.Text = temp;
                System.Diagnostics.Debug.WriteLine(temp);
            }
        }
    }
    catch (WebException e)
    {
        System.Diagnostics.Debug.WriteLine(e.Message);
    }
}
Edit2: My MainPage Class
public sealed partial class MainPage : Page
{
    public static TextBlock TB_Reponse;

    public MainPage()
    {
        this.InitializeComponent();
        MainPage.TB_Reponse = this.TB_Response;
    }

    protected override void OnNavigatedTo(NavigationEventArgs e)
    {
    }

    private void BTN_Login_Click(object sender, RoutedEventArgs e)
    {
        ...
    }
}
-
Kirill Bestemyanov over 11 years
Could you show your asynchronous call?
-
paan over 15 years
There is no "best way" to do something like sanitizing input. Use some library; HTML Purifier is good. These libraries have been pounded on many times, so they are much more bulletproof than anything you can come up with yourself.
-
Bobby Jack over 15 years
"This means that every single echo or print statement should use htmlspecialchars" - of course, you mean "every ... statement outputting user input"; htmlspecialchars()-ifying "echo 'Hello, world!';" would be crazy ;)
-
Kornel over 14 years
There's one case where I think filtering is the right solution: UTF-8. You don't want invalid UTF-8 sequences all over your application (you might get different error recovery depending on code path), and UTF-8 can be filtered (or rejected) easily.
-
Amit Patil over 14 years
@porneL: Yes, and it can also be worthwhile to filter out control characters other than newline at this point. However, given that most PHP apps can't even get the HTML-escaping right yet, I'm not going to push the overlong UTF-8 sequence issue (they're only really an issue in IE6 pre-Service-Pack-2 and old Operas).
-
Axel M. Garcia almost 14 years
Although your answer is helpful, HTML can be and is successfully filtered for XSS in numerous applications, e.g. comment systems in blog software such as WordPress.
-
Axel M. Garcia almost 14 years
See also bioinformatics.org/phplabware/internal_utilities/htmLawed . From my understanding WordPress uses an older version, core.trac.wordpress.org/browser/tags/2.9.2/wp-includes/kses.php
-
David O. over 13 years
I realize this is an old question, but as of PHP 5.2.0 PHP has introduced Filters (php.net/manual/en/book.filter.php) and the function filter_var(), which when passed a value and an appropriate filter will either sanitize or validate the supplied user input.
-
troelskn over 13 years
@david 5.2.0 was out when I made this answer.
-
Jens Roland almost 13 years
Hi Troels -- Thanks for the answer. Most of the frameworks & languages I use every day, including PHP, Javascript and the Apache web server, have failed spectacularly at 'sanitizing' input in the past (magic quotes OMG!), and if they can't get it right, what chance do I have? Currently, I use prepared statements for all SQL I ever write, and for html I use a variant of htmlspecialchars, and frankly I'm still not 100% sure I'm not missing something. Security is HARD.
-
troelskn almost 13 years
@Jens Roland -- You're quite right; Security is hard. Trying to delegate it to a framework is probably not a good strategy. Much better to actually understand what we're dealing with. It sounds like you have the most common bases covered though.
-
rjmunro over 12 years
This looks like it might be a handy script for validating inputs, but it is completely irrelevant to the question.
-
jbyrd over 12 years
But will mysql_real_escape_string properly handle the following:
$sub = mysql_real_escape_string("%something"); // still %something
mysql_query("SELECT * FROM messages WHERE subject LIKE '{$sub}%'");
...?
-
troelskn over 12 years
@jbyrd - no, LIKE uses a specialised regexp language. You will have to escape your input string twice - once for the regexp and once for the mysql string encoding. It's code within code within code.
-
Jeff Brand over 11 years
You can also wrap the call on one thread with the CoreDispatcher.RunAsync method, which will put the delegate code onto the UI thread.
-
CStreel over 11 years
Are you referring to wrapping the response handler or the UI interaction code with CoreDispatcher?
-
Jeff Brand over 11 years
Whatever code you want to have run on the UI thread - i.e., anything that touches UI controls
-
Robert Mark Bram over 11 years
So you only use strip_tags() or htmlspecialchars() when you know that the input has HTML that you want to get rid of or escape respectively - you are not using it for any security purpose, right? Also, when you do the bind, what does it do for stuff like Bobby Tables? "Robert'); DROP TABLE Students;--" Does it just escape the quotes?
-
Ijas Ameenudeen about 11 years
Before you use mysql_real_escape_string, you should be connected to a database.
-
Marcel Korpel almost 11 years
At this moment mysql_real_escape_string is deprecated. It's considered good practice nowadays to use prepared statements to prevent SQL injection. So switch to either MySQLi or PDO.
-
Duc Tran almost 11 years
I use $id = intval($id) instead :)
-
Qix - MONICA WAS MISTREATED about 10 years
I haven't touched PHP for a long while; coming back to it and seeing this answer about the deprecation of magic quotes makes me happy.
-
jbo5112 almost 10 years
While useful, this doesn't answer the actual question. They wanted to allow some HTML tags in the input. The only advice here on how to do that is to consider not allowing them by using htmlspecialchars. Supporting them may be some sort of customer requirement. I have seen a number of websites that support some HTML markup (e.g. slashdot.org) on input, so I can only assume it's possible.
-
jbo5112 almost 10 years
If you have user data that will go into a database and later be displayed on web pages, isn't it usually read a lot more than it's written? To me, it makes more sense to filter it once (as input) before you store it, instead of having to filter it every time you display it. Am I missing something or did a bunch of people vote for needless performance overhead in this and the accepted answer?
-
Jo Smo almost 10 years
@troelskn "The only case where you need to actively filter data, is if you're accepting preformatted input. Eg. if you let your users post HTML markup, that you plan to display on the site." but if you don't sanitize the input (a $_POST for example), then a user can always input html right? So... Here you are saying that you should sanitize every user input, because a user can enter html code anywhere if you don't sanitize it. Or have i gotten this wrong somehow?
-
Jo Smo almost 10 years
Best answer for me. It's short and addresses the question well if you ask me. Is it possible to attack PHP somehow via $_POST or $_GET with some injection or is this impossible?
-
troelskn almost 10 years
@tastro No, you don't sanitize when data is input - you sanitize when it's used. E.g. as late as possible. That will give you the best level of security.
-
Jo Smo almost 10 years
@troelskn could you tell me why it will give me the best level of security if i do it as late as possible? Thanks!
-
troelskn almost 10 years
Because you limit the attack surface. If you sanitize early (when input), you have to be certain that there are no other holes in the application where bad data could enter through. Whereas if you do it late, then your output function doesn't have to "trust" that it is given safe data - it simply assumes that everything is unsafe.
-
a coder over 9 years
Or visit the official documentation and learn PDO and prepared statements. Tiny learning curve, but if you know SQL pretty well, you'll have no trouble adapting.
-
test over 9 years
Casting integer is a good way to ensure only numerical data is inserted.
-
Scott Arciszewski almost 9 years
For the specific case of SQL Injection, this is the correct answer!
-
Alan Mattano almost 9 years
@troelskn can you include "escape" code for each of the examples?
-
cryptic ツ almost 9 years
pg_escape_literal() is the recommended function to use for PostgreSQL.
-
Admin over 8 years
Question: Would you need to sanitize/validate user input if you were using that input to make a curl post request?
-
troelskn over 8 years
@sudosoul probably not. Depending on how you pass data to curl, you will need to serialize/encode it properly though.
-
Basic over 8 years
Note that prepared statements don't add any security, parameterised queries do. They just happen to be very easy to use together in PHP.
-
Ramon Bakker about 8 years
Its not the only guaranteed way. Hex the input and unhex in query will prevent also. Also hex attacks are not possible if you use hexing right.
-
mwfearnley almost 8 years
Should I infer from this answer that (arbitrary) input categorically cannot be filtered? I don't follow.
-
tereško over 7 years
@troelskn I was mostly thinking about the SQL part. As in: use of prepared statements instead of escaping.
-
vladkras over 7 years
$id = (int)$_GET['id'] and $que = sprintf('SELECT ... WHERE id="%d"', $id) is good too
-
Abraham Brookes over 7 years
What if you're inputting something specialized, like email addresses or usernames?
-
Si8 about 6 years
I did the following: <input type="hidden" name="my-val" value="<?= htmlspecialchars($_REQUEST['my-val']); ?>"/> and the URL is http://example.com?my-val=<test>this and value is still <test>this in the hidden statement
-
Teson almost 6 years
If you need a dynamic ORDER BY, PDO doesn't support it, so you still need to account for that with input validation.
-
oldboy over 4 years
so htmlspecialchars() and prepared statements are enough to sanitize user input from input elements that is inserted into the db?!
-
oldboy over 4 years
@Si8 apparently <?= has been removed from PHP 7+
-
drtechno over 4 years
oh yes, the $post and $get arrays accept all characters, but some of those characters can be used against you if the character is allowed to be enumerated in the posted php page. so if you don't escape encapsulating characters ( like ", ' and ` ) it could open up an attack vector. the ` character is often missed, and can be used to form command line execution hacks. Sanitation will prevent user input hacking, but will not help you with web application firewall hacks.
-
drtechno over 4 years
one problem with that is that its not always a database attack, and all user input should be protected from the system. not just one language type. So on your sites, when you enumerate your $_POST data, even with using binding, it could escape out enough to execute shell or even other php code.
-
drtechno over 4 years
could mb_encode_numericentity be used instead? Since it encodes everything?
-
drtechno over 4 years
The problem with wordpress is that its not necessarily a php-sql injection attack that causes database breaches. Miss programmed plugins that store data that an xml query reveals secrets is more problematic.
-
symcbean over 4 years
"its not always a database attack" : "The transforms you apply to data to make it safe for inclusion in an SQL statement are completely different from those...."
-
symcbean over 4 years
"all user input should be protected from the system" : no the system should be protected from user input.
-
webaholik over 4 years
@drtechno - mb_encode_numericentity is discussed in the htmlspecialchars link on #3 XSS
-
drtechno over 4 years
well I ran out of words, but yes the input needs to be prevented from effecting the system operation. to clarify this...
-
Loduwijk over 4 years
@troelskn "If you sanitize early you have to be certain that there are no other holes in the application where bad data could enter through" You have the same problem when doing it late: You have to be certain that there are no other holes in the application where you are using the data. Intuitively, it seems easier to me to miss a data-usage than a data-input.
-
Jsowa almost 4 years
Both input and output should be sanitized.
-
Jsowa almost 4 years
I understood. The form of input is determined of the purpose and we can't sanitize input to fit everywhere. But I don't agree with this practice because even with inserted input (i.e. in database) it can be vulnerable for processing it. We should ensure to prevent any vulnerability at every stage of our data processing. Popular custom is that data we store in database is safe and a lot of developers don't care about sanitization of 'input' in storage.
-
symcbean almost 4 years
You did not understand. You are describing OUTPUT from PHP not INPUT.
-
Abel LIFAEFI MBULA over 3 years
From what I know, XSS is an output concern, not an input one.
-
webaholik over 3 years
@bam - you are correct, just don't miss a spot! Luckily most frameworks will handle for us when used properly.
-
Your Common Sense over 3 years
@BobbyJack Hope in the course of those 12 years you've learned enough to disavow your comment. What would be indeed crazy is to distinctly judge every piece of data and decide whether it's pure enough to escape the common treatment, under a constant danger of making a human error in such a judgement.
-
Bobby Jack over 3 years
@YourCommonSense Thanks for the feedback, although it comes across as a little confrontational, especially for this site! I wouldn't disavow that comment, though. I don't think "echo 'hello world';" should be written "echo htmlspecialchars('hello world');". Obviously, I take your point, but I think it's taking defensive programming to an extreme. I'd be very interested to see an example of that approach in a significant codebase, though, if you have one.
-
Your Common Sense over 3 years
@BobbyJack I understand you didn't mean any harm and all you wanted is to make a quip. But people often underestimate the consequences. You mentioned "user input" in your comment while this term is vague and uncertain, often being completely misunderstood, as it's proven by the link I provided. That's why modern template engines are "taking defensive programming to an extreme" with their auto-escaping feature, completely disregarding the input source. Your distorted example is correct but your broader statement is not. That's the problem with information on Stack Overflow at whole.
-
Your Common Sense over 3 years
Take, for example, the notorious SQL "escaping" case. In some narrow context it could help against SQL injection, BUT it in the mass conscience it was extended to being an all-embracing protection measure. With disastrous consequences. Even making technically correct statements that escaping only help for "strings" doesn't help as people just overlook this, having a vague idea on what an SQL string is. It requires helluvalot of explanation to explain how SQL injection protection that involves escaping works. That's why there are parameterized queries that do 100% protection when applicable
-
mopsyd over 3 years
perhaps
if (isset($_GET['id'])) { if ((string)(int)$_GET['id'] !== $_GET['id']) { throw new \InvalidArgumentException('Invalid page id format'); } /* use a prepared statement for insert here */ }
might suit you. I prefer to make no database call at all if I can identify that a parameter is definitely not valid based on the known schema it is being handed to.
-
mercury over 2 years
I don’t agree with using ORM , it’s over engineering imo.
-
Reham Fahmy over 2 years
@PHP >= 8.0 gives error Parse error: syntax error, unexpected '->' (T_OBJECT_OPERATOR)
-
SchizoDuckie almost 2 years
@Reham Fahmy: This code is from 2008. It's 2022 now. Don't Use this. Use a framework.