Clean the string? Is there any better way of doing it?
Solution 1
OK, consider the following test:
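using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class CleanString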
{
//by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
public static string UseRegex(string strIn)
{
// Replace invalid characters with empty strings.
return Regex.Replace(strIn, @"[^\w\.@-]", "");
}
// by Paolo Tedesco
public static String UseStringBuilder(string strIn)
{
const string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(strIn.Length);
foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
{
sb.Append(x);
}
return sb.ToString();
}
// by Paolo Tedesco, but using a HashSet
public static String UseStringBuilderWithHashSet(string strIn)
{
var hashSet = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(strIn.Length);
foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
{
sb.Append(x);
}
return sb.ToString();
}
// by SteveDog
public static string UseStringBuilderWithHashSet2(string dirtyString)
{
HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
StringBuilder result = new StringBuilder(dirtyString.Length);
foreach (char c in dirtyString)
if (!removeChars.Contains(c)) // keep only characters that are NOT in the removal set
result.Append(c);
return result.ToString();
}
// original by patel.milanb
public static string UseReplace(string dirtyString)
{
string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
string result = dirtyString;
foreach (char c in removeChars)
{
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
// by L.B
public static string UseWhere(string dirtyString)
{
return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
}
}
static class Program
{
/// <summary>
/// The main entry point for the application.
/// </summary>
[STAThread]
static void Main()
{
var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssyd€sdöf";
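var iterations = 50000;
// the methods to benchmark, in the order shown in the output below
Func<string, string>[] methods =
{
CleanString.UseReplace,
CleanString.UseStringBuilder,
CleanString.UseStringBuilderWithHashSet,
CleanString.UseStringBuilderWithHashSet2,
CleanString.UseRegex,
CleanString.UseWhere
};
var sw = new Stopwatch();
foreach (var method in methods)
{
sw.Restart();
for (var i = 0; i < iterations; i++)
method(dirtyString);
sw.Stop();
// method.Method.Name is the name of the static method, e.g. "UseWhere"
Debug.WriteLine("CleanString." + method.Method.Name + ": " + sw.ElapsedMilliseconds);
}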
}
}
Output (elapsed milliseconds for 50000 iterations)
CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233
Conclusion
It probably does not matter which method you use.
The difference in time between the fastest (UseWhere: 233 ms) and the slowest (UseStringBuilder: 2805 ms) method is 2572 ms when called 50000(!) times in a row. You probably don't need to care about it if you don't run the method that often.
If you do, use the UseWhere method (written by L.B), but note that it behaves slightly differently. It keeps only letters and digits (Char.IsLetterOrDigit), so it also removes characters that are not in the removal list, such as '.', '=', '/' and '€'.
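To make "slightly different" concrete, here is a small sketch that runs both approaches on the same input. The class and method names are hypothetical and chosen only for illustration. `RemoveListed` mirrors the removal-list methods above, and `KeepLettersAndDigits` mirrors L.B's UseWhere.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class CompareCleaners
{
    // Same removal list as in the question.
    private static readonly HashSet<char> RemoveChars =
        new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");

    // Removes only the listed characters (what the removal-list methods do).
    public static string RemoveListed(string s) =>
        new string(s.Where(c => !RemoveChars.Contains(c)).ToArray());

    // Keeps only letters and digits (what UseWhere does).
    public static string KeepLettersAndDigits(string s) =>
        new string(s.Where(char.IsLetterOrDigit).ToArray());
}
```

With the input "a.b=c/ö€(x)", RemoveListed returns "a.b=c/ö€x", while KeepLettersAndDigits returns "abcöx". Only the parentheses are in the removal list, but UseWhere also drops '.', '=', '/' and '€'.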
Solution 2
If it's purely speed and efficiency you are after, I would recommend doing something like this:
public static string CleanString(string dirtyString)
{
HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");
StringBuilder result = new StringBuilder(dirtyString.Length);
foreach (char c in dirtyString)
if (!removeChars.Contains(c)) // prevent dirty chars
result.Append(c);
return result.ToString();
}
Regex is certainly an elegant solution, but it adds extra overhead. A HashSet<char> lookup takes constant time on average, whereas searching the removal string scans it character by character. By specifying the starting capacity of the StringBuilder, it only needs to allocate its buffer once (plus a second allocation for the string returned by ToString at the end). This cuts down on memory usage and increases speed, especially on longer strings.
However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.
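As a minimal sketch of the encoding route, the snippet below uses `System.Net.WebUtility.HtmlEncode`, which is also available outside ASP.NET. `HttpUtility.HtmlEncode` in `System.Web` plays the same role. For values placed inside a URL such as an `<a href>` query string, `Uri.EscapeDataString` is the analogous call. The class and method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical helper: encode values for output instead of stripping characters.
public static class OutputEncoding
{
    // Turns characters such as <, >, & and " into HTML entities,
    // so markup stored in the database is shown as text rather than interpreted.
    public static string ForHtml(string value) => WebUtility.HtmlEncode(value);

    // Percent-encodes a value for use in a URL, e.g. a query string parameter.
    public static string ForUrl(string value) => Uri.EscapeDataString(value);
}
```

For example, ForHtml("<b>Tom & Jerry</b>") gives "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;", and nothing the user typed is lost.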
Solution 3
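Use the regex [ ?&^$#@!()+,:;<>’'_*-] to replace those characters with an empty string. Put the hyphen last (or escape it as \-). In the pattern [?&^$#@!()+-,:;<>’\'-_*], the sequence '-_ is read as a range from ' to _, which also matches the digits and the uppercase letters. That pattern also omits the space from the original removal list.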
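A minimal sketch of the single-pass regex replacement, assuming the question's removal list. The class name `RegexCleaner` is hypothetical. The pattern is compiled once into a static field so that it is not re-parsed on every call.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: removes all listed characters with one Regex.Replace call.
public static class RegexCleaner
{
    // The leading space covers the space in the original list;
    // the hyphen is placed last so it is matched literally, not as a range.
    private static readonly Regex DirtyChars = new Regex(@"[ ?&^$#@!()+,:;<>’'_*-]");

    public static string Clean(string dirtyString) =>
        DirtyChars.Replace(dirtyString, string.Empty);
}
```

Unlike the pattern with the unescaped '-_ range, this keeps digits and uppercase letters. For example, Clean("x&y(z) A-1_b") returns "xyzA1b".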
Solution 4
This one is even faster!
use:
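string dirty = @"tfgtf$@$%gttg%$% 664%$";
string clean = dirty.Clean(); // "tfgtfgttg664"

using System;

// hypothetical class name: extension methods must be declared in a static class
public static class StringExtensions
{
// keeps only ASCII letters (a-z, A-Z) and digits (0-9)
public static string Clean(this String name)
{
var namearray = new Char[name.Length];
var newIndex = 0;
for (var index = 0; index < namearray.Length; index++)
{
var letter = (Int32)name[index];
// skip anything outside 'a'-'z' (97-122), 'A'-'Z' (65-90) and '0'-'9' (48-57)
if (!((letter > 96 && letter < 123) || (letter > 64 && letter < 91) || (letter > 47 && letter < 58)))
continue;
namearray[newIndex] = (Char)letter;
++newIndex;
}
// only the first newIndex slots were filled; TrimEnd() would not remove the trailing '\0' characters
return new String(namearray, 0, newIndex);
}
}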
Solution 5
I don't know if, performance-wise, using a Regex or LINQ would be an improvement.
Something that could be useful would be to create the new string with a StringBuilder instead of calling string.Replace each time:
using System.Linq;
using System.Text;
static class Program {
static void Main(string[] args) {
const string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
string result = "x&y(z)";
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(result.Length);
foreach (char x in result.Where(c => !removeChars.Contains(c))) {
sb.Append(x);
}
result = sb.ToString();
}
}
patel.milanb
Updated on August 21, 2020
Comments
-
patel.milanb over 3 years
I am using this method to clean the string:
public static string CleanString(string dirtyString)
{
string removeChars = " ?&^$#@!()+-,:;<>’\'-_*";
string result = dirtyString;
foreach (char c in removeChars)
{
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
This method works fine, but it has a performance problem. Every time I pass in a string, every character goes through the loop, so with a large string it takes too much time to return the result.
Is there a better way of doing the same thing, for example with LINQ or jQuery/JavaScript?
Any suggestion would be appreciated.
-
Russ Cam almost 12 years
For what purpose are you "cleaning" a string?
-
patel.milanb almost 12 years
I am basically dealing with a lot of query string values...
-
akhil almost 12 years
You just want to make a string null, or what?
-
nhahtdh almost 12 years
Put all characters in a character class of regex, then replace all at once.
-
Furqan Hameedi almost 12 years
Explore the System.Text.RegularExpressions namespace for this.
-
Stuart.Sklinar almost 12 years
Could this be done with RegEx?
-
hatchet - done with SOverflow almost 12 years
Define "better". Any solution will have a loop over the characters. The drawback in your code is excess creation of string objects, not the loop over every character.
-
Mark Peters almost 12 years
I'm a little concerned about you "cleaning" a query string. Can you describe what you are doing with the cleaned string?
-
patel.milanb almost 12 years
So what do you suggest? Which string objects can I remove?
-
patel.milanb almost 12 years
There are values in the query string from which I have to build up an <a href> tag. In some cases, values come from the database with HTML tags included, and I want to show them on pages.
-
Security Hound almost 12 years
@patel.milanb - If you are using this to connect to a SQL database, then you're doing it wrong.
-
L.B almost 12 years
@patel.milanb Then what you are looking for is HttpUtility.HtmlEncode, not string cleaning.
-
patel.milanb almost 12 years
This certainly helps. It opens up a new idea for me: using the StringBuilder class.
-
L.B almost 12 years
removeChars.Contains is O(n). A HashSet would be better.
-
L.B almost 12 years
removeChars.IndexOf is an O(n) operation. A HashSet would be better.
-
sloth almost 12 years
output should be result. Also, you can omit .ToCharArray(), since a string implements IEnumerable<char>.
-
Steven Doggart almost 12 years
Grrr... Thanks @BigYellowCactus. Don't know how I missed that.
-
L.B almost 12 years
You can also use a one-liner:
return new String(dirtyString.Where(c => !removeChars.Contains(c)).ToArray());
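For reference, here is L.B's one-liner wrapped into a complete method. This is a sketch that assumes the question's removal list, held in a HashSet<char> as suggested in the earlier comments. The class name `OneLiner` is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around the one-liner; the HashSet gives constant-time lookups on average.
public static class OneLiner
{
    private static readonly HashSet<char> removeChars =
        new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*");

    // Keeps every character that is not in the removal set.
    public static string CleanString(string dirtyString) =>
        new string(dirtyString.Where(c => !removeChars.Contains(c)).ToArray());
}
```

For example, CleanString("Tom's file-name.txt") returns "Tomsfilename.txt". The '.' survives because it is not in the list.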
-
L.B almost 12 years
What would this give
return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray())
on your machine?
-
sloth almost 12 years
It's fast. 50000 iterations: 182ms (the next one is UseStringBuilderWithHashSet2 with 266ms)
-
Guillaume Beauvois almost 9 years
Just for the record, for UseStringBuilderWithHashSet and UseStringBuilderWithHashSet2 the test should be
if (!removeChars.Contains(c))
-
Evaldas Raisutis over 8 years
How would you add white space to the removeChars HashSet?
-
Steven Doggart over 8 years
@Qweick Well, the space character is already included, but if there were any other white space characters that you wanted to include, you could just concatenate them to the string (e.g. "..." & vbTab in VB.NET).
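In C#, the counterpart of appending vbTab is to concatenate escape sequences such as \t, \r and \n to the removal string. Here is a minimal sketch, with a hypothetical class name `WhitespaceCleaner`.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical variant: the question's removal list plus tab, carriage return and line feed.
public static class WhitespaceCleaner
{
    private static readonly HashSet<char> removeChars =
        new HashSet<char>(" ?&^$#@!()+-,:;<>’\'-_*" + "\t\r\n");

    public static string CleanString(string dirtyString)
    {
        var result = new StringBuilder(dirtyString.Length);
        foreach (char c in dirtyString)
        {
            // Append only characters that are not in the removal set.
            if (!removeChars.Contains(c))
                result.Append(c);
        }
        return result.ToString();
    }
}
```

If every Unicode whitespace character should go, testing char.IsWhiteSpace(c) in the loop avoids listing them by hand.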
-
Evaldas Raisutis over 8 years
@StevenDoggart Grrh, yes, thanks :) For some reason I assumed there had to be a symbol for that :))
-
ATutorMe over 7 years
Can L.B's UseWhere method be extended to allow additional characters? Like this:
public static string UseWhereExtended(string dirtyString)
{
IEnumerable<char> stringQuery =
from ch in dirtyString
where char.IsLetterOrDigit(ch) || ch == '.' || ch == ',' || ch == '\'' || ch == '\"' || ch == '?' || ch == '!'
select ch;
return new string(stringQuery.ToArray());
}
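That query should work. Here is a sketch of the same idea in method syntax. It assumes the allowed extras are exactly the ones listed in the comment, and keeps them in a HashSet<char> so the list is easy to extend. The class name `WhereExtended` is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical extension of UseWhere: letters, digits and a few whitelisted punctuation marks survive.
public static class WhereExtended
{
    // The extra characters from the comment: . , ' " ? !
    private static readonly HashSet<char> allowed = new HashSet<char>(".,'\"?!");

    public static string UseWhereExtended(string dirtyString) =>
        new string(dirtyString.Where(c => char.IsLetterOrDigit(c) || allowed.Contains(c)).ToArray());
}
```

For example, UseWhereExtended("Hi, you! (ok?) #1") returns "Hi,you!ok?1". Spaces, parentheses and '#' are dropped.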
-
Daxtron2 almost 6 years
I think there's an error in UseStringBuilderWithHashSet2: shouldn't if (removeChars.Contains(c)) be if (!removeChars.Contains(c))?