Escape command line arguments in C#

Solution 1

It's more complicated than that though!

I was having a related problem (writing a front-end .exe that calls the back-end with all parameters passed plus some extra ones), so I looked at how people do that and ran into your question. Initially all seemed good doing it as you suggest: arg.Replace(@"\", @"\\").Replace(quote, @"\" + quote).

However, when I call with the arguments c:\temp a\\b, these get passed as c:\temp and a\\b, which leads to the back-end being called with "c:\\temp" "a\\\\b". That is incorrect, because the back-end will then see the two arguments c:\\temp and a\\\\b, which is not what we wanted! We have been overzealous with escapes (Windows is not Unix!).

So I read http://msdn.microsoft.com/en-us/library/system.environment.getcommandlineargs.aspx in detail, and it describes how those cases are handled: backslashes are treated as escape characters only in front of a double quote.

There is a twist in how multiple \ are handled, and the explanation can leave one dizzy for a while. I'll try to rephrase the unescape rule here: say we have a substring of N \ followed by ". When unescaping, we replace that substring with int(N/2) \ and, iff N was odd, we add " at the end.
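
To make the rule above concrete, here is a minimal sketch of the arithmetic for a single run of backslashes followed by a double quote. The names `BackslashRule` and `DecodeRun` are made up for illustration; this is not the parser Windows or the CLR actually uses, only the rule as rephrased above.

```csharp
using System;

public static class BackslashRule
{
    // Hypothetical helper: given N backslashes immediately followed by a
    // double quote, return the text the unescape rule would produce.
    // N / 2 backslashes are kept; if N is odd, the quote survives as a
    // literal character, otherwise it acts as a delimiter and is dropped.
    public static string DecodeRun(int backslashCount)
    {
        string slashes = new string('\\', backslashCount / 2);
        bool quoteIsLiteral = backslashCount % 2 == 1;
        return quoteIsLiteral ? slashes + "\"" : slashes;
    }
}
```

For example, `DecodeRun(2)` gives a single `\` (the quote is a delimiter), while `DecodeRun(3)` gives `\"` (one backslash plus a literal quote).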

The encoding for such decoding goes like this: for an argument, find each substring of 0-or-more \ followed by " and replace it with twice as many \, followed by \". We can do that like so:

s = Regex.Replace(arg, @"(\\*)" + "\"", @"$1$1\" + "\"");

That's all...

PS. ... not. Wait, wait - there is more! :)

We did the encoding correctly, but there is a twist because you are enclosing all parameters in double quotes (in case some of them contain spaces). There is a boundary issue: if a parameter ends in \, adding " after it changes the meaning of the closing quote. For example, c:\one\ two is parsed to c:\one\ and two, then re-assembled to "c:\one\" "two", which will be (mis)understood as the single argument c:\one" two (I tried that, I am not making it up). So in addition we need to check whether the argument ends in \ and, if so, double the number of backslashes at the end, like so:

s = "\"" + Regex.Replace(s, @"(\\+)$", @"$1$1") + "\"";
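
For convenience, the two replacements above can be combined into one helper. This is only a sketch: the class and method names (`ArgumentEscaper`, `EscapeArgument`, `EscapeArguments`) are invented here, and the second regex uses `\z` rather than `$` as a small assumption-driven change, so that a trailing newline in the argument is not treated as the end of the string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ArgumentEscaper
{
    // Escapes a single argument and always wraps it in double quotes.
    public static string EscapeArgument(string arg)
    {
        // Step 1: for every run of backslashes followed by a quote,
        // double the backslashes and add one more to escape the quote.
        string s = Regex.Replace(arg, @"(\\*)" + "\"", @"$1$1\" + "\"");

        // Step 2: double any trailing backslashes so they do not escape
        // the closing quote that is appended here.
        return "\"" + Regex.Replace(s, @"(\\+)\z", @"$1$1") + "\"";
    }

    // Joins all escaped arguments with single spaces.
    public static string EscapeArguments(string[] args)
    {
        return string.Join(" ", args.Select(a => EscapeArgument(a)));
    }
}
```

With the earlier example, `EscapeArguments(new[] { @"c:\one\", "two" })` produces `"c:\one\\" "two"`, which is read back as the two original arguments.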

Solution 2

My answer was similar to Nas Banov's answer but I wanted double quotes only if necessary.

Cutting out extra unnecessary double quotes

My code avoids putting unnecessary double quotes around everything, which is important when you are getting close to the character limit for parameters.

/// <summary>
/// Encodes an argument for passing into a program
/// </summary>
/// <param name="original">The value that should be received by the program</param>
/// <returns>The value which needs to be passed to the program for the original value 
/// to come through</returns>
public static string EncodeParameterArgument(string original)
{
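    if (original == null)
        return null;
    // An empty argument must be passed as "" or it vanishes from the command line.
    if (original.Length == 0)
        return "\"\"";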
    string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");
    value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"");
    return value;
}

// EDIT: this version does the same but also handles new lines in the arguments
public static string EncodeParameterArgumentMultiLine(string original)
{
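    if (original == null)
        return null;
    // An empty argument must be passed as "" or it vanishes from the command line.
    if (original.Length == 0)
        return "\"\"";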
    string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");
    value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"", RegexOptions.Singleline);

    return value;
}

Explanation

To escape the backslashes and double quotes correctly, replace every run of zero or more backslashes followed by a double quote:

string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");

The replacement is the original backslashes ($1), one extra backslash, and then the whole match ($0), which already contains the original backslashes and the double quote. That gives twice the original backslashes plus one, followed by the original double quote. I used $0 because it makes the replacement nicer to read.

value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"");

This can only ever match an entire line that contains whitespace.

If it matches, it adds double quotes to the beginning and end.

If there were backslashes at the end of the argument, they did not need escaping before, but now that a double quote follows them they do. So they are doubled, which escapes them all and prevents them from unintentionally escaping the closing double quote.

The first section ends in a minimal match (.*?) so that it doesn't eat into the final backslashes.

Output

These inputs produce the following outputs:

Input                  Output
hello                  hello
\hello\12\3\           \hello\12\3\
hello world            "hello world"
\"hello\"              \\\"hello\\\"
\"hello\ world         "\\\"hello\ world"
\"hello\\\ world\      "\\\"hello\\\ world\\"
hello world\\          "hello world\\\\"

Solution 3

I have ported a C++ function from the article "Everyone quotes command line arguments the wrong way".

It works fine, but note that cmd.exe interprets the command line differently. If (and only if, as the original author of the article noted) your command line will be interpreted by cmd.exe, you should also escape shell metacharacters.

/// <summary>
///     This routine appends the given argument to a command line such that
///     CommandLineToArgvW will return the argument string unchanged. Arguments
///     in a command line should be separated by spaces; this function does
///     not add these spaces.
/// </summary>
/// <param name="argument">Supplies the argument to encode.</param>
/// <param name="force">
///     Supplies an indication of whether we should quote the argument even if it 
///     does not contain any characters that would ordinarily require quoting.
/// </param>
private static string EncodeParameterArgument(string argument, bool force = false)
{
    if (argument == null) throw new ArgumentNullException(nameof(argument));

    // Unless we're told otherwise, don't quote unless we actually
    // need to do so --- hopefully avoid problems if programs won't
    // parse quotes properly
    if (force == false
        && argument.Length > 0
        && argument.IndexOfAny(" \t\n\v\"".ToCharArray()) == -1)
    {
        return argument;
    }

    var quoted = new StringBuilder();
    quoted.Append('"');

    var numberBackslashes = 0;

    foreach (var chr in argument)
    {
        switch (chr)
        {
            case '\\':
                numberBackslashes++;
                continue;
            case '"':
                // Escape all backslashes and the following
                // double quotation mark.
                quoted.Append('\\', numberBackslashes*2 + 1);
                quoted.Append(chr);
                break;
            default:
                // Backslashes aren't special here.
                quoted.Append('\\', numberBackslashes);
                quoted.Append(chr);
                break;
        }
        numberBackslashes = 0;
    }

    // Escape all backslashes, but let the terminating
    // double quotation mark we add below be interpreted
    // as a metacharacter.
    quoted.Append('\\', numberBackslashes*2);
    quoted.Append('"');

    return quoted.ToString();
}
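
For the cmd.exe case mentioned above, a rough sketch of the extra step might look like the following. The approach, as I recall it from the article, is to prefix each shell metacharacter with `^`. The metacharacter list and the names `CmdEscaper` and `EscapeForCmd` are assumptions made for this example, so check them against the article before relying on them.

```csharp
using System;
using System.Text;

public static class CmdEscaper
{
    // Assumed list of characters that cmd.exe treats specially.
    private const string MetaCharacters = "()%!^\"<>&|";

    // Prefixes each assumed metacharacter with '^'. Intended to be applied
    // to a command line that was already quoted for the target program.
    public static string EscapeForCmd(string commandLine)
    {
        var result = new StringBuilder(commandLine.Length * 2);
        foreach (char c in commandLine)
        {
            if (MetaCharacters.IndexOf(c) >= 0)
                result.Append('^');
            result.Append(c);
        }
        return result.ToString();
    }
}
```

The order matters: first quote each argument for the receiving program, then apply the cmd.exe escaping to the result, and only when cmd.exe will actually see the command line.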

Solution 4

I was running into issues with this, too. Instead of un-parsing args, I went with taking the full original command line and trimming off the executable. This had the additional benefit of keeping whitespace in the call, even if it isn't needed or used. It still has to chase escapes in the executable path, but that seemed easier than doing so for the args.

var commandLine = Environment.CommandLine;
var argumentsString = "";

if(args.Length > 0)
{
    // Re-escaping args to be the exact same as they were passed is hard and misses whitespace.
    // Use the original command line and trim off the executable to get the args.
    var argIndex = -1;
    if(commandLine[0] == '"')
    {
        //Double-quotes mean we need to dig to find the closing double-quote.
        var backslashPending = false;
        var secondDoublequoteIndex = -1;
        for(var i = 1; i < commandLine.Length; i++)
        {
            if(backslashPending)
            {
                backslashPending = false;
                continue;
            }
            if(commandLine[i] == '\\')
            {
                backslashPending = true;
                continue;
            }
            if(commandLine[i] == '"')
            {
                secondDoublequoteIndex = i + 1;
                break;
            }
        }
        argIndex = secondDoublequoteIndex;
    }
    else
    {
        // No double-quotes, so args begin after first whitespace.
        argIndex = commandLine.IndexOf(" ", System.StringComparison.Ordinal);
    }
    if(argIndex != -1)
    {
        argumentsString = commandLine.Substring(argIndex + 1);
    }
}

Console.WriteLine("argumentsString: " + argumentsString);

Solution 5

I published a small project on GitHub that handles most issues with command line encoding/escaping:

https://github.com/ericpopivker/Command-Line-Encoder

There is a CommandLineEncoder.Utils.cs class, as well as Unit Tests that verify the Encoding/Decoding functionality.

Author: hultqvist

Updated on July 05, 2022

Comments

  • hultqvist
    hultqvist almost 2 years

    Short version:

    Is it enough to wrap the argument in quotes and escape \ and " ?

    Code version

    I want to pass the command line arguments string[] args to another process using ProcessStartInfo.Arguments.

    ProcessStartInfo info = new ProcessStartInfo();
    info.FileName = Application.ExecutablePath;
    info.UseShellExecute = true;
    info.Verb = "runas"; // Provides Run as Administrator
    info.Arguments = EscapeCommandLineArguments(args);
    Process.Start(info);
    

    The problem is that I get the arguments as an array and must merge them into a single string. An argument could be crafted to trick my program.

    my.exe "C:\Documents and Settings\MyPath \" --kill-all-humans \" except fry"
    

    According to this answer I have created the following function to escape a single argument, but I might have missed something.

    private static string EscapeCommandLineArguments(string[] args)
    {
        string arguments = "";
        foreach (string arg in args)
        {
            arguments += " \"" +
                arg.Replace ("\\", "\\\\").Replace("\"", "\\\"") +
                "\"";
        }
        return arguments;
    }
    

    Is this good enough or is there any framework function for this?

    • Sanjeevakumar Hiremath
      Sanjeevakumar Hiremath about 13 years
      Did you try passing them as is? I think if they are passed to you, they can be passed on to another command. If you hit any errors, then you can think about escaping.
    • hultqvist
      hultqvist about 13 years
      @Sanjeevakumar yes, for example: "C:\Documents and Settings\MyPath \" --kill-all-humans \" except fry" would not be a good thing since I am making privileged call.
    • hultqvist
      hultqvist about 13 years
      @Sanjeevakumar Main(string[] args) is an array of unescaped strings, so if I run my.exe "test\"test" arg[0] will be test"test
    • Sanjeevakumar Hiremath
      Sanjeevakumar Hiremath about 13 years
      1. Do you only want to escape? Based on your first comment, it looks like escaping is not what you want to do. 2. What are unescaped strings? When you get a string like abc"def, it is abc"def, so why do you want to escape it now? If you are adding something like "abc" + """" + "def", this makes sense. Observe that """" is escaping ".
    • hultqvist
      hultqvist about 13 years
      Yes abc"def is correct given the input, however if I am to pass it to another process I must escape it before adding it to the single string argument. See updated question for clarification.
    • ChaseMedallion
      ChaseMedallion almost 10 years
      You might be interested in my MedallionShell library, which automatically handles escaping and concatenating process arguments. The implementation is based on an answer in this thread.
    • Ajedi32
      Ajedi32 over 7 years
      Maybe I'm just unfamiliar with the way argument passing works in Windows, but why do the arguments even have to be converted to a single string like this in the first place? You're not using a terminal emulator here, you're directly starting a program with the Windows equivalent of exec, right? Why can't the array of arguments just be passed directly to the child process? Why does it need to be encoded as a string just so it can be immediately decoded? No other language I've used requires this.
    • hultqvist
      hultqvist over 7 years
      @Ajedi32 my understanding is that at the lowest level the "arguments" value is just a single string that the receiving end interprets as a list. This question is basically about what algorithm is used to decode the raw argument string into a list of arguments.
    • Ajedi32
      Ajedi32 over 7 years
      @hultqvist Interesting. Like I said, I'm not familiar with how Windows does it, but that's definitely not how it works on Linux. Now I'm curious: could the receiving end choose not to interpret the arguments as a list, and just get the raw string instead?
    • hultqvist
      hultqvist over 7 years
      @Ajedi32 that topic is an interesting question of its own. I won't be able to contribute much to it but I would read the results with interest.
    • Pang
      Pang about 3 years
      For .NET Standard 2.1 / .NET Core 2.1 or above, use ProcessStartInfo.ArgumentList which takes care of properly escaping the arguments on all supported platforms for you.
    • hultqvist
      hultqvist about 3 years
      @Pang write that as an answer so I can accept it. Bonus if you can show it solves all the corner cases mentioned by others.
  • hultqvist
    hultqvist about 13 years
    Your examples work, however @"\test" does not and @"test\" breaks with Win32Exception. The latter is quite common in my work when passing paths as arguments.
  • hultqvist
    hultqvist about 13 years
    I'm afraid your code only wraps the arguments in quotes and does no escaping whatsoever. If I ran my.exe "arg1\" \"arg2", giving one single argument arg1" "arg2, your code would generate two arguments, arg1 and arg2.
  • Chuck Savage
    Chuck Savage about 13 years
    OK, I haven't tested against that. I suppose there is a reason to do arg1" "arg2, though I can't imagine why. You're right, I should have escaping in there anyway. I'll watch this thread to see who comes up with the best mechanism for that.
  • hultqvist
    hultqvist about 13 years
    I can think of two. 1: Someone with bad intentions tries to trick your program into executing dangerous commands. 2: Passing the argument John "The Boss" Smith
  • Amit Patil
    Amit Patil almost 13 years
    +1 for explaining this insanity. However shouldn't the * and the + be inside the grouping parentheses in the above match expressions? Otherwise the $1 replacement will only ever be a single backslash.
  • Amit Patil
    Amit Patil almost 13 years
    Actually I think the two replacements can be combined into: "\""+Regex.Replace(s, "(\\\\*)(\\\\$|\")", "$1$1\\$2")+"\"". However my brain is beginning to sink now so appreciated if you could check correctness :-)
  • Joey Adams
    Joey Adams about 9 years
    One minor fix: when original is empty, you need to return a pair of double quotes "" instead of an empty string, so the command line will know an argument is there. Other than that, this works perfectly!
  • vojta
    vojta over 8 years
    There must be a bug... Input: <a>\n <b/>\n</a>. Output: <a>\n <b/>\n</a>. Looks like the outer quotes are missing! Am I doing something wrong? (\n means newline, of course; SO comments are not really newline-friendly.)
  • vojta
    vojta over 8 years
    Thanks for your answer! Could you please add TL; DR static method that handle everything? I really like your answer, but I have to read it and understand it each time I need the information (because I am too stupid to remember it completely)...
  • Nas Banov
    Nas Banov over 8 years
    @vojta - my apologies, but it's been five years and I don't remember the details. Re-reading what I wrote, I guess all that was needed was to call those two lines. But you probably have a better understanding of the case now, so why don't you edit the answer and add the TL;DR for posterity?
  • Matt Vukomanovic
    Matt Vukomanovic over 8 years
    I'd never even thought of doing an argument with a new line in it. Can't paste code in here it seems.. I'll change my answer to include both the original and one that handles new lines
  • Ajedi32
    Ajedi32 over 7 years
    Hmm, so rephrased: backslashes are interpreted literally unless they're being used to escape a quote mark or another backslash. Therefore two backslashes in a row are interpreted as just two literal backslashes unless they're followed by a quote mark, in which case the first backslash escapes the second one? And three backslashes in a row are interpreted literally unless they're followed by a quote mark, in which case the first backslash escapes the second one and the third one escapes the quote mark? (And so on?) Weird...
  • 7vujy0f0hy
    7vujy0f0hy about 7 years
    Turned your code into a C function:

    LPWSTR GetArgStrFromCommandLine(LPWSTR c)
    {
        if (*c++ != L'"')
            c = wcspbrk(--c, L" \t\r\n\v\f");
        else
            while (*c && *c++ != L'"')
                if (*c == L'\\') ++c;
        return c;
    }
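
The ProcessStartInfo.ArgumentList suggestion from the comments can be sketched roughly as follows. It assumes .NET Core 2.1 or later, and the names `ChildLauncher` and `BuildStartInfo` are hypothetical. The idea is to hand the runtime the argument list and let it build the platform-specific command line, instead of assembling the Arguments string by hand.

```csharp
using System;
using System.Diagnostics;

public static class ChildLauncher
{
    // Builds start info whose arguments are supplied one by one, leaving
    // the quoting and escaping to the runtime (.NET Core 2.1+ assumed).
    public static ProcessStartInfo BuildStartInfo(string fileName, string[] args)
    {
        var info = new ProcessStartInfo { FileName = fileName };
        foreach (string arg in args)
        {
            // Each entry is stored unescaped, exactly as the child should see it.
            info.ArgumentList.Add(arg);
        }
        return info;
    }
}
```

The result can be passed to Process.Start. Note that Arguments and ArgumentList cannot both be set on the same ProcessStartInfo, and this sketch does not cover how the approach interacts with UseShellExecute or the "runas" verb used in the question.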