Bug (or Feature?) in Qt's Command-Line Parser/Tokenizer



  • Hi.

    I spent the whole day trying to figure out why some of the CLI arguments passed to my application get screwed up. In the end, I figured out that it's Qt's command-line parser that is going insane here!

    My application is a Windows GUI application, so I need to link in "qtmain.lib" in order to be able to use a normal main() function. After looking at the sources, I noticed that "qtmain.lib" basically implements a Dummy WinMain() method which calls the actual main() function. It also does the parsing of the command-line to generate the argc and argv[] parameters for my main() function. And that is the source of the problem. Bummer!

    More specifically, Qt's command-line parser fails with the following command-line string:
    @program.exe --add "C:\Some Path\''Foobar''.mp3"@

    Note that the file is called ''Foobar''.mp3 and not "Foobar".mp3 - there are two ('), not a single ("), in the file name.

    Expected result would be:
    @argv[0] = program.exe
    argv[1] = --add
    argv[2] = C:\Some Path\''Foobar''.mp3@

    What Qt 4.8.1 actually does is this:
    @argv[0] = program.exe
    argv[1] = --add
    argv[2] = C:\Some Path''Foobar''.mp3@

    The backslash in front of the first (') has been removed for no good reason, so the application will fail to open the file :-(

    Note: If the main() function is called directly from the C++ Runtime, e.g. in a Console app, this command-line is parsed correctly. Also if I parse the command-line "by hand" using GetCommandLineW() + CommandLineToArgvW(), all is fine...

    My suggestion would be that "qtmain.lib" should use CommandLineToArgvW() on Win32.
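
    A minimal sketch of that suggestion, assuming a Windows build that links against Shell32.lib. The helper name nativeArguments() is made up for illustration and is not part of Qt. On other platforms the function just returns an empty list so that the snippet still compiles.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>   // declares CommandLineToArgvW
#endif

// Hypothetical helper (not a Qt function): returns the process arguments
// as split by the Win32 API instead of by qtmain.lib.
// On non-Windows builds it returns an empty vector.
std::vector<std::wstring> nativeArguments()
{
    std::vector<std::wstring> args;
#ifdef _WIN32
    int argc = 0;
    LPWSTR *argv = CommandLineToArgvW(GetCommandLineW(), &argc);
    if (argv) {
        args.assign(argv, argv + argc);  // copy the strings before freeing
        LocalFree(argv);                 // the whole array is one allocation
    }
#endif
    return args;
}
```

    A GUI application could call this at the top of main() and ignore the argv[] it was handed, which should sidestep the tokenizer in "qtmain.lib" entirely.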



  • it might be because \' is the escape sequence for ' in strings...have you tried using a simpler file name?



  • Well, there is no need to escape ('), because you can pass a literal (') simply by putting a single (').

    This works just fine:
    @program.exe --add "This is 'just' a test!"@

    Arguments will be:
    @program.exe
    --add
    This is 'just' a test!@

    If you want to pass a literal (") you have to escape it as \", because (") has a special meaning.

    Like this:
    @program.exe --add "This is \"just\" a test!"@

    Still it's definitely wrong and superfluous to treat \' as an escape sequence!

    And of course I could use a "simpler" file name, but I think that's not the point here ;-)

    If the user has such a file name, which is perfectly legitimate, he must be able to pass that name.

    Actually I was made aware of this bug by one of my users!

    Also, to make this clear again, Qt's current behavior is inconsistent with how the C++ Runtime tokenizes the CLI arguments for main(). It's also inconsistent with the result of CommandLineToArgvW()...
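
    For the case where a program has to build such a command-line for another process, the escaping of a literal (") described above can be sketched as follows. This assumes the receiving program splits its command-line with the usual Microsoft rules (C++ Runtime or CommandLineToArgvW()). The function name quoteArg() is made up for illustration, and shell metacharacters of cmd.exe are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps one argument in double quotes so that it
// survives the usual Microsoft command-line splitting as a single argument.
// A (') needs no treatment at all; only (") and the backslashes directly
// in front of a (") are special.
std::string quoteArg(const std::string &arg)
{
    std::string out = "\"";
    std::size_t i = 0;
    while (i < arg.size()) {
        // Count a run of backslashes; its meaning depends on what follows.
        std::size_t backslashes = 0;
        while (i < arg.size() && arg[i] == '\\') { ++backslashes; ++i; }

        if (i == arg.size()) {
            // Run ends the argument: double it, the closing quote follows.
            out.append(backslashes * 2, '\\');
        } else if (arg[i] == '"') {
            // Run precedes a quote: double it and escape the quote itself.
            out.append(backslashes * 2 + 1, '\\');
            out += '"';
            ++i;
        } else {
            // Backslashes before any other character are literal.
            out.append(backslashes, '\\');
            out += arg[i];
            ++i;
        }
    }
    out += '"';
    return out;
}
```

    With this sketch, the path C:\Test\''Foobar''.mp3 is only wrapped in double quotes and otherwise left untouched, because neither (') nor a backslash in front of (') is special.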



  • [quote author="MuldeR" date="1340391592"]
    If you want to pass a literal (") you have to escape it as \", because (") has a special meaning.
    [/quote]

    well ' has a special meaning as well...it is used for single characters(just like " is for the strings), hence the escape sequence \' for it exists...so when the compiler or rte encounters the \' , it translates it into ' on its own...its kind of like the keywords in a language...they're predefined
    neways, ''foobar'' is really not a name someone would use normally(read 'EVER' ;-)).that's one strange user(a double quote is somewhat acceptable but two single quotes,why would that be done?!? 0_o)
    i'm no expert(in fact i'm a noob), but that's what i think is happening with your program



  • [quote author="raaghuu" date="1340432235"][quote author="MuldeR" date="1340391592"]
    If you want to pass a literal (") you have to escape it as \", because (") has a special meaning.
    [/quote]

    well ' has a special meaning as well...it is used for single characters(just like " is for the strings), hence the escape sequence \' for it exists...so when the compiler or rte encounters the \' , it translates it into ' on its own...its kind of like the keywords in a language...they're predefined[/quote]

    Apparently you are talking about C source code here. And what you say is correct for C source code. But that's a completely different matter. This topic is not about string literals in C source code at all!

    This is about the command-line and how it is tokenized into argv[]'s by Qt. And on the command-line the (') symbol certainly does not have a special meaning and thus does not need to be escaped.

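    You can confirm this easily with a simple Console program like this:
    @#include <stdio.h>

    int main(int argc, char *argv[])
    {
        // Print every argument exactly as it was delivered to main().
        for(int i = 0; i < argc; i++) printf("argv[%d] = %s\n", i, argv[i]);
        return getchar(); // wait for input so the console window stays open
    }@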

    If this program is compiled as "Console" and qtmain.lib is not used (i.e. C++ Runtime generates argv[]), we get:
    @program.exe --add "This is 'just' a Test"

    argv[0] = program.exe
    argv[1] = --add
    argv[2] = This is 'just' a Test

    program.exe --add 'This is "just" a Test'

    argv[0] = program.exe
    argv[1] = --add
    argv[2] = 'This
    argv[3] = is
    argv[4] = just
    argv[5] = a
    argv[6] = Test'

    program.exe --add "C:\Test\''Foobar''.mp3"

    argv[0] = program.exe
    argv[1] = --add
    argv[2] = C:\Test\''Foobar''.mp3@

    The above behavior is consistent between the C++ Runtime and CommandLineToArgvW().

    Now compile the same program as "Windows" (GUI) using the "qtmain.lib".

    You will see that Qt's result is very different and inconsistent with the expected result...

    More specifically, Qt (version 4.8.1) will do this:
    @program.exe --add "C:\Test\''Foobar''.mp3"

    argv[0] = program.exe
    argv[1] = --add
    argv[2] = C:\Test''Foobar''.mp3@

    (Note how the Backslash has been removed from the path, invalidating that path!)



  • well in that case...
    [quote author="raaghuu" date="1340432235"]
    neways, ''foobar'' is really not a name someone would use normally(read 'EVER' ;-)).that's one strange user(a double quote is somewhat acceptable but two single quotes,why would that be done?!? 0_o)
    [/quote]
    i don't know why this is happening...but seriously...why would one name a file ''<>''!?!



  • [quote author="raaghuu" date="1340447285"]well in that case...
    [quote author="raaghuu" date="1340432235"]
    neways, ''foobar'' is really not a name someone would use normally(read 'EVER' ;-)).that's one strange user(a double quote is somewhat acceptable but two single quotes,why would that be done?!? 0_o)
    [/quote]
    seriously...why would one name a file ''<>''!?!
    [/quote]

    This doesn't matter. It is a perfectly valid filename and the program needs to be able to deal with it. Actually such file names exist in the wild. I was made aware of the problem because one of my customers sent a bug report! Furthermore, both the C++ Runtime and CommandLineToArgvW() handle this path perfectly fine. It's only Qt's internal command-line parser/tokenizer that is giving a "wrong" (or at least "non-standard") result here...

    ("This bug probably won't be triggered at runtime, so let's ignore it!" always is a bad excuse for not fixing your bugs. Murphy's Law tells us that one day things will go terribly wrong. And it's often these "rarely triggered" bugs that will give you the biggest headache at some point in the future.)



  • I've located the problem:
    QCoreApplication::arguments() (qcoreapplication.cpp) uses the inline function qWinCmdArgs which itself uses qWinCmdLine (qcorecmdlineargs_p.h). This function explicitly handles "\" as an escape char if followed by ' or ":

    @
    if (*p == '\\') { // escape char?
        p++;
        if (*p == Char('\"') || *p == Char('\''))
            ; // yes
        else
            p--; // treat \ literally
    }
    @
    (line 105 of qcorecmdlineargs_p.h of Qt 4.8.1)

    So I guess it's intended behaviour. However "\" is wrong for path delimiters anyway, "/" is the proper character. This has been a major bug in Microsoft® Windows® products for decades and the engineers there obviously still couldn't fix it. In fact, computer-illiterate people have grown so used to it, they have started thinking it isn't a bug. I recommend to the Microsoft® engineers to tackle the problem in smaller steps since they can't handle it in one go. I.e. use "|" for a couple of years, and then finally switch to "/" like all proper operating systems.



  • [quote author="DerManu" date="1340493815"]I've located the problem:
    QCoreApplication::arguments() (qcoreapplication.cpp) uses the inline function qWinCmdArgs which itself uses qWinCmdLine (qcorecmdlineargs_p.h). This function explicitly handles "\" as an escape char if followed by ' or ":

    @
    if (*p == '\\') { // escape char?
        p++;
        if (*p == Char('\"') || *p == Char('\''))
            ; // yes
        else
            p--; // treat \ literally
    }
    @
    (line 105 of qcorecmdlineargs_p.h of Qt 4.8.1)

    So I guess it's intended behaviour[/quote]

    Intended or not. That behavior is definitely wrong! Treating a backslash as escape character right before a (") does make sense. This way we can pass a literal (") on the command-line. And this behavior is consistent with CommandLineToArgvW(). But treating a backslash as escape character right before a (') is superfluous and, as we have seen, can even cause big problems. That's because a single (') on the command-line simply is a literal ('). There is no need to escape it! CommandLineToArgvW() doesn't do that either...
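
    To make the difference concrete, here is a deliberately simplified tokenizer sketch with a switch for the (') rule. It is neither Qt's qWinCmdLine nor a full reimplementation of CommandLineToArgvW(): it only models splitting on whitespace, double-quoted spans, and the backslash handling discussed above. The function name tokenize() and its flag are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical tokenizer (NOT Qt's code, NOT the Win32 rules in full).
// escapeSingleQuote = true  : a backslash before (') is swallowed, as in the
//                             Qt 4.8.1 snippet quoted above.
// escapeSingleQuote = false : a backslash is an escape only before (").
std::vector<std::string> tokenize(const std::string &cmd, bool escapeSingleQuote)
{
    std::vector<std::string> args;
    std::string current;
    bool inQuotes = false;  // currently inside a "..." span
    bool hasToken = false;  // the current argument has started

    for (std::size_t i = 0; i < cmd.size(); ++i) {
        const char c = cmd[i];
        const char next = (i + 1 < cmd.size()) ? cmd[i + 1] : '\0';

        if (c == '\\' && (next == '"' || (escapeSingleQuote && next == '\''))) {
            current += next;       // drop the backslash, keep the escaped char
            hasToken = true;
            ++i;
        } else if (c == '"') {
            inQuotes = !inQuotes;  // quote marks delimit and are not copied
            hasToken = true;
        } else if ((c == ' ' || c == '\t') && !inQuotes) {
            if (hasToken) {        // whitespace outside quotes ends an argument
                args.push_back(current);
                current.clear();
                hasToken = false;
            }
        } else {
            current += c;          // anything else, lone backslashes included
            hasToken = true;
        }
    }
    if (hasToken)
        args.push_back(current);
    return args;
}
```

    Feeding it the command-line from the first post, the variant with escapeSingleQuote = false keeps the path intact, while escapeSingleQuote = true loses the backslash in front of the first ('), which is the broken path reported above.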

    [quote author="DerManu" date="1340493815"]. However "\" is wrong for path delimiters anyway, "/" is the proper character. This has been a major bug in Microsoft® Windows® products for decades and the engineers there obviously still couldn't fix it. In fact, computer-illiterate people have grown so used to it, they have started thinking it isn't a bug. I recommend to the Microsoft® engineers to tackle the problem in smaller steps since they can't handle it in one go. I.e. use "|" for a couple of years, and then finally switch to "/" like all proper operating systems.[/quote]

    Well, Windows and MS-DOS have been using the backslash as path delimiter since forever. Unix is using the forward-slash instead. It's pointless to discuss what is more "correct" or "wrong". It's a matter of fact that Windows uses (\) and Unix uses (/) as path delimiter. You can't change that (unless you have a time machine ^^). Thus a cross-platform toolkit like Qt unavoidably needs to deal with this unfortunate situation...

    BTW: Actually the Win32 API accepts forward-slashes as well. They'll be converted to backslashes internally. Though you'll rarely see anybody use forward-slashes on Windows. In particular, you can't expect Windows users to switch to forward-slashes in order to work around bugs in Qt's command-line tokenizer ;-)



  • see...see... i knew that was the problem :D(although not exactly knew why...thanks DerManu)...
    well, if it's there, there must be a reason to do so explicitly...probably a moderator who is reading this could send someone with appropriate knowledge to guide us?



  • [quote author="raaghuu" date="1340516806"]probably a moderator who is reading this could send someone with appropriate knowledge to guide us?[/quote]

    No, the moderators can't :)

    But I would advise to open a bug report on "Jira":https://bugreports.qt-project.org/, including your special test case. I'm not sure if it will be fixed in the Qt 4 series, as one might argue that it could break backwards compatibility. But at least for the upcoming Qt 5 this should be fixed, IMHO (but I'm not a Qt dev, I have no power to decide this).



  • [quote author="Volker" date="1340534251"]But I would advise to open a bug report on "Jira":https://bugreports.qt-project.org/, including your special test case.[/quote]

    Okay, will do that.

    [quote author="Volker" date="1340534251"]I'm not sure if it will be fixed in the Qt 4 series, as one might argue that it could break backwards compatibility. But at least for the upcoming Qt 5 this should be fixed, IMHO (but I'm not a Qt dev, I have no power to decide this).[/quote]

    Yes and No.

    The exact behavior of Qt's internal command-line tokenizer would change, indeed. But the current implementation already is inconsistent with the C++ Runtime. That is: If you write a Qt-based Console application, your main() method will be called directly by the C++ Runtime, giving you the "standard" (correct) behavior. Only if you write a Qt-based GUI application and compile it with "qtmain.lib", your main() method will now be called from Qt's own dummy WinMain() function and thus you'll get the "non-standard" (wrong) behavior. If you do not use "qtmain.lib" and instead write a WinMain() function yourself, then you'll probably use GetCommandLineW() + CommandLineToArgvW() and also get the "standard" (correct) behavior. Consequently, at the moment, Qt already breaks compatibility itself - showing different behavior depending on how the binary is compiled. That is: Relying on the "old" behavior is unsafe anyway! IMHO fixing the inconsistent behavior would only improve things...

    BTW: The C++ Runtimes on Windows (MSVC) and on Linux (GCC) show the same behavior. The string "C:\Test\''Foo''.mp3" is processed correctly. So Qt's tokenizer currently doesn't mimic either of those.

