How to use VC++ intrinsic functions w/o run-time library
Solution 1
I think I finally found a solution:
First, in a header file, declare memset() with a pragma, like so:
extern "C" void * __cdecl memset(void *, int, size_t);
#pragma intrinsic(memset)
That allows your code to call memset(). In most cases, the compiler will inline the intrinsic version.
Second, in a separate implementation file, provide an implementation. The trick to preventing the compiler from complaining about re-defining an intrinsic function is to use another pragma first. Like this:
#pragma function(memset)
void * __cdecl memset(void *pTarget, int value, size_t cbTarget) {
    unsigned char *p = static_cast<unsigned char *>(pTarget);
    while (cbTarget-- > 0) {
        *p++ = static_cast<unsigned char>(value);
    }
    return pTarget;
}
This provides an implementation for those cases where the optimizer decides not to use the intrinsic version.
The outstanding drawback is that you have to disable whole-program optimization (/GL and /LTCG). I'm not sure why. If someone finds a way to do this without disabling global optimization, please chime in.
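The out-of-line definition also matters for code that never names memset() at all, since the question mentions that the compiler can generate implicit calls. Here is a small illustration of the kind of code where that may happen. Whether a call is actually emitted depends on the compiler and optimization settings, and zeroed_bar is just a hypothetical name for this sketch.

```cpp
struct MyStruct {
    int foo[10];
    int bar;
};

// No memset() appears in this source, yet a compiler is free to implement
// the zero-initialization below with a call to memset.
inline int zeroed_bar() {
    MyStruct blah = {};   // every member is zero-initialized
    return blah.bar;
}
```

If the compiler does emit such a call and no memset is defined anywhere, the result would be the same unresolved-external linker error as for an explicit call.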
Solution 2
I'm pretty sure there's a compiler flag that tells VC++ not to use intrinsics.
The source to the runtime library is installed with the compiler. You do have the choice of excerpting functions you want/need, though often you'll have to modify them extensively (because they include features and/or dependencies you don't want/need).
There are other open source runtime libraries available as well, which might need less customization.
If you're really serious about this, you'll need to know (and maybe use) assembly language.
Edited to add:
I got your new test code to compile and link. These are the relevant settings:
Enable Intrinsic Functions: No
Whole Program Optimization: No
It's that last one that suppresses "compiler helpers" like the built-in memset.
Edited to add:
Now that it's decoupled, you can copy the asm code from memset.asm into your program--it has one global reference, but you can remove that. It's big enough so that it's not inlined, though if you remove all the tricks it uses to gain speed you might be able to make it small enough for that.
I took your above example and replaced the memset() with this:
void * __cdecl memset(void *pTarget, char value, size_t cbTarget) {
    _asm {
        push ecx
        push edi
        mov al, value
        mov ecx, cbTarget
        mov edi, pTarget
        rep stosb
        pop edi
        pop ecx
    }
    return pTarget;
}
It works, but the library's version is much faster.
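For builds where inline assembly is not an option (MSVC does not accept _asm blocks when targeting x64), the same rep stosb fill can probably be expressed with the __stosb compiler intrinsic from <intrin.h>. This is only a sketch under assumptions, not something taken from the answer above: fill_bytes is a hypothetical name, and the plain loop is a fallback so that the snippet also compiles with other compilers.

```cpp
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // declares the __stosb intrinsic
#endif

// Hypothetical helper (not from the original answer): fills cbTarget bytes
// at pTarget with the low byte of value and returns pTarget, like memset.
inline void *fill_bytes(void *pTarget, int value, std::size_t cbTarget) {
    unsigned char *p = static_cast<unsigned char *>(pTarget);
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    // On MSVC for x86/x64 this intrinsic generates a rep stosb sequence.
    __stosb(p, static_cast<unsigned char>(value), cbTarget);
#else
    // Portable fallback for other compilers: a simple byte loop.
    for (std::size_t i = 0; i < cbTarget; ++i) {
        p[i] = static_cast<unsigned char>(value);
    }
#endif
    return pTarget;
}
```

Because the compiler generates the instruction itself, it can also decide which registers need saving, which is the part the hand-written push/pop pairs do manually.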
Solution 3
I think you have to set Optimization to "Minimize Size (/O1)" or "Disabled (/Od)" to get the Release configuration to compile; at least this is what did the trick for me with VS 2005. Intrinsics are designed for speed so it makes sense that they would be enabled for the other Optimization levels (Speed and Full).
Solution 4
This definitely works with VS 2015: Add the command line option /Oi-. This works because "No" on Intrinsic functions isn't a switch, it's unspecified. /Oi- and all your problems go away (it should work with whole program optimization, but I haven't properly tested this).
Solution 5
This certainly wasn't an answer when you first asked the question, but it is now possible to do what you want by using the version of Clang that is available with Visual Studio 2019, where it works just as you would like without any particular hoops to jump through.
The use of Clang has some other benefits too - especially if you wish to achieve similar goals using x64 architecture too, as it seems to be the only way to make the blasted pdata section go away!
Per Visual C++ itself, I took the approach of putting the implementations of memset/memcpy in a separate source file and, as rc-1290 mentioned, excluded just that one file from Global Optimizations, so the cost was not so high - albeit irritating!
Adrian McCarthy
Updated on May 02, 2021
Comments
-
Adrian McCarthy about 3 years
I'm involved in one of those challenges where you try to produce the smallest possible binary, so I'm building my program without the C or C++ run-time libraries (RTL). I don't link to the DLL version or the static version. I don't even #include the header files. I have this working fine.
Some RTL functions, like memset(), can be useful, so I tried adding my own implementation. It works fine in Debug builds (even for those places where the compiler generates an implicit call to memset()). But in Release builds, I get an error saying that I cannot define an intrinsic function. You see, in Release builds, intrinsic functions are enabled, and memset() is an intrinsic.
I would love to use the intrinsic for memset() in my release builds, since it's probably inlined and smaller and faster than my implementation. But I seem to be in a catch-22. If I don't define memset(), the linker complains that it's undefined. If I do define it, the compiler complains that I cannot define an intrinsic function.
Does anyone know the right combination of definition, declaration, #pragma, and compiler and linker flags to get an intrinsic function without pulling in RTL overhead?
Visual Studio 2008, x86, Windows XP+.
To make the problem a little more concrete:
extern "C" void * __cdecl memset(void *, int, size_t);

#ifdef IMPLEMENT_MEMSET
void * __cdecl memset(void *pTarget, int value, size_t cbTarget) {
    char *p = reinterpret_cast<char *>(pTarget);
    while (cbTarget > 0) {
        *p++ = static_cast<char>(value);
        --cbTarget;
    }
    return pTarget;
}
#endif

struct MyStruct {
    int foo[10];
    int bar;
};

int main() {
    MyStruct blah;
    memset(&blah, 0, sizeof(blah));
    return blah.bar;
}
And I build like this:
cl /c /W4 /WX /GL /Ob2 /Oi /Oy /Gs- /GF /Gy intrinsic.cpp
link /SUBSYSTEM:CONSOLE /LTCG /DEBUG /NODEFAULTLIB /ENTRY:main intrinsic.obj
If I compile with my implementation of memset(), I get a compiler error:
error C2169: 'memset' : intrinsic function, cannot be defined
If I compile this without my implementation of memset(), I get a linker error:
error LNK2001: unresolved external symbol _memset
-
Adrian McCarthy almost 14 years
Good idea, but it doesn't work. I wrote my own version, called ClearMemory(), using a namespace to make sure it doesn't conflict with anything else. The optimizer replaced my implementation of ClearMemory() with a call to memset() (with a byte value of 0)! Too smart for its own good. :-)
-
Adrian McCarthy almost 14 years
But that's working against the ultimate goal of trying to make the smallest possible binary. In many cases, including memset, the inlined intrinsic function is smaller than the function call.
-
Adrian McCarthy almost 14 years
I already have /O1, and /Od kinda defeats the goal of making the smallest possible binary. Speed is also an issue.
-
Fabio Ceconello almost 14 years
The lib version is faster just because it aligns the target pointer to 4 bytes (on 32-bit machines; 8 bytes on 64-bit) and uses rep stosd instead of rep stosb, writing the unaligned bytes at the start and the end separately. Doing that would make memset even larger. Again (as I stated in the comments to my answer), I don't think your compiler is really generating the intrinsic. Egrunin's implementation is as small as you can get. In very specific cases, maybe the intrinsic would be able to spare the pushes/pops, if ecx and edi are available. Would you have a net gain? Rarely, I guess.
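To make the head / aligned body / tail strategy described in this comment concrete, here is a rough C++ sketch of the 4-byte variant. It is not the library's actual code: fill_aligned is a hypothetical name, and the 32-bit store through a cast pointer assumes a compiler that tolerates this kind of type punning (as run-time library code typically does).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper illustrating the head / aligned body / tail strategy.
inline void *fill_aligned(void *pTarget, int value, std::size_t cbTarget) {
    unsigned char *p = static_cast<unsigned char *>(pTarget);
    const unsigned char b = static_cast<unsigned char>(value);

    // Head: single bytes until p is 4-byte aligned (or the count runs out).
    while (cbTarget > 0 && (reinterpret_cast<std::uintptr_t>(p) & 3u) != 0) {
        *p++ = b;
        --cbTarget;
    }

    // Body: replicate the byte into a 32-bit word and store 4 bytes at a time.
    // Assumption: the compiler accepts this type-punned store.
    const std::uint32_t word = b * 0x01010101u;
    std::uint32_t *pw = reinterpret_cast<std::uint32_t *>(p);
    while (cbTarget >= 4) {
        *pw++ = word;
        cbTarget -= 4;
    }

    // Tail: the remaining 0 to 3 bytes.
    p = reinterpret_cast<unsigned char *>(pw);
    while (cbTarget-- > 0) {
        *p++ = b;
    }
    return pTarget;
}
```

The extra head and tail loops are the size overhead mentioned above. An optimizer may also recognize these loops and turn them back into a memset call, so a sketch like this is not a drop-in replacement in a build without the run-time library.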
-
Adrian McCarthy almost 14 years
The code in egrunin's second edit is essentially the same as the code generated by the compiler when it uses the intrinsic. The compiler is often able to save a few bytes when it knows that it doesn't need to preserve ecx and edi. The library version pays off when the number of bytes to clear gets larger. There's overhead in dealing with the possibly unaligned beginning and end.
-
egrunin almost 14 years
@Adrian: so...did I answer your question?
-
Luke almost 14 years
Well, I don't have VS2008 in front of me so maybe they changed something. In VS2005 this was the only change I had to make to get it to build successfully.
-
Adrian McCarthy almost 14 years
Everything you wrote is true, but it didn't really address my question. That's probably my fault for not being clear enough in the question. Turning off optimizations is counter to keeping the program small (which is why I'm trying to omit the RTL in the first place) and fast (which is a secondary goal). There doesn't seem to be a need to insert assembly into my code, when it's virtually identical to what the compiler generates. Thanks for the input.
-
AnT stands with Russia over 13 years
What are all those casts doing there? Also, pointer conversions to and from void * are normally static_cast-s, not reinterpret_cast-s.
-
Adrian McCarthy over 12 years
@AndreyT: I've changed the cast from void * to use a static_cast. At the time I originally wrote this, which cast to use in that situation was unclear and hotly debated. (stackoverflow.com/questions/310451/…) I'm not sure what you mean about "all" those casts. There are two. The first is necessary because you cannot write via a pointer to void (which is what memset takes). The second is so that the compiler doesn't warn about assigning an int to an unsigned char.
-
Roman Starkov over 12 years
This also doesn't work if it's the compiler that uses memset in the first place (like in a class initializer).
-
Harry Johnston almost 12 years
In the specific case where you want to write zeroes, the SecureZeroMemory function seems to work. (It's implemented as a forced inline function embedded into winnt.h.)
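For readers without the Windows headers at hand, the idea can be sketched roughly like this. It is an approximation in the spirit of SecureZeroMemory, not a copy of the winnt.h code, and zero_bytes is a hypothetical name. The assumption is that stores through a volatile pointer must each be performed, so the optimizer is not expected to rewrite the loop as a memset call the way it did with ClearMemory().

```cpp
#include <cstddef>

// Hypothetical helper, similar in spirit to SecureZeroMemory: each store goes
// through a volatile pointer, so the loop is not expected to be replaced by
// a call to memset.
inline void *zero_bytes(void *pTarget, std::size_t cbTarget) {
    volatile unsigned char *p = static_cast<volatile unsigned char *>(pTarget);
    while (cbTarget-- > 0) {
        *p++ = 0;
    }
    return pTarget;
}
```

The trade-off is that the volatile stores also keep the compiler from using the inlined intrinsic, so this is likely to be slower for large blocks.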
-
Adrian McCarthy over 7 years
From MSDN: "/Oi is only a request to the compiler to replace some function calls with intrinsics; the compiler may call the function (and not replace the function call with an intrinsic) if it will result in better performance." So it might or might not work in all cases.
-
RC-1290 almost 7 years
You can limit the disabling of whole-program optimization to the intrinsics only, by compiling these intrinsics into a separate static library.
-
Adrian McCarthy over 2 years
@KeyC0de: Not exactly. I created a regular function and told the compiler to use it anywhere that it would have used the corresponding intrinsic. This allowed me to link without the compiler's private run-time library and thus let me keep the executable small. It's possible (likely) the real intrinsic would have been better optimized for speed, but the smaller size was more important for me.
-
KeyC0de over 2 years
So if I get you, you did the opposite of what I said. memset is an intrinsic in MSVS compiler (right?) and you un-intrinsic-ed it? And that's because intrinsics cost space for the binary which was important for you.