Why do register_tm_clones and deregister_tm_clones reference an address past the .bss section? Where is this memory allocated?
It is just very silly pointer arithmetic code generated by GCC for deregister_tm_clones(). It does not actually access the memory at those addresses.
Summary
No accesses are done at those pointers; they just act as address labels, and GCC is being silly about how it compares the two (relocated) addresses.
The two functions are needed as part of transaction support in C and C++. For further details, see GNU libitm.
Background
I'm running Ubuntu 16.04.3 LTS (Xenial Xerus) on x86-64, with GCC versions 4.8.5, 4.9.4, 5.4.1, 6.3.0, and 7.1.0 installed. The register_tm_clones() and deregister_tm_clones() functions get compiled in from /usr/lib/gcc/x86-64/VERSION/crtbegin.o. For all versions, register_tm_clones() is okay (no odd addresses). For versions 4.9.4, 5.4.1, and 6.3.0, the code for deregister_tm_clones() is the same, and includes a very odd pointer comparison test. The code for deregister_tm_clones() is fixed in 7.1.0, where it is a straightforward address test.
The sources for the two functions are in libgcc/crtstuff.c in the GCC sources.
On this machine, objdump -t /usr/lib/gcc/ARCH/VERSION/crtbegin.o shows .tm_clone_table, __TMC_LIST__, and __TMC_END__ for all GCC versions I mentioned above, so in the GCC sources, both USE_TM_CLONE_REGISTRY and HAVE_GAS_HIDDEN are defined. Thus, we can describe the two functions in C as
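#include <stddef.h>  /* size_t, NULL */

typedef void (*func_ptr) (void);

/* Weak references: their addresses are zero unless libitm is linked in. */
extern void _ITM_registerTMCloneTable(void *, size_t) __attribute__((weak));
extern void _ITM_deregisterTMCloneTable(void *) __attribute__((weak));

static func_ptr __TMC_LIST__[] = { };
extern func_ptr __TMC_END__[];

void deregister_tm_clones(void)
{
    void (*fn)(void *);

    /* Only the two addresses are compared; nothing is read from the table. */
    if (__TMC_LIST__ != __TMC_END__) {
        fn = _ITM_deregisterTMCloneTable;
        if (fn != NULL)
            fn(__TMC_LIST__);
    }
}

void register_tm_clones(void)
{
    void (*fn)(void *, size_t);
    size_t size;

    /* The table holds pairs of function pointers, hence the division by 2. */
    size = (__TMC_END__ - __TMC_LIST__) / 2;
    if (size > 0) {
        fn = _ITM_registerTMCloneTable;
        if (fn != NULL)
            fn(__TMC_LIST__, size);
    }
}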
Essentially, __TMC_LIST__ is an array of function pointers, and size is the number of function pointer pairs in the array. If the array is not empty, _ITM_registerTMCloneTable() or _ITM_deregisterTMCloneTable() is called; both are defined in libitm.a (GNU libitm). When the _ITM_registerTMCloneTable/_ITM_deregisterTMCloneTable symbols are not defined, the relocation code yields zero as their address.
So, when the array is empty, and/or the _ITM_registerTMCloneTable/_ITM_deregisterTMCloneTable symbols are not defined, the code does nothing: only some fancy pointer arithmetic.
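The "undefined symbol resolves to zero" behaviour can be tried in isolation. The following is only a sketch: hypothetical_tm_hook is a made-up name (it is not part of libitm), declared as a weak reference in the same spirit as the _ITM_* symbols, and the example assumes GCC or Clang on an ELF target such as Linux.

```c
#include <stddef.h>

/* Hypothetical hook, declared weak and deliberately never defined.
   On ELF targets the linker resolves an undefined weak symbol to address 0. */
extern void hypothetical_tm_hook(void *table) __attribute__((weak));

/* Returns 1 if the hook was linked in and called, 0 if it was skipped. */
int call_hook_if_present(void *table)
{
    void (*fn)(void *) = hypothetical_tm_hook;

    if (fn == NULL)
        return 0;   /* symbol undefined: nothing to call */
    fn(table);
    return 1;
}
```

Since nothing defines the hook, the call is skipped, which mirrors what happens in a program that is not linked against libitm.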
Note that the code does not load the pointer values from any memory address. The addresses (of __TMC_LIST__, __TMC_END__, _ITM_registerTMCloneTable, and _ITM_deregisterTMCloneTable) are supplied by the linker/relocator, as immediate 32-bit literals in the code. (This is why, if you look at the disassembly of the object files, you see only zeros for the addresses.)
Investigation
The problematic code for deregister_tm_clones occurs at the very beginning:
004008c0 <deregister_tm_clones>:
4008c0: b8 57 bb 6c 00 mov $0x6cbb57,%eax
4008c5: 55 push %rbp
4008c6: 48 2d 50 bb 6c 00 sub $0x6cbb50,%rax
4008cc: 48 83 f8 0e cmp $0xe,%rax
4008d0: 48 89 e5 mov %rsp,%rbp
4008d3: 76 1b jbe 4008f0 <deregister_tm_clones+0x30>
4008d5: b8 00 00 00 00 mov $0x0,%eax
4008da: 48 85 c0 test %rax,%rax
4008dd: 74 11 je 4008f0 <deregister_tm_clones+0x30>
4008df: 5d pop %rbp
4008e0: bf 50 bb 6c 00 mov $0x6cbb50,%edi
4008e5: ff e0 jmpq *%rax
4008e7: (9-byte NOP)
4008f0: 5d pop %rbp
4008f1: c3 retq
4008f2: (14-byte NOP)
400900:
(This particular example comes from compiling a basic Hello, World! example in C using gcc-6.3.0 on x86-64 statically).
If we look at the section headers (objdump -h) for the same binary, we see that addresses 0x6cbb50 to 0x6cbb5f are actually not covered by any section:
24 .data 00001ad0 00000000006ca080 00000000006ca080 000ca080 2**5
25 .bss 00001878 00000000006cbb60 00000000006cbb60 000cbb50 2**5
i.e. .data covers addresses 0x6ca080 to 0x6cbb4f, and .bss covers 0x6cbb60 to 0x6cd3d7.
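The address ranges follow directly from the objdump -h columns (start address plus size). As a small sketch of that arithmetic, with section_end and in_section being illustrative helper names rather than anything from binutils:

```c
#include <stdbool.h>
#include <stdint.h>

/* First address past a section: start address (VMA) plus size. */
uint64_t section_end(uint64_t vma, uint64_t size)
{
    return vma + size;
}

/* True if addr lies inside the half-open range [vma, vma + size). */
bool in_section(uint64_t addr, uint64_t vma, uint64_t size)
{
    return addr >= vma && addr < section_end(vma, size);
}
```

With the numbers above, section_end(0x6ca080, 0x1ad0) is 0x6cbb50, so 0x6cbb50 itself is the first address after .data, and neither 0x6cbb50 nor 0x6cbb57 falls inside .bss, which only starts at 0x6cbb60.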
It would seem like the assembly code is using invalid addresses!
However, the 0x6cbb50 address is quite valid, because there is a zero-size hidden symbol at that address (objdump -t):
006cbb50 g O .data 0000000000000000 .hidden __TMC_END__
Because I compiled the binary statically, the __TMC_END__ symbol is part of the .data section here; normally, it is in .bss. In any case, it does not matter, because the __TMC_END__ symbol is of zero size: we can use its address in whatever calculations we want; we just cannot dereference it, because it contains no data.
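The same idea exists in ordinary C as a one-past-the-end pointer: its address may be compared and subtracted, but never dereferenced. A minimal sketch, where demo_table, demo_table_end, and clone_pair_count are stand-in names invented for this illustration (they are not the real crtstuff.c symbols):

```c
#include <stddef.h>

typedef void (*func_ptr)(void);

/* Stand-in table with room for two (original, clone) pairs. */
func_ptr demo_table[4];

/* One-past-the-end marker, playing the role of __TMC_END__. */
func_ptr *demo_table_end(void)
{
    return demo_table + 4;
}

/* Number of pairs between two markers; neither pointer is dereferenced. */
size_t clone_pair_count(func_ptr *begin, func_ptr *end)
{
    return (size_t)(end - begin) / 2;
}
```

When both markers have the same address, as in the binaries discussed here, the count is zero and there is nothing to register.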
This leaves the very first relocated address in the deregister_tm_clones function, 0x6cbb57 in this case.
If we look at what the code actually does with that value, it turns out that for some braindead reason, the compiled binary code is essentially computing
long temporary = relocated__TMC_END__address + 7;
long difference = temporary - relocated__TMC_LIST__address;
if ((unsigned long)difference <= 14)
    return;
Because jbe is an unsigned comparison, the above behaves exactly the same as
long difference = relocated__TMC_END__address - relocated__TMC_LIST__address;
if (difference >= -7 && difference <= 7)
    return;
In any case, it is obvious that __TMC_LIST__ == __TMC_END__, and that the relocated addresses are the same, in both OP's binary and the binary above.
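The add-7-then-compare-with-14 pattern can be checked numerically. Note that jbe ("jump if below or equal") is an unsigned comparison, so the test succeeds exactly when the two addresses are less than 8 bytes (one pointer) apart in either direction. The sketch below uses neutral parameter names, first and second, for the two relocated addresses; the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* The test as it appears in the disassembly:
   mov $second+7 ; sub $first ; cmp $14 ; jbe  (unsigned "below or equal"). */
bool compiled_test(uint64_t first, uint64_t second)
{
    return (second + 7 - first) <= 14;
}

/* The same condition written as a signed range check on the byte distance. */
bool range_test(uint64_t first, uint64_t second)
{
    int64_t diff = (int64_t)(second - first);
    return diff >= -7 && diff <= 7;
}
```

For equal addresses, such as 0x6cbb50 and 0x6cbb50 here or 0x601070 and 0x601070 in the question, both forms are true and the function returns immediately. The condition is symmetric, so it does not matter which of the two symbols is taken as first.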
Addendum
I do not know exactly why GCC generates
if ((__TMC_END__ + 7) - __TMC_LIST__ <= 14)
rather than
if (__TMC_END__ <= __TMC_LIST__)
but in GCC bug 77813 Marc Glisse does mention that it (the former above) is indeed what GCC ends up generating. (The bug itself is not directly related to this, as it is about GCC optimizing the expression to zero, affecting only libitm users, and easily fixed.)
Also, between gcc-6.3.0 and gcc-7.1.0, when the generated code dropped that inanity, the C sources for the functions did not change. What changed is how GCC generates code (in some situations) for such pointer comparisons.
Updated on June 04, 2022
Comments
-
brookbot almost 2 years
register_tm_clones and deregister_tm_clones are referencing memory addresses past the end of my RW sections. How is this memory tracked?
Example: In the example below, deregister_tm_clones references memory address 0x601077, but the last RW section we allocated, .bss, starts at 0x601069 and has size 0x7; adding these, we get 0x601070. So the reference is clearly past what's been allocated for the .bss section and should be in our heap space, but who's managing it?
objdump -d main
...
0000000000400540 <deregister_tm_clones>:
400540: b8 77 10 60 00 mov $0x601077,%eax
400545: 55 push %rbp
400546: 48 2d 70 10 60 00 sub $0x601070,%rax
40054c: 48 83 f8 0e cmp $0xe,%rax
...
readelf -S main
...
[25] .data PROGBITS 0000000000601040 00001040 0000000000000029 0000000000000000 WA 0 0 16
[26] .bss NOBITS 0000000000601069 00001069 0000000000000007 0000000000000000 WA 0 0 1
[27] .comment PROGBITS 0000000000000000 00001069 0000000000000058 0000000000000001 MS 0 0 1
[28] .shstrtab STRTAB 0000000000000000 000019f2 000000000000010c 0000000000000000 0 0 1
[29] .symtab SYMTAB 0000000000000000 000010c8 00000000000006c0 0000000000000018 30 47 8
[30] .strtab STRTAB 0000000000000000 00001788 000000000000026a 0000000000000000 0 0 1
Note that the references start exactly at the end of the .bss section. When I examine the memory allocated using gdb, I see that there is plenty of space, so it works fine, but I don't see how this memory is managed.
Start Addr End Addr Size Offset objfile
0x400000 0x401000 0x1000 0x0 /home/nobody/main
0x600000 0x601000 0x1000 0x0 /home/nobody/main
0x601000 0x602000 0x1000 0x1000 /home/nobody/main
0x7ffff7a17000 0x7ffff7bd0000 0x1b9000 0x0 /usr/lib64/libc-2.23.so
I can find no other reference to it in any other sections. There is also no space reserved for it by the segment loaded for .bss:
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10 0x0000000000000259 0x0000000000000260 RW 200000
Can anyone clarify these functions? Where is the source? I've read all the references on transactional memory, but they cover programming, not implementation. I cannot find a compiler option to remove this code, except of course -nostdlib, which leaves you with nothing.
Are these part of malloc perhaps? Still, for code that's not using malloc, threading, or STM, I'm not sure I agree these should be linked into my code.
See also What functions does gcc add to the linux ELF?
More details:
$ make main
cc -c -o main.o main.c
cc -o main main.o
$ which cc
/usr/bin/cc
$ cc --version
cc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cc --verbose
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC)
-
Leeor over 6 years
Thanks, great answer. I guess it's not related to my case (where I do see unexplained writes into the BSS section), but I still learned something useful.
-
Nominal Animal over 6 years
@Leeor: Could you upload a small binary (perhaps a Hello World program) somewhere, so I could examine it? Or even send it to me via email (my address is shown on my home page, linked to from my profile)? You see, I have a bit of difficulty accepting that your case is different (your question does not show anything contrary to my answer here), and I'd like to be able to verify that I am wrong.
-
Leeor over 6 years
Unfortunately I can't, the code is a simple raytracer (smallpt) using OpenMP, but on a certain system and with a certain gcc version, it suddenly slows down due to collisions on a read-only array residing in the BSS section. Profiling shows a surge in snoops indicating that somebody is writing to these shared lines and invalidating them from all the cores. I was hoping that this question could explain why (but unfortunately did not read the code carefully enough..). Still, the bounty was well deserved :)
-
Nominal Animal over 6 years
@Leeor: Ah, now I understand; the problem in your case is not in the deregister_tm_clones() function at all, but elsewhere. (And I agree, slowdowns due to cacheline ping-pong are infuriating. I've basically resigned myself to using anonymous memory maps for read-only data shared by parallel simulator threads, just so I can call mprotect(map, len, PROT_READ) to catch any write attempts.)
-
Linas over 2 years
The construction if ((__TMC_END__ + 7) - __TMC_LIST__ <= 14) is very typical of attempts to force things to land on an 8-byte boundary. Without this style, stuff sometimes falls on 4-byte boundaries, or even completely unaligned, depending on the target CPU. The lack of alignment then just snowballs into later issues.