Quantcast
Channel: Does C have an equivalent of std::less from C++? - Stack Overflow
Viewing all articles
Browse latest Browse all 5

Answer by Peter Cordes for Does C have an equivalent of std::less from C++?

$
0
0

On implementations with a flat memory mode (basically everything), casting to uintptr_t will Just Work.

But systems with non-flat memory models do exist, and thinking about them can help explain the current situation, like C++ having different specs for < vs. std::less.


Part of the point of < on pointers to separate objects being UB in C (or at least unspecified in some C++ revisions) is to allow for weird machines, including non-flat memory models.

A well-known example is x86-16 real mode where pointers are segment:offset, forming a 20-bit linear address via (segment << 4) + offset. The same linear address can be represented by multiple different seg:off combinations.

C++ std::less on pointers on weird ISAs might need to be expensive, e.g. "normalize" a segment:offset on x86-16 to have offset <= 15. However, there's no portable way to implement this. The manipulation required to normalize a uintptr_t (or the object-representation of a pointer object) is implementation-specific.

But even on systems where C++ std::less has to be expensive, < doesn't have to be. Assuming a "large" memory model where an object fits within one segment, < can just compare the offset part and not even bother with the segment part. (Pointers inside the same object will have the same segment, and otherwise it's UB in C. C++17 changed to merely "unspecified", which might still allow skipping normalization and just comparing offsets.) This is assuming all pointers to any part of an object always use the same seg value, never normalizing; I guess an ABI could require that.

(Such a memory model might have a max object size of 64kiB for example, but a much larger max total address space that has room for many such max-sized objects. ISO C allows implementations to have a limit on object size that's lower than the max value (unsigned) size_t can represent, SIZE_MAX. For example even on flat memory model systems, GNU C limits max object size to PTRDIFF_MAX so size calculation can ignore signed overflow.) See this answer and discussion in comments.

If you want to allow objects larger than a segment, you need a "huge" memory model that has to worry about overflowing the offset part of a pointer when doing p++ to loop through an array, or when doing indexing / pointer arithmetic. This leads to slower code everywhere, but would probably mean that p < q would happen to work for pointers to different objects, because an implementation targeting a "huge" memory model would normally choose to keep all pointers normalized all the time. See What are near, far and huge pointers? - some real C compilers for x86 real mode did have an option to compile for the "huge" model where all pointers defaulted to "huge" unless declared otherwise.

x86 real-mode segmentation isn't the only non-flat memory model possible, it's merely a useful concrete example to illustrate how it's been handled by C/C++ implementations. In real life, implementations extended ISO C with the concept of far vs. near pointers, allowing programmers to choose when they can get away with just storing / passing around the 16-bit offset part, relative to some common data segment.

But a pure ISO C implementation would have to choose between a small memory model (everything except code in the same 64kiB with 16-bit pointers) or large or huge with all pointers being 32-bit. Some loops could optimize by incrementing just the offset part, but pointer objects couldn't be optimized to be smaller.


If you knew what the magic manipulation was for any given implementation, you could implement it in pure C. The problem is that different systems use different addressing and the details aren't parameterized by any portable macros.

Or maybe not: it might involve looking something up from a special segment table or something, e.g. like x86 protected mode instead of real mode where the segment part of the address is an index, not a value to be left shifted. You could set up partially-overlapping segments in protected mode, and the segment selector parts of addresses wouldn't necessarily even be ordered in the same order as the corresponding segment base addresses. Getting a linear address from a seg:off pointer in x86 protected mode might involve a system call, if the GDT and/or LDT aren't mapped into readable pages in your process.

(Of course mainstream OSes for x86 use a flat memory model so the segment base is always 0 (except for thread-local storage using fs or gs segments), and only the 32-bit or 64-bit "offset" part is used as a pointer.)

You could manually add code for various specific platforms, e.g. by default assume flat, or #ifdef something to detect x86 real mode and split uintptr_t into 16-bit halves for seg -= off>>4; off &= 0xf; then combine those parts back into a 32-bit number.


Viewing all articles
Browse latest Browse all 5

Latest Images

Trending Articles





Latest Images