Why are there no hashtables in the C standard library?

68,083

Solution 1

There is no hashtable in the standard C library because either:

  • no-one has submitted a proposal to the working group; or
  • the working group has deemed it unnecessary.

That's the way ISO works. Proposals are put forward and accepted or rejected.

You have to be careful what you add to the standard library since you have two conflicting groups. As a user, you might want every data structure under the sun to be added to the standard to make the language more useful.

But, as a language implementor (as an aside, these are probably the people that tend to make up most of the various working groups so their view is likely to have more impact), you don't really want the hassle of having to implement stuff that may not be used by everyone. All the stuff that was there when C89 appeared was to do with the fact that the primary purpose was to codify existing practice rather than introduce new practices. All iterations of the standards since then have been a little freer in what they can do but backwards compatibility is still an important issue.

Myself, I also have conflicts. I'd love to have all the features of the Java, C++ or Python libraries at my disposal in C. Of course, that would make it so much harder to learn everything for newcomers and, as one commenter stated, probably make it so any code monkey can pump out useful code, reducing my value in the process :-)

And I pretty much have all the data structures I'll ever need, from my long and (mostly) illustrious career. You're not limited to the standard library for this sort of stuff. There are plenty of third-party tools you can get to do the job and (like me) you can also roll your own.

If you want to know why certain decisions were made in each iteration, ISO (and ANSI originally, before ISO took over) usually publish rationale documents. The C89 one from ANSI can be found here. It contains this little beauty in the scope:

This Rationale focuses primarily on additions, clarifications, and changes made to the language as described in the Base Documents. It is not a rationale for the C language as a whole: the Committee was charged with codifying an existing language, not designing a new one. No attempt is made in this Rationale to defend the pre-existing syntax of the language, such as the syntax of declarations or the binding of operators.

I especially enjoy the admission that they're not responsible for any unholy mess that may have predated their attempts to standardise.

But, perhaps the real answer to your question lies in this bit, one of the guiding principles:


Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like:

  • Trust the programmer.
  • Don't prevent the programmer from doing what needs to be done.
  • Keep the language small and simple.
  • Provide only one way to do an operation.
  • Make it fast, even if it is not guaranteed to be portable.

That third one is probably the main reason why the library wasn't massively expanded with the initial standardisation effort - that, and the fact that such an expansion from a committee would probably have resulted in ANSI C being labeled C2038 rather than C89.

Solution 2

C seems unusual by today's standards because there are no useful data structures defined. None. Not even strings — and if you think a C string is a data structure, well, we'll have to disagree on what a "data structure" is.

If you like C, then think of it as a "blank slate"... your entire application is made of code written by you and libraries you choose to pull in, plus a few fairly primitive standard library functions, with maybe one or two exceptions like qsort. People use C these days to implement things like Python, Ruby, Apache, or the Linux kernel. These are projects that use all of their own data structures anyway, and they wouldn't be likely to use something like the STL.

Many C libraries implement generic hash tables. There are tradeoffs, and you can pick your favorite. Some of them are configurable using callbacks.

  • Glib has a hash table object (documentation)
  • Apache Portable Runtime has a hash table (documentation)
  • Apple's Core Foundation library has a hash table (documentation) Note: Yes you can insert any object as key or value.
  • UTHash is a hash table library (documentation)
  • Another hash table library (link)

With all of these libraries that do what you want, what's the point of adding a hash table to the C standard?

Solution 3

The standard C library doesn't include any large, persistent data structures - neither lists, nor trees, nor stacks, nor hashtables.

It's not really possible to give a definitive answer without asking the authors of the original C library. However, a plausible explanation is that the implementation of such data structures involves various tradeoffs, and only the author of the application is in the correct position to make those tradeoffs.

Note that the POSIX standard C library does specify generic hashtable functions: hcreate(), hsearch() and hdestroy(); and note also that their "one size fits all" nature tends to render them inadequate for most real-world use cases, supporting the argument above.

Solution 4

Due to the lack of templates

This is a guess, but not having templates in the language like C++ does makes implementing containers very inelegant, as you'd need dozens of definitions to cover all possible types, not to mention user defined types.

There are C strategies to mitigate this like playing around with void *, but they lose compile time type checks.

GLib and gnulib are my recommended implementations at the moment: Quick Way to Implement Dictionary in C

Share:
68,083
Shankar Raju
Author by

Shankar Raju

I am a Senior Software Engineer at Microsoft in Bellevue, WA. My opinions are my own and not part of my Employer. I have varied interests that include Music, playing Piano, Photography, Cooking and Technology.

Updated on July 05, 2022

Comments

  • Shankar Raju
    Shankar Raju almost 2 years

    Why is that there is no Hashtable support as part of Standard C Library? Is there any specific reason for this?

  • Dietrich Epp
    Dietrich Epp about 13 years
    Hm, looks like the POSIX hash tables only let you create one hash table at a time. That sounds very useful.
  • paxdiablo
    paxdiablo about 13 years
    Hmm, not sure I agree with the "C string is not a data structure" bit (not enough to downvote though) - they have a defined structure and functions for manipulating them. Other than complexity, I'm not sure that's any different to a BTree or a HashTable. In any case, we also have FILE * :-)
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE about 13 years
    The POSIX hash table functions are pretty much useless in real-world code.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE about 13 years
    +1 for an answer that actually does a decent job of answering the "why" part.
  • Dietrich Epp
    Dietrich Epp about 13 years
    @paxdiablo: I thought about mentioning FILE*... that's a good point.
  • Lundin
    Lundin about 13 years
    "Keep the language small and simple. Provide only one way to do an operation." In this still C you are talking about? int or or signed or signed int, if-else or ?:, if-else or switch, function-like macros or functions, continue or break or goto or return ;, const int or int const, typedef struct or struct, enum or const or #define, malloc or calloc, NULL or 0 or '\0', *(ptr+n) or ptr[n]. Etc, etc, this is just the top of the iceberg I could go on all day, really.
  • Lundin
    Lundin about 13 years
    Also, the committee really made an admirable effort to make the language needlessly complex. When was the last time you used the following headers: complex.h, errno.h, fenv.h, float.h, inttypes.h, iso646.h, locale.h, signal.h, wctype.h. To me, a header file containing a hash table seems about one hundred times more useful than these.
  • paxdiablo
    paxdiablo about 13 years
    @Lundin: see the section containing "charged with codifying an existing language, not designing a new one". Most of that stuff was already there pre-ANSI. Of those headers, I've often used errno, locale and signal - I don't think I've ever used any of the others but, then again, I'm not the only C coder on the planet :-) Things like complex, fenv, inttypes, iso646 and wctype came later so have nothing to do with the C89 rationale. You would have to look at the rationales for the specific iterations.
  • paxdiablo
    paxdiablo about 13 years
    @Lundin: for example: "A new feature of C99, complex types were added to C as part of the effort to make C suitable and attractive for general numerical programming. Complex arithmetic is used heavily in certain important application areas." - from open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
  • Lundin
    Lundin about 13 years
    So if the C90 committee was charged with codifying an existing language, one may then wonder what the C99 committee was charged with. "Come up with as many pointless new features as possible, while avoiding any notable improvements of the language", or something along those lines. Apparently they are still meeting several times per year, speaking of the good old days.
  • Lundin
    Lundin about 13 years
    @paxdiablo What I've heard through rumours from people who were in the C99 committee, complex numbers were only added because there was a single representative some a big important company demanding them, and not for the benefit of the general public.
  • paxdiablo
    paxdiablo about 13 years
    It's all there in the C99 rationale I provided a link to, starting on page 3: internationalisation, further codification for areas C89 was deficient, minimise incompatibilities with C89 and C++, maintain conceptual simplicity. If you think you can do a better job, feel free to apply for the next WG :-) And, re your rumour comment, see my paragraph beginning "As an aside".
  • paxdiablo
    paxdiablo about 13 years
    Keep in mind I'm not saying their decisions were right, just explaining how the process works. All those libraries would have been a godsend in my early career. Not so much now since I have all my own stuff.
  • evanflash
    evanflash about 12 years
    Try writing software to design or simulate electronics components without a good library for complex arithmetic
  • jjg
    jjg over 10 years
    See hcreate_r which does not have this restriction (but is a non-POSIX extension).
  • Dmytro
    Dmytro almost 8 years
    I hereby propose punchcard_t data structure to be the final solution to C's string problems. The data structure shall be defined as typedef struct { char data[81];} punchcard_t; and shall be defined in stdpunchcard.h. It will be the solution to "returning common human-sensible strings from functions".
  • sudo
    sudo almost 7 years
    Best answer. There's no need to create a standard hash table library because at this level, you really care which one you use, so you should go look at some open source libraries or write your own.
  • sudo
    sudo almost 7 years
    "One way to do everything" is where they kinda failed, but they tried. They didn't try with C++...
  • sudo
    sudo almost 7 years
    I love UTHash; would recommend it if you just want something quickly. It's super easy to learn and works without making you create a bunch of objects. And yeah, the author totally abused macros to make it work, but it's small and simple enough that I don't care.
  • wingerse
    wingerse over 5 years
    Why not just use macros to generate code like how templates do?
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    @WingerSendon perhaps that is possible, and I've never tried, but I have the gut feeling that the resulting codebase would be insaner than templates, if that is even possible :-) Glad to be proven wrong.
  • Yuri Nudelman
    Yuri Nudelman about 4 years
    C does not have template only for those who don't know how to use them. Have a look at openssl hash table implementation, they do it very well using preprocessor.
  • Neil
    Neil over 2 years
    @CiroSantilli新疆再教育营六四事件法轮功郝海东 I find using the pre-processor to do cat as Kernighan and Ritchie, 1988, A12.3 p231, is more natural when one has a lot of code.