How to turn plural words singular?


Solution 1

Those are all general rules (and good ones), but English is not a language for the faint of heart :-).

My own preference would be to have a transformation engine along with a set of transformations (surprisingly enough) for doing the actual work. You would run through the transformations (from specific to general) and, when a match was found, apply the transformation to the word and stop.

Regular expressions would be an ideal approach to this due to their expressiveness. An example rule set:

 1. If the word is "fish", return "fish".
 2. If the word is "sheep", return "sheep".
 3. If the word is "radii", return "radius".
 4. If the word ends in "ii", replace that "ii" with "us" (octopii, virii).
 5. If a word ends with -ies, replace the ending with -y.
 6. If a word ends with -es, remove it.
 7. Otherwise, just remove any trailing -s.
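The rule set above can be sketched with Python's `re` module as an ordered table of (pattern, replacement) pairs, tried from specific to general so that the first match wins. This is a minimal illustration, assuming lowercase input words; the names `RULES`, `COMPILED_RULES` and `singularize` are hypothetical, not from any library.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs, ordered from most
# specific to most general. Assumes lowercase input.
RULES = [
    (r"^fish$", "fish"),      # 1. fish stays fish
    (r"^sheep$", "sheep"),    # 2. sheep stays sheep
    (r"^radii$", "radius"),   # 3. radii -> radius (must precede rule 4)
    (r"ii$", "us"),           # 4. octopii -> octopus, virii -> virus
    (r"ies$", "y"),           # 5. -ies -> -y
    (r"es$", ""),             # 6. drop a trailing -es
    (r"s$", ""),              # 7. otherwise drop a trailing -s
]

# Compile each pattern once up front.
COMPILED_RULES = [(re.compile(pattern), repl) for pattern, repl in RULES]


def singularize(word: str) -> str:
    """Apply the first rule whose pattern matches, then stop."""
    for pattern, repl in COMPILED_RULES:
        result, count = pattern.subn(repl, word, count=1)
        if count:  # this rule matched, so we are done
            return result
    return word  # no rule matched: leave the word unchanged


for w in ["fish", "radii", "octopii", "communities", "buses", "customers"]:
    print(w, "->", singularize(w))
```

Applied literally, rule 6 also catches words such as "trees" (giving "tre"), so in practice the table needs more specific rules placed ahead of it.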

Note the requirement to keep this transformation set up to date. For example, let's say someone adds the table name "types". This would currently be captured by rule #6 and you would get the singular value "typ", which is obviously wrong.

The solution is to insert a new rule somewhere before #6, something like:

 3.5: If the word is "types", return "type".

for a very specific transformation, or perhaps somewhere later if it can be made more general.
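The maintenance step can be sketched on a trimmed copy of the rule table (only the three suffix rules, 5 to 7, for brevity): the word-specific rule is inserted just ahead of the "-es" rule, so it wins for "types" while other words behave as before. The list and helper names are hypothetical.

```python
import re

# Trimmed, hypothetical copy of rules 5-7, kept in an ordered list.
rules = [
    (re.compile(r"ies$"), "y"),  # 5. -ies -> -y
    (re.compile(r"es$"), ""),    # 6. drop a trailing -es
    (re.compile(r"s$"), ""),     # 7. otherwise drop a trailing -s
]


def singularize(word, rules):
    """Apply the first rule whose pattern matches, then stop."""
    for pattern, repl in rules:
        result, count = pattern.subn(repl, word, count=1)
        if count:
            return result
    return word


print(singularize("types", rules))  # typ (caught by the -es rule)

# "Rule 3.5": a very specific fix, slotted in just before the -es rule.
es_index = next(i for i, (p, _) in enumerate(rules) if p.pattern == "es$")
rules.insert(es_index, (re.compile(r"^types$"), "type"))

print(singularize("types", rules))  # type
print(singularize("buses", rules))  # bus (unaffected)
```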

In other words, you'll basically need to keep this transformation table updated as you find all those wondrous exceptions that English has spawned over the centuries.


The other possibility is to not waste your time with general rules at all.

Since the use case of this requirement is currently only to singularize table names, and that set of table names will be relatively tiny (at least compared to the set of plural English words), just create another table (or some other data structure) called singulars which maps all the current plural table names (employees, customers) to singular object names (employee, customer).

Then every time a table is added to your schema, ensure you add an entry to the singulars "table" so you can singularize it.
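A minimal sketch of the lookup-table alternative, assuming the plural-to-singular mapping lives in code rather than in a database table; `SINGULARS` and `entity_name` are hypothetical names. Raising an error for an unregistered table is one way to enforce "add an entry whenever a table is added" instead of silently guessing.

```python
# Hypothetical "singulars" mapping: one entry per table in the schema.
SINGULARS = {
    "employees": "employee",
    "customers": "customer",
}


def entity_name(table_name: str) -> str:
    """Look up the singular name for a table; fail loudly if it was never registered."""
    try:
        return SINGULARS[table_name]
    except KeyError:
        raise KeyError(
            f"No singular registered for table {table_name!r}; add it to SINGULARS"
        ) from None


print(entity_name("employees"))  # employee
```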

Solution 2

The problem is that your approach is based on general rules, but English has (figuratively) a billion exceptions... What do you do with words like "fish" or "geese"?

Also, those rules describe how to turn singular nouns into plurals. The reverse mapping isn't necessarily possible (consider "freebies").
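To see why the reverse mapping is ambiguous: both "city" and "freebie" pluralize to a word ending in "-ies", so a rule that only sees the plural cannot tell which singular ending to restore. A small check using the "-ies to -y" rule from the question (the helper name `naive_singular` is hypothetical):

```python
def naive_singular(word: str) -> str:
    """The -ies -> -y rule from the question; other words pass through unchanged."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    return word


# Both singulars below pluralize to a word ending in -ies.
pairs = {"city": "cities", "freebie": "freebies"}

for singular, plural in pairs.items():
    guess = naive_singular(plural)
    verdict = "ok" if guess == singular else "wrong, expected " + singular
    print(f"{plural} -> {guess} ({verdict})")
```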

Solution 3

Andrew Peters has a class called Inflector.NET which provides plural-to-singular and singular-to-plural methods. As Tal has pointed out, no algorithm is infallible, but this covers a decent number of irregular English nouns.

Solution 4

Maybe take a look at the source code of something like the Rails Inflector.

Solution 5

See also this answer, which recommends using Morpha (or studying the algorithm behind it).

If you know that the words you want to lemmatize are plural nouns, you can tag them with NNS to get more accurate output.

Input example:

$ cat test.txt 
Types_NNS
Pies_NNS
Trees_NNS
Buses_NNS
Radii_NNS
Communities_NNS
Sheep_NNS
Fish_NNS

Output example:

$ cat test.txt | ./morpha -c
Type
Pie
Tree
Bus
Radius
Community
Sheep
Fish
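If you drive Morpha from a script, preparing the tagged input is just a matter of appending `_NNS` to each word. The sketch below assumes a compiled Morpha binary at `./morpha`, invoked with the same `-c` flag as the transcript above; the function names are hypothetical, and splitting the output on whitespace assumes one lemma per token, as in the example.

```python
import subprocess


def tag_plural_nouns(words):
    """Format words as Morpha input, tagging each one as a plural noun (NNS)."""
    return "".join(f"{w}_NNS\n" for w in words)


def run_morpha(words, morpha_path="./morpha"):
    """Pipe the tagged words through a local Morpha binary (path is an assumption)."""
    result = subprocess.run(
        [morpha_path, "-c"],
        input=tag_plural_nouns(words),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


print(tag_plural_nouns(["Types", "Radii", "Sheep"]), end="")
```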
Author: Dmitri Nesteruk

Updated on May 16, 2021

Comments

  • Dmitri Nesteruk, almost 3 years

    I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now:

    1. If a word ends with -ies, I replace the ending with -y.
    2. If a word ends with -es, I remove this ending. This doesn't always work, however; for example, it turns "Types" into "Typ".
    3. Otherwise, I just remove the trailing -s.

    Does anyone know of a better algorithm?

  • paxdiablo, almost 15 years
    I don't think you realize how big a billion actually is :-) Or were you being figurative? [That's actually a bugbear of mine: people who say "literally a billion" when they really mean it figuratively.]
  • Tal Pressman, almost 15 years
    Well, I didn't say "literally", now did I? :p Still, if it bothers you that much...
  • Dmitri Nesteruk, almost 15 years
    Ooh, I love this. It so downprioritizes (is that a new word?) my cases that I feel a bit embarrassed. Okay, point taken. I'll work with exceptions rather than rules.
  • BenAlabaster, almost 15 years
    I've used this and it's great... I've extended it a little. There are many examples of uncommon pluralizations on the net that you can add to the basic version available online.
  • BenAlabaster, almost 15 years
    Regular expressions only really take you part of the way there; you need to create a class that lets you define basic rules, exceptions, uncountables, uncommon variations and a host of other variants. Some words use Latin for pluralization, some use Greek; it's a complex subject.
  • Tal Pressman, about 14 years
    That would be the correct singular ("freebie"), but going by the original rules in the question you would have to make it "freeby", which is wrong.
  • tchrist, over 12 years
    English has about 400 rules if you count the one-off foreign borrowings.
  • Daniel Bradley, about 12 years
    Inflector.NET is a great solution to this problem. If the link above is dead, here's a GitHub link instead: github.com/srkirkland/Inflector
  • trillions, over 11 years
    It would be nice if it could transform a plural back into a singular entity name. Or maybe I missed something? :)
  • ira, over 7 years
    Just wondering, what about the -es rule (#6)? For instance, what about "blades"? According to that rule, it converts to "blad" :-/
  • paxdiablo, over 7 years
    @iraklisg, yes, it would. So you would then insert a rule somewhere before that one to cover the case, as the rest of that answer segment suggests :-)