How to turn plural words singular?
Solution 1
Those are all general rules (and good ones) but English is not a language for the faint of heart :-).
My own preference would be to have a transformation engine along with a set of transformations (surprisingly enough) for doing the actual work. You would run through the transformations (from specific to general) and, when a match was found, apply the transformation to the word and stop.
Regular expressions would be an ideal approach to this due to their expressiveness. An example rule set:
1. If the word is "fish", return "fish".
2. If the word is "sheep", return "sheep".
3. If the word is "radii", return "radius".
4. If the word ends in "ii", replace that "ii" with "us" (octopii, virii).
5. If a word ends with -ies, replace the ending with -y.
6. If a word ends with -es, remove it.
7. Otherwise, just remove any trailing -s.
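As a rough illustration, the rule set above could be sketched in Python as an ordered list of regular expressions, tried from specific to general, stopping at the first match. The names RULES and singularize are made up for this sketch, and lowercasing the input is an assumption, not part of the original rules.

```python
import re

# Ordered from most specific to most general; the first matching rule wins.
# RULES and singularize are hypothetical names used only for this sketch.
RULES = [
    (re.compile(r"^fish$"), "fish"),     # rule 1
    (re.compile(r"^sheep$"), "sheep"),   # rule 2
    (re.compile(r"^radii$"), "radius"),  # rule 3
    (re.compile(r"ii$"), "us"),          # rule 4
    (re.compile(r"ies$"), "y"),          # rule 5
    (re.compile(r"es$"), ""),            # rule 6
    (re.compile(r"s$"), ""),             # rule 7
]

def singularize(word: str) -> str:
    """Apply the first matching rule and stop; return the word unchanged if none match."""
    lowered = word.lower()  # assumption: table names are compared case-insensitively
    for pattern, replacement in RULES:
        if pattern.search(lowered):
            return pattern.sub(replacement, lowered, count=1)
    return lowered

print(singularize("radii"))        # radius
print(singularize("communities"))  # community
print(singularize("types"))        # typ (rule 6 is too greedy here)
```

Keeping the rules in a plain list means a new exception is just one more entry placed ahead of the general rule it overrides.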
Note the requirement to keep this transformation set up to date. For example, let's say someone adds the table name "types". This would currently be captured by rule #6 and you would get the singular value "typ", which is obviously wrong.
The solution is to insert a new rule somewhere before #6, something like:
3.5: If the word is "types", return "type".
for a very specific transformation, or perhaps somewhere later if it can be made more general.
In other words, you'll basically need to keep this transformation table updated as you find all those wondrous exceptions that English has spawned over the centuries.
The other possibility is to not waste your time with general rules at all.
Since the use case of this requirement is currently only to singularise the table names, and that set of table names will be relatively tiny (at least compared to the set of plural English words), just create another table (or some sort of data structure) called "singulars" which maps all the current plural table names ("employees", "customers") to singular object names ("employee", "customer").
Then every time a table is added to your schema, ensure you add an entry to the singulars "table" so you can singularize it.
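A minimal sketch of that lookup approach, assuming a plain in-memory dictionary stands in for the singulars "table". SINGULARS and singular_for are hypothetical names, and failing loudly on an unknown table is a design choice of this sketch, not something the answer prescribes.

```python
# Hypothetical mapping from plural table names to singular entity names.
SINGULARS = {
    "employees": "employee",
    "customers": "customer",
}

def singular_for(table_name: str) -> str:
    """Look up the singular name; fail loudly if the table was never registered."""
    key = table_name.lower()  # assumption: table names are case-insensitive
    if key not in SINGULARS:
        raise KeyError(f"no singular registered for table {table_name!r}; add it to SINGULARS")
    return SINGULARS[key]

print(singular_for("employees"))  # employee
```

Raising on a missing entry, rather than guessing, makes a forgotten table show up as soon as the schema is processed.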
Solution 2
The problem is that's based on the general rules, but English has (figuratively) a billion exceptions... What do you do with words like "fish", or "geese"?
Also, the rules are for how to turn singular nouns to plurals. The reverse mapping isn't necessarily possible (consider "freebies").
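To make the "freebies" point concrete, here is a small sketch using a deliberately simplified pluralization rule (consonant + y becomes -ies, otherwise append -s). The function pluralize is hypothetical and far from complete; it only shows that two different singular spellings can land on the same plural, so the plural alone cannot tell you which one to go back to.

```python
def pluralize(word: str) -> str:
    """Simplified rule: consonant + 'y' becomes 'ies'; otherwise just append 's'."""
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

# The real word and a non-word both produce the same plural.
print(pluralize("freebie"))  # freebies
print(pluralize("freeby"))   # freebies
```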
Solution 3
Andrew Peters has a class called Inflector.NET which provides plural-to-singular and singular-to-plural methods. As Tal has pointed out, no algorithm is infallible, but this covers a decent number of irregular English nouns.
Solution 4
Maybe take a look at the source code of something like the Rails Inflector.
Solution 5
See also this answer, which recommends using Morpha (or studying the algorithm behind it).
If you know that the words that you want to lemmatize are plural nouns then you can tag them with NNS to get a more accurate output.
Input example:
$ cat test.txt
Types_NNS
Pies_NNS
Trees_NNS
Buses_NNS
Radii_NNS
Communities_NNS
Sheep_NNS
Fish_NNS
Output example:
$ cat test.txt | ./morpha -c
Type
Pie
Tree
Bus
Radius
Community
Sheep
Fish
Dmitri Nesteruk
Updated on May 16, 2021
Comments
-
Dmitri Nesteruk almost 3 years
I'm preparing some table names for an ORM, and I want to turn plural table names into single entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now:
- If a word ends with -ies, I replace the ending with -y
- If a word ends with -es, I remove this ending. This doesn't always work however - for example, it replaces Types with Typ
- Otherwise, I just remove the trailing -s
Does anyone know of a better algorithm?
-
paxdiablo almost 15 years
I don't think you realize how big a billion actually is :-) Or were you being figurative? [That's actually a bugbear of mine, the people that say "literally a billion" when they really mean figuratively].
-
Tal Pressman almost 15 years
Well, I didn't say "literally", now did I? :p Still, if it bothers you that much...
-
Dmitri Nesteruk almost 15 years
Ooh, I love this. It so downprioritizes (is that a new word?) my cases that I feel a bit embarrassed. Okay. Point taken. Will work with exceptions rather than rules.
-
BenAlabaster almost 15 years
I've used this and it's great... I've extended it a little. There are many examples on the net of uncommon pluralization to add to the basic version you can get online.
-
BenAlabaster almost 15 years
Regular expressions only really take you part of the way there. You need to create a class that will allow you to define basic rules, exceptions, uncountables, uncommon variations and a host of other variants. Some words use Latin for pluralization, some use Greek; it's a complex subject.
-
Tal Pressman about 14 years
That would be the correct singular for "freebie", but going according to the original rules in the question you would have to make it "freeby", which is wrong.
-
tchrist over 12 years
English has about 400 rules if you count on the one-off foreign borrowings.
-
Daniel Bradley about 12 years
Inflector.NET is a great solution to this problem. If the link above is dead, then here's a GitHub link instead: github.com/srkirkland/Inflector
-
trillions over 11 years
It would be nice if it could transform a plural back to a single entity name. Or maybe I missed something? :)
-
ira over 7 years
Just wondering, what about rule 98... what about, for instance, "blades"? According to that rule it converts to "blad" :-/
-
paxdiablo over 7 years
@iraklisg, yes it would. So you would then insert a rule somewhere between 0 and 98 to cover that case, as the rest of that answer segment suggests :-)