Find the year with the most number of people alive in Python

12,119

Solution 1

>>> from collections import Counter
>>> from itertools import chain
>>> def most_pop(pop):
...     pop_flat = chain.from_iterable(range(i,j+1) for i,j in pop)
...     return Counter(pop_flat).most_common()
...
>>> most_pop([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939)])[0]

Solution 2

Another solution I just though of:

  • Create 2 tables, birthdates and deathdates.
  • Accumulate birth dates and death dates in those tables.
  • Browse those tables to accumulate the number of alive people at the time.

Grand total complexity is O(n)

Implementation

from collections import Counter

def most_populated(population, single=True):
    birth = map(lambda x: x[0], population)
    death = map(lambda x: x[1] + 1, population)
    b = Counter(birth)
    d = Counter(death)
    alive = 0
    years = {}
    for year in range(min(birth), max(death) + 1):
        alive = alive + b[year] - d[year]
        years[year] = alive
    return max(years, key=years.get) if single else \
           [key for key, val in years.iteritems() if val == max(years.values())]

Better

from collections import Counter
from itertools import accumulate
import operator

def most_populated(population, single=True):
    delta = Counter(x[0] for x in population)
    delta.subtract(Counter(x[1]+1 for x in population))
    start, end = min(delta.keys()), max(delta.keys())
    years = list(accumulate(delta[year] for year in range(start, end)))
    return max(enumerate(years), key=operator.itemgetter(1))[0] + start if single else \
           [i + start for i, val in enumerate(years) if val == max(years)]

Solution 3

I would go like this:

  • Sort persons by birth year (unborn list)
  • Starting from the first born
    • Put that person in the alive list
    • Using an insertion sort by date of death (the list stays sorted, so use a binary search)
    • Until you reach a person that was not born that year
  • Then, starting from the person in the alive list that dies first, remove it from the list.
  • Put the size of the alive list in a dict
  • Increment the year
  • Loop until the unborn and alive lists are empty

Complexity should be around O((m + n) * log(m)) (each year is considered only once, and each person only twice, multiplied by the insertion cost in the alive list)

Implementation

from bisect import insort

def most_populated(population, single=True):
    years = dict()
    unborn = sorted(population, key=lambda x: -x[0])
    alive = []
    dead = []
    for year in range(unborn[-1][0], max(population, key=lambda x: x[1])[1] + 1):
        while unborn and unborn[-1][0] == year:
            insort(alive, -unborn.pop()[1])
        while alive and alive[-1] == -(year - 1):
            dead.append(-alive.pop())
        years[year] = len(alive)
    return max(years, key=years.get) if single else \
           [key for key, val in years.iteritems() if val == max(years.values())]

Solution 4

We can also use numpy slicing, which is quite neat, and should also be quite efficient:

import numpy as np
from collections import namedtuple

Person = namedtuple('Person', ('birth', 'death'))
people = [Person(1900,2000), Person(1950,1960), Person(1955, 1959)]

START_YEAR = 1900
END_YEAR = 2000
people_alive = np.zeros(END_YEAR - START_YEAR + 1) # Alive each year

for p in people:
    a = p.birth - START_YEAR
    b = p.death - START_YEAR + 1 # include year of death
    people_alive[a:b] += 1

# Find indexes of maximum aliveness and convert to year
most_alive = np.flatnonzero(people_alive == people_alive.max()) + START_YEAR

EDIT It seems like the namedtuple adds a bit of overhead, so to speed up a bit more, remove the namedtuple and do for birth, death in people: instead.

Solution 5

  1. Just put the birth and death years into a dict. If it is birth, increase the value by 1. or vice versa.
  2. Sort the dict by keys and iterate by reading the current number of the alive people.
  3. Follow the 'maxAlive' an 'theYear' to get the first year with the highest number

    years = {} 
    
    for p in people:
        if p.birth in years:
            years[p.birth] += 1
        else:
            years[p.birth] = 1
    
        if p.death in years:
            years[p.death] -= 1
        else:
            years[p.death] = -1
    
    alive = 0
    maxAlive = 0
    theYear = people[0].birth
    for year in sorted(years):
        alive += years[year]
        if alive > maxAlive:
            maxAlive = alive
            theYear = year
    
Share:
12,119

Related videos on Youtube

oski86
Author by

oski86

I graduated from Polish-Japanese Academy of Information Technology with a MSc Eng. degree in computer science (major databases). After 3 years of commercial work I returned to my university to become a PhD student in the field of social informatics. My work is connected with problems in a category of web intelligence, especially swarm creativity and collaborative innovation networks (COINs). I specialize in analysing quality of work in Open Source Software (OSS) repositories on the GitHub portal. My hobbies are Kendo, reading about successful startups, and travelling to Southeast Asia. Currently I work for company called TogetherData as a Data scientist & data engineer.

Updated on October 20, 2022

Comments

  • oski86
    oski86 over 1 year

    Given a list of people with their birth and end years (all between 1900 and 2000), find the year with the most number of people alive.

    Here is my somewhat brute-force solution:

    def most_populated(population, single=True):
        years = dict()
        for person in population:
            for year in xrange(person[0], person[1]):
                if year in years:
                    years[year] += 1
                else:
                    years[year] = 0
        return max(years, key=years.get) if single else \
               [key for key, val in years.iteritems() if val == max(years.values())]
    
    print most_populated([(1920, 1939), (1911, 1944),
                          (1920, 1955), (1938, 1939)])
    print most_populated([(1920, 1939), (1911, 1944),
                          (1920, 1955), (1938, 1939), (1937, 1940)], False)
    

    I'm trying to find a more efficient way to solve this problem in Python. Both - readability and efficiency counts. Moreover, for some reason my code won't print [1938, 1939] while it should.

    Update

    Input is a list of tuples, where first element of a tuple is a year when person was born, and second element of a tuple is the year of death.

    Update 2

    End year (2nd part of tuple) counts as well as a year of the person being alive (so If the person dies in Sept 1939 (we don't care about the month), he is actually alive in 1939, at least part of it). That should fix the 1939' missing in results.

    Best solution?

    While readability counts in favor of @joran-beasley, for bigger input most efficient algorithm was provided by @njzk2. Thanks @hannes-ovrén for providing analysis in IPython notebook on Gist

    • njzk2
      njzk2 almost 9 years
      years[year] = 1 for starters
    • Greg Hilston
      Greg Hilston almost 9 years
      @njzk2 Technically I think you're right from a readability standpoint, but every years[n] will be initialized to 0 and will still yield the "most_populated" year correctly. If the count was bring returned, I'd agree with you but only the year is being returned.
    • DSM
      DSM almost 9 years
      Let's say that you have 10 people born in 2000 and they all die in 2080. Then there are another 10 who are born in 2080 and they die in 2160. Do you want the maximum people alive to be 10 or 20 or 15 or something else? What is your assumption about what's going on in 2080?
    • Colonel Panic
      Colonel Panic almost 9 years
      For a harder problem playing with some of the same ideas, see Google Code Jam's The Great Wall code.google.com/codejam/contest/2437488/dashboard#s=p2
  • oski86
    oski86 almost 9 years
    Thanks, looks very neat. How can I change the code to count the 2nd part of tuple as well ? (death year should also count)
  • Mazdak
    Mazdak almost 9 years
    xrange(i,j+1) for i,j in pop in python2 and range(i,j+1) for i,j in pop is better!
  • DSM
    DSM almost 9 years
    This will be very slow for large populations because of insort.
  • njzk2
    njzk2 almost 9 years
    @DSM: insort is O(n), which makes my initial complexity computation wrong, but nontheless this is equivalent in complexity to considering each year of each person.
  • njzk2
    njzk2 almost 9 years
    you really think people_alive[a:b] += 1 is O(1)?
  • Hannes Ovrén
    Hannes Ovrén almost 9 years
    That is incredibly true. Not sure what I was thinking. :)
  • njzk2
    njzk2 almost 9 years
    using a table instead of a Counter, we could use itertools.accumulate to compute the values for each year in a single call
  • DSM
    DSM almost 9 years
    @nkjz2: I think I'd treat the input data as an event queue, with births being (t, 1) and deaths being (t, -1). You can sort that in N log N, and then simply loop over each event either adding or decreasing to the total population, tracking whatever you want.
  • oski86
    oski86 almost 9 years
    Accepted due to best timeit test: gist.github.com/oskar-j/308aafbb7ae44df7fed1 , result: joran-beasley: 0.0668609038671
  • njzk2
    njzk2 almost 9 years
    @DSM: yep, I realized something like that can be done. Also, since it is a relatively known set of dates, you can use a bucket sort (O(n)), or a Counter, like what I started in another answer.
  • oski86
    oski86 almost 9 years
    Timeit 10000 times seems to return a constant execution time, probably with bigger test data on the input it may work quite fine, but it needs further tests. +1 for using the numpy
  • njzk2
    njzk2 almost 9 years
    I like that answer because it is short, clear and readable, but beware of the complexity, if you have a more significant number of people or years to handle.
  • Hannes Ovrén
    Hannes Ovrén almost 9 years
    Considering that the maximum length of the range a:b is (assuming normal humans) in the order of 100, you could consider it to be O(1).
  • Hannes Ovrén
    Hannes Ovrén almost 9 years
    @oski86, Love the benchmark. It would be interesting to see results for different sizes of the input list as well. I would guess that my numpy version suffers for small N due to the construction of the people_alive array. For the record, I do think this answer should be the accepted one. I like it :)
  • Hannes Ovrén
    Hannes Ovrén almost 9 years
    Actually, I take that back. This algorithm is by far the slowest of the ones tried so far. It just happens to be fast for really small population sizes. For larger populations njzk2_var2 is fastest. See benchmark plot here: gist.github.com/hovren/bf5335ba45cd7eee811e the hannes_v2 algorithm is exactly the same as my old one, but without the namedtuple, since that added a bit of overhead.
  • oski86
    oski86 almost 9 years
    That's what I wanted to try yesterday (provide bigger rand test data), but it where already wee hours. I'm not sure about changing the accepted answer, but I have one idea, I'll update content in the question with providing a winning solution separately for a small and big data on input.
  • Colonel Panic
    Colonel Panic almost 9 years
    Neat. This is the dynamic programming algorithm I hinted to.
  • Nas Banov
    Nas Banov over 7 years
    so if i call it with most_populated([(-10025, -10000), (1960, 2010)]), for 2 people it will do >10200 iterations? This is terrible!
  • Machavity
    Machavity almost 7 years
    A link to a potential solution is always welcome, but please add context around the link so your fellow users will have some idea what it is and why it’s there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline. Take into account that being barely more than a link to an external site is a possible reason as to Why and how are some answers deleted?.
  • Mr. T
    Mr. T about 6 years
    This is a Java answer, the OP asked for Python solutions. Please read How do I write a good answer?
  • Hadh
    Hadh almost 6 years
    I don't get the use of the second if statement, why are you adding +1 to the year of death? The first if statement is clear though.
  • Chris
    Chris over 2 years
    Welcome to Stack Overflow. Code dumps without any explanation are rarely helpful. Stack Overflow is about learning, not providing snippets to blindly copy and paste. Please edit your question and explain how it answers the specific question being asked. See How to Answer. This is particularly important when answering old questions (this one is over six years old) with existing answers.