How to calculate the percentage of each element in a list?

Solution 1

You can use count(i) to determine the number of occurrences of each of the numbers 1-4, then divide it by the length of the sequence (5 here) to obtain the percentage, expressed as a fraction:

sequence=list(zip(*['123', '134', '234', '214', '223']))
percentages=[]
for x in sequence:
    t=list(x)
    temp=[t.count(str(i))/len(x) for i in range(1,5)]  #work out the percentage of each number
    percentages.append(temp) #add percentages to list

Or, as one list comprehension:

percentages = [[list(x).count(str(i)) / len(x) for i in range(1, 5)] for x in sequence]

Output:

[[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
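
The same idea can be wrapped in a small function so that neither the sequence length nor the set of digits is hard-coded. This is only a minimal sketch: the name position_fractions and its symbols parameter are illustrative and not part of the original answer. Tuples already provide a count method, so the list(x) conversion is optional.

```python
def position_fractions(sequences, symbols="1234"):
    """Return, for each position, the fraction of sequences holding each symbol.

    The function name and the `symbols` parameter are hypothetical,
    chosen for illustration. Assumes all sequences have the same length
    (zip silently truncates to the shortest one otherwise).
    """
    columns = zip(*sequences)  # one tuple per position
    return [[col.count(s) / len(col) for s in symbols] for col in columns]


data = ['123', '134', '234', '214', '223']
result = position_fractions(data)
print(result)
# [[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
```

Passing a different symbols string (for example "12345") would add a column per extra symbol without touching the function body.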

Solution 2

Starting from your approach, you could do the rest with a Counter:

from collections import Counter

for item in zip(*['123', '134', '234', '214', '223']):
    c = Counter(item)
    total = sum(c.values())
    percent = {key: value/total for key, value in c.items()}
    print(percent)

    # convert to list, in the order 1, 2, 3, 4
    percent_list = [percent.get(str(i), 0.0) for i in range(1, 5)]
    print(percent_list)

which prints (on Python 3.7+; older versions may order the dictionary keys differently)

{'1': 0.4, '2': 0.6}
[0.4, 0.6, 0.0, 0.0]
{'2': 0.4, '3': 0.4, '1': 0.2}
[0.2, 0.4, 0.4, 0.0]
{'3': 0.4, '4': 0.6}
[0.0, 0.0, 0.4, 0.6]

Solution 3

You could start by creating the zipped list as you did:

l = ['123', '134', '234', '214', '223']  # the list from the question
zipped = zip(*l)

then map collections.Counter over it to get the counts of each item in the tuples produced by zip:

from collections import Counter
counts = map(Counter, zipped)

and then go through it, creating a list out of their counts divided by their sizes:

res = [[c[i]/sum(c.values()) for i in '1234'] for c in counts]
print(res)
# [[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]

If you are a one-liner kind of person, mush the first two in the comprehension to get this in one line:

res = [[c[i]/sum(c.values()) for i in '1234'] for c in map(Counter, zip(*l))]

Additionally, as noted in a comment, if you don't know the elements ahead of time, sorted(set(''.join(l))) could replace '1234'.
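
A minimal sketch of that variant, assuming l is the list from the question; the symbols name is illustrative. Because the alphabet is derived from the data, any extra digit appearing in the input would get its own column without changing the code.

```python
from collections import Counter

l = ['123', '134', '234', '214', '223']  # list from the question

# Derive the alphabet from the data; sorted() gives a stable column order.
symbols = sorted(set(''.join(l)))

# A Counter returns 0 for missing keys, so absent symbols yield 0.0.
res = [[c[s] / sum(c.values()) for s in symbols]
       for c in map(Counter, zip(*l))]

print(symbols)  # ['1', '2', '3', '4']
print(res)      # [[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
```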

Author: Jassy.W

Updated on September 26, 2022

Comments

  • Jassy.W, over 1 year ago

    I have this list with 5 sequences of numbers:

    ['123', '134', '234', '214', '223'] 
    

    and I want to obtain the percentage of each number 1, 2, 3, 4 at the ith position across the sequences. For example, the numbers at the 0th position of these 5 sequences are 1 1 2 2 2, so I need to calculate the percentage of 1, 2, 3, 4 among them and return those percentages as the 0th element of a new list.

    ['123', '134', '234', '214', '223']
    
    0th position: 1 1 2 2 2   the percentages of 1, 2, 3, 4 are respectively: [0.4, 0.6, 0.0, 0.0]
    
    1st position: 2 3 3 1 2   the percentages of 1, 2, 3, 4 are respectively: [0.2, 0.4, 0.4, 0.0]
    
    2nd position: 3 4 4 4 3   the percentages of 1, 2, 3, 4 are respectively: [0.0, 0.0, 0.4, 0.6]
    

    The desired result is then:

    [[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
    

    My attempt so far:

    list(zip(*['123', '134', '234', '214', '223']))
    

    Result:

     [('1', '1', '2', '2', '2'), ('2', '3', '3', '1', '2'), ('3', '4', '4', '4', '3')]
    

    But I got stuck here: I don't know how to calculate the percentage of each of the numbers 1, 2, 3, 4 and obtain the desired result. Any suggestions are appreciated!

  • BallpointBen, over 7 years ago
    If you don't know the elements ahead of time, set(''.join(l)) should replace '1234'.
  • BallpointBen, over 7 years ago
    Too much hard-coding. /5 should be replaced by /len(x), and range(1,5) should be replaced by set(''.join(l)) (and you can then replace str(i) with just i).
  • Jassy.W, over 7 years ago
    @Jim Fasarakis-Hilliard Thanks for your solution, but I don't fully understand the syntax [[c[i]/sum(c.values()) for i in '1234'] for c in map(Counter, zip(*l))] because I am not very familiar with the map function. I checked and learned that it applies the function Counter to zip(*l), so I wondered why not use Counter(zip(*l)) directly, but I found that Counter(zip(*l)) does not work. I also found that map(Counter, zip(*l)) returns <map object at 0x033C1930>, and I don't know what that means. Could you explain why map(Counter, zip(*l)) works? Thank you.
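
To illustrate the question in the last comment: in Python 3, map returns a lazy iterator, which is why printing it shows <map object at ...> rather than the results. The sketch below, which assumes l is the list from the question, suggests why map(Counter, zip(*l)) works where Counter(zip(*l)) does not: the former builds one Counter per position, while the latter builds a single Counter whose keys are the whole tuples. The variable names are illustrative.

```python
from collections import Counter

l = ['123', '134', '234', '214', '223']  # list from the question

# map is lazy: wrapping it in list() forces evaluation and yields
# one Counter per position (one per tuple produced by zip).
per_position = list(map(Counter, zip(*l)))
print(per_position[0])  # counts of the characters found at position 0

# Equivalent list comprehension, without map:
same = [Counter(column) for column in zip(*l)]

# Counter(zip(*l)) counts the tuples themselves instead:
# each whole column becomes a single key with a count of 1.
whole = Counter(zip(*l))
print(whole)
```

Iterating over the map object in a for loop or a comprehension, as the answer does, consumes it in the same way as list() does here.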