How to implement R's p.adjust in Python
Solution 1
If you wish to be sure of what you are getting from R, you can also indicate that you wish to use the function in the R package 'stats':
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import FloatVector
stats = importr('stats')
p_adjust = stats.p_adjust(FloatVector(pvalue_list), method = 'BH')
Solution 2
This question is a bit old, but there are multiple comparison corrections available in statsmodels for Python. We have
Solution 3
Here is an in-house function I use:
def correct_pvalues_for_multiple_testing(pvalues, correction_type = "Benjamini-Hochberg"):
"""
consistent with R - print correct_pvalues_for_multiple_testing([0.0, 0.01, 0.029, 0.03, 0.031, 0.05, 0.069, 0.07, 0.071, 0.09, 0.1])
"""
from numpy import array, empty
pvalues = array(pvalues)
n = float(pvalues.shape[0])
new_pvalues = empty(n)
if correction_type == "Bonferroni":
new_pvalues = n * pvalues
elif correction_type == "Bonferroni-Holm":
values = [ (pvalue, i) for i, pvalue in enumerate(pvalues) ]
values.sort()
for rank, vals in enumerate(values):
pvalue, i = vals
new_pvalues[i] = (n-rank) * pvalue
elif correction_type == "Benjamini-Hochberg":
values = [ (pvalue, i) for i, pvalue in enumerate(pvalues) ]
values.sort()
values.reverse()
new_values = []
for i, vals in enumerate(values):
rank = n - i
pvalue, index = vals
new_values.append((n/rank) * pvalue)
for i in xrange(0, int(n)-1):
if new_values[i] < new_values[i+1]:
new_values[i+1] = new_values[i]
for i, vals in enumerate(values):
pvalue, index = vals
new_pvalues[index] = new_values[i]
return new_pvalues
Solution 4
Using Python's numpy library, without calling out to R at all, here's a reasonably efficient implementation of the BH method:
import numpy as np
def p_adjust_bh(p):
"""Benjamini-Hochberg p-value correction for multiple hypothesis testing."""
p = np.asfarray(p)
by_descend = p.argsort()[::-1]
by_orig = by_descend.argsort()
steps = float(len(p)) / np.arange(len(p), 0, -1)
q = np.minimum(1, np.minimum.accumulate(steps * p[by_descend]))
return q[by_orig]
(Based on the R code BondedDust posted)
Solution 5
(I know this is not the answer... just trying to be helpful.) The BH code in R's p.adjust is just:
BH = {
i <- lp:1L # lp is the number of p-values
o <- order(p, decreasing = TRUE) # "o" will reverse sort the p-values
ro <- order(o)
pmin(1, cummin(n/i * p[o]))[ro] # n is also the number of p-values
}
drbunsen
Updated on June 17, 2022Comments
-
drbunsen almost 2 years
I have a list of p-values and I would like to calculate the adjust p-values for multiple comparisons for the FDR. In R, I can use:
pval <- read.csv("my_file.txt",header=F,sep="\t") pval <- pval[,1] FDR <- p.adjust(pval, method= "BH") print(length(pval[FDR<0.1])) write.table(cbind(pval, FDR),"pval_FDR.txt",row.names=F,sep="\t",quote=F )
How can I implement this code in Python? Here was my feable attempt in Python with the help of Google:
pvalue_list [2.26717873145e-10, 1.36209234286e-11 , 0.684342083821...] # my pvalues pvalue_lst = [v.r['p.value'] for v in pvalue_list] p_adjust = R.r['p.adjust'](R.FloatVector(pvalue_lst),method='BH') for v in p_adjust: print v
The above code throws an
AttributeError: 'float' object has no attribute 'r'
error. Can anyone help point out my problem? Thanks in advance for the help! -
drbunsen over 12 years@Igautier Thanks for the help! When i run your code, Python throws an
ImportError: No module named packages
error. Any idea what the problem is? I'm running R 2.13.1. -
lgautier over 12 yearsI'd say you are using an outdated version of rpy2. Try rpy2.__version__ if unsure. Current is 2.2.2.
-
drbunsen over 12 yearsYep, this works for me with R 2.2x. Unfortunately, I'm stuck with using R 2.13.1 on a remote server. Any suggestions?
-
lgautier over 12 yearshmmm... I am referring to rpy2 version, not R versions. Ask an upgrade of rpy2 to your system administrators or upgrade it for yourself (consider using the Python package 'virtualenv' to create your customized Python).
-
drbunsen over 12 yearsSorry for the confusion. I misread your comments. I updated my local rpy2 to 2.2x and your code worked. Thank you very much for the help!
-
Vladimir over 8 yearsShould be
float(len(p))
, otherwise it will be integer division -
Dataman almost 8 years@jseabold: Hi, a quick question about the
multipletests
? How does this function take care of NaN values in the list of p-values when using it withBH
? It seems that it assumes that all the p-values are finite, is that right? -
Stunts about 7 yearsExcellent solution. I have ported it to python 3 and placed it on a repository on github. If you wish me to add your name to the copyright line, please provide me with it via PM.