Error: only defined on a data frame with all numeric variables with ddply on large dataset

14,624

Solution 1

There are a number of ways to get statistics by group. I'll assume you have a bias for plyr, since your example uses it.

Remember that dlply() splits the data into smaller dataframes by the grouping variables, then it applies the requested function to each of the smaller dataframes. Therefore the function you pass should operate on a whole dataframe. sum() does not do this. You can write your own function, though.

Based on your description, what you want is something like this

myfun <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])
dlply(Effort_All, c("Month", "Year", "Grid_ID", "SurveyID"), myfun)

Remember that the second argument to dlply() is the set of variables used for grouping. Not sure why you want the output as a list. Would it be easier to read if you used ddply (with the same arguments)?

Other approaches include using sqldf() or something like lapply().

=============== EDIT: Other approaches =============

sqldf is always very easy to read and understand:

output <- sqldf('select Month,Year,Grid_ID,SurveyID,
                        sum(Er_Count) as ercount, 
                        sum(Mn_Count) as mncount,
                        sum(Bp_Count) as bpcount
                 from Effort_All 
                 group by Month, Year, Grid_ID, SurveyID')

lapply works pretty much the same way as dlply. Just different arguments.

Solution 2

Also, you could use colwise from plyr

 dlply(Effort_All, .(Month, Year, Grid_ID, SurveyID), colwise(sum, .(Er_Count, Mn_Count, Bp_Count)))

Or summarise_each from dplyr

library(dplyr)
Effort_All%>%
group_by(Month, Year, Grid_ID, SurveyID) %>% 
summarise_each(funs(sum), Er_Count, Mn_Count, Bp_Count)
#Source: local data frame [6 x 7]
#Groups: Month, Year, Grid_ID

#    Month Year Grid_ID    SurveyID Er_Count Mn_Count Bp_Count
#  1     2 2002     884   200203SSL        8        7        0
#  2     4 2001     126   200104SSL       26       12        6
#  3     4 2004     399   200404SSL       12       33        5
#  4     6 2001     125   200106SSL       75        9        3
#  5    10 2000     517   200009SSL       60       30        3
#  6    12 2009     340 200912GAP09       14        6        2
Share:
14,624
akbreezo
Author by

akbreezo

Updated on June 28, 2022

Comments

  • akbreezo
    akbreezo almost 2 years

    I'm trying to calculate sums and means on a very large dataset (~22000 records) for several parameters (e.g. Er_Count, Mn_Count) by month, year , Survey ID and Grid ID. I tried this code initially to get overall sums:

    dlply(Effort_All,c("Er_Count","Mn_Count","Bp_Count"),sum)
    

    And received the following error: Error: only defined on a data frame with all numeric variables Since I cannot even get overall sums, I am unable to get statistics by the specific variables either. Do I need to split the data in some manner?

    I have included a sample dataset of 25 records below.

        structure(list(Grid_ID = structure(c(527L, 92L, 331L, 395L, 934L, 
        93L), .Label = c("1", "1,000", "1,001", "1,002", "1,003", "1,004", 
    "1,005", "1,006", "1,007", "1,008", "1,009", "1,010", "1,011", 
    "1,012", "1,013", "1,014", "1,015", "1,016", "1,017", "1,018", 
    "1,019", "1,020", "1,021", "1,022", "1,023", "1,024", "1,025", 
    "1,026", "1,027", "1,028", "1,029", "1,030", "1,031", "1,032", 
    "1,033", "1,034", "1,035", "1,036", "1,037", "1,038", "1,039", 
    "1,040", "1,041", "1,042", "1,043", "1,044", "1,045", "1,046", 
    "1,047", "1,048", "1,049", "1,050", "1,051", "1,052", "1,053", 
    "1,054", "1,055", "1,056", "1,057", "1,058", "1,059", "1,060", 
    "1,061", "10", "100", "101", "102", "103", "104", "105", "106", 
    "107", "108", "109", "11", "110", "111", "112", "113", "114", 
    "115", "116", "117", "118", "119", "12", "120", "121", "122", 
    "123", "124", "125", "126", "127", "128", "129", "13", "130", 
    "131", "132", "133", "134", "135", "136", "137", "138", "139", 
    "14", "140", "141", "142", "143", "144", "145", "146", "147", 
    "148", "149", "15", "150", "151", "152", "153", "154", "155", 
    "156", "157", "158", "159", "16", "160", "161", "162", "163", 
    "164", "165", "166", "167", "168", "169", "17", "170", "171", 
    "172", "173", "174", "175", "176", "177", "178", "179", "18", 
    "180", "181", "182", "183", "184", "185", "186", "187", "188", 
    "189", "19", "190", "191", "192", "193", "194", "195", "196", 
    "197", "198", "199", "2", "20", "200", "201", "202", "203", "204", 
    "205", "206", "207", "208", "209", "21", "210", "211", "212", 
    "213", "214", "215", "216", "217", "218", "219", "22", "220", 
    "221", "222", "223", "224", "225", "226", "227", "228", "229", 
    "23", "230", "231", "232", "233", "234", "235", "236", "237", 
    "238", "239", "24", "240", "241", "242", "243", "244", "245", 
    "246", "247", "248", "249", "25", "250", "251", "252", "253", 
    "254", "255", "256", "257", "258", "259", "26", "260", "261", 
    "262", "263", "264", "265", "266", "267", "268", "269", "27", 
    "270", "271", "272", "273", "274", "275", "276", "277", "278", 
    "279", "28", "280", "281", "282", "283", "284", "285", "286", 
    "287", "288", "289", "29", "290", "291", "292", "293", "294", 
    "295", "296", "297", "298", "299", "3", "30", "300", "301", "302", 
    "303", "304", "305", "306", "307", "308", "309", "31", "310", 
    "311", "312", "313", "314", "315", "316", "317", "318", "319", 
    "32", "320", "321", "322", "323", "324", "325", "326", "327", 
    "328", "329", "33", "330", "331", "332", "333", "334", "335", 
    "336", "337", "338", "339", "34", "340", "341", "342", "343", 
    "344", "345", "346", "347", "348", "349", "35", "350", "351", 
    "352", "353", "354", "355", "356", "357", "358", "359", "36", 
    "360", "361", "362", "363", "364", "365", "366", "367", "368", 
    "369", "37", "370", "371", "372", "373", "374", "375", "376", 
    "377", "378", "379", "38", "380", "381", "382", "383", "384", 
    "385", "386", "387", "388", "389", "39", "390", "391", "392", 
    "393", "394", "395", "396", "397", "398", "399", "4", "40", "400", 
    "401", "402", "403", "404", "405", "406", "407", "408", "409", 
    "41", "410", "411", "412", "413", "414", "415", "416", "417", 
    "418", "419", "42", "420", "421", "422", "423", "424", "425", 
    "426", "427", "428", "429", "43", "430", "431", "432", "433", 
    "434", "435", "436", "437", "438", "439", "44", "440", "441", 
    "442", "443", "444", "445", "446", "447", "448", "449", "45", 
    "450", "451", "452", "453", "454", "455", "456", "457", "458", 
    "459", "46", "460", "461", "462", "463", "464", "465", "466", 
    "467", "468", "469", "47", "470", "471", "472", "473", "474", 
    "475", "476", "477", "478", "479", "48", "480", "481", "482", 
    "483", "484", "485", "486", "487", "488", "489", "49", "490", 
    "491", "492", "493", "494", "495", "496", "497", "498", "499", 
    "5", "50", "500", "501", "502", "503", "504", "505", "506", "507", 
    "508", "509", "51", "510", "511", "512", "513", "514", "515", 
    "516", "517", "518", "519", "52", "520", "521", "522", "523", 
    "524", "525", "526", "527", "528", "529", "53", "530", "531", 
    "532", "533", "534", "535", "536", "537", "538", "539", "54", 
    "540", "541", "542", "543", "544", "545", "546", "547", "548", 
    "549", "55", "550", "551", "552", "553", "554", "555", "556", 
    "557", "558", "559", "56", "560", "561", "562", "563", "564", 
    "565", "566", "567", "568", "569", "57", "570", "571", "572", 
    "573", "574", "575", "576", "577", "578", "579", "58", "580", 
    "581", "582", "583", "584", "585", "586", "587", "588", "589", 
    "59", "590", "591", "592", "593", "594", "595", "596", "597", 
    "598", "599", "6", "60", "600", "601", "602", "603", "604", "605", 
    "606", "607", "608", "609", "61", "610", "611", "612", "613", 
    "614", "615", "616", "617", "618", "619", "62", "620", "621", 
    "622", "623", "624", "625", "626", "627", "628", "629", "63", 
    "630", "631", "632", "633", "634", "635", "636", "637", "638", 
    "639", "64", "640", "641", "642", "643", "644", "645", "646", 
    "647", "648", "649", "65", "650", "651", "652", "653", "654", 
    "655", "656", "657", "658", "659", "66", "660", "661", "662", 
    "663", "664", "665", "666", "667", "668", "669", "67", "670", 
    "671", "672", "673", "674", "675", "676", "677", "678", "679", 
    "68", "680", "681", "682", "683", "684", "685", "686", "687", 
    "688", "689", "69", "690", "691", "692", "693", "694", "695", 
    "696", "697", "698", "699", "7", "70", "700", "701", "702", "703", 
    "704", "705", "706", "707", "708", "709", "71", "710", "711", 
    "712", "713", "714", "715", "716", "717", "718", "719", "72", 
    "720", "721", "722", "723", "724", "725", "726", "727", "728", 
    "729", "73", "730", "731", "732", "733", "734", "735", "736", 
    "737", "738", "739", "74", "740", "741", "742", "743", "744", 
    "745", "746", "747", "748", "749", "75", "750", "751", "752", 
    "753", "754", "755", "756", "757", "758", "759", "76", "760", 
    "761", "762", "763", "764", "765", "766", "767", "768", "769", 
    "77", "770", "771", "772", "773", "774", "775", "776", "777", 
    "778", "779", "78", "780", "781", "782", "783", "784", "785", 
    "786", "787", "788", "789", "79", "790", "791", "792", "793", 
    "794", "795", "796", "797", "798", "799", "8", "80", "800", "801", 
    "802", "803", "804", "805", "806", "807", "808", "809", "81", 
    "810", "811", "812", "813", "814", "815", "816", "817", "818", 
    "819", "82", "820", "821", "822", "823", "824", "825", "826", 
    "827", "828", "829", "83", "830", "831", "832", "833", "834", 
    "835", "836", "837", "838", "839", "84", "840", "841", "842", 
    "843", "844", "845", "846", "847", "848", "849", "85", "850", 
    "851", "852", "853", "854", "855", "856", "857", "858", "859", 
    "86", "860", "861", "862", "863", "864", "865", "866", "867", 
    "868", "869", "87", "870", "871", "872", "873", "874", "875", 
    "876", "877", "878", "879", "88", "880", "881", "882", "883", 
    "884", "885", "886", "887", "888", "889", "89", "890", "891", 
    "892", "893", "894", "895", "896", "897", "898", "899", "9", 
    "90", "900", "901", "902", "903", "904", "905", "906", "907", 
    "908", "909", "91", "910", "911", "912", "913", "914", "915", 
    "916", "917", "918", "919", "92", "920", "921", "922", "923", 
    "924", "925", "926", "927", "928", "929", "93", "930", "931", 
    "932", "933", "934", "935", "936", "937", "938", "939", "94", 
    "940", "941", "942", "943", "944", "945", "946", "947", "948", 
    "949", "95", "950", "951", "952", "953", "954", "955", "956", 
    "957", "958", "959", "96", "960", "961", "962", "963", "964", 
    "965", "966", "967", "968", "969", "97", "970", "971", "972", 
    "973", "974", "975", "976", "977", "978", "979", "98", "980", 
    "981", "982", "983", "984", "985", "986", "987", "988", "989", 
    "99", "990", "991", "992", "993", "994", "995", "996", "997", 
    "998", "999"), class = "factor"), ER_Groups = c(2, 2, 2, 3, 5, 
    6), Er_Count = c(60, 75, 14, 12, 8, 26), Mn_Count = c(30, 9, 6, 33, 
    7, 12), Bp_Groups = c(1, 2, 1, 1, 0, 1), Bp_Count = c(3, 3, 2, 
    5, 0, 6), Mn_Groups = c(1, 1, 3, 1, 0, 0), Month = c(10L, 6L, 
    12L, 4L, 2L, 4L), Year = c(2000L, 2001L, 2009L, 2004L, 2002L, 
    2001L), SurveyID = structure(c(16L, 24L, 93L, 56L, 34L, 22L), .Label = c("199708HS", 
    "199808HS", "199908HS", "199909SSLQ", "199910SSL", "199911SSL", 
    "200001SSLQ", "200002SSL", "200003SSLQ", "200004SSLQ", "200005SSL", 
    "200006SSL", "200007SSL", "200008HS", "200008SSL", "200009SSL", 
    "200010SSL", "200011SSL", "200101SSL", "200102SSL", "200103SSL", 
    "200104SSL", "200105SSL", "200106SSL", "200107SSL", "200108HS", 
    "200108SSL", "200109SSL", "200110SSL", "200111SSL", "200112SSL", 
    "200201SSL", "200202SSL", "200203SSL", "200204SSL", "200205SSL", 
    "200206SSL", "200207SSL", "200208HS", "200208SSL", "200210SSL", 
    "200211SSL", "200212SSL", "200301SSL", "200302SSL", "200303SSL", 
    "200304SSL", "200305SSL", "200306SSL", "200307SSL", "200309SSL", 
    "200310SSL", "200311SSL", "200312SSL", "200403SSL", "200404SSL", 
    "200405SSL", "200406SSL", "200407SSL", "200408HS", "200408SSL", 
    "200409SSL", "200505SSL", "200506SSL", "200507SSL", "200510SSL", 
    "200512SSL", "200603SSL", "200609SSL", "200612SSL", "200709GAP07", 
    "200710GAP07", "200712GAP07", "200802GAP07", "200803GAP07", "200804GAP07", 
    "200805GAP07", "200806GAP07", "200807GAP07", "200808GAP07", "200809GAP08", 
    "200810GAP08", "200812GAP08", "200901GAP08", "200903GAP08", "200904GAP08", 
    "200905GAP08", "200906GAP08", "200907GAP08", "200908GAP08", "200909GAP08", 
    "200910GAP09", "200912GAP09", "201001GAP09", "201002GAP09", "201003GAP09", 
    "201004GAP09", "201005GAP09", "201006GAP09", "201007GAP09", "201008GAP09", 
    "201009GAP09", "201010GAP09", "201011GAP09", "201101GAP09", "201102GAP09", 
    "201103GAP09", "201104GAP09", "201106GAP09", "201108GAP09", "201109GAP09", 
    "201111GAP09", "201201GAP09", "201203GAP09", "201205GAP09", "201207GAP09", 
    "201208GAP09", "201211GAP09", "201301GAP09", "201303GAP09", "201305GAP09", 
    "201307GAP09", "201309GAP09", "201311GAP09"), class = "factor"), 
        Er_Group_Density = c(4, 9, 12, 4, 1, 0), Mn_Group_Density = c(3, 
        1, 1, 1, 0, 2), Bp_Group_Density = c(1, 2, 1, 0, 1, 0), Er_Count_Density = c(50, 
        14, 12, 9, 6, 4), Mn_Count_Density = c(9, 5, 2, 3, 2, 0), Bp_Count_Density = c(2, 
        3, 0, 4, 1, 0)), .Names = c("Grid_ID", "ER_Groups", "Er_Count", 
    "Mn_Count", "Bp_Groups", "Bp_Count", "Mn_Groups", "Month", "Year", 
    "SurveyID", "Er_Group_Density", "Mn_Group_Density", "Bp_Group_Density", 
    "Er_Count_Density", "Mn_Count_Density", "Bp_Count_Density"), row.names = c(2770L, 
    4421L, 17348L, 11263L, 6736L, 3974L), class = "data.frame")
    
  • akbreezo
    akbreezo almost 10 years
    Thanks! That worked well. I'm not partial to plyr necessarily and would be interested to know how you might tackle the problem differently