What does the error mean in a data.table with the function "seq" -- "The RHS length must either be 1 or match the LHS length exactly"?

I think what you are looking for can be done more easily with data.table's special symbols. The .N symbol is especially helpful: it holds the number of rows in the data.table, and when you specify a group with by it holds the number of rows in that group. So the code would look like this:

call_duration_diff_by_unit[, duration_seq := 1:.N, by = c("ID")]

Is this what you are going for?
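
As a minimal sketch of how this behaves, here is the same idea on a small made-up table. The table name calls and its values are hypothetical stand-ins for the real data; only the column names ID and CALL_DURATION_HOURS come from the question.

```r
library(data.table)

# Hypothetical toy data: three units on call "A", one unit on call "B".
calls <- data.table(
  ID = c("A", "A", "A", "B"),
  CALL_DURATION_HOURS = c(0.5, 2.5, 1.0, 3.2)
)

# Sort by ID, then longest duration first within each ID.
setorder(calls, ID, -CALL_DURATION_HOURS)

# .N is the number of rows in the current group, so 1:.N always has
# exactly as many elements as the group has rows.
calls[, duration_seq := 1:.N, by = "ID"]

print(calls)
```

Because the table is sorted before the assignment, the longest call in each ID gets duration_seq 1, and the single-row group "B" gets a single 1.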

Author by

Alice Kassinger

Updated on August 01, 2022

Comments

  • Alice Kassinger, almost 2 years ago

    I am attempting to:

    1. calculate the difference in call duration between police units responding to the same call
    2. identify the longest duration among a group of calls with the same call ID
    3. arrange in descending order of duration

    My steps to do so are found in the code snippets below.

    First, I arrange in descending order by ID (multiple calls with the same ID) and then arrange within that by the call duration in hours (descending).

    Then, I make my data.frame into a data.table.

    Then, I assign sequence numbers within each ID, in descending order of duration.

    call_duration_diff_by_unit[, duration_seq := seq(CALL_DURATION_HOURS), by = c("ID")]

    This is where the problem occurs: I get an error that says

    "Error in `[.data.table`(call_duration_diff_by_unit, , `:=`(duration_seq, : Supplied 2 items to be assigned to group 1 of size 1 in column 'duration_seq'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code."

    The only explanation for this error I have found was specific to a package that I am not using. I understand the concept of "recycling" now, but I'm not sure how it applies to this scenario... there aren't two vectors with different lengths.

    Could R be reading the by = c("ID") part incorrectly as a second input?

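    library(dplyr)
    library(data.table)

    # Sort by ID (descending), then by call duration (descending) within each ID
    call_duration_diff_by_unit <- cad_cfs_data %>% 
      arrange(desc(ID), desc(CALL_DURATION_HOURS))
    
    # Convert the data.frame to a data.table
    call_duration_diff_by_unit <- 
      data.table(call_duration_diff_by_unit)
    
    # Number the rows within each ID; this is the line that raises the error
    call_duration_diff_by_unit[, duration_seq := seq(CALL_DURATION_HOURS), by = c("ID")]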
    
    

    I expected it to number the rows within each call ID, assigning 1 to the longest duration. Instead, I get the error, and the "duration_seq" column is not created for use later in the code.

    • IceCreamToucan, about 5 years ago
      The two lengths that need to match are the number of rows in the given ID group (i.e. the number of rows for which ID equals the given value) and the length of the output of the RHS of :=.
    • Roland, about 5 years ago
      I think you actually want DT[, .(duration_seq = seq(...)), by = ...] but I'm not sure from the description. The error message is pretty clear: you assign a vector into the data.table that doesn't match its number of rows.
    • Alice Kassinger, about 5 years ago
      Thanks @IceCreamToucan and @Roland! I guess I don't see how I'm assigning a vector to the data.table that doesn't match the number of rows. I'm using a function (seq) that should automatically create a numeric sequence that exactly matches the number of rows (and restarts at 1 each time a new ID starts). Can you explain which vector in the code it could be saying doesn't match?
    • IceCreamToucan, about 5 years ago
      Based on what you just said, I think you should use seq_along instead of seq
  • Alice Kassinger, about 5 years ago
    Thank you!! It doesn't explain why my old code stopped working, but your code works perfectly! That makes sense - I didn't know about those special symbols. I'll do some reading up.
  • Jason Johnson, about 5 years ago
    Great! Glad it helped. I'm not sure about the issue with seq, but when using data.table the special symbols are very helpful and efficient. There are lots of other cool tools in the data.table package, and they are worth spending time learning. Good luck!
  • IceCreamToucan, about 5 years ago
    As for why the previous code worked before and does not work now: version 1.12.2 of data.table changed the recycling behavior. See the first item under "Changes in v1.12.2" in the data.table NEWS file.
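
The seq_along suggestion in the comments can be illustrated in base R. This is a minimal sketch with made-up durations, not the asker's data. When seq() is given a single number it counts from 1 up to that number, so a one-row group whose duration is, say, 2.5 hours would yield two items. That would be consistent with the "Supplied 2 items to be assigned to group 1 of size 1" message, assuming the first group did have a single row.

```r
# A group with a single call lasting 2.5 hours (made-up value).
one_row_group <- c(2.5)

# Given one number, seq() counts 1, 2, ... up to that number,
# so it returns 2 items for this 1-row group.
seq(one_row_group)         # 1 2

# seq_along() always returns one index per element.
seq_along(one_row_group)   # 1

# With two or more rows the two functions agree.
multi_row_group <- c(3.0, 1.5, 0.5)
seq(multi_row_group)       # 1 2 3
seq_along(multi_row_group) # 1 2 3
```

Under this reading, seq_along(CALL_DURATION_HOURS) and 1:.N both return exactly one value per row in every group, which is what := requires.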