How to get dot product of two sparsevectors in O(m+n) , where m and n are the number of elements in both vectors

10,128

You can do it if your vectors are stored as a linked list of tuples whith each tuple containing the index and the value of a non zero element and sorted by the index.

You iterate through both vectors, by selecting the next element from the vector where you are at the lower index. If the indexes are the same you multiply the elements and store the result.

Repeat until one list reaches the end.

Since you have one step per non zero element in each list, the complexity is O(m+n) as required.

Footnote: The datastructure doesn't have to be linked list, but must provide a O(1) way to access the next non 0 element and it's index.

Share:
10,128
Altaïr
Author by

Altaïr

I study computer science

Updated on June 18, 2022

Comments

  • Altaïr
    Altaïr almost 2 years

    I have two sparse vectors X and Y and want to get the dot product in O(m+n) where m and n are the numbers of non-zero elements in X and Y. The only way I can think of is picking each element in vector X and traverse through vector Y to find if there is element with the same index. But that would take O(m * n). I am implementing the vector as a linked list and each node has an element.

    • A.S.H
      A.S.H over 8 years
      Are the lists already sorted by the index?
    • Jens Schauder
      Jens Schauder over 8 years
      @A.S.H m and n aren't the sizes of the vectors, but the number of non-zero elements in each vector
    • A.S.H
      A.S.H over 8 years
      Yes i saw that after edit. But still, the answers are assuming that the lists are sorted by the index. This needs to be clearly stated I think,
    • stgatilov
      stgatilov over 8 years
      I suggest switching from linked lists to arrays if you worry about performance.
    • Altaïr
      Altaïr over 8 years
      @stgatilov I can't switch. It is an assignment and has to be linked lists implementation.
    • Jens Schauder
      Jens Schauder over 8 years
      @stgatilov arrays are a very bad idea for sparse data.
    • stgatilov
      stgatilov over 8 years
      @JensSchauder: You are absolutely wrong here. Look at sparse formats in MKL. If you can find any linked list there, please let me know =) Sparse data are stored in arrays to reduce memory overhead, reduce memory fragmentation, and reduce low-level time needed to iterate over elements.
    • Jens Schauder
      Jens Schauder over 8 years
      @stgatilov ok, I see what you mean. I understood your comment to mean the naive storage in an array, which would result in lots of zeros stored.
  • Jens Schauder
    Jens Schauder over 8 years
    @A.S.H I tried to explain it in my edit but it seems rather obvious to me that it is O(m+n)
  • Altaïr
    Altaïr over 8 years
    But what if I have a vector with more elements than the other ? For instance X = [(15,3.0), (1500,3.14) , (15000, 3.141), (150000, 3.1415)] and Y = [(15,1.0), (150000,1.0)] I will run out of elements in Y while iterating through X
  • A.S.H
    A.S.H over 8 years
    @Jens, i agree, but you are assuming that the vector lists are sorted by the index. This is not clearly stated by the OP :)
  • Altaïr
    Altaïr over 8 years
    I can't see how this is O(m+N). How is it gonna be done on vectors with different number of non-zero elements ? You can't iterate through both at the sametimes
  • Altaïr
    Altaïr over 8 years
    We still didn't cover yet merge algorithms in class. one person before you suggested that I iterate through both vectors at the same time and check if the indices are equal. if they are multiply the values and store the result. But I don't how this will work. What if i have two vectors with different number of elements. For instance X = [(15,3.0), (1500,3.14) , (15000, 3.141), (150000, 3.1415)] and Y = [(15,1.0), (150000,1.0)] I will run out of elements in Y while iterating through X and I will null pointer exception. Isn't that right ?
  • stgatilov
    stgatilov over 8 years
    @Altaïr: Since merge is a well-known algorithm, you can find explanation yourself. Here is a similar question, here is algolist article, here is some question, etc. You can see some code here in context of merge sort, it shows what to do when one sequence ends first. P.S. Added solution for unsorted case.
  • Jens Schauder
    Jens Schauder over 8 years
    @A.S.H you are right about the sorting, no idea why your edit was rejected.
  • A.S.H
    A.S.H over 8 years
    @JensSchauder no worries, I suggest you edit it yourself, so that it becomes complete.
  • Jens Schauder
    Jens Schauder over 8 years
    @Altaïr you only advance the list where you are currently at the lower index, until you reach the end of one of the lists.