Sparse Vector vs Dense Vector

Solution 1

Unless I have thoroughly misunderstood your question, the MLlib data types documentation illustrates this quite clearly:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Create a dense vector (1.0, 0.0, 3.0).
Vector dv = Vectors.dense(1.0, 0.0, 3.0);
// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.
Vector sv = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0});

Here, the first argument of Vectors.sparse is the size of the vector, the second argument is an array of the indices of the nonzero entries, and the third argument is the array of the actual values at those indices.
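To make the (size, indices, values) layout concrete, here is a minimal sketch in plain Python that expands such a triple back into a dense list. The helper name sparse_to_dense is hypothetical and is not part of Spark; it only illustrates how the three arguments relate to each other.

```python
def sparse_to_dense(size, indices, values):
    """Expand a (size, indices, values) triple into a plain list of floats.

    Hypothetical helper for illustration only; not a Spark API.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size              # every position starts as zero
    for i, v in zip(indices, values):
        dense[i] = float(v)           # fill in only the stored entries
    return dense


# Same triple as the Java example above: size 3, nonzeros at positions 0 and 2.
print(sparse_to_dense(3, [0, 2], [1.0, 3.0]))  # [1.0, 0.0, 3.0]
```

Both vectors in the Java snippet therefore describe the same values; only the storage differs.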

Solution 2

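A sparse vector is one in which most of the values are zero, while a dense vector is one in which most of the values are nonzero. A sparse representation stores only the nonzero entries (their indices and values), whereas a dense representation stores every entry.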

To create a sparse vector from the dense vector you specified, [0., 3., 0., 4.], use the following syntax:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

Vector sparseVector = Vectors.sparse(4, new int[] {1, 3}, new double[] {3.0, 4.0});
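Since the question is written in Python, the following sketch shows how the three arguments could be derived from the dense values. It uses only the standard library, and dense_to_sparse is a hypothetical helper name, not a Spark function.

```python
def dense_to_sparse(dense):
    """Return (size, indices, values) for the nonzero entries of `dense`.

    Hypothetical helper for illustration only; not a Spark API.
    """
    indices = [i for i, v in enumerate(dense) if v != 0]  # positions of nonzeros
    values = [float(dense[i]) for i in indices]           # values at those positions
    return len(dense), indices, values


size, indices, values = dense_to_sparse([0., 3., 0., 4.])
print(size, indices, values)  # 4 [1, 3] [3.0, 4.0]
```

The result matches the arguments passed to Vectors.sparse in the Java line above. Assuming PySpark is installed, the same triple could be passed to pyspark.mllib.linalg.Vectors.sparse(size, indices, values).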
Author by

Anoop Toffy

Updated on April 02, 2020

Comments

  • Anoop Toffy about 4 years

    How do I create SparseVector and DenseVector representations?

    If the dense vector is:

    denseV = np.array([0., 3., 0., 4.])

    what is the sparse vector representation?

    • Nick Chammas over 7 years
      For those who read the title of "Sparse Vector vs Dense Vector" and were looking for an explanation of when to use which, this answer has the information you're looking for.
  • Anoop Toffy almost 9 years
    Oh, I was not passing the right count of indices. SparseV = SparseVector(4, [0, 1, 2, 3], [0., 3., 0., 4.])
  • Anoop Toffy almost 9 years
    What is the significance of the dot after a number, e.g. 1.?
  • Abhinav Sood over 7 years
    The dot just indicates a floating-point literal: 1. is equivalent to 1.0.
  • Chthonic Project almost 5 years
    @MohitShah (i) It is literally the first code example on the linked documentation, and (ii) the answer also includes the example showing exactly how to create a sparse vector.