
I'm trying to implement neural networks in Spark and Scala, but I'm unable to perform any vector or matrix multiplication. Spark provides two vector types: the Spark.util vector supports a dot operation, but it is deprecated, and mllib.linalg vectors don't support such operations in Scala.

Which one should I use to store weights and training data?

How can I perform vector multiplication in Spark/Scala with MLlib, such as w*x, where w is a vector or matrix of weights and x is the input? PySpark vectors support a dot product, but in Scala I can't find such a function on vectors.

zero323
gaurav.rai

1 Answer


Well, if you need full support for linear algebra operators, you have to implement them yourself or use an external library. In the latter case, the obvious choice is Breeze.

It is already used by Spark behind the scenes, so it doesn't introduce any additional dependencies, and you can easily adapt Spark's own conversion code:

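import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

// MLlib -> Breeze: wraps the existing arrays without copying them
def toBreeze(v: Vector): BV[Double] = v match {
  case DenseVector(values) => new BDV[Double](values)
  case SparseVector(size, indices, values) => new BSV[Double](indices, values, size)
}

// Breeze -> MLlib
def toSpark(v: BV[Double]): Vector = v match {
  case v: BDV[Double] => new DenseVector(v.toArray)
  // Only the first `used` slots of a Breeze sparse vector's arrays are active
  case v: BSV[Double] =>
    new SparseVector(v.length, v.index.take(v.used), v.data.take(v.used))
}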
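
Once the data is in Breeze, w*x is a one-liner. The sketch below assumes spark-mllib (which pulls in Breeze) is on the classpath; the object name BreezeDotExample, its helper methods and the sample values are made up for illustration. For brevity it converts via toArray (which densifies sparse vectors) instead of going through toBreeze, then uses Breeze's dot for a weight vector and * for a weight matrix.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

// Hypothetical example object; weights and inputs are made-up values
object BreezeDotExample {
  // Inner product w . x (e.g. a single neuron with weight vector w)
  def dotProduct(w: Vector, x: Vector): Double =
    new BDV(w.toArray) dot new BDV(x.toArray)

  // Matrix-vector product W * x (e.g. a layer with one row of W per neuron)
  def layerOutput(weights: BDM[Double], x: Vector): BDV[Double] =
    weights * new BDV(x.toArray)

  def main(args: Array[String]): Unit = {
    val w = Vectors.dense(0.5, -1.0, 2.0)
    val x = Vectors.dense(1.0, 2.0, 3.0)
    println(dotProduct(w, x)) // 0.5 - 2.0 + 6.0 = 4.5

    // 2 x 3 weight matrix, given row by row
    val weights = BDM((1.0, 0.0, 1.0), (0.0, 1.0, 0.0))
    println(layerOutput(weights, x)) // DenseVector(4.0, 2.0)
  }
}
```

Keeping weights as Breeze structures and converting only at the boundary with MLlib avoids repeated conversions inside a training loop.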

Mahout also provides Spark and Scala bindings that you may find interesting.

For simple matrix-vector multiplications, it can be easier to leverage existing matrix methods. For example, IndexedRowMatrix and RowMatrix provide multiply methods that take a local matrix. For an example of their usage, see Matrix Multiplication in Apache Spark.
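
A rough sketch of the RowMatrix route, assuming each row of the distributed matrix is one input x and the weights fit into a local n x 1 matrix, so that multiplying gives w . x for every row. As far as I know RowMatrix.multiply currently only accepts a local DenseMatrix, which is why Matrices.dense is used. The object name RowMatrixMultiplyExample, the helper scores and the sample data are hypothetical, and local mode is used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object RowMatrixMultiplyExample {
  // Hypothetical helper: computes w . x_i for every row x_i of a distributed matrix
  def scores(data: RowMatrix, w: Array[Double]): Array[Double] = {
    // Local n x 1 weight matrix (Matrices.dense expects column-major values)
    val weights = Matrices.dense(w.length, 1, w)
    // Distributed (m x n) times local (n x 1) gives a distributed m x 1 RowMatrix
    data.multiply(weights).rows.collect().map(row => row(0))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("row-matrix-multiply").setMaster("local[*]"))
    try {
      // Each row is one made-up training example
      val data = new RowMatrix(sc.parallelize(Seq(
        Vectors.dense(1.0, 2.0, 3.0),
        Vectors.dense(4.0, 5.0, 6.0))))
      scores(data, Array(0.5, -1.0, 2.0)).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the weight matrix is local, it is shipped to the executors and each partition is multiplied independently, which suits the common case of many training examples and a comparatively small weight vector.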

zero323