Simple PySpark BigDL test: Optimizer fails

Question

I am running the BigDL example from https://bigdl-project.github.io/0.4.0/#ProgrammingGuide/optimization/ in PySpark on a local node:

from bigdl.nn.layer import Linear
from bigdl.util.common import *
from bigdl.nn.criterion import MSECriterion
from bigdl.optim.optimizer import Optimizer, MaxIteration
import numpy as np

sc = SparkContext(appName="simple",conf=create_spark_conf())
init_engine()

model = Linear(2, 1)
samples = [
  Sample.from_ndarray(np.array([5, 5]), np.array([2.0])),
  Sample.from_ndarray(np.array([-5, -5]), np.array([-2.0])),
  Sample.from_ndarray(np.array([-2, 5]), np.array([1.3])),
  Sample.from_ndarray(np.array([-5, 2]), np.array([0.1])),
  Sample.from_ndarray(np.array([5, -2]), np.array([-0.1])),
  Sample.from_ndarray(np.array([2, -5]), np.array([-1.3]))
]

train_data = sc.parallelize(samples, 1)
optimizer = Optimizer(model, train_data, MSECriterion(), MaxIteration(100), 4)
optimizer.optimize()
model.get_weights()[0]

This results in the following exception. Other than this, BigDL tests work in PySpark. Environment: openjdk version "1.8.0_141", Python 3.5.3 (default, Jan 19 2017, 14:11:04) [GCC 6.3.0 20170118] on linux.

Any ideas? Is BigDL a live project, actively maintained?

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2018-02-28 22:40:20 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-02-28 22:40:20 WARN  Utils:66 - Your hostname, dk resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2018-02-28 22:40:20 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-02-28 22:40:24 WARN  SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
cls.getname: com.intel.analytics.bigdl.python.api.Sample
BigDLBasePickler registering: bigdl.util.common  Sample
cls.getname: com.intel.analytics.bigdl.python.api.EvaluatedResult
BigDLBasePickler registering: bigdl.util.common  EvaluatedResult
cls.getname: com.intel.analytics.bigdl.python.api.JTensor
BigDLBasePickler registering: bigdl.util.common  JTensor
cls.getname: com.intel.analytics.bigdl.python.api.JActivity
BigDLBasePickler registering: bigdl.util.common  JActivity
disableCheckSingleton is deprecated. Please use bigdl.check.singleton instead
/usr/local/lib/python3.5/dist-packages/bigdl/util/engine.py:41: UserWarning: Find both SPARK_HOME and pyspark. You may need to check whether they match with each other. SPARK_HOME environment variable is set to: /opt/spark, and pyspark is found in: /usr/local/lib/python3.5/dist-packages/pyspark/__init__.py. If they are unmatched, please use one source only to avoid conflict. For example, you can unset SPARK_HOME and use pyspark only.
warnings.warn(warning_msg)
Prepending /usr/local/lib/python3.5/dist-packages/bigdl/share/conf/spark-bigdl.conf to sys.path
creating: createLinear
creating: createMSECriterion
creating: createMaxIteration
creating: createDefault
creating: createSGD
creating: createDistriOptimizer
Traceback (most recent call last):
  File "simple.py", line 22, in <module>
    optimizer.optimize()
  File "/usr/local/lib/python3.5/dist-packages/bigdl/optim/optimizer.py", line 591, in optimize
    jmodel = callJavaFunc(get_spark_context(), self.value.optimize)
  File "/usr/local/lib/python3.5/dist-packages/bigdl/util/common.py", line 590, in callJavaFunc
    result = func(*args)
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o48.optimize.
: java.lang.ExceptionInInitializerError
    at com.intel.analytics.bigdl.optim.DistriOptimizer.optimize(DistriOptimizer.scala:860)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
    at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1314)
    at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1237)
    at java.util.concurrent.Executors.newFixedThreadPool(Executors.java:151)
    at com.intel.analytics.bigdl.parameters.AllReduceParameter$.<init>(AllReduceParameter.scala:47)
    at com.intel.analytics.bigdl.parameters.AllReduceParameter$.<clinit>(AllReduceParameter.scala)
    ... 12 more

score 0 · Answer 1 · answered Mar 01 '18 at 15:13

Yes, BigDL is actively maintained. The proper way to define a BigDL model is with the Sequential API or the Functional API.

Sequential API

model = Sequential()
model.add(Linear(...))
model.add(Sigmoid())
model.add(Softmax())

Functional API

linear = Linear(...)()
sigmoid = Sigmoid()(linear)
softmax = Softmax()(sigmoid)
model = Model([linear], [softmax])

see here.

pauli


Thanks, but this doesn't answer my question: why does the example from the BigDL site that I posted here fail? – dokondr Mar 02 '18 at 19:11

score 0 · Answer 2 · answered Apr 22 '20 at 16:03

I just started using BigDL myself. I use PySpark and noticed that even the default function calls fail. I literally dug into the source code, read the documentation there, and then changed the way I call it based on what I read.

It may help you to do the same. From the error you've posted, it looks like it doesn't like some argument being passed to it. This is less a "you" problem and more a "code-not-being-in-line-with-documentation" problem.
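
To narrow down which argument that might be: the root cause in the trace is a java.lang.IllegalArgumentException thrown by Executors.newFixedThreadPool while AllReduceParameter is being initialized. Java's fixed thread pool rejects a thread count of zero or less, so one plausible reading is that the pool size resolved to 0 on this machine (for example on a single-core virtual machine). The sketch below is a plain-Python analogue of that failure and does not use BigDL at all. The helper names and the "half the CPU count" sizing rule are assumptions for illustration, not confirmed BigDL code.

```python
from concurrent.futures import ThreadPoolExecutor


def pool_size_from_cpus(cpu_count):
    # Hypothetical sizing rule (an assumption, not confirmed BigDL code):
    # half the available processors, rounded down.
    return cpu_count // 2


def try_create_pool(size):
    # Return None if the pool can be created, or the error text if the
    # requested size is rejected.
    try:
        with ThreadPoolExecutor(max_workers=size):
            return None
    except ValueError as err:
        return str(err)


for cpus in (1, 2, 4):
    size = pool_size_from_cpus(cpus)
    # With one CPU the derived size is 0 and pool creation is rejected.
    print(cpus, size, try_create_pool(size))
```

If the pool size really does depend on the core count, checking how many cores the JVM sees and giving the machine more than one would be a cheap thing to try before digging further into the source.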
