I need to build a filtered classifier in Weka multiple times, on different training instances, in one go. I have posted sample code to make my point clear:

import weka.classifiers.meta.FilteredClassifier;
 import weka.classifiers.trees.J48;
 import weka.filters.unsupervised.attribute.Remove;
 ...
 Instances train = ...         // from somewhere
 Instances test = ...          // from somewhere
 // filter
 Remove rm = new Remove();
 rm.setAttributeIndices("1");  // remove 1st attribute
 // classifier
 J48 j48 = new J48();
 j48.setUnpruned(true);        // using an unpruned J48
 // meta-classifier
 FilteredClassifier fc = new FilteredClassifier();
 fc.setFilter(rm);
 fc.setClassifier(j48);
 // train and make predictions
 fc.buildClassifier(train);
 for (int i = 0; i < test.numInstances(); i++) {
   double pred = fc.classifyInstance(test.instance(i));
   System.out.print("ID: " + test.instance(i).value(0));
   System.out.print(", actual: " + test.classAttribute().value((int) test.instance(i).classValue()));
   System.out.println(", predicted: " + test.classAttribute().value((int) pred));
 }

Inside the for loop, after printing the data to the console, I need to rebuild the FilteredClassifier (fc) on another training data set. So far I have had no success: whether I reuse the same FilteredClassifier instance (fc) or create a new one, Weka raises a NullPointerException.

How can I do this? If FilteredClassifier creates a thread, do I need to use wait() or notify() to suspend its operation while I am using another instance of FilteredClassifier?

Here is the stack trace of the exception raised by the JVM:

java.lang.NullPointerException
    at java.util.Hashtable.hash(Unknown Source)
    at java.util.Hashtable.get(Unknown Source)
    at weka.core.Attribute.addStringValue(Attribute.java:868)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:148)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:93)
    at weka.filters.Filter.copyValues(Filter.java:364)
    at weka.filters.Filter.bufferInput(Filter.java:301)
    at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:697)
    at weka.filters.Filter.useFilter(Filter.java:661)
    at weka.classifiers.meta.FilteredClassifier.buildClassifier(FilteredClassifier.java:390)

I appreciate any sort of help.

Kashif Khan

1 Answer

First, I don't know the underlying reason, but this could be useful: I came across exactly the same exception and solved it.

I was merging two datasets into a bigger one. In outline:

for (int i = 0; i < datasetB.numInstances(); i++) {
    Instance instance = datasetB.instance(i);
    datasetA.add(instance);
}

After this loop, datasetA contains A + B.

But when I tried to work with datasetA, with something like

public MyResponse classify(String msg) {
    ...

    // rebuild classifier and filter
    Instances filteredData = Filter.useFilter(dataset, filter); // BREAKS
    ...

    // classify
    MyResponse response = classifier.classifyInstance(filteredInstance);
}

it fails with

java.lang.NullPointerException
    at java.util.Hashtable.hash(Unknown Source)
    at java.util.Hashtable.get(Unknown Source)
    at weka.core.Attribute.addStringValue(Attribute.java:868)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:148)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:93)
    at weka.filters.Filter.copyValues(Filter.java:364)
    at weka.filters.Filter.bufferInput(Filter.java:301)
    at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:697)
    at weka.filters.Filter.useFilter(Filter.java:661)

The solution was to treat each instance from datasetB as if it were a brand-new one.
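My best guess at why this helps, sketched as a toy model rather than real Weka code: I am assuming that a string attribute keeps its texts in a per-attribute table and that an instance only stores an index into that table, so an index copied from datasetB means nothing in datasetA's table. The names ToyStringAttribute, naiveCopy and rebuiltCopy are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StringIndexSketch {

    // Toy stand-in for a string attribute (hypothetical, not a Weka class):
    // texts live in a per-attribute table, instances only hold an index.
    static class ToyStringAttribute {
        private final List<String> values = new ArrayList<>();
        private final Map<String, Integer> indexOf = new HashMap<>();

        // Registers the text if it is new and returns its index in this table.
        int addStringValue(String text) {
            Integer existing = indexOf.get(text);
            if (existing != null) {
                return existing;
            }
            values.add(text);
            indexOf.put(text, values.size() - 1);
            return values.size() - 1;
        }

        // Returns the text for an index, or null if this table does not know it.
        String value(int index) {
            return (index >= 0 && index < values.size()) ? values.get(index) : null;
        }
    }

    // Naive merge: reuse the source index directly against the target table.
    static String naiveCopy(ToyStringAttribute source, ToyStringAttribute target, int sourceIndex) {
        return target.value(sourceIndex);
    }

    // Rebuilt merge: read the text from the source, register it in the target,
    // and look it up through the index the target handed back.
    static String rebuiltCopy(ToyStringAttribute source, ToyStringAttribute target, int sourceIndex) {
        String text = source.value(sourceIndex);
        int targetIndex = target.addStringValue(text);
        return target.value(targetIndex);
    }

    public static void main(String[] args) {
        ToyStringAttribute attributeA = new ToyStringAttribute();
        ToyStringAttribute attributeB = new ToyStringAttribute();
        attributeA.addStringValue("hello");            // index 0 in A
        attributeB.addStringValue("spam");             // index 0 in B
        int index = attributeB.addStringValue("buy now"); // index 1 in B

        System.out.println(naiveCopy(attributeB, attributeA, index));   // prints "null"
        System.out.println(rebuiltCopy(attributeB, attributeA, index)); // prints "buy now"
    }
}
```

In this toy model the naive copy finds nothing, and a null value like that is the sort of thing that could end up in a Hashtable lookup. Re-registering the text through addStringValue on the target attribute is what the makeInstance methods in this answer do.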

If you were building a new instance from scratch, you would do something like

// Msg: String, Class: String
private Instance makeInstance(String text, String classValue) {
  Instance instance = new Instance(2); // two attributes
  Attribute messageAttribute = dataset.attribute("Msg");
  // register the text with the target dataset's string attribute
  instance.setValue(messageAttribute, messageAttribute.addStringValue(text));
  // attach the dataset first: setClassValue needs it to find the class attribute
  instance.setDataset(this.dataset);
  instance.setClassValue(classValue);
  return instance;
}

The same applies to an instance taken from datasetB:

private Instance makeInstance(Instance i) {
    Instance instance = new Instance(2); // two attributes
    Attribute messageAttribute = dataset.attribute("Msg");
    instance.setValue(messageAttribute, messageAttribute.addStringValue(getMsg(i)));
    instance.setDataset(this.dataset);
    instance.setClassValue(getClassValue(i));
    return instance;
}

Then call this method in the merging loop:

for (int i = 0; i < data.numInstances(); i++) {
    Instance instance = data.instance(i);
    Instance buildInstance = makeInstance(instance);
    dataset.add(buildInstance);
}
Manu Artero