
I'm trying to microbenchmark two different implementations of a running-median calculation using ScalaMeter. I have some test files, small and large, that the numbers come from. The problem is that the following code completes instantly without producing any benchmark at all.

import org.scalameter.api._

object MedianMaintenanceBenchmark extends Bench[Double] {

  /* configuration */

  lazy val executor = LocalExecutor(
    new Warmer.Default,
    Aggregator.median[Double],
    measurer
  )
  lazy val measurer = new Measurer.Default
  lazy val reporter = new LoggingReporter[Double]
  lazy val persistor: Persistor.None.type = Persistor.None

  /* inputs */

  private val files: Gen[String] = Gen.enumeration("files")("median-test")
  // cache the parsed numbers so each file is read only once
  private val num: Gen[Seq[Int]] = (for (f <- files) yield numbers(f)).cached

  /* tests */

  performance of "MedianMaintenance" config (
    exec.benchRuns -> 10
  ) in {
    measure method "using heap" in {
      using(num) in {
        xs => MedianMaintenanceUsingHeaps(xs).medians
      }
    }
  }

  private def numbers(filename: String): Seq[Int] = // elided
}

Output:

::Benchmark MedianMaintenance.using heap::
cores: 8
hostname: ***
name: OpenJDK 64-Bit Server VM
osArch: x86_64
osName: Mac OS X
vendor: Azul Systems, Inc.
version: 11.0.1+13-LTS
Parameters(files -> median-test): 3.612799 ms

What is going on here?

Edit:

Changing the code as follows at least does something, but it doesn't honor the options: the test runs a total of 18 times for the file 'Median', which is not the sum of 3 + 10.

import org.scalameter.api._

object MedianMaintenanceBenchmark extends Bench.ForkedTime {

  /* configuration */
  override def aggregator: Aggregator[Double] = Aggregator.median

  private val opts = Context(
    exec.minWarmupRuns -> 3,
    exec.maxWarmupRuns -> 3,
    exec.benchRuns -> 10,
    exec.jvmflags -> List("-Xms2g", "-Xmx2g")
  )

  /* inputs */

  private val files: Gen[String] = Gen.enumeration("files")("median-test", "Median")
  private val num: Gen[Seq[Int]] = (for (f <- files) yield numbers(f)).cached

  /* tests */

  performance of "MedianMaintenance" config opts in {
    measure method "using heap" in {
      using(num) in {
        xs => MedianMaintenanceUsingHeaps(xs).medians
      }
    }

    measure method "using red-black BST" in {
      using(num) in {
        xs => MedianMaintenanceUsingRedBlackTree(xs).medians
      }
    }
  }

  private def numbers(filename: String): Seq[Int] = // elided
}

1 Answer


OP here: after several hours, I was finally able to get through the pathetically outdated documentation (what little of it exists) and figure out the following.

In addition to my edit above, there are several ways to override the execution count and other settings:

  1. For all benchmarks in the current file, use override def defaultConfig: Context = Context(exec.benchRuns -> 10).
  2. For a specific benchmark, define a val opts: Context (or write the pairs inline) and use the config opts in DSL.
  3. For a specific method, do the same as #2, but apply config opts in at the method level; all three styles are sketched below.
  4. The docs claim that it's possible to override the configuration for each "curve", but I was not able to find out what a "curve" is or how to override the configuration for one.
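
Here is a minimal sketch of scopes #1 through #3 in one place. The object name, method name, generator, and workload are made up for illustration; only the placement of the config calls is the point:

import org.scalameter.api._

// Hypothetical benchmark showing the three override scopes.
object ExampleBenchmark extends Bench.ForkedTime {

  // #1: default for every benchmark in this file
  override def defaultConfig: Context = Context(exec.benchRuns -> 10)

  // #2: a named Context applied to one benchmark group
  private val opts = Context(exec.benchRuns -> 20)

  private val sizes: Gen[Int] = Gen.range("size")(100, 300, 100)

  performance of "Example" config opts in {
    // #3: method-level override; the narrowest scope wins
    measure method "sum" config (exec.benchRuns -> 5) in {
      using(sizes) in { n => (0 until n).sum }
    }
  }
}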

independentSamples = number of independent JVMs spawned.

Warmup runs between minWarmupRuns and maxWarmupRuns times on each JVM (which makes sense), using one set of test data (picked randomly?); how many warmup runs actually happen depends on the detection of a "steady state". Then the tests are run benchRuns times on each JVM. There also appears to be one extra, unaccounted-for execution per JVM at the end.

Total number of executions = independentSamples * (warmupRuns + benchRuns + 1), where warmupRuns falls between minWarmupRuns and maxWarmupRuns.

For instance, given:

Context(
  exec.minWarmupRuns -> 5,
  exec.maxWarmupRuns -> 5,
  exec.benchRuns -> 10,
  exec.independentSamples -> 2
)

there will be 2 * (5 + 10 + 1) = 32 executions of the code under test.
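
Encoded as a quick sanity check (totalExecutions is my own throwaway helper, not a ScalaMeter API, and it assumes warmup settles at a fixed count):

// Hypothetical helper mirroring the formula above; not part of ScalaMeter.
def totalExecutions(warmupRuns: Int, benchRuns: Int, independentSamples: Int): Int =
  independentSamples * (warmupRuns + benchRuns + 1)

// 2 * (5 + 10 + 1) = 32, matching the observed count
assert(totalExecutions(warmupRuns = 5, benchRuns = 10, independentSamples = 2) == 32)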
