September 27, 2024

Pitfalls with Mill Build

Here are some challenges and pitfalls I ran into when using Mill.

Figure 1. Pitfalls

Tasks Must Return a Meaningful Result

Let’s re-visit a simplified front-end build script:

import mill._
import os.Inherit // the snippets below pass `stdout = Inherit`

object `front-end` extends Module{
  def packageConfig = T.sources(millSourcePath / "package.json", millSourcePath / "package-lock.json")
  def sources = T.sources(millSourcePath / "src", millSourcePath / "css")

  // Most node commands expect that the dependencies are already installed.
  def installPackages = T {
    // Important: This task depends on the package description.
    val packages = packageConfig()
    os.proc("npm", "install").call(cwd = millSourcePath, stdout = Inherit)
  }

  def buildFrontEnd = T {
    // node build process expects packages to be installed
    val _ = installPackages()
    // depend on the source code
    val _ = sources()
    os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = Inherit)
  }
}

The first run looks great, and the second run uses the cached results as expected:

./mill front-end.buildFrontEnd
#03 [3/4] front-end.installPackages
#03
#03 up to date, audited 65 packages in 617ms
#03
#03 12 packages are looking for funding
#03   run `npm fund` for details
#03
#03 5 vulnerabilities (2 moderate, 3 high)
#03
#03 To address all issues, run:
#03   npm audit fix
#03
#03 Run `npm audit` for details.
#04 [4/4] front-end.buildFrontEnd
#04
#04 > build-my-frontend
#04 > echo `date` && mkdir -p ./public
#04
#04 Tue Sep 17 10:12:36 PM CEST 2024

./mill front-end.buildFrontEnd
#02 [2/4] front-end.sources

Ok, then we touch the package.json and run the build again:

./mill front-end.buildFrontEnd
#03 [3/4] front-end.installPackages
#03
#03 up to date, audited 65 packages in 596ms
#03
#03 12 packages are looking for funding
#03   run `npm fund` for details
#03
#03 5 vulnerabilities (2 moderate, 3 high)
#03
#03 To address all issues, run:
#03   npm audit fix
#03
#03 Run `npm audit` for details.

It ran the installPackages step. However, the buildFrontEnd step was skipped. Why? The installPackages step only returns the command result (e.g. status code, stdout). The return value of a target determines whether something changed for the downstream tasks. installPackages returns the same successful status code after the package.json change as before, so from Mill’s point of view the inputs of the follow-up tasks did not change.
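To make the mechanism concrete, here is a toy model in plain Scala. It is not Mill’s implementation: CacheModel, CommandOutcome and needsRerun are names made up for this illustration, and the sketch assumes that the decision depends only on a hash of the upstream results.

```scala
// Toy model (an assumption for illustration, not Mill's real code):
// a downstream task is re-run only when the hash of its upstream
// results differs from the hash recorded in the previous run.
object CacheModel {
  // Stand-in for what a successful `npm install` task returns.
  final case class CommandOutcome(exitCode: Int)

  def needsRerun(previousInputsHash: Option[Int], upstreamResults: Seq[Any]): Boolean =
    !previousInputsHash.contains(upstreamResults.hashCode())

  def main(args: Array[String]): Unit = {
    val firstRun   = Seq[Any](CommandOutcome(0))
    val afterTouch = Seq[Any](CommandOutcome(0)) // package.json changed, result did not

    println(needsRerun(None, firstRun))                        // true: nothing cached yet
    println(needsRerun(Some(firstRun.hashCode()), afterTouch)) // false: same result, step skipped
  }
}
```

Because both runs produce an equal value, the model skips the downstream step, which matches the behaviour observed above.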

Remember, you must return a meaningful result in each task. The best option is to return the build result, like the data produced, the artifact that was built (via PathRef), etc. In the worst case you can return the inputs of that task again, or even a time stamp, to indicate 'something changed'.

def myTarget = T{
  // the build step

  // The expressions below are alternatives: keep exactly one as the last expression.
  // Best case, something that got built
  PathRef(T.dest/"the-build-result")
  // Or a result of a calculation
  s"some-meta-info-used-later"
  // Example for 'npm' would be to return the 'node_modules'
  PathRef(millSourcePath/"node_modules")
  // However, node_modules can be a gigantic state, so you could return the "package-lock.json" as proxy
  PathRef(millSourcePath/"package-lock.json")
  // Worst case, return a time stamp to indicate 'something' changed. This should be your absolute last resort
  s"last-build-${java.time.Instant.now()}"
}
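Applied to the front-end example, a minimal sketch could look like the following. It assumes the same file layout as the build script above and re-hashes the package files after the install, so that the task result changes whenever package.json or package-lock.json change.

```scala
import mill._

object `front-end` extends Module {
  def packageConfig = T.sources(millSourcePath / "package.json", millSourcePath / "package-lock.json")
  def sources = T.sources(millSourcePath / "src", millSourcePath / "css")

  def installPackages = T {
    // Depend on the package description so that edits re-trigger the install.
    val config = packageConfig()
    os.proc("npm", "install").call(cwd = millSourcePath, stdout = os.Inherit)
    // Re-hash the files after the install (npm may rewrite the lock file)
    // and return them instead of the command result.
    config.map(ref => PathRef(ref.path))
  }

  def buildFrontEnd = T {
    // Consuming the result makes this task re-run when the package files change.
    val packageFiles = installPackages()
    val sourceDirs = sources()
    os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = os.Inherit)
    // Return the inputs as a proxy; a real build would rather return its output directory.
    packageFiles ++ sourceDirs
  }
}
```

Returning the inputs is only the fallback option described above. If the build script writes its output to a known folder, returning a PathRef to that folder is the better choice.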

Keep Things in the out directory

Mill avoids fragile builds by ensuring each build step gets its own build directory and by passing results via explicit return values. This only works when your build step writes its data to the T.dest path, which Mill creates inside the out directory.

This can be hard with some tools, e.g. in the NodeJS ecosystem, that prefer to implicitly look for files in the working directory and write their output there.

My tip: Don’t be afraid to copy things into the T.dest directory for a few build steps and then run the tool there. This way you keep the task isolation guarantees of Mill.
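A rough sketch of that copy-then-run pattern is shown below. The task name bundle and the command my-bundler are hypothetical placeholders, so substitute whatever tool you actually run; the sketch also assumes the tool writes its output into its working directory.

```scala
import mill._

object `front-end` extends Module {
  def sources = T.sources(millSourcePath / "src", millSourcePath / "css")

  def bundle = T {
    val dest = T.dest
    val inputs = sources()
    // Copy every existing source directory into this task's private folder.
    for (src <- inputs if os.exists(src.path)) {
      os.copy(src.path, dest / src.path.last)
    }
    // `my-bundler` is a made-up command: it stands for any tool that reads
    // from and writes to its working directory.
    os.proc("my-bundler", "build").call(cwd = dest, stdout = os.Inherit)
    // Return the isolated output folder so downstream tasks see changes.
    PathRef(dest)
  }
}
```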

Sometimes you have to compromise. For example, I keep the NodeJS node_modules in the project’s working directory, otherwise tooling like IDEs gets confused.

Mill’s Magic Tasks

Task Scheduling

Let’s take a look at these tasks:

def targetOne = T{
  for(i<-0 until 10){
    println("targetOne")
    Thread.sleep(100)
  }
  System.currentTimeMillis()
}

def targetTwo = T{
  for(i<-0 until 10){
    println("targetTwo")
    Thread.sleep(100)
  }
  System.currentTimeMillis()
}

def dependsOnTargets = T{
  targetOne()
  targetTwo()
}

Reading this as regular Scala code, you would think that targetOne is executed first and targetTwo second. However, if you run it you will see this:

$ ./mill dependsOnTargets # In Mill 0.12+ by default. Older versions required an explicit parallelism opt-in
#1 [1/3] targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#1 targetOne
#2 targetTwo
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#3 [3/3] dependsOnTargets

The two tasks ran concurrently! WHAT? I think this is the hardest part of Mill: The bodies of tasks are macros that rewrite your code to enable concurrent building and all the other Mill features.

That means:

  • Rely only on the return values of tasks, not on any other side effect.

  • The order in which tasks run does not follow the order in the code 1:1. For example, I once tried to start a server in one task and shut it down in another. That never worked, because the re-ordering made it unreliable. I later used a proper Mill worker to achieve what I wanted.
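If one task really has to run after another, the dependable way to express that is a data dependency. Here is a minimal sketch that reuses the task names from above and assumes targetTwo can make some use of targetOne’s result:

```scala
import mill._

def targetOne = T {
  println("targetOne")
  System.currentTimeMillis()
}

// targetTwo consumes the value of targetOne, so Mill has to finish
// targetOne before it can start targetTwo.
def targetTwo = T {
  val finishedAt = targetOne()
  println(s"targetTwo, started after targetOne finished at $finishedAt")
  System.currentTimeMillis()
}

def dependsOnTargets = T {
  targetTwo()
}
```

With this shape the two bodies can no longer interleave, because the ordering is part of the task graph instead of the textual order of the calls.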

Forgetting Parentheses: aka Always Consume the Result

If you have a task you run only for its side effects (e.g. installing NodeJS modules), then you might call it like this by accident:

def installPackages = T {
  // Important: This task depends on the package description.
  val packages = packageConfig()
  os.proc("npm", "install").call(cwd = millSourcePath, stdout = Inherit)
}
def buildFrontEnd = T {
  // node build process expects packages to be installed
  installPackages
  // depend on source code
  sources
  os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = Inherit)
}

When you run the build:

$ ./mill front-end.buildFrontEnd
#02 [2/2] front-end.buildFrontEnd
#02 > build-my-frontend
#02 > ls node_modules && echo `date` && mkdir -p ./public
#02
#02 ls: cannot access 'node_modules': No such file or directory

Oh no, the build ran, but it didn’t install the node packages. Why? Because we forgot to add the parentheses to installPackages. Without those, the target isn’t actually executed, only referenced. You must add the parentheses:

def buildFrontEnd = T {
  // node build process expects packages to be installed
  installPackages()
  // depend on source code
  sources()
  os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = Inherit)
}

In general, if you do not use the return value of a task you call, it’s already sub-optimal, because it means you rely on a side effect of that task. But sometimes it is unavoidable.

No Conditional Tasks

You might be tempted to write an if-else to run some optional tasks. Something like this:

def alwaysRun = T{
  System.currentTimeMillis()
}
def isExtraWorkEnabled = T.input {
  T.ctx().env.get("DO_EXTRA").isDefined
}
def optionalTask = T{
  println("Im optional")
  "Optional"
}
def build = T{
  val time = alwaysRun()
  val doExtra = isExtraWorkEnabled()
  println(s"Do Extra Work: ${doExtra}")
  if(doExtra){
    val optional = optionalTask()
  }
  time
}

Then you run it and you will get this surprising result:

$ ./mill build
#02 [2/4] isExtraWorkEnabled
#03 Im optional
#04 [4/4] build
#04 Do Extra Work: false

The if statement says that the extra task should not run, but it already ran. This is a deliberate design limitation of Mill: The task graph is built at the startup of the build. Tasks cannot change it dynamically once they are running. So, if a task might need the result of another task, then that other task will be executed.
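One way to live with this limitation is to move the condition into the optional task itself. The task is then always part of the graph, but it only does the costly work when the flag is set. This is a sketch under the same assumptions as the example above (an environment variable DO_EXTRA read via T.input), not the only possible approach:

```scala
import mill._

def isExtraWorkEnabled = T.input {
  T.ctx().env.get("DO_EXTRA").isDefined
}

// Always scheduled, but the expensive part is guarded inside the body.
def optionalTask = T {
  if (isExtraWorkEnabled()) {
    println("Doing the extra work")
    "Optional"
  } else {
    "Skipped"
  }
}

def build = T {
  val extra = optionalTask()
  println(s"Extra work result: $extra")
  extra
}
```

The flag is now an ordinary upstream value of optionalTask, so the decision happens inside a running task instead of trying to reshape the task graph.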

Stability: Upcoming Mill versions, Scala 3 etc

Mill isn’t as rock-solid as I would like yet.

  • Mill 0.12 is right around the corner with some improvements, but also some breakages.

  • Mill 0.13 is planned to move to Scala 3, which is another set of breaking changes.

Even assuming Mill doesn’t introduce any breaking changes itself: Scala won’t be as stable as, for example, Java.

Overall: at the moment, bumping a Mill version (0.11.x → 0.12.x) requires adjustments to your build.

Debugging Build Issues

When a build has problems, I always try to run Mill in the most conservative mode:

  1. Clean the out directory, to start from scratch: ./mill clean

  2. Then run Mill in the most conservative mode: ./mill --no-server -j 1 <target>

This ensures Mill runs everything serially and in a single process, without a background helper process. Once the build is correct in this constellation, I remove these parameters. So far that has always worked: getting the build right is easier with less interleaving and a single process.

If it works with --no-server, but it doesn’t work without it, then try killing the background worker with ./mill shutdown.

Extra Information in /out directory

On every run Mill produces information about the run in the /out directory. These can be helpful:

  • mill-chrome-profile.json can be opened with the Chrome Dev tools.

  • mill-profile.json tells you each task that was executed, whether it was cached or not, how long it took, etc.

Summary

This was a short rundown of some of the pitfalls and troubles I encountered with Mill. If I encounter more, I’ll probably update this post.

Tags: Scala Mill Build Java Development