Pitfalls with Mill Build
Here are some challenges and pitfalls I ran into when using Mill.
Tasks Must Return a Meaningful Result
Let’s revisit a simplified front-end build script:
```scala
object `front-end` extends Module {
  def packageConfig = T.sources(millSourcePath / "package.json", millSourcePath / "package-lock.json")
  def sources = T.sources(millSourcePath / "src", millSourcePath / "css")

  // Most node commands expect that the dependencies are already installed.
  def installPackages = T {
    // Important: This task depends on the package description.
    val packages = packageConfig()
    os.proc("npm", "install").call(cwd = millSourcePath, stdout = Inherit)
  }

  def buildFrontEnd = T {
    // node build process expects packages to be installed
    val _ = installPackages()
    // depend on the source code
    val _ = sources()
    os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = Inherit)
  }
}
```
The first run looks great, and the second run uses the cached results as expected:
```
./mill front-end.buildFrontEnd
#03 [3/4] front-end.installPackages
#03
#03 up to date, audited 65 packages in 617ms
#03
#03 12 packages are looking for funding
#03   run `npm fund` for details
#03
#03 5 vulnerabilities (2 moderate, 3 high)
#03
#03 To address all issues, run:
#03   npm audit fix
#03
#03 Run `npm audit` for details.
#04 [4/4] front-end.buildFrontEnd
#04
#04 > build-my-frontend
#04 > echo `date` && mkdir -p ./public
#04
#04 Tue Sep 17 10:12:36 PM CEST 2024

./mill front-end.buildFrontEnd
#02 [2/4] front-end.sources
```
Ok, then we touch the package.json and run the build again:
```
./mill front-end.buildFrontEnd
#03 [3/4] front-end.installPackages
#03
#03 up to date, audited 65 packages in 596ms
#03
#03 12 packages are looking for funding
#03   run `npm fund` for details
#03
#03 5 vulnerabilities (2 moderate, 3 high)
#03
#03 To address all issues, run:
#03   npm audit fix
#03
#03 Run `npm audit` for details.
```
It ran the installPackages step. However, the buildFrontEnd step got skipped.
Why?
The installPackages step only returns the command result (e.g. status code, stdout). The return value of a target determines whether something changed for the downstream tasks. installPackages returns the same successful status code after the package.json change, therefore the inputs for the follow-up tasks did not change.
Remember: you must return a meaningful result from each task. The best option is to return the build result, like the data produced or the artifact built (via PathRef). In the worst case you can return the inputs of that task again, or even a time stamp to indicate 'something changed'.
```scala
import java.time.Instant

def myTarget = T {
  // the build step

  // Best case: return something that got built
  PathRef(T.dest / "the-build-result")
  // Or a result of a calculation
  s"some-meta-info-used-later"
  // An example for 'npm' would be to return the 'node_modules'
  PathRef(millSourcePath / "node_modules")
  // However, node_modules can be gigantic, so you could return the "package-lock.json" as a proxy
  PathRef(millSourcePath / "package-lock.json")
  // Worst case: return a time stamp to indicate 'something' changed. This should be your absolute last resort.
  s"last-build-${Instant.now()}"
}
```
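Applied to the npm example, a minimal sketch of a fix is to return the lock file from installPackages, so downstream tasks see a changed value whenever the dependencies change (assuming npm install keeps the lock file up to date):

```scala
def installPackages = T {
  // Depend on the package description, as before.
  val packages = packageConfig()
  os.proc("npm", "install").call(cwd = millSourcePath, stdout = Inherit)
  // Return a meaningful result: the lock file acts as a cheap proxy
  // for the state of the installed node_modules.
  PathRef(millSourcePath / "package-lock.json")
}
```

Now touching package.json changes the return value of installPackages, and buildFrontEnd re-runs as expected.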
Keep Things in the out directory
Mill avoids fragile builds by ensuring each build step gets its own build directory and that results are passed via explicit return values. This only works when your build step writes its data to the T.dest path, which is created in the out directory.
This might be hard with some tools, e.g. the NodeJS ecosystem, which prefer to implicitly look for files in the working directory and write files there.
My tip: Don't be afraid to copy things into the T.dest directory for a few build steps and then run the tool. This way you keep the task-isolation guarantees of Mill.
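A sketch of that pattern (the task name and the tool invocation are made up for illustration):

```scala
def bundle = T {
  // Copy the tracked sources into this task's isolated T.dest directory...
  for (src <- sources()) {
    os.copy(src.path, T.dest / src.path.last)
  }
  // ...then run the tool with T.dest as its working directory, so all
  // implicit reads and writes stay inside the out directory.
  os.proc("npm", "run", "build-my-frontend").call(cwd = T.dest, stdout = Inherit)
  PathRef(T.dest)
}
```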
Sometimes you have to compromise. For example, I keep the NodeJS node_modules in the project's working dir, otherwise tooling like IDEs gets confused.
Mill's Magic Tasks
Task Scheduling
Let’s take a look at these tasks:
```scala
def targetOne = T {
  for (i <- 0 until 10) {
    println("targetOne")
    Thread.sleep(100)
  }
  System.currentTimeMillis()
}

def targetTwo = T {
  for (i <- 0 until 10) {
    println("targetTwo")
    Thread.sleep(100)
  }
  System.currentTimeMillis()
}

def dependsOnTargets = T {
  targetOne()
  targetTwo()
}
```
Reading this as regular Scala code, you would think that first targetOne and then targetTwo is executed. However, if you run it you will see this:
```
$ ./mill dependsOnTargets # In mill 0.12+ by default. Older versions required explicit parallelism opt-in
#1 [1/3] targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#1 targetOne
#2 targetTwo
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#1 targetOne
#1 targetOne
#2 targetTwo
#1 targetOne
#2 targetTwo
#3 [3/3] dependsOnTargets
```
The two tasks ran concurrently! WHAT? I think this is the hardest part of Mill: the bodies of tasks are macros that rewrite your code to enable concurrent building and all the other Mill features.
That means you must:
- Rely only on the return values of tasks, not on any other side effect.
- Not assume the order in which tasks run matches the code 1:1. For example, I once tried to start up a server in one task and shut it down in another: that never worked, because the re-ordering made it unreliable. I later used a proper Mill worker to achieve what I wanted.
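For completeness, the worker approach I mentioned looks roughly like this (a hedged sketch; the server command and task names are hypothetical):

```scala
// A worker is instantiated once and kept alive in-memory across builds,
// which makes it the right home for long-lived state like a server process.
def devServer = T.worker {
  os.proc("my-dev-server", "--port", "8080").spawn()
}

def integrationTest = T {
  // Reuses the already-running process instead of relying on task ordering.
  val server = devServer()
  // ... talk to the server here ...
}
```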
Forgetting Parentheses: Aka Always Consume the Result
If you have a task you run only for the side effects (eg. installing NodeJS modules), then you might call it like this by accident:
```scala
def installPackages = T {
  // Important: This task depends on the package description.
  val packages = packageConfig()
  os.proc("npm", "install").call(cwd = millSourcePath, stdout = Inherit)
}

def buildFrontEnd = T {
  // node build process expects packages to be installed
  installPackages
  // depend on source code
  sources
  os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = Inherit)
}
```
When you run the build:
```
$ ./mill front-end.buildFrontEnd
#02 [2/2] front-end.buildFrontEnd
#02
#02 > build-my-frontend
#02 > ls node_modules && echo `date` && mkdir -p ./public
#02
#02 ls: cannot access 'node_modules': No such file or directory
```
Oh no, the build ran, but it didn’t install the node packages. Why? Because we forgot to add the parentheses to installPackages. Without those, the target isn’t actually executed, only referenced. You must add the parentheses:
```scala
def buildFrontEnd = T {
  // node build process expects packages to be installed
  installPackages()
  // depend on source code
  sources()
  os.proc("npm", "run", "build-my-frontend").call(cwd = millSourcePath, stdout = Inherit)
}
```
In general, if you do not use the return value of a task you call, that is already sub-optimal, because it means you rely on a side effect of that task. But sometimes it is unavoidable.
No Conditional Tasks
You might be tempted to write an if-else to run some optional tasks. Something like this:
```scala
def alwaysRun = T {
  System.currentTimeMillis()
}

def isExtraWorkEnabled = T.input {
  T.ctx().env.get("DO_EXTRA").isDefined
}

def optionalTask = T {
  println("Im optional")
  "Optional"
}

def build = T {
  val time = alwaysRun()
  val doExtra = isExtraWorkEnabled()
  println(s"Do Extra Work: ${doExtra}")
  if (doExtra) {
    val optional = optionalTask()
  }
  time
}
```
Then you run it and you will get this surprising result:
```
$ ./mill build
#02 [2/4] isExtraWorkEnabled
#03 Im optional
#04 [4/4] build
#04 Do Extra Work: false
```
The if statement says that the extra task should not run, but it ran already. This is a deliberate design limitation of Mill: the task graph is built at the startup of the build. Tasks cannot change it dynamically once they are running. So if a task might need the result of another task, then that task will be executed.
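One workaround (a sketch, not the only option) is to move the optional work into the task body, so the decision happens at run time instead of in the task graph:

```scala
def build = T {
  val time = alwaysRun()
  if (isExtraWorkEnabled()) {
    // Inline the extra work instead of declaring it as a separate task,
    // so nothing runs unless the condition holds.
    println("Doing the extra work")
  }
  time
}
```

The trade-off is that the inlined work is no longer cached independently of the surrounding task.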
Stability: Upcoming Mill versions, Scala 3 etc
Mill isn’t as rock solid as I would like, yet.
Mill 0.12 is right around the corner with some improvements, but also some breakages.
Mill 0.13 is planned to move to Scala 3, which is another set of breaking changes.
Even assuming Mill doesn’t introduce any breaking changes itself, Scala won’t be as stable as, for example, Java.
Overall: bumping a Mill version (0.11.x → 0.12.x) requires adjustments to your build at the moment.
Debugging Build Issues
When a build has problems, I always try to run Mill in the most conservative mode:
Clean the out directory, to start from scratch:
```
./mill clean
```
Then run Mill in the most conservative mode:
```
./mill --no-server -j 1 <target>
```
This ensures Mill runs everything serially and in a single process, without a background helper process. Once I get the build correct in this constellation, I remove these parameters. So far this has always worked. Getting the build correct is easier with less interleaving and a single process.
If it works with --no-server but doesn’t work without it, then try killing the background worker with ./mill shutdown.
Extra Information in /out directory
On every run Mill produces information about the run in the /out
directory.
These can be helpful:
- mill-chrome-profile.json: open it with the Chrome Dev Tools to see a timeline of the build.
- mill-profile.json: tells you each task executed, whether it was cached or not, its time, etc.
Summary
This was a short rundown of some of the pitfalls and troubles I encountered with Mill. If I encounter more, I’ll probably update this post retroactively.