Understanding Core.Command
and Cmdliner
Introduction
Command-line parse libraries e.g. Core.Command
and Cmdliner
are notoriously difficult to understand and use. The tension may come from people require a quick solution, while both these libraries target to be full-feathered tools with abstractions and concepts. This post aims to provide an understanding guide for them. Their official tutorials are at RWO/Command-Line Parsing and cmdliner/tutorial. I hope you can be much clear to jump back to them after reading this guide.
Four Concepts
Both libraires provide four levels of concepts:
Concept | Stage | Composable | Core.Command | Cmdliner |
---|---|---|---|---|
Argument Parser | Step 3 | no | 'a Arg_type.t | 'a Arg.conv = 'a parser * 'a printer |
Argument Handler | Step 3 | yes | 'a Param.t | 'a Term.t |
Command Item | Step 2 | yes | unit Command.t | 'a Cmd.t |
Driver | Step 1 | no | Command.run | Cmd.eval , Cmd.eval_value |
Argument Parser provides functions to parse a raw string to the expected type 'a
. They both provide parsers inside Core/Command/Spec and Cmdliner/Predefined_converters for common types. Cmdliner.Arg.conv
is a pair of a parser with a printer.
Argument Handler wraps a parser with extra properties to handle command-line argument e.g. whether it's required or optional, whether it's repeated, with a flag or not, the doc string for it. Argument Handler may not need Parsing in examples that a flag argument carries no values. The existence or not means a boolean parameter e.g. -v
(rather than -log=WARNING
).
In both libraries, Argument Handlers are not directed used by Drivers. Argument Handlers are packed into a Command Item, and then drivers take Command Item and perform the real parsing work. A Command Item for a whole sub-command e.g. push ...
in git push ...
. You may have other sub-commands like git clone
, git pull
, and you need to group these Command Items into a compound Command Item.
In Table 1, The Column Stage lists the time order during a real command-line parsing.
- Step 1: Driver are usually the unique top function of your program. It takes the
Sys.argv
and invodes your Command Item. - Step 2: Driver dispatches to the Command Item.
- Step 3: Argument Handler performs the parsing and handling, with the Argument Parser in it if having one.
Driver functions
Core.Command
can only build up to a Command.t
. It doesn't provide any driver functions. The driver function Core.Command.run
is defined in another library core_unix.command_unix
. It takes the Core.Command.t
and start the work.
Cmdliner
has a variant of driver functions e.g. Cmd.eval
and Cmd.eval_value
. Since they're supposed to be the unique top functions, Cmd.eval
returns a unit
while Cmd.eval
returns a int
(standard error code) that is used by exit
.
It's a common myth that people seek to get the unboxed parsed result. Such a function is not even provided in Core.Command
. It's do-able with Cmdliner.Cmd.eval_value : ... -> ?argv:string array -> 'a Cmd.t -> ('a Cmd.eval_ok, Cmd.eval_error) result
. However, you need to tokenize to get argv
yourself (Imparient readers can jump to Section Elimination).
Diagrams for Core.Command
and Cmdliner
Both Core.Command
and Cmdliner
have two-layered compositional datatypes. An element in the inner layer is to parse one key-value pair (or key-only or value-only). For example, we're going to parse -v -a=1 -b=t 42
.
The inner layer for Core.Command
is a compositional Param.t
. We will have four Param.t
that are
bool Param.t
for-v
int Param.t
for-a=1
in which aint Arg_type
to parse1
string Param.t
for-b=t
in which astring Arg_type
to parset
int Param.t
for42
in which aint Arg_type
to parse42
The inner layer for Cmdliner
is a compositional Term.t
. We will have four Term.t
that are
bool Term.t
for-v
int Term.t
for-a=1
in which aint Arg.conv
to parse1
string Term.t
for-b=t
in which astring Arg.conv
to parset
int Term.t
for42
in which aint Arg.conv
to parse42
The inner layer data are wrapped into outer layer data Core.Command.t
or Cmdliner.Cmd.t
via packing function Core.Command.basic
or Cmdliner.Cmd.v
. A outer layer data is usually used for argparsing one command-line case. It is also composable and is used to group sub-commands. Core.Command.group
takes (string * Core.Command.t) list
and returns a Core.Command.t
. Cmdlinder.Cmd.group
takes Cmdliner.Cmd.t list
and returns a Cmdliner.Cmd.t
.
Their diagrams are very alike in the respective of our four concepts. A rectangle corresponds to a type. An edge is a function that transforms datatype. A rounded rectangle is also a function but at an endpoint (It's rare. Only two driver functions and one Arg.flag
). Four compositional datatypes Param.t
Command.t
Term.t
Cmd.t
should be in stacked rectangles (but here I just use a rectangle with double edges). I omit the doc-related components to make it clear.
--- title: Figure 1.1 - Diagram for Core.Command --- graph LR subgraph at [Argument Parser] AT[Arg_type.t] end subgraph sp [Argument Handler] subgraph fa [Flag & Anons] CA1[Anons.t] CF1[Flag.t] CF2[no_arg : Flag.t] end AT --> |"Anons.(%:)"| CA1 AT --> |Param.required| CF1 P0[[Param.t]] --> |"Param.map"| P0 CA1--> |Param.anon| P0 CF1 --> |Param.flag| P0 CF2 --> |Param.flag| P0 end subgraph sg [Command Item] C0[[Command.t]] --> | Command.group | C0 end P0 --> |Command.basic| C0 subgraph sd [Driver] direction LR Cr(Command.run) C0 -.-> Cr end
--- title: Figure 1.2 - Diagram for Cmdliner --- graph LR subgraph ap [Argument Parser] AC[Arg.conv] end AC --> |Arg.pos| AT1 AC --> |Arg.opt| AT2 subgraph sa [Argument Handler] subgraph at [Arg.t] AT1[Arg.t] AT2[Arg.t] AT3(Arg.flag ~> Arg.t) end AT1 --> |Arg.value| P0 AT2 --> |Arg.value| P0 AT3 --> |Arg.value| P0 P0[[Term.t]] --> |"Term.($)"| P0 end P0 --> |Cmd.v| C0 subgraph sg [Command Item] direction LR C0[[Cmd.v]] --> |Cmd.group| C0 end subgraph sd [Driver] direction LR Cr(Cmd.eval) Cv(Cmd.eval_value) C0 .-> Cr C0 .-> Cv end
How Param.t
and Term.t
are made
A Core.Command.t
consists of the flagged parameters and anonymous (flag-less) parameters. A Cmdlinder.t
is consists of optional arguments and positional arguments. They are Argument Handlers. Note Argument Handlers use Argument Parsers.
In Core.Command
, A primitive 'a Param.t
can made up from ingridients
'a Arg_type.t
parsesstring
to'a
'a Flag.t
can wrap'a Arg_type.t
asrequired
,optional
, oroptional_with_default
'bool Flag.t
requires no'a Arg_type.t
. Therefore its existence denotes atrue
orfalse
'a Anons.t
which wraps'a Arg_type.t
Param.flag
makes'a Flag.t
a'a Param.t
Param.anon
makes'a Anons.t
a'a Param.t
In Cmdlinder
, the ingridients to make up a primitive 'a Term.t
are:
'a Arg.conv
defines both a parser and a printer for'a
Arg.opt
wraps'a Arg.conv
an optional flagged argument'a Arg.t
.Arg.pos
wraps'a Arg.conv
and makes a positional argument at certain index'a Arg.t
Arg.flag
makes a pure flag optional argumentbool Arg.t
Arg.value
makes'a Arg.t
a'a Term.t
The listing and the diagram are not complete, but they are sufficient to illuminate.
Pack Argument Handler to Command Item
Driver functions takes a Command Item packed from a Argument Handler. Param.t
and Term.t
can compose just like parser combinator or prettyprinter. They should be Applicative
(or also Contravariant
?)
Pack Command.Param.t
to Command.t
The pack function for Command
is Command.basic
.
# #require "core";;
# open Core;;
# Command.Param.map;;
- : 'a Command.Spec.param -> f:('a -> 'b) -> 'b Command.Spec.param = <fun>
# #show Command.basic;;
val basic : unit Command.basic_command
# #show Command.basic_command;;
type nonrec 'result basic_command =
summary:string ->
?readme:(unit -> string) ->
(unit -> 'result) Command.Spec.param -> Command.t
Type Command.Spec.param
is an alias for Command.Param.t
. We can see the types for first argument and the return type of Command.Param.map
, and the second-to-last argument of Command.basic_command
, are all Command.Param.t
. 'result
is set to be unit
in Command.basic
.
The task left for users is to provide a function that maps parsed result 'a Param.t
to (unit -> unit) Param.t
.
The following two lines are equal. It's a partially applied Param.map
from string Param.t
to an unknown ~f : string -> '_weak
:
# Command.Param.(%:);;
- : string -> 'a Command/2.Arg_type.t -> 'a Command.Spec.anons = <fun>
# Command.Param.string;;
- : string Command/2.Arg_type.t = <abstr>
# Command.Param.(map (anon ("filename" %: string)));;
- : f:(string -> '_weak1) -> '_weak1 Command.Spec.param = <fun>
# Command.(let s : string Param.Arg_type.t = Param.string in let a = Param.(%:) "filename" s in Param.map (Param.anon a));;
- : f:(string -> '_weak2) -> '_weak2 Command.Spec.param = <fun>
In the following example of Command.basic
, our function (fun file () -> ignore file)
satisfied the type requirement. The observation here is Command.basic
requires an argument of type (unit -> unit) Command.Spec.param
. The user comsume-all-parsed code is usually like Param.map a_b_c_param ~f:(fun a b c () -> ...; () ) : (unit -> unit) Command.Spec.param
. The parsed result is passed before the last argument ()
.
# Command.basic ~summary:"fairy file" Command.Param.(map (anon ("filename" %: string)) ~f:(fun file () -> ignore file));;
- : Command.t = <abstr>
To make it familiar to RWO readers, we use let%map_open
instead of Param.map
. The code is equivalent to the above one:
# #require "ppx_jane";;
# let%map_open.Command file = (anon ("filename" %: string)) in file;;
- : string Command.Spec.param = <abstr>
# Command.basic ~summary:"fairy file" (let%map_open.Command file = (anon ("filename" %: string)) in fun () -> ignore file);;
- : Command.t = <abstr>
Pack Cmdliner.Term.t
to Cmdliner.Cmd.t
Cmdliner
uses pure (Term.const : 'a -> 'a t
) and ap (Term.($) : ('a -> 'b) t -> 'a t -> 'b t
) to compose Term.t
. Unlike Command.basic
, the user comsume-all-parsed code is usually like Term.(const (fun a b c -> ...; () )) $ a_param $ b_param $ c_param : unit Term.t
. Term.v
packs this value to get a Cmd.t
, then this Cmd.t
can be used in the driver function Cmd.eval
.
# #require "cmdliner";;
# open Cmdliner;;
# Arg.(&);;
- : ('a -> 'b) -> 'a -> 'b = <fun>
# Arg.value;;
- : 'a Cmdliner.Arg.t -> 'a Term.t = <fun>
# Arg.string;;
- : string Arg.converter = (<fun>, <fun>)
# let file = Arg.(value & pos 0 string "default_name" (Arg.info []));;
val file : string Term.t = <abstr>
# Cmd.v (Cmd.info "fairy file") Term.((const (fun filename -> ignore filename)) $ file );;
- : unit Cmd.t = <abstr>
The difference between their approaches is Command
puts the comsume-all-parsed function at last while Cmdliner
puts it at first. They both use functional jargons e.g. let%map_open
, Anon.(%:)
, Arg.(&)
and Term.($)
. They help to compact the code, at the cost of confusing newcomers.
Syntax for Command-line Languages
It's not explicitly specified, but their design and APIs suggest (or check) possible syntax for the command-line languages. e.g. for the flag-less parts, Command
treats them as anonymous arguments and allow the a sequence of them. The run-time validity of Param.t
is less powerful than its static type. In this example, Anonymous int
after a list of anonymous string
causes an exception for its syntax.
# let my_param =
(let%map_open.Command str_lst = anon (sequence ("str" %: string))
and extra_int = anon ("int" %: int) in
fun () -> ());;
val my_param : (unit -> unit) Command.Spec.param = <abstr>
# Command.basic ~summary:"my_param" my_param ;;
Exception:
Failure
"the grammar [STR ...] INT for anonymous arguments is not supported because there is the possibility for arguments (INT) following a variable number of arguments ([STR ...]). Supporting such grammars would complicate the implementation significantly.".
Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
Called from Base__List0.fold in file "src/list0.ml", line 37, characters 27-37
Called from Command.Anons.Grammar.concat in file "command/src/command.ml", line 1000, characters 10-729
Called from Command.basic in file "command/src/command.ml", line 2373, characters 14-22
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-14
Cmdliner
treats flag-less parts as positional arguments. The manual warns it's the user's duty to ensure the unique use of a position. Arg
provides functions to either preciously points to one position index, or to range all, or to cover the left/right to a given position index. Duplication of position index will trigger run-time exception.
# let top_t =
let ai = Arg.info [] in
let str_lst : string list Term.t = Arg.(value & pos_all string [] ai) in
let extra_int : int Term.t = Arg.(value & pos 1 int 0 ai) in
let handle_all : (string list -> int -> unit) Term.t =
Term.const @@ fun _str_list _extra_int -> ()
in
Term.(handle_all $ str_lst $ extra_int);;
val top_t : unit Term.t = <abstr>
# let c =
let i = Cmd.info "" in
Cmd.v i top_t ;;
val c : unit Cmd.t = <abstr>
# Cmd.eval_value ~argv:[| "a"; "2" |] c;;
- : (unit Cmd.eval_ok, Cmd.eval_error) result = Stdlib.Ok (`Ok ())
# Cmd.eval_value ~argv:[| "a"; "b" |] c;;
: invalid value 'b', expected an integer
Usage: [OPTION]… [ARG]…
Try ' --help' for more information.
- : (unit Cmd.eval_ok, Cmd.eval_error) result = Stdlib.Error `Parse
Summary and Questions
To recap, after skipping documenting, escaping, error code, auto-completing, two of the most popular command-line libraries Core.Command
and Cmdliner
can be grasped via four concepts i.e. Argument Parser, Argument Handler, Command Item, and Driver. The diagram not only shows their internal relation, but also reveal the similarity between them. I don't have time to re-comment all their documents and tutorials from my viewpoint, but you're well equipped to go back for them ((for Command
and for Cmdliner
)) after this post.
Explicit syntax for for command-line languages and static safety typed parsing arise very interesting problems to me. I will delve into it in future posts.
Elimination
For command-line parsing, OCaml has build-in Sys.argv
and standard library Arg
module (tutorial).
⚠️ We can let Core.Command
return the parsed result with a mutable reference, even its pack function and driver function requires a unit
return value. When a functional program is not a functional program?
# let store = ref "" in
Command.basic ~summary:"fairy file" Command.Param.(map (anon ("filename" %: string)) ~f:(fun file () -> store := file; ()));;
- : Command.t = <abstr>