How to write this DCG more elegantly?
Mia Lopez
Playing around with DCGs and stubled upon the following problem:
I want to parse as well as produce an exact amount of spaces (" "). My trivial approach of simply doing this:
trivial_nat_space(0) --> [].
trivial_nat_space(N) --> { N > 0, N is M+1 }, " ", trivial_nat_space(M).failed terribly, because of insufficient instantiation of N and M depending on whether i do
?- String=" ", my_string_codes(String,Codes), phrase(trivial_nat_space(Anz_Spaces), Codes, [])or
?- Anz_Spaces=3,my_string_codes(String,Codes), phrase(trivial_nat_space(Anz_Spaces), Codes, [])where (for convenience)
my_string_codes(S,C) :- when((ground(S);ground(C)), string_codes(S,C)).searching for a nice solution to the problem I made a version that depends on self defined nats:
z.
s(z).
s(s(O)) :- s(O).
nat_num(S,C) :- when((ground(S);ground(C)),nat_num_(S,C)).
nat_num_(z,0) :- !.
nat_num_(s(N),X) :- nonvar(X), !, X > 0, Y is X-1, nat_num_(N,Y).
nat_num_(s(N),X) :- var(X), nat_num_(N,Y), X is Y+1.
n_space(z) --> [].
n_space(s(N)) --> " ", n_space(N).which I would like to avoid because the additional encoding of the natural number is kind of already present in the builtin numbers.
and this:
nat_space(0) --> [].
nat_space(N) --> { var(N) }, " ", nat_space(M), { N is M+1 }.
nat_space(M) --> { nonvar(M), M>0 }, " ", { N is M-1 }, nat_space(N).which does work fine:
?- Anz_Spaces=3,my_string_codes(String,Codes), once(phrase(nat_space(Anz_Spaces), Codes, [])).
Anz_Spaces = 3,
String = " ",
Codes = [32, 32, 32].
?- String=" ",my_string_codes(String,Codes), once(phrase(nat_space(Anz_Spaces), Codes, [])).
String = " ",
Codes = [32, 32, 32],
Anz_Spaces = 3.However the encoding of nat_spaces is (in my opinion) far from nice: it depends on meta-predicates to enforce a specific execution sequence, and (more seriously): if the parser were more complex than just " ", the logic would have to be defined in a seperate DCG predicate/rule resulting in the logic for a single parser/generator to be split into two definitions (the separated one and the one enforcing the correct execution sequence).
Is this the canonical/standard way of solving problems like this or is there a more general, elegant solution that I am missing right now?
Additional Info:
I am using SWI-Prolog version 8.3.9 for x86_64-linux
with :- [library(dcg/basics)] and no additional arguments when starting the runtime. Nor do I set any settings in the file with the definitions.
3 Answers
Frankly, your original definition doesn't fail that terribly. No, it does not fail. For the most general query, it produces one solution,
?- phrase(trivial_nat_space(0), Cs). Cs = [] % pure perfect logic!
; false.
?- phrase(trivial_nat_space(-1), Cs). false. % it's right to be false!
?- phrase(trivial_nat_space(1), Cs). error(instantiation_error,(is)/2). % er...
?- phrase(trivial_nat_space(N), Cs). % most general query Cs = [], N = 0 % again, pure logic
; error(instantiation_error,(is)/2). % mmh...... and otherwise an instantiation error. Instantiation errors are not the worst that can happen. They clearly and honestly state that more information (= instantiations) must be provided before we can continue. That is much better than to pretend everything is fine when it is not. Think of a clerk who asks for more information as producing an instantiation error. And then compare this to one that just fills out your IRS forms with some bold default assumptions1.
To localize the reason for an instantiation error, we will use a failure-slice. So I will throw in some false goals and also an additional instantiation to make it even easier:
trivial_nat_space(0) --> [], {false}. trivial_nat_space(N) --> {N = 1}, { N > 0, N is M+1, false }," ",trivial_nat_space(M). ?- phrase(trivial_nat_space(1), Cs). error(instantiation_error,(is)/2).
This is a pretty disfunctional program! And still it produces an instantiation error. In order to fix your original program we have to modify something in the remaining visible part. In N is M+1 only the M can cause that error. In fact, it occurs here for the first time. We can replace it by M is N-1 which improves your program:
?- phrase(trivial_nat_space(1), Cs). Cs = " " % see section double quotes
; false.
?- phrase(trivial_nat_space(2), Cs). Cs = " "
; false.
?- phrase(trivial_nat_space(N), Cs). Cs = [], N = 0
; error(instantiation_error,(is)/2). % still ...
?- phrase(trivial_nat_space(N), " "). error(instantiation_error,(is)/2).Our program now works at least when the concrete number of spaces is known. Even better, we can also use arithmetic expressions!
?- phrase(trivial_nat_space(4*1), Cs). % let's indent by four spaces Cs = " "
; false.
?- phrase(trivial_nat_space(4*2), Cs). % ... twice ... Cs = " "
; false.
?- phrase(trivial_nat_space(4*0), Cs). % ... none? false.
?- phrase(trivial_nat_space(0), Cs). Cs = [] % here it works
; false.So N may be an arithmetic integer expression, and it works as expected, except for 0 which must be stated literally. That is not really a deep problem, no algebraic properties are violated. But elegant it is not. Let's remember that.
Back to the instantiation errors. To handle these cases as well we need some way to deal with this variable N. The easiest way is to use library(clpz) or its predecessor in SWI library(clpfd) as proposed in another answer. And yes, you can do such things manually, but thereby you are duplicating the work that has been invested into that library. It might make sense for performance reasons sometimes, but it will come at a hefty (bug ridden) price.
So let's look at @gusbro's solution and don't forget to add
:- use_module(library(clpz)). % SICStus or Scryer
:- use_module(library(clpfd)). % SWI?- phrase(trivial_nat_space(N), Cs). Cs = [], N = 0
; Cs = " ", N = 1 % finally, logic!
; Cs = " ", N = 2
; Cs = " ", N = 3
; ... .
?- phrase(trivial_nat_space(N), " "). N = 2
; false.
?- N = 1+1, phrase(trivial_nat_space(N), " "). N = 1+1 % nice like is/2
; false.
?- phrase(trivial_nat_space(N), " "), N = 1+1. false. % and out, why?Everything is nice and dandy, up to the last query. So that extension with arithmetic expressions did not work out so nicely. Effectively it boils down to the following problem:
?- N = 1+1, N #= 2. N = 1+1.
?- N #= 2, N = 1+1. false.In the first query, we solve the integer-equation 1+1 #= 2 which succeeds, and in the second query, we solve N #= 2 which succeeds with N = 2 and then we try to solve 2 = 1+1 which fails.
In other words, that extension into general arithmetic expressions did not work so well for constraints. Before, instantiation errors hid the problem. Now we need to draw somehow the line. And violating commutativity as above is not a nice option2.
The solution is to separate expression variables and integer variables explicitly and insist on fully instantiated expressions.
?- N = 1+1, #N #= 2. error(type_error(integer,1+1),must_be/2)
?- #N #= 2, N = 1+1. false.
?- assertz(clpz:monotonic). true.
?- N #= 2, N = 1+1. error(instantiation_error,instantiation_error(unknown(_102),1)).So now @gusbro's program gets some slight modification:
trivial_nat_space(0) --> [].
trivial_nat_space(N) --> { #N #> 0, #M #= #N-1 }, " ", trivial_nat_space(M).double_quotes
Since you want elegant code, consider to use as a single representation for text: lists of characters. In this manner you avoid all this converting code which will never be elegant. In some systems like Tau, Trealla, and Scryer, double quoted items are chars by default. In SWI proceed like so:
?- L = "ab", L = [_,_]. false.
?- phrase("ab","ab"). false.
?- set_prolog_flag(double_quotes, chars). true.
?- L = "ab", L = [_,_]. L = [a, b].
?- phrase("ab","ab"). true.And with library(double_quotes)
?- L = "ab", L = [_,_]. L = "ab".Grammars
Finally, there is something to note about multi-directional grammar rules in general. Think of a predicate text_ast/2. For one Abstract Syntax Tree, there is an infinity of valid program texts which all differ by trivial paraphernalia like layout text. Therefore, this general relation must not terminate when only the AST is given. So you would need an extra parameter indicating whether the text should be canonical or not.
1 And in fact in DEC10 Prolog the default assumption for variables in arithmetic expressions was the value zero. ISO Prolog has defined instantiation errors for those situations.
2 In SICStus' native library clpfd, the same problem appears with ?- N = 1+1, call(N #= 2). instead.
For your specific example, you can use CLP(fd) to be able to use the DCG in both ways:
trivial_nat_space(0) --> [].
trivial_nat_space(N) --> { N #> 0, M #= N-1 }, " ", trivial_nat_space(M).In the following sample runs I will use backticks (`) to use coded strings:
?- phrase(trivial_nat_space(Anz_Spaces), ` `, []).
Anz_Spaces = 3 ;
false.
?- phrase(trivial_nat_space(3), Spaces, []).
Spaces = [32, 32, 32] ;
false.
?- phrase(trivial_nat_space(N), Spaces, []).
N = 0,
Spaces = [] ;
N = 1,
Spaces = [32] ;
N = 2,
Spaces = [32, 32] ;
N = 3,
Spaces = [32, 32, 32] ... 3 In this case we can avoid explicit arithmetic altogether, and let unification do the work:
:- set_prolog_flag(double_quotes, chars).
spaces --> "".
spaces --> " ", spaces.
n_spaces(N, Spaces) :- length(Spaces, N), phrase(spaces, Spaces).?- n_spaces(N, S).
N = 0,
S = [] ;
N = 1,
S = [' '] ;
N = 2,
S = [' ', ' ']
?- n_spaces(2, S).
S = [' ', ' '].
?- n_spaces(N, " ").
N = 2.We need the double_quotes flag here because (at least in SWI-Prolog 8.0.2) it seems that strings have to be either ground or var, unlike lists which can have variable entries/tails, so I don't think it's possible to use this unification technique with SWI-style strings:
?- string_length(String, Length).
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR: [8] string_length(_10886,_10888)
ERROR: [7] <user>