By David Tena Cucala
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Weak
Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Bad
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Weak
Detailed Comments:
This paper introduces a novel neuro-symbolic method for a prediction task based on a simplification of the Raven’s Progressive Matrices (RPM) intelligence test. The symbolic descriptions of eight geometric figures (of varying shape, colour, etc.) are provided, and the goal is to predict the ninth figure by inferring the underlying pattern. The proposed method uses a differentiable rule-learning approach: the attributes of the known figures are embedded in a high-dimensional vector space, and the system learns a vector transformation that produces the prediction; this transformation can be interpreted as a rule. The authors show that the prediction accuracy of the system surpasses state-of-the-art methods based on large language models.
Overall, the article is written in good English, is well-structured, and appears to be technically sound. In my opinion, however, there are three important weaknesses.
First, the presentation of the paper is often lacking in detail and intuition, which makes it difficult to fully comprehend and evaluate the proposed approach. For example, the space of rules is defined in equations (2) and (3), but the definitions of x_i and o_j are only given informally, in lines 21-28 of page 8 and through an example; I found this insufficient to understand these definitions. Furthermore, no intuition is given for why equation (2) is chosen as the general shape of the rules. It would be important to expand on this and discuss the expressive power of this rule language; in particular, to establish whether it is expressive enough to capture the rules used to generate the dataset. For more examples of parts of the paper that could be clarified, please see the detailed comments below.
Second, the contribution of the work seems comparatively small. The abstract suggests that the main contribution of the paper is an experimental comparison between the neuro-symbolic approach ARLC and large-language-model-based approaches to the task described above. Reading the Aims and Scope section of the Neurosymbolic Artificial Intelligence journal, it is unclear to me whether experimental reports alone are regarded as a sufficient contribution.
Third, the motivation for comparing ARLC with large language models is unclear. In this setting, the space of rules used to generate the patterns appears to be known (even if the rules themselves are not), so why not apply a standard Inductive Logic Programming (ILP) approach? In fact, it is surprising that ILP is not mentioned at all in the paper, and that no ILP systems are used in the experiments, even though ILP methods would appear to be the most naturally suited to the given task.
My overall recommendation, assuming the journal is amenable to publishing work focused on experimental reporting, would be to accept a revised version of the paper that provides additional clarity and intuitions (please see list below), and develops the motivation further.
Detailed Comments:
--I am also confused about what “constellation” means. In particular, what are the 2x2, 3x3, and center constellations? Without explaining this, the task description is quite difficult to follow. It is also hard to evaluate the article’s decision to focus on the center constellation only.
--I am missing a discussion of whether the original goals of the test are lost when the images are translated to symbolic descriptions. The symbolic description could be seen as a simplification of the task, since it explicitly lists the attributes of the figures that are relevant to the task (in contrast to the original task, where no such list of attributes is provided and, in fact, figuring out which attributes are relevant is a non-trivial reasoning task).
--To make the paper self-contained and the prediction task understandable, it would be important to explain the four rules used to generate the figures in the right column of the matrices. The text simply names them as “constant”, “progression”, “arithmetic”, and “distribute three”. One can later deduce them from the definition of the 3x10 extension of the RAVEN problem, but it would be very helpful to describe them explicitly.
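For concreteness, here is my own understanding of these four rules, based on the RAVEN dataset literature rather than on this paper (the authors should confirm whether these definitions match theirs). Each rule governs how one attribute's integer value evolves across a row of the matrix:

```python
def is_constant(row):
    # "constant": the attribute value is the same in all three cells of the row
    return row[0] == row[1] == row[2]

def is_progression(row):
    # "progression": the value changes by a fixed non-zero increment across the row
    step = row[1] - row[0]
    return step != 0 and row[2] - row[1] == step

def is_arithmetic(row):
    # "arithmetic": the third value is the sum or the difference of the first two
    return row[2] == row[0] + row[1] or row[2] == row[0] - row[1]

def is_distribute_three(rows):
    # "distribute three": the same set of three values appears, permuted,
    # in every row of the matrix (this rule constrains all rows jointly)
    return all(sorted(r) == sorted(rows[0]) for r in rows)
```

An explicit description along these lines (even informal) would let the reader judge whether the rule language of equation (2) can capture all four rules.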
--Page 3, line 31: what is an “attribute bisection tree”? Also, what is the “context matrix generation”?
--The number of attributes should be clarified, in addition to the number of possible values per attribute (denoted “m” in the text, if I understand correctly).
--I do not understand why the RAVEN dataset is extended with additional columns. The article mentions that this is done to test “scalability” of the approach, but I am not sure whether this is a relevant concept. In this task, having more columns means having a larger number of examples, which could actually help discriminate better between patterns used to generate the sequence.
--A reader's lack of familiarity with “vector-symbolic architectures” (VSAs) may prevent them from fully understanding the approach. It would be helpful to explain the binding, unbinding, and bundling operations.
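To illustrate the kind of explanation I have in mind: in one common VSA (the multiply-add style with random bipolar vectors; this is my own illustration, not necessarily the architecture used in the paper), the three operations can be sketched in a few lines:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000  # high dimensionality makes independent random vectors quasi-orthogonal

def rand_vec():
    # random bipolar (+1/-1) vector representing an atomic symbol
    return rng.choice([-1, 1], size=d)

def bind(a, b):
    # binding: element-wise multiplication (associates a role with a filler)
    return a * b

def unbind(a, b):
    # unbinding: multiply again, since b * b = 1 element-wise (self-inverse)
    return a * b

def bundle(*vs):
    # bundling: element-wise majority vote (superposes several items)
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):
    # normalised dot-product similarity in [-1, 1]
    return float(np.dot(a, b)) / d

# encode a figure as a bundle of attribute-value bindings
shape, colour = rand_vec(), rand_vec()
triangle, red = rand_vec(), rand_vec()
figure = bundle(bind(shape, triangle), bind(colour, red))

# unbinding with the `colour` role recovers a vector similar to `red`
recovered = unbind(figure, colour)
print(sim(recovered, red))       # clearly positive
print(sim(recovered, triangle))  # near zero
```

Even a short paragraph of this flavour in the paper would make the encoding of figures (and the interpretation of the learned transformation) much easier to follow.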
--It would also be helpful to give some intuition for why encoding into a VSA is a good idea. Why use blocks?
--Why have two blocks of six coefficients in Equation (2)?
--Section 4.3 talks about “all rules” – but over which set of rules does this quantify?