A survey of neurosymbolic artificial intelligence: foundations, advances, and future trajectories

Tracking #: 933-1956

Flag : Review Received

Authors:

Otto Mättas

Priit Järv

Tanel Tammet

Responsible editor:

Pascal Hitzler

Submission Type:

Survey

Full PDF Version:

nai-paper-933.pdf

Cover Letter:

Dear Editors,

Please consider our manuscript, “A survey of neurosymbolic artificial intelligence: foundations, advances, and future trajectories”, for publication in Neurosymbolic Artificial Intelligence as a Survey submission. We believe it fits the journal's scope and will be valuable to its readership because it provides evidence-aware coverage of a fast-growing and fragmented neurosymbolic landscape, with an interface-centric, evidence-tagged framework for comparing approaches and their evaluation trade-offs in deployable hybrid systems. The survey focuses on 2020–2025 (with foundational anchors for historical context) and organizes the literature around four recurring themes that align with the journal: performance, understandability, reliability, and ethics.

Beyond narrative coverage, we (i) enforce a strict boundary for what counts as neurosymbolic evidence (explicit symbolic representations with defined operators must participate directly in training/inference), avoiding conflation with tool augmentation; (ii) provide an interface-centric synthesis mapped to system functions (perception, knowledge, reasoning, planning/control, oversight), with representative benchmarks and measures; (iii) consolidate how papers evaluate each theme in practice, including commonly reported measures, benchmarks, and reproducibility signals (e.g., code/data availability and ablations when reported); and (iv) analyze recurring cross-theme system-design pitfalls (e.g., cost vs. guarantees; grounding vs. correctness). Our screening covered 912 consolidated records and includes 319 sources in the main compilation, aiming for balanced coverage and clear entry points for researchers, PhD students, and practitioners.

This manuscript is original, has been approved by all authors, and is not under consideration for publication elsewhere. Thank you for your time and consideration.

Seize the day,
Otto Mättas (Corresponding author)

Approve Decision:

Approved

Tags:

Reviewed

Decision:
Major Revision

Solicited Reviews:

Review #1 submitted on 29/Mar/2026

By Anonymous User
Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Good

Content:
Technical Quality of the paper: Good
Originality of the paper: Yes
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average

Detailed Comments:

Evaluation criteria for Survey (NeuroSymbolic Journal):

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

Yes, definitely. However, the paper needs to be improved.

(2) How comprehensive and how balanced is the presentation and coverage; in particular the survey should not be a platform to promote the authors' work.

This is quite balanced, with a focus on papers from 2020-2025.

(3) Readability and clarity of the presentation.

The section numbering is downright bad: the text refers to Section X, but the sections are not numbered at all.
The readability and clarity can be improved. Below are a number of suggestions to improve the paper.

(4) Importance of the covered material to the broader Neurosymbolic AI community. 
Their analysis in themes (performance, understandability, reliability and ethics), function roles (=perception, knowledge, reasoning, planning/control and oversight) and interaction patterns is interesting for the community. I’m not able to check whether the categorization of all papers is correct, but it seems to me an in-depth study.

=======================================================================

This survey is based on papers from 2020-2025. They discuss neurosymbolic AI in four themes: performance, understandability, reliability and ethics. They describe the interface patterns, taking into account the input/output of neural components, or the use of constraints by the neural components. Furthermore they use five function roles in an AI system: perception, knowledge, reasoning, planning/control and oversight.
This survey is about a very specific notion of neurosymbolic AI, namely that the method has at least one neural network component.

Title: A Survey of Neurosymbolic AI: foundations, advances, and future trajectories.
After reading, I asked myself whether it is clear from the paper
what is considered as the “foundations”, “advances”, and “future trajectories”.
I think the authors can emphasize this a bit more in the conclusion.

The authors mention the following contributions:
(i) Foundational anchors. (Q: Are these the 4 themes, the function roles, and the interaction patterns?)
(ii) Mapping methods x system functions (+ benchmarks and evaluation measures). Q: So this is an evaluation of the functions (= perception, knowledge, reasoning, planning/control and oversight)?
(iii) Evaluation of each theme (evaluation measures, benchmarks, reproducibility signals). (Which table?)
(iv) Theme analysis of system-design pitfalls (e.g., cost vs. guarantees).
Do the authors mean trade-offs instead of pitfalls? (Described via interface patterns.)
(v) Future directions / challenges (concrete evaluation criteria and test considerations). Do the authors mean evaluation criteria for functions or for themes?
My suggestion is to refer to the tables (sections) that belong to the above contributions.

Where do those five function roles in an AI system (perception, knowledge, reasoning, planning/control and oversight) come from?
Two comments on those five chosen function roles: (1) Knowledge seems a strange “type” compared with the other roles, and (2) would “explaining” not also be a function role?

How do the authors consider the “structures traces” as a reasoning operator?

Section Background:
(In the classical sense….). Decide whether it is important what is written in brackets. If it is important then remove the brackets (include it in the main text, and make two sentences), or delete (…).

Problem Statement:
Make more explicit that there is a need for benchmarking measures.

Page 6: “AI subdomains”. Do the authors mean “the function roles in AI systems”, or do they mean a domain like medical or subdomains like NLP and vision?

Page 6: What is the difference between this survey and the surveys from (Bubeck 2023, Rezazadegan 2024)? Do they have a different focus, and what are the main takeaways from those surveys? (Later I saw Table 11.) It might be good to add a sentence here (page 6) about the main difference.

Page 6: What are end-to-end implications?
“A consistent mapping to system functions and evaluation measures”. Those evaluation measures are for the four themes or for the system functions? I would expect for the four themes.

Comments wrt. Table 2:
Page 7: confusing remarks about table 2:
[1] (ii) table 2 is a mapping from methods to system functions (with benchmarks, evaluation measures)
[2] In the text: “The summary matrix of Table 2 provides a consistent thread for mapping advances to goals, placing results within a practical system setting, and clarifying where knowledge and explanations originate.”
[3] Caption table 2: survey theme, system function, interface pattern, evaluation levers.

Where are the benchmarks from [1] in the table?
Wrt [2]. The goals are the functions or the themes? What about the Knowledge and Explanations?
Which interface patterns do you consider? This is not clear.
Are the “typical methods (example)” the interface patterns? (It seems they do not always match the interaction patterns in Table 4.)

Page 8: Overview of the paper.
Section numbers do not make any sense.

Page 9: Source selection. The authors submit a survey to the Neurosymbolic Journal, but they do not mention the journal as “journals relevant to Neurosymbolic AI and KR”.

Appendix: Can you motivate why the queries in the table are dependent on the source?

Page 9:
Items were tagged with a primary tag and optionally with a secondary tag.
Which “tags” have been used? Are they based on the papers, or did you start with a set of tags and then adapt the set based on the papers?

Critical analysis:
(i) Problem abstractions and integration patterns
(ii) evaluation designs, datasets, evaluation measures
(iii) limitations and threats to validity

Q: Do the authors mean the same thing by the problem abstractions and the system functions?

Then they summarize again what to expect:
System functions - evaluation per theme - reproducibility (including ablations for robustness) - discussion&future work.
New is the reproducibility aspect?

Earlier (overview of the paper page 8), they mentioned:
Theme organization:
Problem framing, representative advances, Evaluation/benchmarks, Limitations, Takeaway.

Does Table 1 give the possible interface patterns of the neurosymbolic systems? I think “neuro —> symbolic” is an interface pattern. However, the “cost profile” does not seem to me to be an interface pattern.
What are the dimensions in table 1? Or are those the tags?

Page 11,12:
A number of citation roles are given: spine, pattern exemplar, evidence citation, context/background, position/opinion.
Q1: Where are those citation roles used? Is Table 1 of the type “Spine”? If so, add this to the caption of Table 1.

Page 12: Last sentence: “This table illustrates the protocol in Table 3” Which table?

Page 13: Can the authors clarify “it does not imply that all listed systems are directly comparable or that any dimension transfers across tasks”?

Note on page 13:
- “this table” —> table 4.
- “Problem setting” = functional role.

Table 4: I expected the function roles here (= perception, knowledge, reasoning, planning/control and oversight).
Where do those “problem settings” come from?

I would like to suggest explaining the relation between Table 1, Table 2 and Table 4 more explicitly.

Page 14: Use the same names for the themes, so the first section is “Performance”. (Performant AI: …)

Page 14: Refer in the text to Table 5.

Page 21: “we review works that make AI reasoning more interpretable”. Do the authors mean “AI” or “Neuro”?
Page 27: “In the running example, …” Eh which running example?
Page 29: “In the running example, …” Which running example?

Page 30-34: this part needs to be improved.
Page 30-34: What is the story line of this section, and how fits this in the previous mappings (interfacing patterns, themes, function roles)
Those pages are a bad read. They seem rather separate sentences, often long sentences, and each sentence ends with a reference.
Page 32: “section ??”
Page 32: “same running scenarios used throughout this paper: a manufacturing maintenance copilot (section 8)”?? This is the first time that the copilot is mentioned.
Page 39: I would remove the “Outlook” paragraph. The authors already have a good section, Future Directions and Challenges. What do the authors mean by saying the field can move toward systems that are dependable systems?

Suggestion: including some quantitative insights from your work would be interesting as well.

Review #2 submitted on 13/Apr/2026

By Md Kamruzzaman Sarker
Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average

Content:
Technical Quality of the paper: Good
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes, but see detailed comments

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Poor
Level of English: Satisfactory
Overall presentation: Average

Detailed Comments:

Content:
The paper makes a good contribution through its goals-first, theme-based organization and the interface-centric coding dimensions, which offer a different view than purely taxonomy-driven predecessors. The evidence-tagging protocol also adds methodological rigor not commonly seen in NeSy surveys. However, the survey's novelty is constrained by the fact that several recent surveys (Colelough & Regli 2025; Michel-Deletie & Sarker 2025; Bhuyan et al. 2024; DeLong et al. 2025) discuss similar themes. While the authors make reasonable differentiation arguments in Table 11, the incremental positioning relative to these concurrent works could be argued more forcefully.

The coverage of neurosymbolic approaches in computer vision beyond VQA settings (e.g., scene understanding, autonomous driving perception) appears sparse relative to their activity in the literature.

The acknowledgment section notes the use of GPT-based tools for manuscript preparation, which is appropriate transparency; however, the bibliography would benefit from more explicit versioning of preprints versus published work, since several arXiv entries appear without journal/conference venue confirmation. More information about the selection process for the arXiv papers would be helpful.

Presentation:

Section numbering is inconsistent and opaque. The roadmap in the Introduction references "Section 9," "Section 14," "Section 21," "Section 25," "Section 28," "Section 30," "Section 34," "Section 36," and "Section 37," but readers of the submitted manuscript cannot verify these numbers as the visible section headers use descriptive titles without matching numbers.

One limitation is that the "Novelty" subsection, while informative, would benefit from a sharper contrast with the two or three most directly competing concurrent surveys rather than the broader field. The comparison with previous works and theories could be placed at the beginning of the paper rather than at the end.

The Conclusion is well-structured in its four-part thematic recap, but the "What we do not claim" paragraph, while valuable, reads as a defensive addendum rather than an integrated conclusion. Incorporating these caveats more organically throughout the thematic sections would strengthen the overall argumentation.

Table 4 is not easily readable and should be adjusted for better readability.

More issues:
The manuscript contains at least one explicit broken reference ("Section ??") and a pattern of section numbers that do not exist in the paper. Perhaps the section numbers are page numbers or something?

The Ethical AI section is substantially shorter and less technically detailed than the other three themes. While ethics is acknowledged as harder to operationalize empirically, the section would benefit from additional representative systems with interface-centric coding comparable to Tables 6–8.

Overall, this survey makes a meaningful contribution to the NeSy AI literature through its theme-based organization, interface-centric coding framework, and explicit evidence protocol. The core methodological apparatus is sound and the coverage is broad. However, the structural inconsistencies, unresolved cross-references, uneven theme depth, and the underdeveloped Ethics section require significant revision before the paper is ready for publication.

Review #3 submitted on 17/Mar/2026

By Anonymous User
Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Weak

Content:
Technical Quality of the paper: Average
Originality of the paper: Yes, but limited
Adequacy of the bibliography: Yes

Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Limited
Organization of the paper: Needs improvement
Level of English: Satisfactory
Overall presentation: Average

Detailed Comments:

The paper is a survey of neurosymbolic AI in general and spans the time between 2020 and 2025. The survey collects and sorts a wide variety of relevant work. However, I think the scope of the paper is too wide: the area of neurosymbolic AI spans such a variety of totally different topics that it is hard to follow the paper, as each of these many different research lines is mentioned only quite briefly.
The thematic organization aims at solving this issue by sorting the approaches into themes and subthemes.
However, this thematic organization leads to the fact that there are many different approaches in each theme. That makes it hard to understand regularities between the approaches.
It would possibly be easier to use one of the standard classifications of NeSy approaches, e.g., by Kautz or others, and do such a thematic organization inside the different categories. Then it would be easier to understand commonalities and differences of the approaches.
Additionally, the four themes are not disjoint at all: many of the approaches aim at increasing the result quality in several themes. E.g., an explainable and interpretable approach mostly also improves ethical aspects and allows for safety proofs.
These four categories need a better justification (especially with a focus on why exactly these categories have been chosen), a better definition, and in particular a clearer delineation from related themes. For instance, the second theme is sometimes called "explainability", in the heading it is called "understandable", and in the text "interpretable". However, these three notions, though they often occur together, are different.
Therefore, though the paper presents a detailed collection of relevant papers, either a different sorting scheme or a more detailed categorization would be needed to get a thorough overview of nesy-AI.

Minor issues:
The idea of a running example seems good; however, it needs to be mentioned more often, ideally in an example environment, and there should be a reference to the introduction of this example every time it is mentioned. Otherwise, the running example is hard to understand.
The tables are hard to understand, as the columns are too narrow. Consider Table 6 as an example: especially in the scope column, it is not clear where one row ends and the next starts. The "Tag", on the other hand, is always "measured" and thus does not need to be included there. The idea of having those tables seems promising; however, it would be helpful to work out commonalities and differences in the table better, as for now it is hard to gain insights from it that go beyond the fact that these approaches are different.
Additionally, all references to sections are wrong, as there are no section numbers (see, e.g., the last line of the caption of table 6 but also throughout the paper).
