By Lia Morra
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Good
Content:
Technical Quality of the paper: Excellent
Originality of the paper: Yes
Adequacy of the bibliography: Yes
Presentation:
Adequacy of the abstract: Yes
Introduction: background and motivation: Good
Organization of the paper: Satisfactory
Level of English: Satisfactory
Overall presentation: Good
Detailed Comments:
The manuscript was substantially extended and improved following revision. The pipeline is now much clearer in all its steps, and the inclusion of a second task/dataset supports the applicability and generalizability of the proposed approach.
Still, I think a clarification is needed regarding the selection of the rules used for model mending. On page 17, lines 20-26, the methodology states: “After extracting the rules, we analysed them to identify underlying patterns. […] Therefore, we simplified all these rules into more general candidate hypotheses, such as ‘an image is difficult for the object detection model if it contains a dirtbike facing north’.” This passage suggests that the output of the ILP systems is manually analyzed and combined to obtain the final rules for model mending, which would make the proposed approach a human-in-the-loop system. However, both the overview provided in Fig. 2 and the introduction of the rare slice hypothesis threshold (page 15, lines 22-23) suggest that the proposed pipeline is completely automatic. I do not see any problem with having a human inspection, but the paper should clarify whether a human in the loop is always required, whether human intervention was needed to obtain the results presented in the paper, or, conversely, whether the system is in principle capable of operating in an unsupervised fashion. If a human in the loop is required for model mending, I believe Figure 2 should be amended accordingly.
Either way, I think the paper has merit and the authors have extensively addressed the reviewers’ suggestions; I therefore recommend acceptance.