arxiv: 2606.02983 · v1 · pith:2ZXTE6Y6new · submitted 2026-06-02 · 💻 cs.CL

A Locally Deployed RAG-Based Academic Advising System for Course Selection

Pith reviewed 2026-06-28 10:59 UTC · model grok-4.3

classification 💻 cs.CL
keywords RAG · academic advising · course selection · prerequisites · local deployment · large language models · syllabus data · privacy-preserving

The pith

A locally deployed RAG system retrieves structured syllabus data to advise students on course sequences and prerequisites.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a system that combines large language models with retrieval from structured syllabus information to assist students in selecting courses, understanding prerequisites, and creating personalized study plans. It targets the problems of student confusion from information overload and institutions' limited ability to provide individual academic advice. The local deployment keeps all processing on-site to avoid sharing student data externally. A sympathetic reader would care because accurate course sequencing supports better knowledge building while reducing reliance on scarce advising resources.

Core claim

The authors describe a locally deployed RAG-based academic advising system grounded in syllabus information. By combining large language models with retrieval from structured syllabus data, the system supports course selection, prerequisite understanding, and personalized study planning in a privacy-preserving manner.

What carries the argument

The RAG pipeline itself: retrieval from structured syllabus data grounds the LLM's responses about course sequences and prerequisites.
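
The grounding step can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the chunk fields (`course`, `text`), the token-overlap scorer standing in for a real retriever, and the prompt wording. This page does not describe the paper's actual retrievers or prompt in enough detail to reproduce them.

```python
import re
from collections import Counter

# Hypothetical syllabus chunks; field names are illustrative, not the paper's schema.
CHUNKS = [
    {"course": "Linear Algebra I", "text": "Prerequisites: none. Covers vectors and matrices."},
    {"course": "Machine Learning", "text": "Prerequisites: Linear Algebra I and Probability. Covers regression."},
    {"course": "Probability", "text": "Prerequisites: Calculus I. Covers random variables."},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, chunks, k=2):
    """Rank chunks by token overlap with the query (a stand-in for BM25 or dense retrieval)."""
    q = Counter(tokenize(query))
    scored = []
    for chunk in chunks:
        c = Counter(tokenize(chunk["course"] + " " + chunk["text"]))
        overlap = sum(min(q[t], c[t]) for t in q)
        scored.append((overlap, chunk))
    # Sort on the score only; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:k] if score > 0]

def build_prompt(query, retrieved):
    """Assemble a prompt that restricts the model to the retrieved excerpts."""
    context = "\n".join(f"[{c['course']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the syllabus excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

query = "What are the prerequisites for Machine Learning?"
top = retrieve(query, CHUNKS)
prompt = build_prompt(query, top)
```

The resulting `prompt` would then be passed to a locally hosted model; that call is omitted because the serving stack is not specified here.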

If this is right

  • Students receive guidance on correct course sequences based on retrieved prerequisite data.
  • Institutions can extend advising capacity without additional staff time.
  • All advising occurs locally so student queries and data remain private.
  • The system reduces student confusion caused by recognition limits and information overload.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same retrieval-plus-LLM pattern could be applied to other structured educational documents such as degree audits or degree maps.
  • Accuracy would likely improve if the system were connected to live enrollment data to flag closed sections.
  • Maintenance of the underlying syllabus database becomes the new bottleneck once the LLM layer is in place.

Load-bearing premise

Structured syllabus data is complete, accurate, and sufficient for the LLM to correctly interpret and chain prerequisites without introducing errors or hallucinations.
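
One way to probe this premise before any LLM is involved is to check the syllabus data directly. The sketch below assumes a hypothetical table mapping each course code to its direct prerequisites (all codes are invented). It reports prerequisites that have no syllabus entry and computes the full prerequisite chain of a course, raising an error if the data contains a cycle.

```python
# Hypothetical prerequisite table; course codes are invented for illustration.
PREREQS = {
    "CALC1": [],
    "PROB": ["CALC1"],
    "LINALG": [],
    "ML": ["PROB", "LINALG"],
    "DL": ["ML", "OPT"],  # "OPT" has no syllabus entry: a gap in the data
}

def dangling_prerequisites(prereqs):
    """Return prerequisite codes that are referenced but have no syllabus entry."""
    known = set(prereqs)
    return sorted({p for reqs in prereqs.values() for p in reqs if p not in known})

def all_prerequisites(course, prereqs):
    """Collect direct and indirect prerequisites of a course; raise on cycles."""
    result = set()

    def visit(node, path):
        for p in prereqs.get(node, []):
            if p in path:
                raise ValueError("cycle detected: " + " -> ".join(path + (p,)))
            if p not in result:
                result.add(p)
                visit(p, path + (p,))

    visit(course, (course,))
    return result

print(dangling_prerequisites(PREREQS))
print(sorted(all_prerequisites("ML", PREREQS)))
```

A non-empty result from `dangling_prerequisites` means the corpus is incomplete in exactly the way the premise rules out.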

What would settle it

Running the system on a real university syllabus and checking whether it recommends any course sequence that violates documented prerequisites or invents nonexistent prerequisite links.
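
The check described above could be sketched as follows. The course codes, the `DOCUMENTED` table, and both function names are hypothetical. The sketch assumes the system's output has already been parsed into an ordered course list and a set of claimed (prerequisite, course) links, which is itself a nontrivial step.

```python
# Hypothetical documented prerequisites (ground truth taken from the syllabus).
DOCUMENTED = {
    "CALC1": set(),
    "PROB": {"CALC1"},
    "LINALG": set(),
    "ML": {"PROB", "LINALG"},
}

def sequence_violations(sequence, documented):
    """Return (course, missing_prereq) pairs where a course comes before a documented prerequisite."""
    taken = set()
    violations = []
    for course in sequence:
        for prereq in sorted(documented.get(course, set())):
            if prereq not in taken:
                violations.append((course, prereq))
        taken.add(course)
    return violations

def invented_links(claimed, documented):
    """Return claimed (prereq, course) links that the syllabus does not document."""
    return sorted(
        (pre, course)
        for pre, course in claimed
        if pre not in documented.get(course, set())
    )

good = ["CALC1", "LINALG", "PROB", "ML"]
bad = ["ML", "CALC1", "PROB", "LINALG"]
print(sequence_violations(good, DOCUMENTED))
print(sequence_violations(bad, DOCUMENTED))
print(invented_links([("CALC1", "PROB"), ("LINALG", "PROB")], DOCUMENTED))
```

An empty result from both functions across a real syllabus corpus would support the system's reliability; any non-empty result is a concrete counterexample.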

Figures

Figures reproduced from arXiv: 2606.02983 by Feng Li, Yoritaka Iwata.

Figure 2
Figure 2: Example of a metadata-enriched chunk; the original language is Japanese. 3.3. Retrieval Strategies. To evaluate how retrieval methods behave across different query types, this study compares three strategies: BM25-based lexical retrieval, embedding-based semantic similarity retrieval, and hybrid retrieval. 3.3.1. BM25-based lexical retrieval. BM25-based lexical retrieval is configured to evaluate keywor…
Figure 3
Figure 3: Refusal rate of the LLM-only baseline on 75 answerable queries without provided syllabus information.
Figure 4
Figure 4: Retrieval performance on all answerable queries.
Figure 5
Figure 5 presents the results for Lexical Match Queries. Because queries contain explicit keywords from one gold chunk, BM25 was expected to perform strongly at Top-1. However, the actual results show that all three retrievers achieved high Recall@K, and dense retrieval achieved the highest Precision@K. This suggests that even for keyword-overlap queries, dense retrieval can still be competitive in the syllabus cor…
Figure 7
Figure 7 summarizes the results for Multi-evidence Synthesis Queries. Given that hybrid retrieval combines lexical and semantic signals, it was expected to outperform the other retrieval strategies in this setting. However, the results show only a small difference between hybrid and dense retrieval. Both strategies achieved higher Recall@K than BM25, while BM25’s Precision@K decreased more rapidly as Top-…
Figure 8
Figure 8: Generation performance for all answerable queries.
Figure 9
Figure 9 demonstrates that the decline differed across retrieval strategies. These queries have no supported evidence in the syllabus corpus, so the system is expected to refuse to answer all of them. The results show that Refusal Rate generally decreased as Top-k increased. This indicates that increasing the amount of input context weakened the model’s boundary capability. Although the input context did not provid…
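
The figure excerpts report Recall@K, Precision@K, and Refusal Rate. A minimal sketch of the usual definitions follows. The paper's exact formulas and its method for detecting a refusal are not given in the excerpts, so the keyword-based `looks_like_refusal` helper, the chunk IDs, and the sample responses are assumptions for illustration only.

```python
def precision_at_k(retrieved, gold, k):
    """Fraction of the top-k retrieved chunk IDs that are gold (relevant), dividing by k."""
    if k <= 0:
        return 0.0
    return sum(1 for cid in retrieved[:k] if cid in gold) / k

def recall_at_k(retrieved, gold, k):
    """Fraction of gold chunk IDs that appear among the top-k retrieved."""
    if not gold:
        return 0.0
    return len(set(retrieved[:k]) & set(gold)) / len(gold)

def refusal_rate(responses, is_refusal):
    """Fraction of responses that the supplied predicate classifies as refusals."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if is_refusal(r)) / len(responses)

def looks_like_refusal(response):
    # Hypothetical keyword check; a real evaluation would need a more robust judge.
    return "cannot answer" in response.lower()

retrieved = ["c3", "c1", "c7", "c2"]  # ranked chunk IDs returned by a retriever
gold = {"c1", "c2"}                   # chunk IDs annotated as relevant

print(precision_at_k(retrieved, gold, 2))  # 0.5
print(recall_at_k(retrieved, gold, 2))     # 0.5
print(recall_at_k(retrieved, gold, 4))     # 1.0

responses = [
    "I cannot answer that from the syllabus.",
    "The course requires Calculus I.",
    "I cannot answer this question.",
    "I cannot answer without syllabus evidence.",
]
print(refusal_rate(responses, looks_like_refusal))  # 0.75
```

For unanswerable queries a higher refusal rate is better, while for answerable queries (as in the LLM-only baseline) it measures how often the model declines despite an answer existing.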
Original abstract

The correct sequence of courses in the curriculum based on prerequisites between courses is of great importance for students to develop their knowledge and skills holistically. However, students crafting this sequence in isolation frequently struggle with recognition limitations and information overload that leads to confusion. Simultaneously, education institutions encounter difficulties in providing adequate academic advice for the correct sequence due to limited education resources. To address these challenges, we propose a locally deployed RAG-based academic advising system grounded in syllabus information. By combining large language models with retrieval from structured syllabus data, the system is designed to support course selection, prerequisite understanding, and personalized study planning in a privacy-preserving manner.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to address challenges in academic advising by proposing a locally deployed RAG-based system that combines LLMs with retrieval from structured syllabus data to support course selection, prerequisite understanding, and personalized study planning while preserving privacy.

Significance. If implemented and validated, the system could mitigate information overload for students and resource constraints for institutions through a privacy-preserving AI tool. The local deployment aspect is a notable strength for handling sensitive educational data. However, as a design proposal without empirical results, its significance is primarily conceptual at this stage.

major comments (2)
  1. [Abstract] The abstract presents the system as designed to support 'prerequisite understanding' and 'personalized study planning' without any accompanying experiments, metrics, or validation against ground-truth prerequisite chains or student outcomes, rendering the effectiveness claims untested.
  2. No description is provided of verification steps, consistency checks, or quantitative evaluation to ensure that the retrieved syllabus data enables correct prerequisite chaining by the LLM, despite this being central to the system's reliability.
minor comments (1)
  1. Consider adding a dedicated section on potential limitations, such as syllabus data incompleteness, to provide a balanced view of the proposal.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need to clarify the scope and reliability aspects of our proposed system. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The abstract presents the system as designed to support 'prerequisite understanding' and 'personalized study planning' without any accompanying experiments, metrics, or validation against ground-truth prerequisite chains or student outcomes, rendering the effectiveness claims untested.

    Authors: The manuscript presents a conceptual system design proposal rather than an empirical evaluation study. The abstract describes the intended design goals of the RAG-based architecture. We will revise the abstract to explicitly note that this is a proposed system without empirical validation at this stage and add a dedicated section outlining planned evaluation approaches, including potential metrics for prerequisite chaining accuracy. revision: yes

  2. Referee: No description is provided of verification steps, consistency checks, or quantitative evaluation to ensure that the retrieved syllabus data enables correct prerequisite chaining by the LLM, despite this being central to the system's reliability.

    Authors: The current version focuses on the high-level architecture and does not detail verification protocols. We agree this omission weakens the reliability discussion. In revision, we will add a subsection describing design-level verification steps such as consistency checks between retrieved syllabus data and prerequisite structures, along with qualitative review processes for LLM-generated chains, without claiming quantitative results. revision: yes

Circularity Check

0 steps flagged

No circularity: high-level system proposal with no derivations or fitted predictions

Full rationale

The paper describes a proposed RAG-based architecture for academic advising using LLMs and structured syllabus data. It contains no equations, no parameter fitting, no predictions of derived quantities, and no self-citations invoked as uniqueness theorems or load-bearing premises. The central description is a straightforward application of existing retrieval-augmented generation techniques to course selection; all claims remain at the level of system design without any reduction of outputs to inputs by construction. This is the expected outcome for a non-mathematical engineering proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As an abstract-only system proposal, the ledger contains only domain assumptions about data quality; no free parameters or invented entities are introduced.

axioms (1)
  • Domain assumption: Syllabus information is structured and sufficient to enable accurate prerequisite understanding and course sequencing via retrieval and LLM generation.
    Rationale: The proposal is explicitly grounded in syllabus information as the core data source for the RAG system.

pith-pipeline@v0.9.1-grok · 5626 in / 1171 out tokens · 25039 ms · 2026-06-28T10:59:26.352301+00:00 · methodology

