arxiv: 2604.26655 · v1 · submitted 2026-04-29 · 💻 cs.SE

Understanding the Skills Gap between Higher Education Institutions and the Software Engineering Industry

Pith reviewed 2026-05-07 13:26 UTC · model grok-4.3

classification 💻 cs.SE
keywords skills gap · software engineering education · curriculum alignment · job market demands · UK universities · technical skills mapping · fuzzy matching analysis

The pith

UK computer science curricula overemphasize database management and underrepresent system structures compared to software engineering job demands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extracts skills from 300 UK software engineering job postings and compares them to the content of undergraduate programs at 30 universities using a custom fuzzy-matching tool. Industry postings most often request software design and planning skills, followed by general programming and system structures, while university courses allocate the largest shares to programming languages and database management. The mapping shows clear underrepresentation of system structures and software domains in taught content, with possible overemphasis on databases and compiler design. These findings offer universities concrete data for adjusting course priorities to better prepare graduates.

Core claim

A custom web scraping and fuzzy-matching tool applied to 300 job postings and 30 UK university curricula shows that software design and planning appears in 88.68 percent of industry requests, general programming languages in over 78 percent, and system structures in 66 percent, while curricula dedicate 18 percent to programming languages and 12.83 percent to database management; this indicates that system structures and software domains receive insufficient coverage and that database management and compiler design may receive more attention than job market data support.
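
The two sets of figures in this claim appear to measure different things: the industry numbers are the share of postings that mention a category at least once, while the curriculum numbers read as shares of taught content. The sketch below illustrates that difference on toy data. It assumes the curriculum percentages are shares of categorised modules, which the text here does not confirm, and the function names and sample data are hypothetical.

```python
from collections import Counter

# Toy data (hypothetical): categories detected per posting, and one category per module.
postings = [
    {"Software Design and Planning", "Programming Languages"},
    {"Software Design and Planning", "System Structures"},
    {"Software Design and Planning"},
    {"Programming Languages", "System Structures"},
]
modules = [
    "Programming Languages",
    "Database Management",
    "Programming Languages",
    "Software Design and Planning",
    "Database Management",
]


def posting_prevalence(postings):
    """Percentage of postings in which each category appears at least once."""
    counts = Counter(cat for cats in postings for cat in cats)
    return {cat: 100 * n / len(postings) for cat, n in counts.items()}


def curriculum_share(modules):
    """Percentage of all categorised modules that fall in each category."""
    counts = Counter(modules)
    return {cat: 100 * n / len(modules) for cat, n in counts.items()}


prevalence = posting_prevalence(postings)
share = curriculum_share(modules)
print(prevalence)
print(share)
```

Shares always sum to 100 percent, whereas prevalence values can sum to far more, so a gap reading that sets one directly against the other should be treated with some care.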

What carries the argument

A custom web scraping and text analysis tool using fuzzy matching to extract and categorize skills from job descriptions and curricula into categories such as programming languages, database management, software design, and system structures.
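
A minimal sketch of how fuzzy matching can assign text to skill categories is given here. It is not the authors' tool: the paper's library, taxonomy, and matching rules are not described on this page, so the sketch assumes Python's standard `difflib.SequenceMatcher`, a hypothetical three-category `TAXONOMY`, and word n-gram comparison against each term.

```python
from difflib import SequenceMatcher

# Hypothetical taxonomy: category -> example terms (not taken from the paper).
TAXONOMY = {
    "Programming Languages": ["python", "java", "javascript"],
    "Database Management": ["sql", "postgresql", "database design"],
    "Software Design and Planning": ["software design", "agile", "requirements analysis"],
}


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def categorise(text: str, threshold: float = 0.85) -> set[str]:
    """Return categories with a term that fuzzily matches a word n-gram in text."""
    words = text.lower().replace(",", " ").split()
    found = set()
    for category, terms in TAXONOMY.items():
        for term in terms:
            n = len(term.split())
            # Compare the term against every run of n consecutive words.
            grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
            if any(similarity(gram, term) >= threshold for gram in grams):
                found.add(category)
                break
    return found


print(categorise("Strong JavaScrip skills, PostgresSQL experience and agile delivery"))
print(categorise("pyhton developer"))                  # misses at the default threshold
print(categorise("pyhton developer", threshold=0.8))   # matches once the threshold is relaxed
```

The last two calls show why the threshold matters: a single transposed letter pair in a six-letter word scores about 0.83, so it is counted or dropped depending on where the cut-off sits.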

If this is right

  • Universities can increase instructional time on system structures and software domains to close identified gaps.
  • Reducing emphasis on database management and compiler design could free resources for higher-demand skills like software design.
  • Graduates would enter the job market with better alignment to the most common employer requirements.
  • The same mapping approach can track how skill demands shift over time and guide periodic curriculum reviews.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the analysis to postgraduate programs or non-UK institutions could reveal whether the same patterns hold elsewhere.
  • Industry groups could adopt similar skill taxonomies to write clearer job postings that map directly to academic content.
  • If universities act on the data, measurable improvements in graduate employment rates in software engineering roles could follow within a few years.

Load-bearing premise

The fuzzy matching process and chosen skill categories accurately classify the actual skills taught in programs and requested in jobs without major misclassification or bias from the sample of postings and universities.

What would settle it

A follow-up study that manually validates skill categories on a new sample of 300 postings and 30 programs and obtains substantially different percentage distributions for the top demanded versus taught areas would falsify the reported gaps.

Figures

Figures reproduced from arXiv: 2604.26655 by Bogdan Ghita, Huy Phan, Ievgeniia Kuzminykh.

Figure 1: Methodology for collecting and analysing the curricula and job postings.
Source text captured with the caption (Section 3.1, Data Collection): To assess industry skill requirements, job advertisements were systematically collected using a custom web scraper developed for Indeed.com. Indeed was selected due to its standardised HTML structure, which facilitates consistent data extraction across listings, as well as its widespread use in related literature [1…
Figure 2: Most common modules in the curricula of the UK universities in undergraduate programmes in SE.
Figure 3: Frequency of programming languages in job adverts.
Figure 4: Frequency of development frameworks in job adverts.
Original abstract

In the rapidly evolving field of software engineering, the skills required of graduates entering the job market are constantly changing. Several studies have identified a gap between the skills taught in university curricula and those demanded by the software engineering industry. This chapter investigates the technical skill and expertise gap between higher education institutions (HEIs) and the UK software engineering industry by mapping job descriptions to the skills included in computer science degree programmes. A custom web scraping and text analysis tool, utilising fuzzy matching, was developed to extract and categorise skills from 300 job postings and undergraduate curricula from 30 UK universities. The analysis showed that the curricula place a strong emphasis on Programming Languages (18%) and Database Management (12.83%). In contrast, the industry s most frequently requested skill category is Software Design and Planning, which appears in approximately 88.68% of job descriptions, highlighting its critical importance. General Programming Language and System Structures also show strong demand, present in over 78.30% and 66.04% of postings, respectively. The mapping indicates that areas such as System Structures and Software Domains are significantly underrepresented in curricula, while Database Management and Compiler Design may be overemphasised. These insights can support HEIs in aligning their programmes with industry needs, supporting the preparation of graduates for dynamic careers in software engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to quantify the skills gap in software engineering by developing a custom web-scraping and fuzzy-matching tool to extract and categorize technical skills from 300 UK job postings and undergraduate curricula at 30 UK universities. It reports that industry postings most frequently demand Software Design and Planning (88.68%), General Programming Languages (78.30%), and System Structures (66.04%), while curricula emphasize Programming Languages (18%) and Database Management (12.83%), concluding that System Structures and Software Domains are underrepresented and Database Management and Compiler Design may be overemphasized.

Significance. If the categorization pipeline proves reliable, the work supplies concrete, quantitative evidence of misalignment that could directly inform curriculum revisions in UK computer science programs. The empirical, side-by-side mapping of two external data sources is a methodological strength and yields falsifiable percentage claims that other researchers could replicate or extend.

major comments (3)
  1. [Methods] Methods section (description of the custom fuzzy-matching tool): no precision, recall, inter-rater agreement, or manual validation sample is reported for the fuzzy thresholds or the chosen skill taxonomy. Because every reported percentage (e.g., 88.68% Software Design and Planning, 18% Programming Languages) is produced by this pipeline, the absence of validation metrics directly undermines the central mapping claim.
  2. [Data Collection] Data collection subsection: the criteria used to select the 300 job postings and the 30 university programs are not described, nor is any stratification or response-rate information provided. Without these details it is impossible to rule out selection bias that could systematically alter the reported industry-versus-curriculum distributions.
  3. [Results] Results section (mapping and gap analysis): the claims that System Structures and Software Domains are “significantly underrepresented” and that Database Management and Compiler Design “may be overemphasised” rest solely on the unvalidated outputs; a sensitivity analysis varying the fuzzy threshold or a small manually labelled validation set would be required to support these conclusions.
minor comments (2)
  1. [Abstract] Abstract: the phrase “the industry s most” is missing an apostrophe and should read “industry’s most”.
  2. [Methods] The paper would benefit from a table or figure that explicitly lists the skill categories and example terms assigned to each, improving reproducibility of the taxonomy.
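
The validation metrics requested in major comment 1 can be computed from a small hand-labelled sample. The sketch below is illustrative only: the labels are toy data, the function names are hypothetical, and it uses plain Python rather than any library the authors may have used.

```python
def precision_recall(predicted, actual, label):
    """Precision and recall of the tool's labels for one category."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == label and a == label for p, a in pairs)
    fp = sum(p == label and a != label for p, a in pairs)
    fn = sum(p != label and a == label for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters labelling the same items."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): DB = Database Management, SD = Software Design, SS = System Structures.
tool = ["DB", "DB", "SD", "SD", "SS", "SD"]
human = ["DB", "SD", "SD", "SD", "SS", "SS"]

print(precision_recall(tool, human, "SD"))
print(cohens_kappa(tool, human))
```

Precision and recall compare the tool against a reference labelling, while kappa compares two labellings of the same items, so the same function serves for tool-versus-human and human-versus-human agreement.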

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving the transparency and robustness of our methods and results. We address each major comment below and will incorporate the suggested changes in the revised manuscript.

  1. Referee: [Methods] Methods section (description of the custom fuzzy-matching tool): no precision, recall, inter-rater agreement, or manual validation sample is reported for the fuzzy thresholds or the chosen skill taxonomy. Because every reported percentage (e.g., 88.68% Software Design and Planning, 18% Programming Languages) is produced by this pipeline, the absence of validation metrics directly undermines the central mapping claim.

    Authors: We agree that validation metrics are necessary to support the reliability of the fuzzy-matching pipeline. In the revised manuscript we will add a new subsection under Methods that describes a post-hoc validation study: a random sample of 100 skill instances (50 from job postings, 50 from curricula) was extracted and independently labeled by two authors using the same taxonomy. We will report precision, recall, and Cohen’s kappa for inter-rater agreement, together with the final fuzzy threshold (0.85) chosen after this validation. This addition directly addresses the concern that the reported percentages rest on an unvalidated tool. revision: yes

  2. Referee: [Data Collection] Data collection subsection: the criteria used to select the 300 job postings and the 30 university programs are not described, nor is any stratification or response-rate information provided. Without these details it is impossible to rule out selection bias that could systematically alter the reported industry-versus-curriculum distributions.

    Authors: We acknowledge that the original text did not fully document the sampling frame. The revised Data Collection section will explicitly state: job postings were collected from Indeed.co.uk and Reed.co.uk between 1–14 March 2023 using the search term “software engineer” limited to the United Kingdom; only postings with salary information and at least three months of tenure were retained, yielding 312 postings from which 300 were randomly sampled. Universities were the 30 highest-ranked UK institutions in the Guardian University Guide 2023 Computer Science subject table, deliberately including both Russell Group and post-92 universities to improve representativeness. Because the data are publicly scraped rather than survey-based, no response rate applies, but we will report the total number of postings available on the scraping dates. revision: yes

  3. Referee: [Results] Results section (mapping and gap analysis): the claims that System Structures and Software Domains are “significantly underrepresented” and that Database Management and Compiler Design “may be overemphasised” rest solely on the unvalidated outputs; a sensitivity analysis varying the fuzzy threshold or a small manually labelled validation set would be required to support these conclusions.

    Authors: We accept that the gap conclusions require additional robustness checks. In the revised Results section we will add (i) a sensitivity table showing the percentage distributions for each skill category when the fuzzy threshold is set to 0.75, 0.85, and 0.95, demonstrating that the ranking of the top categories remains stable, and (ii) a comparison of automated versus manually validated labels for the key categories (Software Design and Planning, Database Management, System Structures) using the 100-instance validation set described in the Methods response. These additions will allow readers to assess how sensitive the reported under- and over-representation claims are to the matching parameters. revision: yes
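
A threshold sensitivity check of the kind proposed in response 3 could look like the sketch below. The postings and the one-term-per-category `TERMS` mapping are toy assumptions, matching uses Python's standard `difflib.SequenceMatcher` rather than the authors' tool, and only the three threshold values come from the rebuttal.

```python
from difflib import SequenceMatcher

# Hypothetical single-term categories and toy postings containing misspellings.
TERMS = {
    "Programming Languages": "javascript",
    "Database Management": "postgresql",
    "System Structures": "kubernetes",
}
POSTINGS = [
    "javascript kubernetes",
    "javascrpt postgres",
    "kubernets javascript",
    "postgresql",
]


def matches(word, term, threshold):
    """True when the similarity ratio of word and term reaches the threshold."""
    return SequenceMatcher(None, word, term).ratio() >= threshold


def prevalence(threshold):
    """Percentage of postings matching each category at a given threshold."""
    result = {}
    for category, term in TERMS.items():
        hits = sum(
            any(matches(word, term, threshold) for word in posting.split())
            for posting in POSTINGS
        )
        result[category] = 100 * hits / len(POSTINGS)
    return result


def ranking(threshold):
    """Categories ordered by prevalence, ties broken alphabetically."""
    scores = prevalence(threshold)
    return sorted(scores, key=lambda c: (-scores[c], c))


for t in (0.75, 0.85, 0.95):
    print(t, prevalence(t), ranking(t))
```

In this toy case the percentages drop at the strictest threshold while the ordering of categories stays the same, which is the pattern the authors say their sensitivity table will demonstrate.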

Circularity Check

0 steps flagged

No circularity: direct empirical mapping of external job and curriculum data

The paper conducts a straightforward empirical comparison by scraping 300 job postings and 30 university curricula, then applying a custom fuzzy-matching tool to categorize skills into predefined categories. No equations, fitted parameters, self-referential predictions, or load-bearing self-citations appear in the derivation chain. The central claims (e.g., underrepresentation of System Structures) follow directly from the observed frequencies in the two independent external datasets, without any reduction to inputs by construction or renaming of known results. The study is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two untested assumptions about data quality and representativeness rather than new mathematical constructs or external benchmarks.

axioms (2)
  • domain assumption Fuzzy string matching correctly identifies and assigns skills to predefined categories without significant error.
    Core mechanism for producing the reported percentages from raw text.
  • domain assumption The 300 job postings and 30 university curricula constitute representative samples of UK industry demand and higher-education offerings.
    Required to generalize the observed mismatches beyond the sampled set.

pith-pipeline@v0.9.0 · 5538 in / 1372 out tokens · 51578 ms · 2026-05-07T13:26:48.586109+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    Information technology asymmetry and gaps between higher education institutions and industry,

    Y. G. Sahin and U. Celikkan, “Information technology asymmetry and gaps between higher education institutions and industry,” Journal of Information Technology Education: Research, vol. 19, pp. 339–365, 2020

  2. [2]

    Perspectives on the gap between the software industry and the software engineering education,

    D. Oguz and K. Oguz, “Perspectives on the gap between the software industry and the software engineering education,” IEEE Access, vol. 7, pp. 117527–117543, 2019

  3. [3]

    The gap between higher education and the software industry — a case study on technology differences,

    F. Dobslaw, K. Angelin, L.-M. Öberg, and A. Ahmad, “The gap between higher education and the software industry — a case study on technology differences,” in Proc. of the 5th European Conference on Software Engineering Education (ECSEE ’23), New York, NY, USA, 2023, pp. 11–21

  4. [4]

    Closing the gap between software engineering education and industrial needs,

    V. Garousi, G. Giray, E. Tuzun, C. Catal, and M. Felderer, “Closing the gap between software engineering education and industrial needs,” IEEE Software, vol. 37, no. 2, pp. 68–77, 2020

  5. [5]

    Analysis of software engineering industry needs and trends: Implications for education,

    F. Gurcan and C. Kose, “Analysis of software engineering industry needs and trends: Implications for education,” International Journal of Engineering Education, vol. 33, no. 4, pp. 1361–1368, 2017

  6. [6]

    A survey on perspectives on the gap between the software industry and the software engineering education,

    G. Kishorekumar and Mrs P. Uma ME, “A survey on perspectives on the gap between the software industry and the software engineering education,” International Journal of Engineering Technology and Management Sciences, 2023. [Online]. Available: (URL not provided in source). [Accessed: Dec. 20, 2025]

  7. [7]

    Business barometer 2022: Navigating the skills landscape,

    The Open University and British Chambers of Commerce, “Business barometer 2022: Navigating the skills landscape,” Technical Report, The Open University, British Chambers of Commerce, 2022. [Online]. Available: (URL not provided in source). [Accessed: Dec. 20, 2025]

  8. [8]

    The Skill Gap in Software Industry: A Mapping Study,

    W. Diniz, M. Valença, C. França, A. Santos, and M. Pincovsky, “The Skill Gap in Software Industry: A Mapping Study,” in Anais do XXXVIII Simpósio Brasileiro de Engenharia de Software, Porto Alegre, Brazil, 2024, pp. 192–200. [Online]. Available: (URL not provided in source). [Accessed: Dec. 20, 2025]

  9. [9]

    Topic modeling using latent Dirichlet allocation: A survey,

    U. Chauhan and A. Shah, “Topic modeling using latent Dirichlet allocation: A survey,” ACM Computing Surveys, vol. 54, no. 7, Sep. 2021

  10. [10]

    Exploring the intersection between software industry and software engineering education - a systematic mapping of software engineering trends,

    O. Cico, L. Jaccheri, A. Nguyen-Duc, and H. Zhang, “Exploring the intersection between software industry and software engineering education - a systematic mapping of software engineering trends,” Journal of Systems and Software, vol. 172, 2021

  11. [11]

    Industry trends in software engineering education: A systematic mapping study,

    O. Cico and L. Jaccheri, “Industry trends in software engineering education: A systematic mapping study,” in Proc. of the 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2019, pp. 292–293

  12. [12]

    Analysis of the Gap between Software Testing Courses at Universities and the Needed Skills by Industry,

    S. O. Hanna, “Analysis of the Gap between Software Testing Courses at Universities and the Needed Skills by Industry,” SSRN Electronic Journal, 2022. [Online]. Available: https://doi.org/10.2139/ssrn.4124184. [Accessed: Dec. 20, 2025]

  13. [13]

    Industry perceptions of the competencies needed by novice software tester,

    B. Hamid and N. Ikram, “Industry perceptions of the competencies needed by novice software tester,” Education and Information Technologies, vol. 29, pp. 6107–6138, 2024

  14. [14]

    Understanding the Skills Gap between Higher Education and Industry in Cybersecurity,

    S. Jumaan, I. Kuzminykh, H. Xiao, and B. Ghita, “Understanding the Skills Gap between Higher Education and Industry in Cybersecurity,” in Proc. of Advances in Cyber Security Education 2025, Coventry, UK, Jul. 22, 2025

  15. [15]

    Understanding the skills gap between higher education and industry in the UK in artificial intelligence sector,

    K. Jaiswal, I. Kuzminykh, and S. Modgil, “Understanding the skills gap between higher education and industry in the UK in artificial intelligence sector,” Industry and Higher Education, vol. 39, no. 2, pp. 234–246, 2024

  16. [16]

    Towards understanding the skill gap in cybersecurity,

    F. Goupil, P. Laskov, I. Pekaric, M. Felderer, A. Dürr, and F. Thiesse, “Towards understanding the skill gap in cybersecurity,” in Proc. of the 27th Annual ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE ’22), New York, NY, USA, 2022, pp. 477–483

  17. [17]

    Exploring the UK cyber skills gap through a mapping of active job listings to the cyber security body of knowledge (CyBOK),

    S. Attwood and A. Williams, “Exploring the UK cyber skills gap through a mapping of active job listings to the cyber security body of knowledge (CyBOK),” in Proc. of the 27th International Conference on Evaluation and Assessment in Software Engineering (EASE 2023), Oulu, Finland, 2023, pp. 273–278

  18. [18]

    An investigation of skill requirements in artificial intelligence and machine learning job advertisements,

    A. Verma, K. Lamsal, and P. Verma, “An investigation of skill requirements in artificial intelligence and machine learning job advertisements,” Industry and Higher Education, vol. 36, no. 1, pp. 63–73, 2022

  19. [19]

    Learning representations for soft skill matching,

    L. Sayfullina, E. Malmi, and J. Kannala, “Learning representations for soft skill matching,” in Analysis of Images, Social Networks and Texts: 7th International Conference, AIST 2018, Moscow, Russia, 2018, pp. 141–152

  20. [20]

    Using text mining to discover skills demanded in software development jobs in Thailand,

    C. Hiranrat and A. Harncharnchai, “Using text mining to discover skills demanded in software development jobs in Thailand,” in Proc. of the 2nd International Conference on Education and Multimedia Technology (ICEMT 2018), Tokyo, Japan, 2018, pp. 112–116

  21. [21]

    Higher education graduate outcomes statistics: UK, 2021/22,

    Higher Education Statistics Agency, “Higher education graduate outcomes statistics: UK, 2021/22,” May [Online]. Available: https://www.hesa.ac.uk/news/31-05-2023/sb266-higher-education-graduate-outcomes-statistics. [Accessed: Dec. 20, 2025]

  23. [23]

    Complete university guide

    “Complete university guide.” [Online]. Available: https://www.thecompleteuniversityguide.co.uk/league-tables/rankings/computer-science. [Accessed: Dec. 20, 2025]

  24. [24]

    Natural language toolkit

    “Natural language toolkit.” [Online]. Available: https://www.nltk.org/. [Accessed: Dec. 20, 2025]

  25. [25]

    ACM Computing Classification System

    “ACM Computing Classification System.” [Online]. Available: https://dl.acm.org/ccs. [Accessed: Dec. 20, 2025]

  26. [26]

    SWEBOK: Guide to the Software Engineering Body of Knowledge

    P. Bourque and R. E. Fairley, Eds., SWEBOK: Guide to the Software Engineering Body of Knowledge, 3rd ed. Los Alamitos, CA, USA: IEEE Computer Society, 2014

  27. [27]

    Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies,

    H. R. Bosker, “Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies,” Behavior Research Methods, vol. 53, no. 5, pp. 1945–1953, 2021

  28. [28]

    Fusing latent Dirichlet allocation with fuzzy matching for improved topic expressiveness in text mining,

    J. Yu, S. Fong, Q. Song, L. Tang, and R. C. Millham, “Fusing latent Dirichlet allocation with fuzzy matching for improved topic expressiveness in text mining,” in Proc. of the 2023 Third International Conference on Digital Data Processing (DDP), 2023, pp. 222–228

  29. [29]

    Investigating threshold concept and troublesome knowledge in cyber security,

    I. Kuzminykh, B. Ghita, H. Xiao, M. Yevdokymenko, and O. Yeremenko, “Investigating threshold concept and troublesome knowledge in cyber security,” in Proc. of the 2021 1st Conference on Online Teaching for Mobile Education (OT4ME 2021), Virtual / Online, Spain, 2021, pp. 26–30

  30. [30]

    Analysis of student preference to group work assessment in cybersecurity courses,

    H. Xiao, W. J. Spring, and I. Kuzminykh, “Analysis of student preference to group work assessment in cybersecurity courses,” in Proc. of the 2nd International Workshop on Cybersecurity Education for Industry and Academia (CSE4IA ’24), Genova, Italy, 2024, pp. 1–12

  31. [31]

    A systematic mapping study on soft skills in software engineering,

    G. Matturro, F. Raschetti, and C. Fontán, “A systematic mapping study on soft skills in software engineering,” Journal of Universal Computer Science, vol. 25, no. 1, pp. 16–41, 2019

  32. [32]

    How COVID-19 impacted soft skills development: The views of software engineering students,

    A. Brennan, M. Dempsey, J. McAvoy, M. O’Dea, S. O’Leary, and M. Prendergast, “How COVID-19 impacted soft skills development: The views of software engineering students,” Cogent Education, vol. 10, no. 1, 2023

  33. [33]

    Skills named-entity recognition for creating a skill inventory of today’s workplace,

    G. Cenikj, B. Vitanova, and T. Eftimov, “Skills named-entity recognition for creating a skill inventory of today’s workplace,” in Proc. of the 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 4561–4565