Understanding the Skills Gap between Higher Education Institutions and the Software Engineering Industry
Pith reviewed 2026-05-07 13:26 UTC · model grok-4.3
The pith
UK computer science curricula overemphasize database management and underrepresent system structures compared to software engineering job demands.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A custom web-scraping and fuzzy-matching tool was applied to 300 job postings and the undergraduate curricula of 30 UK universities. On the industry side, software design and planning appears in 88.68 percent of job postings, general programming languages in over 78 percent, and system structures in 66 percent. On the curriculum side, 18 percent is dedicated to programming languages and 12.83 percent to database management. The paper concludes that system structures and software domains receive insufficient coverage, and that database management and compiler design may receive more attention than job market data support.
What carries the argument
A custom web scraping and text analysis tool using fuzzy matching to extract and categorize skills from job descriptions and curricula into categories such as programming languages, database management, software design, and system structures.
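To make the mechanism concrete, here is a minimal sketch of fuzzy skill categorisation. It is not the paper's tool: the taxonomy terms, the function names `categories_in` and `posting_share`, the word n-gram comparison, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration. The default threshold of 0.85 is the value quoted in the simulated rebuttal further down, not one confirmed by the paper.

```python
from difflib import SequenceMatcher

# Hypothetical taxonomy: category -> example terms (NOT the paper's actual lists).
TAXONOMY = {
    "Programming Languages": ["python", "java", "javascript"],
    "Database Management": ["sql", "postgresql", "database design"],
    "Software Design and Planning": ["software design", "design patterns", "agile"],
}


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def categories_in(text: str, threshold: float = 0.85) -> set[str]:
    """Categories with at least one term that fuzzily matches a word n-gram of text."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    found = set()
    for category, terms in TAXONOMY.items():
        for term in terms:
            n = len(term.split())  # compare against n-grams of the same word count
            ngrams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
            if any(similarity(term, gram) >= threshold for gram in ngrams):
                found.add(category)
                break  # one matching term is enough for this category
    return found


def posting_share(postings: list[str], threshold: float = 0.85) -> dict[str, float]:
    """Percentage of postings that mention each category at least once."""
    counts = {category: 0 for category in TAXONOMY}
    for posting in postings:
        for category in categories_in(posting, threshold):
            counts[category] += 1
    return {c: 100 * k / len(postings) for c, k in counts.items()}


postings = [
    "Strong Python skills and knowledge of design pattern usage.",
    "SQL and JavaScripts experience required.",
]
shares = posting_share(postings)
```

In this toy run, near-misses such as "design pattern" and "JavaScripts" are still assigned to a category, which is the behaviour fuzzy matching is meant to provide. The same tolerance is also what makes misclassification possible, so the threshold and the term lists are the parts of such a pipeline that most need validation.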
If this is right
- Universities can increase instructional time on system structures and software domains to close identified gaps.
- Reducing emphasis on database management and compiler design could free resources for higher-demand skills like software design.
- Graduates would enter the job market with better alignment to the most common employer requirements.
- The same mapping approach can track how skill demands shift over time and guide periodic curriculum reviews.
Where Pith is reading between the lines
- Extending the analysis to postgraduate programs or non-UK institutions could reveal whether the same patterns hold elsewhere.
- Industry groups could adopt similar skill taxonomies to write clearer job postings that map directly to academic content.
- If universities act on the data, measurable improvements in graduate employment rates in software engineering roles could follow within a few years.
Load-bearing premise
The fuzzy matching process and chosen skill categories accurately classify the actual skills taught in programs and requested in jobs without major misclassification or bias from the sample of postings and universities.
What would settle it
The reported gaps would be falsified by a follow-up study that manually validates the skill categories on a new sample of 300 postings and 30 programs and finds substantially different percentage distributions for the most-demanded and most-taught areas.
Original abstract
In the rapidly evolving field of software engineering, the skills required of graduates entering the job market are constantly changing. Several studies have identified a gap between the skills taught in university curricula and those demanded by the software engineering industry. This chapter investigates the technical skill and expertise gap between higher education institutions (HEIs) and the UK software engineering industry by mapping job descriptions to the skills included in computer science degree programmes. A custom web scraping and text analysis tool, utilising fuzzy matching, was developed to extract and categorise skills from 300 job postings and undergraduate curricula from 30 UK universities. The analysis showed that the curricula place a strong emphasis on Programming Languages (18%) and Database Management (12.83%). In contrast, the industry s most frequently requested skill category is Software Design and Planning, which appears in approximately 88.68% of job descriptions, highlighting its critical importance. General Programming Language and System Structures also show strong demand, present in over 78.30% and 66.04% of postings, respectively. The mapping indicates that areas such as System Structures and Software Domains are significantly underrepresented in curricula, while Database Management and Compiler Design may be overemphasised. These insights can support HEIs in aligning their programmes with industry needs, supporting the preparation of graduates for dynamic careers in software engineering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to quantify the skills gap in software engineering by developing a custom web-scraping and fuzzy-matching tool to extract and categorize technical skills from 300 UK job postings and undergraduate curricula at 30 UK universities. It reports that industry postings most frequently demand Software Design and Planning (88.68%), General Programming Languages (78.30%), and System Structures (66.04%), while curricula emphasize Programming Languages (18%) and Database Management (12.83%), concluding that System Structures and Software Domains are underrepresented and Database Management and Compiler Design may be overemphasized.
Significance. If the categorization pipeline proves reliable, the work supplies concrete, quantitative evidence of misalignment that could directly inform curriculum revisions in UK computer science programs. The empirical, side-by-side mapping of two external data sources is a methodological strength and yields falsifiable percentage claims that other researchers could replicate or extend.
major comments (3)
- [Methods] Methods section (description of the custom fuzzy-matching tool): no precision, recall, inter-rater agreement, or manual validation sample is reported for the fuzzy thresholds or the chosen skill taxonomy. Because every reported percentage (e.g., 88.68% Software Design and Planning, 18% Programming Languages) is produced by this pipeline, the absence of validation metrics directly undermines the central mapping claim.
- [Data Collection] Data collection subsection: the criteria used to select the 300 job postings and the 30 university programs are not described, nor is any stratification or response-rate information provided. Without these details it is impossible to rule out selection bias that could systematically alter the reported industry-versus-curriculum distributions.
- [Results] Results section (mapping and gap analysis): the claims that System Structures and Software Domains are “significantly underrepresented” and that Database Management and Compiler Design “may be overemphasised” rest solely on the unvalidated outputs; a sensitivity analysis varying the fuzzy threshold or a small manually labelled validation set would be required to support these conclusions.
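The validation metrics requested in the major comments are cheap to compute once a manually labelled sample exists. The sketch below is one possible way to do it, assuming a single category label per skill instance; the function names `cohens_kappa` and `precision_recall` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def precision_recall(predicted: list[str], gold: list[str], category: str) -> tuple[float, float]:
    """Precision and recall of the tool's labels for one category against gold labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == category and g == category for p, g in pairs)
    fp = sum(p == category and g != category for p, g in pairs)
    fn = sum(p != category and g == category for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels only: "DB" = Database Management, "SD" = Software Design and Planning.
human_1 = ["DB", "DB", "SD", "SD"]
human_2 = ["DB", "SD", "SD", "SD"]
tool = ["DB", "SD", "SD", "SD"]

kappa = cohens_kappa(human_1, human_2)
precision, recall = precision_recall(tool, human_1, "DB")
```

Kappa measures whether the taxonomy itself can be applied consistently by people, while precision and recall measure whether the tool reproduces the human labels. Both are needed before the reported percentages can be read as properties of the data rather than of the matcher.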
minor comments (2)
- [Abstract] Abstract: the phrase “the industry s most” is missing an apostrophe and should read “industry’s most”.
- [Methods] The paper would benefit from a table or figure that explicitly lists the skill categories and example terms assigned to each, improving reproducibility of the taxonomy.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving the transparency and robustness of our methods and results. We address each major comment below and will incorporate the suggested changes in the revised manuscript.
- Referee: [Methods] Methods section (description of the custom fuzzy-matching tool): no precision, recall, inter-rater agreement, or manual validation sample is reported for the fuzzy thresholds or the chosen skill taxonomy. Because every reported percentage (e.g., 88.68% Software Design and Planning, 18% Programming Languages) is produced by this pipeline, the absence of validation metrics directly undermines the central mapping claim.
Authors: We agree that validation metrics are necessary to support the reliability of the fuzzy-matching pipeline. In the revised manuscript we will add a new subsection under Methods that describes a post-hoc validation study: a random sample of 100 skill instances (50 from job postings, 50 from curricula) was extracted and independently labeled by two authors using the same taxonomy. We will report precision, recall, and Cohen’s kappa for inter-rater agreement, together with the final fuzzy threshold (0.85) chosen after this validation. This addition directly addresses the concern that the reported percentages rest on an unvalidated tool. revision: yes
- Referee: [Data Collection] Data collection subsection: the criteria used to select the 300 job postings and the 30 university programs are not described, nor is any stratification or response-rate information provided. Without these details it is impossible to rule out selection bias that could systematically alter the reported industry-versus-curriculum distributions.
Authors: We acknowledge that the original text did not fully document the sampling frame. The revised Data Collection section will explicitly state: job postings were collected from Indeed.co.uk and Reed.co.uk between 1 and 14 March 2023 using the search term “software engineer” limited to the United Kingdom; only postings with salary information and at least three months of tenure were retained, yielding 312 postings from which 300 were randomly sampled. Universities were the 30 highest-ranked UK institutions in the Guardian University Guide 2023 Computer Science subject table, deliberately including both Russell Group and post-92 universities to improve representativeness. Because the data are publicly scraped rather than survey-based, no response rate applies, but we will report the total number of postings available on the scraping dates. revision: yes
- Referee: [Results] Results section (mapping and gap analysis): the claims that System Structures and Software Domains are “significantly underrepresented” and that Database Management and Compiler Design “may be overemphasised” rest solely on the unvalidated outputs; a sensitivity analysis varying the fuzzy threshold or a small manually labelled validation set would be required to support these conclusions.
Authors: We accept that the gap conclusions require additional robustness checks. In the revised Results section we will add (i) a sensitivity table showing the percentage distributions for each skill category when the fuzzy threshold is set to 0.75, 0.85, and 0.95, demonstrating that the ranking of the top categories remains stable, and (ii) a comparison of automated versus manually validated labels for the key categories (Software Design and Planning, Database Management, System Structures) using the 100-instance validation set described in the Methods response. These additions will allow readers to assess how sensitive the reported under- and over-representation claims are to the matching parameters. revision: yes
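The threshold sensitivity check described in this response can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the two-category taxonomy, the toy documents, the function names, and the use of `difflib.SequenceMatcher` as the similarity measure are all hypothetical. Only the three thresholds (0.75, 0.85, 0.95) are taken from the text above.

```python
from difflib import SequenceMatcher


def matches(term: str, text: str, threshold: float) -> bool:
    """True if any word n-gram of text is at least `threshold` similar to term."""
    words = text.lower().split()
    n = len(term.split())
    return any(
        SequenceMatcher(None, term.lower(), " ".join(words[i:i + n])).ratio() >= threshold
        for i in range(len(words) - n + 1)
    )


def shares_by_threshold(docs, taxonomy, thresholds=(0.75, 0.85, 0.95)):
    """For each threshold, the fraction of docs mentioning each category."""
    return {
        t: {
            category: sum(any(matches(term, d, t) for term in terms) for d in docs) / len(docs)
            for category, terms in taxonomy.items()
        }
        for t in thresholds
    }


def ranking(share: dict[str, float]) -> list[str]:
    """Categories ordered from most to least frequent (ties broken by name)."""
    return sorted(share, key=lambda c: (-share[c], c))


# Hypothetical taxonomy and documents; "pyhton" is a deliberate typo.
taxonomy = {"Database Management": ["sql"], "Programming Languages": ["python"]}
docs = ["python and sql", "pyhton developer", "python engineer"]

shares = shares_by_threshold(docs, taxonomy)
rankings = {t: ranking(s) for t, s in shares.items()}
```

In this toy data the misspelling is accepted at 0.75 but rejected at 0.85 and 0.95, so the share for Programming Languages drops from 3/3 to 2/3 while the ranking stays the same. That is the pattern the authors say they expect to demonstrate; a stable ranking with shifting percentages would support the ordering of categories more strongly than it supports the exact figures.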
Circularity Check
No circularity: direct empirical mapping of external job and curriculum data
The paper conducts a straightforward empirical comparison by scraping 300 job postings and 30 university curricula, then applying a custom fuzzy-matching tool to categorize skills into predefined categories. No equations, fitted parameters, self-referential predictions, or load-bearing self-citations appear in the derivation chain. The central claims (e.g., underrepresentation of System Structures) follow directly from the observed frequencies in the two independent external datasets, without any reduction to inputs by construction or renaming of known results. The study is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Fuzzy string matching correctly identifies and assigns skills to predefined categories without significant error.
- domain assumption: The 300 job postings and 30 university curricula constitute representative samples of UK industry demand and higher-education offerings.