arxiv: 1106.4064 · v2 · pith:YCQTPZ6Vnew · submitted 2011-06-21 · 💻 cs.LG

Algorithmic Programming Language Identification

classification 💻 cs.LG
keywords code, language, programming, abandoned, algorithmic, algorithmically, amount, approach

Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms an existing tool that relies on a Bayesian classifier. The code is written in Python and is available under an MIT license.
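
The abstract does not spell out the features or the learner, so the following is only a minimal sketch of the general idea of supervised language identification from statistical features. It is not the paper's implementation. Everything in it is an assumption made for illustration: the token-frequency features, the nearest-centroid classifier with cosine similarity, the names `features`, `train`, and `identify`, and the four toy training snippets.

```python
import math
import re
from collections import Counter

# Hypothetical feature choice: words and single punctuation characters.
# Digits and whitespace are ignored.
TOKEN_RE = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def features(source):
    """Map source text to the relative frequency of each token."""
    counts = Counter(TOKEN_RE.findall(source))
    total = sum(counts.values()) or 1
    return {tok: n / total for tok, n in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labeled_samples):
    """Average the feature vectors of each language into one centroid."""
    sums = {}
    seen = Counter()
    for language, source in labeled_samples:
        centroid = sums.setdefault(language, Counter())
        for tok, val in features(source).items():
            centroid[tok] += val
        seen[language] += 1
    return {
        lang: {tok: val / seen[lang] for tok, val in total.items()}
        for lang, total in sums.items()
    }


def identify(model, source):
    """Return the language whose centroid is most similar to the source."""
    vec = features(source)
    return max(model, key=lambda lang: cosine(vec, model[lang]))


# Toy labeled data, invented for this sketch only.
SAMPLES = [
    ("python", "def add(a, b):\n    return a + b\n"),
    ("python", "for x in items:\n    print(x)\n"),
    ("c", "int add(int a, int b) { return a + b; }\n"),
    ("c", "for (int i = 0; i < n; i++) { printf(\"%d\", i); }\n"),
]

model = train(SAMPLES)
print(identify(model, "def f(x):\n    return x\n"))
print(identify(model, "int main() { return 0; }"))
```

A real system would need far more training data and richer features than this toy set; the sketch only shows the shape of the pipeline: extract statistics from labeled code, fit a model, then score unseen code against it.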
