arxiv: 2002.08155 · v4 · submitted 2020-02-19 · 💻 cs.CL · cs.PL


CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou


Pith reviewed 2026-05-13 20:59 UTC · model grok-4.3


keywords: CodeBERT, pre-trained model, bimodal, programming language, natural language, code search, documentation generation, replaced token detection


The pith

CodeBERT is a pre-trained bimodal model for natural language and code that uses replaced token detection to learn transferable representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CodeBERT, a Transformer model pre-trained jointly on natural language text and programming language code. It uses a hybrid training objective centered on replaced token detection, drawing on paired NL-PL examples for input tokens and on separate unimodal data to train better generators. After fine-tuning, the resulting representations deliver state-of-the-art results on natural-language code search and automatic code documentation generation. With its parameters held fixed, the same model also outperforms prior pre-trained models on zero-shot probing tasks that test its grasp of NL-PL relationships.

Core claim

CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search and code documentation generation. The model is built on a Transformer architecture and trained with a hybrid objective that incorporates replaced token detection on plausible alternatives sampled from generators. This setup lets the training process use both bimodal NL-PL pairs, which supply input tokens, and unimodal data, which improves the generators themselves. Fine-tuning on the target tasks produces state-of-the-art performance, and zero-shot evaluation on an NL-PL probing dataset shows gains over earlier pre-trained models.
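
To make the replaced token detection step concrete, the following Python sketch corrupts a joint NL-PL token sequence at a few positions and derives the per-token labels a discriminator would be trained on. It is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generator`, its tiny vocabulary, the example tokens, and the choice of corrupted positions are all hypothetical, and a real generator would be a learned model that proposes plausible alternatives.

```python
import random


def corrupt_and_label(tokens, positions, generator, rng):
    """Replace tokens at `positions` with generator samples and label every token.

    Returns (corrupted, labels), where labels[i] is 1 if the token at position i
    equals the original token and 0 if it was replaced by something different.
    """
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = generator(tokens, i, rng)
    labels = [1 if c == t else 0 for c, t in zip(corrupted, tokens)]
    return corrupted, labels


def toy_generator(tokens, position, rng):
    """Hypothetical stand-in for a learned generator: samples from a tiny vocabulary."""
    vocabulary = ["max", "min", "sum", "a", "b", "return"]
    return rng.choice(vocabulary)


# Hypothetical NL-PL pair: a short description followed by the code it documents.
nl = ["return", "the", "larger", "value"]
pl = ["def", "f", "(", "a", ",", "b", ")", ":", "return", "max", "(", "a", ",", "b", ")"]
tokens = nl + pl

rng = random.Random(0)
# Corrupt one NL position ("larger") and one PL position ("max").
corrupted, labels = corrupt_and_label(tokens, [2, 13], toy_generator, rng)

for original, seen, label in zip(tokens, corrupted, labels):
    print(f"{original:>8} -> {seen:<8} {'original' if label else 'replaced'}")
```

One design choice worth noting: when the sampled token happens to equal the original, the sketch labels that position as original, which is the convention in ELECTRA-style replaced token detection. Whether CodeBERT handles that case identically is not stated in the abstract.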

What carries the argument

A hybrid objective function built around replaced token detection, trained on both bimodal NL-PL pairs and unimodal data, to learn general-purpose representations.

If this is right

Fine-tuning CodeBERT raises accuracy on natural-language queries that retrieve relevant code snippets.
The same fine-tuned model improves the quality of generated natural-language documentation for given code.
Fixed CodeBERT parameters already yield stronger zero-shot performance on tasks that probe alignment between code and text.
The learned representations are intended to support a range of additional NL-PL applications beyond the two evaluated tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pre-training recipe could be applied to other cross-modal pairs such as code and visual diagrams.
Probing experiments hint that the model encodes finer semantic correspondences than earlier unimodal or separately trained encoders.
Extending the unimodal data to additional programming languages would likely broaden the model's utility for polyglot codebases.

Load-bearing premise

The hybrid pre-training objective produces representations general enough to transfer effectively when the model is later fine-tuned on downstream tasks.

What would settle it

A controlled experiment in which CodeBERT, after identical fine-tuning, fails to exceed the best prior models on the standard code-search and documentation-generation benchmarks would falsify the central claim.

Original abstract

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code documentation generation, etc. We develop CodeBERT with Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CodeBERT extends BERT pretraining to bimodal PL-NL data via a hybrid replaced-token objective and reports downstream gains, but the abstract supplies no scores or ablations so the SOTA claim is hard to weigh.


CodeBERT is a Transformer pre-trained on both programming language and natural language data. The main technical move is a hybrid objective that runs replaced token detection on NL-PL pairs while using unimodal data to train the generators that produce the replacements. They fine-tune on natural language code search and code documentation generation, and they also build a small probing set for zero-shot NL-PL knowledge checks. The model beats prior pre-trained baselines on the probing task and is presented as SOTA on the two downstream tasks.

That is the concrete advance: a practical way to mix paired and unpaired data inside the same pre-training loop for this domain. The approach is straightforward and the downstream tasks are relevant to real developer tools.

The soft spot is exactly the one the stress-test flags. The abstract asserts SOTA results but gives no numbers, no baselines, no error bars, and no ablation that isolates the hybrid objective from model size or data volume. Without those controls it is difficult to tell whether the claimed gains come from the new training recipe or from simply having more compute and data. The probing experiment is a nice addition, but it is also zero-shot and small.

This paper is for groups working on code search, documentation, or any NL-PL interface. A reader who needs a strong starting checkpoint for those tasks will find the model and the data construction useful. The work is clear enough on its own terms to merit referee time, even if the results section will need more detail and controls before publication.

Referee Report

3 major / 3 minor

Summary. The paper introduces CodeBERT, a Transformer-based bimodal pre-trained model for natural language (NL) and programming language (PL). It is trained with a hybrid objective that applies replaced token detection to NL-PL pairs, using generators trained on unimodal data, to learn general-purpose representations. After fine-tuning, it claims state-of-the-art results on natural language code search and code documentation generation; a zero-shot probing evaluation on a constructed NL-PL dataset also shows gains over prior pre-trained models.

Significance. If the empirical claims hold after addressing ablations, the work would be significant for demonstrating effective transfer from a hybrid pre-training regime that mixes bimodal pairs with unimodal data to downstream NL-PL tasks. The probing setup provides a useful lens on what knowledge is captured, and the overall approach could serve as a strong baseline for code intelligence applications.

major comments (3)

[§4.1] §4.1 and Table 2: The SOTA claim on natural language code search reports improved MRR but supplies no ablation that isolates the replaced token detection component from standard MLM on identical bimodal data; without this, it is unclear whether the hybrid objective (rather than data scale or architecture) drives the reported gains over baselines such as RoBERTa.
[§4.2] §4.2 and Table 3: The code documentation generation results claim SOTA BLEU scores after fine-tuning, yet no statistical significance tests or variance across random seeds are provided, and the contribution of the unimodal generator training step is not quantified via controlled removal.
[§5] §5: The NL-PL probing dataset and zero-shot protocol are introduced to show superior performance, but the section does not report the exact number of probe examples per category or the precise metric (accuracy vs. F1) used for the comparison against prior models, weakening the interpretability of the knowledge-acquisition claim.
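
For readers unfamiliar with the code search metric named in the first major comment, mean reciprocal rank (MRR) can be sketched in a few lines of Python. This is a generic illustration of the metric, not the paper's evaluation script: the query ids, candidate ids, and rankings below are hypothetical, and the sketch assumes exactly one correct snippet per query.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1/rank of the correct candidate over all queries.

    ranked_lists[q] is the ranked list of candidate ids returned for query q;
    relevant[q] is the single correct candidate id. A query whose correct
    candidate is missing from the ranking contributes 0.
    """
    total = 0.0
    for query, ranking in ranked_lists.items():
        target = relevant[query]
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(ranked_lists)


# Hypothetical rankings for three queries over three candidate snippets.
rankings = {
    "q1": ["c3", "c1", "c2"],  # correct snippet at rank 2
    "q2": ["c2", "c3", "c1"],  # correct snippet at rank 3
    "q3": ["c1", "c2", "c3"],  # correct snippet at rank 1
}
gold = {"q1": "c1", "q2": "c1", "q3": "c1"}

mrr = mean_reciprocal_rank(rankings, gold)
print(f"MRR = {mrr:.4f}")  # (1/2 + 1/3 + 1) / 3
```

Because MRR depends on the size and composition of the candidate pool, an ablation of the kind the comment requests is only informative if both models are ranked against the same candidates.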

minor comments (3)

[Abstract] Abstract: No quantitative metrics, dataset names, or baseline references are supplied, making the SOTA assertion difficult to assess at a glance.
[§3.1] §3.1: The notation for bimodal NL-PL pairs and the generator sampling process could be illustrated with a short concrete example to improve reproducibility.
[Figure 1] Figure 1: The architecture diagram lacks labels for the replaced-token-detection head and the flow of unimodal data, reducing clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate the revisions we will make.


Referee: [§4.1] §4.1 and Table 2: The SOTA claim on natural language code search reports improved MRR but supplies no ablation that isolates the replaced token detection component from standard MLM on identical bimodal data; without this, it is unclear whether the hybrid objective (rather than data scale or architecture) drives the reported gains over baselines such as RoBERTa.

Authors: We agree that an explicit ablation is needed to isolate the contribution of replaced token detection. In the revised version we will train a controlled baseline using standard MLM on exactly the same bimodal NL-PL pairs (with identical data scale and architecture) and report its MRR on the code search task alongside CodeBERT in an updated Table 2. This will allow direct attribution of gains to the hybrid objective. revision: yes
Referee: [§4.2] §4.2 and Table 3: The code documentation generation results claim SOTA BLEU scores after fine-tuning, yet no statistical significance tests or variance across random seeds are provided, and the contribution of the unimodal generator training step is not quantified via controlled removal.

Authors: We will add statistical rigor by reporting BLEU scores averaged over five random seeds with standard deviations and include paired significance tests against baselines in §4.2 and Table 3. We will also add a controlled ablation that removes the unimodal data from generator training while keeping all other settings fixed, quantifying its effect on downstream documentation generation performance. revision: yes
Referee: [§5] §5: The NL-PL probing dataset and zero-shot protocol are introduced to show superior performance, but the section does not report the exact number of probe examples per category or the precise metric (accuracy vs. F1) used for the comparison against prior models, weakening the interpretability of the knowledge-acquisition claim.

Authors: We will revise §5 to state the exact number of examples per probe category and explicitly note that accuracy is the evaluation metric. A supplementary table listing the category sizes will be added for full transparency. revision: yes
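
The seed-averaged reporting and paired testing promised in the second response could look roughly like the Python sketch below. It is one possible analysis, not the authors' protocol: the score lists are placeholder values invented for illustration (they are not results from the paper), and the exact sign-flip permutation test over per-seed differences is an assumed choice of paired test.

```python
import itertools
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_sign_flip_p_value(system_a, system_b):
    """Exact two-sided sign-flip permutation test on paired per-seed differences."""
    diffs = [a - b for a, b in zip(system_a, system_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate every assignment of signs to the paired differences.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Placeholder per-seed scores for illustration only; NOT results from the paper.
toy_model_scores = [0.52, 0.55, 0.50, 0.54, 0.53]
toy_baseline_scores = [0.48, 0.50, 0.49, 0.51, 0.47]

model_mean, model_std = summarize(toy_model_scores)
baseline_mean, baseline_std = summarize(toy_baseline_scores)
p_value = paired_sign_flip_p_value(toy_model_scores, toy_baseline_scores)

print(f"model    {model_mean:.3f} +/- {model_std:.3f}")
print(f"baseline {baseline_mean:.3f} +/- {baseline_std:.3f}")
print(f"two-sided p = {p_value:.4f}")
```

With only five seeds, this particular test cannot produce a two-sided p-value below 2/32 = 0.0625, even when the model wins on every seed. A test that resamples test-set examples instead of seeds would be needed to reach conventional significance thresholds.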

Circularity Check

0 steps flagged

No circularity: empirical pre-training and evaluation


The paper describes CodeBERT as a Transformer model trained via a hybrid objective (replaced token detection on bimodal NL-PL pairs plus unimodal data for generators) and then fine-tuned for downstream tasks. All central claims rest on reported experimental metrics for code search and documentation generation, plus zero-shot probing, rather than any derivation, equation, or prediction that reduces to its own inputs by construction. No self-citation chains, ansatzes smuggled via prior work, or fitted parameters renamed as predictions appear in the load-bearing steps. The work is self-contained against external benchmarks and follows standard empirical ML practice.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from the pre-training literature that Transformer models trained with replaced token detection learn transferable representations; no specific free parameters or new invented entities are introduced in the abstract.

axioms (1)

domain assumption: Transformer-based models pre-trained with replaced token detection on bimodal data learn general-purpose NL-PL representations
Invoked implicitly when claiming the hybrid objective enables downstream applications.



Forward citations

Cited by 31 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Deep Graph-Language Fusion for Structure-Aware Code Generation
cs.SE 2026-05 unverdicted novelty 7.0

CGFuse enables deep token-level fusion of graph-derived structural features into language models, yielding 10-16% BLEU and 6-11% CodeBLEU gains on code generation tasks.
Identifying and Characterizing Semantic Clones of Solidity Functions
cs.SE 2026-04 unverdicted novelty 7.0

A code-and-comment analysis method detects semantic clones in Solidity functions with 59% overall precision (84% for same-name functions) and 97% recall on 300k contracts, plus LLM summaries for uncommented code.
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates
cs.SE 2026-04 unverdicted novelty 7.0

RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than p...
R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability
cs.SE 2026-04 unverdicted novelty 7.0

R2Code improves requirement-to-code traceability with a bidirectional alignment network, self-reflective consistency verification, and dynamic context-adaptive retrieval, yielding 7.4% average F1 gain and up to 41.7% ...
SynthFix: Adaptive Neuro-Symbolic Code Vulnerability Repair
cs.SE 2026-04 unverdicted novelty 7.0

SynthFix adaptively routes LLM code repairs to supervised fine-tuning or symbolic-reward fine-tuning, yielding up to 32% higher exact match on JavaScript and C vulnerability benchmarks.
AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits
cs.SE 2026-04 conditional novelty 7.0

AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.
GAIA: a benchmark for General AI Assistants
cs.CL 2023-11 unverdicted novelty 7.0

GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
cs.SE 2020-09 conditional novelty 7.0

CodeBLEU improves correlation with human programmer scores on code synthesis tasks by adding syntactic AST matching and semantic data-flow matching to the standard BLEU n-gram approach.
GraphCodeBERT: Pre-training Code Representations with Data Flow
cs.SE 2020-09 accept novelty 7.0

GraphCodeBERT uses data flow graphs in pre-training to capture semantic code structure and reaches state-of-the-art results on code search, clone detection, translation, and refinement.
NeuroFlake: A Neuro-Symbolic LLM Framework for Flaky Test Classification
cs.SE 2026-05 unverdicted novelty 6.0

NeuroFlake integrates discriminative token mining into LLMs to classify flaky tests, raising F1-score to 69.34% on FlakeBench while showing greater robustness to semantic-preserving perturbations than prior methods.
MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System
cs.AI 2026-05 unverdicted novelty 6.0

MAS-Algorithm is a multi-agent workflow that improves AI acceptance rates on algorithmic problems by 6.48% on average, outperforming parameter-efficient fine-tuning.
MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System
cs.AI 2026-05 unverdicted novelty 6.0

MAS-Algorithm is a multi-agent workflow that raises acceptance rates on algorithmic problems by 6.48% on average over baseline models.
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
cs.SE 2026-05 unverdicted novelty 6.0

Reinforcement learning on MIR features with fuzz testing feedback reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59% and accuracy to 65.2% while keeping 74.6% recall.
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
cs.SE 2026-05 unverdicted novelty 6.0

Reinforcement learning on MIR features combined with cargo-fuzz validation reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59.0% and accuracy to 65.2%.
VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection
cs.CR 2026-04 unverdicted novelty 6.0

VulStyle pre-trains on 4.9M functions using code, non-terminal ASTs, and stylometry features, then fine-tunes to achieve SOTA F1 gains of 4-48% on BigVul and VulDeePecker.
A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair
cs.SE 2026-04 unverdicted novelty 6.0

Metamorphic testing on Defects4J and GitBug-Java reveals substantial performance drops in seven LLMs that correlate with NLL, indicating data leakage in LLM-based program repair.
On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation
cs.SE 2026-04 unverdicted novelty 6.0

Continuous latent-vector compression improves BLEU scores on repository-level code tasks by up to 28.3% at 4x compression while cutting inference latency.
DiffHLS: Differential Learning for High-Level Synthesis QoR Prediction with GNNs and LLM Code Embeddings
cs.LG 2026-04 unverdicted novelty 6.0

DiffHLS predicts HLS QoR via differential learning: separate GNN+LLM models for kernel baseline and design delta are composed to yield the final estimate, showing lower MAPE than GNN baselines on PolyBench.
ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts
cs.CR 2026-04 unverdicted novelty 6.0

ContractShield achieves 89% Hamming score and 91% F1-score for five vulnerability types in obfuscated smart contracts via hierarchical cross-modal fusion of semantic, temporal, and structural features with only 1-3% p...
GoCoMA: Hyperbolic Multimodal Representation Fusion for Large Language Model-Generated Code Attribution
cs.CL 2026-03 unverdicted novelty 6.0

GoCoMA fuses code stylometry and binary artifact images via hyperbolic Poincaré ball projection and geodesic-cosine attention to attribute LLM-generated code, outperforming baselines on CoDET-M4 and LLMAuthorBench.
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
cs.RO 2025-06 unverdicted novelty 6.0

RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
cs.CL 2021-11 accept novelty 6.0

DeBERTaV3 improves DeBERTa by switching to replaced token detection pre-training and using gradient-disentangled embedding sharing, reaching 91.37% on GLUE and new SOTA on XNLI zero-shot.
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
cs.SE 2021-02 unverdicted novelty 6.0

CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.
PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection
cs.SE 2026-04 unverdicted novelty 5.0

Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.
From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks
cs.CR 2026-04 unverdicted novelty 5.0

LLMs generated 615 vulnerable code snippets aligned with CAPEC and CWE frameworks across three languages, with 0.98 cosine similarity between model outputs.
Improving MPI Error Detection and Repair with Large Language Models and Bug References
cs.SE 2026-04 unverdicted novelty 5.0

Augmenting LLMs with bug references, few-shot learning, chain-of-thought, and RAG improves MPI error detection accuracy from 44% to 77% and generalizes across models.
What Are Adversaries Doing? Automating Tactics, Techniques, and Procedures Extraction: A Systematic Review
cs.SE 2026-04 accept novelty 5.0

Systematic review of 80 papers shows TTP extraction shifting to transformer and LLM methods but limited by narrow datasets, single-label focus, and low reproducibility.
StarCoder: may the source be with you!
cs.CL 2023-05 accept novelty 5.0

StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
Prompt-Driven Code Summarization: A Systematic Literature Review
cs.SE 2026-04 unverdicted novelty 4.0

A systematic review that categorizes prompting strategies for LLM-based code summarization, assesses their effectiveness, and identifies gaps in research and evaluation practices.
A systematic literature Review for Transformer-based Software Vulnerability detection
cs.SE 2026-04 unverdicted novelty 3.0

A review of 80 studies from 2021-2025 on transformer-based software vulnerability detection identifies trends in architectures, datasets, and challenges such as data imbalance and interpretability.
A Survey on Large Language Models for Code Generation
cs.CL 2024-06 unverdicted novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...

Reference graph

Works this paper leans on

283 extracted references · 283 canonical work pages · cited by 29 Pith papers · 8 internal anchors

[2]

Advances in neural information processing systems , pages=

Sequence to sequence learning with neural networks , author=. Advances in neural information processing systems , pages=

work page
[3]

Proceedings of the 20th international conference on Computational Linguistics , pages=

Orange: a method for evaluating automatic evaluation metrics for machine translation , author=. Proceedings of the 20th international conference on Computational Linguistics , pages=. 2004 , organization=

work page 2004
[4]

Proceedings of the 40th annual meeting on association for computational linguistics , pages=

BLEU: a method for automatic evaluation of machine translation , author=. Proceedings of the 40th annual meeting on association for computational linguistics , pages=. 2002 , organization=

work page 2002
[7]

URL https://s3-us-west-2

Improving language understanding by generative pre-training , author=. URL https://s3-us-west-2. amazonaws. com/openai-assets/researchcovers/languageunsupervised/language understanding paper. pdf , year=

work page
[8]

BioBERT: a pre-trained biomedical language repre- sentationmodelforbiomedicaltextmining

Biobert: pre-trained biomedical language representation model for biomedical text mining , author=. arXiv preprint arXiv:1901.08746 , year=

work page arXiv 1901
[9]

Visualbert: A simple and perfor- 13 mant baseline for vision and language

Visualbert: A simple and performant baseline for vision and language , author=. arXiv preprint arXiv:1908.03557 , year=

work page arXiv 1908
[12]

Le and Christopher D

Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning , booktitle=

work page
[15]

Advances in neural information processing systems , pages=

Attention is all you need , author=. Advances in neural information processing systems , pages=

work page
[19]

Advances in Neural Information Processing Systems , pages=

Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks , author=. Advances in Neural Information Processing Systems , pages=

work page
[21]

International Conferenceon Learning Representations , year=

code2seq: Generating sequences from structured representations of code , author=. International Conferenceon Learning Representations , year=

work page
[22]

2000 , publisher=

Speech & language processing , author=. 2000 , publisher=

work page 2000
[23]

2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE) , pages=

Deep code search , author=. 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE) , pages=. 2018 , organization=

work page 2018
[24]

Foundations and Trends

An introduction to neural information retrieval , author=. Foundations and Trends. 2018 , publisher=

work page 2018
[29]

Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions , pages=

Moses: Open source toolkit for statistical machine translation , author=. Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions , pages=

work page
[32]

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Summarizing source code using a neural attention model , author=. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

work page
[33]

Adam: A Method for Stochastic Optimization

Adam: A method for stochastic optimization , author=. arXiv preprint arXiv:1412.6980 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[34]

Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. International Conferenceon Learning Representations

work page 2019
[35]

Kyunghyun Cho, Bart Van Merri \"e nboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078

work page internal anchor Pith review Pith/arXiv arXiv 2014
[36]

Le, and Christopher D

Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. \ ELECTRA \ : Pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations

work page 2020
[37]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

work page internal anchor Pith review Pith/arXiv arXiv 2018
[38]

Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 933--944. IEEE

work page 2018
[39]

Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436

work page internal anchor Pith review Pith/arXiv arXiv 2019
[40]

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2073--2083

work page 2016
[41]

Dan Jurafsky. 2000. Speech & language processing. Pearson Education India

work page 2000
[42]

Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi. 2019. Pre-trained contextual embedding of source code. arXiv preprint arXiv:2001.00059

work page arXiv 2019
[43]

Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882

work page Pith review arXiv 2014
[44]

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of...

work page 2007
[45]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461

work page internal anchor Pith review Pith/arXiv arXiv 2019
[46]
