Word Embeddings for the Construction Domain

Antoine J.-P. Tixier; Matthew R. Hallowell; Michalis Vazirgiannis

arxiv: 1610.09333 · v1 · pith:UK2D7UF7new · submitted 2016-10-28 · 💻 cs.CL


classification 💻 cs.CL

keywords: vectors, corpus, classification, construction, created, domain, embeddings, google


We introduce word vectors for the construction domain. Our vectors were obtained by running word2vec on an 11M-word corpus that we created from scratch from freely accessible online sources of construction-related text. We first explore the embedding space and show that our vectors capture meaningful construction-specific concepts. We then compare our vectors with vectors trained on a 100B-word corpus (Google News) on an injury report classification task. Without any parameter tuning, our embeddings give competitive results and outperform the Google News vectors in many cases. Using a keyword-based compression of the reports also leads to a significant speed-up with only a limited loss in performance. We publicly release our corpus and the data set we created for the classification task, in the hope that future studies will use them for benchmarking and for building on our work.
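
As a rough illustration of what exploring an embedding space can involve, the sketch below ranks words by cosine similarity to a query word, which is one common way to inspect word vectors. It is not taken from the paper: the three-dimensional vectors are invented toy values, and the names `toy_vectors`, `cosine`, and `nearest_neighbors` are hypothetical. Real word2vec vectors typically have a few hundred dimensions and would be loaded from a trained model.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# They are NOT taken from the embeddings described in the abstract.
toy_vectors = {
    "scaffold": [0.9, 0.1, 0.0],
    "ladder": [0.8, 0.2, 0.1],
    "harness": [0.6, 0.5, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=2):
    """Return the k words most similar to `word`, most similar first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for neighbor, score in nearest_neighbors("scaffold", toy_vectors):
        print(f"{neighbor}: {score:.3f}")
```

With these toy values, the words closest to "scaffold" are the other equipment terms, while the unrelated word "invoice" ranks last. The same ranking logic applies to real vectors, only with a much larger vocabulary.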


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Automatically Learning Construction Injury Precursors from Text
cs.CL 2019-07 unverdicted novelty 4.0

Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.