Measuring the Expertise of Workers for Crowdsourcing Applications
Pith reviewed 2026-05-25 17:38 UTC · model grok-4.3
The pith
A new expertise measure for crowdsourcing workers is built from four factors defined with the theory of belief functions, and it applies when an objective comparison dataset exists.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose an innovative measure of expertise assuming that we possess a dataset with an objective comparison of the items concerned. Our method is based on the definition of four factors with the theory of belief functions. We compare our method to the Fagin distance on a dataset from a real experiment, where users have to assess the quality of some audio recordings. Then, we propose to fuse both the Fagin distance and our expertise measure.
What carries the argument
Four factors defined with the theory of belief functions to quantify worker expertise from objective item comparisons.
If this is right
- The expertise measure applies to crowdsourcing tasks like audio quality assessment where objective item comparisons exist.
- Direct comparison to the Fagin distance on real experimental data reveals relative performance.
- Fusing the belief function measure with the Fagin distance yields a combined estimator for worker expertise.
- Improved expertise assessment supports better task assignment and quality control in crowdsourcing platforms.
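For readers unfamiliar with the baseline, the following is a minimal sketch of one distance from Fagin et al. for rankings with ties: the Kendall distance with a penalty parameter p. This summary does not say which variant of the Fagin distance the paper uses, so the choice of this variant, the default `p=0.5`, the function name `kendall_with_ties`, and the toy rankings are all assumptions made for illustration.

```python
from itertools import combinations


def kendall_with_ties(rank_a, rank_b, p=0.5):
    """Kendall distance with penalty p between two rankings that may contain ties.

    rank_a, rank_b: dicts mapping each item to a rank position;
    items sharing a position are tied. Both must cover the same items.
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same items")
    total = 0.0
    for i, j in combinations(sorted(rank_a), 2):
        da = rank_a[i] - rank_a[j]
        db = rank_b[i] - rank_b[j]
        if da == 0 and db == 0:
            continue            # tied in both rankings: no penalty
        if da == 0 or db == 0:
            total += p          # tied in exactly one ranking: penalty p
        elif (da > 0) != (db > 0):
            total += 1.0        # strictly ordered both times, in opposite order
    return total


# Hypothetical data: an objective ranking of four recordings and one worker's answer.
reference = {"a": 1, "b": 2, "c": 3, "d": 4}
worker = {"a": 1, "b": 1, "c": 4, "d": 3}
distance = kendall_with_ties(reference, worker)  # 0.5 (a, b tied once) + 1 (c, d swapped) = 1.5
```

A smaller distance to the objective ranking would be read as higher expertise; how the paper normalizes or thresholds the value is not stated here.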
Where Pith is reading between the lines
- Platforms might weight individual worker contributions more heavily when objective comparison data is available for calibration.
- The fusion step implies that combining distance-based and belief-function approaches could increase robustness across varied task types.
- If the four factors prove stable, the method could extend to other crowdsourced domains such as image labeling or text verification without full ground truth.
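The fusion idea can be illustrated with Dempster's rule of combination, a standard operator in the theory of belief functions. This is only a sketch under stated assumptions: the summary does not say which combination operator the paper uses, and the two-element frame (`"expert"`, `"non_expert"`), the mass values, and the function name `dempster_combine` are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (focal element)
    to its mass; the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so that the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical masses: one derived from a distance-based score,
# one derived from a belief-function expertise measure.
E, N = frozenset({"expert"}), frozenset({"non_expert"})
OMEGA = E | N  # total ignorance
m_distance = {E: 0.6, OMEGA: 0.4}
m_belief = {E: 0.5, N: 0.2, OMEGA: 0.3}
fused = dempster_combine(m_distance, m_belief)
```

Mass left on `OMEGA` represents ignorance rather than evidence for either class, which is what distinguishes this representation from a plain probability over the two labels.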
Load-bearing premise
A dataset with an objective comparison of the items concerned is available to define and apply the four factors.
What would settle it
If the four-factor belief function measure applied to the audio recording dataset shows no improvement in alignment with known worker performance over the Fagin distance, the claim of an innovative measure would be challenged.
Original abstract
Crowdsourcing platforms enable companies to propose tasks to a large crowd of users. The workers receive compensation for their work according to the seriousness of the tasks they managed to accomplish. The evaluation of the quality of responses obtained from the crowd remains one of the most important problems in this context. Several methods have been proposed to estimate the expertise level of crowd workers. We propose an innovative measure of expertise assuming that we possess a dataset with an objective comparison of the items concerned. Our method is based on the definition of four factors with the theory of belief functions. We compare our method to the Fagin distance on a dataset from a real experiment, where users have to assess the quality of some audio recordings. Then, we propose to fuse both the Fagin distance and our expertise measure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an expertise measure for crowdsourcing workers based on four factors defined in the theory of belief functions. The construction assumes access to a dataset supplying objective comparisons between items; the measure is compared to the Fagin distance on a real audio-recording quality-assessment experiment and the two are proposed to be fused.
Significance. If the prerequisite objective-comparison dataset can be obtained or approximated from crowdsourced responses alone, the belief-function construction would supply a new formal route to expertise estimation and the reported comparison on real data would provide a concrete empirical anchor.
major comments (2)
- [Abstract / Method] Abstract and method statement: the four-factor construction is defined only in terms of an objective item-comparison dataset, yet no procedure is supplied for constructing or approximating that dataset from the crowdsourced responses that constitute the motivating setting.
- [Abstract] Abstract: no derivation details, validation metrics, or error analysis are supplied for the four factors, so the central claim cannot be checked against the paper's own equations or data.
minor comments (1)
- [Experiment] The description of the audio-recording experiment lacks detail on how the objective comparisons were obtained for that specific case.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive comments. We respond to each major comment below, indicating planned revisions where appropriate.
- Referee: [Abstract / Method] Abstract and method statement: the four-factor construction is defined only in terms of an objective item-comparison dataset, yet no procedure is supplied for constructing or approximating that dataset from the crowdsourced responses that constitute the motivating setting.
Authors: The manuscript explicitly frames the expertise measure under the assumption of access to an objective item-comparison dataset, as stated in the abstract and method. This assumption enables the direct application of belief function theory to define the four factors. We agree that no explicit procedure is provided for deriving or approximating such a dataset from crowdsourced responses alone. In revision we will add a dedicated discussion subsection outlining possible approximation strategies, including the use of majority-vote consensus as a proxy, iterative refinement with partial ground truth, or hybrid approaches that combine limited objective data with worker responses.
revision: yes
- Referee: [Abstract] Abstract: no derivation details, validation metrics, or error analysis are supplied for the four factors, so the central claim cannot be checked against the paper's own equations or data.
Authors: The abstract is intentionally concise. The full manuscript supplies the mathematical definitions of the four factors within the belief-function framework (Section on method), together with the empirical comparison against Fagin distance on the audio-recording quality-assessment dataset. This comparison constitutes the primary validation. To improve accessibility we will expand the abstract with a short clause referencing the four factors and the real-data evaluation, and we will ensure the method section cross-references the defining equations and the experimental protocol.
revision: partial
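The majority-vote proxy mentioned in the first response could look roughly like the sketch below. It is an illustration only, not a procedure from the paper: the input layout (each worker answering `">"`, `"<"` or `"="` for a pair of items), the function name `majority_vote_comparisons`, and the choice to leave tied votes undecided are all assumptions.

```python
from collections import Counter


def majority_vote_comparisons(votes):
    """Derive a consensus comparison per item pair by majority vote.

    votes: dict mapping worker id -> dict mapping (item_i, item_j) -> '>', '<' or '='.
    Returns a dict mapping each pair to the winning relation,
    or None when the two most frequent answers are tied.
    """
    tallies = {}
    for answers in votes.values():
        for pair, relation in answers.items():
            tallies.setdefault(pair, Counter())[relation] += 1

    consensus = {}
    for pair, counter in tallies.items():
        (top, n_top), *rest = counter.most_common()
        if rest and rest[0][1] == n_top:
            consensus[pair] = None  # no clear majority: leave the pair undecided
        else:
            consensus[pair] = top
    return consensus


# Hypothetical answers from three workers on two pairs of recordings.
votes = {
    "w1": {("a", "b"): ">", ("b", "c"): ">"},
    "w2": {("a", "b"): ">", ("b", "c"): "<"},
    "w3": {("a", "b"): "<"},
}
proxy = majority_vote_comparisons(votes)
```

A proxy built this way is derived from the same responses it would later be used to score, so it is no longer the external objective dataset that the method assumes.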
Circularity Check
No circularity: expertise factors defined from external objective-comparison dataset
The paper explicitly conditions its four-factor belief-function measure on the availability of an external dataset supplying objective item comparisons. This input is treated as given rather than derived or fitted inside the paper, and the subsequent comparison to Fagin distance is performed on a separate real-experiment dataset. No equation or step reduces the claimed expertise output to a parameter fitted from the same responses the method is meant to evaluate, nor does any load-bearing claim rest on a self-citation chain. The derivation therefore remains self-contained against the stated external benchmark.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Existence of a dataset with an objective comparison of the items concerned