arxiv: 1907.10588 · v1 · pith:NKS4NGSInew · submitted 2019-06-24 · 💻 cs.HC · cs.SI

Measuring the Expertise of Workers for Crowdsourcing Applications

Pith reviewed 2026-05-25 17:38 UTC · model grok-4.3

classification 💻 cs.HC cs.SI
keywords crowdsourcing · expertise measurement · belief functions · worker quality · Fagin distance · audio quality assessment · quality evaluation

The pith

A new expertise measure for crowdsourcing workers uses four factors from belief functions theory when an objective dataset exists.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to measure the expertise of crowd workers on platforms by assuming access to a dataset that provides objective comparisons between items. Four factors are defined using the theory of belief functions to capture aspects of worker performance. This measure is compared to the Fagin distance on data from a real experiment involving audio recording quality assessments. The two approaches are then fused together. A sympathetic reader would care because more accurate expertise estimates could help platforms better assign tasks and improve the reliability of results obtained from the crowd.

Core claim

We propose an innovative measure of expertise assuming that we possess a dataset with an objective comparison of the items concerned. Our method is based on the definition of four factors with the theory of belief functions. We compare our method to the Fagin distance on a dataset from a real experiment, where users have to assess the quality of some audio recordings. Then, we propose to fuse both the Fagin distance and our expertise measure.
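
As a rough illustration of the baseline, the sketch below computes a Kendall-type distance with a tie penalty between a reference ranking and a worker's ranking, in the spirit of the measures for rankings with ties introduced by Fagin et al. This summary does not say which variant the paper uses, so the penalty value `p`, the function names, and the toy rankings are all assumptions made for the example.

```python
from itertools import combinations


def kendall_with_ties(rank_a, rank_b, p=0.5):
    """Kendall-type distance between two rankings that may contain ties.

    rank_a, rank_b: dicts mapping item -> position (equal positions = tie).
    p: penalty for a pair tied in exactly one ranking (assumed value).
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same items")
    total = 0.0
    for i, j in combinations(sorted(rank_a), 2):
        da = rank_a[i] - rank_a[j]
        db = rank_b[i] - rank_b[j]
        if da == 0 and db == 0:
            continue              # tied in both rankings: no penalty
        if da == 0 or db == 0:
            total += p            # tied in exactly one ranking
        elif (da > 0) != (db > 0):
            total += 1.0          # strictly opposite order
    return total


def normalized_kendall(rank_a, rank_b, p=0.5):
    """Divide by the number of item pairs so the result lies in [0, 1]."""
    n = len(rank_a)
    pairs = n * (n - 1) // 2
    return kendall_with_ties(rank_a, rank_b, p) / pairs if pairs else 0.0


# Hypothetical data: an objective ordering of four recordings and one worker's answer.
reference = {"a": 1, "b": 2, "c": 3, "d": 4}
worker = {"a": 1, "b": 1, "c": 4, "d": 3}
print(kendall_with_ties(reference, worker))   # 1.5
print(normalized_kendall(reference, worker))  # 0.25
```

A smaller distance to the reference ranking would indicate a worker whose answers are closer to the objective ordering.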

What carries the argument

Four factors defined with the theory of belief functions to quantify worker expertise from objective item comparisons.

If this is right

  • The expertise measure applies to crowdsourcing tasks like audio quality assessment where objective item comparisons exist.
  • Direct comparison to the Fagin distance on real experimental data reveals relative performance.
  • Fusing the belief function measure with the Fagin distance yields a combined estimator for worker expertise.
  • Improved expertise assessment supports better task assignment and quality control in crowdsourcing platforms.
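
How the paper fuses the two measures is not described in this summary. Purely as a sketch of what a belief-function fusion can look like, the code below applies Dempster's rule of combination to two mass functions over a two-element frame. The frame `{expert, not_expert}`, the function name, and the mass values are invented for illustration and are not taken from the paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb   # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


E = frozenset({"expert"})
N = frozenset({"not_expert"})
OMEGA = E | N  # total ignorance

# Made-up masses: one source derived from a distance, one from the expertise measure.
m_distance = {E: 0.6, OMEGA: 0.4}
m_expertise = {E: 0.5, N: 0.3, OMEGA: 0.2}

fused = dempster_combine(m_distance, m_expertise)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

Other combination operators exist in belief function theory, for example the unnormalized conjunctive rule, so this choice is an assumption rather than a reading of the paper.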

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platforms might weight individual worker contributions more heavily when objective comparison data is available for calibration.
  • The fusion step implies that combining distance-based and belief-function approaches could increase robustness across varied task types.
  • If the four factors prove stable, the method could extend to other crowdsourced domains such as image labeling or text verification without full ground truth.

Load-bearing premise

A dataset with an objective comparison of the items concerned is available to define and apply the four factors.

What would settle it

If the four-factor belief function measure applied to the audio recording dataset shows no improvement in alignment with known worker performance over the Fagin distance, the claim of an innovative measure would be challenged.

Original abstract

Crowdsourcing platforms enable companies to propose tasks to a large crowd of users. The workers receive a compensation for their work according to the seriousness of the tasks they managed to accomplish. The evaluation of the quality of responses obtained from the crowd remains one of the most important problems in this context. Several methods have been proposed to estimate the expertise level of crowd workers. We propose an innovative measure of expertise assuming that we possess a dataset with an objective comparison of the items concerned. Our method is based on the definition of four factors with the theory of belief functions. We compare our method to the Fagin distance on a dataset from a real experiment, where users have to assess the quality of some audio recordings. Then, we propose to fuse both the Fagin distance and our expertise measure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an expertise measure for crowdsourcing workers based on four factors defined in the theory of belief functions. The construction assumes access to a dataset supplying objective comparisons between items; the measure is compared to the Fagin distance on a real audio-recording quality-assessment experiment and the two are proposed to be fused.

Significance. If the prerequisite objective-comparison dataset can be obtained or approximated from crowdsourced responses alone, the belief-function construction would supply a new formal route to expertise estimation and the reported comparison on real data would provide a concrete empirical anchor.

major comments (2)
  1. [Abstract / Method] Abstract and method statement: the four-factor construction is defined only in terms of an objective item-comparison dataset, yet no procedure is supplied for constructing or approximating that dataset from the crowdsourced responses that constitute the motivating setting.
  2. [Abstract] Abstract: no derivation details, validation metrics, or error analysis are supplied for the four factors, so the central claim cannot be checked against the paper's own equations or data.
minor comments (1)
  1. [Experiment] The description of the audio-recording experiment lacks detail on how the objective comparisons were obtained for that specific case.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We respond to each major comment below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract / Method] Abstract and method statement: the four-factor construction is defined only in terms of an objective item-comparison dataset, yet no procedure is supplied for constructing or approximating that dataset from the crowdsourced responses that constitute the motivating setting.

    Authors: The manuscript explicitly frames the expertise measure under the assumption of access to an objective item-comparison dataset, as stated in the abstract and method. This assumption enables the direct application of belief function theory to define the four factors. We agree that no explicit procedure is provided for deriving or approximating such a dataset from crowdsourced responses alone. In revision we will add a dedicated discussion subsection outlining possible approximation strategies, including the use of majority-vote consensus as a proxy, iterative refinement with partial ground truth, or hybrid approaches that combine limited objective data with worker responses. revision: yes

  2. Referee: [Abstract] Abstract: no derivation details, validation metrics, or error analysis are supplied for the four factors, so the central claim cannot be checked against the paper's own equations or data.

    Authors: The abstract is intentionally concise. The full manuscript supplies the mathematical definitions of the four factors within the belief-function framework (Section on method), together with the empirical comparison against Fagin distance on the audio-recording quality-assessment dataset. This comparison constitutes the primary validation. To improve accessibility we will expand the abstract with a short clause referencing the four factors and the real-data evaluation, and we will ensure the method section cross-references the defining equations and the experimental protocol. revision: partial
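
The majority-vote proxy mentioned in the first response could be sketched as follows. The vote format, the function name, and the tie handling are assumptions made for this example; the simulated rebuttal only names the strategy and gives no procedure.

```python
from collections import Counter, defaultdict


def majority_comparisons(votes):
    """Derive a consensus winner for each item pair from crowd votes.

    votes: iterable of (item_a, item_b, winner) tuples, winner being one of the two items.
    Returns a dict mapping frozenset({item_a, item_b}) to the majority winner,
    or None when the top two counts are equal.
    """
    tallies = defaultdict(Counter)
    for item_a, item_b, winner in votes:
        if winner not in (item_a, item_b):
            raise ValueError("winner must be one of the compared items")
        tallies[frozenset((item_a, item_b))][winner] += 1

    consensus = {}
    for pair, counts in tallies.items():
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus[pair] = None        # no majority for this pair
        else:
            consensus[pair] = ranked[0][0]
    return consensus


# Hypothetical votes on three audio recordings x, y, z.
votes = [
    ("x", "y", "x"),
    ("y", "x", "x"),
    ("x", "y", "y"),
    ("x", "z", "z"),
    ("x", "z", "x"),
]
result = majority_comparisons(votes)
print(result[frozenset(("x", "y"))])  # x
print(result[frozenset(("x", "z"))])  # None
```

A consensus built this way comes from the same workers whose expertise is being estimated, so it is a proxy for an objective comparison and not a substitute for one.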

Circularity Check

0 steps flagged

No circularity: expertise factors defined from external objective-comparison dataset

Full rationale

The paper explicitly conditions its four-factor belief-function measure on the availability of an external dataset supplying objective item comparisons. This input is treated as given rather than derived or fitted inside the paper, and the subsequent comparison to Fagin distance is performed on a separate real-experiment dataset. No equation or step reduces the claimed expertise output to a parameter fitted from the same responses the method is meant to evaluate, nor does any load-bearing claim rest on a self-citation chain. The derivation therefore remains self-contained against the stated external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the availability of an objective comparison dataset and on the standard axioms of belief function theory; no free parameters or invented entities are identifiable from the abstract alone.
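
For orientation, the standard objects of belief function theory referred to here can be written as below. The symbols are the conventional ones (a frame of discernment \Omega, a mass function m, belief bel, and plausibility pl) and are not copied from the paper's own notation.

```latex
m : 2^{\Omega} \to [0,1], \qquad \sum_{A \subseteq \Omega} m(A) = 1,
\qquad
\mathrm{bel}(A) = \sum_{\emptyset \neq B \subseteq A} m(B),
\qquad
\mathrm{pl}(A) = \sum_{B \cap A \neq \emptyset} m(B).
```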

axioms (1)
  • domain assumption: Existence of a dataset with objective comparison of the items concerned
    Explicitly stated as a prerequisite for the proposed measure.
