Data Consortia

Eric Bax; John Donald; Kimberly Williams; Lisa Giaffo; Melissa Gerber; Nikki Thompson; Tanisha Sharma

arxiv: 1906.11803 · v1 · pith:R54CZ3HUnew · submitted 2019-06-27 · 💻 cs.CY

Pith reviewed 2026-05-25 14:16 UTC · model grok-4.3

classification 💻 cs.CY

keywords: data consortia, user data pooling, informed consent, data privacy, societal data use, data frameworks, user control

The pith

Groups of consenting users can pool their data through frameworks to benefit themselves and society rather than only companies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that data currently serves company interests like advertising and investment analysis, yet the same data could support public goods such as disease outbreak detection, genetic studies, and economic trend analysis if users direct it. It proposes that groups of informed, consenting users form data consortia to retain control and channel their data toward those ends. A sympathetic reader would care because this shifts the default from privacy defense alone to active user-led data use. The authors map directions, challenges, and possible evolution of such consortia as an alternative to current company-dominated models.

Core claim

Data consortia are groups of consenting, informed users who pool their data under frameworks that let them direct its use for personal and collective benefit, including societal applications like health monitoring and macroeconomic insights, rather than leaving control solely with web companies.

What carries the argument

Data consortia, the framework mechanism that lets users collectively pool and govern their data to achieve benefits beyond individual privacy protection.

If this is right

Data can generate early warnings for disease outbreaks when users direct pooled information toward public health analysis.
Pooled user data can support detailed studies linking genetics to disease without company intermediaries deciding access.
Local and macroeconomic trends become available in real time when users authorize consortia to analyze their data for those purposes.
Users gain leverage to insist their data serves their interests alongside or instead of company-selected advertising and pricing.
Legislative efforts may evolve from privacy restrictions toward enabling user-controlled data sharing structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Existing privacy laws could be amended to recognize and protect data consortia as legal entities with collective rights.
New compensation models might emerge in which consortia members receive direct payments or services in exchange for pooled data access.
Technical standards for secure multi-party computation could become necessary infrastructure if consortia grow beyond small groups.

Load-bearing premise

Users can be made sufficiently informed to give meaningful consent to data pooling without coercion or misunderstanding caused by power imbalances with companies.

What would settle it

A demonstration that large numbers of users consistently fail to understand or control the terms of any proposed data-pooling agreement even after repeated education efforts would show the approach cannot scale.

Original abstract

Today, web-based companies use user data to provide and enhance services to users, both individually and collectively. Some also analyze user data for other purposes, for example to select advertisements or price offers for users. Some even use or allow the data to be used to evaluate investments in financial markets. Users' concerns about how their data is or may be used have prompted legislative action in the European Union and congressional questioning in the United States. But data can also benefit society, for example giving early warnings for disease outbreaks, allowing in-depth study of relationships between genetics and disease, and elucidating local and macroeconomic trends in a timely manner. So, instead of just a focus on privacy, in the future, users may insist that their data be used on their behalf. We explore potential frameworks for groups of consenting, informed users to pool their data for their own benefit and that of society, discussing directions, challenges, and evolution for such efforts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript claims that users should move beyond privacy-focused concerns to insist that their data be used on their behalf through data consortia—groups of consenting, informed users who pool data for personal and societal benefits such as early disease outbreak warnings, genetic-disease studies, and timely macroeconomic trend analysis. It explores potential frameworks, directions, challenges, and evolutionary paths for such consortia.

Significance. If operational frameworks for user-controlled data consortia can be developed, the work could help reframe data governance from defensive privacy protections toward proactive collective benefit in public health and economics. The exploratory discussion usefully identifies the tension between corporate data use and user interests, but supplies no models, mechanisms, or evidence, so any significance remains prospective and agenda-setting rather than demonstrative.

major comments (1)

The manuscript states that it will 'explore potential frameworks' yet supplies no concrete governance structures, consent protocols, incentive designs, or technical architectures. This absence is load-bearing for the central claim, as the proposal cannot be assessed for feasibility or risks without at least one worked example or high-level specification.

minor comments (1)

The abstract and opening paragraphs could more sharply separate the descriptive problem statement from the forward-looking exploration of consortia.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and address the major comment below.

Point-by-point responses

Referee: The manuscript states that it will 'explore potential frameworks' yet supplies no concrete governance structures, consent protocols, incentive designs, or technical architectures. This absence is load-bearing for the central claim, as the proposal cannot be assessed for feasibility or risks without at least one worked example or high-level specification.

Authors: Our manuscript is explicitly framed as an exploratory and agenda-setting discussion rather than a prescriptive design paper. The abstract states that we 'explore potential frameworks... discussing directions, challenges, and evolution for such efforts,' and the body focuses on the conceptual tension between corporate data use and user interests, along with high-level opportunities in public health and economics. We do not claim to deliver operational models or evidence of feasibility; instead, the contribution lies in reframing data governance toward collective benefit and identifying open questions. Concrete governance structures, consent protocols, or architectures would require substantial follow-on research involving legal, technical, and empirical work that lies beyond this paper's scope.

Revision: no

Circularity Check

0 steps flagged

No circularity; conceptual proposal with no derivations or fitted claims

Full rationale

The manuscript is a high-level exploratory discussion proposing frameworks for user data consortia. It contains no equations, no fitted parameters, no predictions derived from inputs, and no self-citation chains supporting technical claims. The central idea—that consenting users can pool data for mutual benefit—is presented as a direction for future work rather than a derived result. No load-bearing step reduces to its own inputs by construction, satisfying the criteria for a score of 0 with an empty steps list.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is a conceptual discussion with no quantitative claims, models, or empirical content; it rests on domain assumptions about user consent and data utility.

axioms (1)

Domain assumption: Users can be made sufficiently informed about complex data uses to provide meaningful consent.
The proposed frameworks depend on informed consent as a foundational premise.

pith-pipeline@v0.9.0 · 5694 in / 1027 out tokens · 26161 ms · 2026-05-25T14:16:20.498247+00:00 · methodology
