{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2015:JTXGZ6XRNNF3MCBYMEK3ALCJF2","short_pith_number":"pith:JTXGZ6XR","schema_version":"1.0","canonical_sha256":"4cee6cfaf16b4bb608386115b02c492ea34223319b3655f8bd71fd74311a2161","source":{"kind":"arxiv","id":"1502.02259","version":1},"attestation_state":"computed","paper":{"title":"Contextual Markov Decision Processes","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Assaf Hallak, Dotan Di Castro, Shie Mannor","submitted_at":"2015-02-08T14:58:50Z","abstract_excerpt":"We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website objective is to determine customer characteristics, and to optimize the interaction between them. Our work f"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"1502.02259","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"stat.ML","submitted_at":"2015-02-08T14:58:50Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"50e088e6d4d41d9bb6268d6c993318b0903101370f97943954868d4e99612a81","abstract_canon_sha256":"87257b35451750dbe41178c7d06dea43cd09df5edd39aaef83548ffc637d40e0"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T02:27:43.089414Z","signature_b64":"rFS77MO25bkinKpPgEjkEhIJD6Yn+oKcRi9+veRg04nueIs6plLmV+V+mAfDyNjtKQmmCFYgHakExBDskkRsBQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"4cee6cfaf16b4bb608386115b02c492ea34223319b3655f8bd71fd74311a2161","last_reissued_at":"2026-05-18T02:27:43.088700Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T02:27:43.088700Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Contextual Markov Decision Processes","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Assaf Hallak, Dotan Di Castro, Shie Mannor","submitted_at":"2015-02-08T14:58:50Z","abstract_excerpt":"We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website objective is to determine customer characteristics, and to optimize the interaction between them. Our work f"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1502.02259","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1502.02259","created_at":"2026-05-18T02:27:43.088799+00:00"},{"alias_kind":"arxiv_version","alias_value":"1502.02259v1","created_at":"2026-05-18T02:27:43.088799+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1502.02259","created_at":"2026-05-18T02:27:43.088799+00:00"},{"alias_kind":"pith_short_12","alias_value":"JTXGZ6XRNNF3","created_at":"2026-05-18T12:29:27.538025+00:00"},{"alias_kind":"pith_short_16","alias_value":"JTXGZ6XRNNF3MCBY","created_at":"2026-05-18T12:29:27.538025+00:00"},{"alias_kind":"pith_short_8","alias_value":"JTXGZ6XR","created_at":"2026-05-18T12:29:27.538025+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":8,"internal_anchor_count":5,"sample":[{"citing_arxiv_id":"2605.16054","citing_title":"Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making","ref_index":299,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17431","citing_title":"MATE: Solving Contextual Markov Decision Processes with Memory of Accumulated Transition Embeddings","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2509.15519","citing_title":"Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2509.22981","citing_title":"MDP modeling for multi-stage stochastic programs","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02348","citing_title":"Contextual Intelligence The Next Leap for Reinforcement Learning","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03023","citing_title":"Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control","ref_index":14,"is_internal_anchor":false},{"citing_arxiv_id":"2604.12645","citing_title":"Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring","ref_index":13,"is_internal_anchor":false},{"citing_arxiv_id":"2604.21640","citing_title":"Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation","ref_index":14,"is_internal_anchor":false}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2","json":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2.json","graph_json":"https://pith.science/api/pith-number/JTXGZ6XRNNF3MCBYMEK3ALCJF2/graph.json","events_json":"https://pith.science/api/pith-number/JTXGZ6XRNNF3MCBYMEK3ALCJF2/events.json","paper":"https://pith.science/paper/JTXGZ6XR"},"agent_actions":{"view_html":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2","download_json":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2.json","view_paper":"https://pith.science/paper/JTXGZ6XR","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1502.02259&json=true","fetch_graph":"https://pith.science/api/pith-number/JTXGZ6XRNNF3MCBYMEK3ALCJF2/graph.json","fetch_events":"https://pith.science/api/pith-number/JTXGZ6XRNNF3MCBYMEK3ALCJF2/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2/action/timestamp_anchor","attest_storage":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2/action/storage_attestation","attest_author":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2/action/author_attestation","sign_citation":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2/action/citation_signature","submit_replication":"https://pith.science/pith/JTXGZ6XRNNF3MCBYMEK3ALCJF2/action/replication_record"}},"created_at":"2026-05-18T02:27:43.088799+00:00","updated_at":"2026-05-18T02:27:43.088799+00:00"}