MACAA: Belief-Revision Multi-Agent Reasoning for Code Authorship Verification

Chenbin Su; Cong Gao; Ge Chu; Jianfei Tang; Jieshuai Yang; Jingwei Ye; Xin Li; Zhi Wang

arxiv: 2605.09421 · v3 · pith:EAYAS546new · submitted 2026-05-10 · 💻 cs.SE

Pith reviewed 2026-05-19 16:48 UTC · model grok-4.3

keywords: code authorship verification · multi-agent systems · belief revision · large language models · cross-language code analysis · training-free methods · software forensics · plagiarism detection

The pith

A coordinator and four expert agents use belief revision to verify code authorship without any training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MACAA as a way to determine if two code samples come from the same author even when no prior examples of that author exist. Traditional methods need lots of labeled training data and struggle with code in different languages or from new programmers. By breaking the analysis into layout, word choice, structure, and coding style, then having a coordinator expand, contract, and revise beliefs to keep them consistent, the system produces decisions with traceable reasons. This matters because it opens authorship checks for real-world cases like detecting copied code or investigating software incidents where training sets are unavailable.

Core claim

MACAA replaces direct judgments from large language models with a structured process of hypothesis refinement: the Coordinator collects signals from four Expert Agents on layout, lexical, syntactic, and programming-pattern evidence, then applies expansion to gather more, contraction to discount unreliable parts, and revision to resolve conflicts, resulting in auditable authorship decisions that maintain consistency.
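
To make the expansion, contraction, and revision steps concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Signal` record, the confidence floor, the conflict gap, the confidence-weighted decision rule, and the toy values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    dimension: str     # "layout", "lexical", "syntactic", or "pattern"
    similarity: float  # 0..1, higher means more author-like overlap
    confidence: float  # 0..1, how stable the expert considers the signal


def expand(beliefs, new_signals):
    """Expansion: add expert signals; the latest signal per dimension wins."""
    merged = dict(beliefs)
    for s in new_signals:
        merged[s.dimension] = s
    return merged


def contract(beliefs, min_confidence=0.5):
    """Contraction: drop signals below a (hypothetical) confidence floor."""
    return {d: s for d, s in beliefs.items() if s.confidence >= min_confidence}


def revise(beliefs, max_gap=0.3):
    """Revision: while signals conflict, remove the least confident one."""
    revised = dict(beliefs)
    while len(revised) > 1:
        sims = [s.similarity for s in revised.values()]
        if max(sims) - min(sims) <= max_gap:
            break  # remaining beliefs are mutually consistent
        weakest = min(revised.values(), key=lambda s: s.confidence)
        del revised[weakest.dimension]
    return revised


def decide(beliefs, threshold=0.5):
    """Confidence-weighted mean similarity, plus the dimensions that survived."""
    total = sum(s.confidence for s in beliefs.values())
    if total == 0:
        return "uncertain", 0.0, []
    score = sum(s.similarity * s.confidence for s in beliefs.values()) / total
    verdict = "same_author" if score >= threshold else "different_author"
    return verdict, round(score, 3), sorted(beliefs)


# Toy signals (made-up values, not taken from the paper).
signals = [
    Signal("layout", 0.30, 0.55),
    Signal("lexical", 0.70, 0.80),
    Signal("syntactic", 0.65, 0.40),
    Signal("pattern", 0.72, 0.70),
]
beliefs = revise(contract(expand({}, signals)))
verdict, score, trace = decide(beliefs)
print(verdict, score, trace)
```

The returned list of surviving dimensions is a stand-in for the auditable trace the paper describes: it records which evidence the final call actually rests on.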

What carries the argument

The belief-revision multi-agent framework with a Coordinator that manages expansion, contraction, and revision of evidence collected by four specialized Expert Agents.

Load-bearing premise

The four expert agents can pull accurate evidence without hallucinating even from mixed-language code, and the coordinator's expansion-contraction-revision steps lead to correct authorship calls.

What would settle it

Running the system on a new set of cross-language code pairs with known authors and checking if its accuracy falls below that of simpler baseline methods when the agents produce conflicting signals.

Figures

Figures reproduced from arXiv: 2605.09421 by Chenbin Su, Cong Gao, Ge Chu, Jianfei Tang, Jieshuai Yang, Jingwei Ye, Xin Li, Zhi Wang.

**Figure 2.** MACAA overview with Coordinator Agent state-machine flow for expert evidence analysis, belief revision, ...

Original abstract

Code authorship attribution (CAA) supports software forensics, plagiarism detection, and intellectual property protection. However, existing supervised CAA approaches suffer from scarce training data and closed-world assumptions: they require sufficient labeled code from fixed candidate-author sets, making training difficult in low-data cases and predictions unreliable for open-world test pairs with unseen samples, or heterogeneous code pairs. Large language models remove task-specific training, but direct prompting depends on costly expert-designed prompts, can hallucinate over complex heterogeneous code pairs, and rarely yields auditable evidence traces. We propose MACAA, a belief-revision-based multi-agent framework for training-free code authorship verification. MACAA comprises a Coordinator and four Expert Agents analyzing layout, lexical, syntactic, and programming-pattern evidence. The Coordinator gathers expert signals for expansion, discounts unreliable evidence through contraction, and resolves conflicts through revision to preserve belief consistency, replacing direct LLM judgment with auditable hypothesis refinement. MACAA achieves 89.15% F1 on same-language benchmarks and 80.00% on mixed cross-language pairs, outperforming the baselines overall in both same-language and cross-language evaluations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MACAA's multi-agent belief-revision setup for code authorship verification is a reasonable attempt at an auditable alternative to direct prompting, but the abstract leaves the performance claims hard to verify without ablations or experimental details.

The main things to know about this paper are that it proposes MACAA, a multi-agent framework for code authorship verification built on belief-revision steps, and that it claims better performance than baselines on cross-language cases without any training. What is actually new is tailoring the expansion, contraction, and revision operations from belief revision theory to code-specific evidence from layout, lexical, syntactic, and programming-pattern experts. The coordinator then maintains consistency across these signals, which is presented as an improvement over direct prompting that can hallucinate on complex pairs. This setup aims to make the reasoning more auditable, which fits the needs of software forensics and IP protection.

The paper does well in identifying the real pain points of existing supervised CAA methods, such as the need for large labeled datasets and the closed-world assumption that does not hold for unseen authors or heterogeneous code. By going training-free and focusing on open-world verification, it targets practical deployment scenarios like mixed-language code pairs.

The soft spots are mainly around the experimental validation. The headline numbers, 89.15% F1 for same-language and 80.00% for mixed cross-language pairs, are given without details on baseline selection, dataset splits, statistical tests, or error bars, so it is difficult to assess how much the framework contributes. The stress-test point about possible LLM priors driving the cross-language results is worth checking: if there is no ablation comparing the full system to just the expert agents or simple aggregation, the value of the revision process remains unclear. If the full manuscript has more on this, that would be key.

This paper is for researchers and practitioners in software engineering who deal with authorship attribution for plagiarism or forensics, as well as those interested in multi-agent systems applied to code. A reader focused on practical AI tools for code analysis would likely find the architecture ideas useful even if the results need more backing. It deserves a serious referee to evaluate the full experimental section and any ablations provided.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MACAA, a training-free multi-agent framework for code authorship verification that employs a Coordinator together with four Expert Agents specialized in layout, lexical, syntactic, and programming-pattern evidence. The Coordinator performs expansion, contraction, and revision steps to maintain belief consistency and produce auditable authorship decisions. The central empirical claim is that MACAA attains 89.15% F1 on same-language benchmarks and 80.00% F1 on mixed cross-language pairs while outperforming the chosen baselines in both regimes.
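
For readers less familiar with the headline metric, F1 on verification pairs can be computed as in the sketch below. The labels and predictions are made-up toy values, not the paper's data, and treating `same_author` as the positive class is an assumption.

```python
def f1_score(y_true, y_pred, positive="same_author"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: five code pairs with gold labels and system verdicts.
y_true = ["same_author", "same_author", "same_author",
          "different_author", "different_author"]
y_pred = ["same_author", "same_author", "different_author",
          "same_author", "different_author"]
print(f1_score(y_true, y_pred))
```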

Significance. If the reported gains are shown to arise specifically from the belief-revision loop rather than from the underlying LLM priors or individual agent signals, the work would offer a concrete, auditable alternative to both supervised CAA methods and direct LLM prompting for open-world and heterogeneous code settings.

major comments (2)

§4 (Experimental Results) and Table 2: the headline F1 scores of 89.15% (same-language) and 80.00% (cross-language) are presented without an ablation that disables the Coordinator’s expansion-contraction-revision loop while retaining identical agent outputs and test pairs; without this comparison it is impossible to attribute the cross-language improvement to the proposed mechanism rather than to the base LLM’s cross-lingual code understanding.
§3.2 (Expert Agents) and §4.1 (Dataset Construction): the claim that the four agents reliably extract non-hallucinated evidence on heterogeneous or cross-language pairs is not supported by any quantitative validation (e.g., inter-agent agreement rates, manual inspection of extracted evidence, or error analysis on mixed-language pairs); this assumption is load-bearing for the cross-language result.
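
The ablation requested in the first major comment can be sketched as follows: agent outputs are held fixed, and the belief-revision loop is swapped for a plain average. Everything here is hypothetical, including the function names, the 0.5 threshold, the toy pairs, and `drop_layout_stand_in`, which is only a placeholder for "the full system" and not MACAA itself.

```python
from statistics import mean


def simple_aggregate(agent_scores, threshold=0.5):
    """Baseline: unweighted mean of per-dimension similarity, no revision loop."""
    if mean(agent_scores.values()) >= threshold:
        return "same_author"
    return "different_author"


def drop_layout_stand_in(agent_scores, threshold=0.5):
    """Placeholder for the full system: ignores layout, averages the rest."""
    kept = [v for k, v in agent_scores.items() if k != "layout"]
    return "same_author" if mean(kept) >= threshold else "different_author"


def accuracy(predict, pairs):
    """Fraction of pairs where predict(agent_scores) matches the gold label."""
    hits = sum(predict(scores) == gold for scores, gold in pairs)
    return hits / len(pairs)


def ablation_gap(full_system, pairs):
    """Full-system accuracy minus baseline accuracy on identical agent outputs."""
    return accuracy(full_system, pairs) - accuracy(simple_aggregate, pairs)


# Cached agent outputs (toy values) paired with gold labels.
pairs = [
    ({"layout": 0.2, "lexical": 0.7, "syntactic": 0.6, "pattern": 0.7}, "same_author"),
    ({"layout": 0.1, "lexical": 0.6, "syntactic": 0.5, "pattern": 0.6}, "same_author"),
    ({"layout": 0.3, "lexical": 0.2, "syntactic": 0.4, "pattern": 0.3}, "different_author"),
    ({"layout": 0.9, "lexical": 0.4, "syntactic": 0.4, "pattern": 0.4}, "different_author"),
]
print(accuracy(simple_aggregate, pairs), ablation_gap(drop_layout_stand_in, pairs))
```

Because both predictors read the same cached scores, any gap between them reflects the aggregation logic alone, which is the attribution the comment asks for.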

minor comments (2)

Abstract: the abstract states that MACAA “outperforms the baselines overall” but neither names the baselines nor supplies the corresponding F1 numbers; this information should appear in the abstract or be cross-referenced to a table.
Figure 3 (coordinator workflow): the figure uses abbreviations (E, C, R) that are defined only in the caption; inline definitions or a legend would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the empirical claims.

Referee: §4 (Experimental Results) and Table 2: the headline F1 scores of 89.15% (same-language) and 80.00% (cross-language) are presented without an ablation that disables the Coordinator’s expansion-contraction-revision loop while retaining identical agent outputs and test pairs; without this comparison it is impossible to attribute the cross-language improvement to the proposed mechanism rather than to the base LLM’s cross-lingual code understanding.

Authors: We agree that an ablation isolating the Coordinator’s expansion-contraction-revision loop is required to attribute gains specifically to the belief-revision mechanism rather than to the underlying LLM. While our current baselines include direct LLM prompting, they do not hold agent outputs fixed. In the revised manuscript we will add this controlled ablation to §4 and Table 2, using identical agent outputs and test pairs but replacing the belief-revision steps with simple aggregation, and report the resulting performance to quantify the loop’s contribution.

revision: yes

Referee: §3.2 (Expert Agents) and §4.1 (Dataset Construction): the claim that the four agents reliably extract non-hallucinated evidence on heterogeneous or cross-language pairs is not supported by any quantitative validation (e.g., inter-agent agreement rates, manual inspection of extracted evidence, or error analysis on mixed-language pairs); this assumption is load-bearing for the cross-language result.

Authors: We acknowledge that direct quantitative validation of the agents’ evidence extraction on cross-language pairs is currently absent and would strengthen the cross-language claims. Although overall performance and the auditable traces provide supporting evidence, we will add inter-agent agreement rates, a manual inspection of evidence from a sample of mixed-language pairs, and a focused error analysis on heterogeneous cases to the revised §3.2 and §4.1.

revision: yes
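
One of the promised checks, inter-agent agreement, could be computed roughly as in the sketch below. The `pairwise_agreement` helper, the short verdict labels (`"S"` for same author, `"D"` for different author), and the toy data are assumptions for illustration; the manuscript may define agreement differently.

```python
from itertools import combinations


def pairwise_agreement(verdicts_by_agent):
    """For each pair of agents, the share of code pairs where their verdicts match."""
    rates = {}
    for a, b in combinations(sorted(verdicts_by_agent), 2):
        va, vb = verdicts_by_agent[a], verdicts_by_agent[b]
        if len(va) != len(vb) or not va:
            raise ValueError("agents must judge the same non-empty set of pairs")
        matches = sum(x == y for x, y in zip(va, vb))
        rates[(a, b)] = matches / len(va)
    return rates


# Toy verdicts from four agents on four mixed-language code pairs.
verdicts = {
    "layout":    ["S", "D", "D", "S"],
    "lexical":   ["S", "S", "D", "S"],
    "syntactic": ["S", "S", "D", "D"],
    "pattern":   ["S", "S", "D", "S"],
}
for agents, rate in pairwise_agreement(verdicts).items():
    print(agents, rate)
```

Raw agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would be a natural refinement if the label distribution is skewed.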

Circularity Check

0 steps flagged

No circularity: framework and results presented as empirical construction without self-referential reduction

The paper introduces MACAA as a novel multi-agent belief-revision framework with a coordinator and four expert agents for layout, lexical, syntactic, and pattern analysis. No equations, fitted parameters, or self-citations are invoked in the provided text to derive the reported F1 scores (89.15% same-language, 80.00% cross-language) from the framework definition itself. Performance claims are positioned as outcomes of the described process evaluated on benchmarks, not tautological renamings or load-bearing self-citations. The derivation chain remains self-contained against external benchmarks and does not reduce any central result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested reliability of LLM-based expert agents and the effectiveness of the belief-revision loop; no free parameters, new physical entities, or machine-checked axioms are declared.

axioms (1)

Domain assumption: LLM expert agents produce reliable, non-hallucinated signals on layout, lexical, syntactic, and pattern evidence for heterogeneous code.
Invoked when the coordinator gathers and revises agent outputs; if false, the entire evidence-refinement process collapses.

