Reading ChromoSort Audit Tables
Use this guide when you have a ChromoSort TSV output and need to decide what it means before trusting a FASTA, plot, manual recipe, scaffold, or graph fill.
The important habit is simple: read the table before reading the FASTA header. ChromoSort FASTA headers carry useful provenance, but the TSV reports are the decision log.
The Core Idea
Most ChromoSort commands write both sequence outputs and audit tables. The sequence file is for downstream tools. The table explains why each record was kept, rejected, split, cut, joined, reviewed, or filled.
Audit tables are deliberately redundant. They keep original IDs, new IDs, reference labels, coordinates, status fields, and evidence summaries together so that users can review decisions without reverse-engineering them from FASTA headers.
How To Read Any Audit Table
Start with these columns or column families:
| What to find | Why it matters |
|---|---|
| Original ID | Tells you which input contig, piece, scaffold junction, or graph path the row describes. |
| New ID | Tells you what name was written to the output FASTA, if any. |
| Status or action | The command’s decision: kept, discarded, split, not split, gap mode, fill status, and so on. |
| Accepted or applied field | In reviewed workflows, tells you whether a row can change sequence. |
| Reference and coordinate fields | Let you compare the table to dot plots and alignment evidence. |
| Evidence summaries | Coverage, identity, MAPQ-derived filtering, graph support, GAF support, long-read support, and risk flags. |
| Reason fields | Explain why a candidate was rejected, smoothed over, left unresolved, or marked stale. |
Then ask two questions:
- Did this row change sequence or only report evidence?
- If it changed sequence, which input FASTA and evidence files made that change valid?
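As a first pass over any audit table, it can help to load the rows and tally the status column before opening individual records. The following is a minimal Python sketch, not part of ChromoSort. The column names contig and status and the three example rows are placeholders, so substitute the header names from the table you are actually reading.

```python
import csv
import io
from collections import Counter


def read_tsv(handle):
    """Load a tab-separated audit table as a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def tally(rows, column):
    """Count how often each value of `column` occurs; missing cells count as ''."""
    return Counter((row.get(column) or "") for row in rows)


# Placeholder table invented for illustration; real tables have more columns.
example = io.StringIO(
    "contig\tstatus\n"
    "ctg1\tkept\n"
    "ctg2\tkept\n"
    "ctg3\tno_alignment\n"
)
rows = read_tsv(example)
status_counts = tally(rows, "status")
for status, count in status_counts.most_common():
    print(f"{status}\t{count}")
```

For a real run, replace the in-memory example with an open file handle, for instance open(path, newline="").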
Output Families
| Command | Tables to read first | Main question |
|---|---|---|
| chromo sort | <prefix>.ordered.agp, <prefix>.ordered_components.tsv, <prefix>.contig_assignments.tsv, <prefix>.contig_ref_matches.tsv, <prefix>.chromosome_summary.tsv, <prefix>.submission_checklist.tsv | Which contigs were kept, assigned, filtered, flagged for split review, and mapped into the ordered FASTA? |
| chromo clean | <prefix>.clean.agp, <prefix>.clean_components.tsv, <prefix>.initial_sort.contig_assignments.tsv, <prefix>.fix_report.tsv, <prefix>.clean_contigs.tsv, <prefix>.submission_checklist.tsv | Which raw contigs were discarded, inspected, split, or emitted into the cleaned FASTA? |
| chromo eval | <prefix>.fix_review.tsv, <prefix>.scaffold_review.tsv, <prefix>.gapfill_review.tsv, optional <prefix>.eval_all_outputs.tsv | Which candidate decisions need human accept/reject review, and which tables should feed targeted GAF prep? |
| chromo fix | The --report TSV plus <output-fasta>.agp, <output-fasta>.components.tsv, and <output-fasta>.submission_checklist.tsv by default | Which requested or detected contigs were split, copied, left unsplit, and mapped into the fixed FASTA? |
| chromo cut | The --report TSV plus <output-fasta>.agp, <output-fasta>.components.tsv, and <output-fasta>.submission_checklist.tsv by default | Which exact requested cut positions produced which pieces? |
| chromo manual apply | The optional --report TSV plus <output-fasta>.agp, <output-fasta>.components.tsv, and <output-fasta>.submission_checklist.tsv by default | Which browser-reviewed pieces were emitted or removed? |
| chromo gafprep | <prefix>.targets.tsv, <prefix>.selected_reads.tsv, <prefix>.selected_read_review_links.tsv, <prefix>.dropped_gfa_links.tsv | Which reads were selected for targeted GraphAligner, which review rows selected them, and did GFA sanitization limit evidence? |
| chromo graph-map | <prefix>.utg_to_ctg.tsv, <prefix>.path_summary.tsv, <prefix>.warnings.tsv | Did unitig graph coordinates project cleanly onto contig FASTA coordinates? |
| chromo scaffold | <prefix>.scaffold.agp, <prefix>.scaffold_components.tsv, <prefix>.scaffold_gaps.tsv, <prefix>.scaffold_summary.tsv, <prefix>.submission_checklist.tsv, optional <prefix>.graph_gaps.tsv | What gaps, overlaps, trims, graph context, AGP provenance, and FASTA/AGP handoff checks were recorded? |
| chromo gapfill | <prefix>.gapfill_plan.tsv, <prefix>.gapfilled.agp, <prefix>.gapfilled_components.tsv, <prefix>.submission_checklist.tsv | Which graph paths are fillable, ambiguous, risky, accepted, or applied, and what final handoff checks remain? |
Status Gallery
Sort Assignment Rows
The status and kept fields in contig_assignments.tsv tell you whether a
contig entered the ordered FASTA.
| Status | Meaning | Usual next question |
|---|---|---|
| kept | Passed placement and overlap filters. | Does the dot plot support this order and orientation? |
| kept_split_candidate | Retained and flagged as a strong multi-reference candidate. | Should this contig go through chromo eval fix, chromo manual, or chromo fix review? |
| kept_large_alignment | Rescued because the best reference match was very large despite slightly low query coverage. | Is the missing coverage due to fragmentation, repeats, or a real issue? |
| kept_terminal_overlap | Retained because it contributes enough one-sided terminal reference span. | Should scaffolding later report or trim the overlap? |
| no_alignment | No usable alignment rows were found. | Do names match, and was the aligner too strict? |
| below_min_aligned_bp or below_min_query_cov | Alignment support did not pass thresholds. | Should thresholds change, or is the contig truly weakly supported? |
| ambiguous_ref_match | No reference dominated enough to assign confidently. | Is this repeat signal, a real translocation, or a split candidate? |
| duplicate_overlap | A better contig already covers nearly all of the reference span. | Is this an alternate fragment, haplotig, repeat, or real duplicated sequence? |
| terminal_overlap | A one-sided overlap did not pass keep or rescue thresholds. | Does the extension matter biologically or for scaffolding? |
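The statuses in this gallery can be turned into a quick triage list. This Python sketch assumes that every status beginning with kept means the contig entered the ordered FASTA. That matches the rows listed here but is not confirmed for statuses outside this gallery, so cross-check against the kept field. The contig column name is hypothetical and the rows are invented.

```python
import csv
import io
from collections import defaultdict


def triage_assignments(handle, id_column="contig"):
    """Group contig IDs by status, separated into kept and not-kept buckets.

    Assumption: statuses starting with 'kept' entered the ordered FASTA.
    `id_column` is a hypothetical header name; adjust it to the real table.
    """
    kept = defaultdict(list)
    not_kept = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        bucket = kept if row["status"].startswith("kept") else not_kept
        bucket[row["status"]].append(row[id_column])
    return dict(kept), dict(not_kept)


# Invented rows for illustration only.
demo = io.StringIO(
    "contig\tstatus\n"
    "ctg1\tkept\n"
    "ctg2\tkept_split_candidate\n"
    "ctg3\tduplicate_overlap\n"
    "ctg4\tno_alignment\n"
)
kept, not_kept = triage_assignments(demo)
```

Each key in the not-kept bucket maps to a "usual next question" in the gallery, which gives a natural review order.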
Fix Report Rows
chromo fix reports both split and not-split outcomes. A split row describes
one emitted piece. A not_split_* row records why a reviewed or detected contig
was copied unchanged.
Pay attention to:
- original_contig,
- status,
- new_contig,
- slice_start and slice_end,
- dominant_ref,
- orientation,
- reason fields such as smoothing or breakpoint-budget rejection.
If a candidate is not_split_smooth, the planner saw discordance but decided it
was not worth a breakpoint under the current mode and thresholds. That is often
a prompt for manual review rather than a bug.
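To see at a glance which source contigs were split and which were copied unchanged, the fix report can be grouped by original_contig. This is a hedged Python sketch: the column names come from the field list above, but the literal status value split, the integer coordinates, and the example rows are assumptions made for illustration. Only the not_split prefix is taken from this guide.

```python
import csv
import io
from collections import defaultdict


def summarize_fix_report(handle):
    """Return (pieces, unchanged) from a fix report.

    pieces: source contig -> list of (slice_start, slice_end, new_contig),
            sorted by start coordinate.
    unchanged: source contig -> its not_split_* status.
    """
    pieces = defaultdict(list)
    unchanged = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        source = row["original_contig"]
        if row["status"].startswith("not_split"):
            unchanged[source] = row["status"]
        else:
            # Assumes slice coordinates are plain integers on emitted pieces.
            pieces[source].append(
                (int(row["slice_start"]), int(row["slice_end"]), row["new_contig"])
            )
    return {src: sorted(rows) for src, rows in pieces.items()}, unchanged


# Invented rows; the status value "split" is a placeholder.
report = io.StringIO(
    "original_contig\tstatus\tnew_contig\tslice_start\tslice_end\n"
    "ctgA\tsplit\tctgA_2\t5001\t9000\n"
    "ctgA\tsplit\tctgA_1\t1\t5000\n"
    "ctgB\tnot_split_smooth\tctgB\t1\t7000\n"
)
pieces, unchanged = summarize_fix_report(report)
```

Contigs that land in the unchanged mapping with a not_split_smooth status are the natural candidates for manual review.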
Eval Review Tables
chromo eval writes shared review-event tables. These are not sequence outputs.
They are editable decision queues.
Look for:
- event_type, such as split_piece, scaffold_gap, or fill_path,
- accept or task-specific accepted fields,
- source contigs and coordinates,
- evidence columns such as graph_*, gaf_*, and longread_*,
- notes or review fields added by the user.
Rejected, deleted, stale, or unaccepted rows should not change sequence. The matching executor revalidates accepted rows before applying them.
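Before handing a reviewed table to an executor, it can be useful to preview which rows are even candidates to change sequence. The Python sketch below separates accepted rows from everything else. It assumes the accept column holds yes for accepted rows, which this guide does not confirm, and it does not replace the executor's own revalidation. The example rows are invented.

```python
import csv
import io


def split_by_acceptance(handle, accept_column="accept", yes_values=("yes",)):
    """Return (eligible, skipped) rows from a review table.

    Assumption: an accepted row carries one of `yes_values` in `accept_column`.
    Blank, rejected, or otherwise unrecognized values count as skipped.
    """
    eligible, skipped = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        value = (row.get(accept_column) or "").strip().lower()
        (eligible if value in yes_values else skipped).append(row)
    return eligible, skipped


# Invented review rows for illustration only.
review = io.StringIO(
    "event_type\taccept\tnotes\n"
    "split_piece\tyes\tclear breakpoint\n"
    "scaffold_gap\tno\t\n"
    "fill_path\t\tnot reviewed yet\n"
)
eligible, skipped = split_by_acceptance(review)
```

Treating blank cells as skipped mirrors the rule that unaccepted rows should not change sequence.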
Scaffold Gap Reports
scaffold_gaps.tsv explains every join between adjacent sorted contigs.
Key fields include:
- raw_inferred_gap_bp,
- gap_bp,
- gap_mode,
- overlap_class,
- overlap_action,
- trimming and sequence-identity fields when overlap policies check sequence.
A negative inferred gap means adjacent reference spans overlap. By default, ChromoSort writes a zero-length FASTA gap and reports the overlap. Trimming happens only when an explicit overlap policy asks for it.
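Joins with a negative inferred gap are usually the first rows worth inspecting. The following Python sketch lists them together with the written gap size and the overlap action. The column names are the ones named in this section; the cell values in the example, including none and report, are invented, and the sketch assumes raw_inferred_gap_bp holds plain integers.

```python
import csv
import io


def overlapping_joins(handle):
    """Return (raw_gap, gap_bp, overlap_action) for joins with a negative raw gap."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            raw_gap = int(row["raw_inferred_gap_bp"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a parseable integer
        if raw_gap < 0:
            hits.append((raw_gap, row.get("gap_bp"), row.get("overlap_action")))
    return hits


# Invented rows; only the column names are taken from this guide.
gaps = io.StringIO(
    "raw_inferred_gap_bp\tgap_bp\toverlap_action\n"
    "1200\t1200\tnone\n"
    "-350\t0\treport\n"
    "NA\t100\tnone\n"
)
overlaps = overlapping_joins(gaps)
```

A row with a negative raw gap and a written gap of zero is consistent with the default behavior described above: the overlap was reported, not trimmed.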
Gapfill Plans
gapfill_plan.tsv is a review table first and a sequence application log when
--apply is used.
Read these fields together:
- graph_status,
- path_nodes,
- candidate_paths,
- GAF, Hi-C, and reference-placement support columns,
- risk_flags,
- fill_status,
- accept_fill,
- applied.
Without a reviewed plan, chromo gapfill --apply applies fillable paths. With a
reviewed plan, only accepted rows are applied and all accepted rows are
rechecked against the current graph path and fillability status.
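The two application modes can be previewed from the plan table. This Python sketch selects the rows each mode would consider, under stated assumptions: fill_status uses the literal value fillable, accept_fill uses yes, and path_nodes is a comma-separated string. None of these literals is confirmed by this guide, and the sketch does not perform the recheck that the real command runs on accepted rows.

```python
import csv
import io

# Invented plan rows; the cell values are placeholders.
PLAN = (
    "path_nodes\tfill_status\taccept_fill\n"
    "u1,u2\tfillable\tno\n"
    "u3\tambiguous\tyes\n"
    "u4,u5\tfillable\tyes\n"
)


def rows_considered(handle, reviewed, fillable_value="fillable", accept_value="yes"):
    """Select plan rows a given mode would consider for application.

    reviewed=False: rows whose fill_status equals `fillable_value`.
    reviewed=True:  rows whose accept_fill equals `accept_value`; the real
                    command still rechecks these, so some may not be applied.
    """
    column, wanted = (
        ("accept_fill", accept_value) if reviewed else ("fill_status", fillable_value)
    )
    return [
        row
        for row in csv.DictReader(handle, delimiter="\t")
        if (row.get(column) or "").strip().lower() == wanted
    ]


unreviewed = rows_considered(io.StringIO(PLAN), reviewed=False)
reviewed = rows_considered(io.StringIO(PLAN), reviewed=True)
```

In the invented data, the u3 row is accepted but not fillable. A row like that is exactly what the recheck exists to catch, so compare the applied column after the run.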
Practical Review Workflow
- Open the run summary to confirm inputs and thresholds.
- Open the main row-level table for the command.
- Sort or filter by status fields.
- Inspect rows that changed sequence.
- Inspect rows that were rejected or left unresolved.
- Compare suspicious rows to dot plots or manual dashboard evidence.
- Keep the table beside the FASTA in downstream folders.
For spreadsheet review, freeze the identifier and status columns before editing accept/reject fields. Avoid changing provenance columns unless the command docs explicitly say a field is user-editable.
Cheat Sheet
| If you want to know… | Read… |
|---|---|
| Why a contig was kept or discarded | contig_assignments.tsv |
| Which reference each contig matched before final assignment | contig_ref_matches.tsv |
| Which raw contigs were emitted by chromo clean | clean_contigs.tsv |
| Which fix pieces replaced a source contig | fix report or fix_report.tsv |
| Whether a reviewed eval row can change sequence | accept plus event_type |
| How many Ns were inserted between scaffold contigs | scaffold_gaps.tsv |
| Whether an overlap was trimmed | overlap_action in scaffold_gaps.tsv |
| Whether graph context changed a scaffold | It does not by default; read graph_gaps.tsv as report-only evidence. |
| Why a graph fill did not apply | fill_status, risk_flags, and applied in gapfill_plan.tsv |
Common Traps
Do not parse FASTA headers when a TSV report exists. Headers are helpful, but tables are more complete and easier to audit.
Do not treat report-only graph evidence as a sequence change. graph_* columns
often explain context without changing the FASTA.
Do not assume accept_fill=no means the candidate is biologically false. It
means the planning table has not accepted that sequence-changing action.
Do not edit provenance fields in reviewed tables unless you are intentionally creating a new reviewed decision row and understand the executor validation rules.
Do not apply an old reviewed table after changing FASTA, assignments, graph inputs, or path-search settings. Regenerate the table from current inputs.
What To Look At Next In ChromoSort
- Use Output Files for command-by-command output definitions.
- Use FASTA And Evidence Name Matching when a report contains missing or stale identifiers.
- Use How to Interpret Dot Plots when a table status needs visual context.
- Use chromo eval and chromo manual when a table row needs human review before sequence changes.