chromo gapfill
Use chromo gapfill after final sorting and manual review when a GFA graph
gives a validated sequence path between adjacent sorted contigs, or when an
otherwise ambiguous graph branch has unique non-conflicting support evidence.
What chromo gapfill Does
Given a final chromo sort ordered FASTA, the matching assignment report, a
GFA assembly graph, and optional GAF graph alignments, Hi-C pair evidence, or
reference-placement PAF evidence, chromo gapfill:
- Groups retained contigs by assigned reference sequence.
- Looks at adjacent contig pairs in sorted order.
- Resolves each flank to a GFA segment using original and renamed contig IDs.
- Enumerates graph paths up to --max-path-edges.
- Uses GAF read-path support, Hi-C contact support, and reference-placement PAF support to resolve an otherwise ambiguous graph branch only when one candidate path has unique support above threshold and evidence sources do not conflict.
- Annotates candidate-path risk, including high-degree graph nodes, self-loop nodes, unsequenced nodes, cycle guards, weak/tied/conflicting support, and a branch-complexity score.
- Rejects missing nodes, disconnected flanks, unresolved ambiguous paths, unsequenced nodes, unknown or invalid overlaps, oversized fills, and flank sequence mismatches.
- Writes <prefix>.gapfill_plan.tsv for review with accept_fill=no by default, can write a self-contained HTML reviewer with --review-html, and can accept the shared review-event table from chromo eval gapfill.
- With --apply, writes <prefix>.gapfilled.fa. Without --reviewed-plan, all currently fillable paths are applied; with --reviewed-plan, only rows with accepted fill decisions are applied and other junctions fall back to inferred or fixed N gaps.
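The path-enumeration step in the list above can be pictured as a depth-limited search over GFA links. The following is a minimal sketch under stated assumptions, not ChromoSort's actual implementation: the `enumerate_paths` function, the adjacency-dict graph representation, and the simple-path cycle guard are all hypothetical. Only the two limits are taken from the documented --max-path-edges and --max-candidate-paths parameters.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (segment name, orientation "+" or "-")


def enumerate_paths(
    links: Dict[Node, List[Node]],
    start: Node,
    end: Node,
    max_edges: int = 4,
    max_paths: int = 2,
) -> List[List[Node]]:
    """Depth-first search for oriented paths from start to end.

    A path is not extended beyond max_edges links, and the whole search
    stops once max_paths candidates have been collected.
    """
    found: List[List[Node]] = []

    def walk(path: List[Node]) -> None:
        if len(found) >= max_paths:
            return
        current = path[-1]
        if current == end and len(path) > 1:
            found.append(list(path))
            return
        if len(path) - 1 >= max_edges:
            return
        for nxt in links.get(current, []):
            if nxt in path:  # hypothetical cycle guard: no repeated oriented node
                continue
            walk(path + [nxt])

    walk([start])
    return found


# Toy graph (made up): left+ reaches right+ through either of two bridges.
links = {
    ("left", "+"): [("bridge_good", "+"), ("bridge_alt", "+")],
    ("bridge_good", "+"): [("right", "+")],
    ("bridge_alt", "+"): [("right", "+")],
}
paths = enumerate_paths(links, ("left", "+"), ("right", "+"))
print(len(paths), "candidate path(s)")
```

Stopping at two candidates is enough to tell a unique path from an ambiguous branch, which is presumably why the documented default for --max-candidate-paths is 2.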
Plan Graph Fills
chromo gapfill \
--ordered-fasta results/sample.ordered.fa \
--assignments results/sample.contig_assignments.tsv \
--gfa assembly_graph.gfa \
--ref-paf paf/sample.ref_vs_asm.paf \
--output-prefix results/sample.gapfill \
--review-html results/sample.gapfill.review.html
Planning mode writes the gapfill plan but does not create a FASTA. Add
--include-fill-sequences when you want short candidate sequences embedded in
the TSV for manual review. To make application explicitly reviewed, edit the
accept_fill column from no to yes only for rows you want to apply, then
pass that edited table back with --reviewed-plan. When --review-html is
provided, the HTML table can filter rows, toggle accepted fillable paths, show
side-by-side candidate path comparisons for ambiguous branches, and export a
reviewed-plan TSV with the same columns.
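The accept_fill edit described above can also be scripted instead of done by hand. The sketch below is illustrative only: the `accept_rows` helper and its approval policy are hypothetical, the toy plan contains made-up rows, and only the column names (scaffold, left_contig, right_contig, fill_status, accept_fill) come from the plan columns shown in this document.

```python
import csv
import io
from typing import Set, Tuple


def accept_rows(plan_tsv: str, approved: Set[Tuple[str, str, str]]) -> str:
    """Return plan TSV text with accept_fill=yes on approved fillable rows.

    approved holds (scaffold, left_contig, right_contig) keys picked by a
    reviewer; every other row keeps its existing accept_fill value.
    """
    reader = csv.DictReader(io.StringIO(plan_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        key = (row["scaffold"], row["left_contig"], row["right_contig"])
        if key in approved and row["fill_status"] == "fillable":
            row["accept_fill"] = "yes"
        writer.writerow(row)
    return out.getvalue()


# Toy plan with a reduced column set (a real plan has many more columns).
plan = (
    "scaffold\tleft_contig\tright_contig\tfill_status\taccept_fill\n"
    "chr1\tchr1_left\tchr1_right\tfillable\tno\n"
    "chr2\tchr2_left\tchr2_right\tfillable\tno\n"
)
reviewed = accept_rows(plan, {("chr1", "chr1_left", "chr1_right")})
print(reviewed)
```

Because the writer reuses the reader's field names, the output keeps the same columns in the same order, which is what --reviewed-plan expects of an edited plan.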
For spreadsheet-first review using the shared review-event schema, run
chromo eval gapfill instead. It writes <prefix>.gapfill_review.tsv with
accepted fill_path rows that can also be passed to --reviewed-plan.
Apply Reviewed Graph Fills
chromo gapfill \
--ordered-fasta results/sample.ordered.fa \
--assignments results/sample.contig_assignments.tsv \
--gfa assembly_graph.gfa \
--ref-paf paf/sample.ref_vs_asm.paf \
--gaf reads_to_graph.gaf \
--hic-pairs graph_contacts.tsv \
--reviewed-plan results/sample.eval_gapfill.gapfill_review.tsv \
--output-prefix results/sample.reviewed_gapfill \
--apply
Applied mode still refuses ambiguous or unverifiable paths. If GAF is provided,
an ambiguous GFA branch can be filled only when one candidate path has unique
support of at least --min-gaf-path-support reads after --min-gaf-mapq
filtering. If Hi-C pair evidence is provided, one candidate path must have
unique summed contact support of at least --min-hic-path-support. If
--ref-paf is provided, one candidate path can be chosen when its intermediate
graph nodes have uniquely stronger same-reference placement support inside the
expected reference-space gap. When evidence sources uniquely support different
paths, the branch remains unresolved.

When --reviewed-plan is used, ChromoSort accepts either the legacy gapfill-plan
TSV with accept_fill=yes or the shared chromo eval gapfill review-event TSV
with accept=yes. It rechecks the current scaffold, contig pair, and path_nodes
before applying an accepted row, so stale reviewed paths fail instead of being
applied.

For a fillable path, ChromoSort inserts the graph sequence after the left flank
and trims the right flank prefix by the final GFA overlap so the joined
sequence follows the graph path without duplicating the overlap. Unfilled
junctions receive the inferred reference-space N gap, or --fixed-gap-bp when
provided.
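The join and fallback rules just described reduce to simple string operations. This is a rough sketch with hypothetical helper names (`join_with_fill`, `join_with_gap`) and made-up toy sequences; it assumes the fill sequence has already been assembled from the intermediate graph nodes, which is not something this document specifies.

```python
def join_with_fill(left: str, fill: str, right: str, right_trim_bp: int) -> str:
    """Append the graph fill to the left flank, then the trimmed right flank.

    right_trim_bp is the final GFA overlap: that many bases are removed
    from the start of the right flank so the overlap is not duplicated.
    """
    if right_trim_bp < 0 or right_trim_bp > len(right):
        raise ValueError("right_trim_bp must be between 0 and len(right)")
    return left + fill + right[right_trim_bp:]


def join_with_gap(left: str, right: str, gap_bp: int) -> str:
    """Fallback for an unfilled junction: separate the flanks with Ns."""
    if gap_bp < 0:
        raise ValueError("gap_bp must be non-negative")
    return left + "N" * gap_bp + right


# Toy example: the fill ends with TGA and the right flank starts with TGA,
# so a 3 bp overlap is trimmed from the right flank.
filled = join_with_fill("ACGTAC", "TTGA", "TGACCA", right_trim_bp=3)
gapped = join_with_gap("ACGTAC", "TGACCA", gap_bp=5)
print(filled)
print(gapped)
```

In plan terms, the length of `fill` corresponds to the inserted bp column and `right_trim_bp` to the right-trim bp column, while `gap_bp` plays the role of the fallback gap.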
chromo gapfill Outputs
| Output | Description |
|---|---|
| <prefix>.gapfill_plan.tsv | One row per adjacent sorted contig pair with graph status, path nodes, GAF support counts/status/supporting reads, Hi-C and reference-placement support counts, risk flags, branch-complexity score, high-degree/self-loop/unsequenced node lists, fill status, inserted bp, right-trim bp, fallback gap bp, editable accept_fill, and whether the fill was applied. |
| --review-html path | Optional self-contained HTML table for reviewing gapfill-plan rows, comparing candidate paths, and exporting a reviewed-plan TSV. |
| <prefix>.gapfilled.fa | Optional FASTA written only with --apply, containing one record per assigned reference plus unassigned records. |
| <prefix>.run_summary.txt | Inputs, parameters, output paths, and fill-status counts. |
Example chromo gapfill Output
Table 1. Example gapfill_plan.tsv row. Selected columns from a graph fixture
show a junction resolved by reference-placement PAF support. The default
accept_fill=no makes planning review explicit before strict reviewed
application.
| scaffold | left_contig | right_contig | graph_status | path_nodes | candidate_paths | ref_path_support | ref_best_alt_support | risk_flags | fill_status | fill_bp | right_trim_bp | accept_fill | applied |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | chr1_left | chr1_right | ref_paf_resolved_paths | left+,bridge_good+,right+ | 2 | 8 | 6 | branching,high_degree | fillable | 4 | 4 | no | no |
Listing 1. Example applied gapfilled FASTA output. With --apply, fillable
paths insert graph sequence and trim the right flank by the terminal GFA
overlap; unresolved junctions use fallback N gaps.
>chr1 contigs=2 filled_gaps=1 fallback_gaps=0 fill_bp=4 fallback_gap_bp=0 trimmed_bp=4
AAAACCCCGGGGTTTT
chromo gapfill Parameters
| Parameter | Default | Meaning |
|---|---|---|
| --ordered-fasta | required | Final ordered FASTA from chromo sort. |
| --assignments | required | Matching <prefix>.contig_assignments.tsv report from chromo sort. |
| --gfa | required | Assembly graph GFA containing segment sequences and links. |
| --gaf | none | Optional GAF graph alignments used to resolve otherwise ambiguous candidate graph paths. |
| --hic-pairs | none | Optional TSV of graph-node contact counts with node_a, node_b, and count columns. |
| --ref-paf | none | Optional reference-to-assembly PAF used to score intermediate graph nodes against the expected reference-space gap. |
| --output-prefix | required | Prefix for gapfill plan, run summary, and optional gapfilled FASTA. |
| --apply | off | Write <prefix>.gapfilled.fa using only accepted graph paths. |
| --reviewed-plan | none | Optional edited gapfill plan TSV or chromo eval gapfill review-event TSV. With --apply, only accepted rows are applied after the current path is rechecked. |
| --review-html | none | Optional self-contained HTML review dashboard for the gapfill plan. |
| --fixed-gap-bp | none | Use this many Ns for unresolved gaps in --apply output instead of inferred reference-space gaps. |
| --max-path-edges | 4 | Maximum GFA link depth searched between adjacent sorted contigs. |
| --max-candidate-paths | 2 | Stop path enumeration after this many candidates. The default distinguishes unique from ambiguous paths. |
| --min-gaf-mapq | 20 | Minimum GAF MAPQ for a read path to support a candidate fill. |
| --min-gaf-path-support | 1 | Minimum supporting GAF read paths required to resolve an ambiguous branch. |
| --min-hic-path-support | 1 | Minimum summed Hi-C contact support required to resolve an ambiguous branch. |
| --min-ref-path-support | 1 | Minimum expected-gap reference-placement support required to resolve an ambiguous branch. |
| --min-ref-paf-mapq | 0 | Minimum MAPQ for PAF rows used by --ref-paf. |
| --min-ref-paf-idy | 0.0 | Minimum percent identity for PAF rows used by --ref-paf. |
| --include-secondary-ref-paf | off | Include secondary PAF rows marked tp:A:S when reading --ref-paf. |
| --max-fill-bp | 1000000 | Maximum inserted graph sequence allowed for one fill. Set negative to disable. |
| --include-fill-sequences | off | Include candidate fill sequences in the TSV plan. |
| --simple-headers | off | Write gapfilled FASTA headers containing only the scaffold ID. |
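To make the three --ref-paf row filters concrete, here is a small sketch of how PAF rows might be screened. It relies on the standard PAF layout (column 10 is the number of matching bases, column 11 the alignment block length, column 12 the MAPQ, followed by optional tags). The `keep_paf_rows` function is hypothetical, the example rows are made up, and computing percent identity as matches divided by block length is an assumption rather than a documented ChromoSort formula.

```python
from typing import Iterable, Iterator, List


def keep_paf_rows(
    lines: Iterable[str],
    min_mapq: int = 0,
    min_idy: float = 0.0,
    include_secondary: bool = False,
) -> Iterator[List[str]]:
    """Yield split PAF rows that pass MAPQ, identity, and secondary filters."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF row
        matches, block_len, mapq = int(fields[9]), int(fields[10]), int(fields[11])
        if block_len <= 0:
            continue
        # Assumed identity definition: matching bases over alignment block length.
        identity = 100.0 * matches / block_len
        is_secondary = "tp:A:S" in fields[12:]
        if is_secondary and not include_secondary:
            continue
        if mapq < min_mapq or identity < min_idy:
            continue
        yield fields


# Made-up rows: a primary 98% identity hit and a secondary 80% identity hit.
rows = [
    "chr1\t1000\t0\t500\t+\tbridge_good\t500\t0\t500\t490\t500\t60\ttp:A:P",
    "chr1\t1000\t0\t500\t+\tbridge_alt\t500\t0\t500\t400\t500\t10\ttp:A:S",
]
default_kept = [f[5] for f in keep_paf_rows(rows)]
with_secondary = [f[5] for f in keep_paf_rows(rows, include_secondary=True)]
strict = [f[5] for f in keep_paf_rows(rows, min_idy=90.0, include_secondary=True)]
print(default_kept, with_secondary, strict)
```

With the documented defaults (MAPQ 0, identity 0.0, secondary rows off), only the secondary flag removes rows, so tightening the two thresholds is the way to make reference-placement support stricter.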
Reasoning Behind chromo gapfill
Filling Is Explicit
chromo scaffold --gfa remains report-only. chromo gapfill is the explicit
sequence-changing command, and it only changes sequence when --apply is set.
This keeps evidence review separate from FASTA construction.
Reviewed Plan Gate
For strict reviewed application, run either chromo eval gapfill or
chromo gapfill once in planning mode, mark only approved rows as accepted,
then rerun with --apply and --reviewed-plan. ChromoSort recomputes the graph
path and validates the accepted row before applying it. Accepted rows whose
current path_nodes or fillable status no longer match are rejected with an
error. --review-html writes a browser-based table for the legacy plan-review
step; chromo eval gapfill writes the table-only counterpart.
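The stale-row check can be thought of as comparing each accepted reviewed row with a freshly computed plan row. The sketch below is a hedged illustration: `StaleReviewError`, `is_accepted`, and `check_reviewed_row` are hypothetical names, and it assumes both TSV flavours expose the scaffold, left_contig, right_contig, and path_nodes columns, which this document only confirms for the legacy plan.

```python
from typing import Dict


class StaleReviewError(ValueError):
    """Raised when an accepted reviewed row no longer matches the current plan."""


def is_accepted(row: Dict[str, str]) -> bool:
    """Accept either accept_fill=yes (legacy plan) or accept=yes (review events)."""
    value = row.get("accept_fill", row.get("accept", "no"))
    return value.strip().lower() == "yes"


def check_reviewed_row(reviewed: Dict[str, str], current: Dict[str, str]) -> None:
    """Fail loudly if the reviewed junction or path differs from the current one."""
    for column in ("scaffold", "left_contig", "right_contig", "path_nodes"):
        if reviewed.get(column) != current.get(column):
            raise StaleReviewError(f"{column} changed since review")
    if current.get("fill_status") != "fillable":
        raise StaleReviewError("path is no longer fillable")


current = {
    "scaffold": "chr1",
    "left_contig": "chr1_left",
    "right_contig": "chr1_right",
    "path_nodes": "left+,bridge_good+,right+",
    "fill_status": "fillable",
}
reviewed = dict(current, accept_fill="yes")
if is_accepted(reviewed):
    check_reviewed_row(reviewed, current)  # passes silently

stale = dict(reviewed, path_nodes="left+,bridge_alt+,right+")
try:
    check_reviewed_row(stale, current)
    stale_detected = False
except StaleReviewError:
    stale_detected = True
print(stale_detected)
```

Failing with an error rather than skipping the row matches the documented behaviour that stale accepted rows are rejected instead of silently dropped.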
Unique Paths Or Unique Evidence
Assembly graphs often contain repeats, bubbles, and alternate paths. If more
than one candidate path is found within the search limit, chromo gapfill
marks the junction ambiguous_paths and falls back to Ns in applied output
unless evidence resolves the branch. GAF read paths, Hi-C contacts, and reference-placement PAF are supported
tie-breakers: an ambiguous branch can be resolved only when one candidate has
unique support above the configured threshold and no other evidence source
uniquely supports a different path. Ties, weak support, or conflicting evidence
remain unresolved for manual review.
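The tie-breaking rule above can be written down in a few lines. This is a sketch under assumptions, not the tool's code: `unique_best` and `resolve_branch` are hypothetical names, and "unique support" is read here as one candidate having the strictly highest count at or above the threshold, which is consistent with the Table 1 row where support of 8 beats a best alternative of 6.

```python
from typing import Dict, List, Optional


def unique_best(counts: List[int], min_support: int) -> Optional[int]:
    """Index of the single top-supported candidate, or None on weak or tied support."""
    if not counts:
        return None
    best = max(counts)
    if best < min_support or counts.count(best) != 1:
        return None
    return counts.index(best)


def resolve_branch(
    evidence: Dict[str, List[int]], thresholds: Dict[str, int]
) -> Optional[int]:
    """Pick a candidate only if every source with a unique winner agrees on it."""
    winners = set()
    for source, counts in evidence.items():
        choice = unique_best(counts, thresholds.get(source, 1))
        if choice is not None:
            winners.add(choice)
    return winners.pop() if len(winners) == 1 else None


thresholds = {"gaf": 1, "hic": 1, "ref_paf": 1}
# Per-candidate support counts; index 0 and 1 are the two candidate paths.
resolved = resolve_branch({"ref_paf": [8, 6]}, thresholds)
conflict = resolve_branch({"gaf": [3, 0], "hic": [1, 5]}, thresholds)
tied = resolve_branch({"gaf": [2, 2]}, thresholds)
print(resolved, conflict, tied)
```

A source that is tied or below threshold simply abstains in this sketch; only sources with a unique winner can agree or conflict, so a single decisive source is enough when the others are silent.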
Verify the Flanks
Before applying a graph fill, ChromoSort checks that the ordered FASTA flank sequences match the oriented GFA segments used by the path. This protects against applying a graph path to a FASTA that has been renamed, trimmed, reverse-complemented, or otherwise edited without matching graph coordinates.
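A flank check of this kind might look like the following sketch. The helper names (`revcomp`, `oriented`, `flanks_match`) are hypothetical, the sequences are toy data, and comparing whole contigs against whole oriented segments is an assumption; the document does not say how much of each flank is compared.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def oriented(segment_seq: str, orientation: str) -> str:
    """Return the segment as traversed: as-is for '+', reverse-complemented for '-'."""
    if orientation not in ("+", "-"):
        raise ValueError("orientation must be '+' or '-'")
    return segment_seq if orientation == "+" else revcomp(segment_seq)


def flanks_match(
    left_contig: str,
    left_segment: str,
    left_orientation: str,
    right_contig: str,
    right_segment: str,
    right_orientation: str,
) -> bool:
    """True when both FASTA flanks equal their oriented GFA segments."""
    left_ok = left_contig.upper() == oriented(left_segment, left_orientation).upper()
    right_ok = right_contig.upper() == oriented(right_segment, right_orientation).upper()
    return left_ok and right_ok


# The right contig was written reverse-complemented relative to its segment,
# so it matches only when the path traverses that segment in '-' orientation.
same = flanks_match("ACGTAC", "ACGTAC", "+", "CGTT", "AACG", "-")
edited = flanks_match("ACGTAC", "ACGTAC", "+", "CGTT", "AACG", "+")
print(same, edited)
```

A mismatch such as the second call is the situation the check is meant to catch: the FASTA no longer agrees with the graph coordinates, so the fill should be rejected rather than applied.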