chromo graph-map

Use chromo graph-map when graph evidence is in unitig coordinates but your FASTA, alignments, and dot plots are in contig coordinates. This is common with hifiasm outputs where dot plots are made from p_ctg.fa or hap*.p_ctg.fa, while graph evidence comes from p_utg.gfa, r_utg.gfa, or their .noseq.gfa equivalents.

Unitig and contig coordinates are not directly comparable. A p_ctg.fa coordinate is local to the contig sequence. A p_utg.gfa coordinate is local to one unitig segment. ChromoSort projects unitig intervals onto contig coordinates only through GFA P path records or W walk records whose path names match FASTA contig names.
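To make the role of path and walk records concrete, here is a minimal Python sketch of how a GFA P or W line can be broken into oriented unitig steps. It is an illustration, not ChromoSort's parser. The function name parse_path_steps is hypothetical, and using the W record's SeqId field as the name compared against FASTA contig names is an assumption.

```python
import re

def parse_path_steps(line):
    """Return (name, [(segment, orientation), ...]) for a GFA P or W line, else None."""
    fields = line.rstrip("\r\n").split("\t")
    if fields[0] == "P" and len(fields) >= 3:
        # GFA 1.0 path: comma-separated segment names, each ending in + or -
        steps = [(s[:-1], s[-1]) for s in fields[2].split(",") if s]
        return fields[1], steps
    if fields[0] == "W" and len(fields) >= 7:
        # GFA 1.1 walk: '>' marks forward, '<' marks reverse orientation
        steps = [(name, "+" if sign == ">" else "-")
                 for sign, name in re.findall(r"([<>])([^<>\s]+)", fields[6])]
        # Assumption: SeqId (column 4) is the name matched against FASTA contigs
        return fields[3], steps
    return None
```

Any other record type (S, L, A, H) returns None, which is why a GFA containing only those records gives nothing to project.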

Run chromo graph-map

chromo graph-map \
  --ctg-fasta assembly.p_ctg.fa \
  --utg-gfa assembly.p_utg.noseq.gfa \
  --output-prefix results/sample.graphmap

This writes:

| Output | Description |
| --- | --- |
| <prefix>.utg_to_ctg.tsv | One projected interval per path step, including contig coordinates, unitig name, orientation, segment length, overlap fields, source GFA, and reuse flags. |
| <prefix>.path_summary.tsv | One row per requested contig/path with projected length, FASTA length, length-difference status, zero-length step count, and reuse count. |
| <prefix>.warnings.tsv | Clear warnings for missing paths, missing segments, no-sequence/no-LN:i segments, and path-vs-FASTA length mismatches. |

By default, ChromoSort reports GFA overlaps but does not subtract them from projected spans. Add --trim-overlaps only when you intentionally want each left overlap removed from the following path step.
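The difference between the two modes can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not ChromoSort's code. The function name project, the 0-based half-open coordinates, and the representation of each step as (unitig, length, left_overlap) with the overlap already reduced to a base count are all assumptions.

```python
def project(steps, trim_overlaps=False):
    """Lay path steps end to end and return (unitig, start, end) intervals.

    steps: list of (unitig, length, left_overlap) in path order.
    Coordinates are 0-based, half-open (an assumption of this sketch).
    """
    intervals = []
    cursor = 0
    for name, length, left_overlap in steps:
        # With trimming, the overlap shared with the previous step is removed
        span = length - left_overlap if trim_overlaps else length
        span = max(span, 0)  # never emit a negative-length interval
        intervals.append((name, cursor, cursor + span))
        cursor += span
    return intervals

steps = [("utg1", 1000, 0), ("utg2", 500, 100)]
untrimmed = project(steps)
trimmed = project(steps, trim_overlaps=True)
```

For the two steps above, the untrimmed projection places utg2 at 1000-1500, while the trimmed projection places it at 1000-1400, so the total projected length drops by the 100 bp overlap.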

hifiasm noseq GFA

For graph topology, unitig boundaries, and junction context, .noseq.gfa is usually preferred because it preserves segment names, lengths, links, paths/walks, and tags without embedding large sequence strings. Use full .gfa or FASTA only when nucleotide sequence is needed.

If a noseq segment record reads S utgX * LN:i:1000, ChromoSort can project it as a 1000 bp segment. If the record reads S utgX * with no LN:i tag, ChromoSort records the length as 0 and writes:

Segment utgX has no sequence and no LN:i length; cannot project coordinates reliably.
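The length rule described above can be sketched in Python as follows. The function name segment_length is hypothetical, and the order of checks (sequence first, then LN:i, then 0 with a warning) is an assumption drawn from the description rather than from ChromoSort's source.

```python
def segment_length(s_line):
    """Return (name, length, warning) for a tab-separated GFA S record."""
    fields = s_line.rstrip("\r\n").split("\t")
    name, seq = fields[1], fields[2]
    if seq != "*":
        # Sequence is embedded, so its length is authoritative
        return name, len(seq), None
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            # noseq record that still carries its length as a tag
            return name, int(tag[len("LN:i:"):]), None
    warning = (f"Segment {name} has no sequence and no LN:i length; "
               "cannot project coordinates reliably.")
    return name, 0, warning
```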

Some hifiasm GFA files contain only S, L, and read-alignment A records. Those files are still useful as graph topology inputs, but they do not contain the path/walk records needed to map unitig coordinates onto contig FASTA coordinates. In that case chromo graph-map writes a warning instead of pretending the coordinate systems are interchangeable.
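Before running a projection, it can help to check which record types a GFA actually contains. The Python sketch below counts record types and reports whether any P or W records are present. Both helper names are hypothetical and not part of ChromoSort.

```python
from collections import Counter

def record_type_counts(lines):
    """Count GFA record types (first tab-separated field) over an iterable of lines."""
    return Counter(
        line.rstrip("\r\n").split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    )

def has_path_records(lines):
    """True if at least one P (path) or W (walk) record is present."""
    counts = record_type_counts(lines)
    return counts["P"] + counts["W"] > 0
```

In practice the lines would come from an open file handle, for example `with open("assembly.p_utg.noseq.gfa") as fh: has_path_records(fh)`. A result of False corresponds to the topology-only case described above.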

Parameters

| Parameter | Default | Meaning |
| --- | --- | --- |
| --ctg-fasta | required | Contig FASTA whose record names should match GFA path/walk names. |
| --ctg-fai | none | Optional FASTA index; defaults to <ctg-fasta>.fai when present. |
| --utg-gfa | required | Unitig or contig GFA. .noseq.gfa is preferred when sequence is not needed. |
| --path-name | all FASTA contigs | Specific GFA path/walk name to project; may be repeated. |
| --output-prefix | required | Prefix for projection, path-summary, and warning TSVs. |
| --trim-overlaps | off | Subtract each left overlap from the following path-step span. |
| --length-tolerance-bp | 1000 | Absolute path-vs-FASTA length tolerance. |
| --length-tolerance-frac | 0.01 | Fractional path-vs-FASTA length tolerance. |
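To illustrate what the two tolerance values measure, the Python sketch below evaluates each one separately for a projected path length and a FASTA length. The function name length_checks is hypothetical. Taking the fraction relative to the FASTA length is an assumption, and the sketch deliberately does not guess how ChromoSort combines the two checks into a single status.

```python
def length_checks(path_len, fasta_len, tol_bp=1000, tol_frac=0.01):
    """Return (difference, within_bp, within_frac) for a path-vs-FASTA comparison."""
    diff = abs(path_len - fasta_len)
    within_bp = diff <= tol_bp
    # Assumption: the fraction is taken relative to the FASTA length
    within_frac = fasta_len > 0 and diff <= tol_frac * fasta_len
    return diff, within_bp, within_frac
```

For a 1,000,000 bp contig, a projected length of 1,005,000 bp exceeds the 1000 bp absolute tolerance but stays inside the 1% fractional tolerance, while 1,050,000 bp exceeds both.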