Fulton Wiki

Running Barnacle

Modified on 2014/02/19 23:40 by Ben Fulton Categorized as Uncategorized
./src/barnacle.pl -lib SIM06 -lib_dir /media/ben/Drive2/Bioinformatics/Capstone/barnacle-1.0.0/sample_data/SIM06 -config /media/ben/Drive2/Bioinformatics/Capstone/barnacle-1.0.0/sample_data/SIM06/project.cfg -identify_candidates

./src/barnacle.pl -lib SIM06 -lib_dir /media/ben/Drive2/Bioinformatics/Capstone/barnacle-1.0.0/sample_data/pre_assembled/SIM06/ -config /media/ben/Drive2/Bioinformatics/Capstone/barnacle-1.0.0/sample_data/SIM06/project.cfg -identify_candidates

ERROR: No BLAT command path found in config file

Added this line to the config file:

blat = /usr/bin/blat

ERROR: Cannot find 2bit genome file: /home/ben/barnacle-1.0.0/annotations/hg19.2bit

Downloaded that file from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/

ERROR: No SAMtools command path found in config file: /media/ben/Drive2/Bioinformatics/Capstone/barnacle-1.0.0/sample_data/SIM06/project.cfg

Added the SAMtools command path to the config file.
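
Since the tool-path errors show up one per run, a small check of the config before launching can report them all at once. This is only a sketch: it assumes the config is plain "key = value" lines with "#" comments, and the key name "samtools" is a guess (only "blat" appears in this log). It is not Barnacle's own parser.

```python
def read_settings(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(text, required):
    """Return the required keys that are absent or have an empty value."""
    settings = read_settings(text)
    return [key for key in required if not settings.get(key)]
```

For example, missing_keys(open("sample_data/SIM06/project.cfg").read(), ["blat", "samtools"]) would list whichever of the two assumed keys still needs a path.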

ERROR: Cannot find gene annotations file: /home/ben/barnacle-1.0.0/annotations/UCSC_genes_ref.txt

Found a file called setup_annotations.sh, which seems to download the required references.

Running it did download several files.
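
To see which reference files are still absent without waiting for Barnacle to fail on each one, something like the following could be run against the annotations directory. A minimal sketch: the file names are the ones from the error messages above, and the list is probably incomplete for a full run.

```python
from pathlib import Path


def missing_files(annotations_dir, names):
    """Return the names that do not exist as regular files under annotations_dir."""
    base = Path(annotations_dir)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Names taken from the errors seen so far; extend as new errors appear.
    required = ["hg19.2bit", "UCSC_genes_ref.txt"]
    for name in missing_files("annotations", required):
        print("missing:", name)
```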

ERROR: Cannot find gene feature coordinates file: /home/ben/barnacle-1.0.0/annotations/UCSC_genes_ref.exons.introns.std_chr.bed

Googling for this file leads me to a version of Barnacle available on GitHub. Haven't found the exact reference yet, but it does mention running setup.py so trying that. After some trial-and-error, determined that it wants a login to a cluster as an argument:

python setup.py befulton@mason.indiana.edu

Fails on Ubuntu because warnings are treated as errors. Asking a question on Biostars led me to this line:

export CFLAGS="-O2 -U_FORTIFY_SOURCE"

Decided to start over by forking the GitHub project, which has a more detailed README.

Committed a few modifications to Git to allow compilation on Ubuntu. Now, following the README, run:

./src/barnacle.pl -lib SIM06 -lib_dir sample_data/SIM06 -config sample_data/sample.cfg -identify_candidates

Now it appears that the input comes from the Assembly directory inside the directory given by the -lib_dir option. That directory can contain a subdirectory called "current", which is analyzed automatically, or a specific version can be analyzed with the -assembly_ver option. Created a directory called new_stuff inside Assembly, then copied in the directory sample_data/pre_assembled/abyss-1.3.2/merge. Then:
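
The directory selection described here can be written down as a one-line rule. This is a sketch of the behavior as observed in this log, not code taken from Barnacle; the function name is made up for illustration.

```python
from pathlib import Path


def assembly_dir(lib_dir, assembly_ver=None):
    """Directory that would be analyzed, per the behavior observed in this log."""
    # Assumption: versions live directly under <lib_dir>/Assembly, and
    # "current" is used when no -assembly_ver is given.
    return Path(lib_dir) / "Assembly" / (assembly_ver or "current")
```

So assembly_dir("sample_data/SIM06", "new_stuff") points at sample_data/SIM06/Assembly/new_stuff.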

./src/barnacle.pl -lib SIM06 -lib_dir sample_data/SIM06 -config sample_data/sample.cfg -identify_candidates -debug -assembly_ver new_stuff

Seems to work.

Now copy a Trinity output file, Trinity.fasta, into a Trinity folder under sample_data, and run:

./src/barnacle.pl -lib SIM06 -lib_dir sample_data/SIM06 -config sample_data/sample.cfg -identify_candidates -debug -assembly_ver Trinity

Result: ERROR: Cannot find contig-to-genome alignments directory: /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/output

Created that directory and reran.

Result: ERROR: Cannot find contig sequences: /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/merge/SIM06-contigs.fa

Renamed Trinity.fasta to merge/SIM06-contigs.fa and reran.

Result: ERROR: Could not find any contig-to-genome alignment files: /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/output/seq.*.psl

blat annotations/hg19.2bit sample_data/SIM06/Assembly/Trinity/merge/SIM06-contigs.fa output.psl

mv output.psl sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/output/seq.1.psl

./src/barnacle.pl -lib SIM06 -lib_dir sample_data/SIM06 -config sample_data/sample.cfg -identify_candidates -debug -assembly_ver Trinity

Result: ERROR: Cannot find contig sequence file: /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/input/seq.1.fa

Copied merge/SIM06-contigs.fa to sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/input/seq.1.fa and reran.
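
The manual staging done so far (creating the directories, renaming the FASTA, copying it to input/seq.1.fa) could be collected into one helper. A rough sketch: the layout is reconstructed only from the paths in the error messages above, the function name is made up, and BLAT still has to be run separately to produce output/seq.1.psl.

```python
import shutil
from pathlib import Path


def stage_contigs(fasta, lib_dir, lib, assembly_ver):
    """Copy an assembler's FASTA into the layout the error messages asked for."""
    merge = Path(lib_dir) / "Assembly" / assembly_ver / "merge"
    cluster = merge / "cluster" / f"{lib}-contigs"
    (cluster / "input").mkdir(parents=True, exist_ok=True)
    (cluster / "output").mkdir(parents=True, exist_ok=True)

    contigs = merge / f"{lib}-contigs.fa"
    shutil.copyfile(fasta, contigs)
    shutil.copyfile(fasta, cluster / "input" / "seq.1.fa")
    # Return the contig file and the path where the BLAT output is expected.
    return contigs, cluster / "output" / "seq.1.psl"
```

Calling stage_contigs("Trinity.fasta", "sample_data/SIM06", "SIM06", "Trinity") returns the contig path to pass to blat and the .psl path to write its output to.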

Result: WARNING: contig comp7_c0_seq1 not present in alignment file

Error while searching for candidate contigs: Error reading contig sequence file: CGAACTCCGGGAGCCAGGAAGTACACTGCTTGCAAGACGCCTTTGCAGCCTGCTCCCTCC

CANDIDATE IDENTIFICATION FAIL

ERROR: while running /usr/bin/python /home/ben/gsc/barnacle-1.0.0/src/alignment_processing/identify_candidate_contigs.py /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/output/seq.1.psl /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/barnacle/ver_1.0.0.0/local_cid/job_1/ split-candidates num-aligns 500 min-identity 40 genes gene-coords /home/ben/gsc/barnacle-1.0.0/annotations/ensembl65_ref.exons.introns.std_chr.bed single-align 0.999 merge-overlap 0.8 smart-chooser maintain-pared-groups ctg-rep 0.85 no-mito log-file /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/barnacle/ver_1.0.0.0/SIM06.barnacle.log gap-candidates ctg-file /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/merge/cluster/SIM06-contigs/input/seq.1.fa gap-realigner /home/ben/gsc/barnacle-1.0.0/src/alignment_processing/gap_realigner gap-min-size 4 gap-min-identity 0.95 gap-min-fraction 0.3 gap-max-len 50000 gap-config /home/ben/gsc/barnacle-1.0.0/sample_data/SIM06/Assembly/Trinity/barnacle/ver_1.0.0.0/SIM06.barnacle.gap.cfg: 1

The problem appears to be that seq.1.fa holds each sequence on multiple lines. The following script puts each sequence on a single line:

awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < seq.1.fa

Then copied the resulting output back to seq.1.fa.
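
The same unwrapping can be done in Python. One difference from the awk one-liner: awk prints a newline before every header, so its output starts with an empty line, while this sketch does not emit one. The function name is made up for illustration, and whether Barnacle minds the empty line was not tested here.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    seq = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Flush the previous record's sequence before the next header.
            if seq:
                yield "".join(seq)
                seq = []
            yield line
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)
```

When using it, write the result to a separate file first and then move that file over the original, since reading and writing the same file at once would truncate it.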
