SchizoScholar Pombe: Shell Script To Prefix All Files in a Directory With The Directory Name

I'm currently analysing multiple RNA-seq fastq files for members of my laboratory. My pipeline takes all the fastq files from a specified directory and submits each as a separate job to a node on our cluster. Quality control is performed by FastQC, each sample is aligned to the reference genome using TopHat, and a count table is produced using htseq-count. The results for each sample are written to a separate directory named after the fastq file. The issue is that every directory contains the same set of file names:

Directory:
tophat_alignment_1511_6.fastq
Files:
accepted_hits.bam
accepted_hits.txt
deletions.bed
insertions.bed
junctions.bed
prep_reads.info
unmapped.bam

with no indication of the sample they came from. This is not a problem for me, as my subsequent pipelines use the directory name when importing samples. However, as this analysis will be used by many different members of the lab long after I have left, there is potential for confusion. To remedy this I wrote the script below, which takes the directory name, for example tophat_alignment_1511_6.fastq, cuts out the sample name 1511_6 and prefixes it to each file name in that directory. I kept the script simple so that it was easy to read and test before implementation - as it moves through every sample directory renaming files, a bug has the potential to cause massive problems.

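# Loop over every sample directory, e.g. tophat_alignment_1511_6.fastq
for i in /SEQDATA/RNASEQ/my_mutants/Alignments/*
do
    [ -d "$i" ] || continue    # skip anything that is not a directory
    # Cut the sample name (e.g. 1511_6) out of the directory name
    SAMPLENAME=$(basename "$i" | sed 's/^tophat_alignment_\(.*\)\.fastq$/\1/')
    for x in "$i"/*
    do
        [ -f "$x" ] || continue    # skip subdirectories and unmatched globs
        FNAME=$(basename "$x")
        # Quoting keeps names with spaces intact; no cd or eval is needed
        mv -- "$x" "$i/$SAMPLENAME.$FNAME"
    done
done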

Returning:
1511_6.accepted_hits.bam
1511_6.accepted_hits.txt
1511_6.deletions.bed
1511_6.insertions.bed
1511_6.junctions.bed
1511_6.prep_reads.info
1511_6.unmapped.bam
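
Since a bug here could rename a lot of files, one way to test before implementation is a dry run that only prints the mv commands. The sketch below is not the script I ran; it is a minimal variant under a few assumptions: the function name prefix_with_sample and the DRY_RUN variable are hypothetical names introduced for illustration, the directories are assumed to follow the tophat_alignment_<sample>.fastq pattern shown above, and it uses shell parameter expansion instead of sed to cut out the sample name. It also skips files that already carry the prefix, so running it twice should not produce 1511_6.1511_6.accepted_hits.bam.

```shell
#!/usr/bin/env bash
# Dry-run sketch. prefix_with_sample and DRY_RUN are hypothetical names,
# not part of the original script.

prefix_with_sample() {
    local root=$1 dir sample file name
    for dir in "$root"/tophat_alignment_*.fastq; do
        [ -d "$dir" ] || continue          # skip non-directories and unmatched globs
        # tophat_alignment_1511_6.fastq -> 1511_6
        sample=$(basename "$dir")
        sample=${sample#tophat_alignment_}
        sample=${sample%.fastq}
        for file in "$dir"/*; do
            [ -f "$file" ] || continue     # only rename regular files
            name=$(basename "$file")
            # Skip files that already start with the sample name
            case $name in
                "$sample".*) continue ;;
            esac
            if [ "${DRY_RUN:-1}" = 1 ]; then
                # Default: only show what would be done
                echo "mv -- $file $dir/$sample.$name"
            else
                mv -- "$file" "$dir/$sample.$name"
            fi
        done
    done
}
```

Calling prefix_with_sample /SEQDATA/RNASEQ/my_mutants/Alignments prints the planned renames without touching anything; once the output looks right, DRY_RUN=0 prefix_with_sample /SEQDATA/RNASEQ/my_mutants/Alignments performs them.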

Wednesday, 15 June 2016
