edu.upenn.gloDB.io
Class FASTATrack

java.lang.Object
  extended by edu.upenn.gloDB.io.FASTATrack
All Implemented Interfaces:
DataFile, TrackFile

public class FASTATrack
extends java.lang.Object
implements TrackFile

Import Track data from a FASTA file. The basic file format dictates a header line at the beginning of each Feature. There are no standards for what the header line should contain or how it should be formatted, other than that it begins with a ">". Thus this format is sufficient for coding Sequence objects but not ideal for sequence annotations (Tracks). Since some sites, such as www.fruitfly.org, release annotations as FASTA files, some attempt has been made to parse the headers from specific sites. Users can also implement the FASTAParser interface to create their own header parsers. The default parser here is FASTAParserFly.
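
The record structure described above (a header line starting with ">", followed by sequence lines) can be illustrated with a minimal, standalone sketch. The class FastaRecordSketch and its parse method are hypothetical names used only for illustration; they are not part of gloDB and do not show how FASTATrack or FASTAParserFly is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of gloDB: splits FASTA text into records.
class FastaRecordSketch {

    /** Returns one {header, sequence} pair per record; the leading ">" is stripped. */
    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (header != null) {
                    records.add(new String[] {header, seq.toString()});
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line); // sequence data may span several lines
            }
        }
        if (header != null) {
            records.add(new String[] {header, seq.toString()});
        }
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">gene1 free-form header text\nACGT\nTTGA\n>gene2\nGGCC\n";
        for (String[] record : parse(fasta)) {
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

Because the header content is free-form, the sketch returns it as an uninterpreted string; interpreting it is the job of a site-specific parser.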

Version:
$Id: FASTATrack.java,v 1.1.2.21 2007/03/01 21:17:33 fisher Exp $
Notes:
Can we assume that the header starts with a Sequence ID? SaveTrack() looks for 'ID', 'descriptors', 'dbxref', 'strand', 'source', and 'boundaries' in the Feature attributes and processes these uniquely. In particular, 'boundaries' is discarded because it's assumed to be the same as Feature.start and Feature.stop. 'source' is also discarded if it's the same as Feature.getSource().getID(). 'strand' is used in creating 'gene_boundaries' and is similarly discarded. The 'descriptors' and 'dbxref' labels are not included in the output, but their HashMap values are included.
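
The attribute filtering described in these notes might look roughly like the following standalone sketch. It is not the SaveTrack() code: the class name, the "key=value; " output layout, and the use of plain String values (the notes say 'descriptors' and 'dbxref' hold HashMaps) are all simplifying assumptions. How 'ID' and 'gene_boundaries' are actually written is not stated, so 'ID' is emitted as an ordinary pair and 'strand' is simply dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of gloDB: filters attributes for a header line.
class HeaderAttributesSketch {

    /** Builds a ">"-prefixed header; the output layout is an assumption. */
    static String buildHeader(Map<String, String> attrs, String featureSourceId) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            if (key.equals("boundaries")) {
                continue; // assumed to duplicate the Feature's start/stop
            }
            if (key.equals("strand")) {
                continue; // used for 'gene_boundaries' in the notes; not modeled here
            }
            if (key.equals("source") && value.equals(featureSourceId)) {
                continue; // redundant with the Feature's own source ID
            }
            if (key.equals("descriptors") || key.equals("dbxref")) {
                fields.add(value); // label dropped, value kept
            } else {
                fields.add(key + "=" + value);
            }
        }
        return ">" + String.join("; ", fields);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("ID", "CG1");
        attrs.put("boundaries", "1..10");
        attrs.put("source", "chr2L");
        attrs.put("dbxref", "FBgn001");
        System.out.println(buildHeader(attrs, "chr2L"));
    }
}
```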

Nested Class Summary
private  class FASTATrack.FASTAFilter
          FASTA specific FileFilter.
 
Field Summary
private  java.lang.String DESC
           
private  java.lang.String[] EXT
           
private  javax.swing.filechooser.FileFilter fileFilter
           
private  int ID
           
 
Constructor Summary
FASTATrack()
           
 
Method Summary
 java.lang.String getDesc()
          Get a description of the file type.
 java.lang.String[] getExt()
          Get an array of file extensions.
 javax.swing.filechooser.FileFilter getFileFilter()
          Get a FileFilter for use in the GUI.
 int getID()
          Get the file ID.
 Track load(java.lang.String filename)
          Load all Features in the FASTA file into a single Track and return the resulting Track object.
 Track load(java.lang.String filename, java.lang.String sourceID)
          Load all Features in the FASTA file into a single Track and return the resulting Track object.
 void save(java.lang.String id)
          Save the Track to a file based on its ID.
 void save(java.lang.String id, java.lang.String filename, boolean overwrite)
          Save all Features in a FASTA file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ID

private final int ID
See Also:
Constant Field Values

DESC

private final java.lang.String DESC
See Also:
Constant Field Values

EXT

private final java.lang.String[] EXT

fileFilter

private final javax.swing.filechooser.FileFilter fileFilter
Constructor Detail

FASTATrack

public FASTATrack()
Method Detail

getID

public int getID()
Description copied from interface: DataFile
Get the file ID. FileIO contains constant values and string equivalents for built-in DataFiles.

Specified by:
getID in interface DataFile

getDesc

public java.lang.String getDesc()
Description copied from interface: DataFile
Get a description of the file type. This description will be used in the file chooser.

Specified by:
getDesc in interface DataFile

getExt

public java.lang.String[] getExt()
Description copied from interface: DataFile
Get an array of file extensions. These extensions will be used by the file chooser.

Specified by:
getExt in interface DataFile

getFileFilter

public javax.swing.filechooser.FileFilter getFileFilter()
Description copied from interface: DataFile
Get a FileFilter for use in the GUI.

Specified by:
getFileFilter in interface DataFile

load

public Track load(java.lang.String filename)
Load all Features in the FASTA file into a single Track and return the resulting Track object. If possible, a Sequence object will be loaded/created for each Feature from the FASTA file.

Specified by:
load in interface TrackFile

load

public Track load(java.lang.String filename,
                  java.lang.String sourceID)
Load all Features in the FASTA file into a single Track and return the resulting Track object. If a Sequence is given, then it will be used as the source for all Features in the file; otherwise a Sequence object will be loaded/created for each Feature from the FASTA file. The header is parsed using FASTAParserFly. An ExactFeature object is created with the start and stop positions taken from the "boundaries" key:value pair. The parsed header is stored in the AbstractFeature.attributes field of the ExactFeature object. If the file is empty, this returns 'null'. If a valid Sequence ID can't be obtained from the user or from the Feature's header, then the Feature can't be associated with any existing Sequence, and so this will create a Sequence using its best guess at the Sequence ID. However, this isn't very useful because it's not likely that other Features will share this Sequence. There's also no capacity to load this Sequence data later, so the Sequence data is loaded here as well, which is very inefficient.
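
As a rough illustration of taking start and stop positions from a "boundaries" key:value pair, consider the standalone sketch below. The exact header layout is not documented here, so the "key=value; key=value" layout and the "start..stop" value format are assumptions made for the example, and BoundariesSketch is a hypothetical class rather than FASTAParserFly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of gloDB. The header layout is assumed.
class BoundariesSketch {

    /** Parses "key=value; key=value" fields into an ordered map. */
    static Map<String, String> parseAttributes(String header) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String field : header.split(";")) {
            int eq = field.indexOf('=');
            if (eq < 0) {
                continue; // skip fields that are not key=value shaped
            }
            attrs.put(field.substring(0, eq).trim(), field.substring(eq + 1).trim());
        }
        return attrs;
    }

    /** Parses an assumed "start..stop" value into {start, stop}; null if malformed. */
    static int[] parseBoundaries(String value) {
        if (value == null) {
            return null;
        }
        String[] parts = value.split("\\.\\.");
        if (parts.length != 2) {
            return null;
        }
        try {
            int start = Integer.parseInt(parts[0].trim());
            int stop = Integer.parseInt(parts[1].trim());
            return new int[] {start, stop};
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = parseAttributes("ID=CG1234; boundaries=100..250; strand=+");
        int[] bounds = parseBoundaries(attrs.get("boundaries"));
        System.out.println(attrs.get("ID") + " " + bounds[0] + " " + bounds[1]);
    }
}
```

Returning null for a missing or malformed value lets the caller decide whether to skip the Feature, which mirrors the skip-on-bad-range case raised in the notes for this method.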

Specified by:
load in interface TrackFile
Notes:
When skipping a Feature because the loaded Sequence data doesn't contain the correct range, should we discard the loaded Sequence or leave it in the sequencePool? I'm not sure how the position information is formatted. Need to throw FileIO exceptions, rather than just print errors.

save

public void save(java.lang.String id)
Save the Track to a file based on its ID. This will overwrite any existing file. This will append ".fasta" to the filename.
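
The naming and overwrite behavior can be sketched in isolation as follows. SaveNamingSketch is a hypothetical class, not gloDB code; in particular, returning false when overwrite is false and the file already exists is an assumption about what the three-argument save might do, since that case is not documented here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of gloDB: default naming plus an overwrite guard.
class SaveNamingSketch {

    /** Derives the default filename by appending ".fasta" to the Track ID. */
    static String defaultFilename(String id) {
        return id + ".fasta";
    }

    /**
     * Writes contents to file. If overwrite is false and the file exists,
     * nothing is written and false is returned (assumed behavior).
     */
    static boolean write(Path file, String contents, boolean overwrite) throws IOException {
        if (!overwrite && Files.exists(file)) {
            return false;
        }
        // Files.write creates the file or truncates an existing one by default.
        Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
        return true;
    }
}
```

Under these assumptions, the single-argument save would correspond to calling write with defaultFilename(id) and overwrite set to true.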

Specified by:
save in interface TrackFile

save

public void save(java.lang.String id,
                 java.lang.String filename,
                 boolean overwrite)
Save all Features in a FASTA file.

Specified by:
save in interface TrackFile
Notes:
Need to throw FileIO exceptions, rather than just print errors. How should the attributes be formatted? Should we remove 'ID', 'descriptors', 'dbxref', 'strand', 'source', and 'boundaries' from the header since these were most likely added when we created the header?



Copyright 2012 Stephen Fisher and Junhyong Kim, University of Pennsylvania. All Rights Reserved.