edu.upenn.gloDB
Class Sequence

java.lang.Object
  extended by edu.upenn.gloDB.Sequence

public class Sequence
extends java.lang.Object

Sequence.

Version:
$Id: Sequence.java,v 1.30.2.20 2007/03/01 21:17:33 fisher Exp $

Field Summary
private  java.util.HashMap attributes
          Metadata related to the sequence.
private  byte[] cData
           
private  java.lang.String data
          The sequence raw data as an unformatted string.
private  int dataLength
           
private  boolean dataLoaded
          This is a flag for whether data has been loaded.
private  SequenceLoader dataLoader
          This is the object that will handle getting the data for this Sequence.
static int FORMAT_WIDTH
          This is the number of characters to print out per line when formatting the output in getDataFormatted().
private  java.lang.String id
          This is a unique name for the sequence, used by the parser to identify the sequence.
private  java.util.HashMap loaderArgs
          This is a map of key:value pairs needed to load the data from the data source, as defined by 'dataLoader': URL, file, database, etc.
private  int offset
          This is the starting position for this Sequence on the chromosome.
private static java.util.Random random
          Used to create random ids.
static boolean USE_COMPRESSION
          When true, sequence data is stored in the Sequence object in a compressed format.
 
Constructor Summary
Sequence()
          Create a new Sequence object and add it to the set of Sequence objects.
Sequence(boolean addToPool)
          Create a new Sequence object and add the newly created Sequence object to the set of sequence objects if addToPool is true.
Sequence(boolean addToPool, java.lang.String id)
          Create a new Sequence object and add the newly created Sequence object to the set of sequence objects if addToPool is true.
Sequence(java.lang.String id)
          Create a new Sequence object with the specified id, and add it to the set of Sequence objects.
 
Method Summary
 void addAttribute(java.lang.Object key, java.lang.Object value)
          Add a sequence attribute.
 void addLoaderArg(java.lang.Object key, java.lang.Object value)
          Add a sequence loaderArg.
 boolean contains(Feature feature)
          Returns 'true' if 'feature' is contained in this Sequence object.
 boolean contains(int pos)
          Returns 'true' if the position 'pos' is contained in this Sequence object.
 boolean containsAttribute(java.lang.Object key)
          Returns true if attribute 'key' exists.
 void delAttribute(java.lang.Object key)
          Remove an attribute.
 java.lang.Object getAttribute(java.lang.Object key)
          Get a sequence attribute.
 java.util.HashMap getAttributes()
          Get the sequence attributes.
 java.lang.String getData()
          Returns the Sequence data as a single unformatted string.
 java.lang.String getDataBounded(int min, int max)
          Returns the sequence data between position '(min-1)' and position 'max'.
 java.lang.String getDataBoundedFormatted(int min, int max)
          Returns the bounded sequence data with "\n" inserted every FORMAT_WIDTH characters.
 java.lang.String getDataFormatted()
          Returns the sequence data with "\n" inserted every FORMAT_WIDTH characters.
 SequenceLoader getDataLoader()
          Returns the parser for the Sequence source.
 java.lang.String getID()
          Get the id.
 java.lang.Object getLoaderArg(java.lang.Object key)
          Get a sequence loaderArg.
 java.util.HashMap getLoaderArgs()
          Get the sequence loaderArgs.
 int getMax()
          Returns the maximum position of the Sequence on the chromosome.
 int getMin()
          Returns the initial position of the Sequence on the chromosome.
 int getOffset()
          Returns the Sequence starting position on the chromosome.
 int getType()
          Returns Feature type (see GloDBUtils)
 boolean isDataLoaded()
          Returns true if data was loaded.
 int length()
          Returns the length of the data string.
 java.lang.String loadData()
          This will load the data from 'dataLoader' if data is currently empty.
static java.lang.String randomID(java.lang.String base)
           
 void reloadData()
          This will load the data from 'dataLoader', overwriting the current value of data.
 void setAttributes(java.util.HashMap attributes)
          Set the sequence attributes.
 void setData(java.lang.String data)
          Set the Sequence data, expecting a single unformatted string.
 void setDataLoader(SequenceLoader dataLoader)
          Set the Sequence source parser.
 void setLoaderArgs(java.util.HashMap loaderArgs)
          Set the sequence loaderArgs.
 void setOffset(int offset)
          Set the Sequence starting position on the chromosome.
 java.lang.String toString()
          Returns attributes information.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

FORMAT_WIDTH

public static int FORMAT_WIDTH
This is the number of characters to print out per line when formatting the output in getDataFormatted().

Notes:
This should probably be a user-adjustable parameter.

USE_COMPRESSION

public static boolean USE_COMPRESSION
When true, sequence data is stored in the Sequence object in a compressed format. When false, the data is stored as a String object.

Notes:
This should probably be a user-adjustable parameter.
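
The storage scheme described above can be sketched as follows. This is only an illustration and not the gloDB implementation: the class name CompressedDataSketch is hypothetical, and the use of java.util.zip.Deflater and Inflater is an assumption, since this page does not say which compression method Sequence uses. The fields mirror 'cData' and 'dataLength' from the Field Summary.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: keeps sequence data as compressed bytes.
class CompressedDataSketch {
    private byte[] cData;        // compressed bytes (stand-in for 'cData')
    private int dataLength = -1; // uncompressed byte count; equals the
                                 // string length for plain ASCII data

    /** Compresses the string and remembers its uncompressed size. */
    void setData(String data) {
        byte[] raw = data.getBytes(StandardCharsets.UTF_8);
        dataLength = raw.length;
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        cData = out.toByteArray();
    }

    /** Uncompresses and returns the data, or "" if nothing was stored. */
    String getData() {
        if (cData == null) {
            return "";
        }
        byte[] raw = new byte[dataLength];
        Inflater inflater = new Inflater();
        inflater.setInput(cData);
        try {
            int filled = 0;
            while (filled < raw.length && !inflater.finished()) {
                int n = inflater.inflate(raw, filled, raw.length - filled);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input; avoid looping forever
                }
                filled += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    int length() { return dataLength; }

    int compressedSize() { return cData == null ? 0 : cData.length; }
}
```

Storing the uncompressed length alongside the bytes lets length() answer without uncompressing anything, which may be why the class keeps a separate 'dataLength' field.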

id

private java.lang.String id
This is a unique name for the sequence, used by the parser to identify the sequence. It cannot be changed, so that Feature references are preserved.


dataLoader

private SequenceLoader dataLoader
This is the object that will handle getting the data for this Sequence. The object referenced by dataLoader will use the values in 'loaderArgs' and return the data as a String.


loaderArgs

private java.util.HashMap loaderArgs
This is a map of key:value pairs needed to load the data from the data source, as defined by 'dataLoader': URL, file, database, etc.


dataLoaded

private boolean dataLoaded
This is a flag for whether data has been loaded. It's possible that data was 'loaded' from a source that returned an empty string.


offset

private int offset
This is the starting position for this Sequence on the chromosome. If the Sequence is a chromosome, then offset will be 0.


attributes

private java.util.HashMap attributes
Metadata related to the sequence, for example: source, locus, accession no., version, GI, protein_id. Should any of these be hardcoded as fields?


data

private java.lang.String data
The sequence raw data as an unformatted string. The data is not loaded by default; rather, it is loaded when the user performs an operation that requires the data. Note that concatenation operations should be done on StringBuffer objects, with the result stored as a String, because Strings are immutable and are therefore converted to StringBuffers during each concatenation. This is particularly important when loading data from a file, which might entail a lot of concatenations.
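
The concatenation advice above can be sketched as follows. This is a minimal illustration, not code from gloDB: the class SequenceReadSketch and its method are hypothetical, and StringBuilder (the unsynchronized counterpart of StringBuffer) is used on the assumption that a single thread does the reading.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch: builds one unformatted string from many lines.
class SequenceReadSketch {
    /** Joins all lines from the source, dropping line breaks and padding. */
    static String readUnformatted(Reader source) {
        StringBuilder buffer = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Appending to one buffer avoids creating a new String
                // (and a temporary buffer) for every line read.
                buffer.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Convert to an immutable String once, at the end.
        return buffer.toString();
    }
}
```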


cData

private byte[] cData

dataLength

private int dataLength

random

private static java.util.Random random
Used to create random ids.

Constructor Detail

Sequence

public Sequence()
Create a new Sequence object and add it to the set of Sequence objects.


Sequence

public Sequence(java.lang.String id)
Create a new Sequence object with the specified id, and add it to the set of Sequence objects.


Sequence

public Sequence(boolean addToPool)
Create a new Sequence object and add the newly created Sequence object to the set of sequence objects if addToPool is true.

Notes:
This should probably be 'protected' instead of 'public' because all Sequences should really be added to sequencePool.

Sequence

public Sequence(boolean addToPool,
                java.lang.String id)
Create a new Sequence object and add the newly created Sequence object to the set of sequence objects if addToPool is true.

Notes:
This should probably be 'protected' instead of 'public' because all Sequences should really be added to sequencePool.

Method Detail

getType

public int getType()
Returns Feature type (see GloDBUtils)


getID

public java.lang.String getID()
Get the id.


setDataLoader

public void setDataLoader(SequenceLoader dataLoader)
Set the Sequence source parser.


getDataLoader

public SequenceLoader getDataLoader()
Returns the parser for the Sequence source.


setLoaderArgs

public void setLoaderArgs(java.util.HashMap loaderArgs)
Set the sequence loaderArgs.


getLoaderArgs

public java.util.HashMap getLoaderArgs()
Get the sequence loaderArgs.


addLoaderArg

public void addLoaderArg(java.lang.Object key,
                         java.lang.Object value)
Add a sequence loaderArg.


getLoaderArg

public java.lang.Object getLoaderArg(java.lang.Object key)
Get a sequence loaderArg.


isDataLoaded

public boolean isDataLoaded()
Returns true if data was loaded.


setOffset

public void setOffset(int offset)
Set the Sequence starting position on the chromosome.


getOffset

public int getOffset()
Returns the Sequence starting position on the chromosome.


setData

public void setData(java.lang.String data)
Set the Sequence data, expecting a single unformatted string. This will set the dataLoaded flag to 'true'.


getData

public java.lang.String getData()
Returns the Sequence data as a single unformatted string.


setAttributes

public void setAttributes(java.util.HashMap attributes)
Set the sequence attributes.


getAttributes

public java.util.HashMap getAttributes()
Get the sequence attributes.


addAttribute

public void addAttribute(java.lang.Object key,
                         java.lang.Object value)
Add a sequence attribute.


delAttribute

public void delAttribute(java.lang.Object key)
Remove an attribute.


containsAttribute

public boolean containsAttribute(java.lang.Object key)
Returns true if attribute 'key' exists.


getAttribute

public java.lang.Object getAttribute(java.lang.Object key)
Get a sequence attribute.
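
The four attribute methods behave like thin wrappers around a map. The following sketch assumes they delegate directly to the 'attributes' HashMap, which this page does not confirm; the class AttributeSketch and the example key and value used with it are hypothetical.

```java
import java.util.HashMap;

// Hypothetical sketch: attribute methods backed by a HashMap.
class AttributeSketch {
    private final HashMap<Object, Object> attributes =
        new HashMap<Object, Object>();

    /** Adds an attribute, replacing any previous value for the key. */
    void addAttribute(Object key, Object value) {
        attributes.put(key, value);
    }

    /** Removes an attribute; does nothing if the key is absent. */
    void delAttribute(Object key) {
        attributes.remove(key);
    }

    /** Returns true if attribute 'key' exists. */
    boolean containsAttribute(Object key) {
        return attributes.containsKey(key);
    }

    /** Returns the value for 'key', or null if there is none. */
    Object getAttribute(Object key) {
        return attributes.get(key);
    }
}
```

Because getAttribute() in this sketch returns null for a missing key, containsAttribute() is the way to tell a missing attribute from one stored with a null value.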


reloadData

public void reloadData()
This will load the data from 'dataLoader', overwriting the current value of data. If dataLoader is null, then this won't do anything.


loadData

public java.lang.String loadData()
This will load the data from 'dataLoader' if data is currently empty. If data is not empty, then this won't do anything. This method is called internally whenever data is used, so the user should never need to call this method. The uncompressed data is returned because, in some instances, the method calling loadData() requires uncompressed data. Going through setData() and getData() instead would compress the data and then uncompress it again.
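
The load-on-demand behaviour of loadData(), reloadData() and length() can be sketched as follows. This is an illustration under stated assumptions, not the gloDB source: the nested Loader interface is a hypothetical stand-in for SequenceLoader (whose methods are not shown on this page), compression is left out, and the sketch uses the 'dataLoaded' flag rather than an emptiness check to decide whether to load.

```java
import java.util.HashMap;

// Hypothetical sketch of lazy loading through a pluggable loader.
class LazyLoadSketch {
    /** Stand-in for SequenceLoader; the real interface may differ. */
    interface Loader {
        String getData(HashMap<Object, Object> loaderArgs);
    }

    private final HashMap<Object, Object> loaderArgs =
        new HashMap<Object, Object>();
    private Loader dataLoader;
    private String data = "";
    private boolean dataLoaded = false;

    void setDataLoader(Loader loader) { this.dataLoader = loader; }

    void addLoaderArg(Object key, Object value) { loaderArgs.put(key, value); }

    boolean isDataLoaded() { return dataLoaded; }

    /** Loads the data on first use; later calls return the cached value. */
    String loadData() {
        if (!dataLoaded && dataLoader != null) {
            data = dataLoader.getData(loaderArgs);
            dataLoaded = true; // stays true even if the source returned ""
        }
        return data;
    }

    /** Fetches the data again, overwriting it; no-op without a loader. */
    void reloadData() {
        if (dataLoader == null) {
            return;
        }
        data = dataLoader.getData(loaderArgs);
        dataLoaded = true;
    }

    /** Length of the data, or -1 when nothing could be loaded. */
    int length() {
        loadData();
        return dataLoaded ? data.length() : -1;
    }
}
```

Keeping a separate flag means a source that legitimately returns an empty string is not queried again on every access.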


length

public int length()
Returns the length of the data string. If the dataLoader isn't set and thus no data is loaded, this returns -1.


getMin

public int getMin()
Returns the initial position of the Sequence on the chromosome. This will return the same value as getOffset().


getMax

public int getMax()
Returns the maximum position of the Sequence on the chromosome. If the dataLoader isn't set and thus no data is loaded, this returns -1.


contains

public boolean contains(int pos)
Returns 'true' if the position 'pos' is contained in this Sequence object.


contains

public boolean contains(Feature feature)
Returns 'true' if 'feature' is contained in this Sequence object. This will return 'false' if the Feature's source ID doesn't match this Sequence's ID.


getDataBounded

public java.lang.String getDataBounded(int min,
                                       int max)
Returns the sequence data between position '(min-1)' and position 'max'. The range runs from (min-1) to max because Java Strings are indexed from 0 to (length-1), whereas the actual position data assumes 1 to length.

Parameters:
min - the starting position
max - the ending position
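
The index conversion described above can be sketched as follows. This is a minimal illustration, not the gloDB source: the class BoundedDataSketch is hypothetical, the data is passed in as a parameter, the bounds check is an addition, and the sketch assumes positions are relative to the start of the data (that is, an offset of 0).

```java
// Hypothetical sketch: maps 1-based positions onto String.substring.
class BoundedDataSketch {
    /**
     * Returns the characters at 1-based positions min..max inclusive.
     * substring(begin, end) takes a 0-based inclusive begin and an
     * exclusive end, so position min maps to index (min - 1) and
     * position max maps to the exclusive end index max.
     */
    static String getDataBounded(String data, int min, int max) {
        if (min < 1 || max > data.length() || min > max) {
            throw new IllegalArgumentException(
                "invalid bounds: " + min + ".." + max);
        }
        return data.substring(min - 1, max);
    }

    public static void main(String[] args) {
        // Positions 2 through 4 of "ACGTACGT" are C, G, T.
        System.out.println(getDataBounded("ACGTACGT", 2, 4));
    }
}
```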

getDataBoundedFormatted

public java.lang.String getDataBoundedFormatted(int min,
                                                int max)
Returns the bounded sequence data with "\n" inserted every FORMAT_WIDTH characters.


getDataFormatted

public java.lang.String getDataFormatted()
Returns the sequence data with "\n" inserted every FORMAT_WIDTH characters.
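
The line-wrapping step can be sketched as follows. This is an illustration only: the class FormatSketch is hypothetical, the width is passed as a parameter because this page does not state the value of FORMAT_WIDTH, and the choice to add "\n" only after complete lines (so a partial last line gets none) is an assumption about behaviour the page leaves open.

```java
// Hypothetical sketch: inserts "\n" after every 'width' characters.
class FormatSketch {
    static String format(String data, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder out =
            new StringBuilder(data.length() + data.length() / width);
        for (int start = 0; start < data.length(); start += width) {
            int end = Math.min(start + width, data.length());
            out.append(data, start, end);
            if (end - start == width) {
                out.append('\n'); // only complete lines get a line break
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, format("ACGTACGTAC", 4) yields three lines: "ACGT", "ACGT" and "AC".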


randomID

public static java.lang.String randomID(java.lang.String base)

toString

public java.lang.String toString()
Returns attributes information. The data isn't included here. To get the data use getData() or getDataFormatted().

Overrides:
toString in class java.lang.Object



Copyright 2012 Stephen Fisher and Junhyong Kim, University of Pennsylvania. All Rights Reserved.