In addition to posted FAQs and help issues, we are happy to add
any general questions about using the web page and interpreting the data. Send your questions via our comments
page and we'll post the answers here.
How do I use this database?
Queries can be made in several different ways. On the front page, you can enter the code for a specific gene. Alternatively, you can explore the sequences in a particular region by clicking on the chromosome map at the bottom. Some of our other organism-specific pages offer alternative ways to access the data, and these will be implemented here as soon as possible.
What is SBS?
SBS stands for sequencing by synthesis, a technique invented and commercialized by Illumina, Inc. of Hayward, California. The millions of short reads produced from a single channel of the sequencing reaction are a perfect technological match for deep sequencing of small RNAs or the short tags of PARE. Most small RNAs are 21-24 nt and the tags of PARE are 20 or 21 nt, but SBS reads are typically longer (26 to 35 nt, or more), so each read runs into the adapter. We therefore trim off the adapter at the 3' end before we display the data.
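As a rough illustration of the trimming step, here is a minimal Python sketch. It is not the code used to build this database. The function name, the `min_overlap` parameter, and the adapter sequence are assumptions made for the example; substitute the adapter that was actually used for your library.

```python
# Illustrative adapter only (an assumption for this sketch), not necessarily
# the adapter used for the libraries on this site.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Remove a 3' adapter, or a partial adapter at the end of the read."""
    # Case 1: the complete adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends before the adapter does, so only the first
    # `overlap` bases of the adapter are present at the 3' end of the read.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adapter detected: return the read unchanged.
    return read


small_rna = "TGACAGAAGAGAGTGAGCACA"      # a 21 nt insert
read = small_rna + ADAPTER[:10]          # a 31 nt read that runs into the adapter
trimmed = trim_3prime_adapter(read)      # the 21 nt insert is recovered
```

The `min_overlap` threshold is a design choice: a very short match at the end of a read may occur by chance, so this sketch only trims partial adapters of at least 5 nt.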
Where do I get started?
A good place to start is the chromosome viewer, where you can surf the chromosomes and see how the tags align with genomic data. Also, read through these FAQs to get a better understanding of how the data work and what tools we offer.
How does the chromosome viewer work?
The viewer is launched from the image of the chromosomes, under the basic query page. Clicking on these chromosomes takes you to a second-level view, from which you can zoom in one more level to a 100 kb region. On this final page, the viewer shows an image at the top indicating the location of the current region relative to the entire chromosome, the centromere, and the telomeres. You can click on this image to display a different region of the chromosome. Below it, the main image shows the ORFs on the top strand (Watson) in red, and those on the bottom strand (Crick) in blue. The tRNAs are green, rRNAs are beige, snRNAs are dark purple, other RNA genes are gray, transposons are yellow and LTRs are fuchsia. The 5' ends of the top and bottom strands are indicated in red on the left and right ends, respectively. A legend is provided at the bottom of the page. The chromosomal coordinates are listed at the left and right of each line (in multiples of 20,000 bp).
We are extremely grateful to Mike Cherry and his group at the Saccharomyces Genome Database.
They provided the source code for their SAGE viewer. We re-wrote this in PHP and modified
it for N. benthi, but couldn't have done it without using their code as a guide.
How can I determine if my favorite gene is expressed in the libraries?
This is relevant only when there is mRNA expression data on the site, which not all of our sites have. If you see colored triangles pointing left and right, that's mRNA data. There are four entry points for mRNA analysis. 1) Enter the gene identifier in the basic query page. This will take you to the chromosome viewer, showing that gene and the flanking genomic DNA. 2) Enter a BAC clone in the basic query page. This will take you to that clone in the chromosome viewer. 3) Enter the sequence on the Query by Sequence page. This will extract the potential tags from your gene sequence and compare them against the database. 4) Find the gene in the chromosome viewer by clicking through to its physical location.
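To give a feel for what the Query by Sequence option does, here is a simplified Python sketch. It is not the site's implementation. It assumes that a potential tag is any 20 nt window on either strand of the query, and that the database can be represented as a dictionary mapping tag sequences to abundances. The function names and the `tag_len` parameter are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_tags(seq: str, tag_len: int = 20) -> set:
    """Collect every window of tag_len bases from both strands of seq."""
    seq = seq.upper()
    tags = set()
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - tag_len + 1):
            tags.add(strand[start:start + tag_len])
    return tags


def match_tags(seq: str, tag_abundance: dict, tag_len: int = 20) -> dict:
    """Return the candidate tags found in the database, with their abundances."""
    return {
        tag: tag_abundance[tag]
        for tag in candidate_tags(seq, tag_len)
        if tag in tag_abundance
    }


# Toy example with 4 nt tags so the result is easy to check by hand.
toy_database = {"GCCA": 12, "CAAT": 3, "TTTT": 7}
hits = match_tags("ATGGCCATTG", toy_database, tag_len=4)
```

In the toy example, `GCCA` is found on the query strand and `CAAT` on its reverse complement, while `TTTT` does not occur in the query at all.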
Why are sequence frequencies normalized and how does this work?
Sequence frequencies are normalized to a fixed library size, such as 1 million or 2 million, to facilitate comparisons among libraries. The number of sequences per library depends on sequencing results, and in our case the total number of sequences has been ~2 million per library. The expression of a gene is measured by determining the abundance of tags derived from the transcript in a given library. Normalization is necessary to ensure that comparisons across libraries reflect biological differences and not merely differences in the total number of tags sequenced.
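The arithmetic behind this kind of normalization can be sketched in a few lines of Python. This is an illustration under simple assumptions (a linear rescaling of the raw count to a chosen library size), and the function and parameter names are hypothetical rather than taken from our pipeline.

```python
def normalize_abundance(raw_count: int, library_total: int, scale: int = 1_000_000) -> float:
    """Rescale a raw count to what it would be in a library of `scale` sequences."""
    if library_total <= 0:
        raise ValueError("library_total must be positive")
    return raw_count * scale / library_total


# The same tag observed in two libraries of different depth.
lib_a = normalize_abundance(raw_count=50, library_total=2_000_000)
lib_b = normalize_abundance(raw_count=90, library_total=3_600_000)
```

Here the raw counts differ (50 versus 90), but both normalize to 25 per million, so the apparent difference comes only from sequencing depth and not from the biology.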
How precise is SBS?
SBS is a digital method of analyzing gene expression or small RNAs; in other words, it is based on a direct count of sequences from a given cDNA library, using sequence tags to determine the abundance of cognate transcripts in the library. In digital expression analyses, comparisons between large libraries facilitate the detection of significant differential expression for genes expressed at low levels. It is far better at detecting low levels of transcripts than RNA gel blots, simply because it's possible to sequence so deeply.
Is the data freely available, and what do I do if I want to publish with it?
The data is absolutely free and publicly available. This web page and the data contained therein represent your tax dollars at work. The research is funded by the National Science Foundation, Plant Genome Research Program.
Please tell the funding agency and your congressman if you think it's worthwhile (or not). You are welcome to publish using this data, but we'd like to know, only so that we can measure the utility of the data! The more people use it, and publish with it, the better. Please send us an email if you have found the data to be useful. And please cite one of our publications that describe this work. The most appropriate paper to cite might be our description of the website in Plant Physiology (Nakano et al., 2019).
What do I do if I have a tissue for which I want to get SBS data?
Please contact Illumina (www.illumina.com). This is their business, and they perform the sequencing for a fee.