Description
The ENCODE4 long-read RNA-seq collection annotates transcripts using numerical triplets representing
the identity of the start site, exon junction chain, and transcript end site of each transcript.
This method reveals how promoter selection, splice pattern, and 3’ processing are deployed across
mouse tissues.
Display Conventions
Transcript names include a triplet annotation that represents transcript start site, exon junction
chain, and transcript end site. For example, if transcript A has the label [1,2,3] and transcript B
is labeled [1,1,3], then those transcripts share start and end sites but differ in their exon junction
chain, that is, in their combination of exons. Here is an example drawn from hg38 at the INSIG1 locus:
In this example, the first two transcripts marked by arrows have the same start
site ("1") and the same set of exons ("8"), but they have different end sites
("2" vs "1"). Similarly, the second two marked transcripts have the same start
site ("1"), but a different set of exons ("8" vs "9") and a different end site
("1" vs "2").
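The comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes the triplet appears in the transcript name as a bracketed, comma-separated list such as "[1,8,2]", and the names "geneX[...]" and the helper functions are hypothetical, not part of the track or of any ENCODE tool.

```python
import re
from typing import NamedTuple


class Triplet(NamedTuple):
    """One transcript label: start site, exon junction chain, end site."""
    start_site: int
    junction_chain: int
    end_site: int


# Assumed label format: three integers in square brackets, e.g. "[1,8,2]".
_TRIPLET = re.compile(r"\[(\d+),\s*(\d+),\s*(\d+)\]")


def parse_triplet(name: str) -> Triplet:
    """Extract the [start, junction chain, end] triplet from a transcript name."""
    match = _TRIPLET.search(name)
    if match is None:
        raise ValueError(f"no triplet found in {name!r}")
    return Triplet(*(int(group) for group in match.groups()))


def shared_features(name_a: str, name_b: str) -> list:
    """Return the names of the triplet fields that two transcripts have in common."""
    a, b = parse_triplet(name_a), parse_triplet(name_b)
    return [field for field, x, y in zip(Triplet._fields, a, b) if x == y]


# Hypothetical names mirroring the two pairs described in the example above.
print(shared_features("geneX[1,8,2]", "geneX[1,8,1]"))
print(shared_features("geneX[1,8,1]", "geneX[1,9,2]"))
```

The first call reports a shared start site and exon junction chain, and the second reports only a shared start site, matching the two cases described in the example.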
GENCODE V29 and V40 were used as reference data; any transcript not present in either of these is
colored blue.
Mousing over a transcript shows its ENCODE gene ID, the tissue or cell line in which it is most highly
expressed, and its TPM in that sample.
Data Access
The raw data can be explored interactively with the
Table Browser or the
Data Integrator.
For automated analysis, the data may be queried from our
REST API.
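As a rough sketch of such a query, the snippet below builds a request URL for the REST API's getData/track endpoint. The track name "encode4LongRna" is an assumption inferred from the bigBed file name and should be checked against the track's actual name; build_track_url is a hypothetical helper, and the example coordinates are arbitrary.

```python
from urllib.parse import quote

API_ROOT = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome, track, chrom, start, end):
    """Build a getData/track request URL for one genomic region."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # Parameters are joined with semicolons, as in the REST API's own examples.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_ROOT}?{query}"


# "encode4LongRna" is an assumed track name, not confirmed by this page.
url = build_track_url("mm10", "encode4LongRna", "chr1", 100000, 100500)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is JSON.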
The data underlying this track is available in the file
encode4LongRna.bb.
Individual regions or the whole genome annotation can be obtained using our
tool bigBedToBed, which is available on our
download server.
For example, to extract only annotations in a given region, you could use the following command:
bigBedToBed -chrom=chr1 -start=100000 -end=100500 https://hgdownload.gi.ucsc.edu/gbdb/mm10/encode4LongRna.bb stdout
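The same command can be wrapped in a script. The sketch below assumes that bigBedToBed is installed and on the PATH; bigbed_command and fetch_region are hypothetical helper names, and the columns returned depend on the layout of the bigBed file.

```python
import subprocess

BIGBED_URL = "https://hgdownload.gi.ucsc.edu/gbdb/mm10/encode4LongRna.bb"


def bigbed_command(chrom, start, end, source=BIGBED_URL):
    """Build the bigBedToBed argument list for one region, writing to stdout."""
    return [
        "bigBedToBed",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        source,
        "stdout",
    ]


def fetch_region(chrom, start, end):
    """Run bigBedToBed and return each BED row as a list of column strings."""
    result = subprocess.run(
        bigbed_command(chrom, start, end),
        check=True,           # raise CalledProcessError if the tool fails
        capture_output=True,
        text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]


# Prints the command equivalent to the one shown above.
print(" ".join(bigbed_command("chr1", 100000, 100500)))
```

Calling fetch_region("chr1", 100000, 100500) runs the tool and requires network access to the download server.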
Please refer to our
mailing list archives
for questions, or our
Data Access FAQ
for more information.
Methods
Data were retrieved from https://zenodo.org/records/15116042.
The file mouse_ucsc_transcripts.gtf was converted to BED format, and expression and CDS data
were added from the relevant files using a custom script.
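The custom script itself is not reproduced here. As a generic illustration of the GTF-to-BED step only, the sketch below groups exon records by transcript_id and emits one BED12 line per transcript. The function name and the sample records are hypothetical, and the expression and CDS columns described above are not handled: thickStart and thickEnd are both set to the transcript start as a placeholder for "no coding region".

```python
import re
from collections import defaultdict


def gtf_exons_to_bed12(gtf_lines):
    """Convert GTF exon records into BED12 lines, one per transcript."""
    exons = defaultdict(list)  # transcript_id -> list of (start, end) blocks
    meta = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF is 1-based and inclusive; BED is 0-based and half-open.
        exons[tid].append((int(fields[3]) - 1, int(fields[4])))
        meta[tid] = (fields[0], fields[6])

    rows = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = meta[tid]
        start, end = blocks[0][0], blocks[-1][1]
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - start) for s, _ in blocks)
        # thickStart == thickEnd == start: placeholder, no CDS information used.
        row = [chrom, start, end, tid, 0, strand, start, start, "0",
               len(blocks), sizes, starts]
        rows.append("\t".join(str(value) for value in row))
    return rows


# Two made-up exon records for a single hypothetical transcript.
sample = [
    'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
]
for bed_line in gtf_exons_to_bed12(sample):
    print(bed_line)
```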
Credits
Thanks to Fairlie Reese for providing data access and for helpful feedback.
References
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D,
Razavi-Mohseni M, Jiang Y et al.
The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure
diversity.
bioRxiv. 2023 May 16.
PMID: 37292896; PMC: PMC10245583