ENCODE4 Long-read Transcripts

Track collection: Transcripts and other data generated using long-read sequencing technology (PacBio and Oxford Nanopore)


Data last updated at UCSC: 2025-04-04 12:38:12

Description

The ENCODE4 long-read RNA-seq collection annotates transcripts with numerical triplets that identify each transcript's start site, exon junction chain, and end site. This scheme shows how promoter selection, splicing patterns, and 3′ end processing are deployed across mouse tissues.

Display Conventions

Each transcript name includes a triplet annotation giving, in order, the transcript start site, exon junction chain, and transcript end site. For example, if transcript A is labeled [1,2,3] and transcript B is labeled [1,1,3], the two transcripts share a start site and an end site but use a different combination of exons. Here is an example drawn from hg38 at the INSIG1 locus:

In this example, the first two transcripts marked by arrows have the same start site ("1") and the same set of exons ("8"), but they have different end sites ("2" vs "1"). Similarly, the second two marked transcripts have the same start site ("1"), but a different set of exons ("8" vs "9") and a different end site ("1" vs "2").
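Comparing two labels position by position makes this reading mechanical. The following is a minimal POSIX shell sketch that assumes labels are written exactly as shown here, for example [1,8,2]. The function name compare_triplets is hypothetical and is not part of any UCSC tool.

```sh
#!/bin/sh
# compare_triplets: hypothetical helper (not a UCSC tool).
# Compares two triplet labels of the form [start,exonChain,end], e.g. "[1,8,2]",
# and reports which of the three components are shared.
compare_triplets() {
    # Strip the surrounding brackets from each label.
    a=${1#\[}; a=${a%\]}
    b=${2#\[}; b=${b%\]}
    old_ifs=$IFS
    IFS=,
    # Unquoted expansion is intentional: split both labels on commas,
    # so $1..$3 come from the first label and $4..$6 from the second.
    set -- $a $b
    IFS=$old_ifs
    [ "$#" -eq 6 ] || { echo "expected two labels like [1,2,3]" >&2; return 1; }
    for part in "start site:$1:$4" "exon chain:$2:$5" "end site:$3:$6"; do
        name=${part%%:*}   # component name
        rest=${part#*:}    # "x:y" values for the two labels
        x=${rest%%:*}
        y=${rest#*:}
        if [ "$x" = "$y" ]; then
            echo "$name: shared ($x)"
        else
            echo "$name: differs ($x vs $y)"
        fi
    done
}
```

For the first INSIG1 pair, compare_triplets '[1,8,2]' '[1,8,1]' reports a shared start site and exon chain and a differing end site. POSIX sh has no local variables, so the helper's working variables remain set in the calling shell.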

GENCODE V29 and V40 were used as reference data; any transcript not present in either of these is colored blue.

Hovering over a transcript shows its ENCODE gene ID, the tissue or cell line in which it is most highly expressed, and its TPM in that sample.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API.
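The following sketch builds a request URL for the REST API's /getData/track endpoint, which takes genome, track, chrom, start, and end parameters. The track name encode4LongRna is an assumption based on the bigBed file name encode4LongRna.bb and may differ from the actual track name. The helper ucsc_track_url is hypothetical.

```sh
#!/bin/sh
# ucsc_track_url: hypothetical helper that assembles a UCSC REST API
# /getData/track URL from genome, track, chrom, start, and end arguments.
ucsc_track_url() {
    printf 'https://api.genome.ucsc.edu/getData/track?genome=%s;track=%s;chrom=%s;start=%s;end=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

A request could then be made with curl -s "$(ucsc_track_url mm10 encode4LongRna chr1 100000 100500)", which returns JSON if the assumed track name is correct.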

The data underlying this track is available in the file encode4LongRna.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which is available on our download server. For example, to extract only annotations in a given region, you could use the following command:

bigBedToBed -chrom=chr1 -start=100000 -end=100500 https://hgdownload.gi.ucsc.edu/gbdb/mm10/encode4LongRna.bb stdout
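The extracted lines are ordinary tab-separated BED text, so standard Unix tools can process them. The sketch below tallies items by itemRgb value, which shows how many transcripts in a region carry each display color, including the blue used for transcripts absent from the GENCODE reference. It assumes the file has at least nine BED columns with itemRgb in column 9. The helper name count_by_color is hypothetical.

```sh
#!/bin/sh
# count_by_color: hypothetical helper. Reads BED lines on stdin and prints
# how many items carry each itemRgb value (assumed to be BED column 9),
# most frequent color first.
count_by_color() {
    cut -f9 | sort | uniq -c | sort -k1,1nr
}
```

Usage: bigBedToBed -chrom=chr1 -start=100000 -end=100500 https://hgdownload.gi.ucsc.edu/gbdb/mm10/encode4LongRna.bb stdout | count_by_color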

Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

Methods

Data were retrieved from https://zenodo.org/records/15116042. The file mouse_ucsc_transcripts.gtf was converted to BED format, and expression and CDS data were added from the relevant files using a custom script.
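The custom script is not reproduced in this description. As a rough illustration of the GTF-to-BED step only, the awk sketch below converts GTF transcript records into BED6. It is not the script used for this track, and it ignores exon structure, expression, and CDS data. The function name is hypothetical. The key detail is the coordinate shift: GTF starts are 1-based and inclusive, whereas BED starts are 0-based.

```sh
#!/bin/sh
# gtf_transcripts_to_bed6: hypothetical sketch, NOT the script used for this track.
# Reads GTF on stdin and writes one BED6 line per "transcript" record.
gtf_transcripts_to_bed6() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        $0 !~ /^#/ && $3 == "transcript" {
            id = "."
            # Pull the quoted value out of the transcript_id attribute.
            if (match($9, /transcript_id "[^"]+"/)) {
                id = substr($9, RSTART + 15, RLENGTH - 16)
            }
            # GTF start is 1-based inclusive; BED start is 0-based, so subtract 1.
            print $1, $4 - 1, $5, id, 0, $7
        }'
}
```

Exon lines are skipped, so each transcript yields a single interval spanning its full extent. A BED12 conversion would also need the exon records to build block sizes and starts.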

Credits

Thanks to Fairlie Reese for providing data access and for helpful feedback.

References

Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y et al. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. bioRxiv. 2023 May 16. PMID: 37292896; PMC: PMC10245583