I am looking forward to being able to download normalized expression data for each cluster in the future

We’re working on implementing this feature right now! It should be available within the next 2 weeks. I’ll keep you posted; I’d love to hear your feedback once the feature is live 🙂

To find out which genes have high expression levels, should we just average the expression values and compare them gene by gene?

I think it depends on the specific case. Could you please provide more details? For example, are you trying to get the genes with high expression levels in each cluster, or are you asking about converting Ensembl IDs to gene symbols?

Thank you, Sara! I want to get genes with high expression levels in each cluster, so I was wondering if we could average the rows of the matrix obtained from this feature and compare them.

I would like to extract a list of genes that are highly expressed in a particular cluster without comparing it to all other clusters.

Hi Haruki. Yes, if you would like to get highly expressed genes in a particular cluster using this feature you could:

  • download the normalized expression matrix using the “Subset by clusters” selector to select the cluster of interest;
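  • average the rows of the matrix, i.e. compute each gene’s mean expression across the cells in the cluster. Rows are Ensembl IDs, so it also makes sense to average the values of Ensembl IDs that correspond to the same gene symbol. Sorting these averages from highest to lowest then gives you the most highly expressed genes in the cluster.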
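The averaging step can be sketched in Python as follows. This is a minimal sketch, not a Cellenics tool: it assumes the downloaded matrix is a CSV with Ensembl IDs in the first column and one column per cell (header row = cell names), and that you supply your own Ensembl-ID-to-symbol mapping (for example, from a separate annotation file). The function names `row_means`, `collapse_by_symbol` and `top_genes` and the example IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def row_means(csv_text):
    """Mean expression of each row (Ensembl ID) across all cell columns.

    Assumes (hypothetically) that the first column holds Ensembl IDs and
    that the header row holds cell names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row with cell names
    return {row[0]: mean(float(v) for v in row[1:]) for row in reader if row}


def collapse_by_symbol(means, ensembl_to_symbol):
    """Average the means of Ensembl IDs that map to the same gene symbol."""
    grouped = defaultdict(list)
    for ensembl_id, value in means.items():
        # Fall back to the Ensembl ID itself when no symbol is known.
        grouped[ensembl_to_symbol.get(ensembl_id, ensembl_id)].append(value)
    return {symbol: mean(values) for symbol, values in grouped.items()}


def top_genes(symbol_means, n=10):
    """Return the n genes with the highest average expression."""
    return sorted(symbol_means.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Toy matrix with made-up IDs: 3 Ensembl IDs x 3 cells.
    matrix = "gene,c1,c2,c3\nENSG_A,1,2,3\nENSG_B,0,0,3\nENSG_C,4,4,4\n"
    mapping = {"ENSG_A": "GENE1", "ENSG_B": "GENE2", "ENSG_C": "GENE1"}
    symbol_means = collapse_by_symbol(row_means(matrix), mapping)
    print(top_genes(symbol_means))  # [('GENE1', 3.0), ('GENE2', 1.0)]
```

Collapsing duplicate Ensembl IDs by averaging follows the suggestion above; summing or keeping the highest-expressed ID are other common choices.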

Does this answer your question?

Thank you very much! This is exactly what I needed to find!

This may be a future request, but I think when we do GSEA, we need a phenotype file (described under “Phenotype Data Formats” in the GSEA documentation) to define the groups.

I think this is the file that assigns samples to groups when we want to compare, for example, treatments A and B. However, I don’t think it is simple to separate the two groups using the “Normalized Expression Matrix” in Cellenics. Please let me know if there is a good way to do this.

If I understand correctly, you need the Normalized Expression Matrix with samples ordered according to a given group (which we call metadata).
This is because when you do GSEA, the order of samples in the normalized matrix needs to match the order of samples in the “Phenotype Data Formats” file, right?
In this case, I would suggest first downloading the Normalized Expression Matrix for, let’s say, treatment A (using the “Subset by metadata group” selector), then downloading a separate matrix for treatment B, and finally binding the two matrices together column by column (you can do that in R, for example. It’s quite easy, let me know if you need suggestions on how to do that).
This way, the samples in the combined matrix are grouped by metadata group: all treatment A samples come first, followed by all treatment B samples, and the phenotype file can list the labels in that same order.
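The binding step, plus a matching phenotype file, can be sketched as below. This is a Python equivalent of the R approach mentioned above, offered as a minimal sketch under assumptions: each download is a CSV with gene IDs in the first column and one column per sample, only genes present in both files are kept, and the phenotype file follows GSEA’s two-class categorical CLS layout (sample count, class count and `1`; then `#` and the class names; then one label per sample). The function names and the labels `A`/`B` are hypothetical.

```python
import csv
import io


def read_matrix(csv_text):
    """Parse a genes-by-samples CSV into (sample_names, {gene_id: values})."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    return header[1:], rows


def bind_columns(matrix_a, matrix_b):
    """Column-bind two matrices, keeping genes present in both (in A's order).

    Values are copied through unchanged as text; the first header cell is
    written as "gene" (an arbitrary choice for this sketch).
    """
    samples_a, rows_a = read_matrix(matrix_a)
    samples_b, rows_b = read_matrix(matrix_b)
    shared_genes = [g for g in rows_a if g in rows_b]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene"] + samples_a + samples_b)
    for g in shared_genes:
        writer.writerow([g] + rows_a[g] + rows_b[g])
    return out.getvalue(), samples_a, samples_b


def make_cls(n_a, n_b, label_a="A", label_b="B"):
    """Two-class categorical CLS text whose labels follow the column order."""
    labels = [label_a] * n_a + [label_b] * n_b
    return f"{n_a + n_b} 2 1\n# {label_a} {label_b}\n{' '.join(labels)}\n"


if __name__ == "__main__":
    treatment_a = "gene,a1,a2\nG1,1.0,2.0\nG2,0.5,0.0\n"
    treatment_b = "gene,b1\nG2,1.5\nG1,3.0\n"
    combined, samples_a, samples_b = bind_columns(treatment_a, treatment_b)
    print(combined)
    print(make_cls(len(samples_a), len(samples_b)))
```

Because the CLS labels are generated from the same sample lists used to build the combined matrix, their order is guaranteed to match. GSEA has its own expression file formats (such as GCT), so a final conversion of the combined CSV may still be needed.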

Let me know if this helps or if you have any other questions!
