Cell surface protein and messenger RNA (mRNA) expression are coupled through a highly complex co-regulatory network that is nearly intractable analytically. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) enables high-throughput profiling of both mRNA and surface protein expression in single cells, opening new possibilities for studying mRNA-protein linkage at finer granularity.
To investigate the connection between the paired mRNA and surface protein modalities in CITE-seq data, we formulate the problem as a regression task in which single-cell mRNA expression is the input used to predict surface protein expression levels. Overall, the trained AutoEncoder model achieves a basic level of prediction performance in linking paired mRNA and surface protein profiles, with a Pearson correlation of 0.4 between the predicted and measured protein expression levels; the architecture has the potential to be used in studying the complex regulation of protein expression.
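The agreement metric quoted above can be illustrated with a short NumPy sketch. It assumes predictions and measurements are arrays of shape (cells, proteins) and that the correlation is computed per protein and then averaged; the source does not state how the 0.4 figure was aggregated, and the function name mean_pearson is hypothetical.

```python
import numpy as np

def mean_pearson(pred, meas):
    """Average per-protein Pearson correlation between two (cells, proteins) arrays."""
    pred = np.asarray(pred, dtype=float)
    meas = np.asarray(meas, dtype=float)
    # Center each protein (column) across cells.
    pc = pred - pred.mean(axis=0)
    mc = meas - meas.mean(axis=0)
    num = (pc * mc).sum(axis=0)
    den = np.sqrt((pc ** 2).sum(axis=0) * (mc ** 2).sum(axis=0))
    # Correlation is undefined for zero-variance proteins; skip them.
    valid = den > 0
    return float(np.mean(num[valid] / den[valid]))

# Toy data for illustration only, not real CITE-seq measurements.
rng = np.random.default_rng(0)
measured = rng.normal(size=(200, 5))
predicted = measured + rng.normal(size=(200, 5))
print(round(mean_pearson(predicted, measured), 2))
```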
- Set the parameters to be swept in the header lines of param_search.py.
- Specify the data version in the calls to CovidDataset in function get_data_loaders.
- Run the script: param_search.py.
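The sweep can be pictured as a grid over the option values documented in this README. The sketch below is an assumption about how such a grid might be enumerated; it does not reproduce the actual contents of param_search.py, and the names search_space and iter_configs are hypothetical.

```python
from itertools import product

# Hypothetical search space built from the option values listed in this README.
search_space = {
    "output_activation": ["linear", "sigmoid"],
    "normalization_method": ["minmax", None],
    "input_type": ["norm", "raw"],
}

def iter_configs(space):
    """Yield one dict per combination of parameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
for cfg in configs:
    # A real sweep would build the data loaders and train a model here.
    print(cfg)
```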
Set the parameter output_activation in the AutoEncoder() call to one of:
- Linear: 'linear'
- Sigmoid: 'sigmoid'
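To illustrate what the two options do to the model output, here is a NumPy sketch. It assumes that 'linear' leaves the final layer's values unchanged and that 'sigmoid' applies the logistic function; the helper apply_output_activation is hypothetical and is not the AutoEncoder implementation.

```python
import numpy as np

def apply_output_activation(x, output_activation="linear"):
    """Apply the chosen activation to the final-layer values (illustrative only)."""
    x = np.asarray(x, dtype=float)
    if output_activation == "linear":
        # Identity: outputs are unbounded.
        return x
    if output_activation == "sigmoid":
        # Logistic function: outputs lie strictly between 0 and 1.
        return 1.0 / (1.0 + np.exp(-x))
    raise ValueError(f"unknown output_activation: {output_activation!r}")

logits = np.array([-2.0, 0.0, 2.0])
print(apply_output_activation(logits, "linear"))
print(apply_output_activation(logits, "sigmoid"))
```

A sigmoid output is bounded to (0, 1), so it can only match targets that lie in that range; a linear output has no such restriction.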
Set the parameter normalization_method in the CovidDataset() call to one of:
- MinMax linear scaling between 0 and 1: 'minmax'
- No normalization: None
During parameter search, the normalization_method can be set through the utility function get_data_loaders in param_search.py.
Note
Normalization parameters are computed on the train split only; the same parameters are used to transform the test and valid splits.
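The fit-on-train, apply-everywhere pattern can be sketched as follows. This is a minimal NumPy illustration, assuming per-feature minimum and maximum values; the helpers fit_minmax and transform_minmax are hypothetical and are not part of CovidDataset.

```python
import numpy as np

def fit_minmax(train):
    """Compute per-feature min and max from the train split only."""
    return train.min(axis=0), train.max(axis=0)

def transform_minmax(x, lo, hi):
    """Scale x with previously fitted parameters; constant features map to 0."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

# Toy splits for illustration only.
train = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
test = np.array([[1.0, 40.0]])

lo, hi = fit_minmax(train)
train_scaled = transform_minmax(train, lo, hi)
test_scaled = transform_minmax(test, lo, hi)  # reuses the train parameters
print(test_scaled)
```

Because the parameters come from the train split, values in the other splits can fall outside the 0 to 1 range, as the second feature of the toy test row does here.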
Set the parameter input_type in the CovidDataset() call to one of:
- Log normalized expression: 'norm'
- Raw expression: 'raw'
During parameter search, the input_type can be set through the utility function get_data_loaders in param_search.py.
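For readers unfamiliar with the distinction between the two input types, the sketch below shows one common way to log normalize raw counts. It is an assumption, not the repository's method: it scales each cell to 10,000 total counts and applies log(1 + x), a widespread single-cell convention, and the helper log_normalize is hypothetical. The 'norm' data in this project may have been produced differently.

```python
import numpy as np

def log_normalize(counts, scale=1e4):
    """Library-size normalize each cell (row), then apply log1p (assumed convention)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for cells with no counts.
    totals = np.where(totals > 0, totals, 1.0)
    return np.log1p(counts / totals * scale)

# Toy raw count matrix (cells x genes), for illustration only.
raw = np.array([[1, 0, 3], [0, 0, 0], [10, 5, 5]])
norm = log_normalize(raw)
print(norm.round(2))
```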