generated from alshedivat/al-folio
1 parent f69bf2d, commit 4bed4d7. Showing 7 changed files with 199 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
--- | ||
layout: page | ||
title: RNA structure prediction | ||
description: sampling & clustering | ||
img: assets/img/SAMIII.png | ||
importance: 3 | ||
category: research | ||
--- | ||
|
||
A ribonucleic acid (RNA) molecule can form complex structures through intramolecular base pairing. Some classes of RNAs regulate biological functions by changing their conformations. An example is illustrated below. | ||
<p align="center"> | ||
<img width="350" src="https://irisyoon.com/assets/img/SAMIII_conformation1.png"> | ||
<img width="350" src="https://irisyoon.com/assets/img/SAMIII_conformation2.png"> | ||
</p> | ||
|
||
|
||
Identifying multiple structures of an RNA can bring therapeutic advancements for RNA viruses. A popular approach is to sample low-energy structures from the nearest-neighbor thermodynamic model. Most algorithms follow the general flow of <b>sampling</b>, <b>clustering</b>, and reporting <b>cluster representatives</b>. | ||
|
||
I worked on improving the <b>clustering</b> aspect of an RNA structure prediction algorithm called <a href="https://github.com/gtDMMB/RNAStructProfiling">profiling</a>. The existing method produced too many clusters with negligible biological difference. I proposed algorithmic ways to identify clusters that should be merged based on structural similarity. The enhanced version of profiling is under development by the Georgia Tech <a href="https://github.com/gtDMMB">Discrete Mathematics and Molecular Biology</a> group. | ||
|
||
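To make the idea of merging clusters by structural similarity concrete, here is a minimal Python sketch. It is not the profiling algorithm: it assumes structures are given in dot-bracket notation without pseudoknots, uses base-pair distance as the similarity measure, and merges greedily against the first member of each cluster. The function names, the threshold, and the sample structures are all hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(base_pairs(a) ^ base_pairs(b))


def merge_similar(structures, threshold):
    """Greedy grouping (an assumption, not the published method): a structure
    joins the first cluster whose first member is within `threshold`."""
    clusters = []
    for s in structures:
        for cluster in clusters:
            if bp_distance(cluster[0], s) <= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Hypothetical sampled structures for a 12-nucleotide sequence.
samples = [
    "((((....))))",
    "(((......)))",
    "((....))....",
    ".(....).....",
]
clusters = merge_similar(samples, threshold=2)
for cluster in clusters:
    print(cluster[0], "represents", len(cluster), "structures")
```

With this toy input, the four sampled structures collapse into two clusters, since each pair of near-identical helices differs by a single base pair.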
I also examined the prospect of using current methods to identify new multimodal RNAs. I found that there is a class of RNAs (kinetic riboswitches) that is difficult to detect with current sampling methods. I proposed a simple co-transcription simulation method to identify the multimodality of such RNAs. The results have been published in this <a href="https://www.researchgate.net/publication/337314911_Towards_an_understanding_of_RNA_structural_modalities_a_riboswitch_case_study">paper.</a> | ||
|
||
*Georgia Tech (2018-2019), joint work with <a href="https://sites.google.com/site/christineheitsch/">Christine Heitsch</a> (Georgia Tech) and <a href="https://ribosnitch.bio.unc.edu/">Alain Laederach</a> (UNC).* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
--- | ||
layout: page | ||
title: fake news detector | ||
description: using BERT & transfer learning | ||
img: assets/img/wordcloud_real_news_titles.png | ||
importance: 3 | ||
category: data science | ||
--- | ||
|
||
During my five-week fellowship at the <a href="https://www.correlation-one.com/ds4a">Data Science for All Women's Summit</a> (Fall 2020), my teammates and I built a fake news detector using various natural language processing tools such as embeddings, RNNs, BERT, and transfer learning. We performed careful preprocessing to remove biases in the dataset, and we used a model interpretability tool called LIME to identify points of improvement for our model. | ||
|
||
Take a look at our <a href="https://github.com/s-chrodinger/fake-news-detection">code</a> on GitHub! | ||
|
||
<iframe src="//www.slideshare.net/slideshow/embed_code/key/f5TLIG5Ag7wuGg" width="595" height="400" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> </div> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
layout: page | ||
title: Sheaf theory for data science | ||
description: Applications of persistent sheaf cohomology. | ||
img: assets/img/PointCloudExample.png | ||
importance: 4 | ||
category: research | ||
--- | ||
|
||
|
||
For my PhD dissertation, I worked on applications of topology to data science. I used <b>cosheaves</b> and <b>spectral sequences</b> to compute <b>persistence</b> in a distributed manner. I applied such distributed computation to study <b>multi-density data</b> and recovered the information lost in persistence diagrams. | ||
|
||
For example, consider the following point cloud and its corresponding persistence diagram in dimension one. | ||
|
||
<p align="center"> | ||
<img width="350" src="https://irisyoon.com/assets/img/PointCloudExample.png"> | ||
<img width="350" src="https://irisyoon.com/assets/img/PD.png"> | ||
</p> | ||
|
||
|
||
|
||
By observing the persistence diagram, one would conclude that there is one significant feature. However, one can see from the point cloud that there are small but significant features that are densely sampled. My construction of distributed computation allows one to identify such significant features that are neglected by traditional methods. | ||
|
||
Here is a 30-minute video of my presentation at the <a href="https://www.ima.umn.edu/2017-2018/SW5.21-25.18/27292">IMA special workshop on Bridging Statistics and Sheaves.</a> | ||
|
||
The paper can be found on <a href="https://arxiv.org/abs/2001.01623">arXiv.</a> Here is a copy of my <a href="https://repository.upenn.edu/edissertations/2936/">PhD dissertation.</a> | ||
|
||
|
||
*University of Pennsylvania (2013-2018), PhD dissertation. Joint work with <a href="https://www.math.upenn.edu/~ghrist/">Robert Ghrist</a> (U. Penn).* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
--- | ||
layout: page | ||
title: classical musicians recommender | ||
description: | ||
img: assets/img/smallgraph.png | ||
importance: 1 | ||
category: data science | ||
--- | ||
|
||
I built a recommendation system for classical music performers. The recommender is based on the idea that musicians who collaborate frequently likely have similar performance styles. It first creates a graph of classical musicians and their collaborations, then uses node2vec embeddings to find vector representations of the musicians. Given a list of a user's favorite artists, the recommender uses similarity of the vector representations to recommend artists that the user may enjoy. | ||
|
||
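The last step described above, recommending by similarity of vector representations, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the app's code: the embeddings are made-up three-dimensional vectors standing in for node2vec output, the user's taste is summarized by averaging the favorite artists' vectors, and similarity is taken to be cosine similarity. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(embeddings, favorites, k=2):
    """Rank artists outside `favorites` by similarity to the averaged favorites.
    Averaging is an assumption made for this sketch."""
    dim = len(next(iter(embeddings.values())))
    profile = [
        sum(embeddings[name][i] for name in favorites) / len(favorites)
        for i in range(dim)
    ]
    candidates = [name for name in embeddings if name not in favorites]
    ranked = sorted(
        candidates, key=lambda name: cosine(profile, embeddings[name]), reverse=True
    )
    return ranked[:k]


# Made-up vectors standing in for node2vec embeddings of the collaboration graph.
embeddings = {
    "pianist_a": [0.9, 0.1, 0.0],
    "pianist_b": [0.8, 0.2, 0.1],
    "violinist_c": [0.1, 0.9, 0.2],
    "cellist_d": [0.0, 0.8, 0.3],
    "conductor_e": [0.7, 0.3, 0.0],
}
top = recommend(embeddings, ["pianist_a"], k=2)
print(top)
```

In this toy example the two artists whose vectors point in nearly the same direction as the favorite are ranked first, which mirrors the intuition that frequent collaborators land close together in the embedding space.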
Check out the app at <a href="https://musicians-rec.herokuapp.com">https://musicians-rec.herokuapp.com</a> | ||
Code: <a href="https://github.com/irishryoon/musicians_recommendation">github</a> | ||
Blog post: <a href="https://medium.com/@irishryoon/classical-musicians-recommender-22ee176daee8">medium</a> | ||
For interactive exploration of the artist graph, click the image below. | ||
|
||
[<center><img src="http://irisyoon.com/assets/img/graph.png" height ="400"></center>](http://irisyoon.com/musicians_recommendation/graph_80000/graph_visualization/network/) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
--- | ||
layout: page | ||
title: topology & neuroscience | ||
description: A topological approach to neural encoding | ||
img: assets/img/neuro_image.png | ||
importance: 1 | ||
category: research | ||
--- | ||
|
||
What does it mean for there to be circular structures in neural activity? | ||
|
||
Consider a collection of images with different orientations shown in Figure 1A. The red and dark orange images have high similarity, whereas the red and green images have low similarity. What would happen if we arranged the eight images in a way that respects this similarity? The images would be arranged in a circular fashion, as shown in Figure 1B. | ||
|
||
|
||
<p align="center"> | ||
<img width="750" src="https://irisyoon.com/assets/img/neuro_cyclic_structures.png"> | ||
</p> | ||
<div class="caption"> | ||
Figure 1. Circular structure of data. <span style="font-weight:bold">A.</span> Collection of images. <span style="font-weight:bold">B.</span> An arrangement of images based on the similarity of orientation reveals a circular structure. <span style="font-weight:bold">C.</span> Collection of neural activities (spike trains). <span style="font-weight:bold">D.</span> Consider two spike trains to be similar if the vertical lines are well-aligned after "sliding" one spike train by a small amount. An arrangement of neural activity based on spike train similarity reveals a circular structure. | ||
</div> | ||
|
||
|
||
Similarly, consider a collection of neural activities (called spike trains) shown in Figure 1C. Each row shows the activity of a single neuron over some time period, and each vertical line marks a time at which the neuron fired. If we observe the neuron for, say, $$ M $$ time intervals, then each spike train is a binary vector in $$ \mathbb{R}^M $$. Given two spike trains $$ s_1 $$ and $$ s_2 $$, let's measure their dissimilarity by the amount one needs to "slide" $$ s_1 $$ to "match" $$ s_2 $$: the smaller the slide, the more similar the spike trains. Then, the red spike train is similar to the dark orange spike train, but it is quite dissimilar to the green spike train. Again, if we were to arrange the spike trains in a way that respects this similarity, we would arrange them in a cyclic manner (Figure 1D). | ||
|
||
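As a rough illustration of the "sliding" comparison described above, the following Python sketch measures how far one binary spike train must be shifted to coincide with another. It makes simplifying assumptions that are not in the original text: the shift wraps around circularly, a match must be exact rather than approximately aligned, and the function name `shift_distance` is hypothetical.

```python
def shift_distance(s1, s2):
    """Smallest circular shift (in either direction) that turns s1 into s2.
    Returns None if no shift produces an exact match."""
    m = len(s1)
    for shift in range(m // 2 + 1):
        right = s1[-shift:] + s1[:-shift] if shift else list(s1)
        left = s1[shift:] + s1[:shift]
        if right == s2 or left == s2:
            return shift
    return None


def rotate(train, k):
    """Shift a spike train k steps to the right, wrapping around."""
    return train[-k:] + train[:-k] if k else list(train)


# Eight toy spike trains over M = 8 time bins, each a shifted copy of `base`.
base = [1, 1, 0, 0, 0, 0, 0, 0]
trains = [rotate(base, k) for k in range(8)]

for k, train in enumerate(trains):
    print(k, train, "distance to train 0:", shift_distance(trains[0], train))
```

The distances from the first spike train grow up to the halfway point and then shrink again, so the last train is as close to the first as the second one is. That is exactly the pattern one expects from points arranged around a circle.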
Now, suppose there are so many images and such long spike trains that we cannot make the arrangements by hand. How would a computer recognize that these high-dimensional data contain cyclic structures? Let $$ P $$ denote the point cloud representing a system of interest, such as the collection of stimuli or neural activity. We calculate the similarity between every pair of elements in the system. As we vary the similarity level, we represent the system by a sequence of simplicial complexes. The loops in this sequence are summarized by a persistence diagram, where the points far from the diagonal represent the large loops. See the following figure. | ||
|
||
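The pipeline above can be sketched in plain Python for a tiny example. This is a simplified illustration, not the software used in the project: it works with pairwise distances instead of similarities, builds a Vietoris-Rips complex (edges and triangles only) at a few fixed thresholds, and counts independent loops (the first Betti number over the field with two elements) at each threshold instead of computing a full persistence diagram. The function names are hypothetical.

```python
import math
from itertools import combinations


def gf2_rank(columns):
    """Rank over GF(2) of a matrix whose columns are integer bitmasks."""
    pivots = {}  # leading bit -> reduced column with that leading bit
    rank = 0
    for col in columns:
        while col:
            high = col.bit_length() - 1
            if high not in pivots:
                pivots[high] = col
                rank += 1
                break
            col ^= pivots[high]
    return rank


def betti_1(dist, threshold):
    """Number of independent loops in the Vietoris-Rips complex at `threshold`."""
    n = len(dist)
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i][j] <= threshold]
    edge_index = {edge: k for k, edge in enumerate(edges)}
    triangles = [
        t for t in combinations(range(n), 3)
        if all(pair in edge_index for pair in combinations(t, 2))
    ]
    # Boundary matrices: each edge maps to its two vertices,
    # each triangle maps to its three edges.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_index[pair] for pair in combinations(t, 2)) for t in triangles]
    # dim ker(d1) - rank(d2)
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# Toy point cloud: 8 points evenly spaced on the unit circle.
points = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
dist = [[math.dist(p, q) for q in points] for p in points]

for threshold in (0.5, 1.0, 1.5, 2.1):
    print(threshold, betti_1(dist, threshold))
```

For this toy circle, no loop exists while the points are still disconnected, one loop appears once neighboring points are joined and survives over a range of thresholds, and it is filled in once all points are connected. A loop that survives over a long range of thresholds corresponds to a point far from the diagonal in the persistence diagram.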
<p align="center"> | ||
<img width="750" src="https://irisyoon.com/assets/img/neural_PH.png"> | ||
</p> | ||
<div class="caption"> | ||
Figure 2. Detecting circular structures from high dimensional data. Given the data (either images or spike trains), we first compute a matrix encoding pairwise similarity between all elements in the system. We then create the sequence of simplicial complexes that represents the connectivity of the system at various similarity values. Finally, we summarize the loops in the simplicial complexes using a persistence diagram. Points far from the diagonal represent significant structures. | ||
</div> | ||
|
||
|
||
So far, we have seen that persistence diagrams can indicate whether a collection of images or spike trains contains circular structures. Consider a hypothetical experiment in which we present a stimulus video to a mouse while measuring its neural activity. Let's assume that the persistence diagrams indicate that there are two circular structures in the stimulus and one circular structure in the neural activity. Does the single circular feature in the neural activity reflect one of the circular features in the stimulus? If so, which one? Such questions are topological manifestations of a fundamental problem in neuroscience called neural encoding, which studies how neurons represent information. | ||
|
||
|
||
<p align="center"> | ||
<img width="450" src="https://irisyoon.com/assets/img/encoding.png"> | ||
</p> | ||
<div class="caption"> | ||
Figure 3. Neural encoding, stated as a problem in topology. Consider an experiment in which we present some stimulus while measuring neural activity. The persistence diagrams indicate that there are two circular features in the stimulus while there is only one circular feature in the neural activity. Which feature of the stimulus is represented by the neurons? | ||
</div> | ||
|
||
To address the above questions, I developed a framework for comparing persistence diagrams called the <a href="https://arxiv.org/abs/2201.05190">analogous bars method</a>. | ||
|
||
The methods paper has been accepted in the Journal of Applied and Computational Topology, conditional on minor revisions. A follow-up paper implementing the method on simulated and experimental neuroscience datasets is in preparation. | ||
|
||
Preprint: <a href="https://arxiv.org/abs/2201.05190">Persistent Extension and Analogous Bars: Data-Induced Relations Between Persistence Barcodes</a>. | ||
|
||
Code: <a href="https://github.com/UDATG/analogous_bars">github: analogous bars</a> | ||
|
||
*Joint work with <a href="http://www.chadgiusti.com/">Chad Giusti</a> (U. Delaware) and <a href="https://www.math.upenn.edu/~ghrist/">Robert Ghrist</a> (U. Penn).* | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
--- | ||
layout: page | ||
title: Philadelphia Bail Fund | ||
description: | ||
img: assets/img/philly.png | ||
importance: 2 | ||
category: data science | ||
--- | ||
From 2020 to 2021, I volunteered as a data scientist for <a href="https://codeforphilly.org/">Code for Philly</a>, specifically the <a href="https://www.phillybailfund.org/">Philadelphia Bail Fund</a>. The volunteers and I wanted to understand how the bail system of Philadelphia was affecting its citizens. Here are a few of the questions that we wanted to address. | ||
|
||
* Which neighborhoods are most heavily impacted by the bail system? | ||
* How do the defendant's race and gender impact the bail amount? | ||
* Is there consistency across magistrates (the officials who set bail)? That is, do two different magistrates set similar bail amounts for similar cases? | ||
|
||
The volunteers and I gathered new criminal filing records from the municipal court. We then performed various statistical analyses to address the above questions. One challenge of this analysis was that many variables were correlated, and we needed to control for the correlations. For example, if one magistrate is likely to handle more severe offenses than another, then one would have to take such differences into account when comparing the bail amounts set by the two magistrates. | ||
|
||
To that end, we used **topic modeling** on the criminal filing records to group cases by offense type and severity. We then performed a **matched study** to test whether two magistrates set similar bail amounts given similar offense severity. We found that there is still a high variance in the bail amounts set across magistrates even after controlling for differences in offense type and severity. | ||
|
||
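The matched comparison described above can be illustrated with a small Python sketch. It is not the project's analysis code: it assumes each case already carries an offense group label (for example, one produced by topic modeling), it compares two magistrates only within groups that both of them handled, and it summarizes each side by the median bail. The records, field names, and magistrate labels below are synthetic and purely illustrative.

```python
from collections import defaultdict
from statistics import median


def matched_differences(cases, magistrate_a, magistrate_b):
    """Median bail of A minus median bail of B, per offense group that both
    magistrates handled. Groups seen by only one of them are skipped."""
    by_group = defaultdict(lambda: defaultdict(list))
    for case in cases:
        by_group[case["offense_group"]][case["magistrate"]].append(case["bail"])

    differences = {}
    for group, by_magistrate in by_group.items():
        if magistrate_a in by_magistrate and magistrate_b in by_magistrate:
            differences[group] = (
                median(by_magistrate[magistrate_a]) - median(by_magistrate[magistrate_b])
            )
    return differences


# Synthetic records for illustration only; not real court data.
cases = [
    {"offense_group": "theft", "magistrate": "A", "bail": 5000},
    {"offense_group": "theft", "magistrate": "A", "bail": 7000},
    {"offense_group": "theft", "magistrate": "B", "bail": 2000},
    {"offense_group": "theft", "magistrate": "B", "bail": 3000},
    {"offense_group": "assault", "magistrate": "A", "bail": 20000},
    {"offense_group": "assault", "magistrate": "B", "bail": 10000},
    {"offense_group": "assault", "magistrate": "B", "bail": 12000},
    {"offense_group": "dui", "magistrate": "A", "bail": 1000},
]
differences = matched_differences(cases, "A", "B")
print(differences)
```

Comparing within offense groups is what keeps the comparison fair: a magistrate who happens to see more severe cases is not penalized for setting higher bail overall, because only like-for-like cases are compared.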
For more info, please visit the following app: <a href="https://codeforphilly-pbf-analysis-app-hzafyl.streamlitapp.com/">PBF app</a> | ||
Code: <a href="https://github.com/CodeForPhilly/pbf-analysis">github</a> | ||
|
||
<p align="center"> | ||
<img width="700" src="https://irisyoon.com/assets/img/pbf_magistrate.png"> | ||
</p> | ||
|