dissertation submitted!
awxuelong committed Nov 12, 2023
1 parent 70c4087 commit b26cc13
Showing 23 changed files with 398 additions and 16 deletions.
1 change: 1 addition & 0 deletions .vscode/ltex.hiddenFalsePositives.en-US.txt
@@ -0,0 +1 @@
{"rule":"EN_UNPAIRED_BRACKETS","sentence":"^\\QThe simplest way would be to set up an intrinsic template like \"subject-verb-object\"^1 in order to offset portion of the probabilistic model's support^3 to not put any probability mass to structures other than \"subject-verb-object\".\\E$"}
4 changes: 3 additions & 1 deletion _bibliography/papers.bib
@@ -4,7 +4,9 @@ @article{anantonio2023
author={An, Xuelong and Vergari, Antonio},
year={2023},
poster={sassy-clevr-poster-beta.pdf},
- preview={nesy_models_framework_verbeta.png}
+ slides={nesy-slides.pdf},
+ preview={nesy_models_framework_verbeta.png},
+ pdf={final_draft_chart_nesy_reasoner-gamma.pdf}
}
Empty file added _posts/2023-02-08-workload.md
Empty file added _posts/2023-04-26-path.md
Empty file added _posts/2023-06-28-lecunn.md
12 changes: 7 additions & 5 deletions _posts/2023-09-01-nesy-summer-school.md
@@ -19,7 +19,7 @@ I attended the NeuroSymbolic (NeSy) Summer School 2023 held [virtually](https://

They mostly talked about the use of logic, propositional or first-order, to shape a problem domain and drive neural networks to make explainable predictions, discussing the strengths and drawbacks of this approach.

- I am personally skeptical of the use of logic to encode knowledge or describe relations. Knowledge is effectively infinite, so solving real-world problems that involve complex systems, e.g. predicting cell progression in a tissue, may require writing an infinite number of rules encompassing cell physiology or pathogenesis. It is also disputable which putative rules should be encoded to solve a problem, and rarely do two experts in the same field converge on which rules a model needs for a task like cell trajectory analysis. On a tangent, I also liked the point raised by Prof. Benjamin Grosof on defeasible knowledge, referring to rules which are only true circumstantially. For example, a rule like 'every cell has a nucleus' is true except when it isn't, such as during anaphase of mitosis or if the cell is a bacterium. One can relax the "absoluteness" of rules by encoding probabilistic knowledge, i.e., rules with probability distributions attached to their truth values. However, this only exacerbates the inherent combinatorial search space involved in finding rules that can answer a query.
+ I am personally skeptical of the use of logic to encode knowledge or describe relations. Knowledge is effectively infinite, so solving real-world problems that involve complex systems, e.g. predicting cell progression in a tissue, may require writing an infinite number of rules encompassing cell physiology or pathogenesis. It is also disputable which putative rules should be encoded to solve a problem, and rarely do two experts in the same field converge on which rules a model needs for a task like cell trajectory analysis. On a tangent, I also liked the point raised by Prof. Benjamin Grosof on defeasible knowledge, referring to rules which are only true circumstantially. For example, a rule like 'every cell has a nucleus' holds only when the cell is not going through anaphase of mitosis and is not, say, a bacterium. One can relax the "absoluteness" of rules using temporal logic ('something is true during certain timeframes') or by encoding probabilistic knowledge, i.e., rules with probability distributions attached to their truth values. However, these only exacerbate the inherent combinatorial search space involved in finding rules that can answer a query.

Let us illustrate with the following hypothetical logic program:

@@ -51,9 +51,11 @@ My criticism towards this kind of logic is that it doesn’t reflect the complex

Rather, I believe that the brain stores operations/programs which operate on the world to obtain knowledge. In other words, it doesn’t store knowledge, it scaffolds from the environment to obtain knowledge. For example, it is not the case that a Herbrand Base of some logic program is stored in my brain; rather, inside it there are programs, such as [modus ponens and modus tollens](https://human.libretexts.org/Bookshelves/Philosophy/A_Miniguide_to_Critical_Thinking_(Lau)/01%3A_Chapters/1.08%3A_Patterns_of_Valid_Arguments), that allow one to process this logic program that is written on the screen in front of you. What is interesting about this is that, for example, Shirley the person is not stored in my mind; she exists in the real world. What is actually in my mind is an operation like modus ponens, which, when applied to her and her smoking habit, allows me to deduce she is a smoker. Perhaps the rule for deciding whether she is a smoker is encoded from prior exposure to the social dynamics of friends and smokers. Here, the encoding procedure and the manipulation of social dynamics and smoking habits can all be thought of as programs.
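
To make this concrete, here is a minimal sketch in Python (rather than a logic language) of modus ponens as a program applied to facts. The predicates, the smoker rule and Shirley herself are illustrative stand-ins taken from the discussion above, not an actual knowledge base:

```python
# Modus ponens as a program: from "P implies Q" and "P", conclude "Q".
# Facts and the rule below are illustrative stand-ins, not a stored Herbrand base.
facts = {("friend_of_smokers", "shirley"), ("buys_cigarettes", "shirley")}

# Each rule: (premise predicates, conclusion predicate), read as
# universally quantified over a person x.
rules = [(("friend_of_smokers", "buys_cigarettes"), "smoker")]

def modus_ponens(facts, rules):
    """Derive new facts by applying every rule whose premises all hold."""
    derived = set(facts)
    entities = {x for (_, x) in facts}
    for premises, conclusion in rules:
        for x in entities:
            if all((p, x) in facts for p in premises):
                derived.add((conclusion, x))
    return derived

print(modus_ponens(facts, rules))
# ('smoker', 'shirley') appears in the output: it is produced on demand
# by running the operation, never stored as knowledge beforehand.
```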

- Another source of beauty with programs is that one doesn't need to write rules, but rather a domain-specific language with grammar rules for the machine to use as its language of thought to interpret incoming data. As an analogy, I like to think of it as defining a programming language, where a machine learns to understand the world by manipulating code written in it.
+ Another source of beauty with programs is that one doesn't need to write rules, but rather a domain-specific language with grammar rules for the machine to use as its language of thought to interpret incoming data. As an analogy, I like to think of it as defining a programming language, where a machine learns to understand the world by writing and editing its own code, a form of self-reference.

- Furthermore, I hypothesize that perhaps we need a vector embedding for primitive programs and their compositions. Empirically, it has been observed that continuous representations of discrete symbols/concepts tend to speed up their manipulation, such as searching over them.
+ Furthermore, I hypothesize that perhaps we need a vector embedding for primitive programs and their compositions. Empirically, it has been observed that continuous representations of discrete symbols/concepts tend to speed up their manipulation, such as searching over them, by exploiting GPU parallelism.
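
As a toy illustration of this hypothesis, here is a sketch under the assumption that each primitive program has a learned vector; all names and sizes here are invented. Similarity search over programs then collapses into a single matrix-vector product, exactly the kind of operation that parallel hardware handles well:

```python
import numpy as np

# Illustrative assumption: each of N primitive programs has a learned
# d-dimensional embedding, and composition is crudely modelled as addition.
rng = np.random.default_rng(0)
N, d = 10_000, 128
program_embeddings = rng.normal(size=(N, d))

def closest_programs(query, k=5):
    """Indices of the k embeddings most similar to `query` (dot-product score).

    One matrix-vector product scores all N programs at once, the kind of
    parallel search that discrete enumeration over symbols lacks.
    """
    scores = program_embeddings @ query
    return np.argsort(-scores)[:k]

# e.g. search for programs resembling the composition of programs 3 and 7
query = program_embeddings[3] + program_embeddings[7]
print(closest_programs(query))
```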

Interesting work revolving around program synthesis includes [neural module networks](https://arxiv.org/pdf/1511.02799.pdf) and an [improvement](https://arxiv.org/abs/1704.05526) over this architecture.

<!-- Can this task
Prior knowledge on concepts , length of solution -->
@@ -147,9 +149,9 @@ I think it depends on the context. For proving mathematical theorems, humans arg

A member of the audience appealed to the bird-airplane thought experiment: will something similar happen to the brain-computer relationship, i.e., can we design an algorithm run on a computer that is devoid of the flaws of the brain, such as its limited lifespan and biological constraints, and obtain a tool that is far more powerful than the brain itself? This is analogous to how an airplane, despite having its architecture inspired by the phenotype of the bird, is many scales more efficient at flying and carrying cargo than any existing bird.

- First, I think these analogies are flawed at their core if we want to discuss intelligence. While the airplane succeeds at the particular task of flying over long distances, it is devoid of intelligence and is dependent on the human flying it. Without a human operator, an airplane that is about to crash, or runs low on fuel, has absolutely no way to adapt to changes and formulate a plan to get out of its predicament by itself. Birds, on the other hand, can engage in simple problem solving concerning survival, reproduction and caring for their young, all while adapting to circumstantial events in their environment. One striking example often cited by [Prof. Song-Chun Zhu](https://arxiv.org/pdf/2004.09044.pdf) is of a crow that takes advantage of incoming cars to [break nut shells](https://www.youtube.com/watch?v=NenEdSuL7QU) and eat them.
+ First, I think these analogies are flawed at their core if we want to discuss intelligence. While the airplane succeeds at the particular task of flying over long distances, it is devoid of intelligence and is dependent on the human flying it. In some regard, intelligence perhaps is _not just about solving a task_. Without a human operator, an airplane that is about to crash, or runs low on fuel, has absolutely no way to adapt to changes and formulate a plan to get out of its predicament by itself. Birds, on the other hand, can engage in simple problem solving concerning survival, reproduction and caring for their young, all while adapting to circumstantial events in their environment. One striking example often cited by [Prof. Song-Chun Zhu](https://arxiv.org/pdf/2004.09044.pdf) is of a crow that takes advantage of incoming cars to [break nut shells](https://www.youtube.com/watch?v=NenEdSuL7QU) and eat them.

- However, once we have grasped the dark matter of intelligence in a model, then I think scaling it up will lead to wonders that we can now only envision: supercomputers able to [manage cities](http://www.wadjeteyegames.com/games/technobabylon/), accelerate scientific discovery or take us to Mars.
+ However, once we have grasped the dark matter of intelligence in a model, then I think scaling it up will lead to wonders that we can now only envision: supercomputers able to [manage cities](http://www.wadjeteyegames.com/games/technobabylon/), accelerate scientific discovery, such as in the biological sciences, or take us to Mars.

## What is hardcoded in the brain?

8 changes: 3 additions & 5 deletions _posts/2023-10-31-sgd.md
@@ -7,11 +7,9 @@ tags: food-for-thought creative-work
categories: blog-post
disqus_comments: true
related_posts: true
- thumbnail: /assets/img/sgd4life.png
+ thumbnail: /assets/img/sgd4life.PNG.png
toc:
sidebar: left

- # related_publications: einstein1950meaning, einstein1905movement
---

# A tale of a scientist
@@ -40,7 +38,7 @@ The stochasticity of gradient descent is mainly derived from the order and the s
Given these precedents, I wanted to depict my desire to see in the future a kind of "structured" gradient descent (which in English happens to share its acronym with stochastic gradient descent):

<figure>
<img src="/assets/img/sgd4life.png" alt="Sorry, an unanticipated error occured and the image can't load." width="100%" height="auto">
<img src="/assets/img/sgd4life.PNG.png" alt="Sorry, an unanticipated error occured and the image can't load." width="100%" height="auto">
<figcaption id="sgd"> Not an actual depiction of structured gradient descent, but nonetheless a cool logo. </figcaption>
</figure>

Expand All @@ -54,6 +52,6 @@ The unanswered questions that follow would include:
- Most importantly, would more intricately labeled data compensate for having less of it?

<figure>
<img src="/assets/img/structuredgd.png" alt="Sorry, an unanticipated error occured and the image can't load." width="100%" height="auto">
<img src="/assets/img/structuredgd.PNG.png" alt="Sorry, an unanticipated error occured and the image can't load." width="100%" height="auto">
<figcaption id="sgd"> A (hopefully) cool logo of SGD</figcaption>
</figure>
47 changes: 47 additions & 0 deletions _posts/2023-11-3-biophysical.md
@@ -0,0 +1,47 @@
---
layout: distill
title: "Review of the paper Learning biophysical determinants of cell fate with deep neural networks"
date: 2023-11-02 19:42
description: comments on a paper that leverages deep learning to classify epithelium cell fate by observing its live image trajectory.
tags: research
categories: blog-post
disqus_comments: true
related_posts: true
authors:
- name: Xuelong An
toc:
- name: Brief summary
- name: My comments and future research directions
thumbnail: /assets/img/biophysical-mitosis.png
bibliography: deep-med.bib
---

# Brief summary

The paper by <d-cite key="dsa_2023_prediction"></d-cite> leverages deep learning architectures to solve a five-way classification task given either a cell's tabular features or its images. The five mutually exclusive classes are one healthy control and four subtypes of Parkinson's Disease: familial proteinopathy (SNCA), environmental proteinopathy ($$\alpha$$-Syn oligomer), and two subtypes characterized by different mitochondrial dysfunction pathways. These pathologies were chemically induced in stem cells. Fifty-six phenotypic features were automatically extracted and recorded as tabular data, along with images of the cells acquired via microscopy. Both data modalities were labeled with one of the five classes.

The research team separately trained a dense feedforward neural network (DNN) to classify the tabular data and a convolutional neural network (CNN) to classify the image data. The test classification accuracy reached around 83% for the DNN and around 95% for the CNN.

<figure>
<img src="/assets/img/parkinson.png" alt="Sorry. Image couldn't load." width="100%" height="auto">
<figcaption id="bottleneck">Two separate models are trained on different datasets on the same task of Parkinson subtype classification. Figure extracted from the original research article at https://www.nature.com/articles/s42256-023-00702-9</figcaption>
</figure>
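
For concreteness, here is a minimal sketch of what such a pair of baselines could look like. It assumes PyTorch, and the layer sizes and input resolution are my own guesses, not the paper's architecture:

```python
import torch.nn as nn

# Illustrative guesses only: neither the widths nor the input size come
# from the paper. 56 tabular features in, 5 classes out.
dnn = nn.Sequential(                      # tabular branch
    nn.Linear(56, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),                     # logits over {control, 4 PD subtypes}
)

cnn = nn.Sequential(                      # image branch, assuming 1x64x64 crops
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 5),           # logits over the same 5 classes
)
```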

# My comments and future research directions

Generally, in the deep learning literature, it is acknowledged that the use of DNNs comes at the expense of poor explainability. Despite achieving high classification accuracy, these models are black boxes. Nonetheless, there are ways to identify which features a neural network pays the most attention to when deciding on a classification label, mainly by looking at its last layer's activations and tracing back to which input features they are associated with. For the CNN, the technique employed by the research team is SHAP (SHapley Additive exPlanations).
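
For reference, attribution along these lines takes only a few lines with the `shap` package. This is a sketch reusing the hypothetical `cnn` from the snippet above, with `images` as a placeholder batch of preprocessed microscopy tensors:

```python
import shap

# Sketch, not the paper's code: `cnn` and `images` are placeholders for a
# trained classifier and a batch of preprocessed image tensors.
background = images[:100]                 # reference samples to integrate over
explainer = shap.DeepExplainer(cnn, background)
shap_values = explainer.shap_values(images[100:105])  # one attribution map per class
```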

In both the DNN and the CNN, the authors identified that the mitochondria, the lysosome and the interaction of the two contributed the most to the models' classification decisions.

One future research direction I am interested in is exploring whether integrating both data sources can improve performance and yield explainability, since the original work trains separate models on different datasets.

One source of inspiration is <d-cite key="li_2023_v1t"></d-cite>, where image data is integrated with a mouse's behavioral features to predict its neural responses collected from neural recordings. Another source of inspiration is drawn from concept-bottleneck models <d-cite key="koh_2020_concept"></d-cite>. There, a CNN in charge of processing images doesn't learn to output a classification label directly, but instead to output features that are relevant to the image. These features, in turn, are annotations of the image stored in tabular form:

<figure>
<img src="/assets/img/bottleneck.png" alt="Sorry. Image couldn't load." width="100%" height="auto">
<figcaption id="bottleneck">A depiction of the pipeline of a concept-bottleneck model. The first half outputs a set of concepts given an image, which can be learnt from intricate annotations, or metadata, of the image. The second half outputs a classification label. Figure extracted from the original paper </figcaption>
</figure>

Altogether, with regards to the work by <d-cite key="dsa_2023_prediction"></d-cite>, one interesting extension to their CNN is to have it not predict a Parkinson subtype, but rather learn to predict the cell's physiological features stored as tabular data given image input. Subsequently, those features would be used to train a multi-class regressor with a standard softmax to output a classification label. The prospect is that this hybrid model can leverage the high prediction accuracy of the CNN while being explainable thanks to the softmax regressor.
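
Here is a minimal sketch of that hybrid, again in PyTorch and with invented sizes, where the prediction is forced to pass through the 56 tabular features:

```python
import torch
import torch.nn as nn

# Sketch with invented sizes: the bottleneck forces the prediction through
# the 56 human-interpretable tabular features, which can be supervised directly.
class BottleneckClassifier(nn.Module):
    def __init__(self, image_encoder: nn.Module):
        super().__init__()
        self.encoder = image_encoder   # CNN: image -> 56 predicted features
        self.head = nn.Linear(56, 5)   # softmax head: features -> 5 classes

    def forward(self, x: torch.Tensor):
        concepts = self.encoder(x)     # trained against the tabular annotations
        logits = self.head(concepts)   # head.weight is directly inspectable
        return concepts, logits
```

Because the head is a single linear layer over named features, its per-class weights can be read off directly, which is where the hoped-for explainability would come from.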

As a further improvement, we could use a [Slot Transformer](https://arxiv.org/abs/2210.11394) instead of the CNN, in the hope of learning a disentangled representation of the image together with its annotations. However, that architecture would be more computationally expensive. A pretrained Slot Transformer that has already learnt to disentangle CLEVR scenes may be more powerful than one trained from scratch.
11 changes: 11 additions & 0 deletions _posts/bayesianism.md
@@ -0,0 +1,11 @@
# A brief history

In secondary school, one may be introduced to Bayes' theorem.
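
For reference, the standard statement of the theorem:

$$ p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} $$
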
# Obtaining the posterior distribution


# Sampling from a distribution
# Study tips

- What are the types of variables?
  - function, scalar, vector, matrix, or something else