--------------------------------------------------------------------
In the folder, /lunarc/nobackup/projects/snic2020-6-41/salma-files/ImageProject/
I have the following folders:
1) Courses (the material of the courses I have taken, e.g., FMAN45, DNA sequencing, ...)
2) Game (contains Sonja's annotations and the main and updated versions of the game)
In Sonja's annotation folder, I have some scripts for extracting labels from the sql files, removing the names, converting them to training databases, ...
3) selected_images_folder (contains the raw, 8-bit png, and 16-bit png files of the 89099 selected images (a very precious dataset))
4) master_notebooks_python_sripts (all the master project files and folders, including unsupervised analysis and notebooks)
5) objects (contains the objects from Jon's script for all channels in png form)
--------------------------------------------------------------------
June 22, 2020
I am working on the final version of the game.
1. I am going to add many examples to the practice and help parts.
2. I separated Sonja's annotated images and added them to the help pages of the game.
3. I will add a counter for the number of images that people have annotated (a small sketch of the idea follows below).
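A minimal sketch of such a counter, assuming the sqlite schema mentioned later in this logbook (an annotate_table with an id and a first_label column); the database path is hypothetical:

import sqlite3

# Hypothetical database path for one of the game's sqlite databases.
conn = sqlite3.connect("annotations.db")

# Count images that already carry a label (first_label filled in).
annotated = conn.execute(
    "SELECT COUNT(*) FROM annotate_table WHERE first_label IS NOT NULL"
).fetchone()[0]

# Total number of images offered by the game.
total = conn.execute("SELECT COUNT(*) FROM annotate_table").fetchone()[0]

print(f"{annotated} of {total} images annotated")
conn.close()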
--------------------------------------------------------------------
June 23-26 working on the game and individual study plan.
--------------------------------------------------------------------
June 26
1. Help part done.
2. Counter done.
3. Images and labels are bigger now and easier to annotate.
Game folders are now two folders called:
Web_game_26_June_CompleteHelp (contains 500 images for each channel)
Web_game_26_June_10000each (contains 10000 images for each channel)
4. Uploaded new versions for the master students.
--------------------------------------------------------------------
June 29
1. Completing the individual study plan (finished).
2. Working on the students' feedback about the game (Annie's feedback).
--------------------------------------------------------------------
June 30
1. Working on how to show explanatory tooltip text on the labels just by putting the mouse over them.
2. Solved the problem by adding a title attribute to the label checkboxes.
--------------------------------------------------------------------
July 1
I took a day off.
--------------------------------------------------------------------
July 2
NLP meeting
1. Finished the help part and the tooltip text over the labels.
--------------------------------------------------------------------
July 3
1. Finished the training part of the game based on Sonja's previous annotations.
2. The game is finished with sqlite databases.
--------------------------------------------------------------------
July 6
1. EUGLOH summer school started.
2. I received an email from the LUNARC people about the GPU problem, as follows:
1) you cannot use "interactive" in a batch job
2) there are 2 partitions (queues) with gpus
-p gpu
-p gpuk20
the first one is quite loaded so you must be prepared to wait.
if you use the lu partition (-p lu) you will never get a gpu.
3. Working on solving the GPU problem on LUNARC.
--------------------------------------------------------------------
July 7
Second day of the EUGLOH school.
Here are links to good lectures on:
1) Biomedical image processing
https://www.youtube.com/watch?v=SGKej5ZovVI
2) Patient iPSC-derived brain cells as a precision model for stratifying cellular phenotypes and
developing therapies
https://www.youtube.com/watch?v=hgBSPd8xxwY
I uploaded the new game for Sonja; I need accurate labels.
I corrected the commands for using LUNARC's GPUs and updated the students.
--------------------------------------------------------------------
July 8
Unsupervised Machine Learning for Gene Expression Analysis - Part 1 (Pedro Gabriel Dias Ferreira)
https://www.youtube.com/watch?v=MY88Jz4f8lU
Unsupervised Machine Learning for Gene Expression Analysis - Part 2 (Pedro Gabriel Dias Ferreira)
https://www.youtube.com/watch?v=B0109uFoT_I
Meeting with Sonja
Schedule until 1st of August
1) Game and documentation
2) Writing a manuscript about the game (only an outline of the paper)
3) Find a journal where to submit (Bioinformatics? ... dataset of images? ...)
4) Create an annotation agreement table (50 images)
5) Solve the GPU problem on LUNARC and Kebnekaise
6) Take the credits for the EUGLOH summer school
------------------------------------------------------------------------
July 9
Summer school lectures on Economy and epidemiological aspects of COVID-19
A good lecture from Anders Widell, a virologist from Lund University:
SARS-Cov-2 And COVID-19 (Joakim Esbjörnsson, Anders Widell)
https://www.youtube.com/watch?v=LUOInNx4q_Q
---------------------------------------------------------------------------
July 10
The summer school finished.
The test is done.
A good lecture on Molecular biology and
immunology of the SARS CoV-2 infection
link: https://www.youtube.com/watch?v=_wjLK4_csOs
---------------------------------------------------------------------------
July 13
Shared 89088 8-bit png images on snic2020-6-41 with students
Finished the GPU tutorial and shared it with the students (it does not work on LUNARC).
Started working with Kebnekaise (login through terminal, ThinLinc, ...):
through thinlinc: server: kebnekaise-tl.hpc2n.umu.se
through terminal : domain: ssh yourusername@abisko.hpc2n.umu.se
or ssh yourusername@kebnekaise.hpc2n.umu.se
Solved Mariam's problems with the game.
---------------------------------------------------------------------------
July 14
Trying to solve the GPU and torch.cuda problem on LUNARC and Kebnekaise (seems unsolvable :( ).
---------------------------------------------------------------------------
July 15, 16
I tried to run three different scripts on LUNARC and connect to the GPUs.
1) The first one was NER_by_Flair_NCBI.py
################################################
I tried the following job first. However, it is still pending after 48 hours:
#SBATCH -A lu2020-2-10
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH -n 1
#SBATCH --mail-user=sa5202ka-s@student.lu.se
#SBATCH --mail-type=END
#SBATCH -J Flair_model_on_NCBI_disease
#SBATCH -t 40:00:00
#SBATCH -o NCBI_disease.out
#SBATCH -e NCBI_disease.err
#SBATCH --mem-per-cpu=11000
python3 ../notebooks/python-scripts/NER_by_Flair_NCBI.py > NCBI_log.txt
****** The good news is that after it started, it used the GPUs and took only 00:59:22
to run, while it took 13 hours on CPU.
In this case the device was shown as
Device: cuda:0
It means that this if-statement finally became true:
if torch.cuda.is_available():
################################################
Then I tried the following:
#SBATCH -A lu2020-2-10
#SBATCH -p gpuk20
#SBATCH -n 1
#SBATCH --mail-user=sa5202ka-s@student.lu.se
#SBATCH --mail-type=END
#SBATCH -J Flair_model_on_NCBI_disease
#SBATCH -t 40:00:00
#SBATCH -o NCBI_disease.out
#SBATCH -e NCBI_disease.err
#SBATCH --mem-per-cpu=11000
python3 ../notebooks/python-scripts/NER_by_Flair_NCBI.py > NCBI_log.txt
#################################################
The run started but again it was in CPU mode for the following part of the code (it did not run on GPU!!):
if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print('gpu')
else:
    device = torch.device('cpu')
    print('cpu')
#################################################
I tried interactive mode by the following command in terminal:
interactive -A LU 2020-2-10 -p gpu --gres=gpu:2 -t 1:00:00
It is still pending after 48 hours.
and the following one also didn't work, although it started immediately for one hour.
interactive -A LU2020-2-10 -p gpuk20 -t 1:00:00
###############################################################################
The second and third scripts were the following, and I got different errors for each of them:
2) /snic2020-6-41/salma-files/NLPProject/Flair/jobs/gpu-test.py
It is for testing numba, which is a JIT compiler, but I was not successful.
In the Jupyter notebook snic2020-06-41/salma-files/NLPProject/Flair/Regex_cuda_test.ipynb I have some notes on numba and jit ...
3) /snic2020-6-41/salma-files/ImageProject/Courses/FMAN45/L14_files/torch_mnist_cuda.py
I could run this on Marcus's system.
The code has the following part to copy the data on gpu:
# Load network and send to GPU
c = ConvNet()
print(summary(c, torch.zeros((1,1,28,28))))
c.cuda()
***** I received the following error while I was trying to run it on CPU on LUNARC:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
And I am waiting for the GPU results on LUNARC.
While I was trying to run NER_by_Flair_NCBI.py on Marcus's system I got the following error:
ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found
(required by /mnt/fastdisk/BioNLP/anaconda3/lib/python3.7/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-37m-x86_64-linux-gnu.so)
By typing the following command I got the available GLIBCXX versions, which were:
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_DEBUG_MESSAGE_LENGTH
I should contact Marcus about updating the version since I don't have admin privileges on his system.
--------------------------------------------------------------------
July 17, July 20
A thorough beginner guide is at:
https://www.hpc2n.umu.se/documentation/guides/beginner-guide
Working with the Kebnekaise GPUs.
A sample Kebnekaise job script is:
###################################################################
#!/bin/bash
# Put in actual SNIC number
#SBATCH -A snic2020-9-99
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -J torch_mnist
#SBATCH --time=00:15:00
###SBATCH -p largemem
#For OpenFOAM version 6
#ml purge > /dev/null 2>&1 # Ignore warnings from purge
#ml icc/2018.1.163-GCC-6.4.0-2.28 impi/2018.1.163
#ml ifort/2018.1.163-GCC-6.4.0-2.28 impi/2018.1.163
#ml OpenFOAM/6
# to change the default platforms directory of OpenFOAM
#source /pfs/nobackup/home/m/morteza/etc/settings.sh
# run the program
#decomposePar -force >& log.decomposePar
#srun -n 32 pelletReactingFoam -parallel >& log.pelletReactingFoam
#reconstructPar -newTimes >& log.reconstructPar
python ../torch_mnist.py
####################################################################
--------------------------------------------------------------------
July 21
I am working on a tutorial for using Kebnekaise and Abisko:
logging in, saving files, submitting a job, and using GPUs.
It is stored as /snic2020-06-41/salma_files/Tutorials/kebnekaise_abisko_short_tutorial.ipynb
I also completed other tutorials and stored them in the same directory, /snic2020-06-41/salma_files/Tutorials/
--------------------------------------------------------------------
July 22
Submitting a job on CPU and also GPU works on Kebnekaise and Abisko now.
A copy of the tutorial was sent to Malou.
For running our image processing scripts on Kebnekaise we need the storage project,
since we have only 25GB of space on /pfs/nobackup/. However, we have access to the
large memory partition on Kebnekaise, which provides up to 3072000MB of memory for a job.
""If your job requires more than 126000MB / node on Kebnekaise, there is a limited number of nodes with 3072000MB memory, which you may be allowed to use (you apply for it as a separate resource when you make your project proposal in SUPR). They are accessed by selecting the largemem partition of the cluster. You do this by setting: -p largemem.""
--------------------------------------------------------------------
July 23, 24
I made a run on the GPU nodes of Kebnekaise. I had some errors yesterday for my scripts,
as follows:
AssertionError:
The NVIDIA driver on your system is too old (found version 10010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
I tried to change the version of PyTorch (torch and torchvision) and then I got another error:
/bin/bash: /hpc2n/eb/software/lmod/lmod/init/bash: Transport endpoint is not connected
python: error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
I emailed the problem to the support people and got this answer:
you should load the appropriate Python in your submit file using the commands below,
after the SBATCH commands and before actually using Python; you can find the
available versions of Python using "ml spider python/".
see the following for more information
https://www.hpc2n.umu.se/documentation/environment/lmod
ml purge 2>/dev/null >/dev/null
ml GCCcore/8.3.0
ml Python/3.7.4
-----------------------------------------------------------------------
July 27 (off day)
-----------------------------------------------------------------------
July 28
I downgraded the versions of torch and torchvision to be compatible with the NVIDIA driver version.
My previous versions were torch==1.5.1 and torchvision==0.6.0.
I had to reinstall them with lower versions (output of "pip freeze"):
torch==1.3.0
torchvision==0.4.0
I also had to load the "cuDNN" and "CUDA" modules, and for loading these modules I had to load their dependencies,
which I can find out with "ml spider module_name".
Finally, I could run my "torch_mnist_cuda.py" script as a job on Kebnekaise on a K80 node without error in the newEnv environment.
The list of modules I loaded was (command: "ml"):
Currently Loaded Modules:
1) systemdefault (S) 7) GCC/8.3.0 13) libffi/3.2.1
2) snicenvironment (S) 8) ncurses/6.1 14) bzip2/1.0.8
3) iccifort/2019.5.281 9) libreadline/8.0 15) SQLite/3.29.0
4) GCCcore/8.3.0 10) Tcl/8.6.9 16) Python/3.7.4
5) zlib/1.2.11 11) XZ/5.2.4 17) CUDA/10.1.243
6) binutils/2.32 12) GMP/6.1.2 18) cuDNN/7.6.4.38
For training the Flair model on Kebnekaise I submitted the job. However, I got this error:
PermissionError: [Errno 13] Permission denied: '/home/s/salmak/.flair/embeddings/pubmed-2015-fw-lm.pt'
The steps:
ml GCCcore/8.3.0
ml Python/3.7.4
source /pfs/nobackup/$HOME/NLPenv/bin/activate
By running it in the terminal I got the memory error that the quota was exceeded.
------------------------------------------------------------------------
July 29, 30
I annotated images again to compare with Sonja's, Jon's, and Mariam's annotations.
I assigned a number to each label, and when an image got the same label from several annotators, the numbers were summed up (a sketch of the idea follows below).
I assigned a column to each label and show the results in a table in
snic2020-6-41/salma-files/ImageProject/Game/Annotations/Annotation_comparison.ipynb
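A minimal sketch of that comparison logic, assuming hypothetical image names, label names, and per-annotator choices (the real table is built in the Annotation_comparison notebook above):

import pandas as pd

# Hypothetical annotations: per annotator, a mapping image -> set of chosen labels.
annotations = {
    "salma":  {"img1": {"label_a"}, "img2": {"label_b", "label_c"}},
    "sonja":  {"img1": {"label_a"}, "img2": {"label_b"}},
    "jon":    {"img1": {"label_d"}, "img2": {"label_b"}},
    "mariam": {"img1": {"label_a"}, "img2": {"label_c"}},
}
labels = ["label_a", "label_b", "label_c", "label_d"]
images = ["img1", "img2"]

# One column per label; each cell counts how many annotators chose that label for
# that image, so full agreement shows up as the total number of annotators.
table = pd.DataFrame(0, index=images, columns=labels)
for per_image in annotations.values():
    for img, chosen in per_image.items():
        for label in chosen:
            table.loc[img, label] += 1

print(table)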
------------------------------------------------------------------------
July 31
Iran shared images with us on LU Box.
All images are transferred to LUNARC and Kebnekaise for some analysis.
I searched a little bit for a target journal for the game manuscript.
I think the Bioinformatics journal is good.
https://academic.oup.com/bioinformatics/pages/instructions_for_authors
Application Notes (up to 2 pages; this is approx. 1,300 words or 1,000 words plus one figure): Applications Notes are short descriptions of novel software or new algorithm implementations, databases and network services (web servers, and interfaces). Software or data must be freely available to non-commercial users. Availability and Implementation must be clearly stated in the article. Authors must also ensure that the software is available for a full two years following publication. Web services must not require mandatory registration by the user. Additional supplementary data can be published online-only by the journal. This supplementary material should be referred to in the abstract of the Application Note. If describing software, the software should run under nearly all conditions on a wide range of machines. Web servers should not be browser specific. Application Notes must not describe trivial utilities, nor involve significant investment of time for the user to install. The name of the application should be included in the title.
--------------------------------------------------------------------------
Aug 03-07
Working on Iran's images.
1) Read them and convert them to numpy arrays categorized into two classes: Control and LPS.
2) Cut the images into (224, 224) tiles to feed into a dense network (a tiling sketch follows below).
3) The totals were 140 control images and 70 LPS images.
4) Now we have 2520 control tiles and 1820 LPS tiles. The sizes of the LPS images are different.
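A minimal sketch of the tiling step, assuming square non-overlapping crops and hypothetical file paths:

import numpy as np
from PIL import Image

TILE = 224  # tile side length expected by the network

def tile_image(path):
    """Cut one image into non-overlapping (224, 224) tiles, dropping edge remainders."""
    img = np.asarray(Image.open(path))
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            tiles.append(img[y:y + TILE, x:x + TILE])
    return np.stack(tiles)

# Hypothetical usage on one control image and one LPS image:
# control_tiles = tile_image("control/slide_01.png")
# lps_tiles = tile_image("lps/slide_01.png")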
-------------------------------------------------------------------------
Aug 10
Trained a DenseNet with the images and tested on new data.
I can find the DenseNet paper from the following link:
All the files are stored in the snic2020-6-41/salma-files/ImageProject/Collabrations directory.
There are still some issues with the result.
-------------------------------------------------------------------------
Aug 11
Continued the analysis of Iran's images.
Completed the Annotation_comparison notebook and uploaded it to AitsLab/Microscopy_image_analysis_folder.
Uploaded the Tutorials directory to AitsLab/Infrastructure.
------------------------------------------------------------------------
Aug 12-13
Debugging the analysis.
Trained the network over and over again;
still bad test results.
------------------------------------------------------------------------
Sep 06
I trained a VGG16 model + two different sets of layers on top of that. All the analysis is in the snic2020-6-41/salma-files/ImageProject/Collabrations/ directory.
There are four Jupyter notebooks that are summarized in one for sharing with the Darcy group and will be presented to them on Sep 09.
The main code is in python-script/3_Sep_VGG_data_4 and python-script/3_Sep_VGG_batch_normalization_data_4, which hold the final results on the data_4 set.
There, 80 percent of the original images are separated for the training dataset and 10 and 10 percent for validation and test. Then those images are cropped into smaller (224, 224) tiles.
I am trying to use Grad-CAM to take the gradients of the output loss with respect to the last conv layer, to see which part of the image the network bases its decision on (a sketch follows at the end of this entry).
However, due to a TensorFlow version problem I got multiple errors.
I had to create a new conda environment as follows:
conda create -n tfgpu tensorflow python=3.6.8
conda install tensorflow-gpu==1.13.1
to test my code again. I am working on it to solve the errors.
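A minimal Grad-CAM sketch written for TF2/Keras (an assumption on my side; the TF 1.13 environment above would need adaptation); the layer name is the standard VGG16 one and is hypothetical for my model:

import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Heatmap of where the network looks: gradients of the class score
    with respect to the last convolutional feature maps (Grad-CAM)."""
    # Model that maps the input to (last conv feature maps, predictions).
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(last_conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    # Gradient of the class score w.r.t. each feature map, averaged spatially.
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature maps, then ReLU and normalization.
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam /= (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()

# Hypothetical usage with a VGG16-based classifier:
# heatmap = grad_cam(model, tile, last_conv_layer_name="block5_conv3")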
------------------------------------------------------------------------
Sep 14
I am working on the LPS/Ctrl images. I am trying to work with the original images: extract some images,
separate them with an 80/10/10 ratio, add blurred images with kernel sizes 3 and 5 to the main data, and change the brightness of the images randomly (an augmentation sketch follows below).
Then train the new network (VGG16) on them and check the result.
We have the new storage project on Kebnekaise.
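A minimal sketch of those two augmentations, assuming OpenCV and 8-bit images; the brightness range is a hypothetical choice:

import cv2
import numpy as np

def blur(img, k):
    """Gaussian blur with a k x k kernel (k = 3 or 5 as noted above)."""
    return cv2.GaussianBlur(img, (k, k), 0)

def random_brightness(img, max_shift=30):
    """Shift pixel intensities by a random amount; max_shift is a hypothetical value."""
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)

# Hypothetical usage on one tile (a uint8 numpy array):
# augmented = [blur(tile, 3), blur(tile, 5), random_brightness(tile)]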
-----------------------------------------------------------------------
Sep 16-17
I used PyTorch for training the VGG16 classifiers on Darcy's project and ran them on Marcus's system.
The results were not good enough. Working on that...
----------------------------------------------------------------------
Sep 18-30
vacation
----------------------------------------------------------------------
October 1- Nov 09
parental leave
----------------------------------------------------------------------
Nov 07
On Nov 07 I had a problem with the LUNARC system.
My Pocket Pass token expired on October 26th.
I had to follow the instructions at
https://lunarc-documentation.readthedocs.io/en/latest/authenticator_howto/#checking-the-validity-of-your-token
to register and activate a new one.
I started listening to the Stanford NLP course lectures.
First lecture on YouTube:
https://www.youtube.com/watch?v=8rXD5-xhemo
----------------------------------------------------------------------
Nov 10
Stanford NLP course
----------------------------------------------------------------------
Nov 13
The first lecture is summarized in /snic2020-06-41/salma-files/NLPProject/CS224N/lecture1/cs224n.ipynb
The theoretical optimization problem + the gensim package.
GloVe embeddings + word2vec.
A small cell death lecture is summarized in /snic2020-06-41/salma-files/Biology/Cell_death/ directory as apoptosis.txt
A lecture on NLP (analysing the text of stand-up comedians' transcripts) with
full steps is summarized in /snic2020-06-41/salma-files/
----------------------------------------------------------------------
Dec 1
Came back from parental leave.
work 25% (mornings)
A short talk with Sonja: what to do next?
what is going on:
1) Augustin and Ludwig are working on classification of histology screens
2) Peter_Alexander are working on BioBERT NLP relation extraction
what should I do:
Game:
Compare results (new results)
Share with Rafsan
Transfer everything to git (change notebooks to .py files)
write "make-files" for scripts
control version with git
.json hyperparameter for each model
write readme.txt for each directory
send email to carl for system biology course
deep learning_ journal club in 2021
a facebook page hubAI
deep learning course (get material from Sonja)
Read Augustin and ludwiq's notebook in onenote
----------------------------------------------
Dec 02
Started working at 18:30 (for around 2 hours).
Updated the annotation databases to skip the first 100 images.
I used "update annotate_table set first_label ='skip,salma' where id in (select id from annotate_table limit 100 );" command.
Copied the new game for Sonja and Rafsan
---------------------------------------------
Dec 03
start 9:30
Tried to fix the game for Rafsan; there is still an error and it is about the conda environment.
Histology meeting (the guys trained a 3-class classifier).
They shared the Grad-CAM code and the package versions:
Tensorflow version 2.3.0
keras version 2.4.3
---------------------------------------------
Dec 08
Fixed the Grad-CAM (results are not good).
Fixed Rafsan's game.
---------------------------------------------
Dec 14
start 10:30
Checked Rafsan's and Sonja's annotations (not done yet).
---------------------------------------------
Dec 21
Things to do:
1) Binary game
2) Check scores of Iran's images
3) Grad-CAM completion and upload
4) Game draft and binary draft
5) Ask for Malou's code for new cutouts
----------------------------------------------
Dec 22-Jan 21
Working on Iran's dataset.
Trained VGG16 and ResNet50 regression models for the average scores.
Trained VGG16 and ResNet50 regression models for the individual scores.
Working on Kebnekaise.
---------------------------------------------
Jan 22
Received Iran's new dataset for the ECMO, MV, and ECMO+LPS treatments.
Want to test the previous models on these images.
Meeting with Darcy and Iran on the 22nd of Jan.
The image processing course from Michigan University is still going on!
Paper from the Thomas group for the weekend.
--------------------------------------------
5th of Feb
Annual meeting with Sonja
What we discussed:
Agile, virtual board: doing, to be done, three of us!
Make contacts with industry
Build a network, journal club
After a seminar, discuss what we have learned
every two weeks,
one in a month,
Technical groups
PhD course
put you in contact with others
practice on papers
Let's do PyTorch
Grad-CAM
docker
NLP course
May (NLP course)
Teaching course...
Spark
8 papers (small papers)
1. good routines work together
2. publish papers
3. histology paper
4. Mariam projects
5. This year Sonja writes
6. read lots of papers
7. grant writing
8. Co-supervisor
9. This paper
10. Malou's annotations
11. Swedish NLP (NER for Swedish symptoms)
12. Swedish spaCy and BERT (Flair)
-----------------------------------------------
Feb 9, 10
Working on training with new parameters.
Changed the learning rate: does not work.
Added BN (batch normalization): works, but not well.
Added CLR (see the scheduler sketch below): /proj/nobackup/aits_storage/salma-files/NLPenv/bin/python -m pip install CLR
or /proj/nobackup/aits_storage/salma-files/NLPenv/bin/python -m pip install --upgrade pip first
I have to spell out the PATH since the environment is originally installed in /pfs/nobackup/home/s/salmak/NLPenv/lib/python3.7/site-packages
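Assuming CLR here refers to cyclical learning rates, a minimal sketch using PyTorch's built-in scheduler (rather than the pip CLR package above); the model, optimizer, and rate bounds are hypothetical:

import torch

# Hypothetical model and optimizer; CyclicLR expects a momentum-based optimizer
# when cycle_momentum is left at its default (True).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# The learning rate cycles between base_lr and max_lr every 2 * step_size_up batches.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-5, max_lr=1e-3, step_size_up=200, mode="triangular"
)

for batch in range(1000):    # stand-in for the real training loop
    optimizer.step()         # would follow loss.backward() in real code
    scheduler.step()         # advance the cyclical schedule each batch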
-----------------------------------------------
Feb 12
Read the neurodegenerative thesis and commented on the DE analysis part.
----------------------------------------------
Feb 13, 14
Reviewed the first lecture of the CS224N course (math and script part).
----------------------------------------------
Feb 15
Working on the PyTorch version of the regression model.
Reviewed the first lecture of the deep learning in computer vision course.
Learning PyTorch!!! (started from this link: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
The steps are summarized in the salma-files/Courses/Pytorch/Tutorial.ipynb notebook.
-----------------------------------------------
Feb 15-Feb 28
Finalizing the histology project's results.
----------------------------------------------
March first week: knowledge in collaboration course
----------------------------------------------
March Second Week: Statistics I course
----------------------------------------------
March third week: Off days
----------------------------------------------
March fourth week: Qualitative research course
----------------------------------------------
April first week: Working on graphical visualization of scores on the images for histology project
----------------------------------------------
April second week: Off days
---------------------------------------------
April third and fourth week : Statistics II course
---------------------------------------------
April 25-May 3
Trained EfficientNetB0 on the total score (not better than VGG16).
*************Future plan: train EfficientNetB4 and B7.
--------------------------------------------
May 4-May 10 Research ethics Course
-------------------------------------------
May 6 journal club:
SPICE paper
************Future plan: run it over 890000 images
-------------------------------------------
May 12
Meeting with Johanna
*************Write a script for parsing the pdf journals
in R or Python.
Visualize the relation graph in R and Cytoscape:
install.packages("BiocManager")
library(BiocManager)
BiocManager::install(version = "3.12")
BiocManager::install("paxtoolsr")
library(paxtoolsr)
BiocManager::install("rJava")
library(rJava)
help.search("paxtoolsr")
install.packages("igraph")
library(igraph)
results <- readSif("tab_example.sif")
g <- loadSifInIgraph(results)
g
plot(g)
-----------------------------------------------
May 13
Johanna added us to the github repository
**************Add the Flair model to the NLP pipeline
----------------------------------------------
May 14-23
Work 50%
Done this week:
*Debugged Mariam's Grad-CAM code
*Wrote the script for parsing the pdf file in R (only the first phase, which extracts the patient info + aktuellt + huvud... info)
*Trained B4 and B7 on the three-group dataset and also the five-group dataset (for B7 I changed the batch size to 8)
Waiting for results:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13215312 single Regressi salmak R 6:44 1 b-cn0123
13215298 single Regressi salmak R 8:24 1 b-cn0123
13215230 single Regressi salmak R 12:18 1 b-cn0343
13215206 single Regressi salmak R 13:39 1 b-cn0343
13215202 single Regressi salmak R 17:17 1 b-cn0847
Going to do:
Start Coursera NLP course
----------------------------------------------
May 24-29
plan:
Finish histology runs and complete the manuscript
Finish the NLP course
----------------------------------------------
June 1-25
working 50 %
Working on histology project
Found a bug in the 5-fold dataset:
slide 28 exists in both MV and Control and was copied by mistake both to the fold where MV was validation and to the fold where Control was validation (a leakage-check sketch follows below).
Corrected that.
Filling in the information in the manuscript.
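A minimal sketch of the kind of check that catches this, assuming hypothetical fold lists of (slide_id, class) pairs:

from collections import defaultdict

# Hypothetical fold assignments: fold index -> list of (slide_id, class) pairs.
folds = {
    0: [("slide_27", "Control"), ("slide_28", "MV")],
    1: [("slide_28", "Control"), ("slide_30", "MV")],   # slide 28 leaks into fold 1 too
}

seen = defaultdict(set)  # slide_id -> set of folds it appears in
for fold_id, slides in folds.items():
    for slide_id, _ in slides:
        seen[slide_id].add(fold_id)

for slide_id, fold_ids in seen.items():
    if len(fold_ids) > 1:
        print(f"{slide_id} appears in folds {sorted(fold_ids)} -- possible leakage")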
---------------------------------------------
June 29
*******Making a new environment:
pip install opencv-python-headless
--------------------------------------------
Parental leave in July (almost 100%).
-------------------------------------------
9th of July Group meeting
Results:
Salma:
1. Histology
2. Biobert project
Goals:
Train the large version on the HUNER corpora
Save the model in .pb format for further predictions
Use clusters (Alvis)
Make it compatible with TF 2
Finish the manuscript
Sonja:
Mariam’s project
Genes, symbols, identifiers:
to resolve them to a single one.
Conversion tool: UniProt has one.
Gene IDs could be chosen, but UniProt has a manually reviewed part and
another part that is automatic (the unreviewed part).
Some don’t match (still updating)
Sonja: update Mariam's code (pandas instead of for loops)
Theresa is doing master project in August
--------------------------------------------
Aug 03
I realized I was not running my code on GPUs. There was this error:
"Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.1.."
I did: ml CUDAcore/11.3.1
And it is solved. I also loaded (output of "ml"):
2) cuDNN/8.2.1.32-CUDA-11.3.1 3) CUDA/10.1.243 4) CUDAcore/11.3.1
Now the code is running on GPU :)
------------------------------------------
Nov 01 2021
Started working after two months of full leave.
1. This week I will work on the histology manuscript.
2. I will also try to run the BioBERT PyTorch models on the gold standard
(only for gene and protein)
3. Add my binary part to Iran's manuscript
-----------------------------------------
Nov 02, 03, and 04: VAB (Noura was at home)
I added all BioBERT results to the NER_results Excel sheet.
Checking the data (HunFlair data and tokenization).
I have one dataset from the Adam_Ola GitHub page, which is
https://github.com/Aitslab/BioNLP/tree/master/Adam_Ola/ner_inputs/HunFlair_NER_gene/gene_all_combined/train_dev.tsv
And one from Marcus Klang as HUNER_DATASET.zip on
ner_inputs directory
----------------------------------------
Nov 08
-All the dates of the folders were checked with the stat command.
-All the reported results were done on the Adam_Ola dataset; added all to the Excel sheet.
**Starting to add Flair results.
From now on, all finished tasks will be shown with -
From now on, all ongoing tasks will be shown with **
-Updated logbook added to the /salmaviolet/Microscopy_image_analysis_folder/ GitHub repo.
Following steps were done:
-git clone https://github.com/salmaviolet/Microscopy_image_analysis_folder.git
-git add logbook.txt
-git commit -m 'Update Nov 08'
then in Settings -> Developer settings -> generate token -> copy the token
-git push -u origin master or -git push
username: salmaviolet
paste: paste token
Tip: for moving the cursor to the end of the file in the vim editor:
ESC
then
Shift + G
---------------------------------------
Nov 16
*Flair embeddings are not working
*The link is now
https://nlp.informatik.hu-berlin.de/resources/embeddings/flair/
-I had to download the embedding files as *.pt and pass the path to the FlairEmbeddings('.pt path') function (see the one-liner below).
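A minimal sketch of that call, assuming the pubmed forward language model mentioned earlier in this logbook has been downloaded locally (the path is hypothetical):

from flair.embeddings import FlairEmbeddings

# Point Flair at the locally downloaded language-model file instead of the old URL.
forward_lm = FlairEmbeddings("/path/to/pubmed-2015-fw-lm.pt")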
---------------------------------------
Nov 17 -- Parental leave
---------------------------------------
Nov 18
- Requested a Berzelius account and signed the agreement.
- username: x_salka
For using GPUs in scripts,
add e.g. --gpus 4
and also run this
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
*****This did not work
It is also possible to use gpu in front end interactive mode
interactive --gpus=1
*** This also worked
*****This did work:
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
-------------------------------------
Nov 19
work 50%
Noura was sick
Finished the Flair evaluation on the gold standard:
train the model,
save the checkpoints,
load the checkpoints and resume the training (a generic checkpoint sketch follows below),
add all the results to the Excel sheet.
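A minimal generic PyTorch-style save/resume sketch (not Flair's own trainer checkpoint API, which is what I actually used); the names are hypothetical:

import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoint.pt"):
    """Save enough state to resume training later."""
    torch.save(
        {"epoch": epoch,
         "model_state": model.state_dict(),
         "optimizer_state": optimizer.state_dict()},
        path,
    )

def resume(model, optimizer, path="checkpoint.pt"):
    """Load the saved state and return the epoch to continue from."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1

# Hypothetical usage:
# save_checkpoint(tagger, opt, epoch=5)
# start_epoch = resume(tagger, opt)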
-------------------------------------
Nov 22- Noura was sick
-------------------------------------
Nov 23- come back to office
Read Iran's manuscript.
Answered all comments.
-------------------------------------
Nov 24-25
I was sick
-------------------------------------
Nov 26
Working on manuscript
-------------------------------------
Nov 29- Dec 03
Oral communication course
-------------------------------------
Dec 06
A little work on the new Kaggle competition data.
Being in contact with Malou.
She trained a U-Net on the data for 15 epochs.
The results show an F1-score of 0.16 (the best score is around 0.339 now).
Writing the manuscript.
-------------------------------------
Dec 07
-Still on Kaggle data
* I will write the binary part of the histology manuscript today
--------------------------------------
Dec 08
For creating Malou's environment,
first I need to:
conda update --all
conda env export --no-builds > env.yml
conda env create -f env.yml
- Also need to change the name of the env in env.yml.
-------------------------------------
Dec 09
Got the GPU to work;
still some errors.
tensorflow = '1.14.0'
keras = 2.2.4
cudatoolkit = 10.0.130
cudnn = first 7.3, then 7.6.5 (got the GPU to work but the model is still not running)
---------------------------------------
Dec 10
GPU works for Malou's code on Berzelius.
Still some bugs in the evaluation (have to be fixed).
Meeting with Sonja and Rafsan.
---------------------------------------
Dec 13
Working on a new model from https://bitbucket.org/t_scherr/cell-segmentation-and-tracking/src/master/
It did not work.
Looking at a simple algorithm called watershed from the cv2 package (a sketch follows at the end of this entry),
or I will run the U-Net again on the predicted masks tomorrow.
-Working on the GitHub repo Rafsan shared with me.
-Adding Flair to the pipeline.
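A minimal sketch of the classic OpenCV watershed recipe for splitting touching objects, assuming an 8-bit grayscale input; the file name is hypothetical:

import cv2
import numpy as np

def watershed_split(gray):
    """Split touching objects in a grayscale image via markers + cv2.watershed."""
    # Binary mask of the foreground (Otsu threshold), cleaned with a small opening.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2)

    # Sure background (dilated mask) and sure foreground (peaks of the distance transform).
    sure_bg = cv2.dilate(opened, kernel, iterations=3)
    dist = cv2.distanceTransform(opened, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)

    # Label the sure-foreground regions; the unknown band stays 0 for watershed to decide.
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[unknown == 255] = 0

    # Watershed works on a 3-channel image; boundary pixels come back as -1.
    markers = cv2.watershed(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR), markers)
    return markers

# Hypothetical usage on a predicted mask or raw tile loaded as grayscale:
# markers = watershed_split(cv2.imread("tile.png", cv2.IMREAD_GRAYSCALE))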
----------------------------------------
Dec 14
Put aside the segmentation project for now.
I did not get Malou's result anyway.*************************************
Meeting with Iran
-----------------------------------------
Dec 15-Jan 14
Working on the binary paper (histology).
Reran everything.
Added all files to OneDrive/manuscripts/histo1.
Working on the manuscript.
-----------------------------------------
Jan 16
Started adding Flair to the pipeline.
-----------------------------------------
Jan 17-27
Correcting Iran's plot
*** Adding three-class classifier instead of binary to the first manuscript
Adding plots to the figure file
Adding Flair to the pipeline
Presented the journal club paper
-----------------------------------------
Jan 29
Finished the 3-class classifier runs.
Finished adding the Flair model to the pipeline.
-----------------------------------------
Feb 03
Running the 3-class classifier on Alvis.
It did not work on my own laptop;
no other clusters available, only Alvis.
******************************************************
I realized I was not running my code on GPUs. There was this error:
"Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.1.."
I did: ml CUDAcore/11.3.1
And it is solved. I also loaded (output of "ml"):
2) cuDNN/8.2.1.32-CUDA-11.3.1 3) CUDA/10.1.243 4) CUDAcore/11.3.1
Now the code is running on GPU :)
****************************************************
-----------------------------------------
Feb 04, 2022
193855 alvis Ctrl_lps salmak R 0:03 1 alvis1-16
193854 alvis Ctrl_lps salmak R 1:29 1 alvis1-15
193853 alvis Ctrl_lps salmak R 2:51 1 alvis1-14
193852 alvis Ctrl_lps salmak R 5:33 1 alvis1-13
Four runs on Alvis for the 3-class classifier:
EfficientNetB4 and VGG16 for 3-fold and 5-fold cross-validation.
------------------------------------------
March 17
I was finalizing the histology papers' results during the past few weeks:
Binary classification of MV+LPS and control slides.
Three-class classification of all slides.
And regression models.
The notebooks and Excel results of the total score are now added to OneDrive.
Individual scores and visualization are not finalized (I will do that after I come back on the 14th).
During the next weeks I will write the histology papers (first Iran's paper and second my own).
Today I checked Sonja's new images on Swestore; only the names, and listed them.
Berzelius now has two-step authentication with a TOTP app on the phone.
Something I should do is the LUBI seminar (20th of April).
Ellite focus group stuff.
The Flair model and all embeddings that I downloaded from the ftp server are on Berzelius, shared with Rafsan.
I did a small unsupervised clustering on the histology images. I took the features from the classifiers and regression models and fed them into unsupervised clustering algorithms. The results show separate clusters in some cases. Results are on OneDrive. I want to train a 5-class classifier and do unsupervised clustering on those features for the Ellite focus group (a small clustering sketch follows below).
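A minimal sketch of that feature-clustering step, assuming the features have already been extracted into an array; the feature dimensions, algorithm choice, and cluster count are hypothetical:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in for the real features taken from a trained classifier or regression model:
# one row per image, columns are penultimate-layer activations.
features = np.random.rand(200, 512)

# Standardize, then cluster into a hypothetical number of groups.
scaled = StandardScaler().fit_transform(features)
cluster_ids = KMeans(n_clusters=5, random_state=0).fit_predict(scaled)

print(np.bincount(cluster_ids))  # how many images fall into each cluster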
***After one month
1) you should update your Mahara,