Dockerise #16

Merged
merged 4 commits into from
Apr 12, 2024
11 changes: 11 additions & 0 deletions .gitignore
@@ -31,3 +31,14 @@ replay_pid*

.idea
/dependency-reduced-pom.xml

# python
__pycache__/
*.pyc
.venv/

# test html reports
*.html~

# app properties that contains credentials
/application.properties
45 changes: 45 additions & 0 deletions Dockerfile
@@ -0,0 +1,45 @@
FROM maven:3-eclipse-temurin-21 as jar_builder

# Set the working directory in the Maven image
WORKDIR /app

# Copy the java source files and the pom.xml file into the image
COPY src ./src
COPY pom.xml .

# Build the application
RUN mvn clean package -DskipTests

FROM maven:3-eclipse-temurin-21

# download system dependencies first to take advantage of docker caching
RUN apt-get update; apt-get install -y --no-install-recommends \
build-essential \
default-mysql-client \
default-libmysqlclient-dev \
python3 \
python3-setuptools \
python3-dev \
python3-pip \
unzip \
perl \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install wheel

# Install any needed packages specified in requirements.txt
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

RUN ln -s $(which python3) /usr/local/bin/python || true

COPY --from=jar_builder /app/core-*.jar /
COPY scripts/ scripts/
RUN chmod -R a+x /scripts/

# Set the working directory in the container
WORKDIR /scripts/

ENV PORTAL_HOME=/

# This file is empty. It has to be overridden by bind mounting the actual application.properties
RUN touch /application.properties
87 changes: 87 additions & 0 deletions README.md
@@ -44,3 +44,90 @@ Use Maven to run the integration tests. Ensure you are in the root directory of
```
mvn integration-test
```

## Development

### Prerequisites
To contribute to `cbioportal-core`, ensure you have the following tools installed:

- Python 3: Required for study validation and orchestration scripts. These scripts utilize the underlying loader jar.
- Perl: Required by the data loading scripts that interface with lookup tables. The required version depends on script compatibility and is not specified here.
- JDK 21: Essential for developing the data loader component.
- Maven 3.8.3: Used to compile and test the loader jar. Review this [issue](https://github.com/cBioPortal/cbioportal-core/issues/15) before starting.

### Setup

1. Create a Python virtual environment (first-time setup):
```bash
python -m venv .venv
```

2. Activate the virtual environment:
```bash
source .venv/bin/activate
```

3. Install required Python dependencies (first-time setup or when dependencies have changed):
```bash
pip install -r requirements.txt
```

### Building and Testing

After you are done with the setup, you can build and test the project.

1. Execute tests through the provided script:
```bash
source test_scripts.sh
```

2. Build the loader jar using Maven (includes testing):
```bash
mvn clean package
```
*Note:* The Maven configuration is set to place the jar in the project's root directory to ensure consistent paths in both development and production.

### Configuring Application Properties

The loader needs certain properties to be set in order to connect to the database. Define these properties in the `application.properties` file within your project.

#### Creating the Properties File

1. Create your `application.properties` file by copying the example provided in the project:
```bash
cp application.properties.example application.properties
```

2. Open application.properties in your preferred text editor and modify the properties to match your database configuration and other environment-specific settings.
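
Before running the loader, it can help to confirm that the edited file still defines the connection settings. The sketch below is a minimal check, not part of the project: the function names `read_properties` and `missing_keys` are hypothetical, the key list is copied from `application.properties.example`, and treating every one of those keys as required is an assumption.

```python
from pathlib import Path

# Keys copied from application.properties.example. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_KEYS = (
    "spring.datasource.url",
    "spring.datasource.username",
    "spring.datasource.password",
    "spring.datasource.driver-class-name",
    "spring.jpa.database-platform",
)


def read_properties(path):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values such as JDBC URLs that
        # contain '=' themselves are kept whole.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent or empty, in declaration order."""
    return [key for key in expected if not props.get(key)]
```

Calling `missing_keys(read_properties("application.properties"))` returns an empty list when every expected key has a value. This parser only handles plain `key=value` lines; it does not implement the full Java properties syntax (line continuations, `:` separators, escapes).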

#### Setting the PORTAL_HOME Environment Variable

The PORTAL_HOME environment variable should be set to the directory containing your application.properties file, typically the root of your project:
```bash
export PORTAL_HOME=$(pwd)
```
Ensure this command is run in the root directory of your project, where the application.properties file is located. This setup is crucial for the loader to correctly access the required properties.
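
A small pre-flight check can catch a missing or misdirected `PORTAL_HOME` before a long import starts. This is only a sketch under stated assumptions: `resolve_properties_file` is a hypothetical helper, and how the loader itself reads `PORTAL_HOME` is not shown in this README, so the check simply mirrors the layout described above (the variable points at the directory that holds `application.properties`).

```python
import os
from pathlib import Path


def resolve_properties_file(environ=os.environ):
    """Return the path to application.properties inside PORTAL_HOME.

    Raises RuntimeError if PORTAL_HOME is unset or empty, and
    FileNotFoundError if the directory has no application.properties file.
    """
    portal_home = environ.get("PORTAL_HOME")
    if not portal_home:
        raise RuntimeError("PORTAL_HOME is not set")
    candidate = Path(portal_home) / "application.properties"
    if not candidate.is_file():
        raise FileNotFoundError(
            f"No application.properties found in {portal_home}")
    return candidate
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.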

#### maven.properties
TODO: Document role of `maven.properties` file.

### Script Execution with Loader Jar

Scripts that require the loader jar search for exactly one `core-*.jar` in the project root, so make sure the jar has been built and is present there before running them:
```bash
python scripts/importer/metaImport.py -s tests/test_data/study_es_0 -p tests/test_data/api_json_unit_tests -o
```
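
The jar lookup described above can be sketched as follows. This is an illustration rather than the project's code: `find_loader_jar` is a hypothetical name, and the root directory is passed in explicitly instead of being derived from the script location.

```python
from pathlib import Path


def find_loader_jar(root_dir):
    """Return the single core-*.jar in root_dir.

    Raises FileNotFoundError when zero or more than one jar matches,
    since the scripts expect exactly one loader jar.
    """
    jars = sorted(Path(root_dir).glob("core-*.jar"))
    if len(jars) != 1:
        raise FileNotFoundError(
            f"Expected to find 1 core-*.jar in {root_dir}, "
            f"but found {len(jars)}")
    return jars[0]
```

Requiring exactly one match means a stale jar left over from an earlier build is reported as an error instead of being picked up silently.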

## Running in Docker

Build the Docker image with:
```bash
docker build -t cbioportal-core .
```

To start loading a study, bind-mount your data directory and your `application.properties` file into the container:
```bash
docker run -it -v $(pwd)/data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core python importer/metaImport.py -s /data/study_es_0 -p /data/api_json -o
```
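
If the import is launched from a script rather than typed by hand, the same command can be assembled as an argument list. The sketch below only builds the list and makes no call to Docker; `docker_import_command` and its parameters are hypothetical, and the container paths (`/data/`, `/application.properties`, `/data/api_json`) are taken from the example above.

```python
import shlex
from pathlib import Path


def docker_import_command(data_dir, properties_file, study,
                          image="cbioportal-core"):
    """Build the `docker run` argument list for importing one study.

    Bind mounts need absolute host paths, so both paths are resolved first.
    """
    data_dir = Path(data_dir).resolve()
    properties_file = Path(properties_file).resolve()
    return [
        "docker", "run", "-it",
        "-v", f"{data_dir}:/data/",
        "-v", f"{properties_file}:/application.properties",
        image,
        "python", "importer/metaImport.py",
        "-s", f"/data/{study}",
        "-p", "/data/api_json",
        "-o",
    ]


if __name__ == "__main__":
    command = docker_import_command("data", "application.properties",
                                    "study_es_0")
    # Print a shell-quoted version for inspection; pass `command` to
    # subprocess.run to actually start the container.
    print(shlex.join(command))
```

Keeping the command as a list avoids shell quoting problems when a host path contains spaces.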

5 changes: 5 additions & 0 deletions application.properties.example
@@ -0,0 +1,5 @@
spring.datasource.url=jdbc:mysql://localhost:3306/cbioportal?useSSL=false
spring.datasource.username=cbio
spring.datasource.password=P@ssword1
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.database-platform=org.hibernate.dialect.MySQL5InnoDBDialect
11 changes: 11 additions & 0 deletions pom.xml
@@ -236,7 +236,11 @@
<resource>
<directory>${project.basedir}</directory>
<includes>
<!-- FIXME Remove these includes when running loader from the cbioportal docker image is deprecated. Loading should be done from cbioportal-core docker image directly instead. -->
<!-- Includes python/perl/shell scripts with the jar to transfer to cbioportal project -->
<!-- See here https://github.com/cBioPortal/cbioportal/blob/14e00a6c5580700b96152c45f34baf150e652c89/docker/web-and-data/Dockerfile#L48 -->
<include>requirements.txt</include>
<include>scripts/</include>
</includes>
</resource>
</resources>
@@ -323,6 +327,13 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<outputDirectory>${project.basedir}</outputDirectory>
</configuration>
</plugin>
</plugins>
</build>

File renamed without changes.
14 changes: 7 additions & 7 deletions src/main/resources/scripts/env.pl → scripts/env.pl
@@ -39,18 +39,18 @@
 }

 # Set up Classpath to use the scripts jar
-sub locate_src_root {
+sub locate_root {
     # isolate the directory this code file is in
     my ($volume, $script_dir, undef) = File::Spec->splitpath(__FILE__);
-    # go up from cbioportal/core/src/main/scripts/ to cbioportal/
-    my $src_root_dir = File::Spec->catdir($script_dir, (File::Spec->updir()) x 4);
+    # go up one level
+    my $root_dir = File::Spec->catdir($script_dir, File::Spec->updir());
     # reassemble the path and resolve updirs (/../)
-    return abs_path(File::Spec->catpath($volume, $src_root_dir));
+    return abs_path(File::Spec->catpath($volume, $root_dir));
 }
-$src_root = locate_src_root();
-@jar_files = glob("$src_root/scripts/target/scripts-*.jar");
+$root_dir = locate_root();
+@jar_files = glob("$root_dir/core-*.jar");
 if (scalar @jar_files != 1) {
-    die "Expected to find 1 scripts-*.jar, but found: " . scalar @jar_files;
+    die "Expected to find 1 core-*.jar, but found: " . scalar @jar_files;
 }
 $cp = pop @jar_files;

File renamed without changes.
14 changes: 7 additions & 7 deletions src/main/resources/scripts/envSimple.pl → scripts/envSimple.pl
@@ -34,18 +34,18 @@
 }

 # Set up Classpath to use the scripts jar
-sub locate_src_root {
+sub locate_root {
     # isolate the directory this code file is in
     my ($volume, $script_dir, undef) = File::Spec->splitpath(__FILE__);
-    # go up from cbioportal/core/src/main/scripts/ to cbioportal/
-    my $src_root_dir = File::Spec->catdir($script_dir, (File::Spec->updir()) x 1);
+    # go up one level
+    my $root_dir = File::Spec->catdir($script_dir, File::Spec->updir());
     # reassemble the path and resolve updirs (/../)
-    return abs_path(File::Spec->catpath($volume, $src_root_dir));
+    return abs_path(File::Spec->catpath($volume, $root_dir));
 }
-$src_root = locate_src_root();
-@jar_files = glob("$src_root/core-*.jar");
+$root_dir = locate_root();
+@jar_files = glob("$root_dir/core-*.jar");
 if (scalar @jar_files != 1) {
-    die "Expected to find 1 scripts-*.jar, but found: " . scalar @jar_files;
+    die "Expected to find 1 core-*.jar, but found: " . scalar @jar_files;
 }
 $cp = pop @jar_files;

File renamed without changes.
@@ -501,10 +501,11 @@ def locate_jar():
     """
     # get the directory name of the currently running script,
     # resolving any symlinks
-    script_dir = Path(__file__).resolve().parent
-    # go up from core/scripts/importer/ to core/
-    src_root = script_dir.parent.parent
-    jars = list((src_root ).glob('core-*.jar'))
+    this_file = Path(__file__).resolve()
+    importer_dir = this_file.parent
+    scripts_dir = importer_dir.parent
+    root_dir = scripts_dir.parent
+    jars = list((root_dir).glob('core-*.jar'))
     if len(jars) != 1:
         raise FileNotFoundError(
             'Expected to find 1 scripts-*.jar, but found ' + str(len(jars)))
@@ -997,7 +997,9 @@ def run_java(*args):
         java_command = os.path.join(java_home, 'bin', 'java')
     else:
         java_command = 'java'
-    process = Popen([java_command] + list(args), stdout=PIPE, stderr=STDOUT,
+    full_cmd = [java_command] + list(args)
+    print(">", " ".join(full_cmd))
+    process = Popen(full_cmd, stdout=PIPE, stderr=STDOUT,
                     universal_newlines=True)
     ret = []
     while process.poll() is None:
@@ -957,7 +957,7 @@ def load_chromosome_lengths(reference_genome, logger):
                         'downloadChromosomeSizes.py.')

     logger.debug("Retrieving chromosome lengths from '%s'",
-                 chrom_size_file)
+                 os.path.basename(chrom_size_file))

     try:
         chrom_size_dict = chrom_sizes[reference_genome]
File renamed without changes.
1 change: 1 addition & 0 deletions test_scripts.sh
@@ -0,0 +1 @@
pushd tests/; PYTHONPATH=../scripts:$PYTHONPATH python -m unittest *.py; popd