This repository contains the complete analysis pipeline for the research project: "Deciphering Asymmetric Regulatory Logic in 17q Amplified Breast Cancer via Boolean Modeling and BRCA1 Stratification."
Chromosomal amplification at 17q is a driver of breast cancer, yet linear correlation methods fail to capture the asymmetric ("if-then") regulatory logic governing this system. This project utilizes a Boolean Implication Network approach to map these directed interactions across transcriptomic (RNA-seq) and epigenomic (DNA Methylation) layers.
By stratifying patients based on 17q Copy Number Status (GAIN vs. DIS) and BRCA1 Expression (Up vs. Down), this workflow uncovers hidden topological architectures—such as "lock-in" regulatory loops and network bottlenecks—that predict therapeutic vulnerabilities.
The code is organized by analytical stage, processing data from raw TCGA downloads to graph database visualization.
Directory: tcga-brca-integration/
- `tcga_brca_integration.r`: Orchestrates the retrieval and integration of multi-omics data:
  - RNA-seq & Methylation: Downloads TCGA-BRCA data via `TCGAbiolinks`.
  - Copy Number Variation (CNV): Integrates putative arm-level CNV data retrieved separately from cBioPortal.
  - Preprocessing: Normalizes RNA-seq to Log2TPM and Methylation $\beta$-values to M-values.
  - Global Stratification: Segregates samples into GAIN-Chr17q ($n=106$) vs. DIS-Chr17q ($n=331$).
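
The two preprocessing transforms can be illustrated with a minimal Python sketch. It is not the repository's R code: the function names are hypothetical, and both the TPM pseudocount of 1 and the clipping constant `eps` are assumptions made here so that the logarithms stay finite.

```python
import math


def log2_tpm(tpm, pseudocount=1.0):
    """Log2-transform one TPM value.

    The pseudocount is an assumption of this sketch; it keeps TPM = 0 finite.
    """
    return math.log2(tpm + pseudocount)


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value in [0, 1] to an M-value.

    M = log2(beta / (1 - beta)). Clipping to [eps, 1 - eps] is an assumption
    of this sketch; it avoids infinite M-values at beta = 0 or beta = 1.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))
```

A half-methylated site (`beta = 0.5`) maps to an M-value of 0, and values near 0 or 1 are stretched toward large negative or positive M-values, which is the usual reason M-values are preferred for linear modeling.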
Directory: differential-analysis/
Performs `limma`-based differential expression (DEG) and differential methylation (DMP) analysis, using a MaxRowVariance feature selection strategy to prioritize biologically dynamic features on Chromosome 17.
- Global Contrasts: `deg_chr17q_script.r` and `dmp_chr17q_script.r` analyze GAIN vs. DIS vs. Control.
- BRCA1 Stratification: `deg_brca1_variant_script.r` and `dmp_brca1_variant_script.r` analyze upBRCA1 ($n=16$) vs. downBRCA1 ($n=17$) subsets to isolate BRCA1-dependent regulatory programs.
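
One plausible reading of the MaxRowVariance strategy is sketched below in Python: when several rows (probes or transcripts) map to the same gene, keep the row with the largest variance across samples. This interpretation is an assumption, the function name and probe IDs are hypothetical, and the repository's R scripts may implement the selection differently.

```python
from statistics import variance


def max_row_variance(rows, gene_of):
    """Pick, for each gene, the row ID with the highest sample variance.

    rows:    dict mapping row ID -> list of values across samples
    gene_of: dict mapping row ID -> gene symbol
    Rows without a gene annotation or with fewer than two values are skipped.
    """
    best = {}  # gene -> (row ID, variance)
    for row_id, values in rows.items():
        gene = gene_of.get(row_id)
        if gene is None or len(values) < 2:
            continue
        var = variance(values)
        if gene not in best or var > best[gene][1]:
            best[gene] = (row_id, var)
    return {gene: row_id for gene, (row_id, _) in best.items()}


# Hypothetical probe IDs and values, for illustration only.
example_rows = {
    "cg01": [0.1, 0.2, 0.1],
    "cg02": [-2.0, 2.0, 0.0],
    "cg03": [1.0, 1.0, 1.0],
}
example_genes = {"cg01": "BRCA1", "cg02": "BRCA1", "cg03": "MAPT"}
selected = max_row_variance(example_rows, example_genes)
```

In this toy input, `cg02` is retained for BRCA1 because it varies most across samples, while MAPT keeps its only row.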
Directories: gene-probe-anno/ & pre-stepminer/
- `gene_annotation_refactor.r`: Maps Ensembl and Probe IDs to gene symbols and chromosomal locations.
- `pre-stepminer/`: Refactors and formats expression/methylation matrices for specific cohorts (GAIN, DIS, upBRCA1, downBRCA1) to match the input requirements of the `StepMiner` algorithm.
Directory: stepminer–booleannet-workflow/
The core computational engine for discretizing continuous data and inferring logic.
- `StepMiner_algorithm.ipynb` & `stepminer-1.1.jar`: Discretizes continuous omics data into Boolean states (Low, Intermediate, High).
- `booleannet_pipeline.sh`: Shell script orchestrating the `BooleanNet` algorithm to identify asymmetric implications (e.g., High → High, Low → Low).
- `extract_exp.pl` / `extract_met.pl`: Core parsers that mine significant Boolean implication rules from raw BooleanNet logs, tailored for Expression and Methylation layers.
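
The idea behind the two stages can be sketched in plain Python. This is a simplified illustration, not the logic of `stepminer-1.1.jar` or `BooleanNet`: the threshold is taken as the midpoint of a least-squares one-step fit, the `margin` of 0.5 defines the Intermediate band, and the cutoffs `min_stat` and `max_err` are adjustable parameters. All of these choices, and the function names, are assumptions of this sketch.

```python
import math


def fit_step_threshold(values):
    """Fit a one-step function to the sorted values by least squares.

    Returns the midpoint between the two fitted levels (an assumed
    threshold rule for this sketch).
    """
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    best_sse, best_threshold = None, None
    for k in range(1, len(xs)):  # step placed between xs[k-1] and xs[k]
        low, high = xs[:k], xs[k:]
        mean_low = sum(low) / len(low)
        mean_high = sum(high) / len(high)
        sse = (sum((x - mean_low) ** 2 for x in low)
               + sum((x - mean_high) ** 2 for x in high))
        if best_sse is None or sse < best_sse:
            best_sse, best_threshold = sse, (mean_low + mean_high) / 2.0
    return best_threshold


def discretize(values, margin=0.5):
    """Label each value Low, Intermediate, or High around the fitted threshold."""
    threshold = fit_step_threshold(values)
    states = []
    for v in values:
        if v < threshold - margin:
            states.append("Low")
        elif v > threshold + margin:
            states.append("High")
        else:
            states.append("Intermediate")
    return states


def high_implies_high(a_states, b_states, min_stat=3.0, max_err=0.1):
    """Test 'A High -> B High' by checking that (A High, B Low) is sparse.

    Intermediate samples are ignored. The sparsity statistic compares the
    observed quadrant count with its expectation under independence.
    """
    pairs = [(a, b) for a, b in zip(a_states, b_states)
             if "Intermediate" not in (a, b)]
    n_a_high = sum(1 for a, _ in pairs if a == "High")
    n_b_low = sum(1 for _, b in pairs if b == "Low")
    if not pairs or n_a_high == 0 or n_b_low == 0:
        return False
    observed = sum(1 for a, b in pairs if a == "High" and b == "Low")
    expected = n_a_high * n_b_low / len(pairs)
    statistic = (expected - observed) / math.sqrt(expected)
    error_rate = 0.5 * (observed / n_a_high + observed / n_b_low)
    return statistic > min_stat and error_rate < max_err
```

The asymmetry is the point: if B is High whenever A is High, but B is also High in many samples where A is Low, then `high_implies_high(a, b)` can hold while `high_implies_high(b, a)` does not, a pattern that a symmetric correlation coefficient would blur.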
Directory: permutation-test/
Implements rigorous permutation testing (
- `permutation_test_script_exp.sh` / `_met.sh`: Validation orchestrators that automate sample randomization. They compute empirical FDR based solely on process exit codes and implication counts, remaining agnostic to internal data headers or formatting.
- `extract_permutation.pl`: The parsing engine for null model generation. It adapts the extraction logic to permuted datasets, ensuring flexible rule counting regardless of column consistency.
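
A count-based empirical FDR of this kind can be sketched in Python as follows. The sketch assumes the FDR is estimated as the mean number of implications found in permuted data divided by the number found in the real data, capped at 1; the exact formula used by the shell scripts is not stated here, and the function names are hypothetical.

```python
import random


def shuffle_samples(row, rng):
    """Return a copy of one feature's values with the sample order randomized.

    Shuffling each feature independently breaks any real relationship
    between features while preserving each feature's own distribution.
    """
    shuffled = list(row)
    rng.shuffle(shuffled)
    return shuffled


def empirical_fdr(observed_count, null_counts):
    """Estimate FDR from implication counts alone (assumed formula).

    observed_count: implications found in the real data
    null_counts:    implications found in each permuted dataset
    """
    if observed_count <= 0:
        raise ValueError("observed_count must be positive")
    if not null_counts:
        raise ValueError("need at least one permutation")
    mean_null = sum(null_counts) / len(null_counts)
    return min(1.0, mean_null / observed_count)
```

For example, if the real data yield 200 implications and four permutations yield 4, 6, 5, and 5, the estimate is 5 / 200 = 0.025.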
Directories: interlayer-construction/ & network_analysis/
- Interlayer Construction: Scripts like `brca1_interlayer_integration.r` map DNA methylation nodes to gene expression nodes (Same-Gene Interlayer Mapping) to model cis-regulatory effects.
- Network Analysis: Scripts like `brca1_network_ud_contrast.r` and `gain_dis_network_gd_contrast.r` analyze topology (centrality, degree distribution) across the different biological contrasts.
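
Same-gene interlayer mapping and a basic degree count can be sketched in Python as below. This is an illustration of the concept rather than the repository's R implementation: the `met:` and `exp:` node prefixes, the function names, and the probe IDs in the usage note are all hypothetical.

```python
from collections import Counter


def same_gene_interlayer_edges(probe_to_gene, expression_genes):
    """Link each methylation probe to the expression node of the same gene.

    probe_to_gene:    dict mapping probe ID -> gene symbol
    expression_genes: iterable of gene symbols present in the expression layer
    Probes whose gene has no expression node produce no edge.
    """
    expressed = set(expression_genes)
    return sorted(
        (f"met:{probe}", f"exp:{gene}")
        for probe, gene in probe_to_gene.items()
        if gene in expressed
    )


def degree_counts(edges):
    """Count how many edges touch each node (undirected degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree
```

Given probes `cg0001` and `cg0002` annotated to BRCA1 and `cg0003` annotated to MAPT, the BRCA1 expression node receives two interlayer edges and the MAPT node one, so `degree_counts` reports degrees of 2 and 1 for those expression nodes.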
Directory: neo4j-cypher/
- `core_neo4j_query_script.cypher`: Cypher queries to load nodes/edges into Neo4j and perform graph-based queries (e.g., identifying Ego-Networks centered on BRCA1).
- `pre-prosessing-neo4j.txt`: Guidelines for formatting CSVs for Neo4j import.
To run the full pipeline, the following tools are required:
- R (v4.5.2): `TCGAbiolinks`, `limma`, etc.
- Python (v3.12) & Jupyter: Required for running the `StepMiner` notebook.
- Java Runtime Environment (JRE): Required for `stepminer-1.1.jar`.
- Perl: Required for output parsing scripts (`.pl`).
- Neo4j: For graph database management and visualization.
- Asymmetric Dominance: The regulatory landscape of 17q-amplified tumors is dominated by asymmetric subset implications (High → High), reflecting a "lock-in" of oncogenic states.
- BRCA1 Topology:
  - upBRCA1 networks preserve regulatory heterogeneity.
  - downBRCA1 networks show program consolidation, serving as a topological proxy for Homologous Recombination Deficiency (HRD).
- Amplification-Driven Decoupling: We identified a regulatory paradox where 17q gene amplification overrides canonical epigenetic silencing. Genes like BRIP1, MED13, and MAPT exhibit transcriptional upregulation despite promoter hypermethylation, driven by massive gene dosage pressure.
- Critical Bottlenecks: Network analysis identified MAPT (Betweenness Centrality 4.533) as a critical bottleneck mediating signal propagation from BRCA1 to peripheral gene clusters.
...
This project is licensed under the MIT License - see the LICENSE file for details.