Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible bug with tbl_graph #250

Closed
Gian77 opened this issue Jun 18, 2020 · 8 comments
Closed

possible bug with tbl_graph #250

Gian77 opened this issue Jun 18, 2020 · 8 comments

Comments

@Gian77
Copy link

Gian77 commented Jun 18, 2020

Hello,
I am working with coral microbiome and I have a dataset that includes Fungi, Bacteria, Apicomplexa and Symbiodinium. I am trying to build a network out form nodes and edges data frames and plot with facets in ggraph to display connections between different organisms (e.g. Inter-Kingdorm = Fungi-Fungi and Intra-Kingdom = Bacteria-Fungi connections). I have generated the two networks, I want to compare, in SpecEasi. I pooled out nodes and edges data frames form both. I put them together using rbind.data.frame and then put them in the same tbl_graph. Please see the code below and the attached image and datasets.

My problem is that the facet do not really work. When I try to facet my graph by edges Inter-Kingdorm and Intra-Kingdom connections I have all the edges plotted wrongly in the two panels (see figure). In the Intra-Kigdom panel I am supposed to see only connections that connect nodes form the same Kingdom. I have checked in the data frames, before tbl_graph and they look fine. I presume there is a bug and when tbl_graph tries to match edges names to nodes, it messes the order up. I am including the code as wee was the data frames so it can be reproduced (see below).

# putting all nodes and edges together -----------------------
edges_total <- rbind.data.frame(edge_bl_Acro, edge_un_Acro)
head(edges_total)
# rename columns
colnames(edges_total)[1] <- "from"
colnames(edges_total)[2] <- "to"

nodes_total <- rbind.data.frame(node_bl_Acro, node_un_Acro)
head(nodes_total)
dim(nodes_total)
# rename rows
rownames(nodes_total) <- seq(1:228)

# creatign the table_graph -----------------------------
net_ggraph <- tbl_graph(nodes = nodes_total, edges = edges_total, node_key = "OTU")
net_ggraph

# inspecting the graph ---------------------------------
net_ggraph %>% activate(nodes) %>% as.data.frame() 
net_ggraph %>% activate(edges) %>% as.data.frame() # the edges number do not correspond to nodes names!

net_ggraph %>%
  #activate(edges) %>%
  ggraph(layout = "dh") +
  #ggraph(layout = "centrality", cent = graph.strength(net_ggraph_new, mode = "all"))+
  geom_edge_link0(aes(edge_width = weight, edge_colour = Direction)) +
  geom_node_point(aes(fill = Kingdom, size = Abundance, shape = Kingdom, colour = Kingdom)) + 
  #geom_node_text(aes(filter = Abundance >= 10,label = OTU), color="black", size =2) +
  facet_edges(~InterKing) +
  #facet_nodes(~Kingdom) +
  scale_fill_manual(values = c("#F0E442", "#D55E00","#009E73","#56B4E9"))+
  scale_color_manual(values = c("#F0E442", "#D55E00","#009E73","#56B4E9"))+
  scale_shape_manual(values = c(21,22,23,24)) +
  scale_edge_color_manual(values = c("red","grey")) + 
  scale_edge_width_continuous(range = c(0.1, 1.2))+
  scale_size(range = c(0.5, 6)) +
  theme_graph() +
    guides(size = FALSE) +
  theme(legend.key.height = unit(0.2, "cm"), legend.key.width = unit(0.3, "cm")) +
  theme(legend.title = element_text(size = 8, face = "bold"), legend.text = element_text(size = 7)) +
  theme(legend.position = "right") -> test_graph

test_graph

Thank you so much for your help!

Gian

test.pdf
node_edges_data.zip

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8   
 [6] LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] oaqc_1.0            snahelper_1.1.0     graphlayouts_0.7.0  tidygraph_1.2.0     igraph_1.2.5        ggraph_2.0.2.9000   ggplot2_3.3.1      
 [8] SpiecEasi_1.0.7     Biostrings_2.54.0   XVector_0.26.0      IRanges_2.20.1      S4Vectors_0.24.1    BiocGenerics_0.32.0

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                matrixStats_0.55.0          bit64_0.9-7                 RColorBrewer_1.1-2          GenomeInfoDb_1.22.0        
 [6] tools_3.6.3                 backports_1.1.7             R6_2.4.1                    rpart_4.1-15                Hmisc_4.3-0                
[11] DBI_1.1.0                   colorspace_1.4-1            nnet_7.3-14                 withr_2.2.0                 tidyselect_1.1.0           
[16] gridExtra_2.3               DESeq2_1.26.0               bit_1.1-14                  compiler_3.6.3              Biobase_2.46.0             
[21] htmlTable_1.13.3            DelayedArray_0.12.2         colourpicker_1.0            scales_1.1.1                checkmate_1.9.4            
[26] genefilter_1.68.0           stringr_1.4.0               digest_0.6.25               foreign_0.8-76              base64enc_0.1-3            
[31] jpeg_0.1-8.1                pkgconfig_2.0.3             htmltools_0.4.0             fastmap_1.0.1               htmlwidgets_1.5.1          
[36] rlang_0.4.6                 rstudioapi_0.11             huge_1.3.4                  RSQLite_2.2.0               VGAM_1.1-2                 
[41] shiny_1.4.0                 generics_0.0.2              farver_2.0.3                BiocParallel_1.20.1         acepack_1.4.1              
[46] dplyr_1.0.0                 RCurl_1.95-4.12             magrittr_1.5                GenomeInfoDbData_1.2.2      Formula_1.2-3              
[51] Matrix_1.2-18               Rcpp_1.0.4.6                munsell_0.5.0               viridis_0.5.1               lifecycle_0.2.0            
[56] stringi_1.4.6               MASS_7.3-51.6               SummarizedExperiment_1.16.1 zlibbioc_1.32.0             grid_3.6.3                 
[61] blob_1.2.0                  promises_1.1.0              ggrepel_0.8.2               crayon_1.3.4                miniUI_0.1.1.1             
[66] lattice_0.20-41             cowplot_1.0.0               splines_3.6.3               annotate_1.64.0             locfit_1.5-9.1             
[71] knitr_1.28                  pillar_1.4.4                GenomicRanges_1.38.0        pulsar_0.3.6                geneplotter_1.64.0         
[76] XML_3.98-1.20               glue_1.4.1                  latticeExtra_0.6-29         data.table_1.12.8           httpuv_1.5.2               
[81] png_0.1-7                   vctrs_0.3.1                 tweenr_1.0.1                gtable_0.3.0                purrr_0.3.4                
[86] polyclip_1.10-0             tidyr_1.1.0                 xfun_0.13                   ggforce_0.3.1               mime_0.9                   
[91] xtable_1.8-4                later_1.0.0                 survival_3.1-12             viridisLite_0.3.0           tibble_3.0.1               
[96] AnnotationDbi_1.48.0        memoise_1.1.0               cluster_2.1.0               ellipsis_0.3.1             
@thomasp85
Copy link
Owner

Hi - can you try to condense the issue down to a simpler case that does not require me to understand your scientific question. That would make it much easier to follow and debug

@Gian77
Copy link
Author

Gian77 commented Dec 11, 2020

Hello @thomasp85 and thanks for your reply,

I think the problem is more related to a network layout than other things, not sure how to solve it though. It could lead to misleading interpretation anyways.
The code below is a small reproducible example:

rstat_nodes <- data.frame(name = c("1", "2", "3", "4", "5"))
rstat_nodes$Kingdom <- c("Fungi", "Bacteria", "Bacteria", "Fungi", "Bacteria")
rstat_nodes$degree <- c(7, 12.5, 40, 18, 70)
rstat_nodes

rstat_edges <- data.frame(from = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5),
                          to = c(2, 3, 4, 1, 1, 2, 1, 2, 3, 1, 4, 3))
rstat_edges$tax1 <- c("F","F","F","B","B","B","F","F","F", "B","B", "B")
rstat_edges$tax2 <- c("B","B","F","F","F","B","F","B","B", "F", "F", "B")
rstat_edges$interkind <- paste(rstat_edges$tax1, rstat_edges$tax2, sep = "_")
rstat_edges$InterKing <- ifelse(rstat_edges$tax1 == rstat_edges$tax2,
                         "Intra", "Inter")
rstat_edges

ggraph_test <- tbl_graph(nodes = rstat_nodes, edges = rstat_edges, directed = FALSE)
ggraph_test

ggraph_test %>%
  activate(edges) %>%
  ggraph(layout = "dh") +
  geom_node_point(aes(size = degree, color = Kingdom)) +
  geom_edge_link0() +
  geom_node_text(aes(label = name), size =5) +
  facet_edges(~InterKing) 

If you look at the graph (attached) the vertex 1 is a Fungus and should not be connected with 3 an 2 in the Intra panel on the right. I think it is not but graphically it seems so. Do you see a way to fix this?

I hope this help to clarify.
Best,
Gian

test.pdf

@Gian77
Copy link
Author

Gian77 commented Dec 11, 2020

I am not totally sure though that is the only issue, if you look at this graph you can see FOTU_115 is connected to FOTU_201, but that connection does not exist in the data frames I used to generate the network.

> edges_Pop[edges_Pop$to=="FOTU_115",]
         from       to     weight   Crop Direction V1_taxa V2_taxa    InterKing InterKingTaxa
108 FOTU_1646 FOTU_115 0.02301645 Poplar  positive   Fungi   Fungi Intrakingdom   Fungi_Fungi
154   FOTU_84 FOTU_115 0.00801609 Poplar  positive   Fungi   Fungi Intrakingdom   Fungi_Fungi
> edges_Pop[edges_Pop$to=="FOTU_201",]
        from       to     weight   Crop Direction V1_taxa V2_taxa    InterKing InterKingTaxa
96 FOTU_1440 FOTU_201 0.01108665 Poplar  positive   Fungi   Fungi Intrakingdom   Fungi_Fungi

Additionally, even if existed, it should be in the panel Fungi_Fungi, since connects to fungal otus.
Some problems with POTU_159, connected to two other POTUs in the facet that should show just Fungi_Fungi connections.

Please have a look at test4.pdf

Thanks again,
Gian

@thomasp85
Copy link
Owner

The example you show is def a visual artefact of the DH layout - try using a different layout that doesn't have this characteristic... As for the other thing it is har to comment on - my guess is that something has gone wrong in the data-preprocessing but it is hard to tell. I'm pretty sure ggraph would not draw an edge that doesn't exist in the dataset

@Gian77
Copy link
Author

Gian77 commented Dec 12, 2020

Thanks for the prompt reply @thomasp85,

I believe you. I am not saying it is wrong, I am just trying to find a solution as I want to really use it in my analyses.
I am kind of lost since now I tried to just subset the nodes in a way that there are only FOTU - POTU edges. However, I am obtaining weird results. In the network there are POTU - POTU connections, bot how it is possible? what am I missing in here?

This is what I did

ggraph_test <- tbl_graph(nodes = nodes_test, edges = edges_test, node_key = "OTU_ID")
ggraph_test

ggraph_test %>%
  activate(edges) %>%
  ggraph(layout = "dh") +
  geom_node_point(aes(size = Abundance, color = Kingdom)) +
  geom_edge_link0() +
  scale_edge_width_continuous(range = c(0.1, 1.2)) +
  geom_node_text(aes(label = OTU_ID), size =2)

These are the two datasets for nodes and edges

edges_and_nodes.zip

and this is graph I obtained.
net1.pdf

Can you please try to reproduce and see if you get the same?
Thanks so much,

Gian

@thomasp85
Copy link
Owner

As I said, your graph is encoded wrong. How and when that happened I don't know but the column that encode inter/intra status is not correct. I'm sorry, but I do not have time to help you with the analysis since this does not appear to be an issue with one of my packages

@noelzach
Copy link

noelzach commented Dec 14, 2020

Hi, @thomasp85 and @Gian77. I think I have figured out a fix to the problem experienced by @Gian77. I was able to reproduce the issue with the code provided by @Gian77 and the data. The first problem was a visual artifact created by the layout "dh". However, the second problem seems to be related to adding the node_key option not matching the edges correctly. For example, using the data contained in the edges_and_nodes.zip file in the last comment by @Gian77:

colnames(nodes_Pop_inter) <- c("name", "Abundance", "Kingdom")
ggraph_test <- tbl_graph(nodes = nodes_Pop_inter, edges = edges_Pop_inter, directed = FALSE, node_key = "name")
ggraph_test

ggraph_test %>%
  #activate(edges) %>%
  ggraph(layout = "kk") +
  geom_node_point(aes(size = Abundance, color = Kingdom)) +
  geom_edge_link0() +
  scale_edge_width_continuous(range = c(0.1, 1.2)) +
  geom_node_text(aes(label = name), size =2) 

clearly has edges connecting fungi to fungi (blue dots to blue dots) and prokaryotes to prokaryotes (red to red) when those edges are not in the original edge table (edges_Pop_inter)
Rplot02

However, when I use the function as_tbl_graph to create a graph object with just the edge data, then left_join the associated metadata to the graph's nodes, everything is match up correctly.

ggraph_test <- as_tbl_graph(edges_Pop_inter)
ggraph_test

ggraph_test2 <- ggraph_test %>%
  activate(nodes) %>%
  left_join(nodes_Pop_inter, by = c(name = "OTU_ID"))

ggraph_test2 %>%
  #activate(edges) %>%
  ggraph(layout = "kk") +
  geom_node_point(aes(size = Abundance, color = Kingdom)) +
  geom_edge_link0() +
  scale_edge_width_continuous(range = c(0.1, 1.2)) +
  geom_node_text(aes(label = name), size =2)

Rplot01

So, as @Gian77 has mentioned, not sure if this is a bug with the node_key option in tbl_graph or if we are not using tbl_graph properly. If we are not using tbl_graph properly then an explanation of why would be helpful, and I think this issue can then be closed.

@thomasp85
Copy link
Owner

Veeery belated but this was/is an issue in tidygraph that did not correctly match edge and node id if the edge ids are encoded as factors instead of characters. Will push a fix to tidygraph shortly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants