Cytoscape includes strains from the entire database even with the --include-files option #309

Open
sydelstan opened this issue Apr 25, 2024 · 5 comments

@sydelstan

Version
poppunk version: 2.5.0

Commands
poppunk_assign --db GPS_v4 --query qfile.txt --output poppunk_clusters --threads 8 --external-clustering meta.csv --update-db

poppunk_visualise --ref-db poppunk_clusters --output grapetree_X --grapetree --include-files strains.csv --external-clustering meta.csv
poppunk_visualise --ref-db poppunk_clusters --output phandango_X --phandango --include-files strains.csv --external-clustering meta.csv
poppunk_visualise --ref-db poppunk_clusters --output cytoscape_X --cytoscape --network-file poppunk_clusters_refs_graph.gt --include-files strains.csv --external-clustering meta.csv

Output
Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Reading existing tree from grapetree_Ia/grapetree_X_core_NJ.nwk
Writing grapetree output
Parsed data, now writing to CSV
Unable to write phylogeny to grapetree_X/grapetree_X_core_NJ.nwk

Done

Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Reading existing tree from phandango_Ia/phandango_X_core_NJ.tree
Writing phandango output
Parsed data, now writing to CSV
Unable to write phylogeny to phandango_Ia/phandango_X_core_NJ.tree

Done

Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Writing cytoscape output
Network loaded: 3616 samples
Parsed data, now writing to CSV

Describe the bug
I was hoping to obtain grapetree, phandango, and cytoscape visualizations that include only the strains listed in strains.csv. Without the --include-files option, the visualization contains strains from the entire database. The option works for grapetree and phandango, but not for cytoscape. With the option, the cytoscape figure no longer includes the entire database (without it, it does), but it still includes thousands of samples not listed in strains.csv.

@johnlees
Member

Can you try with the latest version (v2.6.5) and confirm whether you still get this issue there? Looking at the release history, I see we've made multiple changes to the visualisation code since v2.5.0.

@sydelstan
Author

I ran the same lines with v2.6.5 and I ended up with even more strains included in the cytoscape network. Am I calling the correct reference database and network file? Is it appropriate to simply remove the extraneous strains from the final network or is there something weird going on with the network generation overall?

@johnlees
Member

Thanks for the report and re-running. Looking at the code, where we (try) to do this is in these two places:

I can't see an obvious issue so would need to try and reproduce.

It would also help if you could let me know the numbers of files included in the database, the visualisation and the subset file, and also give an example of a strain that's included in the visualisation but not in the subset file.
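To answer this without hand-checking thousands of nodes, a small Python sketch like the following could list which network nodes are absent from the subset file, and vice versa. It assumes the cytoscape output is a GraphML file, that sample names are stored in a node attribute named `id` (falling back to the raw node id, such as `n0`, if no such attribute exists), and that strains.csv holds one sample name per line with no header. `read_subset`, `network_labels` and `compare` are hypothetical helpers, not PopPUNK functions.

```python
import csv
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_subset(path):
    """Sample names from an include file, assumed to be one name per line
    (only the first comma-separated field of each line is used)."""
    with open(path, newline="") as handle:
        return {row[0].strip() for row in csv.reader(handle) if row and row[0].strip()}


def network_labels(path, label_attr="id"):
    """Labels of every node in a GraphML file.

    Uses the node attribute declared with attr.name == label_attr (assumed name)
    when present, otherwise falls back to the node's own id (e.g. "n0")."""
    root = ET.parse(path).getroot()
    key_id = next((el.get("id") for el in root.iter()
                   if _local(el.tag) == "key" and el.get("for") == "node"
                   and el.get("attr.name") == label_attr), None)
    labels = set()
    for node in (el for el in root.iter() if _local(el.tag) == "node"):
        label = node.get("id")
        for data in node:
            if _local(data.tag) == "data" and data.get("key") == key_id and data.text:
                label = data.text.strip()
        labels.add(label)
    return labels


def compare(graphml_path, subset_path, n_examples=5):
    """Report how the network's nodes differ from the subset file."""
    nodes = network_labels(graphml_path)
    subset = read_subset(subset_path)
    extra = sorted(nodes - subset)      # shown in the network, not requested
    missing = sorted(subset - nodes)    # requested, but not in the network
    print(f"network nodes: {len(nodes)}, subset file: {len(subset)}")
    print(f"in network but not in subset: {len(extra)}, e.g. {extra[:n_examples]}")
    print(f"in subset but not in network: {len(missing)}")
    return extra, missing
```

Usage would be along the lines of `compare("<cytoscape output>.graphml", "strains.csv")`, substituting the real GraphML path. If the database strains only show up as numbers, the `extra` list will contain those numeric or `n…` ids, which is still enough to count them.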

Am I calling the correct reference database and network file?

Those commands look alright to me, with the possible exception that if you are running --update-db you need the full database, not just the references (I see around 3600 samples loaded in one of the messages) – but I can't tell for sure that's a problem from the output above. Did you use the full or reference DB?

Is it appropriate to simply remove the extraneous strains from the final network or is there something weird going on with the network generation overall?

Yes that would be fine as that's exactly what the code is supposed to do.

@sydelstan
Author

It would also help if you could let me know the numbers of files included in the database, the visualisation and the subset file, and also give an example of a strain that's included in the visualisation but not in the subset file.

I believe the database contains 40K strains. With v2.6.5, running the command below got me the closest result:

poppunk_visualise --ref-db poppunk_clusters --output cytoscape_Ia --cytoscape --network-file poppunk_clusters.refs_graph.gt --include-files strains.csv --external-clustering meta.csv

The network has 5,090 strains, and the subset file has 1,693 strains. Sorry, I can't tell which strains from the database are included in the network: only the IDs I provided for the strains I used to update the database carry over to cytoscape, while the database strains are given a number.

Those commands look alright to me, with the possible exception that if you are running --update-db you need the full database, not just the references (I see around 3600 samples loaded in one of the messages) – but I can't tell for sure that's a problem from the output above. Did you use the full or reference DB?

For the visualization, I am using the database that is the output of this command:

poppunk_assign --db GPS_v4 --query qfile.txt --output poppunk_clusters --threads 8 --external-clustering meta.csv --update-db

I am not sure if this is considered the full or reference database.

Yes that would be fine as that's exactly what the code is supposed to do.

Okay, if it is fine to delete extraneous strains/nodes while preserving the network architecture, I will probably just do that then!
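If trimming the network by hand is the route taken, the sketch below shows one way to do it in Python with only the standard library: it drops every node whose sample name is not in the subset, plus any edge touching a dropped node, and leaves the remaining nodes and the edges between them unchanged. It assumes a GraphML input with sample names in a node attribute named `id` and a subset file with one name per line; `subset_graphml` and `read_names` are hypothetical helpers rather than PopPUNK functions. Removing nodes can split a cluster if a dropped node was the only link between kept nodes, so the trimmed network may show more components than the original.

```python
import csv
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
# Write GraphML elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", GRAPHML_NS)


def _local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_names(path):
    """Sample names to keep: first comma-separated field of each line (assumed format)."""
    with open(path, newline="") as handle:
        return {row[0].strip() for row in csv.reader(handle) if row and row[0].strip()}


def subset_graphml(in_path, out_path, keep, label_attr="id"):
    """Write a copy of a GraphML network containing only nodes whose label is in
    `keep` and edges whose endpoints are both kept. Returns (nodes_kept, edges_kept)."""
    tree = ET.parse(in_path)
    root = tree.getroot()
    # <key> id of the node attribute assumed to hold the sample name.
    key_id = next((el.get("id") for el in root.iter()
                   if _local(el.tag) == "key" and el.get("for") == "node"
                   and el.get("attr.name") == label_attr), None)
    n_nodes = n_edges = 0
    graphs = [el for el in root.iter() if _local(el.tag) == "graph"]
    for graph in graphs:
        kept = set()
        # Iterate over snapshots so that removing children is safe.
        for node in [el for el in graph if _local(el.tag) == "node"]:
            label = node.get("id")
            for data in node:
                if _local(data.tag) == "data" and data.get("key") == key_id and data.text:
                    label = data.text.strip()
            if label in keep:
                kept.add(node.get("id"))
            else:
                graph.remove(node)
        for edge in [el for el in graph if _local(el.tag) == "edge"]:
            if edge.get("source") in kept and edge.get("target") in kept:
                n_edges += 1
            else:
                graph.remove(edge)
        n_nodes += len(kept)
    tree.write(out_path, xml_declaration=True, encoding="utf-8")
    return n_nodes, n_edges
```

For example, `subset_graphml("<cytoscape output>.graphml", "trimmed.graphml", read_names("strains.csv"))` would write the trimmed copy and report how many nodes and edges survived; the original file is left untouched.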

@sydelstan
Author

I am also having issues with assigning GPSCs: the external clustering file labels every strain as "NA", even though the GPSCs of most of these strains have been previously established. My cytoscape network also produces many more distinct clusters than I'd expect given how closely related the strains are, so I wonder whether these issues are related to the previous one.
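On the "NA" labels, one thing that may be worth ruling out (an assumption, not a confirmed cause) is a mismatch between the sample names in meta.csv and the names PopPUNK uses for the samples, for example due to stray whitespace or differing ID formats. The hypothetical sketch below assumes meta.csv has a header row with sample names in the first column, and that qfile.txt is tab-separated with the sample name first; it reports how many query names also appear in meta.csv.

```python
import csv


def first_column(path, delimiter=",", skip_header=False):
    """Return the stripped, non-empty values of the first column of a delimited file."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter=delimiter)
        if skip_header:
            next(rows, None)  # discard the header row
        return {row[0].strip() for row in rows if row and row[0].strip()}


def check_name_overlap(meta_path, qfile_path, n_examples=5):
    """Compare names in an external clustering CSV (assumed: header, names in column 1)
    with names in a query list (assumed: tab-separated, names in column 1)."""
    meta_names = first_column(meta_path, ",", skip_header=True)
    query_names = first_column(qfile_path, "\t")
    shared = meta_names & query_names
    unmatched = sorted(query_names - meta_names)
    print(f"{len(shared)} of {len(query_names)} query names also appear in {meta_path}")
    if unmatched:
        print("examples without a match:", unmatched[:n_examples])
    return shared, unmatched
```

A near-zero overlap would point at a naming problem; a high overlap would suggest the NA labels have a different cause.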
