cellmaps_hierarchyeval package
Submodules
cellmaps_hierarchyeval.analysis module
- class cellmaps_hierarchyeval.analysis.Assembly(node_id=None, gene_names=None)[source]
Bases:
object
Represents assembly in a hierarchy
Constructor
- class cellmaps_hierarchyeval.analysis.FakeGeneSetAgent(random_seed=None, attribute_name_prefix=None)[source]
Bases:
GenesetAgent
Fake geneset agent that generates random numbers for values
Constructor
- Parameters:
random_seed
- class cellmaps_hierarchyeval.analysis.GenesetAgent(attribute_name_prefix=None)[source]
Bases:
object
Represents a Gene set analysis agent whose job is to consume a list of gene names and return a term name, confidence score, and analysis
Constructor
- GENE_SET_TOKEN = 'GENE_SET'
- class cellmaps_hierarchyeval.analysis.Hierarchy(hierarchy=None, interactome=None, ndex_username=None, ndex_password=None)[source]
Bases:
object
Represents an assembly of proteins in a Hierarchy
Constructor
- Parameters:
hierarchy (CX2Network) – Hierarchy
interactome (CX2Network) – Parent interactome
ndex_username – NDEx username to use when connecting to NDEx to obtain interactomes from hierarchy
ndex_password (str) – NDEx password to use when connecting to NDEx to obtain interactomes from hierarchy
- class cellmaps_hierarchyeval.analysis.OllamaCommandLineGeneSetAgent(prompt=None, model='llama2:latest', ollama_binary='/usr/local/bin/ollama', attribute_name_prefix=None)[source]
Bases:
GenesetAgent
Runs
Constructor
- Parameters:
prompt (str) – Prompt to pass to the LLM. Put @@GENE_SET@@ into the prompt to denote where the gene set should be inserted. If None, the default internal prompt is used
- DEFAULT_PROMPT_FILE = 'default_prompt.txt'
- annotate_gene_set(gene_names=None)[source]
Invokes the LLM specified by the model set in the constructor, using the prompt passed in via the constructor
- Parameters:
gene_names (list) – Genes to analyze
- Raises:
CellmapshierarchyevalError – If LLM failed to run
- Returns:
(‘process name (score)’, full output from LLM)
- Return type:
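The placeholder substitution described for the prompt parameter can be sketched as follows. This is a minimal illustration, not the package's implementation: fill_prompt is a hypothetical helper, and joining the gene names with commas is an assumption.

```python
GENE_SET_TOKEN = 'GENE_SET'  # same value as GenesetAgent.GENE_SET_TOKEN


def fill_prompt(prompt, gene_names):
    """Replace the @@GENE_SET@@ placeholder with the gene names.

    Assumption: genes are joined with commas; the real agent may
    format the gene set differently.
    """
    placeholder = '@@' + GENE_SET_TOKEN + '@@'
    return prompt.replace(placeholder, ','.join(gene_names))


print(fill_prompt('Name the process for: @@GENE_SET@@', ['TP53', 'MDM2']))
# prints: Name the process for: TP53,MDM2
```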
- class cellmaps_hierarchyeval.analysis.OllamaRestServiceGenesetAgent(prompt=None, model='llama2:latest', username=None, password=None, rest_url=None, temperature=0, max_tokens=1000, seed=42, attribute_name_prefix=None, max_retries=5, timeout=120, retry_wait=10)[source]
Bases:
GenesetAgent
Calls LLM via REST service. Derived from ServerModel_LLM in https://github.com/idekerlab/agent_evaluation llm.py
Constructor
- Parameters:
prompt (str) – Prompt to send to LLM
model (str) – Name of model
username (str) – Username to send via Basic Auth to service
password (str) – Password to send via Basic Auth to service
rest_url (str) – URL for service, should end with api/generate
temperature
max_tokens
seed
attribute_name_prefix
max_retries (int) – Number of times to retry failed query
timeout (int or float) – Time in seconds to wait for response from service
retry_wait (int or float) – Time in seconds to wait between retries for failed query
- annotate_gene_set(gene_names=None)[source]
Invokes the LLM specified by the model set in the constructor, using the prompt passed in via the constructor
- Parameters:
gene_names (list) – Genes to analyze
- Raises:
CellmapshierarchyevalError – If LLM failed to run
- Returns:
(‘process name (score)’, full output from LLM)
- Return type:
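How max_retries and retry_wait might interact can be sketched with a generic retry loop. Everything here is illustrative: query_with_retries and send_request are hypothetical names, RuntimeError stands in for CellmapshierarchyevalError, and treating max_retries as the number of attempts after the first one is an assumption.

```python
import time


def query_with_retries(send_request, max_retries=5, retry_wait=10,
                       sleep=time.sleep):
    """Call send_request() until it succeeds or retries are exhausted.

    send_request is a hypothetical zero-argument callable that performs
    one query (applying its own timeout) and raises on failure.
    Assumption: max_retries counts retries after the first attempt.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except Exception as e:
            last_error = e
            if attempt < max_retries:
                # Wait before the next attempt
                sleep(retry_wait)
    # Stand-in for the package's own exception type
    raise RuntimeError('query failed after retries') from last_error
```

Passing sleep as a parameter keeps the sketch testable without real delays.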
cellmaps_hierarchyeval.perturb module
- class cellmaps_hierarchyeval.perturb.PerturbSeqAnalysis(hierarchy, hierarchy_parent=None)[source]
Bases:
object
Contains utilities to compare Perturbation data against hierarchy passed in via constructor
Constructor
- Parameters:
hierarchy (CX2Network)
hierarchy_parent (CX2Network)
- static compare_cluster_root_similarities(cluster_functional_data_similarity, root_functional_data_similarity)[source]
Performs a rank-sum test to compare the distribution of functional data similarity scores between a specific cluster and gene pairs in root. This test helps determine if the similarity scores in the cluster are statistically significantly greater than those in the root.
- Parameters:
cluster_functional_data_similarity (numpy.array) – An array of similarity scores within a specific cluster.
root_functional_data_similarity (list) – A list of non-NaN similarity scores for gene pairs not directly related in the root.
- Returns:
A tuple containing the test statistic and the p-value of the rank-sum test.
- Return type:
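For readers unfamiliar with the test, a one-sided rank-sum comparison can be sketched in plain Python using the normal approximation. This is not the package's code, which may rely on a statistics library; ranksum_greater is a hypothetical name, and no tie correction is applied to the variance.

```python
import math


def ranksum_greater(cluster_scores, root_scores):
    """One-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z statistic, p-value) for the alternative that
    cluster_scores tend to be larger than root_scores.
    """
    n1, n2 = len(cluster_scores), len(root_scores)
    # Pool both samples, remembering which sample each value came from
    pooled = sorted(
        [(v, 0) for v in cluster_scores] + [(v, 1) for v in root_scores],
        key=lambda pair: pair[0])
    rank_sum_cluster = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1..j
        avg_rank = (i + 1 + j) / 2.0
        in_cluster = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_cluster += avg_rank * in_cluster
        i = j
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_cluster - expected) / std
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return z, p_value
```

A small p-value suggests that similarity scores within the cluster are greater than those among root gene pairs.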
- get_cluster_similarity(functional_data_similarity, hier_system_node_id)[source]
Retrieves the upper triangle similarity scores for genes within a specific cluster of a hierarchy. The scores are extracted from a DataFrame that contains scaled cosine similarity scores for genes that overlap between communities direct to root and Perturb-seq data.
- Parameters:
functional_data_similarity (pandas.DataFrame) – A DataFrame of scaled cosine similarity scores for overlapping genes in communities direct to root and Perturb-seq data.
hier_system_node_id (int) – The identifier for a specific node within a hierarchy.
- Returns:
An array of similarity scores from the upper triangle portion of the matrix for the specified cluster.
- Return type:
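The upper-triangle extraction can be illustrated without pandas. The sketch below uses a dict of dicts in place of a DataFrame; cluster_upper_triangle is a hypothetical name, and excluding the diagonal is an assumption.

```python
def cluster_upper_triangle(similarity, cluster_genes):
    """Return one score per unordered pair of cluster genes.

    similarity: dict gene -> dict gene -> score (symmetric matrix).
    Genes missing from the matrix are skipped. The diagonal
    (a gene paired with itself) is excluded, which is an assumption.
    """
    genes = [g for g in cluster_genes if g in similarity]
    return [similarity[a][b]
            for idx, a in enumerate(genes)
            for b in genes[idx + 1:]]


sim = {
    'A': {'A': 1.0, 'B': 0.2, 'C': 0.5},
    'B': {'A': 0.2, 'B': 1.0, 'C': 0.7},
    'C': {'A': 0.5, 'B': 0.7, 'C': 1.0},
}
print(cluster_upper_triangle(sim, ['A', 'B', 'C']))
# prints: [0.2, 0.5, 0.7]
```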
- get_heatmap_for_given_hierarchy_system(hier_system_node_id, perturbseq_df, num_perturb_seq=25)[source]
Given the id of a system in the hierarchy (hier_system_node_id) and Perturb-seq data (perturbseq_df), creates a heatmap of the num_perturb_seq most variable Perturb-seq proteins.
This is done by filtering perturbseq_df for rows that match genes in the given system and then keeping the num_perturb_seq most variable columns.
- Parameters:
hier_system_node_id (int) – Node id of the system to analyze
perturbseq_df (pandas.DataFrame)
num_perturb_seq (int)
- Returns:
heat map table
- Return type:
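The two filtering steps (rows matching the system's genes, then the most variable columns) can be sketched in plain Python. The names are hypothetical, a dict of dicts stands in for the DataFrame, and population variance is an assumed measure of variability.

```python
import statistics


def most_variable_columns(table, system_genes, num_columns=25):
    """Keep rows for system_genes, then the num_columns most variable columns.

    table: dict gene -> dict column -> value (stand-in for a DataFrame).
    Assumption: variability is measured with population variance.
    """
    rows = {g: table[g] for g in system_genes if g in table}
    if not rows:
        return {}
    columns = list(next(iter(rows.values())).keys())
    variance = {c: statistics.pvariance([r[c] for r in rows.values()])
                for c in columns}
    # Sort columns from most to least variable and keep the top ones
    keep = sorted(columns, key=lambda c: variance[c], reverse=True)[:num_columns]
    return {g: {c: r[c] for c in keep} for g, r in rows.items()}
```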
- static get_root_functional_data_similarity(functional_data_similarity, overlap_root_pairs)[source]
Extracts and returns a list of functional similarity scores for gene pairs that are not in the same community, based on a filtered upper triangle extraction of the similarity matrix (ensures that only unique, non-redundant gene pair comparisons are considered).
- Parameters:
functional_data_similarity (pandas.DataFrame) – A DataFrame of scaled cosine similarity scores for overlapping genes in communities direct to root and Perturb-seq data.
overlap_root_pairs (pandas.DataFrame) – A DataFrame of root-associated similarity scores, filtered to only include overlapping genes. A score of 0 indicates a direct relation (same community) and scores greater than 0 indicate no direct relation
- Returns:
A list of non-NaN similarity scores for gene pairs that are not directly related.
- Return type:
- get_root_gene_pair_similarities()[source]
Calculates similarity scores between gene pairs in the root node of a hierarchy. Genes in the same community linked to the root node are marked with a similarity of 0, indicating they are directly related, while all other pairs are set to 1, suggesting no direct relation.
- Returns:
A DataFrame with genes as both rows and columns, populated with similarity scores.
- Return type:
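The 0/1 scoring scheme described above can be sketched as follows. This is an illustration only: root_pair_similarities is a hypothetical name, a dict of dicts stands in for the DataFrame, and the input format (one gene list per community linked to the root) is an assumption.

```python
def root_pair_similarities(communities):
    """Build a gene-by-gene matrix of 0 (same community) or 1 (otherwise).

    communities: list of gene lists, one per community linked to root.
    In this sketch the diagonal is 0, because a gene trivially shares
    a community with itself.
    """
    genes = sorted({g for community in communities for g in community})
    # Start with 1 everywhere: no direct relation
    matrix = {a: {b: 1 for b in genes} for a in genes}
    for community in communities:
        for a in community:
            for b in community:
                matrix[a][b] = 0  # directly related
    return matrix
```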
- static get_root_overlapping_pair_similarities(root_pairs, perturbseq_df)[source]
Get similarity scores from perturbseq_df that match genes attached to the root node of the hierarchy
- Parameters:
root_pairs (pandas.DataFrame) – A DataFrame representing similarity scores between all genes in the root node, where genes within the same community connected to the root have a score of 0, indicating direct relation, and all other pairs have a score of 1, indicating no direct relation.
perturbseq_df (pandas.DataFrame)
- Returns:
A tuple containing:
- A DataFrame of scaled cosine similarity scores for overlapping genes in communities direct to root and Perturb-seq data.
- A DataFrame of root-associated similarity scores, filtered to only include overlapping genes.
- Return type:
cellmaps_hierarchyeval.cellmaps_hierarchyevalcmd module
- cellmaps_hierarchyeval.cellmaps_hierarchyevalcmd.get_model_prompt_from_string(o_prompt)[source]
Given the argument from the --ollama_prompts flag, extracts the model and prompt, which can be in the following formats:
<MODEL> or <MODEL>,<PROMPT>
Where <MODEL> will always just be a string, but <PROMPT> can be a string or a path to a file
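One way such parsing could work is sketched below. It is not the function's actual implementation: parse_model_prompt is a hypothetical name, splitting on the first comma (so the prompt itself may contain commas) is an assumption, and so is returning None when no prompt is given.

```python
import os


def parse_model_prompt(o_prompt):
    """Split '<MODEL>' or '<MODEL>,<PROMPT>' into (model, prompt).

    Assumptions: the split happens at the first comma, a prompt that
    names an existing file is replaced by that file's contents, and
    prompt is None when only a model is given.
    """
    model, sep, prompt = o_prompt.partition(',')
    if not sep:
        return model, None
    if os.path.isfile(prompt):
        with open(prompt, 'r') as f:
            prompt = f.read()
    return model, prompt


print(parse_model_prompt('llama2:latest'))
# prints: ('llama2:latest', None)
```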
- cellmaps_hierarchyeval.cellmaps_hierarchyevalcmd.get_ollama_geneset_agents(ollama='/usr/local/bin/ollama', ollama_prompts=None, username=None, password=None)[source]
Parses ollama_prompts from argparse and creates geneset agents
- cellmaps_hierarchyeval.cellmaps_hierarchyevalcmd.main(args)[source]
Main entry point for program
- Parameters:
args (list) – Arguments passed to the command line, usually sys.argv[1:]
- Returns:
Return value of cellmaps_hierarchyeval.runner.CellmapshierarchyevalRunner.run(), or 2 if an exception is raised
- Return type:
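The exit-code convention described above (the runner's return value, or 2 on an exception) follows a common pattern, sketched here with hypothetical stand-ins. run_tool is not part of the package; it simulates the runner so the example runs on its own.

```python
import sys


def run_tool(args):
    """Hypothetical stand-in for building and running the runner."""
    if '--fail' in args:
        raise ValueError('simulated failure')
    return 0


def main(args):
    """Return the tool's exit code, or 2 if any exception is raised."""
    try:
        return run_tool(args)
    except Exception as e:
        sys.stderr.write('Caught exception: ' + str(e) + '\n')
        return 2
```

A script would typically end with sys.exit(main(sys.argv[1:])) so that the shell sees the returned code.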
cellmaps_hierarchyeval.exceptions module
cellmaps_hierarchyeval.runner module
- class cellmaps_hierarchyeval.runner.BaseNetworkHelper(hierarchy_path)[source]
Bases:
object
Base class for network helpers.
Constructor.
- Parameters:
hierarchy_path (str) – File system path where the hierarchy network data is stored.
- class cellmaps_hierarchyeval.runner.CORUM_EnrichmentTerms(terms=None, term_name=None, hierarchy_genes=None, min_comp_size=4)[source]
Bases:
EnrichmentTerms
This class extends the EnrichmentTerms class to handle terms specific to CORUM.
Constructor. Sets the parameters and initializes the term genes.
- Parameters:
terms (NiceCXNetwork or None) – The terms to be processed.
term_name (str or None) – Name of the term.
hierarchy_genes (list or None) – Genes in the hierarchy.
min_comp_size (int) – Minimum number of genes in a term for it to be considered.
- class cellmaps_hierarchyeval.runner.CX2NetworkHelper(hierarchy_path)[source]
Bases:
BaseNetworkHelper
Helper class for CX2 network data manipulation that extends the BaseNetworkHelper class with CX2-specific logic.
Constructor.
- Parameters:
hierarchy_path (str) – File system path where the CX2 hierarchy network data is stored.
- static dump_to_file(hierarchy, hierarchy_out_file)[source]
Save the hierarchy to a CX2 formatted JSON file.
- Parameters:
hierarchy (CX2Network) – The hierarchy to save.
hierarchy_out_file (str) – The file path where the hierarchy should be written.
- static get_format()[source]
Get string format identifier for CX2 network data.
- Returns:
The format identifier for CX2.
- Return type:
- get_hierarchy()[source]
Create and return a CX2 network object from the hierarchy path.
- Returns:
An instance of the CX2Network class.
- Return type:
CX2Network
- static get_hierarchy_real_ids(hierarchy=None, hierarchy_size=None)[source]
Retrieve the real identifiers of nodes within the hierarchy.
- Parameters:
hierarchy (CX2Network) – The hierarchy from which to extract node IDs.
hierarchy_size – Not used, but specified for compatibility.
- Returns:
A list of node identifiers.
- Return type:
- static get_nodes(hierarchy)[source]
Retrieve the nodes from the hierarchy.
- Parameters:
hierarchy (CX2Network) – The hierarchy from which to retrieve nodes.
- Returns:
A dictionary of nodes.
- Return type:
- class cellmaps_hierarchyeval.runner.CellmapshierarchyevalRunner(outdir=None, hierarchy_dir=None, min_comp_size=4, max_fdr=0.05, min_jaccard_index=0.1, corum='633291aa-6e1d-11ef-a7fd-005056ae23aa', go_cc='6722d74d-6e20-11ef-a7fd-005056ae23aa', hpa='68c2f2c0-6e20-11ef-a7fd-005056ae23aa', ndex_server='http://www.ndexbio.org', geneset_agents=None, name=None, organization_name=None, project_name=None, input_data_dict=None, skip_term_enrichment=False, skip_logging=True, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, geneset_annotator=<cellmaps_hierarchyeval.runner.GeneSetAgentAnnotator object>, provenance=None)[source]
Bases:
object
Class to run Hierarchy evaluation
Constructor
- Parameters:
outdir (str) – Output directory where results will be written
hierarchy_dir (str) – Directory containing the hierarchy network (output of cellmaps_generate_hierarchy)
min_comp_size (int) – Minimum number of genes required to evaluate a node or term (default: 4)
max_fdr (float) – Maximum adjusted p-value (FDR) to consider an enrichment result significant (default: 0.05)
min_jaccard_index (float) – Minimum Jaccard index required for an enrichment result to be accepted (default: 0.1)
corum (str) – UUID of the CORUM dataset on NDEx for enrichment comparison
go_cc (str) – UUID of the GO Cellular Component dataset on NDEx
hpa (str) – UUID of the Human Protein Atlas dataset on NDEx
ndex_server (str) – NDEx server URL to fetch enrichment datasets from (default: http://www.ndexbio.org)
geneset_agents (list or None) – Optional list of GeneSetAgent instances for gene set annotation
name (str) – Optional name to assign to this evaluation run
organization_name (str) – Optional name of the organization running the tool
project_name (str) – Optional name of the project to associate with this analysis
input_data_dict (dict) – Dictionary of input arguments, used for provenance tracking and command-line logging
skip_term_enrichment (bool) – If True, disables built-in CORUM, GO_CC, and HPA term enrichment
skip_logging (bool) – If True, disables logging; otherwise writes logs to the output directory
provenance_utils (cellmaps_utils.provenance.ProvenanceUtil) – ProvenanceUtil object to use for FAIRSCAPE registration
geneset_annotator (GeneSetAgentAnnotator) – Object for applying GeneSetAgent annotations to hierarchy nodes
provenance (dict) – Optional provenance dictionary if RO-Crate metadata is unavailable. Example:
{ 'name': 'Example input dataset', 'organization-name': 'CM4AI', 'project-name': 'Example' }
- CORUM = '633291aa-6e1d-11ef-a7fd-005056ae23aa'
- GO_CC = '6722d74d-6e20-11ef-a7fd-005056ae23aa'
- HPA = '68c2f2c0-6e20-11ef-a7fd-005056ae23aa'
- MAX_FDR = 0.05
- MIN_COMP_SIZE = 4
- MIN_JACCARD_INDEX = 0.1
- NDEX_SERVER = 'http://www.ndexbio.org'
- get_annotated_hierarchy_as_nodelist_dest_file()[source]
Creates file path prefix for hierarchy
Example path:
/tmp/foo/hierarchy
- Returns:
Prefix path on filesystem to write Hierarchy Network
- Return type:
- get_annotated_hierarchy_dest_file()[source]
Creates file path prefix for hierarchy
Example path:
/tmp/foo/hierarchy
- Returns:
Prefix path on filesystem to write Hierarchy Network
- Return type:
- get_hierarchy_parent_network_dest_file()[source]
Creates file path prefix for hierarchy parent network
Example path:
/tmp/foo/hierarchy_parent
- class cellmaps_hierarchyeval.runner.EnrichmentResult(term=None, pval=None, jaccard_index=None, overlap_genes=None)[source]
Bases:
object
Base class for representing the results of enrichment analysis. It generates a hierarchy that is output in the CX format following the CDAPS style.
Constructor
- Parameters:
- set_accepted(min_jaccard_index, max_fdr)[source]
Sets the accepted status of the enrichment result based on Jaccard index and FDR criteria.
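The acceptance criteria can be illustrated with a small sketch. The function names are hypothetical, and treating both thresholds as inclusive is an assumption; the defaults mirror MIN_JACCARD_INDEX and MAX_FDR from the runner.

```python
def jaccard_index(genes_a, genes_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_accepted(jaccard, adjusted_pval, min_jaccard_index=0.1, max_fdr=0.05):
    """Accept when the overlap is large enough and the FDR is small enough.

    Assumption: both comparisons are inclusive.
    """
    return jaccard >= min_jaccard_index and adjusted_pval <= max_fdr


print(jaccard_index(['A', 'B', 'C'], ['B', 'C', 'D']))
# prints: 0.5
```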
- class cellmaps_hierarchyeval.runner.EnrichmentTerms(terms=None, term_name=None, hierarchy_genes=None, min_comp_size=4)[source]
Bases:
object
Base class for implementations that generate term databases for enrichment (i.e., HPA, CORUM, GO)
Constructor
- Parameters:
terms (NiceCXNetwork or None) – The terms to be processed.
term_name (str or None) – Name of the term.
hierarchy_genes (list or None) – Genes in the hierarchy.
min_comp_size (int) – Minimum number of genes in a term for it to be considered.
- class cellmaps_hierarchyeval.runner.GO_EnrichmentTerms(terms=None, term_name=None, hierarchy_genes=None, min_comp_size=4)[source]
Bases:
EnrichmentTerms
This class extends the EnrichmentTerms class to handle terms specific to Gene Ontology (GO).
Constructor. Sets the parameters and initializes the term genes and term description.
- Parameters:
terms (NiceCXNetwork or None) – The terms to be processed.
term_name (str or None) – Name of the term.
hierarchy_genes (list or None) – Genes in the hierarchy.
min_comp_size (int) – Minimum number of genes in a term for it to be considered.
- class cellmaps_hierarchyeval.runner.GeneSetAgentAnnotator[source]
Bases:
object
Annotates hierarchy with results from one or more GeneSetAgent objects
Constructor
- annotate_hierarchy(geneset_agent=None, hierarchy=None)[source]
Annotates hierarchy with GeneSetAgent by adding new node attributes
- Parameters:
geneset_agent
hierarchy
- class cellmaps_hierarchyeval.runner.HPA_EnrichmentTerms(terms=None, term_name=None, hierarchy_genes=None, min_comp_size=4)[source]
Bases:
EnrichmentTerms
This class extends the EnrichmentTerms class to handle terms specific to the Human Protein Atlas (HPA).
Constructor
- Parameters:
terms (NiceCXNetwork or None) – The terms to be processed.
term_name (str or None) – Name of the term.
hierarchy_genes (list or None) – Genes in the hierarchy.
min_comp_size (int) – Minimum number of genes in a term for it to be considered.
- class cellmaps_hierarchyeval.runner.HiDeF_EnrichmentTerms(terms=None, term_name=None, hierarchy_genes=None, min_comp_size=4)[source]
Bases:
EnrichmentTerms
This class extends the EnrichmentTerms class to handle terms specific to HiDeF output.
Constructor. Sets the parameters and initializes the term genes.
- Parameters:
terms (NiceCXNetwork or None) – The terms to be processed.
term_name (str or None) – Name of the term.
hierarchy_genes (list or None) – Genes in the hierarchy.
min_comp_size (int) – Minimum number of genes in a term for it to be considered.
- class cellmaps_hierarchyeval.runner.NiceCXNetworkHelper(hierarchy_path)[source]
Bases:
BaseNetworkHelper
Helper class for NiceCX network data manipulation that extends the BaseNetworkHelper class with CX-specific logic.
Constructor.
- Parameters:
hierarchy_path (str) – File system path where the NiceCX hierarchy network data is stored.
- static dump_to_file(hierarchy, hierarchy_out_file)[source]
Save the hierarchy to a CX formatted JSON file.
- Parameters:
hierarchy (ndex2.nice_cx_network.NiceCXNetwork) – The hierarchy to save.
hierarchy_out_file (str) – The file path where the hierarchy should be written.
- static get_format()[source]
Get the string format identifier for CX data.
- Returns:
The format identifier for NiceCX.
- Return type:
- get_hierarchy()[source]
Create and return a NiceCXNetwork object from the hierarchy path.
- Returns:
An instance of the NiceCX network class.
- Return type:
- static get_hierarchy_real_ids(hierarchy=None, hierarchy_size=None)[source]
Generate a list of real IDs for a given hierarchy size.
- static get_node_genes(hierarchy=None, node=None)[source]
Extract the set of gene identifiers from a given node in the hierarchy.
- Parameters:
hierarchy (ndex2.nice_cx_network.NiceCXNetwork) – The hierarchy containing the node.
node (int) – The node from which to extract gene identifiers.
- Returns:
A set of gene identifiers.
- Return type:
- static get_nodes(hierarchy)[source]
Retrieve the nodes from the hierarchy.
- Parameters:
hierarchy (ndex2.nice_cx_network.NiceCXNetwork) – The hierarchy from which to retrieve nodes.
- Returns:
A dictionary of nodes.
- Return type:
- static get_suffix()[source]
Get the file suffix associated with CX files.
- Returns:
The suffix for NiceCX file types.
- Return type:
- static write_as_nodelist(hierarchy, dest_path)[source]
Write the nodes of the hierarchy to a specified file path as a tab-delimited list.
- Parameters:
hierarchy (ndex2.nice_cx_network.NiceCXNetwork) – The hierarchy containing the nodes to write.
dest_path (str) – The destination file path for the nodelist.
Module contents
Top-level package for cellmaps_hierarchyeval.