Developmental Guide
Core Module APIs
Gene Harmonizer
- pemt.utils.get_hgnc_id() -> Dict[str, str]
Returns a dictionary mapping HGNC symbols to HGNC identifiers.
- pemt.utils.hgnc_to_chembl(chemical_mapper: Dict[str, str], uniprot_mapper: Dict[str, str], hgnc_symbol: str) -> str | None
Maps an HGNC symbol to a ChEMBL identifier.
- Parameters:
chemical_mapper – A dictionary mapping UniProt identifiers to ChEMBL identifiers
uniprot_mapper – A dictionary mapping HGNC identifiers to UniProt identifiers
hgnc_symbol – An HGNC symbol
- pemt.utils.uniprot_to_chembl(chemical_mapper: dict, uniprot_id: str) -> str | None
Maps a UniProt identifier to a ChEMBL identifier.
- Parameters:
chemical_mapper – A dictionary mapping UniProt identifiers to ChEMBL identifiers
uniprot_id – UniProt identifier of a protein
- pemt.utils.get_chemical_names(chembl_id: str) -> str
Gets the chemical name for a ChEMBL identifier.
- Parameters:
chembl_id – ChEMBL identifier of a compound
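The two mapping helpers above describe a chained dictionary lookup: symbol to UniProt, then UniProt to ChEMBL. The following is a minimal sketch of that idea, not the library's implementation. The function names with the `_sketch` suffix, the assumption that `uniprot_mapper` is keyed by the HGNC symbol, and all identifiers in the toy dictionaries are hypothetical.

```python
from typing import Dict, Optional


def uniprot_to_chembl_sketch(chemical_mapper: Dict[str, str], uniprot_id: str) -> Optional[str]:
    """Look up a ChEMBL identifier for a UniProt identifier; None if unmapped."""
    return chemical_mapper.get(uniprot_id)


def hgnc_to_chembl_sketch(
    chemical_mapper: Dict[str, str],
    uniprot_mapper: Dict[str, str],
    hgnc_symbol: str,
) -> Optional[str]:
    """Resolve a symbol to UniProt first, then UniProt to ChEMBL."""
    # Assumption: the mapper is keyed by the HGNC symbol passed in.
    uniprot_id = uniprot_mapper.get(hgnc_symbol)
    if uniprot_id is None:
        return None
    return uniprot_to_chembl_sketch(chemical_mapper, uniprot_id)


# Toy placeholder data, not real HGNC, UniProt, or ChEMBL identifiers.
uniprot_mapper = {"GENE_A": "U00001"}
chemical_mapper = {"U00001": "CHEMBL_TOY_1"}

mapped = hgnc_to_chembl_sketch(chemical_mapper, uniprot_mapper, "GENE_A")
unmapped = hgnc_to_chembl_sketch(chemical_mapper, uniprot_mapper, "GENE_B")
```

Returning `None` at either step mirrors the `str | None` return type in the signatures, so callers can skip genes that have no mapping.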
Chemical Extractor
- pemt.chemical_extractor.experimental_data_extraction.extract_chemicals()
Enrich genes with chemical data from ChEMBL bioassays.
- Parameters:
analysis_name – The name of the analysis you want to run. This name would be used to save the resultant file
gene_list – The list of genes you want to extract chemicals for.
gene_file_path – The path of the gene file
file_separator – The separator used within the file. This can be ‘comma’, ‘tab’, or ‘semicolon’. By default, the file is treated as comma-separated (CSV).
is_uniprot – A boolean value indicating whether the given gene list or file contains UniProt IDs or HGNC symbols. By default, the value is set to False, indicating that a “symbol” column is present with the respective HGNC symbols. If set to True, a file with a “uniprot” column is expected.
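The file_separator and is_uniprot parameters together determine how a gene file is parsed and which column is read. A minimal sketch of that selection logic follows, using only the standard library. The helper `read_gene_column`, the `SEPARATORS` table, the 'comma' default, and the toy gene names are assumptions for illustration and are not taken from the PEMT source.

```python
import csv
import io
from typing import List, TextIO

# Separator names as listed in the parameter description.
SEPARATORS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def read_gene_column(
    handle: TextIO,
    file_separator: str = "comma",  # assumed default
    is_uniprot: bool = False,
) -> List[str]:
    """Read the "uniprot" or "symbol" column from a delimited gene file."""
    if file_separator not in SEPARATORS:
        raise ValueError(f"Unknown separator name: {file_separator!r}")
    column = "uniprot" if is_uniprot else "symbol"
    reader = csv.DictReader(handle, delimiter=SEPARATORS[file_separator])
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"Expected a {column!r} column in the gene file")
    # Skip rows where the chosen column is empty or missing.
    return [row[column] for row in reader if row[column]]


# Toy tab-separated input with a "symbol" column (placeholder gene names).
toy_file = io.StringIO("symbol\tnote\nGENE_A\tx\nGENE_B\ty\n")
genes = read_gene_column(toy_file, file_separator="tab")
```

Failing early when the expected column is absent makes a mismatch between is_uniprot and the file contents visible before any lookups are attempted.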
Chemical Harmonizer
- pemt.patent_extractor.patent_chemical_harmonizer.harmonize_chemicals()
Method that allows mapping from ChEMBL to SureChEMBL identifiers.
- Parameters:
analysis_name – The name of the analysis you want to run. This name would be used to save the resultant file.
from_genes – Boolean indicating whether the process needs to get chemicals based on genes or not.
Patent Extractor
- pemt.patent_extractor.patent_enrichment.extract_patent()
Extract and store all valid patent document metadata.
- Parameters:
analysis_name – Name of the analysis.
os_system – The OS on which the code is running. It can be either of these: linux, mac, window.
chrome_driver_path – The path where the Chrome driver is located.
patent_year – The cut-off year for searching the patent documents.
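Because extract_patent() takes an OS name from a fixed set, a driver path, and a year, it can help to check these values before starting a run. The sketch below shows one way to do that. The helper `check_patent_args` is hypothetical and not part of PEMT, the placeholder path is illustrative, and the allowed OS values are copied from the parameter description above.

```python
from typing import Dict, Union

# Values listed in the os_system parameter description.
ALLOWED_OS = ("linux", "mac", "window")


def check_patent_args(
    os_system: str,
    chrome_driver_path: str,
    patent_year: int,
) -> Dict[str, Union[str, int]]:
    """Validate the arguments and return them as a keyword dictionary."""
    if os_system not in ALLOWED_OS:
        raise ValueError(f"os_system must be one of {ALLOWED_OS}, got {os_system!r}")
    if not chrome_driver_path:
        raise ValueError("chrome_driver_path must not be empty")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(patent_year, bool) or not isinstance(patent_year, int):
        raise TypeError("patent_year must be an integer year")
    return {
        "os_system": os_system,
        "chrome_driver_path": chrome_driver_path,
        "patent_year": patent_year,
    }


# Placeholder values for illustration only.
kwargs = check_patent_args("linux", "/path/to/chromedriver", 2000)
```

The returned dictionary could then be combined with analysis_name and passed on as keyword arguments, assuming the function accepts the parameters under the names documented above.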