Developmental Guide

Core Module APIs

Gene Harmonizer

pemt.utils.get_hgnc_id() → Dict[str, str]

Returns a dictionary mapping HGNC symbols to HGNC identifiers.

pemt.utils.hgnc_to_chembl(chemical_mapper: Dict[str, str], uniprot_mapper: Dict[str, str], hgnc_symbol: str) → str | None

Mapping HGNC symbol to ChEMBL identifiers.

Parameters:
  • chemical_mapper – A dictionary mapping UniProt identifiers to ChEMBL identifiers

  • uniprot_mapper – A dictionary mapping HGNC identifiers to UniProt identifiers

  • hgnc_symbol – An HGNC symbol
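
The two-step lookup described above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not PEMT's implementation: the function name `hgnc_to_chembl_sketch` and the sample mappings are hypothetical, and the sketch assumes that `uniprot_mapper` is keyed by the same HGNC symbol that is passed in.

```python
from typing import Dict, Optional


def hgnc_to_chembl_sketch(
    chemical_mapper: Dict[str, str],
    uniprot_mapper: Dict[str, str],
    hgnc_symbol: str,
) -> Optional[str]:
    """Resolve an HGNC symbol to a ChEMBL identifier via UniProt (hypothetical helper)."""
    # Step 1: HGNC symbol -> UniProt identifier (None if the symbol is unknown).
    uniprot_id = uniprot_mapper.get(hgnc_symbol)
    if uniprot_id is None:
        return None
    # Step 2: UniProt identifier -> ChEMBL identifier (None if unmapped).
    return chemical_mapper.get(uniprot_id)


# Illustrative mappings only; real mappers would hold many more entries.
uniprot_mapper = {"EGFR": "P00533"}
chemical_mapper = {"P00533": "CHEMBL203"}

found = hgnc_to_chembl_sketch(chemical_mapper, uniprot_mapper, "EGFR")
missing = hgnc_to_chembl_sketch(chemical_mapper, uniprot_mapper, "UNKNOWN")
print(found, missing)
```

Returning None for an unmapped symbol mirrors the `str | None` return type in the signature above, so callers can filter out genes without a ChEMBL entry.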

pemt.utils.uniprot_to_chembl(chemical_mapper: dict, uniprot_id: str) → str | None

Mapping UniProt identifiers to ChEMBL identifiers.

Parameters:
  • chemical_mapper – A dictionary mapping UniProt identifiers to ChEMBL identifiers

  • uniprot_id – UniProt identifier of a protein

pemt.utils.get_chemical_names(chembl_id: str) → str

Method to get chemical name from ChEMBL id.

Parameters:
  • chembl_id – ChEMBL identifier of a compound

Chemical Extractor

pemt.chemical_extractor.experimental_data_extraction.extract_chemicals()

Enrich genes with chemical data from ChEMBL bioassays.

Parameters:
  • analysis_name – The name of the analysis you want to run. This name is used to save the resulting file.

  • gene_list – The list of genes you want to extract chemicals for.

  • gene_file_path – The path of the gene file.

  • file_separator – The separator used within the file. This can be ‘comma’, ‘tab’, or ‘semicolon’. By default, the file separator is set to csv.

  • is_uniprot – A boolean value indicating whether the given gene list or file contains UniProt ids or HGNC symbols. By default, the value is set to False, indicating that a “symbol” column is present with the respective HGNC symbols. If set to True, a file with a “uniprot” column is expected.
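
How the separator name and the uniprot flag could select the delimiter and the column to read is sketched below. This is a minimal illustration using only the standard library, not PEMT's own code: the helper `read_gene_column`, the `SEPARATORS` table, and the treatment of "csv" as a comma are all assumptions.

```python
import csv
import io
from typing import List

# Assumed mapping from separator names to delimiter characters.
SEPARATORS = {"csv": ",", "comma": ",", "tab": "\t", "semicolon": ";"}


def read_gene_column(
    text: str, file_separator: str = "csv", is_uniprot: bool = False
) -> List[str]:
    """Return the gene column from delimited text (hypothetical helper)."""
    delimiter = SEPARATORS[file_separator]
    # The flag decides which column is expected in the header.
    column = "uniprot" if is_uniprot else "symbol"
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"expected a '{column}' column")
    # Skip rows where the gene column is empty or missing.
    return [row[column] for row in reader if row[column]]


sample = "symbol;score\nEGFR;1\nTP53;2\n"
genes = read_gene_column(sample, file_separator="semicolon")
print(genes)
```

Raising an error when the expected column is absent surfaces a mismatch between the flag and the file early, before any lookups are attempted.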

Chemical Harmonizer

pemt.patent_extractor.patent_chemical_harmonizer.harmonize_chemicals()

Method that allows mapping from ChEMBL to SureChEMBL identifiers.

Parameters:
  • analysis_name – The name of the analysis you want to run. This name is used to save the resulting file.

  • from_genes – Boolean indicating whether the process needs to get chemicals based on genes or not.

Patent Extractor

pemt.patent_extractor.patent_enrichment.extract_patent()

Extract and store all valid patent document metadata.

Parameters:
  • analysis_name – Name of the analysis.

  • os_system – The OS on which the code is running. It can be one of: linux, mac, window.

  • chrome_driver_path – The path where the chrome driver is located.

  • patent_year – The cut-off year for searching the patent documents.
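
The parameters above could be collected and checked before calling the extractor, as in the sketch below. The helper `build_patent_args` is hypothetical, and the argument types (strings for the first three, an integer year) and the sample values are assumptions rather than documented behaviour.

```python
from typing import Any, Dict

# OS names as listed in the parameter description above.
ALLOWED_OS = ("linux", "mac", "window")


def build_patent_args(
    analysis_name: str,
    os_system: str,
    chrome_driver_path: str,
    patent_year: int,
) -> Dict[str, Any]:
    """Validate and bundle keyword arguments (hypothetical helper)."""
    if os_system not in ALLOWED_OS:
        raise ValueError(f"os_system must be one of {ALLOWED_OS}")
    return {
        "analysis_name": analysis_name,
        "os_system": os_system,
        "chrome_driver_path": chrome_driver_path,
        "patent_year": patent_year,
    }


# Placeholder values for illustration only.
args = build_patent_args("demo_analysis", "linux", "/path/to/chromedriver", 2000)
print(sorted(args))

# With PEMT installed, the bundle could then be passed on, for example:
# from pemt.patent_extractor.patent_enrichment import extract_patent
# extract_patent(**args)
```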