Handling master data
====================

The ``PyHPO`` package relies on several master data files provided by the HPO Consortium. The package always includes those files in the ``data`` subfolder. Even though I try to update ``PyHPO`` with every HPO data release, I might sometimes be behind and can't guarantee long-term support. Here you will find the easiest procedures to update the data yourself.

``PyHPO`` uses three data files:

* ``HPO_ONTOLOGY``: This is the ``obo`` file describing the HPO Ontology. Let's all hope that the file format will never change. This file is mandatory.
* ``HPO_GENE``: This is a custom file provided by the HPO Consortium that contains links between HPO terms and genes.
* ``HPO_PHENO``: This is a custom file provided by the HPO Consortium that contains links between HPO terms and diseases.

The ``HPO_GENE`` and ``HPO_PHENO`` files are not mandatory per se. The ontology itself will work without them, but the HPO terms will not be annotated. That means you won't be able to calculate the information content, similarity and some other features.

``Auto update``
********************

You can try to auto-update the data from the HPO Jenkins servers and the OBO Library via the built-in script ``update_data.py``.

.. code:: python

    from pyhpo.update_data import download_data

    download_data()

Error handling
---------------

If the URLs of the files change, you will need to modify the ``URLS`` dict in the ``update_data`` module.

.. code:: python

    from pyhpo import update_data

    # URLS is a dict in the update_data module, so it is modified
    # on the module itself, not on the download_data function.
    update_data.URLS['HPO_ONTOLOGY'] = 'https://custom-url.com'
    update_data.download_data()

Sometimes, the HPO-Disease associations file is improperly generated and the column header row starts with ``#``. During annotation parsing, ``PyHPO`` removes all commented-out rows, which then includes the header row.
So you might have to manually remove the leading ``#`` from the header row, changing the file from::

    #description: HPO annotations for rare diseases [7801: OMIM; 47: DECIPHER; 3958 ORPHANET]
    #date: 2020-08-11
    #tracker: https://github.com/obophenotype/human-phenotype-ontology
    #HPO-version: http://purl.obolibrary.org/obo/hp.obo/hp/releases/2020-08-11/hp.obo.owl
    #DatabaseID DiseaseName Qualifier HPO_ID Reference Evidence Onset Frequency Sex Modifier Aspect Biocuration

to::

    #description: HPO annotations for rare diseases [7801: OMIM; 47: DECIPHER; 3958 ORPHANET]
    #date: 2020-08-11
    #tracker: https://github.com/obophenotype/human-phenotype-ontology
    #HPO-version: http://purl.obolibrary.org/obo/hp.obo/hp/releases/2020-08-11/hp.obo.owl
    DatabaseID DiseaseName Qualifier HPO_ID Reference Evidence Onset Frequency Sex Modifier Aspect Biocuration

``Manual update``
********************

Of course, you can manually download the files and replace them in the ``data`` subfolder. However, this is not recommended, as it might cause issues and is not easy to undo. Instead, you can download the files and store them somewhere in your home folder. When initializing the :class:`pyhpo.ontology.OntologyClass`, you can specify the path to the files.

.. code:: python

    from pyhpo.ontology import Ontology

    _ = Ontology(data_folder='/path/to/master/data')
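
The header fix described under *Error handling* can also be scripted when you keep the files in your own data folder. The following is only a minimal sketch and not part of ``PyHPO``: the helper name ``uncomment_header`` is hypothetical, and it assumes that the affected header row starts with ``#DatabaseID``, as in the example above. All other ``#`` comment rows are left untouched.

.. code:: python

    from pathlib import Path


    def uncomment_header(path, first_column="DatabaseID"):
        """Strip the leading '#' from the column header row only.

        Hypothetical helper: ``first_column`` is assumed to be the name
        of the first column in the header row.
        """
        file = Path(path)
        lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
        fixed = [
            # Only the row starting with '#DatabaseID' loses its '#';
            # real comment rows such as '#date: ...' stay as they are.
            line[1:] if line.startswith("#" + first_column) else line
            for line in lines
        ]
        file.write_text("".join(fixed), encoding="utf-8")

    # Usage (placeholder path, adjust to your own annotation file):
    # uncomment_header('/path/to/master/data/<annotation file>')

Running the helper on a file whose header row is already correct leaves the file content unchanged, so it should be safe to call after every download.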