Package 

Class PHGdbAccess

  • All Implemented Interfaces:
    java.lang.AutoCloseable , net.maizegenetics.pangenome.db_loading.PHGData , net.maizegenetics.pangenome.db_loading.PHGDataWriter

    
    public class PHGdbAccess
     implements PHGDataWriter, AutoCloseable
                        

    Access methods for PostgreSQL or SQLite PHG dbs. When making changes or adding new methods, test the SQL statements in both PostgreSQL and SQLite to ensure compatibility.

    • Constructor Detail

      • PHGdbAccess

        PHGdbAccess(Connection dbConnection)
    • Method Detail

      • updateReadMappingHash

         void updateReadMappingHash()

        This prompts a call to the private method loadReadMappingHash() to update this hash table.

      • getRefRangeIDFromString

         int getRefRangeIDFromString(String refData)

        Returns reference range id

        Parameters:
        refData - needs to be of form chr1:startpos:endPos
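
        The chr:start:end form expected by refData can be checked before it is passed in. The following is a minimal sketch, not part of PHGdbAccess: the class name RefRangeKey and its methods are hypothetical, and it assumes chromosome names contain no colon.

```java
import java.util.Objects;

/** Hypothetical helper (not part of PHG): validates and holds a "chrom:start:end" key. */
final class RefRangeKey {
    final String chrom;
    final int start;
    final int end;

    private RefRangeKey(String chrom, int start, int end) {
        this.chrom = chrom;
        this.start = start;
        this.end = end;
    }

    /** Parses "chrom:start:end"; assumes the chromosome name itself has no ':'. */
    static RefRangeKey parse(String refData) {
        Objects.requireNonNull(refData, "refData");
        String[] parts = refData.split(":");
        if (parts.length != 3 || parts[0].isEmpty()) {
            throw new IllegalArgumentException(
                "Expected chrom:start:end but got: " + refData);
        }
        // NumberFormatException (an IllegalArgumentException) is thrown on bad numbers.
        int start = Integer.parseInt(parts[1].trim());
        int end = Integer.parseInt(parts[2].trim());
        return new RefRangeKey(parts[0], start, end);
    }

    /** Rebuilds the string form that would be handed to the lookup method. */
    String toRefData() {
        return chrom + ":" + start + ":" + end;
    }
}
```

        A caller could run RefRangeKey.parse(...) first and pass toRefData() on only when parsing succeeds.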
      • getHaplotypeIDFromFastaIDLine

         int getHaplotypeIDFromFastaIDLine(String idLine, String methodName)

        Returns the haplotype_id from the haplotypes table based on the gamete_grp_id (calculated from the taxa list), the ref_range_id (calculated from the ref coordinates part of the idline) and the method.

        Parameters:
        idLine - expected form: refchr:refStartPos:refEndPos;taxa_hapnumber:taxa_hapnumber:etc
        methodName - This is the method used for creating the haplotypes
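
        The idLine layout described above has two parts separated by a semicolon: the reference coordinates, then a colon-separated list of taxa_hapnumber entries. The sketch below splits such a line. It is illustrative only: FastaIdLine is a hypothetical class, and it assumes any leading '>' from the fasta header has already been removed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of PHG): splits an id line into its two parts. */
final class FastaIdLine {
    final String refCoordinates;   // "refchr:refStartPos:refEndPos"
    final List<String> gametes;    // each entry is "taxa_hapnumber"

    private FastaIdLine(String refCoordinates, List<String> gametes) {
        this.refCoordinates = refCoordinates;
        this.gametes = gametes;
    }

    /** Assumes the line has no leading '>' and exactly one ';' separator of interest. */
    static FastaIdLine parse(String idLine) {
        int sep = idLine.indexOf(';');
        if (sep <= 0 || sep == idLine.length() - 1) {
            throw new IllegalArgumentException(
                "Expected refchr:start:end;taxa_hap:taxa_hap but got: " + idLine);
        }
        String ref = idLine.substring(0, sep);
        String[] taxa = idLine.substring(sep + 1).split(":");
        return new FastaIdLine(ref, Collections.unmodifiableList(Arrays.asList(taxa)));
    }
}
```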
      • getMethodIdFromName

         int getMethodIdFromName(String method_name)

        Returns the method_id for the given method name, or 0 if no id is found for that name.

      • getGameteGroupIDFromTaxaList

         int getGameteGroupIDFromTaxaList(List<String> gametes)

        Takes a list of taxa and returns the corresponding gamete_group_id or 0

        Parameters:
        gametes - Items on the list must be of the form taxon_hapNumber
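
        Entries of the form taxon_hapNumber can be built and taken apart with a small helper. This is a sketch under one assumption that the source does not state: taxon names may themselves contain underscores, so the split is made at the last underscore. GameteName is a hypothetical class name.

```java
/** Hypothetical helper (not part of PHG) for "taxon_hapNumber" strings. */
final class GameteName {
    private GameteName() { }

    /** Builds the "taxon_hapNumber" form. */
    static String format(String taxon, int hapNumber) {
        return taxon + "_" + hapNumber;
    }

    /** Returns the taxon part, splitting at the LAST underscore (an assumption). */
    static String taxonOf(String gamete) {
        return gamete.substring(0, lastUnderscore(gamete));
    }

    /** Returns the hap number part as an int. */
    static int hapNumberOf(String gamete) {
        return Integer.parseInt(gamete.substring(lastUnderscore(gamete) + 1));
    }

    private static int lastUnderscore(String gamete) {
        int idx = gamete.lastIndexOf('_');
        if (idx <= 0 || idx == gamete.length() - 1) {
            throw new IllegalArgumentException(
                "Expected taxon_hapNumber but got: " + gamete);
        }
        return idx;
    }
}
```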
      • getGenomeFileIdFromGenoid

         int getGenomeFileIdFromGenoid(int genoid)

        Gets the id from the genome_file_data table based on the genoid.

      • getHapidForGenoidHapNumber

         int getHapidForGenoidHapNumber(int genoid, int hap_number)

        Returns the hapid for the given genoid and hap_number.

      • getHapidHapNumberLineNamesForLines

         Map<Integer, String> getHapidHapNumberLineNamesForLines(List<String> lineNames)

        Retrieves line name and hap number, and returns them with the hapid. The hapid is the key in the map. The line name and hap number are concatenated with an underscore and returned as the string value for each hapid.
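
        Because the returned map is keyed by hapid with lineName_hapNumber values, callers that want the hapids per line need to regroup it. A minimal sketch follows. HapidsByLine is a hypothetical class, and splitting at the last underscore is an assumption made in case line names contain underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of PHG): regroups hapid -> "line_hap" by line name. */
final class HapidsByLine {
    private HapidsByLine() { }

    static Map<String, List<Integer>> group(Map<Integer, String> hapidToLineHap) {
        Map<String, List<Integer>> byLine = new TreeMap<>();
        for (Map.Entry<Integer, String> e : hapidToLineHap.entrySet()) {
            String value = e.getValue();
            // Assumption: the hap number follows the last underscore.
            int idx = value.lastIndexOf('_');
            String lineName = idx < 0 ? value : value.substring(0, idx);
            byLine.computeIfAbsent(lineName, k -> new ArrayList<>()).add(e.getKey());
        }
        return byLine;
    }
}
```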

      • getChromNamesForHaplotype

         List<String> getChromNamesForHaplotype(String line_name, int hap_number, String version)

        Method grabs a list of distinct chromosome names for a genome_interval version

        Parameters:
        version - Version name for anchors
      • getReadMappingId

         int getReadMappingId(String line_name, String method_name, String file_group_name)

        Uses line_name, method_name and file_group_name to fetch a read_mapping_id. Returns -1 if this combination is not present in the db.
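
        Several lookups in this class signal "not found" with a sentinel value rather than an exception: getReadMappingId returns -1, while getMethodIdFromName, getGameteGroupIDFromTaxaList and getTaxaGroupIDFromName return 0. One way a caller might make that explicit is to wrap the result in an OptionalInt. The helper below is a hypothetical sketch and not part of PHG.

```java
import java.util.OptionalInt;

/** Hypothetical helper (not part of PHG): turns sentinel ids into OptionalInt. */
final class IdLookup {
    private IdLookup() { }

    /**
     * @param id       value returned by a lookup method
     * @param notFound sentinel that method uses for "no such row" (0 or -1)
     */
    static OptionalInt fromSentinel(int id, int notFound) {
        return id == notFound ? OptionalInt.empty() : OptionalInt.of(id);
    }
}
```

        For example, IdLookup.fromSentinel(id, -1) would suit a value obtained from getReadMappingId, and IdLookup.fromSentinel(id, 0) a value from getMethodIdFromName.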

      • getHaplotypeList

         List<Integer> getHaplotypeList(int haplist_id)

        Given a haplotype_list_id, query the haplotype_list table and return a list of haplotypes stored for that ID, or an empty list if there were none.

      • getReadMappingsForId

         Pair<Array<byte>, Integer> getReadMappingsForId(int readMappingId)

        Method to get the Read Mapping data from the DB using the read_mapping_id. Generally this is returned from PHGData.getReadMappingId(String line_name, String method_name, String file_group_name)

      • isFileGroupNew

         boolean isFileGroupNew(String taxon, String fileGroupName, String methodName)

        Method to check to see if a given taxon and fileGroup are already in the readMapping table of the DB.

      • getRefRangesForMethod

         List<Integer> getRefRangesForMethod(String methodName)

        Method to return all reference ranges associated with a specified method name

        Parameters:
        methodName - name of method group for which the user wants reference range ids
      • getDbTaxaNames

         List<String> getDbTaxaNames()

        Gets a list of the taxa currently in the database. Taxa are identified by the line_name field of the genotypes table

      • getTaxaGroupIDFromName

         int getTaxaGroupIDFromName(String group_name)

        Returns the taxa_grp_id for the given taxa group name, or 0 if no id is found for that name.

      • putMethod

         int putMethod(String name, DBLoadingUtils.MethodType type, Map<String, String> descriptionMap)

        Adds a method, its type and its description to the anchor_methods table. These are used to identify how sequences were created, how they were combined into consensus sequences, how haplotype counts were scored, how paths through the graph were created or how an edge was created. The "type" field identifies the table to which the method belongs.

      • putGameteGroupAndHaplotypes

         boolean putGameteGroupAndHaplotypes(List<String> gametes)

        Takes a list of gametes and stores them to the gamete_groups and gamete_haplotypes tables. Skips the insert if this grouping already exists.

        Parameters:
        gametes - list consisting of taxa/gamete number in the form taxaName_gameteNumber
      • putAllAnchors

         boolean putAllAnchors(List<AnchorDataPHG> adata, int refGroupMethodID)

        Stores chrom, start pos and end pos to the reference_ranges table. isFocus identifies focus intervals from the user's bed file.

        Parameters:
        refGroupMethodID - method_id used for creating this ref_range_group
      • putRefRangeRefRangeMethod

         void putRefRangeRefRangeMethod(int method_id, List<Integer> refRangeIDList)

        Takes a method id and a list of reference ranges. Populates the ref_range_ref_range_method table.

      • putRefAnchorData

         boolean putRefAnchorData(String line_name, int hapnumber, List<AnchorDataPHG> anchorData, int hapMethod, Set<String> refGrpMethods, String gvcf, String variant_list, int genomeFileId, int gvcfFileId)

        Fills in the haplotypes table for the reference ranges.

        Parameters:
        hapMethod - Name of method used to create anchors.
        gvcf - String - name of gvcf file
        variant_list - String - name of file containing list of variants
        genomeFileId - int - id for ref fasta entry in the genome_file_data table
        gvcfFileId - int - id for ref gvcf file entry in the genome_file_data table
      • putHaplotypesForMultipleGroups

         void putHaplotypesForMultipleGroups(Multimap<Position, Tuple<AnchorDataPHG, String>> mapWithGroupHash, int method_id)

        Adds data to the haplotypes table. Entries on the map are for different gamete groups. The key is a Position item identifying the genome_interval id. The value is a Tuple consisting of (x) an AnchorDataPHG object with sequence, gvcf, etc.; and (y) a List of taxa represented by the AnchorDataPHG sequence.

        Parameters:
        method_id - Id in the methods table for this group of sequences
      • putHaplotypesForGamete

         void putHaplotypesForGamete(int gamete_grp_id, int method_id, Map<Integer, AnchorDataPHG> anchorSequences, int genomeFileId, int gvcfFileId)

        Stores gamete sequence data to the haplotypes table. This method associates all entries with the single gamete_grp_id which is passed in. It is used when loading reference_ranges sequences or haplotype sequences for a single line. The gidToVariantDataMap map is used to create the variant list blob for the db.

        Parameters:
        genomeFileId - genome_file_data table id for the fasta file for this haplotype, -1 means none
        gvcfFileId - genome_file_data table id for the gvcf file for this haplotype, -1 means none
      • putHaplotypesData

         void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId)

        Takes a gamete_grp_id, method_id, a map of haplotype sequences, a chromosome, a genomeFileId and a gvcfFileId. Starts the process of storing table data for the haplotypes to the db. This sets maxEntries to 10000 and calls the overloaded version below: putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries)

      • putHaplotypesData

         void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries)

        Takes a gamete_grp_id, method_id, a map of haplotype sequences, a chromosome, a genomeFileId, a gvcfFileId and a maxEntries value. Starts the process of storing table data for the haplotypes to the db.
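
        The documentation does not say how maxEntries is applied. Assuming it caps the number of map entries handled in one batch (an assumption, not something the source confirms), the chunking could look like the sketch below. BatchSketch and partition are hypothetical names and this is not the PHG implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not PHG code): splits a map into batches of limited size. */
final class BatchSketch {
    private BatchSketch() { }

    /** Returns consecutive sub-maps, each holding at most maxEntries entries. */
    static <K, V> List<Map<K, V>> partition(Map<K, V> entries, int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        List<Map<K, V>> batches = new ArrayList<>();
        Map<K, V> current = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == maxEntries) {
                // Batch is full: store it and start a new one.
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);  // trailing partial batch
        }
        return batches;
    }
}
```

        Bounding the batch size in this way keeps each database round trip and its memory footprint predictable when a chromosome carries many haplotype sequences.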

      • putConsensusSequences

         void putConsensusSequences(Multimap<Position, Tuple<AnchorDataPHG, List<String>>> consensusMap, int method_id)

        This method takes a map of consensus data, finds the anchorIds based on Position, and finds the hapids of the taxa whose sequences at the specified anchorID map to the consensus. It adds the gamete_group and sequence data to the haplotypes table, and adds entries to gamete_groups and gamete_haplotypes.

        Parameters:
        consensusMap - Multimap
      • putHaplotypeCountsData

         void putHaplotypeCountsData(String method, Map<String, String> methodDetails, String taxonName, String fastqFile, Array<byte> counts)

        This method adds data to the haplotype_counts table. The "data" is a Snappy compressed byte buffer of a 3xn array, found in parameter "counts". To see how this data is stored, examine DBLoadingUtils.encodeHapCountsArrayFromFile(), DBLoadingUtils.encodeHapCountsArrayFromMultiset() and DBLoadingUtils.decodeHapCountsArray().

      • putPathsData

         int putPathsData(String method, Map<String, String> methodDetails, String taxon, List<Integer> readMappingIds, Array<byte> pathBytes, boolean isTestMethod)

        This method stores paths data to the paths table.

        Parameters:
        method - Method name for the path determination process
        methodDetails - Details of how these paths were created
        taxon - Name of line for which data is being added
        readMappingIds - List of read_mapping_ids
        pathBytes - Compressed byte array of paths data
        isTestMethod - Indicates if the method type should be PATHS or TEST_PATHS
      • putReadMappingData

         int putReadMappingData(String method, Map<String, String> methodDetails, String taxon, String file_group_name, Array<byte> mapping_data, boolean isTestMethod, int haplotypeListId)

        Takes a method name, method details string, taxon name (should exist in the genotypes table), file_group_name, and a byte array of read mapping data. This is stored to the PHG read_mapping table

        Parameters:
        isTestMethod - indicates if method type should be set to a TEST method
        haplotypeListId - id from the haplotype_list table
      • putHalotypeListData

         int putHalotypeListData(List<Integer> hapids)

        Puts data into the haplotype_list table.

        Parameters:
        hapids - list of integers representing haplotype ids
      • putGenomeFileData

         int putGenomeFileData(String genome_server_path, String genome_file, int genoid, int type)

        Method takes information on a genome fasta file, stores to the PHG db, returns the genome_file_data entry id created for the table entry.

        Parameters:
        genome_file - local path to file name used for MD5 calculation
        genoid - genoid associated with this genome data
        type - the type of file, i.e. FASTA or GVCF from DBLoadingUtils.
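
        The genome_file parameter is described as the local path used for MD5 calculation. How PHG computes that checksum is not shown here. As a general illustration only, an MD5 hex digest of a file's contents can be computed with the standard java.security.MessageDigest API. Md5Sketch is a hypothetical class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch (not PHG code): MD5 hex digest of bytes or of a file. */
final class Md5Sketch {
    private Md5Sketch() { }

    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** Streams the file so large genome files are not loaded into memory at once. */
    static String md5Hex(Path file) {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(md.digest());
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```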
      • putTaxaGroupName

         int putTaxaGroupName(String group_name)

        Creates an entry in the taxa_groups table. If one already exists with the specified name, the id for it is returned.

      • putTaxaTaxaGroups

         void putTaxaTaxaGroups(String group_name, List<String> taxaList)

        Takes a taxa group name and a list of taxa. Populates the taxa_groups and taxa_groups_genoid tables.

      • putAssemblyInterAnchorSequences

         boolean putAssemblyInterAnchorSequences(String line_name, int hapNumber, String method, Multimap<Integer, AnchorDataPHG> anchorSequences)

        Adds inter-anchor sequences for the specified assembly to the anchor_sequences and anchor_haplotypes table. This method takes a multi-map as assembly. Inter-anchors that do not map to a reference inter-anchor are all given the anchorid 0.

      • deletePaths

         void deletePaths(String method, List<String> taxa)

        Deletes paths based on a method name and taxa. It allows for either method or taxa to be null, but not both. Entries are deleted from both the read_mapping_paths and the paths tables.

        Parameters:
        method - delete paths which have this method
        taxa - list of taxa for which the paths should be deleted
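
        The rule that method or taxa may be null, but not both, gives three valid call shapes. The sketch below classifies them and rejects the fourth. PathDeleteScope is a hypothetical type that is not part of PHG, and whether an empty taxa list is treated like null is not documented, so the sketch checks for null only.

```java
import java.util.List;

/** Hypothetical sketch (not PHG code): the three valid argument shapes for a path delete. */
enum PathDeleteScope {
    METHOD_ONLY,       // delete every path created by the method
    TAXA_ONLY,         // delete every path for the listed taxa
    METHOD_AND_TAXA;   // delete paths matching both the method and the taxa

    /** Classifies the arguments; rejects the case where both are null. */
    static PathDeleteScope of(String method, List<String> taxa) {
        if (method == null && taxa == null) {
            throw new IllegalArgumentException("method and taxa cannot both be null");
        }
        if (taxa == null) {
            return METHOD_ONLY;
        }
        if (method == null) {
            return TAXA_ONLY;
        }
        return METHOD_AND_TAXA;
    }
}
```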
      • deleteReadMappingsCascade

         boolean deleteReadMappingsCascade(List<Integer> readMappingIds)

        Deletes read_mappings from the read_mapping table based on the provided ids. This also deletes entries from the read_mapping_paths and paths tables that are associated with these read_mappings.

      • deleteMethodByName

         int deleteMethodByName(String method)

        Takes a method name and deletes the entry for it from the methods table.