Interface PHGDataWriter

  • All Implemented Interfaces:
    net.maizegenetics.pangenome.db_loading.PHGData

    
    public interface PHGDataWriter
     implements PHGData
                        
    • Method Summary

      Modifier and Type Method Description
      abstract boolean putAllAnchors(List<AnchorDataPHG> anchorData, int refGroupMethodID) Stores chrom, start pos, and end pos to the reference_ranges table. isFocus identifies focus intervals from the user's BED file.
      abstract boolean putGenoAndHaploTypeData(GenoHaploData ghData) Stores required data to the genotypes and haplotypes tables for each entry on the list.
      abstract boolean putRefAnchorData(String line_name, int hapnumber, List<AnchorDataPHG> adata, int hapMethod, Set<String> refGroupMethod, String gvcf, String variant_list, int genomeFileId, int gvcfFileId) Fills in the haplotypes table for the reference ranges.
      abstract int putMethod(String name, DBLoadingUtils.MethodType type, Map<String, String> description) Adds a method, its type, and its description to the anchor_methods table. These are used to identify how sequences were created, how they were combined into consensus sequences, how haplotype counts were scored, how paths through the graph were created, or how an edge was created.
      abstract boolean putAssemblyInterAnchorSequences(String line_name, int hapNumber, String method, Multimap<Integer, AnchorDataPHG> anchorSequences) Adds inter-anchor sequences for the specified assembly to the anchor_sequences and anchor_haplotypes table.
      abstract void putConsensusSequences(Multimap<Position, Tuple<AnchorDataPHG, List<String>>> consensusMap, int methodId) This method takes a map of consensus data, finds the anchorIds based on Position, and finds the hapids of the taxa whose sequences at the specified anchorID map to the consensus.
      abstract boolean putGameteGroupAndHaplotypes(List<String> gametes) Takes a list of gametes and stores them to the gamete_groups and gamete_haplotypes tables. Skips if this grouping already exists.
      abstract void putHaplotypesForGamete(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, int genomeFileId, int gvcfFileId) Stores gamete sequence data to the haplotypes table. This method associates all entries with the single gamete_grp_id which is passed in.
      abstract void putHaplotypesForMultipleGroups(Multimap<Position, Tuple<AnchorDataPHG, String>> mapWithGroupHash, int method_id) Add data to the haplotypes table.
      abstract void putHaplotypeCountsData(String method, Map<String, String> methodDetails, String taxonName, String fastqFile, Array<byte> counts) This method adds data to the haplotype_counts table.
      abstract int putPathsData(String method, Map<String, String> methodDetails, String taxon, List<Integer> readMappingIds, Array<byte> pathBytes, boolean isTestMethod) This method stores paths data to the paths table.
      abstract void putRefRangeRefRangeMethod(int group_method_id, List<Integer> refRangeList) Takes a method id and a list of reference ranges.
      abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId) Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome and a genomeFileId.
      abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries) Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome, genomeFileId, and a maxEntries value.
      abstract int putReadMappingData(String method, Map<String, String> methodDetails, String taxon, String file_group_name, Array<byte> mapping_data, boolean isTestMethod, int haplotypeListId) Takes a method name, method details string, taxon name (should exist in the genotypes table), file_group_name, and a byte array of read mapping data.
      abstract void updateReadMappingHash() This prompts a call to private method loadReadMappingHash() to update this hash table
      abstract int putGenomeFileData(String genome_path, String genome_file, int genoid, int type) Method takes information on a genome fasta file, stores to the PHG db, returns the genome_file_data entry id created for the table entry.
      abstract int putTaxaGroupName(String group_name) Creates an entry in the taxa_groups table.
      abstract void putTaxaTaxaGroups(String group_name, List<String> taxaList) Takes a taxa group name and a list of taxa.
      abstract void deleteReadMappingsById(List<Integer> readMappingIds) Deletes from the read_mapping table based on the ids in the input List
      abstract void deleteReadMappingPathsById(List<Integer> readMappingIds) Deletes from the read_mapping_paths table the entries whose ids are in the readMappingIds list.
      abstract void deletePaths(String method, List<String> taxa) Deletes paths based on a method name and taxa.
      abstract int deleteMethodByName(String method) Takes a method name and deletes the entry for it from the methods table.
      abstract boolean deleteReadMappingsCascade(List<Integer> readMappingIds) Delete read_mappings from the read_mapping table based on provided ids.
      abstract int putHalotypeListData(List<Integer> hapids) Puts data to the haplotype_list table.
      • Methods inherited from class net.maizegenetics.pangenome.db_loading.PHGData

        getAllTaxaGroupNames, getChromNamesForHaplotype, getDbTaxaNames, getGameteGroupIDFromTaxaList, getGenoidFromLine, getGenomeFileHashFromGenoidandFile, getGenomeFileIdFromGenoid, getGenomeFileIdFromGenoidAndFile, getHapCountsIDAndDataForVersionMethod, getHapCountsIDAndPathsForMethod, getHapidForGenoidHapNumber, getHapidHapNumberLineNamesForLines, getHapidMapFromLinenameHapNumber, getHapidsForGenoid, getHaplotypeIDFromFastaIDLine, getHaplotypeList, getHaplotypeListIDfromHash, getIntervalRangesWithIDForChrom, getLineNameHapNumberFromHapid, getMethodDescriptionFromName, getMethodIdFromName, getPathIdsForReadMappingIds, getPathsForTaxonMethod, getReadMappingId, getReadMappingIdsForTaxaMethod, getReadMappingsForId, getReadMappingsForMethod, getRefRangeIDFromString, getRefRangesForMethod, getTaxaForPathMethod, getTaxaForTaxaGroup, getTaxaGroupIDFromName, getTaxonPathsForMethod, isFileGroupNew
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • putAllAnchors

         abstract boolean putAllAnchors(List<AnchorDataPHG> anchorData, int refGroupMethodID)

        Stores chrom, start pos, and end pos to the reference_ranges table. isFocus identifies focus intervals from the user's BED file.

        Parameters:
        refGroupMethodID - method_id used for creating this ref_range_group
      • putRefAnchorData

         abstract boolean putRefAnchorData(String line_name, int hapnumber, List<AnchorDataPHG> adata, int hapMethod, Set<String> refGroupMethod, String gvcf, String variant_list, int genomeFileId, int gvcfFileId)

        Fills in the haplotypes table for the reference ranges.

        Parameters:
        adata - Anchor data, including chrom, start/end positions
        hapMethod - Name of method used to create anchors.
        refGroupMethod - Set of methods used to create the ref_range_group
        gvcf - String - name of gvcf file
        variant_list - String - name of file containing list of variants
        genomeFileId - int - id for ref fasta entry in the genome_file_data table
        gvcfFileId - int - id for ref gvcf file entry in the genome_file_data table
      • putMethod

         abstract int putMethod(String name, DBLoadingUtils.MethodType type, Map<String, String> description)

        Adds a method, its type, and its description to the anchor_methods table. These are used to identify how sequences were created, how they were combined into consensus sequences, how haplotype counts were scored, how paths through the graph were created, or how an edge was created. The "type" field identifies the table to which the method belongs.

        Parameters:
        description - a map of pluginParameter name to value (as String)
      • putAssemblyInterAnchorSequences

         abstract boolean putAssemblyInterAnchorSequences(String line_name, int hapNumber, String method, Multimap<Integer, AnchorDataPHG> anchorSequences)

        Adds inter-anchor sequences for the specified assembly to the anchor_sequences and anchor_haplotypes table. This method takes a multi-map as assembly. Inter-anchors that do not map to a reference inter-anchor are all given the anchorid 0.

      • putConsensusSequences

         abstract void putConsensusSequences(Multimap<Position, Tuple<AnchorDataPHG, List<String>>> consensusMap, int methodId)

        This method takes a map of consensus data, finds the anchorIds based on Position, and finds the hapids of the taxa whose sequences at the specified anchorID map to the consensus. Adds the gamete_group and sequence data to the haplotypes table; adds entries to gamete_groups and gamete_haplotypes.

        Parameters:
        consensusMap - Multimap
        methodId - method used for collapsing anchors
      • putGameteGroupAndHaplotypes

         abstract boolean putGameteGroupAndHaplotypes(List<String> gametes)

        Takes a list of gametes and stores them to the gamete_groups and gamete_haplotypes tables. Skips if this grouping already exists.

        Parameters:
        gametes - list consisting of taxa/gamete number in the form taxaName_gameteNumber
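        The taxaName_gameteNumber form can be illustrated with a small parsing helper. This is a sketch only: GameteName and parse are hypothetical names that do not exist in PHG, the sample names in the usage note are illustrative, and splitting on the last underscore is an assumption made so that taxa names that themselves contain underscores still parse.

```java
/** Hypothetical helper, not part of PHG: splits "taxaName_gameteNumber" strings. */
final class GameteName {
    final String taxon;
    final int gameteNumber;

    private GameteName(String taxon, int gameteNumber) {
        this.taxon = taxon;
        this.gameteNumber = gameteNumber;
    }

    /**
     * ASSUMPTION: the gamete number is whatever follows the LAST underscore.
     * Throws IllegalArgumentException if either part is missing or the
     * trailing part is not an integer.
     */
    static GameteName parse(String gamete) {
        int idx = gamete.lastIndexOf('_');
        if (idx <= 0 || idx == gamete.length() - 1) {
            throw new IllegalArgumentException(
                "Expected taxaName_gameteNumber, got: " + gamete);
        }
        String taxon = gamete.substring(0, idx);
        int number = Integer.parseInt(gamete.substring(idx + 1));
        return new GameteName(taxon, number);
    }
}
```

        For example, GameteName.parse("B73_0") would yield taxon "B73" and gamete number 0 under this assumption.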
      • putHaplotypesForGamete

         abstract void putHaplotypesForGamete(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, int genomeFileId, int gvcfFileId)

        Stores gamete sequence data to the haplotypes table. This method associates all entries with the single gamete_grp_id which is passed in. It is used when loading reference_ranges sequences or haplotype sequences for a single line. The gidToVariantDataMap map is used to create the variant list blob for the db.

        Parameters:
        genomeFileId - genome_file_data table id for the fasta file for this haplotype; -1 means none
        gvcfFileId - genome_file_data table id for the gvcf file for this haplotype; -1 means none
      • putHaplotypesForMultipleGroups

         abstract void putHaplotypesForMultipleGroups(Multimap<Position, Tuple<AnchorDataPHG, String>> mapWithGroupHash, int method_id)

        Adds data to the haplotypes table. Entries on the map are for different gamete groups. The key is a Position item identifying the genome_interval id. The value is a Tuple consisting of (x) an AnchorDataPHG object with sequence, gvcf, etc.; and (y) a List of taxa represented by the AnchorDataPHG sequence.

        Parameters:
        method_id - Id in the methods table for this group of sequences
      • putHaplotypeCountsData

         abstract void putHaplotypeCountsData(String method, Map<String, String> methodDetails, String taxonName, String fastqFile, Array<byte> counts)

        This method adds data to the haplotype_counts table. The "data" is a Snappy-compressed byte buffer of a 3xn array, found in parameter "counts". To see how this data is stored, examine DBLoadingUtils.encodeHapCountsArrayFromFile(), DBLoadingUtils.encodeHapCountsArrayFromMultiset() and DBLoadingUtils.decodeHapCountsArray().

      • putPathsData

         abstract int putPathsData(String method, Map<String, String> methodDetails, String taxon, List<Integer> readMappingIds, Array<byte> pathBytes, boolean isTestMethod)

        This method stores paths data to the paths table.

        Parameters:
        method - Method name for the path determination process
        methodDetails - Details of how these paths were created
        taxon - Name of line for which data is being added
        readMappingIds - List of read_mapping_ids
        pathBytes - Compressed byte array of paths data
        isTestMethod - Indicates whether the method type should be PATHS or TEST_PATHS
      • putRefRangeRefRangeMethod

         abstract void putRefRangeRefRangeMethod(int group_method_id, List<Integer> refRangeList)

        Takes a method id and a list of reference ranges. Populates the ref_range_ref_range_method table.

      • putHaplotypesData

         abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId)

        Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome and a genomeFileId. Starts the process of storing table data for the haplotypes to the db. This sets maxEntries to 10000 and calls the putHaplotypesData version below: putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries).

      • putHaplotypesData

         abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries)

        Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome, genomeFileId, and a maxEntries value. Starts the process of storing table data for the haplotypes to the db.
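
        The relationship between the two putHaplotypesData overloads can be sketched as a default-argument delegation. This is not the PHG implementation: HaplotypeBatchSketch and batches are hypothetical names, String stands in for AnchorDataPHG, and the idea that maxEntries caps the number of entries written per batch is an assumption that this page does not confirm. Only the default of 10000 comes from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not PHG code. String stands in for AnchorDataPHG. */
final class HaplotypeBatchSketch {
    /** Default named in the documentation of the shorter overload. */
    static final int DEFAULT_MAX_ENTRIES = 10000;

    /** Shorter overload: supplies the default and delegates. */
    static List<Map<Integer, String>> batches(Map<Integer, String> anchorSequences) {
        return batches(anchorSequences, DEFAULT_MAX_ENTRIES);
    }

    /** ASSUMPTION: maxEntries caps how many entries go into one batch. */
    static List<Map<Integer, String>> batches(Map<Integer, String> anchorSequences,
                                              int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        List<Map<Integer, String>> result = new ArrayList<>();
        Map<Integer, String> current = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : anchorSequences.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == maxEntries) {
                // Batch is full: hand it off and start a new one.
                result.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // final, partially filled batch
        }
        return result;
    }
}
```

        Keeping the limit as an explicit parameter on the longer overload lets a caller tune it, while the shorter overload keeps call sites that do not care about it simple.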

      • putReadMappingData

         abstract int putReadMappingData(String method, Map<String, String> methodDetails, String taxon, String file_group_name, Array<byte> mapping_data, boolean isTestMethod, int haplotypeListId)

        Takes a method name, method details string, taxon name (should exist in the genotypes table), file_group_name, and a byte array of read mapping data. This is stored to the PHG read_mapping table.

        Parameters:
        isTestMethod - indicates whether the method type should be set to a TEST method
        haplotypeListId - id from the haplotype_list table
      • updateReadMappingHash

         abstract void updateReadMappingHash()

        This prompts a call to private method loadReadMappingHash() to update this hash table

      • putGenomeFileData

         abstract int putGenomeFileData(String genome_path, String genome_file, int genoid, int type)

        Method takes information on a genome fasta file, stores to the PHG db, returns the genome_file_data entry id created for the table entry.

        Parameters:
        genome_path - external server path and file name for genome
        genome_file - local path to file name used for MD5 calculation
        genoid - genoid associated with this genome data
        type - the type of file, i.e., FASTA or GVCF, as defined in DBLoadingUtils
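
        The genome_file parameter is described as the local file used for MD5 calculation. A minimal sketch of computing such a digest with the standard library follows. GenomeFileMd5Sketch and md5Hex are hypothetical names, and how PHG itself computes, encodes, or stores the digest is not stated on this page; the lowercase hex encoding here is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not part of PHG. */
final class GenomeFileMd5Sketch {
    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // feed only the bytes actually read
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

        Streaming in fixed-size chunks avoids loading a whole genome fasta into memory.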
      • putTaxaGroupName

         abstract int putTaxaGroupName(String group_name)

        Creates an entry in the taxa_groups table. If one already exists with the specified name, the id for it is returned.
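
        The create-or-return-existing behavior described above can be sketched with an in-memory map standing in for the taxa_groups table. TaxaGroupsSketch is a hypothetical class, not PHG code, and the id numbering (starting at 1 and increasing) is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the taxa_groups table; not PHG code. */
final class TaxaGroupsSketch {
    private final Map<String, Integer> idsByName = new LinkedHashMap<>();
    private int nextId = 1; // ASSUMPTION: ids start at 1 and increase

    /** Returns the existing id for groupName, or creates a new entry. */
    int putTaxaGroupName(String groupName) {
        Integer existing = idsByName.get(groupName);
        if (existing != null) {
            return existing; // name already present: no new entry is made
        }
        int id = nextId++;
        idsByName.put(groupName, id);
        return id;
    }

    /** Number of distinct group names stored so far. */
    int size() {
        return idsByName.size();
    }
}
```

        Calling the method twice with the same name returns the same id both times, so callers do not need to check for existence first.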

      • putTaxaTaxaGroups

         abstract void putTaxaTaxaGroups(String group_name, List<String> taxaList)

        Takes a taxa group name and a list of taxa. Populates the taxa_groups and taxa_groups_genoid tables.

      • deletePaths

         abstract void deletePaths(String method, List<String> taxa)

        Deletes paths based on a method name and taxa. Either method or taxa may be null, but not both. Entries are deleted from both the read_mapping_paths and the paths tables.

        Parameters:
        method - delete paths which have this method
        taxa - list of taxa for which the paths should be deleted
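        The null-handling rule for deletePaths can be sketched as an argument check. DeletePathsArgsSketch, Filter, and classify are hypothetical names, and throwing IllegalArgumentException when both arguments are null is an assumption: this page does not say how an implementation reports that case.

```java
import java.util.List;

/** Hypothetical argument check, not PHG code. */
final class DeletePathsArgsSketch {
    /** Which criteria a deletion would filter on. */
    enum Filter { METHOD_ONLY, TAXA_ONLY, METHOD_AND_TAXA }

    /** ASSUMPTION: rejecting the both-null case with IllegalArgumentException. */
    static Filter classify(String method, List<String> taxa) {
        if (method == null && taxa == null) {
            throw new IllegalArgumentException(
                "method and taxa must not both be null");
        }
        if (taxa == null) {
            return Filter.METHOD_ONLY;
        }
        if (method == null) {
            return Filter.TAXA_ONLY;
        }
        return Filter.METHOD_AND_TAXA;
    }
}
```

        Validating before touching the database keeps an accidental call with two nulls from being interpreted as "delete everything".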
      • deleteMethodByName

         abstract int deleteMethodByName(String method)

        Takes a method name and deletes the entry for it from the methods table.

      • deleteReadMappingsCascade

         abstract boolean deleteReadMappingsCascade(List<Integer> readMappingIds)

        Deletes read_mappings from the read_mapping table based on the provided ids. This also deletes entries from the read_mapping_paths and paths tables that are associated with these read_mappings.

      • putHalotypeListData

         abstract int putHalotypeListData(List<Integer> hapids)

        Puts data to the haplotype_list table.

        Parameters:
        hapids - list of integers representing haplotype ids