-
- All Superinterfaces:
-
net.maizegenetics.pangenome.db_loading.PHGData
public interface PHGDataWriter extends PHGData
-
-
Method Summary
Modifier and Type, Method, Description
abstract boolean putAllAnchors(List<AnchorDataPHG> anchorData, int refGroupMethodID)
Stores the chromosome, start position, and end position to the reference_ranges table. isFocus identifies focus intervals from the user's BED file.
abstract boolean putGenoAndHaploTypeData(GenoHaploData ghData)
Stores required data to the genotypes and haplotypes tables for each entry on the list.
abstract boolean putRefAnchorData(String line_name, int hapnumber, List<AnchorDataPHG> adata, int hapMethod, Set<String> refGroupMethod, String gvcf, String variant_list, int genomeFileId, int gvcfFileId)
Fills in the haplotypes table for the reference ranges.
abstract int putMethod(String name, DBLoadingUtils.MethodType type, Map<String, String> description)
Adds a method, its type, and its description to the anchor_methods table. These are used to identify how sequences were created, how they were combined into consensus sequences, how haplotype counts were scored, how paths through the graph were created, or how an edge was created.
abstract boolean putAssemblyInterAnchorSequences(String line_name, int hapNumber, String method, Multimap<Integer, AnchorDataPHG> anchorSequences)
Adds inter-anchor sequences for the specified assembly to the anchor_sequences and anchor_haplotypes tables.
abstract void putConsensusSequences(Multimap<Position, Tuple<AnchorDataPHG, List<String>>> consensusMap, int methodId)
This method takes a map of consensus data, finds the anchorIds based on Position, and finds the hapids of the taxa whose sequences at the specified anchorID map to the consensus.
abstract boolean putGameteGroupAndHaplotypes(List<String> gametes)
Takes a list of gametes and stores them to the gamete_groups and gamete_haplotypes tables. Skips if this grouping already exists.
abstract void putHaplotypesForGamete(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, int genomeFileId, int gvcfFileId)
Stores gamete sequence data to the haplotypes table. This method associates all entries with the single gamete_grp_id that is passed in.
abstract void putHaplotypesForMultipleGroups(Multimap<Position, Tuple<AnchorDataPHG, String>> mapWithGroupHash, int method_id)
Adds data to the haplotypes table.
abstract void putHaplotypeCountsData(String method, Map<String, String> methodDetails, String taxonName, String fastqFile, Array<byte> counts)
This method adds data to the haplotype_counts table.
abstract int putPathsData(String method, Map<String, String> methodDetails, String taxon, List<Integer> readMappingIds, Array<byte> pathBytes, boolean isTestMethod)
This method stores paths data to the paths table.
abstract void putRefRangeRefRangeMethod(int group_method_id, List<Integer> refRangeList)
Takes a method id and a list of reference ranges.
abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId)
Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome and a genomeFileId.
abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries)
Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome, genomeFileId, and a number of maxEntries.
abstract int putReadMappingData(String method, Map<String, String> methodDetails, String taxon, String file_group_name, Array<byte> mapping_data, boolean isTestMethod, int haplotypeListId)
Takes a method name, method details string, taxon name (should exist in the genotypes table), file_group_name, and a byte array of read mapping data.
abstract void updateReadMappingHash()
This prompts a call to the private method loadReadMappingHash() to update this hash table.
abstract int putGenomeFileData(String genome_path, String genome_file, int genoid, int type)
Takes information on a genome fasta file, stores it to the PHG db, and returns the genome_file_data entry id created for the table entry.
abstract int putTaxaGroupName(String group_name)
Creates an entry in the taxa_groups table.
abstract void putTaxaTaxaGroups(String group_name, List<String> taxaList)
Takes a taxa group name and a list of taxa.
abstract void deleteReadMappingsById(List<Integer> readMappingIds)
Deletes from the read_mapping table based on the ids in the input list.
abstract void deleteReadMappingPathsById(List<Integer> readMappingIds)
Deletes from the read_mapping_paths table the ids in the readMappingIds list.
abstract void deletePaths(String method, List<String> taxa)
Deletes paths based on a method name and taxa.
abstract int deleteMethodByName(String method)
Takes a method name and deletes the entry for it from the methods table.
abstract boolean deleteReadMappingsCascade(List<Integer> readMappingIds)
Deletes read_mappings from the read_mapping table based on provided ids.
abstract int putHalotypeListData(List<Integer> hapids)
Puts data to the haplotype_list table.
Methods inherited from interface net.maizegenetics.pangenome.db_loading.PHGData
getAllTaxaGroupNames, getChromNamesForHaplotype, getDbTaxaNames, getGameteGroupIDFromTaxaList, getGenoidFromLine, getGenomeFileHashFromGenoidandFile, getGenomeFileIdFromGenoid, getGenomeFileIdFromGenoidAndFile, getHapCountsIDAndDataForVersionMethod, getHapCountsIDAndPathsForMethod, getHapidForGenoidHapNumber, getHapidHapNumberLineNamesForLines, getHapidMapFromLinenameHapNumber, getHapidsForGenoid, getHaplotypeIDFromFastaIDLine, getHaplotypeList, getHaplotypeListIDfromHash, getIntervalRangesWithIDForChrom, getLineNameHapNumberFromHapid, getMethodDescriptionFromName, getMethodIdFromName, getPathIdsForReadMappingIds, getPathsForTaxonMethod, getReadMappingId, getReadMappingIdsForTaxaMethod, getReadMappingsForId, getReadMappingsForMethod, getRefRangeIDFromString, getRefRangesForMethod, getTaxaForPathMethod, getTaxaForTaxaGroup, getTaxaGroupIDFromName, getTaxonPathsForMethod, isFileGroupNew
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
-
Method Detail
-
putAllAnchors
abstract boolean putAllAnchors(List<AnchorDataPHG> anchorData, int refGroupMethodID)
Stores the chromosome, start position, and end position to the reference_ranges table. isFocus identifies focus intervals from the user's BED file.
- Parameters:
refGroupMethodID
- method_id used for creating this ref_range_group
-
putGenoAndHaploTypeData
abstract boolean putGenoAndHaploTypeData(GenoHaploData ghData)
Stores required data to the genotypes and haplotypes tables for each entry on the list.
-
putRefAnchorData
abstract boolean putRefAnchorData(String line_name, int hapnumber, List<AnchorDataPHG> adata, int hapMethod, Set<String> refGroupMethod, String gvcf, String variant_list, int genomeFileId, int gvcfFileId)
Fills in the haplotypes table for the reference ranges.
- Parameters:
adata
- Anchor data, including chrom and start/end positions
hapMethod
- Name of method used to create anchors
refGroupMethod
- List of methods used to create the ref_range_group
gvcf
- String - name of gvcf file
variant_list
- String - name of file containing list of variants
genomeFileId
- int - id for ref fasta entry in the genome_file_data table
gvcfFileId
- int - id for ref gvcf file entry in the genome_file_data table
-
putMethod
abstract int putMethod(String name, DBLoadingUtils.MethodType type, Map<String, String> description)
Adds a method, its type, and its description to the anchor_methods table. These are used to identify how sequences were created, how they were combined into consensus sequences, how haplotype counts were scored, how paths through the graph were created, or how an edge was created. The "type" field identifies the table to which the method belongs.
- Parameters:
description
- a map of pluginParameter name to value (as String)
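As an illustration of the description parameter, the following is a minimal sketch of turning typed plugin settings into a name-to-String map. The class MethodDescriptionSketch, its toDescription helper, and the parameter names used in main are hypothetical and are not part of PHG; only the idea that values are passed as Strings comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of PHG: converts typed plugin parameter
// values into the name -> String map shape described for "description".
class MethodDescriptionSketch {

    // Copies each entry, rendering the value with String.valueOf.
    // A TreeMap is used only so the iteration order is predictable.
    static Map<String, String> toDescription(Map<String, Object> pluginParams) {
        Map<String, String> description = new TreeMap<>();
        for (Map.Entry<String, Object> e : pluginParams.entrySet()) {
            description.put(e.getKey(), String.valueOf(e.getValue()));
        }
        return description;
    }

    public static void main(String[] args) {
        // "minTaxa" and "useDepth" are made-up parameter names.
        Map<String, Object> params = new HashMap<>();
        params.put("minTaxa", 2);
        params.put("useDepth", true);
        System.out.println(toDescription(params));
    }
}
```

The resulting map could then be passed as the third argument of putMethod, assuming a PHGDataWriter implementation is available.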
-
putAssemblyInterAnchorSequences
abstract boolean putAssemblyInterAnchorSequences(String line_name, int hapNumber, String method, Multimap<Integer, AnchorDataPHG> anchorSequences)
Adds inter-anchor sequences for the specified assembly to the anchor_sequences and anchor_haplotypes tables. This method takes a multimap because assembly inter-anchors that do not map to a reference inter-anchor are all given the anchorid 0.
-
putConsensusSequences
abstract void putConsensusSequences(Multimap<Position, Tuple<AnchorDataPHG, List<String>>> consensusMap, int methodId)
This method takes a map of consensus data, finds the anchorIds based on Position, and finds the hapids of the taxa whose sequences at the specified anchorID map to the consensus. Adds the gamete_group and sequence data to the haplotypes table; adds entries to gamete_groups and gamete_haplotypes.
- Parameters:
consensusMap
- Multimap of consensus data, keyed by Position
methodId
- method used for collapsing anchors
-
putGameteGroupAndHaplotypes
abstract boolean putGameteGroupAndHaplotypes(List<String> gametes)
Takes a list of gametes and stores them to the gamete_groups and gamete_haplotypes tables. Skips if this grouping already exists.
- Parameters:
gametes
- list consisting of taxa/gamete number in the form taxaName_gameteNumber
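The taxaName_gameteNumber form can be produced with a small helper. This is a sketch only: the class GameteNames is hypothetical, the line name "B73" is illustrative, and numbering gametes from 0 is an assumption that the text above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PHG: builds entries in the
// taxaName_gameteNumber form described for the gametes parameter.
class GameteNames {

    // Joins a taxon name and a gamete number with an underscore.
    static String format(String taxaName, int gameteNumber) {
        if (taxaName == null || taxaName.isEmpty()) {
            throw new IllegalArgumentException("taxaName must be non-empty");
        }
        if (gameteNumber < 0) {
            throw new IllegalArgumentException("gameteNumber must be >= 0");
        }
        return taxaName + "_" + gameteNumber;
    }

    // Builds one entry per gamete number 0..gameteCount-1 (assumed 0-based).
    static List<String> forTaxon(String taxaName, int gameteCount) {
        List<String> gametes = new ArrayList<>();
        for (int i = 0; i < gameteCount; i++) {
            gametes.add(format(taxaName, i));
        }
        return gametes;
    }

    public static void main(String[] args) {
        System.out.println(forTaxon("B73", 2)); // prints [B73_0, B73_1]
    }
}
```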
-
putHaplotypesForGamete
abstract void putHaplotypesForGamete(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, int genomeFileId, int gvcfFileId)
Stores gamete sequence data to the haplotypes table. This method associates all entries with the single gamete_grp_id that is passed in. It is used when loading reference_ranges sequences or haplotype sequences for a single line. The gidToVariantDataMap map is used to create the variant list blob for the db.
- Parameters:
genomeFileId
- genome_file_data table id for the fasta file for this haplotype; -1 means none
gvcfFileId
- genome_file_data table id for the gvcf file for this haplotype; -1 means none
-
putHaplotypesForMultipleGroups
abstract void putHaplotypesForMultipleGroups(Multimap<Position, Tuple<AnchorDataPHG, String>> mapWithGroupHash, int method_id)
Adds data to the haplotypes table. Entries on the map are for different gamete groups. The key is a Position item identifying the genome_interval id. The value is a Tuple consisting of (x) an AnchorDataPHG object with sequence, gvcf, etc.; and (y) a List of taxa represented by the AnchorDataPHG sequence.
- Parameters:
method_id
- Id in the methods table for this group of sequences
-
putHaplotypeCountsData
abstract void putHaplotypeCountsData(String method, Map<String, String> methodDetails, String taxonName, String fastqFile, Array<byte> counts)
This method adds data to the haplotype_counts table. The "data" is a Snappy-compressed byte buffer of a 3xn array, found in parameter "counts". To see how this data is stored, examine DBLoadingUtils.encodeHapCountsArrayFromFile(), DBLoadingUtils.encodeHapCountsArrayFromMultiset() and DBLoadingUtils.decodeHapCountsArray().
-
putPathsData
abstract int putPathsData(String method, Map<String, String> methodDetails, String taxon, List<Integer> readMappingIds, Array<byte> pathBytes, boolean isTestMethod)
This method stores paths data to the paths table.
- Parameters:
method
- Method name for the path determination process
methodDetails
- Details of how these paths were created
taxon
- Name of line for which data is being added
readMappingIds
- List of read_mapping_ids
pathBytes
- Compressed byte array of paths data
isTestMethod
- Indicates if the method type should be PATHS or TEST_PATHS
-
putRefRangeRefRangeMethod
abstract void putRefRangeRefRangeMethod(int group_method_id, List<Integer> refRangeList)
Takes a method id and a list of reference ranges. Populates the ref_range_ref_range_method table.
-
putHaplotypesData
abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId)
Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome and a genomeFileId. Starts the process of storing table data for the haplotypes to the db. This sets maxEntries to 10000 and calls the putHaplotypesData overload below, which takes the same arguments plus maxEntries.
-
putHaplotypesData
abstract void putHaplotypesData(int gamete_grp_id, int method, Map<Integer, AnchorDataPHG> anchorSequences, String chromosome, int genomeFileId, int gvcfFileId, int maxEntries)
Takes a gamete_grp_id, method_id, list of haplotype sequences, a chromosome, genomeFileId, and a number of maxEntries. Starts the process of storing table data for the haplotypes to the db.
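The relationship between the two putHaplotypesData overloads can be sketched as a default-argument delegation. Everything here is a simplified stand-in: the class HaplotypeBatchSketch is hypothetical, the parameter list is reduced to a plain list of sequences, and treating maxEntries as a batch size is an assumption that the text above does not confirm. Only the default of 10000 and the delegation come from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not PHG code: shows a short overload delegating
// to a longer one with a default maxEntries of 10000.
class HaplotypeBatchSketch {
    static final int DEFAULT_MAX_ENTRIES = 10000;

    // Records the size of each batch that would be written.
    final List<Integer> batchSizes = new ArrayList<>();

    // Short form: supplies the default and delegates.
    void putHaplotypesData(List<String> sequences) {
        putHaplotypesData(sequences, DEFAULT_MAX_ENTRIES);
    }

    // Long form: ASSUMES maxEntries caps how many entries go into one batch.
    void putHaplotypesData(List<String> sequences, int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        for (int start = 0; start < sequences.size(); start += maxEntries) {
            int end = Math.min(start + maxEntries, sequences.size());
            // A real writer would store sequences.subList(start, end) here.
            batchSizes.add(end - start);
        }
    }
}
```

Under that assumption, 25 sequences with maxEntries of 10 would be split into batches of 10, 10, and 5, while the short form would send all 25 in one batch.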
-
putReadMappingData
abstract int putReadMappingData(String method, Map<String, String> methodDetails, String taxon, String file_group_name, Array<byte> mapping_data, boolean isTestMethod, int haplotypeListId)
Takes a method name, method details string, taxon name (should exist in the genotypes table), file_group_name, and a byte array of read mapping data. This is stored to the PHG read_mapping table.
- Parameters:
isTestMethod
- indicates if method type should be set to a TEST method
haplotypeListId
- id from the haplotype_list table
-
updateReadMappingHash
abstract void updateReadMappingHash()
This prompts a call to the private method loadReadMappingHash() to update this hash table.
-
putGenomeFileData
abstract int putGenomeFileData(String genome_path, String genome_file, int genoid, int type)
Method takes information on a genome fasta file, stores to the PHG db, returns the genome_file_data entry id created for the table entry.
- Parameters:
genome_path
- external server path and file name for genome
genome_file
- local path to file name used for MD5 calculation
genoid
- genoid associated with this genome data
type
- the type of file, i.e. FASTA or GVCF, from DBLoadingUtils
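Since genome_file is the local file used for the MD5 calculation, the following sketch shows one way to compute an MD5 digest of a file with the standard java.security.MessageDigest API. The class GenomeFileMd5 is hypothetical, and rendering the digest as lowercase hex is an assumption; how PHG itself computes or stores the hash is not stated above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PHG: streams a file through MD5.
class GenomeFileMd5 {

    // Returns the MD5 digest of the file as a lowercase hex string.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            // Read in chunks so large fasta files are not loaded into memory.
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```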
-
putTaxaGroupName
abstract int putTaxaGroupName(String group_name)
Creates an entry in the taxa_groups table. If one already exists with the specified name, the id for it is returned.
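The create-or-return-existing behavior described above can be sketched with an in-memory map standing in for the taxa_groups table. The class TaxaGroupRegistry and its id numbering are hypothetical; a real implementation would query and insert into the database instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the taxa_groups table, not PHG code.
class TaxaGroupRegistry {
    private final Map<String, Integer> idsByName = new LinkedHashMap<>();
    private int nextId = 1; // made-up id scheme for the sketch

    // Returns the existing id for the name, or creates a new entry.
    int putTaxaGroupName(String groupName) {
        Integer existing = idsByName.get(groupName);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        idsByName.put(groupName, id);
        return id;
    }
}
```

Calling putTaxaGroupName twice with the same name returns the same id in this sketch, which mirrors the documented behavior.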
-
putTaxaTaxaGroups
abstract void putTaxaTaxaGroups(String group_name, List<String> taxaList)
Takes a taxa group name and a list of taxa. Populates the taxa_groups and taxa_groups_genoid tables.
-
deleteReadMappingsById
abstract void deleteReadMappingsById(List<Integer> readMappingIds)
Deletes from the read_mapping table based on the ids in the input list.
-
deleteReadMappingPathsById
abstract void deleteReadMappingPathsById(List<Integer> readMappingIds)
Deletes from the read_mapping_paths table the ids in the readMappingIds list.
-
deletePaths
abstract void deletePaths(String method, List<String> taxa)
Deletes paths based on a method name and taxa. Either method or taxa may be null, but not both. Entries are deleted from both the read_mapping_paths and the paths tables.
- Parameters:
method
- delete paths which have this method
taxa
- list of taxa for which the paths should be deleted
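The rule that method or taxa may be null, but not both, can be expressed as a pre-check before calling deletePaths. This is a sketch under assumptions: the class DeletePathsArgs and its Scope enum are hypothetical, and throwing IllegalArgumentException when both are null is an illustrative choice, since the text above does not say how the implementation reacts.

```java
import java.util.List;

// Hypothetical pre-check, not PHG code: classifies which filter a
// deletePaths call would use, rejecting the case where both are null.
class DeletePathsArgs {

    enum Scope { METHOD_ONLY, TAXA_ONLY, METHOD_AND_TAXA }

    static Scope scope(String method, List<String> taxa) {
        if (method == null && taxa == null) {
            throw new IllegalArgumentException("method and taxa cannot both be null");
        }
        if (taxa == null) {
            return Scope.METHOD_ONLY;
        }
        if (method == null) {
            return Scope.TAXA_ONLY;
        }
        return Scope.METHOD_AND_TAXA;
    }
}
```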
-
deleteMethodByName
abstract int deleteMethodByName(String method)
Takes a method name and deletes the entry for it from the methods table.
-
deleteReadMappingsCascade
abstract boolean deleteReadMappingsCascade(List<Integer> readMappingIds)
Deletes read_mappings from the read_mapping table based on the provided ids. This also deletes entries from the read_mapping_paths and paths tables that are associated with these read_mappings.
-
putHalotypeListData
abstract int putHalotypeListData(List<Integer> hapids)
Puts data to the haplotype_list table.
- Parameters:
hapids
- list of integers representing haplotype ids
-