public class CreateASMCoordsUpdateFilesPlugin
This class traverses the db and writes, to csv files, the haplotypes_id, asm_start_coordinate, asm_end_coordinate, asm_strand values. These files will be loaded to the db into a temp table, from which updates will occur for haplotype nodes. This class is a successor to UpdateDBAsmCoordinatesPlugin.kt. There were issues with batching in the original class - only single updates, one at a time, worked.
For this plugin: strand shouldn't change, but it will be uploaded as well. In kotlin/../api/ConvertVariantContextToVariantInfo:determineASMInfo(), the code determines asm coordinates based on the chromosomes being the same and the strand being the same.
This example comes from https://tapoueh.org/blog/2018/07/batch-updates-and-concurrency/
The commands you need to run in the db:
1. Create the temp table via:
   postgres# create temp table batch (haplotypes_id integer, asm_start_coordinate integer, asm_end_coordinate integer, asm_strand varchar(1));
2. Run this plugin to get the csv files for import.
3. You can either load the files separately, or cat them and then load a single file. In the folder where the csv files live:
   > ls -1 *.csv > csvFileList.txt
   > xargs cat < csvFileList.txt > all25files.csv
   Then sort the file to get all the header lines in one place, move a header line back to the top of the file, and delete the other header lines. You can do that manually, or use sort, e.g.
   > sort all25files.csv -t, -k1,1 > all25filesSORTED.csv
   3a. (NOT RUN) In the db again, run the following. This is for dumping data to a csv and then adding it to a different table. NOTE: we aren't doing this - we are getting data from this plugin and loading that.
   phgsmallseq=# \COPY haplotypes(haplotypes_id, asm_start_coordinate, asm_end_coordinate, asm_strand) TO '/Users/lcj34/notes_files/phg_2018/debug/zackFix_mafToGVCF_Oct2022/haplotypesPostgres_just3Cols.csv' with delimiter ',' CSV HEADER;
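The concatenate-then-fix-headers step above can also be done in a single pass with awk, keeping the first file's header and skipping the header line of every subsequent file. This is a sketch using the example file name from above; writing the merged file to the parent directory keeps it out of the *.csv glob if the command is re-run:

```shell
# FNR==1 matches the first line of each input file; NR!=1 means we are past
# the very first line overall, so every header except the first is dropped.
awk 'FNR==1 && NR!=1 {next} {print}' *.csv > ../all25files.csv
```

This avoids the separate sort/move-header cleanup, since only one header line ever reaches the output.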
4. Import data from the csv files from step 2 above into the temp table "batch" created in step 1:
   postgres=# \copy batch from '/workdir/lcj34/zackFixMAFToGvcf_oct2022/asmCSVs/all25files.csv' with csv header delimiter ',';
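Before running the \copy above, a quick sanity check that every data row in the merged file has exactly the four expected columns (haplotypes_id, asm_start_coordinate, asm_end_coordinate, asm_strand) can save a failed import. A sketch, using the example file name from above:

```shell
# Print any data row that does not have exactly 4 comma-separated fields;
# no output means every row matches the 4-column batch table.
awk -F, 'NR>1 && NF!=4' all25files.csv
```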
5. Now you have data in the temp table - transfer it to the haplotypes table (only 6 rows updated in this example as it was smallSeq):
   phgsmallseq=# update haplotypes
       set (asm_start_coordinate, asm_end_coordinate) = (batch.asm_start_coordinate, batch.asm_end_coordinate)
       from batch
       where batch.haplotypes_id = haplotypes.haplotypes_id
         and (haplotypes.asm_start_coordinate, haplotypes.asm_end_coordinate) <> (batch.asm_start_coordinate, batch.asm_end_coordinate);
   UPDATE 6
This will work for postgres - not sure if it will work for sqlite. For the record: I used the methods outlined above to create 13167015 records for potential updates, which resulted in 12828476 updates. It took 10.5 minutes to do this update in maize_2_1 on 11/14/22.
NOTE: the plugin took 5 hours 27 minutes to run on dc01 on Nov 10, 2022 with 86 gvcf files and lots of haplotypes. You want to run on a cbsu machine, not your laptop.
public CreateASMCoordsUpdateFilesPlugin(@Nullable java.awt.Frame parentFrame, boolean isInteractive)
public CreateASMCoordsUpdateFilesPlugin()
@Nullable public net.maizegenetics.plugindef.DataSet processData(@Nullable net.maizegenetics.plugindef.DataSet input)
@NotNull public java.util.List<net.maizegenetics.pangenome.processAssemblyGenomes.CreateASMCoordsUpdateFilesPlugin.UpdateData> findAsmUpdatesForGVCF(@NotNull java.lang.String gvcfFile, @NotNull java.util.Map<java.lang.Integer,net.maizegenetics.pangenome.processAssemblyGenomes.CreateASMCoordsUpdateFilesPlugin.RefRangeData> refIdToRefRangeDataMap)
@Nullable public javax.swing.ImageIcon getIcon()
@NotNull public java.lang.String getButtonName()
@NotNull public java.lang.String getToolTipText()
@NotNull public java.lang.String gvcfDirectory()
Local directory holding all the gvcf files that were processed into haplotypes for the database.
@NotNull public CreateASMCoordsUpdateFilesPlugin gvcfDirectory(@NotNull java.lang.String value)
Set Gvcf Directory. Local directory holding all the gvcf files that were processed into haplotypes for the database.
value - Gvcf Directory
@NotNull public java.lang.String asmCSVdir()
Local directory to which the haplotype update CSV files will be written.
@NotNull public CreateASMCoordsUpdateFilesPlugin asmCSVdir(@NotNull java.lang.String value)
Set asmCSVdir. Local directory to which the haplotype update CSV files will be written.
value - asmCSVdir
@NotNull public java.lang.String configFile()
Config file with parameters for database connection.
@NotNull public CreateASMCoordsUpdateFilesPlugin configFile(@NotNull java.lang.String value)
Set Config File. Config file with parameters for database connection.
value - Config File
public int queueSize()
Size of Queue used to pass information to the DB writing thread. Increase this number to have better thread utilization at the expense of RAM. If you are running into Java heap Space/RAM issues and cannot use a bigger machine, decrease this parameter.
@NotNull public CreateASMCoordsUpdateFilesPlugin queueSize(int value)
Set Queue Size. Size of Queue used to pass information to the DB writing thread. Increase this number to have better thread utilization at the expense of RAM. If you are running into Java heap Space/RAM issues and cannot use a bigger machine, decrease this parameter.
value - Queue Size
public int numThreads()
Number of threads used to upload. The GVCF upload will subtract 2 from this number to have the number of worker threads. It leaves 1 thread for IO to the DB and 1 thread for the Operating System.
@NotNull public CreateASMCoordsUpdateFilesPlugin numThreads(int value)
Set Num Threads. Number of threads used to upload. The GVCF upload will subtract 2 from this number to have the number of worker threads. It leaves 1 thread for IO to the DB and 1 thread for the Operating System.
value
- Num Threads