Class DatasetProfile

java.lang.Object
com.whylogs.core.DatasetProfile
All Implemented Interfaces:
java.io.Serializable

public class DatasetProfile
extends java.lang.Object
implements java.io.Serializable
Represents a dataset profile, which tracks data per column as ColumnProfile objects.
See Also:
Serialized Form
  • Constructor Summary

    Constructors
    Constructor Description
    DatasetProfile​(@NonNull java.lang.String sessionId, @NonNull java.time.Instant sessionTimestamp, @NonNull java.util.Map<java.lang.String,​java.lang.String> tags)
    Creates a new DatasetProfile.
    DatasetProfile​(@NonNull java.lang.String sessionId, @NonNull java.time.Instant sessionTimestamp, java.time.Instant dataTimestamp, @NonNull java.util.Map<java.lang.String,​java.lang.String> tags, @NonNull java.util.Map<java.lang.String,​ColumnProfile> columns)
    DEVELOPER API.
    DatasetProfile​(java.lang.String sessionId, java.time.Instant sessionTimestamp)  
  • Method Summary

    Modifier and Type Method Description
    static DatasetProfile fromProtobuf​(com.whylogs.core.message.DatasetProfileMessage message)  
    java.util.Map<java.lang.String,​ColumnProfile> getColumns()  
    ModelProfile getModelProfile()  
    DatasetProfile merge​(@NonNull DatasetProfile other)
    Merge the data of another DatasetProfile into this one.
    DatasetProfile mergeStrict​(@NonNull DatasetProfile other)  
    static DatasetProfile parse​(java.io.InputStream in)  
    byte[] toBytes()  
    java.util.Iterator<com.whylogs.core.message.MessageSegment> toChunkIterator()  
    com.whylogs.core.message.DatasetProfileMessage.Builder toProtobuf()  
    com.whylogs.core.message.DatasetSummary toSummary()  
    void track​(java.lang.String columnName, java.lang.Object data)  
    void track​(java.util.Map<java.lang.String,​?> columns)  
    DatasetProfile withAllMetadata​(java.util.Map<java.lang.String,​java.lang.String> metadata)  
    DatasetProfile withMetadata​(java.lang.String key, java.lang.String value)  
    DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target)  
    DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target, java.lang.Iterable<java.lang.String> additionalOutputFields)  
    DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target, java.lang.String score)  
    DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target, java.lang.String score, java.lang.Iterable<java.lang.String> additionalOutputFields)
    Returns a new dataset profile with the same backing data structure.
    void writeTo​(java.io.OutputStream out)  

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • DatasetProfile

      public DatasetProfile(@NonNull java.lang.String sessionId, @NonNull java.time.Instant sessionTimestamp, @Nullable java.time.Instant dataTimestamp, @NonNull java.util.Map<java.lang.String,java.lang.String> tags, @NonNull java.util.Map<java.lang.String,ColumnProfile> columns)
      DEVELOPER API. DO NOT USE DIRECTLY
      Parameters:
      sessionId - dataset name
      sessionTimestamp - the timestamp for the current profiling session
      dataTimestamp - the timestamp for the dataset. Used to aggregate across different cadences
      tags - tags of the dataset
      columns - the column profiles to adopt. The caller should stop using these column objects after passing them in, because they will back this DatasetProfile directly
    • DatasetProfile

      public DatasetProfile(@NonNull java.lang.String sessionId, @NonNull java.time.Instant sessionTimestamp, @NonNull java.util.Map<java.lang.String,java.lang.String> tags)
      Creates a new DatasetProfile.
      Parameters:
      sessionId - the name of the dataset profile
      sessionTimestamp - the timestamp for this run
      tags - the tags to track the dataset with
    • DatasetProfile

      public DatasetProfile​(java.lang.String sessionId, java.time.Instant sessionTimestamp)
  • Method Details

    • getColumns

      public java.util.Map<java.lang.String,​ColumnProfile> getColumns()
    • getModelProfile

      public ModelProfile getModelProfile()
    • withMetadata

      public DatasetProfile withMetadata​(java.lang.String key, java.lang.String value)
    • withAllMetadata

      public DatasetProfile withAllMetadata​(java.util.Map<java.lang.String,​java.lang.String> metadata)
    • track

      public void track​(java.lang.String columnName, java.lang.Object data)
    • track

      public void track​(java.util.Map<java.lang.String,​?> columns)
    • withModelProfile

      public DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target, java.lang.String score, java.lang.Iterable<java.lang.String> additionalOutputFields)
      Returns a new dataset profile with the same backing data structure. Unlike this profile, the new object also contains a ClassificationMetrics object.
      Returns:
      a new DatasetProfile object
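
The phrase "a new dataset profile with the same backing data structure" can be illustrated with a toy model. The sketch below is not whylogs code: SharedBackingSketch, withPrediction, and columnCounts are hypothetical names, and the map merely stands in for the column profiles. It assumes that "same backing data structure" means the new object reuses the same underlying instance rather than a copy.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "new object, same backing data structure".
// All names here are hypothetical; this is not the whylogs implementation.
public class SharedBackingSketch {
    public final Map<String, Long> columnCounts; // stands in for the column profiles
    public final String predictionField;         // stands in for model-related state

    public SharedBackingSketch(Map<String, Long> columnCounts, String predictionField) {
        this.columnCounts = columnCounts;
        this.predictionField = predictionField;
    }

    // Returns a new wrapper that reuses the same map instance (no copy is made).
    public SharedBackingSketch withPrediction(String field) {
        return new SharedBackingSketch(this.columnCounts, field);
    }

    // Increments the count for one column in the shared map.
    public void track(String column) {
        columnCounts.merge(column, 1L, Long::sum);
    }

    public static void main(String[] args) {
        SharedBackingSketch base = new SharedBackingSketch(new HashMap<>(), null);
        SharedBackingSketch withModel = base.withPrediction("prediction");
        withModel.track("age");
        // The write made through the new object is visible through the old one.
        System.out.println(base.columnCounts.get("age")); // 1
    }
}
```

Under that assumption, data tracked through either object shows up in both, so in practice it is probably safest to keep using only the returned profile.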
    • withModelProfile

      public DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target, java.lang.String score)
    • withModelProfile

      public DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target)
    • withModelProfile

      public DatasetProfile withModelProfile​(java.lang.String prediction, java.lang.String target, java.lang.Iterable<java.lang.String> additionalOutputFields)
    • toSummary

      public com.whylogs.core.message.DatasetSummary toSummary()
    • toChunkIterator

      public java.util.Iterator<com.whylogs.core.message.MessageSegment> toChunkIterator()
    • mergeStrict

      public DatasetProfile mergeStrict(@NonNull DatasetProfile other)
    • merge

      public DatasetProfile merge(@NonNull DatasetProfile other)
      Merge the data of another DatasetProfile into this one.

      Only the shared tags and shared metadata are retained. The timestamps are copied from this dataset profile. The caller is responsible for ensuring that the two profiles match on important grouping information.

      Parameters:
      other - a DatasetProfile
      Returns:
      a merged DatasetProfile with summed up columns
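
The tag handling described for merge can be sketched with plain maps. This is a minimal illustration, not the whylogs implementation: SharedTagsSketch and retainShared are hypothetical names, and the sketch assumes that a "shared" tag is one where both profiles hold the same key with an equal value.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the tag handling described for merge();
// not the whylogs implementation.
public class SharedTagsSketch {
    // Keeps an entry only when both maps hold the same key with an equal value.
    // This reading of "shared" is an assumption.
    public static Map<String, String> retainShared(Map<String, String> mine,
                                                   Map<String, String> theirs) {
        Map<String, String> shared = new HashMap<>();
        for (Map.Entry<String, String> e : mine.entrySet()) {
            String value = e.getValue();
            if (value != null && value.equals(theirs.get(e.getKey()))) {
                shared.put(e.getKey(), value);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("env", "prod");
        a.put("region", "us-east");
        Map<String, String> b = new HashMap<>();
        b.put("env", "prod");
        b.put("region", "eu-west");
        // "region" differs between the two maps, so only "env" survives.
        System.out.println(retainShared(a, b)); // {env=prod}
    }
}
```

If this reading holds, any tag that distinguishes the two profiles is dropped from the merged result, which is why matching the profiles on grouping information beforehand matters.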
    • toProtobuf

      public com.whylogs.core.message.DatasetProfileMessage.Builder toProtobuf()
    • writeTo

      public void writeTo​(java.io.OutputStream out) throws java.io.IOException
      Throws:
      java.io.IOException
    • toBytes

      public byte[] toBytes() throws java.io.IOException
      Throws:
      java.io.IOException
    • fromProtobuf

      public static DatasetProfile fromProtobuf​(com.whylogs.core.message.DatasetProfileMessage message)
    • parse

      public static DatasetProfile parse​(java.io.InputStream in) throws java.io.IOException
      Throws:
      java.io.IOException