public class Frame extends Lockable<Frame>
A collection of named Vecs, essentially an R-like Distributed Data Frame.
Frames represent a large distributed 2-D table with named columns
(Vecs) and numbered rows. A reasonable column limit is
100K columns, but there's no hard-coded limit. There's no real row
limit except memory; Frames (and Vecs) with many billions of rows are used
routinely.
A Frame is a collection of named Vecs; a Vec is a collection of numbered
Chunks. A Frame is small and cheaply and easily manipulated, so it is
commonly passed by value. It exists on one node, and may be
stored in the DKV. Vecs, on the other hand, must be stored in the
DKV, as they represent the shared common management state for a collection
of distributed Chunks.
Multiple Frames can reference the same Vecs, although this sharing can
make Vec lifetime management complex. Temporary Frames are commonly used
to work with a subset of some other Frame (often during algorithm
execution, when some columns are dropped from the modeling process). The
temporary Frame can simply be ignored, allowing the normal GC process to
reclaim it. Such temp Frames usually have a null key.
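The sharing described above can be pictured with a toy model. The sketch below is not H2O code: `ColumnSketch` and `FrameHeaderSketch` are hypothetical stand-ins that only mimic the idea of a Frame as a small header (optional key, names, column references) over column data it does not own.

```java
import java.util.Arrays;

// Hypothetical stand-in for a Vec: just one column of values.
final class ColumnSketch {
    final double[] data;
    ColumnSketch(double... data) { this.data = data; }
}

// Hypothetical stand-in for a Frame: an optional key, names, and column references.
final class FrameHeaderSketch {
    final String key;            // null marks a temporary frame in this sketch
    final String[] names;
    final ColumnSketch[] cols;

    FrameHeaderSketch(String key, String[] names, ColumnSketch[] cols) {
        if (names.length != cols.length)
            throw new IllegalArgumentException("names and cols must have equal length");
        this.key = key;
        this.names = names;
        this.cols = cols;
    }

    // Builds a key-less temporary frame over a subset of columns.
    // Only references are copied, so the column data stays shared.
    FrameHeaderSketch tempSubset(String... wanted) {
        ColumnSketch[] picked = new ColumnSketch[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            int idx = Arrays.asList(names).indexOf(wanted[i]);
            if (idx < 0) throw new IllegalArgumentException("no column named " + wanted[i]);
            picked[i] = cols[idx];
        }
        return new FrameHeaderSketch(null, wanted.clone(), picked);
    }

    public static void main(String[] args) {
        ColumnSketch id = new ColumnSketch(1, 2, 3);
        ColumnSketch age = new ColumnSketch(30, 41, 27);
        FrameHeaderSketch full = new FrameHeaderSketch(
            "myKey", new String[] {"id", "age"}, new ColumnSketch[] {id, age});
        FrameHeaderSketch temp = full.tempSubset("age");
        // The same column object is reachable from both frames.
        System.out.println(temp.key == null && temp.cols[0] == full.cols[1]);
    }
}
```

Because both headers point at the same column object, dropping the temporary header costs nothing, while deleting the column itself would affect every frame that still references it. That is the lifetime question the paragraph above warns about.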
All the Vecs in a Frame belong to the same Vec.VectorGroup, which
then enforces Chunk row alignment across Vecs (or at least enforces
a low-cost access model). Parallel and distributed execution touching all
the data in a Frame relies on this alignment to get good performance.
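To illustrate why shared chunk boundaries matter, the following sketch models a row-to-chunk lookup. It is a toy, not H2O's implementation: the class name `ChunkLayoutSketch` and the `starts` array are assumptions made for this example. If every column in a group uses the same layout, row `r` falls in the same chunk index for all of them, which is what makes row-wise access cheap in this model.

```java
import java.util.Arrays;

// Hypothetical model of a chunk layout shared by all columns in a group.
// starts[i] is the first row of chunk i; the final entry is the total row count.
// The values are assumed to be strictly increasing (no empty chunks).
final class ChunkLayoutSketch {
    final long[] starts;

    ChunkLayoutSketch(long... starts) { this.starts = starts.clone(); }

    int numChunks() { return starts.length - 1; }

    // Binary search for the chunk that holds the given row.
    int chunkIndexForRow(long row) {
        if (row < 0 || row >= starts[starts.length - 1])
            throw new IndexOutOfBoundsException("row " + row);
        int pos = Arrays.binarySearch(starts, row);
        // A miss returns -(insertionPoint) - 1; the owning chunk is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Offset of the row inside its chunk.
    int offsetInChunk(long row) {
        return (int) (row - starts[chunkIndexForRow(row)]);
    }

    public static void main(String[] args) {
        ChunkLayoutSketch layout = new ChunkLayoutSketch(0, 4, 8, 10);
        long row = 9;
        System.out.println("row " + row + " is in chunk " + layout.chunkIndexForRow(row)
            + " at offset " + layout.offsetInChunk(row));
    }
}
```

With a single layout per group, one lookup answers the question for every column at once. Columns with different layouts would each need their own lookup, and the matching chunks might not sit together.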
Example: Make a Frame from a CSV file:

    File file = ...
    NFSFileVec nfs = NFSFileVec.make(file); // NFS-backed Vec, lazily read on demand
    Frame fr = water.parser.ParseDataset.parse(Key.make("myKey"),nfs._key);

Example: Find and remove the Vec called "unique_id" from the Frame, since modeling with a unique_id can lead to overfitting:

    Vec uid = fr.remove("unique_id");

Example: Move the response column to the last position:

    fr.add("response",fr.remove("response"));
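The remove-then-add idiom in the last two examples can be tried without a running cluster using a toy stand-in. The class below is hypothetical and is not the H2O `Frame`: it keeps parallel lists of names and columns and copies only the shape of `add(name, vec)`, `remove(name)` and `find(name)`. Returning null from `remove` for a missing name is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy frame: parallel lists of column names and column data.
final class NamedColumnsSketch {
    private final List<String> names = new ArrayList<>();
    private final List<double[]> cols = new ArrayList<>();

    // Appends a named column and returns it.
    double[] add(String name, double[] col) {
        names.add(name);
        cols.add(col);
        return col;
    }

    // Column index for a name, or -1 if missing.
    int find(String name) { return names.indexOf(name); }

    // Removes a column by name and returns it, or null if missing.
    double[] remove(String name) {
        int idx = find(name);
        if (idx < 0) return null;
        names.remove(idx);
        return cols.remove(idx);
    }

    // Defensive copy of the current column names, in order.
    List<String> names() { return new ArrayList<>(names); }

    public static void main(String[] args) {
        NamedColumnsSketch fr = new NamedColumnsSketch();
        fr.add("unique_id", new double[] {1, 2});
        fr.add("response", new double[] {0, 1});
        fr.add("x", new double[] {3.5, 4.5});
        fr.remove("unique_id");                     // drop the identifier column
        fr.add("response", fr.remove("response"));  // move response to the end
        System.out.println(fr.names());             // prints [x, response]
    }
}
```

Java evaluates the argument `fr.remove("response")` before `add` runs, so the column has already left its old position when it is appended at the end.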
Modifier and Type | Class and Description |
---|---|
static class | Frame.VecSpecifier<br>Pair of (column name, Frame key). |
Modifier and Type | Field and Description |
---|---|
java.lang.String[] | _names<br>Vec names |
Constructor and Description |
---|
Frame(Frame fr)<br>Deep copy of Vecs and Keys and Names (but not data!) to a new random Key. |
Frame(Key key)<br>Creates an empty frame with given key. |
Frame(Key key, java.lang.String[] names, Vec[] vecs)<br>Creates a frame with given key, names and vectors. |
Frame(Key key, Vec[] vecs, boolean noChecks)<br>Special constructor for data with unnamed columns. |
Frame(java.lang.String[] names, Vec[] vecs)<br>Creates an internal frame composed of the given Vecs and names. |
Frame(Vec... vecs)<br>Creates an internal frame composed of the given Vecs and default names. |
Modifier and Type | Method and Description |
---|---|
Frame | add(Frame fr)<br>Append a Frame onto this Frame. |
void | add(java.lang.String[] names, Vec[] vecs) |
Vec | add(java.lang.String name, Vec vec)<br>Append a named Vec to the Frame. |
Vec | anyVec()<br>Returns the first readable vector. |
long | byteSize()<br>The Vec.byteSize of all Vecs. |
boolean | checkCompatible(Frame fr)<br>Quick compatibility check between Frames. |
protected long | checksum_impl()<br>64-bit checksum of the checksums of the vecs. |
Frame | deepSlice(java.lang.Object orows, java.lang.Object ocols)<br>In support of R, a generic Deep Copy and Slice. |
static java.lang.String | defaultColName(int col)<br>Default column name maker. |
java.lang.String[][] | domains()<br>All the domains for enum columns; null for non-enum columns. |
Frame | extractFrame(int startIdx, int endIdx)<br>Split this Frame; return a subframe created from the given column interval, and remove those columns from this Frame. |
int | find(java.lang.String name)<br>Finds the column index with a matching name, or -1 if missing. |
int[] | find(java.lang.String[] names)<br>Bulk find(String) api. |
int | find(Vec vec)<br>Finds the matching column index, or -1 if missing. |
Key[] | keys()<br>The array of keys. |
Vec | lastVec()<br>Convenience accessor for the last Vec. |
java.lang.String | lastVecName()<br>Convenience accessor for the last Vec name. |
Frame | makeCompatible(Frame f)<br>Return Frame 'f' if 'f' is compatible with 'this', else return a new Frame compatible with 'this' and a copy of 'f's data. |
java.lang.String | name(int i)<br>A single column name. |
java.lang.String[] | names()<br>The array of column names. |
int | numCols()<br>Number of columns. |
long | numRows()<br>Number of rows. |
Futures | postWrite(Futures fs)<br>Allow rollups for all written-into vecs; used by MRTask once writing is complete. |
Vec[] | reloadVecs()<br>Force a cache-flush and reload, assuming vec mappings were altered remotely, or that the _vecs array was shared and now needs to be a defensive copy. |
Futures | remove_impl(Futures fs)<br>Actually remove/delete all Vecs from memory, not just from the Frame. |
Vec | remove(int idx)<br>Removes a numbered column. |
Vec[] | remove(int[] idxs)<br>Removes a list of columns by index; the index list must be sorted. |
Vec | remove(java.lang.String name)<br>Removes the column with a matching name. |
Frame | remove(java.lang.String[] names) |
Vec | replace(int col, Vec nv)<br>Replace one column with another. |
void | restructure(java.lang.String[] names, Vec[] vecs)<br>Restructure a Frame completely. |
Frame | subframe(java.lang.String[] names)<br>Returns a subframe of this frame containing only vectors with desired names. |
Frame[] | subframe(java.lang.String[] names, double c)<br>Returns a new frame composed of vectors of this frame selected by given names. |
void | swap(int lo, int hi)<br>Swap two Vecs in-place; useful for sorting columns by some criteria. |
java.io.InputStream | toCSV(boolean headers, boolean hex_string)<br>Convert this Frame to a CSV (in an InputStream), that optionally is compatible with R 3.1's recent change to read.csv()'s behavior. |
java.lang.String | toString() |
java.lang.String | toString(long off, int len) |
Vec | vec(int idx)<br>Returns the Vec by given index, implemented by code: vecs()[idx]. |
Vec | vec(java.lang.String name)<br>Return a Vec by name, or null if missing. |
Vec[] | vecs()<br>The internal array of Vecs. |
Vec[] | vecs(int[] idxs) |
delete_and_lock, delete, delete, delete, read_lock, read_lock, unlock_all, unlock, update, write_lock
clone, frozenType, read_impl, read, readExternal, readJSON_impl, readJSON, toJsonString, write_impl, write, writeExternal, writeHTML_impl, writeHTML, writeJSON_impl, writeJSON
public Frame(Vec... vecs)
public Frame(java.lang.String[] names, Vec[] vecs)
public Frame(Key key)
public Frame(Key key, Vec[] vecs, boolean noChecks)
Parameters: key, vecs, noChecks
public Frame(Key key, java.lang.String[] names, Vec[] vecs)
public Frame(Frame fr)
public static java.lang.String defaultColName(int col)
public boolean checkCompatible(Frame fr)
public int numCols()
public long numRows()
public final Vec anyVec()
public java.lang.String[] names()
public java.lang.String name(int i)
public Key[] keys()
public final Vec[] vecs()
The internal array of Vecs, loaded as needed from the DKV.
public final Vec[] vecs(int[] idxs)
public Vec lastVec()
public java.lang.String lastVecName()
public final Vec[] reloadVecs()
public final Vec vec(int idx)
Returns the Vec by given index, implemented by code: vecs()[idx].
Parameters:
idx - idx of column
public Vec vec(java.lang.String name)
public int find(java.lang.String name)
public int find(Vec vec)
public int[] find(java.lang.String[] names)
Bulk find(String) api.
Returns: an array of column indices matching the names array
public java.lang.String[][] domains()
public long byteSize()
The Vec.byteSize of all Vecs.
protected long checksum_impl()
checksum_impl in class Keyed<Frame>
public void add(java.lang.String[] names, Vec[] vecs)
public Vec add(java.lang.String name, Vec vec)
public Frame add(Frame fr)
public void swap(int lo, int hi)
public Frame subframe(java.lang.String[] names)
Parameters:
names - list of vector names
Throws:
java.lang.IllegalArgumentException - if there is no vector with desired name in this frame.
public Frame[] subframe(java.lang.String[] names, double c)
Parameters:
names - names of vectors to compose a subframe
c - value to fill missing columns.
public Futures postWrite(Futures fs)
Allow rollups for all written-into vecs; used by MRTask once writing is complete.
public Futures remove_impl(Futures fs)
remove_impl in class Keyed<Frame>
public Vec replace(int col, Vec nv)
public Frame extractFrame(int startIdx, int endIdx)
Parameters:
startIdx - index of first column (inclusive)
endIdx - index of the last column (exclusive)
public Vec remove(java.lang.String name)
public Frame remove(java.lang.String[] names)
public Vec[] remove(int[] idxs)
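The requirement that the index list be sorted allows removal in a single pass. The sketch below shows one way such a pass could look on a plain list; it is not the H2O implementation. `SortedRemoveSketch` and `removeSorted` are hypothetical names, and treating duplicate or out-of-range indices as errors is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

final class SortedRemoveSketch {
    // Removes the entries at the given strictly increasing indices from items
    // in one pass and returns the removed entries in their original order.
    static <T> List<T> removeSorted(List<T> items, int[] sortedIdxs) {
        for (int i = 0; i < sortedIdxs.length; i++) {
            if (sortedIdxs[i] < 0 || sortedIdxs[i] >= items.size())
                throw new IndexOutOfBoundsException("index " + sortedIdxs[i]);
            if (i > 0 && sortedIdxs[i] <= sortedIdxs[i - 1])
                throw new IllegalArgumentException("indices must be sorted and unique");
        }
        List<T> kept = new ArrayList<>();
        List<T> removed = new ArrayList<>();
        int next = 0; // position in sortedIdxs of the next index to drop
        for (int i = 0; i < items.size(); i++) {
            if (next < sortedIdxs.length && sortedIdxs[next] == i) {
                removed.add(items.get(i));
                next++;
            } else {
                kept.add(items.get(i));
            }
        }
        items.clear();
        items.addAll(kept);
        return removed;
    }

    public static void main(String[] args) {
        List<String> cols = new ArrayList<>(List.of("a", "b", "c", "d"));
        List<String> gone = removeSorted(cols, new int[] {1, 3});
        System.out.println(gone + " removed, " + cols + " kept");
    }
}
```

With sorted input, a single cursor into the index array is enough to decide the fate of each column. An unsorted list would need a set lookup or a sort first.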
public final Vec remove(int idx)
public void restructure(java.lang.String[] names, Vec[] vecs)
public Frame deepSlice(java.lang.Object orows, java.lang.Object ocols)
Semantics are a little odd, to match R's. Each dimension spec can be:
The numbering is 1-based; zeros are not allowed in the lists, nor are out-of-range values.
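The 1-based rule can be made concrete with a small validator. The sketch covers only lists of positive positions, and the helper name `toZeroBased` is hypothetical. Other forms of dimension spec are out of scope here and are simply rejected, which is a simplification rather than a description of deepSlice itself.

```java
final class OneBasedSelectSketch {
    // Converts a list of 1-based positions into 0-based indices for a dimension of size n.
    // Zeros and values above n are rejected; repeated positions are passed through.
    static int[] toZeroBased(int[] oneBased, int n) {
        int[] out = new int[oneBased.length];
        for (int i = 0; i < oneBased.length; i++) {
            int v = oneBased[i];
            if (v == 0)
                throw new IllegalArgumentException("zero is not a valid 1-based index");
            if (v < 0)
                throw new IllegalArgumentException("only positive selections are handled in this sketch");
            if (v > n)
                throw new IllegalArgumentException("index " + v + " is out of range 1.." + n);
            out[i] = v - 1;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] zeroBased = toZeroBased(new int[] {1, 3, 3}, 3);
        System.out.println(java.util.Arrays.toString(zeroBased)); // prints [0, 2, 2]
    }
}
```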
public java.lang.String toString()
toString in class java.lang.Object
public java.lang.String toString(long off, int len)
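public Frame makeCompatible(Frame f)
Return Frame 'f' if 'f' is compatible with 'this', else return a new Frame compatible with 'this' and a copy of 'f's data.
public java.io.InputStream toCSV(boolean headers, boolean hex_string)
Convert this Frame to a CSV (in an InputStream), that optionally
is compatible with R 3.1's recent change to read.csv()'s behavior.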