public class TrainTestExperiment
extends java.lang.Object
Sets up and runs train-test evaluations. This class can be used directly, but it will usually be controlled from the train-test command line tool, which is in turn driven by a Gradle script. For a simpler way to programmatically run an evaluation, see SimpleEvaluator, which provides a simplified interface to train-test evaluations with cross-validation.
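A train-test experiment consists of three things: the algorithms to evaluate (see addAlgorithm), the data sets to evaluate them on (see addDataSet), and the evaluation tasks to run (see addTask).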
Global output is aggregated into a CSV file; individual tasks or metrics may produce additional files.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | JOB_TYPE: The job type code for train-test experiments. |
Constructor and Description |
---|
TrainTestExperiment() |
Modifier and Type | Method and Description |
---|---|
void | addAlgorithm(AlgorithmInstance algo): Add an algorithm to the experiment. |
void | addAlgorithm(java.lang.String name, groovy.lang.Closure<?> block): Add an algorithm configured by a Groovy closure. |
void | addAlgorithm(java.lang.String name, java.nio.file.Path file): Add one or more algorithms by loading a config file. |
void | addAlgorithms(java.util.List<AlgorithmInstance> algos): Add multiple algorithm instances. |
void | addAlgorithms(java.nio.file.Path file): Add one or more algorithms from a configuration file. |
void | addDataSet(DataSet ds): Add a data set. |
void | addDataSets(java.util.List<DataSet> dss): Add several data sets. |
void | addTask(EvalTask task): Add an evaluation task. |
Table | execute(): Run the experiment. |
java.util.List<AlgorithmInstance> | getAlgorithms(): Get the algorithm instances. |
java.nio.file.Path | getCacheDirectory(): Get the cache directory for model components. |
java.lang.ClassLoader | getClassLoader(): Get the class loader for this experiment. |
boolean | getContinueAfterError(): Query whether this task will continue in the face of an error. |
java.util.List<DataSet> | getDataSets(): Get the list of data sets to use. |
java.nio.file.Path | getOutputFile(): Get the primary output file. |
ExperimentOutputLayout | getOutputLayout() |
int | getParallelTasks(): Get the number of evaluation tasks to permit to run in parallel. |
boolean | getShareModelComponents(): Query whether this experiment will cache and share components. |
java.util.List<EvalTask> | getTasks(): Get the eval tasks to be used in this experiment. |
int | getThreadCount(): Get the number of threads that the experiment may use. |
java.nio.file.Path | getUserOutputFile(): Get the per-user output file. |
static TrainTestExperiment | load(java.nio.file.Path file): Load a train-test experiment from a YAML file. |
void | setCacheDirectory(java.nio.file.Path dir): Set the cache directory for model components. |
void | setClassLoader(java.lang.ClassLoader loader): Set the class loader for this experiment. |
void | setContinueAfterError(boolean c): Configure whether the experiment will continue after an error. |
void | setOutputFile(java.nio.file.Path out): Set the primary output file. |
void | setParallelTasks(int pt): Set the number of parallel experiment tasks to run. |
void | setShareModelComponents(boolean shares): Control whether model components will be shared. |
void | setThreadCount(int tc): Set the number of threads the experiment may use. |
void | setUserOutputFile(java.nio.file.Path file): Set the per-user output file. |
public static final java.lang.String JOB_TYPE
The job type code for train-test experiments.
See Also:
TrackedJob.getType(), Constant Field Values
public void setOutputFile(java.nio.file.Path out)
Set the primary output file.
Parameters:
out - The file where the primary aggregate output should go.
public java.nio.file.Path getOutputFile()
Get the primary output file.
public java.nio.file.Path getUserOutputFile()
Get the per-user output file.
public void setUserOutputFile(java.nio.file.Path file)
Set the per-user output file.
Parameters:
file - The file for per-user measurements.
public java.util.List<AlgorithmInstance> getAlgorithms()
Get the algorithm instances.
public void addAlgorithm(AlgorithmInstance algo)
Add an algorithm to the experiment.
Parameters:
algo - The algorithm to add.
public void addAlgorithms(java.util.List<AlgorithmInstance> algos)
Add multiple algorithm instances.
Parameters:
algos - The algorithm instances to add.
public void addAlgorithm(java.lang.String name, groovy.lang.Closure<?> block)
Add an algorithm configured by a Groovy closure. Mostly useful for testing.
Parameters:
name - The algorithm name.
block - The algorithm configuration block.
public void addAlgorithm(java.lang.String name, java.nio.file.Path file)
Add one or more algorithms by loading a config file.
Parameters:
name - The algorithm name.
file - The config file to load.
public void addAlgorithms(java.nio.file.Path file)
Add one or more algorithms from a configuration file.
Parameters:
file - The configuration file.
public java.util.List<DataSet> getDataSets()
Get the list of data sets to use.
public void addDataSet(DataSet ds)
Add a data set.
Parameters:
ds - The data set to add.
public void addDataSets(java.util.List<DataSet> dss)
Add several data sets.
Parameters:
dss - The data sets to add.
public boolean getShareModelComponents()
Query whether this experiment will cache and share components.
Returns:
true if model components will be shared.
See Also:
setShareModelComponents(boolean)
public void setShareModelComponents(boolean shares)
Control whether model components will be shared. If a cache directory is also set with setCacheDirectory(Path), components will be cached on disk; otherwise, they will be opportunistically shared in memory.
Sharing components improves throughput and memory use, but makes build times effectively meaningless. It is turned on by default; turn it off if you want to measure recommender build times.
Parameters:
shares - true to enable caching of shared model components.
public java.nio.file.Path getCacheDirectory()
Get the cache directory for model components.
public void setCacheDirectory(java.nio.file.Path dir)
Set the cache directory for model components.
Parameters:
dir - The directory where model components will be cached.
public int getThreadCount()
Get the number of threads that the experiment may use.
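The value returned here may be 0, which setThreadCount(int) documents as "decide automatically": consult the lenskit.eval.threadCount property, then fall back to Runtime.availableProcessors(). The following is a minimal sketch of that fallback order, not LensKit's actual implementation. The class ThreadCountSketch and its method resolveThreadCount are hypothetical names, and the handling of negative values and of an empty property (both treated as unset) is an assumption.

```java
import java.util.Properties;

/** Hypothetical helper illustrating the documented fallback order; not part of LensKit. */
final class ThreadCountSketch {
    /** Property name taken from the setThreadCount(int) documentation. */
    static final String THREAD_COUNT_PROPERTY = "lenskit.eval.threadCount";

    /**
     * Resolve the effective thread count.
     *
     * @param configured the value passed to setThreadCount; 0 means "decide automatically"
     * @param props the properties to consult (normally System.getProperties())
     * @param availableProcessors the processor count to fall back to
     */
    static int resolveThreadCount(int configured, Properties props, int availableProcessors) {
        // Assumption: any non-positive value is treated like the documented default of 0.
        if (configured > 0) {
            return configured;
        }
        String value = props.getProperty(THREAD_COUNT_PROPERTY);
        // Assumption: an empty or blank property counts as unset.
        if (value != null && !value.trim().isEmpty()) {
            return Integer.parseInt(value.trim());
        }
        return availableProcessors;
    }

    public static void main(String[] args) {
        int threads = resolveThreadCount(
                0, System.getProperties(), Runtime.getRuntime().availableProcessors());
        System.out.println("effective thread count: " + threads);
    }
}
```

Passing the properties and processor count as arguments keeps the sketch deterministic; in a real run they would come from System.getProperties() and Runtime.getRuntime().availableProcessors(), as in main.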
public void setThreadCount(int tc)
Set the number of threads the experiment may use.
Parameters:
tc - The number of threads that the experiment may use. If 0 (the default), consults the property lenskit.eval.threadCount, and if that is unset, uses as many threads as there are available processors according to Runtime.availableProcessors().
public int getParallelTasks()
Get the number of evaluation tasks to permit to run in parallel. Reducing this can be useful for reducing the memory use of LensKit.
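A parallel-task setting of 0 means no separate limit, so up to getThreadCount() jobs may run at once, as documented for setParallelTasks(int). A rough sketch of how an effective limit could be derived from the two settings follows. ParallelTaskSketch and effectiveParallelTasks are hypothetical names, and capping the task limit at the thread count is an assumption rather than documented behavior.

```java
/** Hypothetical helper relating the parallel-task limit to the thread count; not part of LensKit. */
final class ParallelTaskSketch {
    /**
     * Compute how many evaluation jobs may run at the same time.
     *
     * @param parallelTasks the value passed to setParallelTasks; 0 means no separate limit
     * @param threadCount the effective (already resolved, positive) thread count
     */
    static int effectiveParallelTasks(int parallelTasks, int threadCount) {
        // Documented default: 0 allows up to threadCount jobs in parallel.
        // Assumption: negative values are treated the same way.
        if (parallelTasks <= 0) {
            return threadCount;
        }
        // Assumption: the job limit never exceeds the number of threads.
        return Math.min(parallelTasks, threadCount);
    }

    public static void main(String[] args) {
        // With 8 threads and a limit of 2 jobs, fewer models are held in memory at once;
        // per the documentation, the remaining threads may still speed up each evaluation.
        System.out.println(effectiveParallelTasks(2, 8));
        System.out.println(effectiveParallelTasks(0, 8));
    }
}
```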
public void setParallelTasks(int pt)
Set the number of parallel experiment tasks to run. If 0 (the default), then up to getThreadCount() jobs can run in parallel. Lowering this value reduces memory use (by having fewer models in memory); the extra threads may still be used to speed up individual evaluations.
Parameters:
pt - The number of tasks to run in parallel, or 0 to have no limit.
public boolean getContinueAfterError()
Query whether this task will continue in the face of an error.
Returns:
true if the experiment will keep going if a segment fails.
public void setContinueAfterError(boolean c)
Configure whether the experiment will continue after an error.
Parameters:
c - true to continue after an error.
public java.lang.ClassLoader getClassLoader()
Get the class loader for this experiment.
public void setClassLoader(java.lang.ClassLoader loader)
Set the class loader for this experiment.
Parameters:
loader - The class loader to use.
public java.util.List<EvalTask> getTasks()
Get the eval tasks to be used in this experiment.
public void addTask(EvalTask task)
Add an evaluation task.
Parameters:
task - An evaluation task to run.
public Table execute()
Run the experiment.
public ExperimentOutputLayout getOutputLayout()
public static TrainTestExperiment load(java.nio.file.Path file) throws java.io.IOException
Load a train-test experiment from a YAML file.
Parameters:
file - The file to load.
Throws:
java.io.IOException