Constructor and Description |
---|
CrossfoldTask() |
CrossfoldTask(String n) |
Modifier and Type | Method and Description |
---|---|
protected void | createTTFiles(): Write the train-test split files. |
protected File[] | getFiles(String pattern): Get the list of files matching the specified name pattern. |
boolean | getForce() |
Holdout | getHoldout() |
boolean | getIsolate(): Query whether this task will produce isolated data sets. |
CrossfoldMethod | getMethod(): Get the method to be used for crossfolding. |
String | getName(): Get the visible name of this crossfold split. |
int | getPartitionCount(): Get the number of folds. |
int | getSampleSize() |
DataSource | getSource(): Get the data source backing this crossfold manager. |
String | getTestPattern() |
String | getTrainPattern() |
List<TTDataSet> | getTTFiles(): Get the train-test splits as data sets. |
boolean | getWriteTimestamps(): Query whether timestamps will be written. |
protected DataSource | makeDataSource(File file) |
protected RatingWriter | makeWriter(File file) |
List<TTDataSet> | perform(): Run the crossfold command. |
CrossfoldTask | setCache(boolean on): Configure whether the data sets created by the crossfold will have caching turned on. |
CrossfoldTask | setForce(boolean force): Set the force-run option of the command. |
CrossfoldTask | setHoldout(int n): Set the holdout to a fixed number of items per user. |
CrossfoldTask | setHoldoutFraction(double f): Set the holdout to a fraction of each user's profile. |
CrossfoldTask | setIsolate(boolean on): Configure whether the train-test data sets generated by this task will be isolated. |
CrossfoldTask | setMethod(CrossfoldMethod m): Set the crossfold method. |
CrossfoldTask | setOrder(Order<Rating> o): Set the order for the train-test splitting. |
CrossfoldTask | setPartitions(int partition): Set the number of partitions to generate. |
CrossfoldTask | setRetain(int n): Set the holdout so that a fixed number of items from each user's profile is retained for training. |
CrossfoldTask | setSampleSize(int n): Set the sample size (# of users sampled per partition). |
CrossfoldTask | setSource(DataSource source): Set the input data source. |
void | setSplitUsers(boolean splitUsers): Deprecated. Use setMethod(CrossfoldMethod) instead. |
CrossfoldTask | setTest(String pat): Set the pattern for the test set files. |
CrossfoldTask | setTrain(String pat): Set the pattern for the training set files. |
CrossfoldTask | setWriteTimestamps(boolean pack): Configure whether to include timestamps in the output file. |
protected Long2IntMap | splitUsers(UserDAO dao): Split user IDs into n splits, where n is the partition count. |
String | toString() |
protected void | writeRating(TableWriter writer, Rating rating): Write a rating event to the file using a table writer. |
protected void | writeTTFilesByRatings(RatingWriter[] trainWriters, RatingWriter[] testWriters): Write the split files by ratings from the DAO. |
protected void | writeTTFilesByUsers(RatingWriter[] trainWriters, RatingWriter[] testWriters): Write the split files by users from the DAO, using the specified holdout method. |
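Methods inherited from class AbstractTask: execute, getProject, setName, setProject
Methods inherited from class com.google.common.util.concurrent.AbstractFuture: addListener, cancel, get, get, interruptTask, isCancelled, isDone, set, setException, wasInterrupted
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface com.google.common.util.concurrent.ListenableFuture: addListener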
public CrossfoldTask()
public CrossfoldTask(String n)
public CrossfoldTask setPartitions(int partition)
Parameters: partition - The number of partitions.
public int getSampleSize()
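The sample size matters when each partition tests a sample of users rather than every user (the setSampleSize entry refers to CrossfoldMethod.SAMPLE_USERS). The following is a minimal sketch under stated assumptions, not LensKit's implementation: the per-partition samples are assumed to be disjoint and drawn from a seeded shuffle, and UserSampler and sampleUsers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of per-partition user sampling; not LensKit code. */
final class UserSampler {
    private UserSampler() {}

    /**
     * Draw partitionCount disjoint samples of sampleSize users each.
     * Assumes distinct user IDs; fails if there are too few users.
     */
    static List<List<Long>> sampleUsers(List<Long> userIds, int partitionCount,
                                        int sampleSize, long seed) {
        if ((long) partitionCount * sampleSize > userIds.size()) {
            throw new IllegalArgumentException("not enough users for disjoint samples");
        }
        List<Long> shuffled = new ArrayList<>(userIds);   // copy; caller's list is untouched
        Collections.shuffle(shuffled, new Random(seed));  // seeded for reproducible samples
        List<List<Long>> samples = new ArrayList<>(partitionCount);
        for (int p = 0; p < partitionCount; p++) {
            // Consecutive, non-overlapping slices of the shuffled list keep samples disjoint.
            int from = p * sampleSize;
            samples.add(new ArrayList<>(shuffled.subList(from, from + sampleSize)));
        }
        return samples;
    }
}
```

For example, sampleUsers(users, 5, 100, 42L) would yield five disjoint test-user sets of 100 users each, provided at least 500 users exist.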
public CrossfoldTask setSampleSize(int n)
See CrossfoldMethod.SAMPLE_USERS.
Parameters: n - The number of users to sample for each partition.
public CrossfoldTask setTrain(String pat)
Parameters: pat - The training file name pattern.
See Also: String.format(String, Object...)
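The See Also reference to String.format suggests that the training and test patterns are format strings into which a partition number is substituted. A minimal sketch, assuming a single integer placeholder and 0-based partition numbers (both are assumptions; the exact arguments LensKit passes are not documented here); SplitFileNames and expandPattern are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: expands a format pattern into one file name per partition. */
final class SplitFileNames {
    private SplitFileNames() {}

    static List<String> expandPattern(String pattern, int partitionCount) {
        List<String> names = new ArrayList<>(partitionCount);
        for (int i = 0; i < partitionCount; i++) {
            // Locale.ROOT keeps ASCII digits regardless of the JVM's default locale.
            names.add(String.format(Locale.ROOT, pattern, i));
        }
        return names;
    }
}
```

Under these assumptions, a pattern such as "crossfold/train.%d.csv" with three partitions names train.0.csv through train.2.csv.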
public CrossfoldTask setTest(String pat)
Parameters: pat - The test file name pattern.
See Also: setTrain(String)
public CrossfoldTask setOrder(Order<Rating> o)
Parameters: o - The sort order.
See Also: RandomOrder, TimestampOrder, setHoldoutFraction(double), setHoldout(int)
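The order and the holdout setters together decide which of a user's ratings are tested. A minimal sketch, assuming the user's ratings are first ordered (by timestamp, or by a seeded shuffle) and the test ratings are taken from the end of that order; the Rating and Split classes and all method names are hypothetical stand-ins, not LensKit's Order or Holdout types, and rounding the fraction to the nearest whole rating is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of ordering plus per-user holdout; not LensKit code. */
final class HoldoutSplit {
    /** Minimal stand-in for a rating event (hypothetical). */
    static final class Rating {
        final long user;
        final long item;
        final double value;
        final long timestamp;

        Rating(long user, long item, double value, long timestamp) {
            this.user = user;
            this.item = item;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** One user's profile divided into training and test ratings. */
    static final class Split {
        final List<Rating> train;
        final List<Rating> test;

        Split(List<Rating> train, List<Rating> test) {
            this.train = train;
            this.test = test;
        }
    }

    private HoldoutSplit() {}

    /** Timestamp order: oldest first, so holding out from the end tests the newest ratings. */
    static List<Rating> timestampOrder(List<Rating> profile) {
        List<Rating> sorted = new ArrayList<>(profile);
        sorted.sort(Comparator.comparingLong(r -> r.timestamp));
        return sorted;
    }

    /** Random order from a seeded shuffle, so splits are reproducible. */
    static List<Rating> randomOrder(List<Rating> profile, long seed) {
        List<Rating> shuffled = new ArrayList<>(profile);
        Collections.shuffle(shuffled, new Random(seed));
        return shuffled;
    }

    /** Keep the first `keep` ratings (clamped to the profile size) for training; test the rest. */
    private static Split cut(List<Rating> ordered, int keep) {
        int k = Math.max(0, Math.min(keep, ordered.size()));
        return new Split(new ArrayList<>(ordered.subList(0, k)),
                         new ArrayList<>(ordered.subList(k, ordered.size())));
    }

    /** Hold out the last n ratings (compare setHoldout(int)). */
    static Split holdOutCount(List<Rating> ordered, int n) {
        return cut(ordered, ordered.size() - n);
    }

    /** Hold out a fraction f of the ratings, rounded (compare setHoldoutFraction(double)). */
    static Split holdOutFraction(List<Rating> ordered, double f) {
        return holdOutCount(ordered, (int) Math.round(ordered.size() * f));
    }

    /** Retain the first n ratings for training, test the rest (compare setRetain(int)). */
    static Split retain(List<Rating> ordered, int n) {
        return cut(ordered, n);
    }
}
```

Taking the test ratings from the end of a timestamp order gives a "train on the past, test on the future" split; with a random order the same code gives a random holdout.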
public CrossfoldTask setHoldout(int n)
See CrossfoldMethod.PARTITION_USERS.
Parameters: n - The number of items to hold out from each user's profile.
public CrossfoldTask setRetain(int n)
See CrossfoldMethod.PARTITION_USERS.
Parameters: n - The number of items from each user's profile to retain in the training set.
public CrossfoldTask setHoldoutFraction(double f)
See CrossfoldMethod.PARTITION_USERS.
Parameters: f - The fraction of a user's ratings to hold out.
public CrossfoldTask setSource(DataSource source)
Parameters: source - The data source to use.
public CrossfoldTask setForce(boolean force)
Parameters: force - The force-run option.
@Deprecated public void setSplitUsers(boolean splitUsers)
Deprecated. Use setMethod(CrossfoldMethod) instead.
Parameters: splitUsers - true to split by users (CrossfoldMethod.PARTITION_USERS), false to split by ratings (CrossfoldMethod.PARTITION_RATINGS).
public CrossfoldMethod getMethod()
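Splitting by ratings instead of by users (the PARTITION_RATINGS option in the setSplitUsers entry) assigns each rating, not each user, to a fold; fold i then tests on its own ratings and trains on all other folds. A minimal sketch of that k-fold scheme, assuming a seeded shuffle and round-robin assignment; RatingFolds is a hypothetical name and the type parameter T stands in for LensKit's Rating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of rating-level k-fold splitting; not LensKit code. */
final class RatingFolds {
    private RatingFolds() {}

    /** Shuffle all ratings, then deal them round-robin into k folds. */
    static <T> List<List<T>> partition(List<T> ratings, int k, long seed) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<T> shuffled = new ArrayList<>(ratings);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<T>> folds = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            folds.get(i % k).add(shuffled.get(i));
        }
        return folds;
    }

    /** Training set for the given test fold: every rating outside that fold. */
    static <T> List<T> trainFor(List<List<T>> folds, int testFold) {
        List<T> train = new ArrayList<>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != testFold) {
                train.addAll(folds.get(i));
            }
        }
        return train;
    }
}
```

Unlike a user-based split, a single user's ratings can land in both the training and the test set of the same fold.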
public CrossfoldTask setMethod(CrossfoldMethod m)
See CrossfoldMethod.PARTITION_USERS.
Parameters: m - The crossfold method to use.
public CrossfoldTask setCache(boolean on)
Parameters: on - Whether the returned data sets should have caching enabled.
public CrossfoldTask setIsolate(boolean on)
Parameters: on - true to produce isolated data sets.
public boolean getIsolate()
Returns: true if this task will produce isolated data sets.
public CrossfoldTask setWriteTimestamps(boolean pack)
Parameters: pack - true to include timestamps (the default), false otherwise.
public boolean getWriteTimestamps()
Returns: true if the output will include timestamps.
public String getName()
Overrides: getName in class AbstractTask<List<TTDataSet>>
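setWriteTimestamps decides whether each rating row written by writeRating carries a timestamp. A minimal sketch of that choice, assuming a comma-separated user,item,rating[,timestamp] layout; the delimiter, the column order, and the RatingRows.formatRow helper are assumptions for illustration and do not describe LensKit's TableWriter.

```java
/** Hypothetical sketch of one output row; not LensKit's TableWriter. */
final class RatingRows {
    private RatingRows() {}

    /** Format user, item and rating, appending the timestamp only when requested. */
    static String formatRow(long user, long item, double rating, long timestamp,
                            boolean writeTimestamps) {
        StringBuilder row = new StringBuilder();
        // append(double) uses Double.toString, so the decimal separator is always '.'.
        row.append(user).append(',').append(item).append(',').append(rating);
        if (writeTimestamps) {
            row.append(',').append(timestamp);
        }
        return row.toString();
    }
}
```

Dropping the timestamp column shrinks the split files, at the cost of losing the information a timestamp-ordered holdout would need when the files are read back.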
public String getTrainPattern()
public String getTestPattern()
public DataSource getSource()
public int getPartitionCount()
public Holdout getHoldout()
public boolean getForce()
public List<TTDataSet> perform() throws TaskExecutionException
Specified by: perform in class AbstractTask<List<TTDataSet>>
Throws: TaskExecutionException
protected File[] getFiles(String pattern)
Parameters: pattern - The file name pattern.
protected void createTTFiles() throws IOException
Throws: IOException - if there is an error writing the files.
protected void writeTTFilesByUsers(RatingWriter[] trainWriters, RatingWriter[] testWriters) throws TaskExecutionException
Parameters:
trainWriters - The rating writers for the training files.
testWriters - The rating writers for the test files.
Throws: TaskExecutionException
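Writing split files by users means routing every rating to the training or test output of each partition. A minimal in-memory sketch of the usual user-based crossfold routing, assuming each user already has a partition (as splitUsers provides) and that the user's held-out ratings are already chosen; lists stand in for the RatingWriter arrays, strings stand in for ratings, and UserFoldRouter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory version of user-based train/test routing; not LensKit code. */
final class UserFoldRouter {
    private final List<List<String>> train = new ArrayList<>();
    private final List<List<String>> test = new ArrayList<>();

    UserFoldRouter(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            train.add(new ArrayList<>());
            test.add(new ArrayList<>());
        }
    }

    /**
     * Route one user's ratings. In the user's own partition the held-out ratings go to
     * test and the kept ratings to train; every other partition trains on the whole profile.
     */
    void route(int userPartition, List<String> kept, List<String> heldOut) {
        for (int p = 0; p < train.size(); p++) {
            train.get(p).addAll(kept);
            if (p == userPartition) {
                test.get(p).addAll(heldOut);
            } else {
                train.get(p).addAll(heldOut);
            }
        }
    }

    List<String> trainFor(int partition) {
        return train.get(partition);
    }

    List<String> testFor(int partition) {
        return test.get(partition);
    }
}
```

Each user is therefore a test user in exactly one partition and contributes the full profile to every other partition's training data.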
protected void writeTTFilesByRatings(RatingWriter[] trainWriters, RatingWriter[] testWriters) throws TaskExecutionException
Parameters:
trainWriters - The rating writers for the training files.
testWriters - The rating writers for the test files.
Throws: TaskExecutionException
protected void writeRating(TableWriter writer, Rating rating) throws IOException
Parameters:
writer - The table writer to output the rating to.
rating - The rating event to output.
Throws: IOException - if the writer encounters an I/O error.
protected Long2IntMap splitUsers(UserDAO dao)
Parameters: dao - The DAO of the source file.
public List<TTDataSet> getTTFiles()
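splitUsers maps each user ID to one of the partitions. A minimal sketch of one common way to do this, assuming distinct user IDs, a seeded shuffle, and round-robin assignment; it returns a java.util.Map<Long, Integer> in place of the fastutil Long2IntMap so it has no external dependency, and UserPartitioner is a hypothetical name. LensKit's own assignment may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper: assigns each user ID to one of n partitions; not LensKit code. */
final class UserPartitioner {
    private UserPartitioner() {}

    /** Shuffle the users, then deal them round-robin so fold sizes differ by at most one. */
    static Map<Long, Integer> splitUsers(List<Long> userIds, int partitionCount, long seed) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        List<Long> shuffled = new ArrayList<>(userIds);   // copy; caller's list is untouched
        Collections.shuffle(shuffled, new Random(seed));  // seeded for reproducible splits
        Map<Long, Integer> partitionOf = new HashMap<>();
        for (int i = 0; i < shuffled.size(); i++) {
            partitionOf.put(shuffled.get(i), i % partitionCount);
        }
        return partitionOf;
    }
}
```

With 10 users and 3 partitions, for instance, the folds hold 4, 3, and 3 users.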
protected RatingWriter makeWriter(File file) throws IOException
Throws: IOException
protected DataSource makeDataSource(File file)