Release notes for LensKit 2.1
The Git changelog and the list of closed tickets and pull requests provide more information on what has happened, including bugs that have been fixed.
General Changes
-
LensKit now uses Gradle (version 2.0) to build instead of Maven. Artifacts are still pushed to Maven Central.
-
Added
lenskit-all
module so you can easily depend on all of LensKit with a single dependency. -
Added
LenskitInfo
class, providing access to information about LensKit. Currently, it provides access to the Git revisions that are included in the build. This works with released binaries of LensKit, and versions built from Git sources; it will not work from tarball builds. -
Fixed equality and hashing on ratings to be consistently correct.
Data Access
-
New methods to more easily construct DAOs (#issue(346))
-
Rating
class now hashasValue()
andgetValue()
methods for convenience. -
Incompatible change:
ItemEventDAO
now has astreamEventsByItem
method, corresponding to the orthogonal methods onUserEventDAO
. Custom DAO implementations will need to provide this new method. -
Incompatible change: The
UserEventDAO
andItemEventDAO
classes now have type-filtered versions of theirstreamEventsByUser
andstreamEventsByItem
methods. Implementers will need to implement these new methods. -
Incompatible change: The
SQLStatementFactory
interface no longer exposes user and item count statements (they were unused by all current DAOs). -
Incompatible change: The
JDBCRatingDAO
is no longer injectable. It cannot be safely constructed by the injector without lifecycle support in the LensKit recommender container (#issue(456)). JDBC rating DAOs can still be injected into other components, but they should be configured with an instance binding to the particular DAO instance rather than attempting to let the injector instantiate the DAO. -
The JDBC DAO now caches query results. The cache is configurable with Guava’s cache builders (#issue(541)).
-
Implemented new
BinaryRatingDAO
to read ratings from a packed binary file of rating data. Build a file withBinaryRatingPacker
or thepack
eval task. The new CLI also provides a command to pack rating data. -
Simple file rating DAO now trims spaces from fields (#590).
-
XZ compression is now supported in most places that can compress or decompress data.
Configuration
-
Added
LenskitRecommenderEngineBuilder
andLenskitRecommenderEngineLoader
.-
Better interface for configuring build & load of recommender engines.
-
Allow multiple configurations to contribute to an engine.
-
Allow configurations to be removed (to remove e.g. DAOs from the model graph) and re-added to support building & serializing a model with one data access config and loading it with a different one.
-
-
Added
addComponent
methods to configuration contexts (#issue(457)). This is equivalent to binding a component type to itself, using it to supply dependencies on itself and any supertypes. -
RNG now configured via dependency injection.
-
Context-sensitive matches can now be anchored to the root (with
at(null)
) (#issue(344)). -
Fixed class loader problems with deserializing recommender models (#issue(434)).
-
Support compression in recommender model serialization (#issue(527))
-
Support writing recommenders to compressed files.
-
Auto-detect if a serialized recommender stream is compressed.
-
-
Use Grapht 0.8.1.
-
Remove
@Singleton
annotations to fix JEE7 compatibility (#issue(374)).
CLI
LensKit now has a command-line tool, lenskit
, included in the binary
distribution. This provides a lot of utilities for working with LensKit data
and configurations, and is the new entry point for running the evaluator
(lenskit-eval
is now deprecated).
This is in the lenskit-cli
package, replacing lenskit-package
as the source
of the binary distribution of LensKit.
Algorithms
This release includes several improvements and additions to LensKit’s selection of algorithms.
-
Added an implementation of OrdRec (in the
lenskit-predict
module) (#issue(225)). -
Report whether the score came from the baseline or primary scorer in the fallback item scorer and simple rating predictor (#issue(444)).
-
Add a rescoring item recommender, wrapping an existing item recommender and rewriting its scores (#issue(491)).
-
Add
RatingPredictorItemScorer
, a wrapper that uses the output of a rating predictor to produce item scores. Useful primarily for using things like OrdRec for item-scorer-based components. -
Behavior change: the default exclude set used by
TopNItemRecommender
now contains all items the user has interacted with, not just the ones they have rated. Baking the knowledge of ratings into the top-N recommender was arguably a mistake. -
Add
PrecomputedItemScorer
, an item scorer that has a precomputed set of scores it can return, and theExternalProcessItemScorerBuilder
to build such a scorer by running an external process.
User-user CF
-
Updated user-user CF to be simpler & make better use of new DAOs (#issue(356)).
-
Significantly sped up user-user CF.
-
Added user similarity thresholding. The default threshold is an absolute threshold of 0.0; configure
@UserSimilarityThreshold
to change this. -
Add support for a minimum neighborhood size to user-user CF.
-
Remove hard-coded minimum total similarity value from user-user CF. Minimum neighborhood sizes and user similarity thresholds cover this case.
Item-item CF
-
Refactored item-item model build:
-
Build the context with a provider rather than injecting a factory.
-
Use the strategy pattern to abstract sparse vs. non-sparse iteration, allowing sparse iteration to be disabled by overriding the default provider.
-
Better logging of the item-item model build process.
-
-
Configuration change: qualified
ItemItemModelBuilder
’s dependency on a threshold. If you previously changed this threshold, you need to qualify your threshold implementation (not value) with@ItemSimilarityThreshold
. -
Added
ItemwizeBuildContextProvider
that builds up the item-item build context without transposing the ratings matrix (#issue(481)). If your rating normalization is per-item instead of per-user, this allows item-item builds to be more efficient.
Matrix factorization
-
Refactored FunkSVD to use vectorz vectors and matrices, and have factored-out code useful for other matrix factorization recommenders.
-
Configuration change: the
FunkSVDItemScorer
no longer returns the baseline score for all items. It now only returns scores for user-item pairs where the scorer has both a user and item vector. If you were relying on the old behavior you can configure aFallbackItemScorer
withFunkSVDItemScorer
as the primary scorer:groovy bind ItemScorer to FallbackItemScorer bind (PrimaryScorer, ItemScorer) to FunkSVDItemScorer
Evaluator
-
The evaluator now supports loading algorithms from config files in both train-test and graph dumping tasks (#issue(406)).
-
In single-threaded mode, run eval tasks on the same thread as the evaluator (#issue(426)).
-
The evaluator now detects when two or more algorithm configurations have the same shared components, and only builds those components once. A cache directory can be used to cache these objects on disk; specify this with the
cache
directive intrainTest
. One consequence of this is that build times are now meaningless. This behavior can be disabled, and build times restored to meaning, by writingseparateAlgorithms true
in yourtrainTest
block. -
The evaluator can now configure algorithms from algorithm files.
-
The train-test task can now periodically write status files so its progress can be monitored.
-
Crossfolding can now sample users instead of just partitioning them (e.g. creating 5 partitions, each with 10% of the users, instead of forcing (100/N)%) (#issue(512)).
-
The evaluator supports binary packed files. The
pack
task will pack a data source, and the crossfolder will split to packed files if the train/test file names end with.pack
. -
Fixed escaping of GraphViz output (#issue(528)).
-
Added several Top-N metrics (precision and recall, mean reciprocal rank).
-
We have rewritten the evaluator metric interfaces and implementations to be easier to write and test.
-
Output change: The names of several evaluator output columns have changed:
-
MAE
is nowMAE.ByRating
. -
All entropy columns are renamed.
-
-
ExecutionInfo
now contains more data on the current execution.
Data Structures
- LensKit now uses vectorz for
(non-sparse) vectors and matrices. The
Vec
class and related classes are deprecated and will be removed in LensKit 3.0.
Utilities
- Remove
TaskGroupRunner
in favor of newTaskGraphExecutor
class.