
Project Description

Duke is a fast and flexible record linkage engine. It does not use the traditional blocking (sort by key) approach, but instead relies on Lucene. This makes it high-performance (able to process 1,000,000 records in ~10 minutes). Duke can be run from the command line, but also has an API allowing incremental linking applications to be built easily. It supports reading data from CSV, JDBC, SPARQL, and NTriples, and also supports a number of string comparators and string normalizers.
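To make the roles of a string normalizer and a string comparator concrete, here is a minimal Python sketch. It is not Duke's API (Duke is a Java library, and its class names and configuration are not shown here); the functions `normalize`, `levenshtein`, `similarity`, and `record_score` are hypothetical names chosen for illustration. The sketch also leaves out the Lucene-based candidate lookup described above and only shows how two already-selected records might be compared.

```python
import unicodedata


def normalize(value: str) -> str:
    """Hypothetical normalizer: strip accents, lowercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical comparator: 1.0 for identical values, lower as they differ."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        # Two empty values are treated as equal; a real engine may prefer
        # to ignore missing fields instead.
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def record_score(r1: dict, r2: dict, fields: list) -> float:
    """Unweighted average of per-field similarities (an assumption, not Duke's scoring)."""
    return sum(similarity(r1.get(f, ""), r2.get(f, "")) for f in fields) / len(fields)
```

A caller would compute `record_score(a, b, ["name", "city"])` for each candidate pair and treat pairs above some chosen threshold as probable matches. The threshold and the equal weighting of fields are design choices that would need tuning on real data.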

System Requirements

System requirements are not defined.
Information regarding Project Releases and Project Resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2012-01-14 01:18
0.4

This release adds a more flexible API, a new cleaner (for personal names), two new data sources (in-memory and JNDI), and a number of bugfixes. Some additional utilities have also been added.

2011-09-12 02:05
0.3

This release offers a cleaned-up API and more comparators.

2011-06-02 16:55
0.2

This version fixes a number of bugs and adds a number of improvements. Example data and setup are now included in the distribution. New JaroWinklerTokenized and DifferentComparator comparators were provided along with a new DebugCompare command, more flexibility in the CSV data source, better reporting of configuration errors, and a --verbose option.

2011-05-21 07:46
0.1

The first version.

Project Resources