Directory Sites
Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
A library of C code useful for writing statistical text analysis, language modeling, and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow). [LGPL]
www.cs.cmu.edu
A set of tools for Windows 2000/NT/XP that allow you to build statistical models from data. [Free]
research.microsoft.com
Generates Gaussian mixture models for large datasets using efficient KD-clustering algorithms. [Free]
www.cs.cmu.edu
An algorithm that incrementally constructs decision trees from labeled examples. [AFL]
www.cs.umass.edu
A collection of tools that implement decision trees and tables, rule learners, Naive Bayes, support vector machines, voted perceptrons, and multi-layer perceptrons. Meta-schemes include bagging, stacking, and boosting. Written in Java. [GPL]
www.cs.waikato.ac.nz
NEITHER: A Propositional Theory Refinement
A system that modifies an incomplete or incorrect rule base to make it consistent with a set of input training examples. Written in C++. [Free]
www.cs.utexas.edu
A modular software package that integrates more than 22 neural network, statistical, and machine learning algorithms for classification, clustering, and feature selection. [GPL]
www.ll.mit.edu
PRODIGY: An Architecture for Planning and Learning
A research system for planning and learning that uses explanation-based learning, partial evaluation, experimentation, graphical knowledge acquisition, automatic abstraction, mixed-initiative planning, and case-based reasoning. [Free]
www.cs.cmu.edu
Machine Learning Software Packages
FTP software repository of machine learning programs from Carnegie Mellon University School of Computer Science. There are also links to other repositories. [GPL]
www.cs.cmu.edu
CHILL: Constructive Heuristics Induction for Language Learning
A general approach to inducing natural language parsers. It takes an annotated corpus and uses inductive logic programming (ILP) to induce the rules that control the actions of a shift-reduce parser. [Free]
www.cs.utexas.edu
Meta-MEME: Motif-based Hidden Markov Modeling of Biological Sequences
Software toolkit for building and using motif-based hidden Markov models of DNA and proteins. There is an online interactive version. Source written in C. [GPL]
metameme.sdsc.edu
SUBDUE: Graph Based Knowledge Discovery
A program which discovers interesting and repetitive subgraphs in labeled graph representations using the minimum description length principle. Includes applications to molecular biology. [Free]
cygnus.uta.edu
HMM and other statistical programs
This tool implements hidden Markov models and applies them to part-of-speech tagging. Also available: multivariate hypothesis-testing software for Gaussian data, and a toolkit for editing and visualizing ground truth and metadata for OCR. [GPL]
www.cfar.umd.edu
Pfam: Database of Protein Families and HMMs
A large collection of multiple sequence alignments and trained hidden Markov models covering many common protein domains. Alignments are included as well as models for 8296 protein families, based on the Swissprot 48.9 and SP-TrEMBL 31.9 protein sequence databases. [GPL]
pfam.wustl.edu
MIX: Software for Mixture Distributions
Software for learning mixture distributions. Examples and two case studies are included. [Commercial]
icarus.math.mcmaster.ca