.: Sally

A Tool for Embedding Strings in Vector Spaces

Installation

Dependencies

The following libraries are required for building Sally from source code. They are available as packages in many operating system distributions and package managers, e.g. Debian Linux and MacPorts (see detailed list of dependencies).

   >= zlib-1.2.1          http://www.zlib.net
   >= libconfig-1.4       http://www.hyperrealm.com/libconfig/      
   >= libarchive-2.70     http://libarchive.github.com/

Compilation

Sally follows the standard compilation procedure of GNU software. It has been successfully compiled on Linux, Mac OS X and OpenBSD.

  $ ./configure [options]
  $ make
  $ make check
  $ make install

Configuration options

  --prefix=PATH           Set directory prefix for installation

By default, Sally is installed into /usr/local. To install it elsewhere, use this option to select the installation directory.

  --enable-libarchive     Enable support for loading archives

If this feature is enabled, Sally can also read the contents of archives, such as .tgz and .zip files. This allows string data to be processed in compressed form and can substantially reduce the required storage space.

  --enable-openmp         Enable support for OpenMP 

This feature enables support for OpenMP in Sally. It is still experimental. When enabled, Sally executes certain parts of the processing in parallel, making use of multi-core architectures where possible.
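
As a rough illustration of this kind of parallelism, the sketch below distributes independent per-string work over threads with an OpenMP loop. It is not taken from the Sally sources: process_string() and process_all() are hypothetical names, and the per-string work is a placeholder. When the code is compiled without OpenMP, the pragma is ignored and the loop runs sequentially.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-string work; a placeholder for real feature extraction. */
static size_t process_string(const char *s)
{
    return strlen(s);
}

/*
 * Process n strings independently and store one result per string.
 * With OpenMP enabled, the iterations are distributed over threads;
 * each iteration writes only to its own slot in out, so no locking
 * is needed.
 */
static void process_all(const char *const *strs, long n, size_t *out)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        out[i] = process_string(strs[i]);
}
```

With GCC or Clang, OpenMP is typically switched on with the -fopenmp compiler flag; the configure option above is expected to take care of this for Sally itself.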

  --enable-md5hash        Enable MD5 as alternative hash

Sally uses a hash function for mapping different features to different dimensions in the vector space. By default, the very efficient Murmur hash is used for this task. In certain critical cases, however, it may be useful to use a cryptographic hash such as MD5 instead.
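
The idea of mapping features to dimensions by hashing can be sketched as follows. This is a minimal illustration and not Sally's implementation: it uses 64-bit FNV-1a as a simple stand-in for the Murmur hash, takes byte n-grams as features, and the names feature_dim() and embed_ngrams() as well as the bits parameter are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in hash (64-bit FNV-1a); NOT the Murmur hash used by Sally. */
static uint64_t fnv1a64(const char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char) data[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: map a feature to one of 2^bits dimensions. */
static uint64_t feature_dim(const char *feat, size_t len, unsigned bits)
{
    uint64_t h = fnv1a64(feat, len);
    return bits >= 64 ? h : h & ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper: count the byte n-grams of str in vec.
 * vec must provide 2^bits entries. Different n-grams may collide
 * in the same dimension; fewer bits make collisions more likely.
 */
static void embed_ngrams(const char *str, size_t n, unsigned bits,
                         unsigned *vec)
{
    size_t len = strlen(str);
    if (n == 0)
        return;
    for (size_t i = 0; i + n <= len; i++)
        vec[feature_dim(str + i, n, bits)]++;
}
```

For example, calling embed_ngrams("abab", 2, 8, vec) on a zeroed array of 256 counters adds three counts in total, two of them in the dimension assigned to "ab". A cryptographic hash would take the place of fnv1a64() here; the mapping to dimensions stays the same.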