The first ML framework
for relational learning.

Think of getML as TensorFlow – just for relational data.

Why getML?

Billions of features with
~20 lines of Python

Machine learning models need features as input, but building features by hand is expensive. Data scientists and experts spend up to 90% of their time on tasks related to feature engineering. We at getML build general-purpose algorithms that automate feature engineering on any kind of relational data.

The data format you have

is natively stored inside a relational data model.

Feature Engineering

Transforming relational data into a flat feature table is called feature engineering.

Feature learning automates feature engineering using machine learning paradigms.
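To make the term concrete, here is a minimal sketch of manual feature engineering in plain Python. The tables, column names and the build_feature_table helper are hypothetical and chosen only for illustration; they are not part of getML. The sketch joins a one-to-many transactions table onto a customers table and aggregates it into one flat row per customer.

```python
from collections import defaultdict

# Hypothetical population table: one row per customer, including the target.
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]

# Hypothetical peripheral table: many rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]


def build_feature_table(customers, transactions):
    """Join on customer_id and aggregate transactions into one row per customer."""
    amounts = defaultdict(list)
    for row in transactions:
        amounts[row["customer_id"]].append(row["amount"])

    table = []
    for customer in customers:
        values = amounts[customer["customer_id"]]
        table.append({
            "customer_id": customer["customer_id"],
            # Hand-written aggregations: each one is a feature.
            "n_transactions": len(values),
            "total_amount": sum(values),
            "avg_amount": sum(values) / len(values) if values else 0.0,
            "churned": customer["churned"],
        })
    return table


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Every aggregation in this sketch was picked by hand. Feature learning replaces that manual choice with a search guided by the target variable.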

The data ML algorithms require

must be reduced to a single, flat feature table.

Benefits of getML for feature learning

Feature learning boosts your productivity

Feature learning automates manual feature engineering through supervised learning. This is preferable to writing and maintaining hundreds of SQL, pandas or R/data.table scripts for feature engineering. getML's algorithms allow data scientists to build end-to-end prediction pipelines in days instead of months.

Algorithms that discover domain specific patterns

Manual feature engineering is an error-prone, repetitive process that requires countless hours of meetings to obtain domain knowledge from experts. With feature learning, data scientists let algorithms learn all the relevant feature logic automatically, straight from relational data.

Great features lead to high ML model accuracy

Improving your model performance starts with finding better features. Feature learning helps you avoid the negative impact of unknown unknowns or common time constraints in the model building phase. getML helps data scientists to deliver the most accurate prediction models, faster.

What is getML?

All you need to build
end-to-end ML pipelines.

Load Data

Python

From pandas, PySpark, PyArrow, dict or JSON

Database connectors

Unified import interface for PostgreSQL, MySQL, MariaDB, SQLite3, SAP HANA, Greenplum or any other ODBC-compatible database

File storage

Import your data from CSV, Parquet or AWS S3 buckets

Machine Learning

Feature Learning

FastProp, Multirel & Relboost for feature learning from relational data and time series

Predicting

Predict with XGBoost regressors and classifiers, logistic & linear regression, or bring your own algorithm

Hyperparameter optimization

Tune hyperparameters on a Latin hypercube or using a Gaussian search
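For readers unfamiliar with the term, the following is a minimal sketch of Latin hypercube sampling in plain Python. It does not use getML's API: the latin_hypercube function, the hyperparameter names and the bounds are all assumptions made for illustration. The idea is to split each hyperparameter range into as many equal strata as there are samples and to draw exactly one value per stratum, so the candidates cover the search space more evenly than purely random draws.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples candidates with one value per stratum in every dimension.

    bounds maps a (hypothetical) hyperparameter name to a (low, high) tuple.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One point inside each of the n_samples equal-width strata of [0, 1).
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired randomly across dimensions.
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [
        {name: columns[name][i] for name in bounds}
        for i in range(n_samples)
    ]


# Hypothetical search space for a gradient boosting predictor.
bounds = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}
samples = latin_hypercube(5, bounds)
for candidate in samples:
    print(candidate)
```

Each candidate would then be used to train and score one pipeline. The Gaussian search mentioned above is not covered by this sketch.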

Evaluate & Deploy

Train pipelines

Wrap feature learner ensembles and predictors in end-to-end ML pipelines

Evaluate

Benchmark models and gain insights through learned features

Deploy

Use Python, deploy models behind an HTTP model server to serve predictions or feature transforms, or transpile pipelines to SQLite or Spark SQL.

getML is a high-performance machine learning framework for building regression and prediction models on any kind of relational data. It comes with an easy-to-use Python API that lets you build end-to-end ML pipelines on terabytes of input data.

getML

The core of the getML framework is written in C++ for maximum performance and has zero external dependencies. It has an easy-to-use API and a web-based user interface.

Engine

  • Core of the getML framework
  • Standalone application that handles I/O, feature learning & AutoML
  • Implements a high-performance data management layer for ML models at terabyte scale

Python API

  • Open-source license, available on pip
  • Wrapper around the getML engine for easy integration of relational learning into existing data science workflows
  • Sends all the instructions & data to the getML engine

Monitor

  • Comes with the getML engine
  • Web frontend for data exploration, easy inspection of trained models and learned features

How feature learning works

To find the best set of aggregation functions and conditions, getML’s supervised learning algorithms perform an iterative, tree-based search inside relational data. This allows for the automatic generation of complex features for a given target variable on a scale and accuracy that no manual or brute-force approach can match.
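The search space described above can be illustrated with a toy example. The sketch below is not getML's algorithm: it is a plain brute-force enumeration, written in standard-library Python, over a handful of hypothetical aggregation functions and conditions, each scored by its absolute Pearson correlation with the target. All data, names and the scoring choice are assumptions made for illustration. It shows what a single candidate feature looks like (an aggregation plus a condition) and why the number of candidates grows quickly, which is the problem the tree-based search is meant to address.

```python
from itertools import product

# Hypothetical target per customer and transactions (customer_id, amount, days_ago).
targets = {1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0}
transactions = [
    (1, 50.0, 3), (1, 40.0, 40),
    (2, 60.0, 50), (2, 5.0, 2),
    (3, 70.0, 5), (3, 10.0, 60),
    (4, 80.0, 45), (4, 8.0, 1),
]

# Candidate aggregation functions and conditions (illustrative only).
AGGREGATIONS = {"SUM": sum, "MAX": max, "COUNT": len}
CONDITIONS = {
    "all rows": lambda days_ago: True,
    "days_ago <= 7": lambda days_ago: days_ago <= 7,
    "days_ago > 7": lambda days_ago: days_ago > 7,
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def compute_feature(agg, cond):
    """Aggregate the amounts that satisfy cond, one value per customer."""
    values = []
    for customer_id in sorted(targets):
        amounts = [a for c, a, d in transactions if c == customer_id and cond(d)]
        values.append(float(agg(amounts)) if amounts else 0.0)
    return values


def search():
    """Score every aggregation/condition pair and return the best one."""
    ys = [targets[customer_id] for customer_id in sorted(targets)]
    scored = []
    for (agg_name, agg), (cond_name, cond) in product(
        AGGREGATIONS.items(), CONDITIONS.items()
    ):
        score = abs(pearson(compute_feature(agg, cond), ys))
        scored.append((score, f"{agg_name}(amount) WHERE {cond_name}"))
    return max(scored)


best_score, best_feature = search()
print(best_feature, round(best_score, 3))
```

With three aggregations and three conditions there are only nine candidates here. Real schemas have many columns, thresholds and join paths, so exhaustive enumeration like this quickly becomes impractical.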

How do I use it?

Try getML

It takes less than 30 seconds to get started.

getML is built for data scientists who love autonomy, automation & highly accurate models.

Starting in less than 30 seconds

To skip the setup procedure, you can test-drive getML in a Docker environment on our test cluster.

Launch getML inside your browser

Local setup

Getting started with getML is as easy as downloading the getML suite and pip-installing the getml Python API.

Benchmarks

Beating the state-of-the-art in Relational Learning

getML outperforms modern libraries and academic literature in terms of speed and accuracy.

5%

Beating state-of-the-art approaches when classifying a citation network by delivering 5% better results than academia.

Notebook: Cora

11%

Outperforming Facebook’s Prophet by 11 percentage points in one-step-ahead predictions.

Notebook: Interstate 94

179x

Up to 179x faster than the popular feature engineering libraries featuretools and tsfresh.

Blog Post: Introducing FastProp

Reproduce all benchmark results in your browser. Try getML now.