Think of getML as TensorFlow, just for relational data.
Why getML?
Machine learning models need features as input, but building features by hand is an expensive process. Data scientists and experts spend up to 90% of their time on tasks related to feature engineering. We at getML build general-purpose algorithms for data scientists that automate feature engineering on any kind of relational data.
Real-world data is natively stored inside a relational data model.
Feature Engineering
Transforming relational data into a flat feature table is called feature engineering.
Feature learning automates feature engineering using machine learning paradigms.
For machine learning, relational data must be reduced to a single, flat feature table.
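To make the idea of a flat feature table concrete, here is a minimal sketch of manual feature engineering in plain Python. The tables, column names, and chosen aggregations are all hypothetical and only illustrate the kind of hand-written logic that feature learning is meant to replace; this is not getML code.

```python
from collections import defaultdict

# Hypothetical population table: one row per customer, including the target.
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]

# Hypothetical peripheral table: many rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]


def build_feature_table(customers, transactions):
    """Join and aggregate the transactions into one flat row per customer."""
    amounts = defaultdict(list)
    for row in transactions:
        amounts[row["customer_id"]].append(row["amount"])

    flat = []
    for customer in customers:
        values = amounts.get(customer["customer_id"], [])
        flat.append({
            "customer_id": customer["customer_id"],
            # Each hand-picked aggregation becomes one feature column.
            "n_transactions": len(values),
            "total_amount": sum(values),
            "max_amount": max(values, default=0.0),
            "churned": customer["churned"],
        })
    return flat


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Every additional feature in this style means another hand-written aggregation, which is why the number of scripts grows quickly on real schemas.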
Feature learning automates manual feature engineering through supervised learning. This is preferable to writing and maintaining hundreds of SQL, pandas or R/data.table scripts for feature engineering. getML's algorithms allow data scientists to build end-to-end prediction pipelines in days instead of months.
Manual feature engineering is an error-prone, repetitive process that requires countless hours of meetings to obtain domain knowledge from experts. Using feature learning, data scientists let algorithms automatically learn all the relevant feature logic straight from relational data.
Improving your model performance starts with finding better features. Feature learning helps you avoid the negative impact of unknown unknowns or common time constraints in the model building phase. getML helps data scientists to deliver the most accurate prediction models, faster.
What is getML?
getML is a high-performance machine learning framework for building regression and prediction models on any kind of relational data. It comes with an easy-to-use Python API that lets you build end-to-end ML pipelines on terabytes of input data.
FastProp, Multirel & Relboost for feature learning from relational data and time series
Predict with XGB Regressor, XGB Classifier, logistic & linear regression, or bring your own algorithm
Tune hyperparameters on a Latin hypercube or using a Gaussian search
Wrap feature learner ensembles and predictors in end-to-end ML pipelines
Benchmark models & gain insights through features
Use Python, deploy models behind an HTTP model server to serve predictions or feature transforms, or transpile pipelines to SQLite or Spark SQL.
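As an illustration of what tuning on a Latin hypercube means, the sketch below draws hyperparameter candidates so that every dimension is split into equally sized strata and each stratum is sampled exactly once. It uses only the Python standard library; the function name, the search space, and the parameter names are assumptions made for this example and do not reflect getML's own implementation or API.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points; per dimension, each of n_samples strata is hit once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One random point inside each stratum of the unit interval [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# Hypothetical search space; the parameter names are illustrative only.
search_space = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}
candidates = latin_hypercube(5, search_space)

for candidate in candidates:
    print(candidate)
```

Compared with purely random sampling, this spreads a small number of trials more evenly over each hyperparameter's range. Integer-valued parameters such as a tree depth would still need rounding before use.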
The core of the getML framework is written in C++ for maximum performance and has zero external dependencies. It has an easy-to-use API and a web-based user interface.
getML frames feature engineering as a machine learning problem:
To find the best set of aggregation functions and conditions, getML’s supervised learning algorithms perform an iterative, tree-based search inside relational data. This allows for the automatic generation of complex features for a given target variable at a scale and with an accuracy that no manual or brute-force approach can match.
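The framing above can be sketched with a toy example: treat each feature as an aggregation plus a condition, and score every candidate against the target. Note that this sketch deliberately uses a tiny exhaustive search, which is exactly the brute-force approach the text contrasts with, and not getML's tree-based algorithm. The data, the candidate aggregations, and the time-window condition are hypothetical.

```python
from itertools import product

# Hypothetical target per customer and (customer_id, amount, days_ago) rows.
targets = {1: 10.0, 2: 40.0, 3: 25.0, 4: 5.0}
transactions = [
    (1, 10.0, 3), (1, 50.0, 90),
    (2, 30.0, 5), (2, 10.0, 12), (2, 5.0, 200),
    (3, 25.0, 20), (3, 40.0, 150),
    (4, 5.0, 1), (4, 60.0, 120),
]

# Candidate aggregation functions and candidate conditions (days_ago <= limit).
AGGREGATIONS = {
    "SUM": sum,
    "COUNT": len,
    "MAX": lambda values: max(values, default=0.0),
}
MAX_DAYS_AGO = [7, 30, 365]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def feature_values(aggregate, max_days):
    """Apply one aggregation under one condition, per customer."""
    return [
        aggregate([amt for cid, amt, days in transactions
                   if cid == customer and days <= max_days])
        for customer in sorted(targets)
    ]


def search():
    """Score every (aggregation, condition) pair against the target."""
    ys = [targets[customer] for customer in sorted(targets)]
    best = None
    for (name, aggregate), max_days in product(AGGREGATIONS.items(), MAX_DAYS_AGO):
        score = abs(pearson(feature_values(aggregate, max_days), ys))
        if best is None or score > best[0]:
            best = (score, name, max_days)
    return best


score, name, max_days = search()
print(f"best feature: {name}(amount) WHERE days_ago <= {max_days}")
```

Even in this toy setting the candidate space is the product of aggregations and conditions, so it grows quickly once more columns, tables, and thresholds are involved. That growth is the motivation for a guided search instead of enumeration.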
How do I use it?
Try getML
getML is built for data scientists who love autonomy, automation & highly accurate models.
To skip the setup procedure, you can test-drive getML in a Docker environment on our test cluster.
Launch getML inside your browser
Starting with getML is as easy as downloading the getML suite and pip-installing the getml Python API.
Benchmarks
getML outperforms modern libraries and academic literature in terms of speed and accuracy.
Beating state-of-the-art approaches when classifying a citation network by delivering 5% better results than academia.
Notebook: Cora
Outperforming Facebook’s Prophet by 11 percentage points in one-step-ahead predictions.
Notebook: Interstate 94
Up to 179x faster than the popular feature engineering libraries featuretools and tsfresh.
Blog Post: Introducing FastProp