Now you can enjoy all the features of Datrics Release 1.2.0 on the platform!
We've made great speed improvements - your pipelines will now run 5x faster - and we've also made a lot of improvements to the user interface, making it easier to create pipelines. For advanced users, we've implemented several new bricks: dimensionality reduction, binning without a target, and new encoding options.
In 1.2.0 we extended the platform's data processing functionality by adding the ability to transform data from a high-dimensional space into a low-dimensional one, ensuring that the resulting representation keeps the meaningful properties of the initial data. Dimensionality reduction techniques are widely used in Data Analysis and Machine Learning and are part of the Feature Engineering process. We started with Principal Component Analysis (PCA), a technique which, on the one hand, reduces the computational time of the learning process and, on the other hand, handles the multicollinearity issue, which is very important for linear models like Logistic Regression.
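If you're curious what this looks like in code, here's a minimal sketch of the same idea using scikit-learn - the dataset and column names below are purely illustrative, not what the brick runs internally:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative dataset with correlated numeric features
df = pd.DataFrame({
    "income": [42, 55, 61, 38, 70, 49],
    "spend":  [40, 52, 58, 35, 66, 47],  # strongly correlated with income
    "age":    [23, 35, 41, 29, 52, 33],
})

# PCA is scale-sensitive, so standardize first
X = StandardScaler().fit_transform(df)

# Keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # fewer columns, same rows
print(pca.explained_variance_ratio_)   # variance kept per component
```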
Imbalanced datasets are a well-known problem in predictive analytics. A severe skew in the class distribution can impact many machine learning algorithms - some of them tend to focus on the majority class and ignore the minority class. One approach to solving this issue is to randomly resample the training dataset so that the impacts of the classes are equalized. In 1.2.0 we added a brick to the component library that implements the two main approaches to imbalanced dataset resampling - oversampling (duplicating examples of the minority class) and undersampling (deleting examples from the majority class).
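As a rough illustration of the two modes, here's how random over- and undersampling can be done by hand with pandas (the toy dataset is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "feature": range(10),
    "target":  [0] * 8 + [1] * 2,  # severe 8:2 class skew
})

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Oversampling: duplicate minority-class rows until classes match
oversampled = pd.concat([
    majority,
    minority.sample(len(majority), replace=True, random_state=42),
])

# Undersampling: drop majority-class rows until classes match
undersampled = pd.concat([
    majority.sample(len(minority), random_state=42),
    minority,
])

print(oversampled["target"].value_counts())
print(undersampled["target"].value_counts())
```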
Starting with 1.2.0, binning of numerical variables no longer requires a binary target variable - the user can quantize a continuous variable either automatically or manually via the Binning Improvements dashboard. As before, a target variable lets you assess the categorical representation of the continuous variable and get the optimal bins with respect to it.
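For intuition, the two modes map to something like the following pandas sketch - automatic quantile-based bins versus user-defined edges (the values and labels are illustrative):

```python
import pandas as pd

age = pd.Series([18, 22, 25, 31, 38, 45, 52, 60, 67, 74])

# Automatic: quantile-based bins, no target variable needed
auto_bins = pd.qcut(age, q=4, labels=["q1", "q2", "q3", "q4"])

# Manual: user-defined bin edges
manual_bins = pd.cut(age, bins=[0, 30, 50, 100],
                     labels=["young", "middle", "senior"])

print(auto_bins.value_counts())
print(manual_bins.value_counts())
```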
In 1.2.0 the team focused on improving and expanding the Encoding functionality, and two major updates have been made to the Encoding brick. The first is the possibility to manually add unseen values for a column to be encoded, so you can be sure the encoder is trained correctly even if your data sample is not perfectly representative. The other update is an additional encoding mode - Weight of Evidence (WoE) - which is quite useful when working with Logistic Regression.
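Weight of Evidence assigns each category the log ratio of its event share to its non-event share. Here's a rough pandas sketch of that computation - the column names are illustrative, and the epsilon smoothing is our assumption for the example, not necessarily what the brick does:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city":   ["A", "A", "B", "B", "B", "C", "C", "C", "C", "C"],
    "target": [ 1,   0,   1,   1,   0,   0,   0,   1,   0,   0 ],
})

# Events and non-events per category
stats = df.groupby("city")["target"].agg(events="sum", total="count")
stats["non_events"] = stats["total"] - stats["events"]

# Small epsilon guards against division by zero for pure categories
eps = 0.5
woe = np.log(
    ((stats["events"] + eps) / (df["target"].sum() + eps)) /
    ((stats["non_events"] + eps) / ((df["target"] == 0).sum() + eps))
)

# Replace each category with its WoE value
df["city_woe"] = df["city"].map(woe)
print(df)
```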
We continue working on adding AutoML functionality to the predictive pipelines. In 1.2.0 we extended the training options of the Logistic Regression model with an Advanced Mode, which includes the Recursive Feature Elimination (RFE) method. Our implementation of RFE for Logistic Regression removes non-informative and highly correlated features by iteratively eliminating the weakest features with respect to the model's quality criterion.
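Conceptually, the mode behaves like scikit-learn's RFE wrapped around a logistic regression - a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 4 of them informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, n_redundant=3,
                           random_state=42)

# Iteratively drop the weakest feature (smallest |coefficient|)
# until the requested number of features remains
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=4)
rfe.fit(X, y)

print(rfe.support_)   # mask of the selected features
print(rfe.ranking_)   # 1 = kept; higher = eliminated earlier
```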
Having duplicated observations is quite common when building ML pipelines. In 1.2.0 we've added the possibility to work with data containing duplicates. The Duplicates Treatment brick lets the user treat duplicates in two different modes. The first mode works on every column and removes fully duplicated rows. The second mode lets you limit the set of columns treated as a unique key identifier; for the remaining columns, values are selected automatically based on the first or last occurrence.
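In pandas terms, the two modes correspond roughly to the following (the dataset is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "email":   ["a@x.io", "a@x.io", "b@x.io", "b@x.io", "c@x.io"],
    "visits":  [10, 12, 7, 7, 3],
})

# Mode 1: treat the full row as the key; drop exact duplicates
exact = df.drop_duplicates()

# Mode 2: treat selected columns as a unique key and keep the
# first (or last) occurrence for the remaining columns
by_key = df.drop_duplicates(subset=["user_id", "email"], keep="first")

print(exact)
print(by_key)
```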
API key management is a common need: when you create a model API or pipeline API, you want to integrate it into your backend or share access to it with somebody else. This is now possible by generating API keys in the corresponding section on the platform.
You can generate and revoke API keys in the convenient admin panel.
The creation of reproducible workflows has never been so easy. In 1.2.0 you can generate features in one pipeline and then turn that pipeline into an API, so it will transform raw data into the feature format your model accepts.
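A minimal sketch of calling such a pipeline API from Python - note that the endpoint URL, payload shape, and auth header below are assumptions for illustration; check your deployment page for the real ones:

```python
import requests

# Hypothetical endpoint and auth header, for illustration only
PIPELINE_URL = "https://app.datrics.ai/api/pipeline/<your-pipeline-id>"
API_KEY = "<your-api-key>"

raw_record = {"age": 34, "income": 52000, "city": "B"}

response = requests.post(
    PIPELINE_URL,
    json={"data": [raw_record]},                      # raw data in
    headers={"Authorization": f"Bearer {API_KEY}"},   # assumed auth scheme
)
response.raise_for_status()
print(response.json())                                # engineered features out
```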
Also in 1.2.0, we've added the possibility to test your pipeline APIs straight on the platform, with the same UI that you've used previously to test models.
Managing multiple versions of models is usually a pain. We know that, and in 1.2.0 we added the possibility to replace an existing model with a new one without changing the API link. You can test your models separately and then replace the old model with the new one in a single click, without any need to change the code that uses it.
And just as you can replace a deployed model without changing its API link, you can now do the same with pipeline APIs.
There are a lot of data warehouses and databases out there, and different companies use different ones. At Datrics we know that, and we're happy to present a connector to the Exasol data warehouse. You can use it the same way you've used the Postgres or MySQL connectors: just set up a connection to Exasol and drag & drop it onto the pipeline canvas.
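For comparison, the plain-Python equivalent of what the connector encapsulates would look roughly like this with the pyexasol driver (host, credentials, schema, and query are placeholders):

```python
import pyexasol

# Placeholder connection details for illustration only
conn = pyexasol.connect(
    dsn="exasol.example.com:8563",
    user="analyst",
    password="<password>",
    schema="RETAIL",
)

# Pull a query result straight into a pandas DataFrame
df = conn.export_to_pandas("SELECT * FROM SALES LIMIT 100")
print(df.head())
conn.close()
```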