Bricks now work even when the data contains missing values: we automatically diagnose the issue and handle it in a way that suits the function being performed.
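As a rough illustration of what function-aware missing-value handling can look like, here is a minimal pandas sketch; the `impute_for_operation` helper and its strategy choices are hypothetical and not the actual Datrics internals:

```python
import pandas as pd

def impute_for_operation(df: pd.DataFrame, operation: str) -> pd.DataFrame:
    """Fill missing values with a strategy suited to the downstream operation."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Distance-based operations (scaling, clustering) are sensitive
            # to outliers, so the median is a safer default than the mean.
            if operation in ("scaling", "clustering"):
                fill = out[col].median()
            else:
                fill = out[col].mean()
        else:
            # Categorical columns: fall back to the most frequent value.
            fill = out[col].mode().iloc[0]
        out[col] = out[col].fillna(fill)
    return out
```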
Finding the right brick is now easier because we've added keywords to the search. Enter "cleansing" to find the Missing Values Treatment brick, "scaling" to find the Normalization brick, or simply "PCA" to find the Dimensionality Reduction brick.
We've moved all notifications to a separate tab in the right sidebar, so it's easier to navigate through them. Notifications are sorted by severity, with errors always at the top of the list.
Finally, you can select, copy, and paste multiple bricks on the scene. Select bricks by holding the Shift key, or simply drag a frame around them with your mouse.
Added out-of-the-box data segmentation, which produces cluster analysis results from raw data without requiring you to build a data preparation pipeline manually. The Data Segmentation brick reproduces the
data cleansing → feature engineering → modeling
pipeline and returns both the data segmentation model and a data processing scenario that can be implemented as a Datrics pipeline. The scenario includes data cleansing, encoding, missing values treatment, and feature selection. The prepared features are used to fit a K-Means clustering model with the optimal number of clusters. The brick supports simple and advanced modes: simple mode is fully automated, detecting the optimal number of clusters and performing feature engineering without the user's involvement, while advanced mode lets the user configure the model's hyperparameters manually.
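The brick's internals are not published, but simple mode's automatic cluster-count detection can be approximated with scikit-learn; the silhouette criterion and the `k_range` below are illustrative assumptions, not the brick's actual method:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def segment(data: np.ndarray, k_range=range(2, 11)) -> KMeans:
    """Scale features, then pick the k with the best silhouette score."""
    X = StandardScaler().fit_transform(data)
    best_model, best_score = None, -1.0
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        score = silhouette_score(X, model.labels_)
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```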
You can now use SSL termination to connect to your databases: just attach certificates when creating a new data source connection.
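Datrics handles this through the connection dialog, but for intuition, here is what an equivalent certificate-verified connection looks like at the driver level for PostgreSQL with psycopg2; the hostname, credentials, and certificate file names are placeholders:

```python
import psycopg2

# libpq verifies the server certificate against the attached CA bundle,
# and presents a client certificate if one is supplied.
conn = psycopg2.connect(
    host="db.example.com",
    dbname="analytics",
    user="datrics",
    password="secret",
    sslmode="verify-full",        # require TLS and verify the server identity
    sslrootcert="ca-bundle.pem",  # CA certificate attached to the connection
    sslcert="client-cert.pem",    # optional client certificate
    sslkey="client-key.pem",      # optional client private key
)
```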
A new operation has been added to the math formula: you can now construct complex conditions using the AND, OR, and NOT logical operators, and string processing has become more flexible.
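The formula syntax itself isn't reproduced in these notes; as a hedged analogue, the same kind of compound condition expressed with pandas boolean operators looks like this (the columns and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 41, 33],
    "country": ["US", "DE", "US"],
    "plan": ["pro", "free", "pro"],
})

# Compound condition: (age over 30 AND on the pro plan) OR NOT in the US.
mask = ((df["age"] > 30) & (df["plan"] == "pro")) | ~(df["country"] == "US")
print(df[mask])
```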
Added thresholds to binary classification models. The cutoff point can now be changed from the default value of 0.5. The threshold determines affiliation with the positive class based on the predicted probability, and it is taken into account when generating all applicable model performance metrics and visualizations.
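A minimal scikit-learn sketch of what moving the cutoff means in practice (the dataset and model are synthetic stand-ins): lowering the threshold typically raises recall at the cost of precision.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)[:, 1]  # probability of the positive class
for threshold in (0.5, 0.3):
    pred = (proba >= threshold).astype(int)
    print(f"t={threshold}: precision={precision_score(y, pred):.2f}, "
          f"recall={recall_score(y, pred):.2f}")
```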
Added Model Scores Distribution dashboard to the Model Performance tab for binary classification models.
Added a Time Series Forecasting brick, which supports stratification and can be used for time series forecasting without complex settings. The brick trains and applies a forecasting model based on the analysis of historical time-series data, with built-in preprocessing capabilities. Its analysis pipeline consists of three stages: time-series feature extraction, time-series preprocessing, and model fitting and application. First, we detect time-series features such as trend, seasonality, and data logging frequency, including features that might serve as additional regressors. Next, we preprocess the time-series data: outliers and missing values treatment, denoising, and discretization. Finally, we fit the stratified forecasting model and produce the forecast. The brick has two modes of usage, simple and advanced. In simple mode, data preprocessing and model hyperparameter settings are chosen automatically based on dependencies extracted from the time series; the user only needs to define the target and date-time variables. Advanced mode offers all the advantages of simple mode without its limitations: a flexible combination of manual and automatic configuration lets you bring expert knowledge into the time series processing pipeline. The brick is also equipped with a Forecasting Dashboard, which provides a detailed description of the time-series processing stages and the forecasting results.
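The brick's implementation isn't described beyond these stages, but a rough open-source analogue of the three-stage pipeline can be sketched with statsmodels; `sales.csv`, the monthly frequency, and the 12-period seasonality are assumptions for illustration only:

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Assumed monthly series loaded from a hypothetical file.
series = pd.read_csv("sales.csv", index_col="date", parse_dates=True)["value"]

# Stage 1: feature extraction - separate trend and seasonality.
decomposition = seasonal_decompose(series, model="additive", period=12)

# Stage 2: preprocessing - interpolate missing points.
clean = series.interpolate()

# Stage 3: fit a seasonal model and forecast 6 periods ahead.
model = ExponentialSmoothing(clean, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
forecast = model.forecast(6)
```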
We have improved model interpretability for binary classification by extending the Model Performance dashboard with the Model Score Distribution plot. The new plot depicts the distribution of output scores per target class, including the probability density function and range- and quantile-based discretization plots, which reflect the share of each class's items that fall into a specific score range.
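To make the idea concrete, here is a hedged recreation of such a per-class score histogram with matplotlib; the beta-distributed scores are synthetic and only stand in for a real model's output:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic scores: class 0 skewed toward low scores, class 1 toward high.
rng = np.random.default_rng(0)
proba = np.concatenate([rng.beta(2, 5, 300), rng.beta(5, 2, 200)])
y = np.concatenate([np.zeros(300), np.ones(200)])

bins = np.linspace(0, 1, 21)
plt.hist(proba[y == 0], bins=bins, alpha=0.6, density=True, label="class 0")
plt.hist(proba[y == 1], bins=bins, alpha=0.6, density=True, label="class 1")
plt.xlabel("predicted score")
plt.ylabel("density")
plt.legend()
plt.show()
```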
The Datrics data processing section has been extended with the Pivot Spreadsheet brick, which makes it possible to reorganize and summarize input data using a table of grouped values that aggregates the items of the input dataset by categorical values.
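The brick's behavior, as described, resembles pandas' `pivot_table`; here is a small sketch of the same reorganize-and-summarize operation on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US", "US"],
    "product": ["A",  "B",  "A",  "A",  "B"],
    "revenue": [100,  80,   120,  90,   60],
})

# Sum revenue for each region/product pair, one aggregated value per cell.
pivot = pd.pivot_table(df, values="revenue", index="region",
                       columns="product", aggfunc="sum", fill_value=0)
print(pivot)
```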
You can now upload CSV, XLS, and XLSX files with a new, intuitive visual user interface. In the new uploader interface, users can: