Common mistakes in using machine learning when forecasting events and a new approach based on models of the event formation mechanisms
PII
S0424738825010039-1
Publication type
Article
Status
Published
Authors
Yuri Korablev 
Affiliation: Financial University under the Government of the Russian Federation
V. A. Sudakov
Affiliation: Keldysh Institute of Applied Mathematics of Russian Academy of Sciences (KIAM RAS)
Pages
25-37
Abstract
The paper discusses the main mistakes researchers make when predicting events with machine-learning models. These errors include: losing the events themselves by constructing abstract features; training models on customers rather than on the events generated by customers; constructing artificial features; using incorrect validation schemes and erroneous model quality metrics; and relying on static parameters. The mistakes are analyzed in one example from Kaggle. The area under the ROC curve reported for this example is very high, 0.88, but this quality metric is computed incorrectly; after all errors are corrected, the metric turns out to be 0.599. A different approach to analyzing and predicting events is presented, which differs significantly from classical machine-learning methods. The method is based on modeling the individual mechanism of event formation for each client. Models of these mechanisms are built, their parameters are restored by mathematical methods and extrapolated into the future, and the forecast of a future event is obtained as the output of the mechanism model with the established parameter values. The resulting model quality metric, the area under the ROC curve, is 0.615, slightly higher than in the machine-learning-based Kaggle example. This shows that the proposed approach is competitive with advanced machine-learning techniques.
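
As a stylized illustration of the validation error mentioned above (customers, rather than their events, leaking between training and test data), the following Python sketch shows how a random row-wise split can inflate the area under the ROC curve when several rows belong to the same customer, and how a customer-wise split corrects it. The variable names, the synthetic data, and the gradient-boosting classifier are illustrative assumptions; the sketch does not reproduce the Kaggle notebook or the method proposed in the article.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(42)
n_customers, rows_per_customer = 300, 5
customer_id = np.repeat(np.arange(n_customers), rows_per_customer)

# Each customer has a "fingerprint" feature value and a customer-level outcome
# that is independent of the fingerprint, so the feature cannot help on new customers.
fingerprint = rng.normal(size=n_customers)
outcome = rng.integers(0, 2, size=n_customers)
X = np.column_stack([
    fingerprint[customer_id] + rng.normal(scale=0.02, size=customer_id.size),
    rng.normal(size=customer_id.size),  # pure noise feature
])
y = outcome[customer_id]

# Incorrect validation: random row-wise split, the same customers appear
# in both the training and the test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
auc_leaky = roc_auc_score(
    y_te, GradientBoostingClassifier().fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

# Corrected validation: split by customer, so test customers are unseen in training.
tr_idx, te_idx = next(GroupShuffleSplit(test_size=0.3, random_state=0)
                      .split(X, y, groups=customer_id))
auc_honest = roc_auc_score(
    y[te_idx],
    GradientBoostingClassifier().fit(X[tr_idx], y[tr_idx]).predict_proba(X[te_idx])[:, 1])

print(f"row-wise split AUC:      {auc_leaky:.3f}")   # optimistically high
print(f"customer-wise split AUC: {auc_honest:.3f}")  # near 0.5 on this synthetic data

Splitting by customer (or by time) before computing the ROC AUC is one way to avoid the inflated quality metric of the kind described in the abstract.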
Keywords
event analysis, event forecasting, machine learning, model errors, event formation mechanism, parameter restoration, spline collocation, smoothing spline, monotonic spline, forecast quality, validation
Received
15.04.2025

References

1. Ехлаков Р. С., Судаков В. А. (2022). Прогнозирование стоимости котировок при помощи LSTM и GRU сетей // Препринты ИПМ им. М. В. Келдыша. № 17. 13 с. DOI: 10.20948/prepr-2022-17 [Ekhlakov R. S., Sudakov V. A. (2022). Forecasting the cost of quotes using LSTM and GRU networks. Keldysh Institute of Applied Mathematics Preprints, 17. 13 p. (in Russian).]

2. Кораблев Ю. А. (2022). Об одном алгоритме восстановления функции по разным функционалам для прогнозирования редких событий в экономике // Финансы: теория и практика. № 3 (26). С. 196–225. DOI: 10.26794/2587-5671-2022-26-3-196-225 [Korablev Yu.A. (2022). An algorithm for restoring a function from different functionals for predicting rare events in the economy. Finance: Theory and Practice, 26 (3), 196–225 (in Russian).]

3. Кораблев Ю. А. (2023). Емкостный метод анализа и прогнозирования редких событий в экономике: монография. М.: РУСАЙНС. 296 с. ISBN: 978-5-466-04159 [Korablev Yu.A. (2023). Capacity method of analysis and forecasting of rare events in the economy. Moscow: RUSCIENS. 256 p. (in Russian).]

4. Craven P., Wahba G. (1978). Smoothing noisy data with spline functions — estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31 (4), 377–403. DOI: 10.1007/BF01404567

5. Friedman J. (1999). Greedy function approximation: A gradient boosting machine. Technical report. Department of Statistics, Stanford University.

6. Friedman J. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29 (5), 1189–1232. DOI: 10.1214/aos/1013203451

7. Golub G. H., Heath M., Wahba G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21 (2), 215–223. DOI: 10.1080/00401706.1979.10489751

8. Hansen P. C. (1992). Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34 (4), 561–580. DOI: 10.1137/1034115

9. Hansen P. C. (2001). The L-curve and its use in the numerical treatment of inverse problems. In: P. Johnston (ed.). Computational inverse problems in electrocardiology. Advances in Computational Bioengineering. Southampton: WIT Press.

10. Korablev Yu.A. (2022). Restoration of function by integrals with cubic integral smoothing spline in R. ACM Transactions on Mathematical Software, 48 (2), 1–17. DOI: 10.1145/3519384

11. Nagesh S. C. (2022). Predict customers probable purchase. Kaggle. Available at: https://www.kaggle.com/code/nageshsingh/predict-customers-probable-purchase

12. Nelder J. A., Mead R. (1965). A simplex method for function minimization. The Computer Journal, 7 (4), 308–313. DOI: 10.1093/comjnl/7.4.308

13. Quinn B. G., Fernandes J. M. (1991). A fast efficient technique for the estimation of frequency. Biometrika, 78 (3), 489–497.

14. Quinn B. G., Hannan E. J. (2001). The estimation and tracking of frequency. Cambridge: Cambridge University Press. 278 p.
