

Advances in Financial Machine Learning [Lopez de Prado, Marcos]




| Best Sellers Rank | #40,630 in Books; #6 in Machine Theory (Books); #10 in Business Investments; #224 in Investing (Books) |
| Customer Reviews | 4.4 out of 5 stars (677 reviews) |
E**S
Practical, up-to-date, full of nuggets of useful info on building ML trading systems
Well-written, well-researched book that provides new insight in many areas; much better than your run-of-the-mill book that gives a cursory overview of topics like ML and finance. It's obvious from reading this book that the author knows what he's talking about. The book is for people who already know something about both ML and machine trading systems; it's not an introduction. A major theme of the book is the danger of backtest overfitting. Highly recommended! An overview of topics follows, focusing on things I found the most useful.
Part 1: Data Analysis (chapters 1-5). Chapter 1 includes a list of reasons that financial ML projects usually fail. It's nice to have the warning at the outset of what traps to avoid. Chapter 2 discusses ways to represent market information; there are surprises here you wouldn't think of on your own, such as "Tick Imbalance Bars", which show up here only because the author has significant experience in HFT (cf. one of his earlier books on market microstructure). Chapter 3 on Labelling addresses some issues I ran into myself working on dataset preparation. How to do appropriate labelling in computational finance isn't as obvious as in some other ML domains. Chapter 4 continues with appropriate weights for data samples, which help deal with data that violates the IID assumption. I knew the basics of ARIMA processes and the idea of time series stationarity before reading this book, but Chapter 5 introduced me to fractionally differentiated time series, which appear in literature from the 1980s, but which somehow I had missed. This is important in dealing with time series with long memory.
Part 2: Modeling (chapters 6-9). Part 2 has a different focus than some readers might expect, talking about modeling in general, but without mentioning specific ML models, such as linear regressions, neural nets, random forests, etc. The book states in its introduction that it is intended to be model-agnostic, and covers issues affecting modeling in general. Chapter 6, for instance, is on ensembles. Chapter 7 deals with cross-validation in financial time series, which is a crucial topic for any model evaluation, and chapter 8 talks about feature importance. Chapter 9 is on hyper-parameter tuning.
Part 3: Backtesting (chapters 10-16). Chapter 10 covers bet sizing. Chapters 11-16 discuss backtesting in great detail, including the dangers involved, good statistics to compute, synthetic data use, more on cross-validation, and strategy risk and asset allocation/optimization.
Part 4 (chapters 17-19) is called "Useful Financial Features" -- it's about feature engineering, and has a bunch of features that will be new to many readers (there are "entropy features" and "microstructure features", for instance; this isn't just the basic stuff related to returns and volatility).
Part 5 (chapters 20-22), on high-performance computing, seemed the least useful to me, since I personally was already aware of this stuff and there is a lot more to say on the topic than these few chapters could cover.
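As an illustration of the fractional differentiation this review mentions for Chapter 5, here is a minimal sketch based on the standard binomial expansion of (1 - B)^d, where B is the lag operator. It is not the book's code: the function names `frac_diff_weights` and `frac_diff` and the fixed-width window are assumptions made for this example.

```python
import numpy as np

def frac_diff_weights(d: float, n_weights: int) -> np.ndarray:
    """Weights of (1 - B)^d: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series: np.ndarray, d: float, n_weights: int) -> np.ndarray:
    """Apply a fixed-width window of fractional-differencing weights.

    The first n_weights - 1 outputs are NaN because the window is incomplete.
    (Hypothetical helper for illustration only.)
    """
    w = frac_diff_weights(d, n_weights)
    out = np.full(len(series), np.nan)
    for t in range(n_weights - 1, len(series)):
        window = series[t - n_weights + 1 : t + 1][::-1]  # most recent value first
        out[t] = np.dot(w, window)
    return out

# d = 1 recovers ordinary first differences; 0 < d < 1 keeps slowly decaying weights
prices = np.array([1.0, 3.0, 6.0, 10.0])
first_diff = frac_diff(prices, d=1.0, n_weights=2)
half_diff_weights = frac_diff_weights(0.5, 4)
```

With d = 1 the weights collapse to (1, -1, 0, ...), which is plain differencing and discards all earlier history. A fractional d between 0 and 1 keeps small nonzero weights on older observations, which is why the technique is associated with long-memory series.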
A**Y
Foundational Resource for SWEs, ML Researchers getting into Investing
If you're coming from a computer science and/or machine learning background, you will learn a lot about how to frame your algorithmic thinking in the domain of finance, and the book will leave you hungry for more hardcore graph theory, parallelization, machine learning (beyond simple random forest ensembles and clustering), advanced algorithms, and gritty details of implementation, which are left for you to explore and enjoy. The purpose of this book is not to explain how to apply Deep Learning to make money, but rather to lay a solid foundation of how to invest in a scientifically rigorous fashion given the modern machine learning toolset and access to PBs of data. In many cases, rather than focussing on the specifics of any given model, Dr. Lopez de Prado focuses on generating and selecting useful features.
The book, which is a hybrid of a textbook and a manual, explains, using both formal mathematics and empirical evidence, why many of the assumptions about Machine Learning applied to the financial world are wrong, and follows through with rigorous and practical solutions. For example, one of the most common false assumptions addressed in the book is that of IID samples in financial time series data. Dr. Lopez de Prado manages to pull together ideas from a wide spectrum of academic disciplines including mathematics, econometrics, machine learning, computer science, information theory, and physics to build a strong scientific basis upon which to algorithmically invest. Despite the diversity of subject matter, the book progresses well, building on and reusing early themes and then exploring domain-specific topics like market microstructure and quantum computing. Source code to implement many of the methods is provided as a practical toolkit to test out the claims presented. The thorough use of references is particularly helpful as it keeps the content fairly short and to the point.
Speed reading not recommended. Using a programming analogy, the mathematical notation is more reminiscent of the explicit verbosity of C++ than that of Python (which is used in the book and is meant to be concise). It's not much of a problem, but be aware that the information content is dense. Something that's mentioned but not explored is how to make use of “alternative datasets”. Given that many of the advances in the wider realm of ML have been around data you don’t get from exchanges, it would be nice if some helpful pointers or references for dealing with alternative data were included. That said, it's not the end of the world given the wealth of resources online for analyzing text, image, and video data.
Buy this book if you're an experienced programmer getting into Finance or a Financial Professional looking to strengthen your algorithmic understanding. It is densely packed with a wealth of practical methods, and it breaks down and offers alternatives to faulty investing science.
I**N
A must read
A fantastic addition to finance literature with must-read chapters on backtesting pitfalls, hierarchical risk parity, deflated Sharpe ratios, explosiveness tests, entropy (complexity) estimators, and microstructural features. You can and should read this book even with no knowledge of machine learning. You will find it more useful if you have worked through the first four chapters of Aurélien Géron’s Hands-On Machine Learning. No knowledge of fully connected neural networks, convolutional networks or recurrent networks is required.
Dr de Prado works in the realm of systematic quantitative investing. I think of this as trading rather than investing, where you expect to make a large number of decisions, each with a probability of success somewhat over 50%. The comprehensive book Expected Returns by Antti Ilmanen explains how difficult it is to find systematic opportunities that last. I suspect this applies to all quantitative strategies, including ones informed by machine learning, although we all would love to understand exactly how the folks at Renaissance Technologies construct and update their models.
I think of investing as making a small number of decisions based on extensive due diligence, where you expect a high probability of success, much greater than 50%. A private equity professional or an old-fashioned stock picker like Warren Buffett makes at most a handful of investment decisions a year. If they see a business with legs, one that will sustain growth of 20%+ per year, trading at a P/E of under 20, that business is unlikely to see much multiple compression and so is likely to yield 15% annual returns over the next 5 years. Or they might anticipate similar returns from a mature business growing at 5% per year, with no margin erosion, and 10% free cash flow yields. These investors usually work best in teams due to the broad scope of their due diligence and also to achieve some diversification.
I do believe Dr de Prado’s ideas can be used to help inform some of these decisions, but clearly we are dealing in the world of small numbers, even at the largest private equity groups and hedge funds. Nevertheless, any young professional entering these fields would benefit by at least understanding how to work with data scientists who can help improve their decision making. I am inclined to agree with Dr de Prado that there are a lot of charlatans in the investing/trading world and it is not easy to figure out who they are. This, taxes and the wonders of compounding in a low signal-to-noise world are what make low-cost passive investing the right choice for most people trying to build a nest egg for retirement.
H**U
Good Ideas and Solid References
Great book with loads of insight. I highly recommend this book to any quants out there modeling and reviewing quant models. The abuse of mathematical notation is almost unbearable. He could have put more effort into the notation if he was going to write a book.
P**N
the book that's on every quant's desk right now
TLDR: the book is awesome, it really is on another level, and you will be stuck in the past if you don't ingest this book. If you are not in the target audience I think you will find this book hard to digest. Also, I have read some chapters twice and worked through the code samples, so I believe I offer a perspective that other readers may be lacking.
Marcos has given a number of lectures titled “The 7 Reasons Most Machine Learning Funds Fail”; you can find the lecture slides online. The seven core ideas in that lecture are covered in chapters 2-8, with other chapters offering supporting details or going further in depth. If you have limited time to process the book, I think you would be better served by taking a deep dive on chapters 2-8, rather than skimming the whole thing.
The ideas in this book work, and you would be doing yourself a disservice by not reading this book. The ideas range from the common sense (backtesting is not a research tool, feature importance is) to the heretical ("for decades most financial research has been based on over-differentiated (memory-less) series, leading to spurious forecasts and overfitting.") [That quote was in his 7 Reasons presentation from Quantcon 2018, not the book.] He offers compelling arguments and solutions backed by peer-reviewed publications for all his points. The book would be a highly valuable reference even without the code snippets, but he provides functional code and even tools to make it work on large datasets. Once again, this code is not for the faint of heart; his use of Pandas will send even a seasoned financial developer to RTFM.
There are some flaws which I can overlook. Strict software engineers will be irked at the code violating PEP8. It is hard to put code samples into a book, so things like multiple statements per line can greatly compact the code and make it readable. In chapter 20 he uses threads and processes interchangeably although they are two distinct tools. Chapter 22 felt a little out of place, but it seems compulsory for financial authors to include a "just for fun" final chapter. There was a quick discussion at the end of Chapter 14 on performance attribution, which felt rushed, and I feel it would be hard for the non-financial portion of the target audience to follow. These are minor items. I found at least three errors in the code, which I hear have been corrected in the second printing.
It is arguable that the ideas in this book could be extended to any asset class. If I had to guess, I would say this was often applied to trading futures, although bonds, equities, and equity options are briefly mentioned.
P**.
My impression is that the text reads a bit like an academic survey of some existing ML methods applied ...
I have run through a quick pass of the entire text in one sitting, so I may possibly re-read more in depth and alter my review at some point in the future. My impression is that the text reads a bit like an academic survey of some existing ML methods applied to quantitative finance, a bit heavy on theoretical models and sourcing many fairly recent papers culled from various financial and machine learning literature, many of them by the author himself. However, the author also points out that he has a lot of experience in the quantitative field and elaborates a bit on the overall systematic step-by-step process of development that a real team of quants might use. Don't expect an in-depth description of specific implementations (like SVMs, Gradient Boosting, NNs, etc.), but a more general approach to the various learner methods.
The Good: I enjoyed getting his perspective on the overall flow and piece-by-piece breakdown of each of the steps involved in the process of developing an ML-based algorithm, from data collection, partitioning, and scrubbing, all the way to the design and execution phase. This includes a lengthy description of some of the pitfalls of, and possible solutions to, using various cross-validation methods in order to gain better confidence in financial data and algorithms, which many already know suffer from characteristics like non-IID properties, data overlap, and time dependencies. On the more concrete side, he also presents many standalone Python functions that implement many of the concepts he describes.
The Bad: While it definitely reads like it was written by someone with a strong theoretical background and much experience in the financial field, I also felt that it fails in that it never really integrates all of the build-up into a practical example of a systematic design implementation that uses many of his ideas and demonstrates their validity. In other words, do not expect any top-level concrete design or systematic design and back-test examples with real financial data and results at all. It is mainly bits and pieces of the pipeline that ultimately may go into a complete systematic development of a system, but there is no real evidence that any of it is of use, other than to take the author's word, or just accept the theoretical modelling. To clarify further, it's ok to point out the shortcomings of classical portfolio optimization, but show a clear example of an ML-based portfolio optimization; how does it perform using various validation methods compared to classical, using real, cleaned financial data? It would definitely be useful to see at least one complete implementation of a system that utilizes the methods described within. In addition, concepts like quantum computing are great and all, but when you've been at this development long enough, you find that however fancy and advanced the tools sound, they don't really bring all that much to the table if you can't even develop a successful system or algorithm at a much simpler level (which is not easy).
Update(s): I'll just add that a closer reading hasn't really changed my opinion much. However, if it helps anyone, I found an excellent simulation of HRP using real financial data on Ilya Kipnis's great R-based blog, QuantStratTradeR. This is the kind of empirical data that would really add value to the text.
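The data-overlap pitfall in cross-validation that this review mentions can be illustrated with a simplified sketch of "purging": dropping training samples whose label windows overlap the test fold. This is an assumption-laden toy, not the book's implementation; the function name `purged_train_indices` and its arguments are hypothetical, and label windows are represented as plain integer start and end times.

```python
import numpy as np

def purged_train_indices(label_start, label_end, test_idx):
    """Return training indices whose label window [start, end] does not
    overlap the time span covered by the test samples' label windows.

    Hypothetical helper for illustration; times are plain integers here.
    """
    label_start = np.asarray(label_start)
    label_end = np.asarray(label_end)
    test_idx = np.asarray(test_idx)

    # Time span covered by the test fold's labels
    test_lo = label_start[test_idx].min()
    test_hi = label_end[test_idx].max()

    all_idx = np.arange(len(label_start))
    is_test = np.isin(all_idx, test_idx)
    # Two closed intervals overlap when each starts before the other ends
    overlaps = (label_start <= test_hi) & (label_end >= test_lo)
    return all_idx[~is_test & ~overlaps]

# Ten samples, each label formed from information over times t .. t+2
starts = np.arange(10)
ends = starts + 2
train_idx = purged_train_indices(starts, ends, test_idx=[4, 5])
```

Here the test fold spans times 4 through 7, so samples 2, 3, 6 and 7 are removed from training in addition to the test samples themselves, because their labels share information with the test labels. A plain k-fold split would have kept them.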
M**S
Fresh Thinking for a Field in Need of New, Useful Ideas
Having worked both as a quant at a hedge fund and in data science at a tech startup, I have long been surprised by the lack of penetration of modern data science and machine learning techniques into everyday quantitative finance. Cross-validation, feature importance, and random forests are commonplace outside of quantitative finance, yet rarely mentioned inside. This is not to say that there are no hurdles that prevent the adoption of these ideas – the non-stationary time series that are common in finance are challenging data sets. What has been needed is someone to address these challenges and bridge the gap between quantitative finance and data science/machine learning. Dr. Marcos López de Prado does a fantastic job of doing just that in Advances in Financial Machine Learning. He provides a detailed description of how to turn machine learning and data science theory into practice. The chapters are clear, concise, and full of useful references. Dr. López de Prado goes so far as to include snippets of Python code illustrating exactly his methods. This is no small matter – many quants use the gap between the theory in their papers and the art of putting their theory into practice in order to create a veil of uncertainty that can obscure a lack of robustness or reproducibility in their results. In contrast, Dr. López de Prado seems to invite his readers to use his techniques directly via his code snippets. In addition to the most direct applications of data science and machine learning to quantitative finance, Dr. López de Prado goes in depth on more advanced topics such as creating more rigorous and predictive trading strategies, detecting regime shifts, and breathing new life into risk parity. For any quant looking to move beyond the well-trodden techniques of OLS regression, independent and identically distributed random variables, and efficient frontiers into the methods of modern data science and machine learning, Advances in Financial Machine Learning is a must-read.
A**Y
One of the best approaches for a Quant to learn ML
There are many famous books that teach you Machine Learning, Econometrics, or Quant Finance. While all these books are good, they often concentrate on a single subject instead of a combination of them. Many of us might be good at Quant Finance or only Machine Learning, but we often find it difficult to bridge the two worlds. Lopez de Prado's book is a synergy of both Quant Finance and ML in this way. The book teaches the right way to apply machine learning in finance and covers a lot of applications related to asset allocation, high-frequency finance, Quant Trading, etc. The book is divided into 5 parts, with each part building up to more advanced material. Part one discusses different kinds of financial data and how to use them for analysis and training purposes. Part two discusses modeling, covering important algorithms and their merits and demerits when applied to financial data. Part three concentrates on the dangers of backtesting in today's practice, and the right way to conduct backtesting and interpret the backtest statistics, as it's really important that ML models don't suffer from data mining. My favorite one in this part was the Machine Learning Asset Allocation chapter, which solves the minimum variance problem compared to Markowitz's or Risk Parity methods. Part four discusses various advanced applications related to Quant Trading, Portfolio construction, and Market Microstructure. Part five is not for the faint-hearted, as it teaches various advanced computer science techniques to speed up the performance of ML algorithms in practice. Overall, the book changes the way one traditionally thinks about financial data and ML applications on it. I'd recommend this book to any aspiring quant working in, or wanting to work in, ML quant research. To the people who have already purchased it: the best way to use the book is to try out the algorithms yourself (all the Python code is on Lopez de Prado's website) and study their applications and results.