Visualizing the Learning Logic #8: LightGBM vs Extra Trees (1/2)
A model that builds many trees and returns the average
Disclaimer: As a freelance analyst, my posts exclusively cover the AI methodologies of Portfolio123 (P123), without addressing individual stock recommendations.
This Post is included in Section 4: How to Effectively Choose Algorithms and Tune Hyperparameters.
The table of contents is as follows:
AI-Driven Quant Investment Strategies
Introduction
In this Substack “AI-Driven Quant Investment Strategies”, this series of posts is planned to be structured as follows, and this post is the second one.
Visualizing the Learning Logic — ExtraTrees & LightGBM
Most ML comparisons focus on “which is more accurate” or “how results differ.”
This series instead focuses on the learning logic itself: what the algorithm is doing under the hood.
Planned total: ~11 posts (subject to change)
LightGBM learning logic: 4 posts
ExtraTrees learning logic: 3 posts
Comparing LightGBM vs. ExtraTrees learning logic: 2 posts (← Now we are here)
Validation and training: 2 posts
Assumptions for this series
Target: 3MRel (3-month relative return)
Universe: S&P 500
Features and hyperparameters: We won’t deep-dive here (planned for later series)
Comparing the Learning Logic of LightGBM and ExtraTrees (1/2)
Up to this point, we have visualized the learning logic of LightGBM and Extra Trees separately. Starting with this post, we will summarize their learning logic in a comparative format.
Introduction: Why Extra Trees often performs better than LightGBM when features are few and the data is noisy
In one sentence, the reason Extra Trees tends to outperform LightGBM in regimes with few features and/or high noise is this:
Extra Trees has an advantage on the “average away the wobble” side in situations where LightGBM’s “ability to aggressively optimize and hit the target (sharp optimization)” is more likely to backfire.
Let’s unpack this in an investment-strategy context.
(1) When there are few features, LightGBM can “over-dig the same small set of features”
In a world with few features, the model has limited “tools” available.
LightGBM (boosting) tends to reuse features that seem effective and create finer and finer conditional splits in order to reduce residuals.
As a result, it may start assigning meaning to tiny fluctuations in those few features (thresholds that happened to work), making it easier to over-optimize to the past.
Stocks are especially noisy, so the more you try to precisely “hit” using a small number of features, the more likely you are to pick up accidental boundaries—meaning the model is more likely to collapse in the future.
On the other hand, Extra Trees:
randomizes both features and thresholds,
builds many trees and averages them,
so even with only a few features, it tends to rely less excessively on any single tree (i.e., it tends to be less biased toward one fragile structure).
What the Paid Section Deeply Explores:
(Free Section)
(1) When there are few features, LightGBM can “over-dig the same small set of features”
(Paid Section)
(2) When noise is large, LightGBM is more likely to “chase noise as residual”
(3) The “power of averaging” works: variance goes down (= predictions become more stable)
(4) With few features × high noise, differences show up more clearly in ranking-based operation
(5) Summary
Pricing Plans
To continue learning highly specific and practical AI strategy construction methods, please consider a premium subscription.
Monthly Plan: $8 / Month, Flexible starter option
Annual Plan: $80 / Year, 2 months free (approximately $6.67/month)
Subscribe now and advance to the next level of AI Quant Strategy!
The following section is available to paid subscribers only.

