
Attendance Analysis

What Actually Fills Seats in American Soccer?
Filters: League, Season, Day, Venue, Schedule
Toggle pills to filter every chart below. Click a league or season to show/hide it. Click Reset to restore all filters.

What Actually Drives Attendance? — Controlled Model Results

Ranked by combined importance across Random Forest, Gradient Boosting, and Mixed-Effects Regression (trained on 4,186 matches). Green = helps fill seats. Red = hurts attendance. Longer bar = more important. Unlike raw correlations, these isolate each factor’s independent effect after controlling for all 31 other variables. Dimmed rows are not statistically significant (p ≥ 0.05).

Binary Factor Effects

For each yes/no condition, how full is the stadium when it’s present vs absent? Green “% higher” = that condition is associated with more seats filled. Red “% lower” = associated with fewer seats filled. Averages are league-weighted to prevent misleading results from mixing different tiers.
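The text does not spell out how the league weighting is computed. As a minimal sketch, assuming each league's within-league gap is weighted by its match count, the following shows why pooling tiers can mislead. The field names (`league`, `rivalry`, `utilization`) and all data values are hypothetical.

```python
from collections import defaultdict


def pooled_gap(matches, factor):
    """Naive gap: mean utilization with the factor minus mean without it."""
    present = [m["utilization"] for m in matches if m[factor]]
    absent = [m["utilization"] for m in matches if not m[factor]]
    return sum(present) / len(present) - sum(absent) / len(absent)


def league_weighted_gap(matches, factor):
    """Average the within-league gaps, weighting each league by its match count.

    Assumption: this weighting scheme is illustrative only; the dashboard's
    exact method is not described in the text.
    """
    by_league = defaultdict(lambda: {True: [], False: []})
    for m in matches:
        by_league[m["league"]][bool(m[factor])].append(m["utilization"])

    weighted_sum, total_weight = 0.0, 0
    for groups in by_league.values():
        present, absent = groups[True], groups[False]
        if not present or not absent:
            continue  # a league missing one group has no gap to contribute
        gap = sum(present) / len(present) - sum(absent) / len(absent)
        weight = len(present) + len(absent)
        weighted_sum += gap * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no league has matches both with and without the factor")
    return weighted_sum / total_weight


# Hypothetical data: rivalry adds 10 points within each league, but most
# rivalry matches happen to fall in the lower-utilization league.
matches = (
    [{"league": "MLS", "rivalry": True, "utilization": 0.90}]
    + [{"league": "MLS", "rivalry": False, "utilization": 0.80}] * 3
    + [{"league": "USL1", "rivalry": True, "utilization": 0.40}] * 3
    + [{"league": "USL1", "rivalry": False, "utilization": 0.30}]
)

naive = pooled_gap(matches, "rivalry")              # negative: tiers are mixed
weighted = league_weighted_gap(matches, "rivalry")  # about +0.10
```

In this toy data the pooled comparison suggests rivalries hurt attendance, while the within-league comparison recovers the positive gap present in both leagues.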

Utilization Distribution

How many matches fell into each “% of seats filled” bucket. Taller bars = more matches at that fill level. Green bars mark well-attended matches; red bars mark poorly attended ones.

Utilization by Season

Are stadiums getting fuller or emptier over time? Higher line = more seats filled that season. Each league is shown separately.

By Day of Week

Which days of the week draw the best crowds? Taller bar = higher % of seats filled on that day.

Coefficient Significance — Mixed-Effects Regression (|z-value|)

Each bar shows how statistically confident the model is that this factor matters. Longer bar = stronger evidence. Green = helps fill seats, red = hurts attendance. Sorted by strength of evidence. Dashed line marks p<0.05 significance threshold.
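For readers who want to relate bar length to the dashed threshold, here is a small sketch of the standard conversion from a z-value to a two-sided p-value under a normal approximation. The function names are illustrative and are not taken from the dashboard's code.

```python
import math

# |z| of about 1.96 corresponds to a two-sided p-value of 0.05.
Z_CRITICAL = 1.96


def z_value(coefficient: float, std_error: float) -> float:
    """z is the coefficient divided by its standard error."""
    return coefficient / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(z: float, z_critical: float = Z_CRITICAL) -> bool:
    """True when the bar would extend past the dashed line."""
    return abs(z) > z_critical
```

A bar is compared on its absolute value, so a strongly negative z counts as strong evidence just as a strongly positive one does; color carries the direction.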

Feature Importance — RF vs GB vs MixedLM

The three models largely agree on which factors matter most. Longer bar = more important to that model. Where the models disagree, a factor's importance depends on the modeling approach and should be read with caution.

Model Comparison — 2024 Holdout Performance

Trained on 2019–2023 (4,716 matches), tested on 2024 (1,070 matches). Higher R² = better predictions. Lower RMSE/MAE = smaller errors.
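The three holdout metrics can be sketched in plain Python as follows. The sample values are made up for illustration and are not results from the study.

```python
import math


def holdout_metrics(actual, predicted):
    """Return R², RMSE, and MAE for predictions on held-out matches."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                      # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)     # total variation
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical utilization values for four holdout matches.
actual = [0.50, 0.60, 0.70, 0.80]
predicted = [0.55, 0.60, 0.65, 0.80]
metrics = holdout_metrics(actual, predicted)

# Baseline: always predict the holdout mean. This scores R² = 0, so a
# negative R² means a model did worse than guessing the mean.
baseline = holdout_metrics(actual, [sum(actual) / len(actual)] * len(actual))
```

RMSE penalizes large misses more heavily than MAE, which is why the two can rank models differently.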

Robustness Check — COVID 2020 Sensitivity

How stable are the coefficients? The main model excludes the COVID-affected 2020 season. Re-fitting with 2020 included ( additional matches) reveals which effects are robust and which are inflated by government-mandated capacity restrictions. Large bars = unstable coefficients; interpret these with caution. The rivalry and MLS league dummy coefficients change dramatically, suggesting that COVID distorted cross-league comparisons.
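One simple way to quantify this kind of instability is the ratio of each coefficient with 2020 included to its value in the main model. The sketch below assumes that approach; the variable names, the 0.5 tolerance, and every coefficient value are placeholders chosen for illustration, not outputs of the study.

```python
def inflation_ratios(main_coefs, with_2020_coefs):
    """Ratio of each re-fitted coefficient to its main-model value."""
    ratios = {}
    for name, main in main_coefs.items():
        if name in with_2020_coefs and main != 0:
            ratios[name] = with_2020_coefs[name] / main
    return ratios


def unstable(ratios, tolerance=0.5):
    """Names whose ratio differs from 1.0 by more than `tolerance` (hypothetical cutoff)."""
    return sorted(name for name, r in ratios.items() if abs(r - 1.0) > tolerance)


# Placeholder coefficients in percentage points (illustrative only).
main_model = {"payroll_musd": 0.7, "home_opener": 4.0, "midweek": -5.0}
with_2020 = {"payroll_musd": 2.1, "home_opener": 8.0, "midweek": -5.5}

ratios = inflation_ratios(main_model, with_2020)
flagged = unstable(ratios)
```

A ratio near 1.0 indicates a stable effect, a large ratio indicates inflation, and a negative ratio would indicate a sign change.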

Factor Combination Matrix

What happens when two factors combine? Pick a row factor and a column factor to build a grid. Each cell shows the average % of seats filled for that specific combination. Greener cells = fuller stadiums. The “n=” shows how many matches had that combination — higher n = more reliable.
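A grid like this amounts to grouping matches by a pair of factor values and reporting the mean and count per cell. The following is a minimal sketch under that assumption; the field names and sample matches are hypothetical.

```python
from collections import defaultdict


def combination_matrix(matches, row_key, col_key, value_key="utilization", min_n=1):
    """Mean utilization and match count for every (row, column) combination.

    Cells with fewer than `min_n` matches are dropped as too unreliable.
    """
    cells = defaultdict(list)
    for m in matches:
        cells[(m[row_key], m[col_key])].append(m[value_key])
    return {
        key: {"mean": sum(values) / len(values), "n": len(values)}
        for key, values in cells.items()
        if len(values) >= min_n
    }


# Hypothetical matches for illustration.
matches = [
    {"midweek": False, "rain": False, "utilization": 0.80},
    {"midweek": False, "rain": False, "utilization": 0.70},
    {"midweek": False, "rain": True, "utilization": 0.60},
    {"midweek": True, "rain": False, "utilization": 0.50},
]

grid = combination_matrix(matches, "midweek", "rain")
reliable = combination_matrix(matches, "midweek", "rain", min_n=2)
```

Combinations that never occurred simply have no cell, and raising `min_n` hides cells whose average rests on too few matches.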


Best Conditions

The factor combinations most associated with full stadiums. Higher % = more seats filled under those conditions. Only combinations with 3+ matches are shown.

Worst Conditions

The factor combinations most associated with empty seats. Lower % = fewer seats filled under those conditions.

Temperature & Attendance

Every dot is a match. Does temperature affect how full the stadium is? Dots higher up = more seats filled. Look for a “sweet spot” temperature range.

Temperature Ranges

Average % of seats filled at each temperature range. Taller bar = more seats filled in that range. Gray bars show sample size.

Rain Impact

Does rain keep fans home? Taller bar = more seats filled.

Wind Speed

Does wind affect turnout? Taller bar = more seats filled.

Cloud Cover

Does overcast sky matter? Taller bar = more seats filled.

Weather Conditions

Average attendance by weather type at kickoff. Taller bar = more seats filled.

Forecast vs Actual

🌦️
Limited Forecast Data
Only USL League One had probability-of-precipitation (PoP) forecast data available. The model used actual weather at kickoff instead of forecasts. Research suggests the forecast matters more than actual conditions — this remains a limitation. Warm weather (80–90°F) was the strongest temperature predictor (p=0.014, −1.8pp vs comfortable baseline). Cold, cool, and hot temperatures were not statistically significant.

Recent Form & Attendance

Do winning teams draw bigger crowds? Points from last 5 matches vs seats filled. Dots higher up = more seats filled. Trend going right-and-up = winning helps.

League Position & Attendance

Does standing in the table affect turnout? Position 1 = top of the league. If dots trend upward to the left, higher-placed teams fill more seats.

Win/Loss Streak

Does a hot streak or cold streak change attendance? Taller bar = more seats filled during that streak length.

Goals Scored (Last 5)

Do exciting, high-scoring teams attract more fans? Dots higher up = more seats filled.

Expected Goals (xG)

Does quality of chances created affect attendance? Dots higher up = more seats filled.

Goals Per Game & Attendance

Do higher-scoring teams fill more seats? The model found goals_per_game significant (p=0.027, +2.5pp per goal). Dots higher up = more seats filled.

New Manager Effect

Is there a “new manager bounce”? The model found a small negative effect instead (p=0.032, coef=−1.4pp). Coaching changes are associated with lower attendance, likely because they occur during poor runs of form.

Interactive Scatter

Every dot is one match. Pick any factor for the horizontal axis to see if it affects attendance. Dots higher up = more seats filled. Look for upward or downward trends — a clear slope means the factor matters. A flat cloud of dots means it doesn’t.


League Comparison

How do the leagues compare on key metrics? Longer bar = higher value for that league.

Utilization Distribution by League

How consistent is attendance in each league? A tall, narrow cluster = consistent fill rates. A wide spread = some matches packed, others empty.

Season Trends by League

Is attendance growing or shrinking year over year? Line going up = stadiums getting fuller. Line going down = attendance declining.

Day of Week by League

Which days work best for each league? Taller bar = more seats filled on that day. Compare how sensitive each league is to midweek scheduling.

Attendance Distribution by League

How many people actually show up? Taller bar = more matches at that attendance level. Shows the raw scale difference between leagues.

Capacity Utilization vs Attendance

Raw attendance vs % of seats filled. Dots in the upper-right = big crowds AND a full stadium. Upper-left = small venue, but packed. Lower-right = large crowd but still lots of empty seats.

All Clubs

Club | League | Venue | Matches | Capacity | Avg. Attendance | Avg. Utilization | Trend

Capacity vs Utilization

Does venue size affect how full it gets? Each dot is a club. Dots higher up = a larger share of seats filled. Dots further right = bigger venue. Upper-left = small and packed. Lower-right = big and empty.

Club Utilization Ranking

Every club ranked by how full their stadium is on average. Longer bar = higher % of seats filled. This measures demand relative to venue size, not raw headcount. Color indicates league.

Club Random Effects — All 94 Clubs Ranked

After controlling for all 30 predictors (weather, schedule, market, etc.), how much does each club over- or under-perform? Green = fills more seats than predicted. Red = fills fewer. Gold = Greenville Triumph SC. This captures intangible “club culture” effects the model can’t explain — the ICC of means these intercepts dominate all other factors.

Match Explorer

Browse every match in the dataset. Click any column header to sort. Type in the search box to filter by club, venue, or date. All filters from the top bar apply here too.

Date | League | Home | Away | Score | Attendance | Capacity | Utilization | Temp | Day

Highest Utilization Matches

The 10 matches that came closest to (or exceeded) a sellout. Higher % = more of the stadium was full. What do these matches have in common?

Lowest Utilization Matches

The 10 matches with the emptiest stadiums relative to capacity. Lower % = more empty seats. What conditions led to these?

By Month

Which months draw the best crowds? Taller bar = more seats filled.

By Day of Week

Which days fill the most seats? Taller bar = more seats filled.

By Kickoff Hour

What start time fills the most seats? Taller bar = more seats filled.

Midweek vs Weekend

How much does a midweek match hurt attendance? Taller bar = more seats filled.

Season Fatigue

Do fans lose interest as the season goes on? Match number in season vs seats filled. Downward trend = fans fading late in the year.

Home Opener Bump

Do home openers draw bigger crowds than regular matches? Taller bar = more seats filled.

Season Finale

Do season finales draw bigger crowds? Taller bar = more seats filled.

Soccer-Specific Stadium

SSS vs shared/multi-use venues. Taller bar = more seats filled.

Metro Population

Do bigger cities fill more seats? Each dot is a club. Dots higher up = more seats filled. Dots further right = bigger metro area.

Median Household Income

Do wealthier markets fill more seats? Dots higher up = more seats filled. Further right = higher local income.

Hispanic Population %

Soccer research consistently finds Hispanic population % predicts attendance. Dots higher up = more seats filled.

Pro Teams in Market

Does competition from other pro sports hurt attendance? Taller bar = more seats filled.

College Town

Do college towns boost or hurt soccer attendance? Taller bar = more seats filled.

Venue Capacity

Do bigger or smaller venues fill a higher % of seats? Dots higher up = more seats filled. Right = bigger venue.

Venue Age

Do newer venues attract more fans? Dots higher up = more seats filled. Right = older venue.

Indoor vs Outdoor

Indoor venues are immune to weather. Does that help? Taller bar = more seats filled.

Team Investment — MLS Payroll vs Attendance

The single most important predictor across all three models. Each additional $1M in payroll is associated with about +0.7pp of utilization (p≈0). Only MLS has payroll data; other leagues show $0M. Dots higher up = more seats filled. Note: including COVID-affected 2020 data inflates this coefficient by ~3×.

Promotions

Do promotional nights (giveaways, fireworks, theme nights) boost attendance? Taller bar = more seats filled.

Rivalry Matches

Do rivalry matches draw bigger crowds than regular games? Taller bar = more seats filled.

Holiday Weekends

Do holiday weekends (Memorial Day, July 4th, Labor Day) boost attendance? Taller bar = more seats filled.

Opponent Quality

Do fans come out more for top-of-the-table opponents? Position 1 = best team. If dots trend upward to the left, better opponents = more seats filled.

Opponent Travel Distance

Do nearby opponents (who bring more away fans) boost attendance? If dots trend downward to the right, closer opponents draw better.

Playoff Contention

Does being in the playoff race boost late-season attendance? Taller bar = more seats filled.

Expansion Year

Do second-year expansion teams see a “novelty bump”? Taller bar = more seats filled.

Google Trends

Does online search interest predict in-stadium attendance? Dots higher up = more seats filled. Right = more search interest.

Promotion Type Breakdown

Which types of promotions work best? Research shows fireworks = +26%, giveaways = +7%. Taller bar = more seats filled for that type.

Variable Coverage

Season x Club Matrix Avg Utilization

Research Question

What actually fills seats at American men’s professional soccer matches? This study uses multivariate analysis to isolate the individual contribution of predictors across categories — from weather and scheduling to team performance and promotional activity — on match-by-match capacity utilization.

Why Capacity Utilization?

Raw attendance conflates venue size with demand. A 3,000-person crowd means something very different in a 3,200-seat SSS versus a 20,000-seat MLS venue. Capacity utilization (attendance / capacity) normalizes across venues and isolates actual demand signals.
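The normalization is a single division. As a small illustration using the two venue sizes mentioned above (the function name is ours, not the study's):

```python
def capacity_utilization(attendance: int, capacity: int) -> float:
    """Return attendance as a fraction of venue capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return attendance / capacity


# The same 3,000-person crowd in two venues of different size.
small_venue = capacity_utilization(3000, 3200)    # 0.9375, nearly full
large_venue = capacity_utilization(3000, 20000)   # 0.15, mostly empty
```

The result is not capped at 1.0, so a match that exceeds listed capacity shows up as utilization above 100%.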

Statistical Approach

  • Primary: Hierarchical mixed-effects linear regression (statsmodels.MixedLM) with club as random intercept — R² conditional = 0.73, ICC = 0.64, 19 significant predictors (p<0.05)
  • Secondary: Random Forest + Gradient Boosting for non-linear feature importance comparison
  • Validation: Train on 2019–2023 (4,716 matches), test on 2024 (1,070 matches) — holdout R² = 0.49
  • Key finding: 64% of attendance variance is between clubs (ICC), not between matches — club identity dominates
  • Explore: Full model results on the Drivers tab. Run what-if scenarios on the Predict tab.

Data Sources

  • Match results: Flashscore, American Soccer Analysis
  • Attendance: Transfermarkt, USL Match Center, Wikipedia, worldfootball.net
  • Weather: Open-Meteo Historical API (100% coverage)
  • Demographics: U.S. Census ACS 5-Year Estimates
  • xG: FotMob / American Soccer Analysis
  • Trends: Google Trends (pytrends)
  • Venues: Manual research + verification

Key Findings from Literature

  • Temperature effects are nonlinear — thresholds at 90°F and 20°F windchill
  • Rain forecast matters more than actual rain
  • Expansion team novelty is short-lived — Year 2 only
  • SSS effect is confirmed but less dramatic than MLS studies suggested
  • Promotions have diminishing returns but net benefit always positive

Variable Taxonomy

Significant (p<0.05)   Not significant   Hover for details.

League Coverage

What-If Scenario Builder
Pick a club and adjust match conditions to predict capacity utilization. The model uses each club's historical baseline plus the effects of scheduling, weather, performance, and promotions. The waterfall on the right shows how each factor adds to or subtracts from the prediction.

Club

Schedule

Matchday

Weather

Performance

Market

Predicted Capacity Utilization

Factor Contributions — Waterfall

How each input adds to or subtracts from the baseline prediction. Green = positive contribution, red = negative.
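A waterfall of this kind can be read as a running sum that starts at the club baseline. The sketch below assumes purely additive contributions in percentage points; the factor names and values are hypothetical and do not come from the model.

```python
def waterfall(baseline, contributions):
    """Return (factor, delta, running_total) steps starting from the baseline."""
    steps = []
    running = baseline
    for name, delta in contributions:
        running += delta
        steps.append((name, delta, running))
    return steps


# Hypothetical inputs, in percentage points of capacity utilization.
club_baseline = 60.0
contributions = [
    ("midweek", -5.0),      # negative contribution (red)
    ("home_opener", 8.0),   # positive contribution (green)
]

steps = waterfall(club_baseline, contributions)
predicted = steps[-1][2] if steps else club_baseline
```

Because the contributions are additive, the order of the bars changes the intermediate totals but not the final prediction.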

What Actually Fills Seats? A -Match Study of American Soccer Attendance

The Question

What drives attendance at American men’s professional soccer matches? Decades of sports economics research have tackled this question for baseball, football, and basketball, but American soccer — particularly below MLS — remains understudied. A scoping review of 235 attendance studies found almost none covering lower-division American soccer. This study fills that gap.

We analyzed matches with attendance data across four leagues (MLS, USL Championship, USL League One, MLS NEXT Pro) over six seasons (2019–2024), testing 31 potential attendance drivers using a mixed-effects linear regression model with club random intercepts.

Why Capacity Utilization?

Raw attendance numbers are misleading. A crowd of 3,000 looks very different in a 3,200-seat soccer-specific stadium versus a 20,000-seat shared venue. Capacity utilization — attendance divided by capacity — normalizes across venues and isolates actual demand signals from venue size effects. This is our dependent variable throughout.

The Model

We used a hierarchical mixed-effects linear regression (statsmodels MixedLM) with club as a random intercept. This lets each club have its own baseline attendance level while estimating shared effects of weather, scheduling, promotions, and performance across all clubs.

The model was trained on data ( training matches) and validated on a held-out 2024 season ( holdout matches). Of total matches with attendance data, 523 from the COVID-affected 2020 season and 7 from clubs with fewer than 5 home matches were excluded from training. Two machine learning models (Random Forest and Gradient Boosting) served as secondary validation for feature importance rankings.
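The exclusion rules and temporal split described above could be expressed roughly as follows. This is a sketch under assumptions: the field names are hypothetical, and the text does not say whether home-match counts are taken before or after removing 2020 (here they are counted over all matches).

```python
from collections import Counter


def split_matches(matches, min_home_matches=5, covid_season=2020, holdout_season=2024):
    """Apply the exclusions, then split into training and holdout sets."""
    # Assumption: home matches are counted across the full dataset.
    home_counts = Counter(m["home_club"] for m in matches)
    kept = [
        m for m in matches
        if m["season"] != covid_season
        and home_counts[m["home_club"]] >= min_home_matches
    ]
    train = [m for m in kept if m["season"] != holdout_season]
    holdout = [m for m in kept if m["season"] == holdout_season]
    return train, holdout


# Hypothetical data: Club A has six home matches, Club B only two.
matches = [
    {"home_club": "Club A", "season": s} for s in (2019, 2020, 2021, 2022, 2023, 2024)
] + [
    {"home_club": "Club B", "season": s} for s in (2023, 2024)
]

train, holdout = split_matches(matches)
```

Splitting by season rather than at random keeps future matches out of training, which is what makes the 2024 holdout a fair test of prediction.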

Conditional R² = 0.81 — the model explains 81% of the variance in capacity utilization when club identity is included. Holdout R² = 0.70 on unseen 2024 data.

Key Findings

Finding 1
Club identity is everything

The intraclass correlation coefficient (ICC) is 0.74, meaning 74% of attendance variance is between clubs, not between matches. Once you know which club is playing, the most important question is already answered. Weather, scheduling, promotions — all secondary.

This is the single most important finding. It suggests that long-term brand building, community engagement, and fan culture development are far more valuable than any match-day tactical lever.

Finding 2
Payroll buys fans

Payroll is the #1 measurable predictor among MLS clubs (payroll data is only available for MLS). Each additional $1M in guaranteed compensation is associated with about 0.7 percentage points more utilization (p ≈ 0). Payroll was also the top feature in the Random Forest and Gradient Boosting importance rankings. For USL and MLS NEXT Pro, payroll data is unavailable, so this effect cannot be measured at lower tiers.

This effect is sensitive to the inclusion of COVID-affected 2020 data: adding 2020 inflates the payroll coefficient by roughly 3×, because government-mandated capacity restrictions created an artificial correlation — MLS clubs (which have payroll data) had higher relative utilization under caps. The estimate reported here excludes 2020.

Implication: investment in the playing squad is not just a competitive strategy — it is a revenue strategy. Better players attract more fans.

Finding 3
Season timing matters more than weather

Match number in the season (z = ) is the single strongest fixed-effect predictor. The midweek penalty (z = ) is nearly as large. Meanwhile, temperature (z = -2.5) and wind (z = -2.4) are far weaker. Fans decide based on the calendar, not the forecast. Early-season matches see lower utilization regardless of weather; late-season matches (especially with playoff implications) draw better.

Warm weather above 80°F has a slight negative effect (, p = 0.014), meaning hotter games see marginally lower turnout vs. the comfortable 65–80°F baseline. Cold and cool temperatures were not statistically significant.

Finding 4
The new manager bounce is slightly negative

Despite literature suggesting a 5-match attendance bump after a coaching change, our data shows the opposite: a small but significant negative effect (coef = , p = ). Matches under a newly appointed manager see about lower utilization. This may reflect the turmoil surrounding a coaching change rather than fan apathy — clubs change managers during poor runs, and it takes time for results (and crowds) to recover.

Finding 5
Home openers are real

The home opener effect is (p = ) — a meaningful match-specific effect. Opening day is the one time fans who might not otherwise attend will show up. Clubs should treat this as a premier marketing event. Season finales also see a smaller but significant bump of (p = ).

Finding 6
League position drives attendance

Higher league position (lower number = better) significantly boosts attendance (coef = , z = , p < 0.001). Each position higher in the table adds about of capacity utilization. A club sitting 1st vs 20th could see roughly difference — meaningful for a front office trying to monetize a playoff push. This is the 5th strongest predictor in the model.

Finding 7
Promotions work but are rare

Promotional nights add (p = ), but only 204 of the 5,465 matches in our dataset (3.7%) had identifiable promotions. This is a massively underexplored lever — the effect is strong, but clubs are not using it frequently enough for us to study diminishing returns.

Finding 8
The model works across three leagues

Holdout R² by league: USL League One = , MLS = , USL Championship = , MLS NEXT Pro = . The model generalizes well across MLS, USL Championship, and USL League One — all above 0.60 — capturing the mix of community-owned independents, MLS affiliates, and major-market clubs in each tier. Only MLS NEXT Pro resists prediction (see Finding 9).

Finding 9
MLS NEXT Pro is unpredictable

The model performs worse than guessing the mean for MLS NEXT Pro (R² = ), though this is based on only 26 holdout matches. Development leagues follow fundamentally different logic — parent club affiliation, roster decisions, venue sharing, and uncertain fan bases create too much noise for our 31 predictors to capture.

What Doesn’t Matter

Several factors that might seem important turned out to be statistically insignificant:

  • Precipitation (p = ) — rain does not significantly affect turnout once other factors are controlled
  • Hot weather (p = ) — no significant effect, possibly because most hot matches are evening kickoffs
  • Opponent distance — travel distance for the visiting team (a proxy for away support) is marginally significant but highly sensitive to COVID data: including 2020 inflates its effect by ~4×. Without 2020, the coefficient is near the significance threshold.

The ICC Story

The ICC of 0.74 deserves deeper attention. It means that 74% of the variance in capacity utilization is attributable to differences between clubs, and only 26% to match-to-match variation within a club. Equivalently, the utilization figures of two randomly chosen matches from the same club have an expected correlation of 0.74.

Put differently: the ceiling on what any scheduling, weather, or promotional strategy can accomplish is roughly 26% of total variance. The other 74% is baked into who the club is — its history, its fanbase, its community roots, its stadium atmosphere.
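For a random-intercept model, the ICC is the between-club variance divided by the sum of the between-club and residual variances. The sketch below uses made-up variance components chosen to reproduce 0.74. In statsmodels these would typically be read from a fitted MixedLM result's `cov_re` and `scale` attributes, though the study's own extraction code is not shown in the text.

```python
def intraclass_correlation(between_club_var: float, within_club_var: float) -> float:
    """Share of total variance that sits between clubs rather than within them."""
    total = between_club_var + within_club_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_club_var / total


# Hypothetical variance components (utilization measured as a fraction).
club_intercept_var = 0.037   # variance of the club random intercepts
residual_var = 0.013         # match-to-match variance within a club

icc = intraclass_correlation(club_intercept_var, residual_var)
matchday_share = 1 - icc     # what scheduling, weather, and promotions compete over
```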

For front office professionals, this means the highest-ROI investment is not in any single matchday tactic, but in the slow, compounding work of building a club culture that fills seats regardless of conditions.

Limitations

  • Forecast weather gap: Our model uses actual weather, but fans decide based on the forecast. This likely underestimates the true weather effect.
  • Promotions data sparsity: Only 3.7% of matches (204 of 5,465) had identifiable promotions. Many clubs run promotions that are not well-documented in public sources.
  • Announced vs. actual attendance: Some leagues report “tickets distributed” rather than turnstile counts. This adds noise to the dependent variable.
  • COVID-affected 2020: The 2020 season is excluded from the main model. A robustness check re-fitting with 2020 showed large coefficient inflation: payroll ~3× higher, home opener ~2×, and two variables (opponent distance, day of week) shifted from marginal to highly significant. No coefficients changed sign, but magnitudes are unstable, confirming that 2020’s government-mandated capacity restrictions distort the signal.
  • Payroll only for MLS: Salary data is publicly available only for MLS, so the payroll effect cannot be estimated for lower divisions.

Explore the Data

The full interactive dashboard is live at this URL. Use the Predict tab to run your own what-if scenarios using the model, or explore the other tabs to dig into the raw data across all four leagues and six seasons.

Dataset: matches, 68 columns, 4 leagues. Model: statsmodels MixedLM with club random intercepts ( training, holdout). Validation: temporal holdout (2024 season). Secondary: Random Forest (500 trees), Gradient Boosting (500 trees).

Club Identity by Sport — ICC

Intraclass Correlation Coefficient: what percentage of attendance variance is explained by which team is playing vs. game-day factors. Higher = brand matters more.

Universal Predictor Rankings — Avg |z-score|

Average absolute z-score across all 5 sports. Higher = more consistently important. Badges show which sports each factor is significant in (p<0.05).

Coefficient Comparison — Same Variable Across Sports

How the same variable's effect changes across sports. Green = positive effect, red = negative. Dimmed = not statistically significant. Blank = not available for that sport.

Midweek Penalty — by Sport

How much does a weekday match hurt attendance? Bigger bar = bigger penalty. The midweek effect is the single most consistent predictor across all sports.

Rivalry Premium — by Sport

How much does a rivalry/division match boost attendance? Bigger bar = bigger boost. Soccer fans respond the most to local derbies.

Home Opener Effect — by Sport

How much does opening day boost attendance? Bigger bar = bigger effect. MLB has the biggest home opener premium by far.

New Coach Bounce — Myth or Reality?

Does hiring a new coach boost attendance? Only soccer shows a significant effect (p<0.05). In all other sports, the "new manager bounce" is a myth.

Key Findings

Attendance by Season

Every Greenville home match plotted chronologically. Higher dots = bigger crowds. The line connects season averages.

Utilization by Season

What % of Paladin Stadium was filled each season? Higher = more of the 16,000 seats occupied.

Every Home Match — Full Timeline

Each bar is one home match. Taller bar = higher attendance. Color = result (green win, yellow draw, red loss). Hover for opponent and details.

Attendance by Opponent

Which opponents draw the biggest crowds to Paladin Stadium? Longer bar = higher average attendance. Label shows match count.

Opponent Distance vs Attendance

Does the visiting team's travel distance affect Greenville's gate? Each dot is a match. Dots higher up = more fans.

By Day of Week

Which days draw the best crowds?

By Month

Monthly attendance patterns across all seasons.

By Kickoff Time

Evening vs afternoon vs matinee.

Temperature vs Attendance

Greenville is an outdoor venue — how sensitive are fans to heat and cold? Each dot is a match.

Rain vs Dry

Does rain keep Greenville fans away?

Temperature Buckets

Average attendance in each temperature range.

Wind Speed

Average attendance by wind conditions.

Weather Category

Clear vs cloudy vs rainy vs stormy.

Form (Last 5) vs Attendance

Do winning streaks fill Paladin Stadium? X-axis: points from last 5 matches.

League Position vs Attendance

Does table position affect turnout? Lower number = higher in standings.

Win / Draw / Loss

Average attendance grouped by match result.

Goals Scored

Attendance by number of Greenville goals.

xG Performance

Home xG vs attendance scatter.

Special Match Effects

Home openers, rivalries, holiday weekends, promotions, and playoff contention. Green = present, gray = absent.

Season Progression

Match number in season vs attendance. Does Greenville fade late or build momentum?

Google Trends Interest vs Attendance

Does online buzz predict butts in seats? Higher trend interest should predict bigger crowds.

Greenville vs USL League One Average

How does Greenville compare to the league average each season? Green = Greenville, gray dashed = USL L1 average.

Attendance Distribution

Histogram of all Greenville home attendance figures. Where do most matches cluster?

Midweek vs Weekend

The midweek penalty is one of the strongest effects in the model. How bad is it for Greenville?

Utilization Spread by Season

Min, max, and average utilization each season. Smaller range = more consistent crowds.

All Greenville Home Matches

Every home match in the dataset. Click headers to sort.

Date | Season | Opponent | Score | Attendance | Utilization | Temp | Day | Form | xG