A 1,400-employee suburban site runs an employee shuttle pilot. Week 1 ridership lands at 38% of forecast, week 2 at 41%, week 3 at 39%. Manager review at week 4: “trending well, watch for ramp.” Then week 5 dips. Week 6 craters. The committee declares the pilot a learning experience and goes back to parking expansion. That collapse is so common it should have a name. It is also, in most cases, a misdiagnosis. Sign-up enthusiasm in weeks 1-2 is driven by novelty, manager push, and a launch comms cadence — three variables that decay inside 10-14 days. Retention runs on something else: route fit, on-time consistency, the feedback loop. Those take far longer to lock in. Lally et al. put median habit formation at 66 days. Six weeks is 42. The pilot is being killed before any new commute habit has reached automaticity.
What week-1 sign-ups actually measure
A high week-1 number measures launch-comms reach, not whether the shuttle program will work at steady state.
Three forces drive the early curve, and none of them are operational. Novelty is the first: a free ride to work that the employee has not tried before is interesting, and human beings sample interesting things. Manager push is the second: when an ops VP sends an all-hands email and a director adds a follow-up in the team meeting, sign-ups bump for about two weeks before the signal fades into general inbox noise. Comms cadence is the third: posters in the lobby, mentions in the weekly newsletter, a short video on the intranet. All of that runs at high volume in the first two weeks, then ramps down because nobody plans a six-month internal-comms campaign for a shuttle pilot.
A 2024 study on habit decay in daily-life behaviors found that decay stabilizes in under two weeks on average, with a range of 1 to 65 days. The sign-up enthusiasm curve is, in effect, a habit-decay curve from the prior baseline of "I drive in." Once the launch noise stops, that curve flattens fast. If the new commute habit has not started forming on the same timeline, ridership drops.
The mismatch is the entire problem.
Why week 6 is the visible inflection — not the cause
Habit formation does not run on a calendar that flatters six-week pilots.
Phillippa Lally's UCL team ran the canonical study, published in 2010: 96 volunteers, a single new daily behavior of their choosing, twelve weeks of self-report data. The median time to reach 95% of the automaticity asymptote came in at 66 days, with a range of 18 to 254 days. The curve is asymptotic: early repetitions yield bigger automaticity gains than later ones, and the gains keep arriving for months. A reader at day 42 is sitting on a partially formed habit, not a finished one.
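To see what day 42 looks like against day 66, it helps to model the curve. The saturating-exponential form and the calibration below are illustrative assumptions, not the paper's exact fitted model; the only anchored fact is Lally's median of 66 days to reach 95% of the asymptote.

```python
import math

# Illustrative model (assumption): automaticity approaches an asymptote of 1.0
# as a saturating exponential, a(t) = 1 - exp(-t / tau).
# Calibrate tau so that a(66) = 0.95, matching Lally's median of 66 days
# to reach 95% of the asymptote.
tau = 66 / math.log(20)  # solves 1 - exp(-66/tau) = 0.95; tau is about 22 days

def automaticity(day: int) -> float:
    """Fraction of the automaticity asymptote reached by a given day."""
    return 1 - math.exp(-day / tau)

print(f"day 42 (week-6 review): {automaticity(42):.0%} of asymptote")
print(f"day 66 (Lally median):  {automaticity(66):.0%} of asymptote")
print(f"day 84 (12-week pilot): {automaticity(84):.0%} of asymptote")
```

Under this calibration, the median rider at a week-6 review is at roughly 85% of asymptote, and riders on the slow tail of the 18-to-254-day range are far below that.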
Di Maio and colleagues took that frame to commute-mode shifts directly. Their 2025 paper in Applied Psychology: Health and Well-Being tracked 42 participants through a 14-week active-commuting habit-substitution program. New commute-habit automaticity rose sharply in early weeks then slowed (linear b = 2.14, p < .001; quadratic b = -1.01, p = .012). Old commute-habit automaticity decayed linearly across the post-intervention window (b = -0.59, p = .013). The two curves crossed late, not early. Once participants did reach the substituted regime, weekly adherence ran at 86%.
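The reported coefficients describe curve shapes, not absolute levels, so the numbers in this sketch are invented for illustration; the only thing carried over from the paper is the structure, a quadratic-rise new habit against a linear-decay old one, crossing late in a 14-week window.

```python
# Hypothetical trajectories (all intercepts and slopes invented for illustration;
# only the shapes, quadratic rise vs. linear decay, follow the paper):
def new_habit(week: float) -> float:
    return 5 + 6 * week - 0.2 * week ** 2   # rises sharply, then slows

def old_habit(week: float) -> float:
    return 60 - 2 * week                     # decays linearly

# First whole week at which the new habit overtakes the old one.
cross_week = next(w for w in range(15) if new_habit(w) >= old_habit(w))
print(f"curves cross at week {cross_week}")
```

With these placeholder numbers the crossing lands at week 9, well after a 6-week pilot would already have been killed.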
That asymmetry is the heart of the week-6 problem. Old habits decay in weeks. New habits build in months. The intervening period, which feels operationally like collapse, is when the rider has stopped doing the comfortable thing but has not yet automated the new thing. They are deciding, every morning, whether to ride or drive. Tuesday is a yes. Wednesday is a yes. Friday they have a 7am call and they drive. The ops dashboard sees that as a no-show. The rider experiences it as a temporary exception. The two readings disagree, and at week 6 the operator is reading the dashboard.
Three signals that look like collapse — and are not
Three operational metrics dip around week 5-7 even in pilots that ultimately succeed. The week-6 reviewer who reads those dips as collapse is the most expensive reader the program has.
Start with no-show drift. A booked rider who does not show is the most legible failure signal a shuttle dispatcher has, and it is also the most context-poor. In weeks 1-3, riders are diligent because the program is new. By week 5 they have figured out that the consequences of a missed booking are minor, and they cancel less often than they should. The no-show rate climbs. That is not a retention problem. That is a booking-discipline problem, and it is fixed with a soft-friction nudge — a one-line WhatsApp confirmation, a short cancellation window — not with route surgery.
Then there is route-edit churn. Around week 4, riders start asking for stop changes: “can we add a stop at the train station,” “can we shift the 7:25 to 7:35.” A program with a healthy feedback loop welcomes that as evidence the riders are committed enough to negotiate. A program reading those edits as instability tightens up, freezes routes, and tells riders to take it or leave it. Two weeks later the same riders take it. Then they leave it.
NPS softening completes the trio. The first survey, run at week 2, captures the novelty premium and reads in the +40s. The second survey, at week 6, drops into the +10s. That is not collapse — it is the survey now sampling habit-stage riders rather than honeymoon-stage riders, and any benchmark from week 2 will look terrible by comparison. A program that anchors its read on week-2 NPS is going to keep being disappointed.
Two patterns that do predict a dead pilot
Some week-6 collapses are real, and two operational patterns flag them. Both sit upstream of the rider data, meaning the dashboard is reading a downstream symptom of an avoidable design choice.
Over-fit pilot routes lead the list. The trap is to take week-1 sign-up addresses, draw the perfect line through them, and run that line for six weeks. Sign-up data captures who responded to the launch comms; it does not capture who would ride at steady state. By week 4, the people who signed up because the route happened to pass their house are riding. The people whose actual commute is two stops off the line are not, and they were not in the sign-up list because they assumed the program “wasn’t for them.” A route locked to week-1 geometry is structurally undersized against the eventual demand. The fix is to design the pilot route as a hypothesis, not a destination — flexible enough to absorb three to five stop changes in the first eight weeks without renegotiating the contract.
Under-staffed driver pools sit alongside it. Pilots typically run with one driver per route plus a single floater across the program. The first time a driver calls in sick on a Tuesday, dispatch substitutes the floater. The second time, the floater is already covering. The third time, a route runs late or skips. Riders who experienced two on-time weeks and then one bad week do not give the program four more weeks to recover; they go back to the car. Charter coach rates run $135-$285 per vehicle hour in 2024 averages, and a six-route pilot has only modest financial room for redundant capacity, but the pilot that does not budget for at least 1.5 drivers per route is buying ridership risk it cannot price.
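The redundancy math is small relative to the ridership risk it retires. A back-of-envelope sketch using the 2024 charter-rate band above; the route count matches the six-route example, but service hours per day, pilot length, and the assumption that spare-driver cost scales with the vehicle-hour rate are all illustrative.

```python
ROUTES = 6
HOURS_PER_ROUTE_DAY = 2         # assumed: one AM run + one PM run
SERVICE_DAYS = 5 * 12           # assumed: 12-week pilot, 5 days/week
RATE_LOW, RATE_HIGH = 135, 285  # 2024 charter coach $/vehicle-hour

base_hours = ROUTES * HOURS_PER_ROUTE_DAY * SERVICE_DAYS
redundancy_hours = base_hours * 0.5  # 1.5 drivers/route vs 1.0, rough proxy

for label, rate in (("low", RATE_LOW), ("high", RATE_HIGH)):
    print(f"{label}: base ${base_hours * rate:,}, "
          f"redundancy ${int(redundancy_hours * rate):,}")
```

At the low end of the band, the redundancy line is roughly $49k on a ~$97k base; at the high end, roughly $103k on ~$205k. Either way it is a known, bounded number, unlike the cost of riders walking away after one bad week.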
What killing a 6-week pilot actually costs
The Microsoft Connector, the most documented private commuter-shuttle program in North America, ran 13 buses at launch in 2007 and reached 22 routes and roughly 80 buses by 2016. Microsoft’s own five-year-mark report disclosed that 60% of riders previously drove alone and that the program eliminated 40.5 million miles of driving across that period. Stanford Research Park, working with operator WeDriveU, watched its drive-alone share fall from 73% in 2016 to 63% in 2019, with vanpool and transit each rising from 8% to 13% of the mode mix over three years. By 2025, 38 vanpools were operating across the park.
None of those numbers showed up at week 6. They showed up at year three.
A reasonable skeptic, looking at this argument, might say: pilots are supposed to fail. That is the entire point of running one. Recent press coverage of MIT’s NANDA project pegged enterprise AI pilot failure at 95%, and the implication is that a pilot collapsing in week 6 is the system working as designed. Granted on principle. But most shuttle pilots do not fail because the underlying program would not work at steady state. They fail for operational reasons (route lock-in, driver pool understaffing, no rider feedback loop) that the pilot itself manufactured. Killing those programs in week 6 throws away the signal with the noise. The expected-value calculation is asymmetric: a real failure caught at week 6 saves three months of operating cost; a misdiagnosed failure forfeits a multi-year mode-shift program of the kind Stanford Research Park documented across three years and a 10-point drive-alone reduction. Operators including Ryde have built rider-feedback loops on the WhatsApp channel and Policy Engine specifically to keep that signal separable from the noise — the goal is not to win the week-6 review, it is to make the week-6 review readable.
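The expected-value asymmetry can be made concrete with placeholder numbers. Every figure below is hypothetical: even granting that most week-6 collapses are real failures, the downside of six more weeks of operating cost is small against the forfeited value of a misdiagnosed multi-year program.

```python
p_real_failure = 0.7        # hypothetical: most week-6 collapses are real
extra_ops_cost = 50_000     # hypothetical: six more weeks of shuttle ops
program_value = 2_000_000   # hypothetical: NPV of a multi-year mode-shift program

# Extending to week 12 vs. killing at week 6:
#   real failure -> extending wastes the extra ops cost
#   misdiagnosis -> killing forfeits the program value
ev_extend_minus_kill = ((1 - p_real_failure) * program_value
                        - p_real_failure * extra_ops_cost)
print(f"EV of extending over killing: ${ev_extend_minus_kill:,.0f}")
```

With these placeholders, extending wins by over half a million dollars in expectation even though the pilot is more likely than not a true failure. The asymmetry, not the specific numbers, is the argument.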
The Möser-Bamberg meta-analysis of voluntary travel-behavior-change programs pegged the average car-use reduction at about 7%, with effect sizes consistent across two waves of replication. That is what real mode shift looks like — not a 50% conversion at week 1. A pilot dashboard that expects to see a 50% number is going to read 7% as failure even when 7% is the best published evidence-based number anyone has ever produced for this kind of program.
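Here is what a published-benchmark 7% looks like at the 1,400-employee site from the opening example. The drive-alone baseline is borrowed from the Stanford Research Park 2016 figure purely for illustration, and 250 workdays per year is an assumption.

```python
EMPLOYEES = 1_400
DRIVE_ALONE_SHARE = 0.73   # assumed baseline (Stanford Research Park, 2016)
CAR_USE_REDUCTION = 0.07   # Möser-Bamberg average effect
WORKDAYS_PER_YEAR = 250    # assumed

daily_drivers = EMPLOYEES * DRIVE_ALONE_SHARE
cars_removed = daily_drivers * CAR_USE_REDUCTION
print(f"~{cars_removed:.0f} fewer cars/day, "
      f"~{cars_removed * WORKDAYS_PER_YEAR:,.0f} fewer car-trips/year")
```

Seventy-odd parking stalls freed every day is a successful program. A dashboard calibrated to a 50% conversion reads the same number as failure.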
This is what a 12-week minimum buys. It buys the time for the new-habit curve to cross the old-habit curve. It buys two cycles of route adjustment, not one. It buys two NPS reads that both sample habit-stage riders, so the comparison is meaningful. Any ops VP currently sitting in a week-6 review with a "kill or continue" question on the agenda should ask one upstream question first: are we reading a verdict, or are we reading the gap between two curves that have not yet crossed? A 12-week pilot is not simply a longer pilot; it is the shortest properly windowed one, the shortest pilot that produces an answer the data can actually support. For employers scoping that next round, the employee-shuttle-vs-parking-cost math lays out what the comparison looks like over a 30-year horizon, and the contact form is the right place to start if you want a parallel-pilot design that does not collapse the signal into the noise.
Sources
- How are habits formed: Modelling habit formation in the real world — Lally, P., van Jaarsveld, C. H. M., Potts, H. W. W., & Wardle, J., European Journal of Social Psychology, 2010. Accessed 2026-05-09.
- Habit substitution toward more active commuting — Di Maio, S. et al., Applied Psychology: Health and Well-Being, 2025. Accessed 2026-05-09.
- The temporal trajectories of habit decay in daily life — PsyCh Journal / PMC, 2024. Accessed 2026-05-09.
- Voluntary Travel Behavior Change — Policy Brief — California Air Resources Board, citing Möser & Bamberg (2008). Accessed 2026-05-09.
- Microsoft Connector: 19 routes, 53 buses later — Seattle Times. Accessed 2026-05-09.
- Reducing Microsoft’s Commuting Footprint — Five Years of The Connector — Microsoft Green Blog. Accessed 2026-05-09.
- Stanford Research Park gains traction in effort to shift workers’ commute habits — Palo Alto Online. Accessed 2026-05-09.
- Why enterprise AI pilots fail — CIO Dive (covering MIT NANDA, 2025). Accessed 2026-05-09.
- Commuter Benefit Monthly Limit Increase, 2024 to 2025 — Commuter Services / IRS Section 132(f). Accessed 2026-05-09.