X5 is Russia's largest retailer (Pyaterochka / Perekrestok / Chizhik). Its analyst interviews focus on retail specifics: basket analysis, geo-analytics, and store formats. In this guide I go through 25+ real questions from the 5 rounds of the X5 analyst interview (X5 Tech / Pyaterochka / Perekrestok / X5 ID), with a breakdown of strong and weak answers.
Analyst grades at X5 (2026)
| Grade | Compensation per month (Russia) | Experience |
|---|---|---|
| Junior | 130-200K ₽ | 0-1 years |
| Middle | 200-310K ₽ | 1-3 years |
| Senior | 310-470K ₽ | 3-6 years |
| Lead | 470-620K+ ₽ | 6+ years |
The 5 rounds of the X5 interview
| Round | Content | Duration |
|---|---|---|
| 1. HR | Motivation, retail experience | 30 min |
| 2. SQL live | 2-3 tasks + retail cases | 60 min |
| 3. Python + statistics | pandas + cohort + A/B | 60 min |
| 4. Retail case | Basket / format / geo | 60 min |
| 5. Final | Behavioral + business | 45 min |
6 SQL questions from the X5 interview
Top 10 categories by revenue at Pyaterochka for the month.
✅ Strong answer:
```sql
-- Revenue, check count and average check per category (ClickHouse dialect)
SELECT
    category,
    sum(amount) AS revenue,
    count(DISTINCT check_id) AS checks,
    sum(amount) / count(DISTINCT check_id) AS avg_check
FROM transactions
WHERE store_format = 'Pyaterochka'
  AND tx_date >= toStartOfMonth(today())  -- current month to date
GROUP BY category
ORDER BY revenue DESC
LIMIT 10;
```
Senior follow-up: "On 5B transactions a month this query is slow; it needs a materialized view (MV) `daily_revenue_by_category_format` on AggregatingMergeTree."
Basket analysis: how many checks have >5 items vs <3 items?
✅ Strong answer:
```sql
WITH basket_sizes AS (
    -- One row per check with the number of line items in it
    SELECT check_id, count() AS items
    FROM transactions
    WHERE tx_date >= today() - 30
    GROUP BY check_id
)
SELECT
    CASE
        WHEN items <= 2 THEN '1-2 items (snack)'
        WHEN items <= 5 THEN '3-5 items (small)'
        WHEN items <= 10 THEN '6-10 items (medium)'
        ELSE '11+ items (big)'
    END AS bucket,
    count() AS checks,
    -- Share of all checks: the window sum runs over the grouped rows
    count() * 100.0 / sum(count()) OVER () AS pct
FROM basket_sizes
GROUP BY bucket;
```
Cross-sell: which pairs of products are bought together?
✅ Strong answer:
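```sql
-- Self-join on check_id; a.product_id < b.product_id keeps each unordered pair once
SELECT
    a.product_id AS product_a,
    b.product_id AS product_b,
    -- Count checks, not row pairs, so a product scanned twice is not double-counted
    count(DISTINCT a.check_id) AS co_occurrence
FROM transactions a
JOIN transactions b
    ON a.check_id = b.check_id
-- The inequality sits in WHERE: equivalent for an inner join and safe on
-- ClickHouse versions that only accept equality conditions in ON
WHERE a.product_id < b.product_id
  AND a.tx_date >= today() - 7
GROUP BY product_a, product_b
HAVING co_occurrence >= 100
ORDER BY co_occurrence DESC
LIMIT 50;
```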
Senior follow-up: "This is market basket analysis. Actionable insights need support / confidence / lift."
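To make the three metrics concrete, here is a minimal sketch in plain Python. It is not X5's pipeline: the function name `pair_metrics` and the toy baskets are hypothetical, and confidence is computed only in the direction A → B, where A sorts before B.

```python
from collections import Counter
from itertools import combinations


def pair_metrics(baskets):
    """Return {(a, b): (support, confidence_a_to_b, lift)} for every product pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated line items within one check
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    metrics = {}
    for (a, b), both in pair_counts.items():
        support = both / n                         # P(A and B)
        confidence = both / item_counts[a]         # P(B | A)
        lift = confidence / (item_counts[b] / n)   # P(B | A) / P(B)
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


# Hypothetical toy checks, not real X5 data
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk"],
]
metrics = pair_metrics(baskets)
```

A lift above 1 means the pair appears together more often than independent purchases would suggest; a lift below 1 means the opposite.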
Geo-analytics: revenue per store by region.
✅ Strong answer:
```sql
-- Revenue per store by region and format; stores with no sales in the window drop out of the inner join
SELECT
    s.region,
    s.format,
    count(DISTINCT s.store_id) AS stores,
    sum(t.amount) AS revenue,
    sum(t.amount) / count(DISTINCT s.store_id) AS revenue_per_store
FROM stores s
JOIN transactions t ON s.store_id = t.store_id
WHERE t.tx_date >= today() - 30
GROUP BY s.region, s.format
ORDER BY revenue_per_store DESC;
```
Promo effectiveness: revenue lift during a promo.
✅ Strong answer:
```sql
WITH promo_periods AS (
    SELECT product_id, promo_start, promo_end
    FROM promos
    WHERE promo_start >= '2026-01-01'
),
sales_during_promo AS (
    -- Revenue per product and promo window (a product may have several promos)
    SELECT t.product_id, p.promo_start, p.promo_end, sum(t.amount) AS promo_revenue
    FROM transactions t
    JOIN promo_periods p ON t.product_id = p.product_id
    WHERE t.tx_date BETWEEN p.promo_start AND p.promo_end
    GROUP BY t.product_id, p.promo_start, p.promo_end
),
sales_baseline AS (
    -- Average daily revenue over December 2025 (31 days) as the pre-promo baseline
    SELECT t.product_id, sum(t.amount) / 31 AS baseline_daily_revenue
    FROM transactions t
    WHERE t.tx_date >= '2025-12-01' AND t.tx_date < '2026-01-01'
    GROUP BY t.product_id
)
SELECT s.product_id,
       s.promo_start,
       s.promo_revenue,
       b.baseline_daily_revenue,
       -- BETWEEN is inclusive, so the promo lasts dateDiff + 1 days
       s.promo_revenue
           - b.baseline_daily_revenue * (dateDiff('day', s.promo_start, s.promo_end) + 1) AS revenue_lift
FROM sales_during_promo s
JOIN sales_baseline b ON s.product_id = b.product_id;
```
Customer segmentation by recency-frequency.
✅ Strong answer:
```sql
WITH user_rf AS (
    -- Recency (days since last purchase) and frequency (checks) over the last 90 days
    SELECT user_id,
           dateDiff('day', max(tx_date), today()) AS recency,
           count(DISTINCT check_id) AS frequency
    FROM transactions
    WHERE tx_date >= today() - 90
    GROUP BY user_id
)
SELECT
    -- Branches are evaluated top to bottom, so the strictest segment comes first
    CASE
        WHEN recency <= 7 AND frequency >= 10 THEN 'Champion'
        WHEN recency <= 14 AND frequency >= 5 THEN 'Loyal'
        WHEN recency <= 30 THEN 'Active'
        WHEN recency <= 60 THEN 'At-risk'
        ELSE 'Dormant'
    END AS segment,
    count() AS users
FROM user_rf
GROUP BY segment;
```
5 Python / A/B questions
Pandas: average basket size by hour of day.
✅ Strong answer:
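```python
import pandas as pd

# Assumes df has one row per line item with columns check_id, tx_ts (datetime64) and amount
df['hour'] = df['tx_ts'].dt.hour
basket_by_hour = (
    df.groupby(['check_id', 'hour'])['amount'].sum()  # basket total per check
      .groupby(level='hour').mean()                   # average basket per hour
)
```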
Store format A/B test design.
✅ Strong answer:
"Cluster A/B (randomization at store level):
- Select pilot stores: 20-30 with a similar profile (size, location, demographics)
- Match pairs: each pilot store is paired with a control store of the same profile
- Randomize: within each pair, A/B is assigned randomly
- Run: at least 4 weeks to cover weekly seasonality
- Analyze: paired t-test on key metrics
Key metrics:
- Revenue per store
- Average check
- Visit frequency
- Customer satisfaction (NPS)
Guardrails:
- Margin % (revenue growth is not bought with promo discounts)
- Customer complaints
- Employee feedback"
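The "paired t-test" step can be sketched with the standard library alone. This is an illustration under assumptions: the helper `paired_t_statistic` and the revenue figures are hypothetical, and it returns only the t statistic and degrees of freedom, not a p-value.

```python
import math
import statistics


def paired_t_statistic(pilot, control):
    """Paired t-test on matched store pairs: returns (mean_diff, t, dof)."""
    if len(pilot) != len(control) or len(pilot) < 2:
        raise ValueError("need at least two matched pairs of equal length")
    diffs = [p - c for p, c in zip(pilot, control)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample std uses n - 1)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err, len(diffs) - 1


# Hypothetical weekly revenue per store (thousand RUB), one value per matched pair
pilot = [105.0, 98.0, 112.0, 120.0, 101.0]
control = [100.0, 97.0, 108.0, 115.0, 100.0]
mean_diff, t_stat, dof = paired_t_statistic(pilot, control)
```

With 4 degrees of freedom the two-sided 5% critical value is about 2.776, so `t_stat` is compared against that. In practice `scipy.stats.ttest_rel` gives the p-value directly.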
Promo cannibalization measurement.
✅ Strong answer:
"Cannibalization in retail:
A promo on brand A can cannibalize brand B in the same category.
Measurement:
- Pre-promo baseline: category-level revenue
- During promo: total category revenue
- Difference = net promo effect
If total category revenue did not grow → 100% cannibalization (pure redistribution).
Solution: measure incremental volume by category total, not by brand. Category-level lift = net win.
At X5, cannibalization is monitored weekly to evaluate promo ROI."
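One way to turn this measurement into a number is the share of the promoted brand's uplift that the category as a whole did not gain. The function `cannibalization_share` and its clamping to [0, 1] are my assumptions for illustration, not a formula from X5.

```python
def cannibalization_share(brand_base, brand_promo, category_base, category_promo):
    """Share of the promoted brand's uplift that came from the rest of the category."""
    brand_uplift = brand_promo - brand_base
    if brand_uplift <= 0:
        raise ValueError("promoted brand shows no uplift")
    category_uplift = category_promo - category_base  # net promo effect
    share = 1 - category_uplift / brand_uplift
    # Assumption: clamp to [0, 1]; a category uplift above the brand uplift counts as 0
    return min(max(share, 0.0), 1.0)


# Hypothetical weekly revenue: brand 100 -> 160, category 1000 -> 1020
partial = cannibalization_share(100, 160, 1000, 1020)
# Category flat: the promo only redistributed sales
full = cannibalization_share(100, 160, 1000, 1000)
```

Here `partial` comes out at two thirds (60 of brand uplift, only 20 of it incremental for the category), and `full` is 1.0, the "100% cannibalization" case from the answer.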
Customer lifetime modeling.
✅ Strong answer:
"Approaches for retail:
- Simple LTV = AOV × frequency × lifespan
- BG/NBD + Gamma-Gamma: probabilistic (frequency + monetary)
- ML-based: gradient boosting on 50+ features (demographic, behavioral, geo, season)
For X5 specifically:
- Loyalty card customers are tracked
- Cross-format LTV (customers of both Pyaterochka and Perekrestok have higher LTV)
- Geographic LTV variability (Moscow ≠ regions)
LTV is used for:
- Marketing spend allocation
- VIP customer programs
- Churn prediction"
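The simple formula is worth being able to write out on the spot. A minimal sketch: the `margin_rate` parameter is my own addition (not in the formula above) for converting revenue LTV into margin LTV, and the input numbers are hypothetical.

```python
def simple_ltv(aov, purchases_per_month, lifespan_months, margin_rate=1.0):
    """Simple LTV = AOV x frequency x lifespan, optionally scaled by a margin rate."""
    if min(aov, purchases_per_month, lifespan_months) < 0:
        raise ValueError("inputs must be non-negative")
    return aov * purchases_per_month * lifespan_months * margin_rate


# Hypothetical customer: 500 RUB average check, 8 visits a month, 24-month lifespan
revenue_ltv = simple_ltv(500, 8, 24)
margin_ltv = simple_ltv(500, 8, 24, margin_rate=0.25)
```

The units of frequency and lifespan have to match (per month and months here), otherwise the product is meaningless.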
Sample size for a basket-size A/B test.
✅ Strong answer:
```python
import math

from scipy import stats


def sample_size_continuous(baseline_mean, baseline_std, mde_pct, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample test on a continuous metric."""
    mde_abs = baseline_mean * mde_pct  # minimum detectable effect in absolute units
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    n = ((z_alpha + z_beta) ** 2 * 2 * baseline_std ** 2) / (mde_abs ** 2)
    return math.ceil(n)


# Basket size: baseline = 500₽, std = 300₽, MDE = 2%
n = sample_size_continuous(500, 300, 0.02)
print(f"Sample size: {n}")  # ~14,100 per group
```
5 retail cases
Opening a new store: what data do you need?
✅ Strong answer:
"Data for the location decision:
Demographic (catchment area):
- Population density (3-5 km radius)
- Age / income distribution
- Household composition
Competition:
- Distance to competitors
- Competitor format / price tier
- Market share of existing players
Foot traffic:
- Pedestrian flow (mobile geo data)
- Public transport access
- Parking availability
Real estate:
- Lease cost vs predicted revenue
- Visibility (corner / high-traffic)
Internal:
- Cannibalization of existing X5 stores
- Distribution logistics (delivery distance)
Output: a financial model with payback period and NPV."
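The two outputs named at the end of the answer are easy to compute once cash flows are forecast. Below is a sketch under assumptions: annual cash flows, an undiscounted payback period, and hypothetical figures in million RUB; `npv` and `payback_period` are illustrative helpers.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the upfront investment, one flow per year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """First year in which cumulative undiscounted cash flow is non-negative, else None."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


# Hypothetical store: 30M RUB investment, then four years of operating cash flow
flows = [-30.0, 10.0, 15.0, 15.0, 15.0]
store_npv = npv(0.15, flows)
store_payback = payback_period(flows)
```

Candidate locations can then be ranked by `store_npv`, with `store_payback` as a sanity check against the lease term.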
Prices: how do you compare against competitors?
✅ Strong answer:
"Price tracking system:
- Mystery shoppers: manual price checks weekly
- Web scraping: competitor websites (for online)
- Receipt OCR: customer-submitted receipts (loyalty program)
- Partner data: third-party retail panels (Nielsen)
Analysis:
- Price index = X5 price / average competitor price
- Categorize: Premium (>1.05), Parity (0.95-1.05), Value (<0.95)
- Cross-reference with elasticity
X5 strategy:
- Pyaterochka: value (price index ~0.92)
- Perekrestok: parity / premium (~1.02)
- Chizhik: extreme value (~0.85)"
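The price index and the three tiers translate directly into code. A small sketch, assuming an unweighted average of competitor prices; the function names and the sample prices are hypothetical.

```python
def price_index(own_price, competitor_prices):
    """Own price divided by the unweighted average competitor price."""
    if not competitor_prices:
        raise ValueError("need at least one competitor price")
    return own_price / (sum(competitor_prices) / len(competitor_prices))


def price_tier(index):
    """Map a price index to the tiers from the answer above."""
    if index > 1.05:
        return "Premium"
    if index < 0.95:
        return "Value"
    return "Parity"


# Hypothetical SKU: our shelf price vs three competitors
index = price_index(92, [98, 100, 102])
tier = price_tier(index)
```

In a real comparison the average would usually be weighted by sales volume, which this sketch leaves out.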
Format expansion: where to open Perekrestok vs Chizhik?
✅ Strong answer:
"Decision framework:
Perekrestok (premium):
- High-income areas (central Moscow / premium residential complexes)
- Higher AOV but lower frequency
- Customer expects an experience (fresh fruit, deli, wine)
Chizhik (discount):
- Mid-income areas (mass-market residential districts)
- Lower AOV but higher frequency (price-sensitive customers)
- Customer expects basics at a low price
Data inputs:
- Census demographics by district
- Existing X5 footprint (cannibalization)
- Competitor presence
- Real estate cost vs predicted revenue
Output: rank candidate locations by NPV per format."
Customer churn in the loyalty program.
✅ Strong answer:
"Definition: active = 1+ purchase in the last 90 days. Churn = no purchase for 90+ days.
Drivers analysis:
- Recency-frequency segment (Champions vs At-risk)
- Demographic (age groups churn differently)
- Category mix (customers who buy from a narrow set of categories churn more easily)
- Promo dependence (promo-only customers churn more easily)
Action:
- Predict at-risk customers (60 days without a purchase + frequency dropping)
- Personalized win-back campaigns (relevant categories, time-sensitive offer)
- A/B test offer types (discount vs free delivery vs bundled)
X5 SmartTeam (loyalty data team) has a dedicated churn prediction pipeline."
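The definitions above map onto a simple recency rule. The sketch below is a simplification: it labels customers by recency only and ignores the "frequency dropping" signal, and `churn_status` with its thresholds as parameters is a hypothetical helper.

```python
from datetime import date


def churn_status(last_purchase, today, at_risk_days=60, churn_days=90):
    """Label a customer by days since the last purchase (recency only)."""
    days = (today - last_purchase).days
    if days >= churn_days:
        return "churned"
    if days >= at_risk_days:
        return "at-risk"
    return "active"


# Hypothetical snapshot date and last-purchase dates
snapshot = date(2026, 4, 15)
statuses = [
    churn_status(date(2026, 1, 1), snapshot),  # 104 days ago
    churn_status(date(2026, 2, 1), snapshot),  # 73 days ago
    churn_status(date(2026, 4, 1), snapshot),  # 14 days ago
]
```

The at-risk group is the one worth targeting with win-back offers, since churned customers are already gone by definition.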
Pricing elasticity for category management.
✅ Strong answer:
"Elasticity = % change in volume / % change in price.
Estimation:
- Natural variation: historical price changes (sales, promo, supply shocks)
- A/B test on price: different stores get different prices (legal in retail?)
- Log-log regression: ln(volume) = α + β × ln(price) + controls
X5 specific:
- ~50,000 SKUs across categories, so you cannot estimate each one individually
- Cluster the estimates by category + price tier
- Premium brands are less elastic
- Commodities (milk / bread) are more elastic
Action:
- Inelastic categories → raise prices (margin lift)
- Elastic categories → maintain prices (volume sensitive)"
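The log-log regression is the piece interviewers most often ask to write out. Below is a minimal sketch with a single regressor and no controls, fitted by ordinary least squares in plain Python; the data is synthetic, generated with a known elasticity of -1.5, and `loglog_elasticity` is a hypothetical helper.

```python
import math


def loglog_elasticity(prices, volumes):
    """Fit ln(volume) = alpha + beta * ln(price) by OLS; beta is the elasticity."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(v) for v in volumes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prices must vary to estimate elasticity")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    return alpha, beta


# Synthetic data with a true elasticity of -1.5 (no noise, no controls)
prices = [80.0, 90.0, 100.0, 110.0, 120.0]
volumes = [1000 * p ** -1.5 for p in prices]
alpha, beta = loglog_elasticity(prices, volumes)
```

On real data the fit needs controls (promo flags, seasonality, store effects), otherwise the slope picks up whatever else moved together with price.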
2 behavioral questions (round 5)
Tell me about an analytical insight that led to a business change.
✅ Strong answer (STAR):
"Situation: basket analysis showed that customers who buy fresh produce have 3× the LTV of customers who do not.
Task: convince the product team to expand the fresh section.
Action:
- Quantified the opportunity: 30% of customers buy fresh produce, and their LTV is 3× higher
- Counterfactual analysis: for customers who switch from no-fresh to fresh, LTV grows over the following 6 months
- Recommended actions:
  - Better in-store location (front of store)
  - Quality investment (faster turnover, daily restocking)
Result: fresh produce SKU expansion approved. After 6 months, overall LTV was up 12%. Fresh shelf space was redesigned."
Conflict with a category manager.
✅ Strong answer:
"Situation: the category manager wanted to drop a low-margin SKU (-5% margin), but the data showed that this SKU draws customers in and that they buy other items, with baskets worth 800₽.
Action:
- Showed the full picture: the SKU itself is unprofitable, but baskets containing it are 30% larger and have a higher overall margin
- Recommended a compromise: keep the SKU but optimize its position (less prime shelf space, fewer facings)
- Tracked metrics: weekly basket-level analytics
Result: SKU kept, position optimized. Overall category margin +1% (a small win) because no customers were lost."
Red flags
- Generic e-commerce answers with no retail specifics
- Not mentioning the offline + online combination
- Ignoring geo / format dynamics
- Not knowing basket analysis basics
How to prepare for X5
Month 1: retail metrics + ClickHouse
- Basket size, average check, frequency, cross-sell
- 50+ SQL tasks in the trainer
Month 2: geo / format / pricing
- Geographic analytics
- A/B testing for retail
- Pricing elasticity
Month 3: cases + behavioral
- Retail cases from /cases
- AI mock interview
FAQ
X5 vs Magnit?
X5 has more format diversity (Pyaterochka + Perekrestok + Chizhik); Magnit is more homogeneous. X5 is more tech-progressive.
Can you get in without retail experience?
Junior: yes. Middle+: retail or CPG (consumer packaged goods) experience is preferred.
Tech stack?
ClickHouse + PostgreSQL + Greenplum + Python + Yandex DataLens + R (statistics).