Тестовые задания — это 70% решения offer/no-offer на собесе аналитика. Они показывают не «знаешь ли ты SQL», а «можешь ли решить реальную задачу». В этом гайде — 15 РЕАЛЬНЫХ заданий (анонимизированных) с собесов 2026 в Yandex/Ozon/Тинькофф/Сбер/Avito/Wildberries/Lamoda/X5/МТС/VK, с подробными разборами.
Что обычно ожидается от тестового
Формат
- Home assignment (1-3 дня)
- In-house challenge (2-4 часа)
- Live coding (60-90 мин)
Структура решения
- Executive summary (1 параграф business-readable)
- Approach (что делал и почему)
- Analysis (SQL queries / Python code + outputs)
- Findings (key insights)
- Recommendations (что делать дальше)
Что recruiter ищет
- Communication — code commented, decisions explained
- Business sense — recommendations actionable
- Technical depth — но не over-engineered
- Edge cases — что если data missing, что если outliers
- Trade-offs — почему именно этот подход, не другой
15 реальных заданий с разборами
Задание 1: SQL — Top sellers (Ozon-style)
Условие: В таблице \orders\ (seller_id, order_date, revenue, status). Найди топ-10 sellers по выручке за май 2026 со статусом 'completed', и для каждого покажи долю в общей выручке.
✅ Решение:
\\\sql
WITH seller_rev AS (
SELECT seller_id, sum(revenue) AS rev
FROM orders
WHERE order_date >= '2026-05-01' AND order_date < '2026-06-01'
AND status = 'completed'
GROUP BY seller_id
)
SELECT seller_id, rev,
rev * 100.0 / sum(rev) OVER () AS pct_of_total
FROM seller_rev
ORDER BY rev DESC
LIMIT 10;
\\\
Что recruiter ценит:
- WHERE clauses конкретные (даты + status)
- SUM OVER () для total — elegant
- Bonus: упомянуть что ORDER BY + LIMIT перед window function может быть оптимизирован
Задание 2: SQL — Cohort retention (Тинькофф-style)
Условие: Посчитай retention Day-7 (% юзеров, вернувшихся через 7 дней после первого визита) по cohort месяца регистрации.
✅ Решение:
\\\sql
WITH first_visit AS (
SELECT user_id, min(visit_date) AS first_date
FROM visits
GROUP BY user_id
),
d7_check AS (
SELECT
fv.user_id,
toStartOfMonth(fv.first_date) AS cohort,
maxIf(1, v.visit_date BETWEEN fv.first_date + 7 AND fv.first_date + 13) AS returned_d7
FROM first_visit fv
LEFT JOIN visits v ON fv.user_id = v.user_id
GROUP BY fv.user_id, fv.first_date
)
SELECT cohort,
count() AS new_users,
sum(returned_d7) AS returned,
100.0 * sum(returned_d7) / count() AS retention_d7_pct
FROM d7_check
GROUP BY cohort
ORDER BY cohort;
\\\
Задание 3: Python — Conversion funnel (Avito-style)
Условие: Из event log csv (event_type, user_id, ts), посчитай conversion funnel: search → view → contact → deal.
✅ Решение:
\\\python
import pandas as pd
df = pd.read_csv('events.csv')
# User-level: did user perform each step?
funnel = df.pivot_table(
index='user_id',
columns='event_type',
values='ts',
aggfunc='count',
fill_value=0
)
# Conversion rates
steps = ['search', 'view', 'contact', 'deal']
funnel_counts = {step: (funnel[step] > 0).sum() for step in steps}
print("Funnel:")
for i, step in enumerate(steps):
cnt = funnel_counts[step]
if i == 0:
print(f" {step}: {cnt:,} (100%)")
else:
prev_cnt = funnel_counts[steps[i-1]]
pct = cnt / prev_cnt * 100
overall = cnt / funnel_counts[steps[0]] * 100
print(f" {step}: {cnt:,} ({pct:.1f}% from prev, {overall:.1f}% overall)")
\\\
Задание 4: Python — Detect anomalies (Yandex-style)
Условие: Daily revenue time series. Detect anomalies (точки которые statistically unusual).
✅ Решение:
\\\python
import pandas as pd
import numpy as np
df = pd.read_csv('daily_revenue.csv', parse_dates=['date'])
df = df.sort_values('date')
# Method 1: Z-score (works for stationary series)
df['z_score'] = (df['revenue'] - df['revenue'].mean()) / df['revenue'].std()
anomalies_z = df[abs(df['z_score']) > 3]
# Method 2: Rolling stats (better for trend data)
df['rolling_mean'] = df['revenue'].rolling(7).mean()
df['rolling_std'] = df['revenue'].rolling(7).std()
df['rolling_z'] = (df['revenue'] - df['rolling_mean']) / df['rolling_std']
anomalies_rolling = df[abs(df['rolling_z']) > 2]
# Method 3: Decompose (для seasonal data)
from statsmodels.tsa.seasonal import seasonal_decompose
decomp = seasonal_decompose(df['revenue'], model='additive', period=7)
residuals = decomp.resid.dropna()
threshold = residuals.std() * 3
anomalies_decomp = residuals[abs(residuals) > threshold]
\\\
Bonus answer: discuss trade-offs — z-score assumes normality, rolling adapts to trend, decompose handles seasonality.
Задание 5: A/B-test analysis (Тинькофф-style)
Условие: Запустили A/B на новой кнопке оплаты. Control 50K юзеров, treatment 50K. Conversion control 8.2%, treatment 8.7%. Стат значимо?
✅ Решение:
\\\python
from scipy import stats
import numpy as np
n_c, p_c = 50000, 0.082
n_t, p_t = 50000, 0.087
successes_c = int(n_c * p_c)
successes_t = int(n_t * p_t)
# Two-proportion z-test
pool_p = (successes_c + successes_t) / (n_c + n_t)
se = np.sqrt(pool_p * (1 - pool_p) * (1/n_c + 1/n_t))
z = (p_t - p_c) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print(f"Z-statistic: {z:.3f}")
print(f"P-value: {p_value:.4f}")
print(f"Lift: {(p_t - p_c) / p_c * 100:.2f}%")
print(f"Significant at α=0.05: {p_value < 0.05}")
# Confidence interval
se_diff = np.sqrt(p_c * (1-p_c) / n_c + p_t * (1-p_t) / n_t)
ci_low = (p_t - p_c) - 1.96 * se_diff
ci_high = (p_t - p_c) + 1.96 * se_diff
print(f"95% CI for difference: [{ci_low:.4f}, {ci_high:.4f}]")
\\\
Output: p ≈ 0.006 (significant). Lift 6.1%.
Bonus answer: mention guardrails (refund rate, customer NPS), discuss whether lift is practically significant.
Задание 6: Метрики — Decompose DAU drop (Yandex-style)
Условие: DAU упал с 100K до 92K (-8%). Что делаешь?
✅ Approach (не SQL, а thinking framework):
- Декомпозиция:
- DAU = unique users × (sessions / user × 1/1 day)
- Drill-down hypotheses:
- Marketing: paid spend cut? campaign ended?
- Product: new release broke flow? UX changed?
- External: competitor launched? holiday?
- Seasonality: week-over-week or year-over-year compare
- Data needed:
- DAU by platform (iOS / Android / Web)
- DAU by region
- DAU by user cohort
- Action:
- Hypothesis-driven drilling (try one hypothesis at a time)
- Quick win + long-term root cause fix
Что recruiter ценит: structured approach, не jumping to conclusions.
Задание 7: SQL — Sessionization (Avito-style)
Условие: События от users, нужно объединить в сессии (если перерыв > 30 минут → новая сессия).
✅ Решение:
\\\sql
WITH events_lagged AS (
SELECT user_id, event_ts,
lag(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_ts
FROM events
),
session_starts AS (
SELECT user_id, event_ts,
CASE
WHEN prev_ts IS NULL THEN 1
WHEN dateDiff('minute', prev_ts, event_ts) > 30 THEN 1
ELSE 0
END AS new_session
FROM events_lagged
),
sessions AS (
SELECT user_id, event_ts,
sum(new_session) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_id
FROM session_starts
)
SELECT user_id, session_id,
min(event_ts) AS session_start,
max(event_ts) AS session_end,
count() AS event_count
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_start;
\\\
Задание 8: Python — Customer LTV (Lamoda-style)
Условие: Из транзакций посчитай LTV каждого customer + project forward 12 месяцев.
✅ Решение:
\\\python
import pandas as pd
# Historical LTV
df['amount'] = df['amount'].astype(float)
hist_ltv = df.groupby('user_id').agg(
historical_ltv=('amount', 'sum'),
purchase_count=('order_id', 'nunique'),
first_purchase=('order_date', 'min'),
last_purchase=('order_date', 'max'),
avg_order_value=('amount', 'mean')
).reset_index()
# Projection
hist_ltv['tenure_days'] = (hist_ltv['last_purchase'] - hist_ltv['first_purchase']).dt.days + 1
hist_ltv['daily_revenue'] = hist_ltv['historical_ltv'] / hist_ltv['tenure_days']
hist_ltv['projected_12m_ltv'] = hist_ltv['historical_ltv'] + hist_ltv['daily_revenue'] * 365
# Active customers (purchased in last 90 days)
from datetime import datetime
hist_ltv['days_since_last'] = (datetime.now() - hist_ltv['last_purchase']).dt.days
hist_ltv['is_active'] = hist_ltv['days_since_last'] <= 90
print(hist_ltv[hist_ltv['is_active']].describe())
\\\
Bonus: discuss BG/NBD probabilistic LTV model for production accuracy.
Задание 9: Продуктовый кейс (X5-style)
Условие: Перекрёсток хочет открыть Чижик в этом районе. Что данные нужны для решения?
✅ Решение framework:
- Cannibalization analysis — какие Перекрёсток customers переключатся
- Net new acquisition — какие новые customers attract (price-sensitive)
- Geographic study — соседние competitors, foot traffic
- Demographic — income distribution, household composition
- Real estate — lease cost vs predicted revenue
- Logistics — distribution center proximity
Models:
- Demand modeling: gravity model для market share
- Financial: 3-year NPV with sensitivity analysis
- Risk: stress test (competitor reaction, recession)
Задание 10: SQL — Self-join для friend recommendations (VK-style)
Условие: В таблице friendships (user_id, friend_id). Найди для user_id=123 топ-10 «friends of friends» которые ещё НЕ его друзья.
✅ Решение:
\\\sql
WITH my_friends AS (
SELECT friend_id AS user_id
FROM friendships WHERE user_id = 123
),
friends_of_friends AS (
SELECT f.friend_id, count() AS mutual_friends
FROM friendships f
JOIN my_friends mf ON f.user_id = mf.user_id
WHERE f.friend_id != 123
AND f.friend_id NOT IN (SELECT user_id FROM my_friends)
GROUP BY f.friend_id
)
SELECT * FROM friends_of_friends
ORDER BY mutual_friends DESC
LIMIT 10;
\\\
Задание 11: Python — Forecasting (МТС-style)
Условие: Прогнози daily revenue на следующие 30 дней.
✅ Решение (Prophet):
\\\python
from prophet import Prophet
import pandas as pd
df = pd.read_csv('daily_revenue.csv', parse_dates=['date'])
df.columns = ['ds', 'y'] # Prophet expects these names
model = Prophet(
yearly_seasonality=True,
weekly_seasonality=True,
daily_seasonality=False,
changepoint_prior_scale=0.05
)
model.fit(df)
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(30))
\\\
Bonus: discuss SARIMA, XGBoost для time series, ensemble approaches.
Задание 12: SQL — Window function для running totals (Сбер-style)
Условие: Для каждого транзакции рассчитай running total по user_id (sum amount по дате).
✅ Решение:
\\\sql
SELECT
user_id, tx_date, amount,
sum(amount) OVER (
PARTITION BY user_id
ORDER BY tx_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM transactions
ORDER BY user_id, tx_date;
\\\
→ Window functions полный гайд
Задание 13: A/B-tests product (Wildberries-style)
Условие: Запустили new search ranking algorithm в A/B. CTR +5%, но GMV per searcher -1%. Что делаешь?
✅ Analysis:
- Investigate divergence:
- GMV down где специфически?
- Position bias — может click на cheaper items boost CTR
- Quality measure:
- Conversion rate (clicks → purchases)
- Add-to-cart rate
- Hypothesis:
- But юзеры не покупают (GMV down)
- Wrong optimization target
- Recommendation:
- Re-train на purchase data (not click data)
- Add price tier balancing
Задание 14: Behavioral case (Yandex-style)
Условие: PM запрашивает analysis с deadline 1 день. Знаешь что для proper analysis нужно 3 дня. Что делаешь?
✅ Answer (STAR-format):
«Situation: PM хочет analysis для quarterly review tomorrow.
Task: Determine scope vs depth trade-off.
Action:
- Quick assessment: какие 80% insights достижимы за 4 hours?
- Communicate: «Могу сделать basic version (4h) — top-line numbers + 1 deep-dive. Full analysis нужны 3 дня — granular segmentation + statistical testing.»
- Offer hybrid: quick version tomorrow + follow-up week 1
- Get explicit approval before starting
Result: PM выбирает quick version. Я деливеру + flag known limitations. Week 1 — follow-up с full analysis. PM appreciates transparency.»
Задание 15: SQL Performance optimization (любая компания)
Условие: Query timeout (60+ секунд). Оптимизируй.
\\\sql
-- Original (slow):
SELECT u.name, sum(o.amount)
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.order_date >= '2026-01-01'
AND substring(u.email, locate('@', u.email)) = '@gmail.com'
GROUP BY u.user_id, u.name;
\\\
✅ Решение:
\\\sql
-- Optimizations:
-- 1. Pre-filter users by email (non-sargable substring)
-- 2. Add index on (user_id, order_date) for orders
-- 3. Filter orders BEFORE join
WITH gmail_users AS (
SELECT user_id, name FROM users WHERE email LIKE '%@gmail.com'
),
recent_orders AS (
SELECT user_id, sum(amount) AS total
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY user_id
)
SELECT u.name, COALESCE(o.total, 0) AS total_revenue
FROM gmail_users u
LEFT JOIN recent_orders o ON u.user_id = o.user_id;
\\\
Why faster:
- Pre-filter orders (smaller hash table)
- LIKE с % at end can use index (vs SUBSTRING which is non-sargable)
- Composite index (user_id, order_date) gives index-only scan
Топ-7 ошибок в тестовых заданиях
- Не readme — нет explanation, recruiter не знает что делал
- Over-engineering — 5 hour solution для 2-hour task
- Под-engineering — без structure, без comments
- Не тестируют edge cases — what if data empty / huge / corrupted?
- Ignore business context — technical perfect, business useless
- No recommendations — analysis без actionable next steps
- Не check results — не sanity check (numbers makes sense?)
Чек-лист для тестового задания
- [ ] Прочитать requirements 2 раза
- [ ] Clarifying questions через email если что-то ambiguous
- [ ] Structure approach (write outline before coding)
- [ ] Readme с executive summary
- [ ] Code commented (almost every block)
- [ ] Outputs printed (numbers, charts)
- [ ] Edge cases handled (NULL, outliers)
- [ ] Recommendations actionable
- [ ] Test на 10% sample (sanity check)
- [ ] Refactor если время есть
Что дальше
- 612 тестовых заданий для тренировки
- 521 SQL-задача
- 532 Python-задачи
- 453 кейса
- 150+ вопросов общего собеса
Источники
- Реальные тестовые задания (анонимизированные) с собесов 2026
- Discussions в Telegram чатах аналитиков
- LinkedIn / Хабр Карьера примеры