System Design 存储/数据库系统面试方法论
System Design Storage/Database Systems Methodology
🎯 核心问题:这类题目的特征
Core Question: Characteristics of These Problems
存储系统是系统设计的基础! 几乎所有系统都需要存储,选择合适的存储方案是关键。
Storage systems are the foundation of system design! Almost all systems need storage, and choosing the right storage solution is crucial.
关键特征
Key Characteristics
| 维度 / Dimension | Storage/Database 系统 |
|---|---|
| 核心功能 / Core Function | 存储和检索数据 |
| 输入 / Input | 写入请求(write requests) |
| 输出 / Output | 读取结果(read results) |
| 关键挑战 / Key Challenge | 读写性能、一致性、扩展性 |
| 典型题目 / Typical Problems | Design URL Shortener, Design Key-Value Store, Design File Storage |
📊 决策树:识别 Storage/Database 题目
Decision Tree: Identify Storage/Database Problems
面试题目分析
Interview Problem Analysis
│
├─ 是否主要关注"存储"、"数据库"、"数据持久化"?
│ Is it mainly about "storage", "database", "data persistence"?
│ │
│ ├─ YES → Storage/Database 系统
│ │ └─ 继续判断具体类型...
│ │
│ └─ NO → 可能是其他类型
│
├─ 读写比例?
│ Read/Write Ratio?
│ │
│ ├─ 读多写少(100:1)→ 读优化存储
│ │ Read-heavy → Read-optimized storage
│ ├─ 写多读少(1:100)→ 写优化存储
│ │ Write-heavy → Write-optimized storage
│ └─ 读写均衡 → 平衡型存储
│ Balanced → Balanced storage
│
├─ 数据一致性要求?
│ Data Consistency Requirements?
│ │
│ ├─ 强一致性 → ACID 数据库
│ │ Strong consistency → ACID database
│ ├─ 最终一致性 → NoSQL/CAP
│ │ Eventual consistency → NoSQL/CAP
│ └─ 可接受不一致 → 缓存 + 数据库
│ Accept inconsistency → Cache + Database
│
└─ 数据规模?
Data Scale?
│
├─ 小规模(< 1TB)→ 单机数据库
├─ 中规模(1TB-100TB)→ 分片数据库
└─ 大规模(> 100TB)→ 分布式存储
🔍 核心特征识别
Core Characteristics Identification
典型题目关键词
Typical Problem Keywords
- Key-Value Store 问题
- Design URL Shortener
- Design Distributed Cache
- Design Key-Value Store
- Design Pastebin
- File Storage 问题
- Design File Storage System
- Design Image Storage
- Design Video Storage
- Design Dropbox
- Database 问题
- Design Database Sharding
- Design Distributed Database
- Design Time-series Database
核心需求模式
Core Requirement Patterns
输入:写入请求
Input: Write Requests
- Key-Value pairs
- Files/Objects
- Records/Transactions
输出:读取结果
Output: Read Results
- Value by key
- File content
- Query results
关键挑战:
Key Challenges:
- 读写性能(QPS)
Read/Write performance (QPS)
- 数据一致性(ACID vs BASE)
Data consistency (ACID vs BASE)
- 扩展性(水平扩展)
Scalability (horizontal scaling)
- 持久化(数据不丢失)
Persistence (no data loss)
🏗️ 标准架构模式
Standard Architecture Patterns
模式一:Key-Value Store 架构
Pattern 1: Key-Value Store Architecture
适用场景: Use Cases:
- URL Shortener
- Distributed Cache
- Session Storage
核心组件: Core Components:
Client
↓
Load Balancer
↓
Application Server
├─→ Key Generation Service
├─→ Cache Layer (Redis)
└─→ Database (MySQL/NoSQL)
↓
Storage Layer
└─→ Object Storage (S3) [for large values]
关键设计点: Key Design Points:
- Key 生成策略
Key Generation Strategy:
- Base62 encoding (URL shortener)
- UUID
- Snowflake ID
- Hash-based
- 存储选择
Storage Selection:
- 小 Value (< 1MB) → Database
- 大 Value (> 1MB) → Object Storage (S3)
- 热点数据 → Cache (Redis)
- 扩展策略
Scaling Strategy:
- 水平分片(按 key hash)
- 一致性哈希(Consistent Hashing)
- 读写分离
模式二:File Storage 架构
Pattern 2: File Storage Architecture
适用场景: Use Cases:
- File Storage System
- Image/Video Storage
- Dropbox-like System
核心组件: Core Components:
Client
↓
API Gateway
↓
File Service
├─→ Metadata DB (PostgreSQL)
├─→ File Storage (S3/HDFS)
└─→ CDN (for read optimization)
关键设计点: Key Design Points:
- 元数据 vs 文件内容分离
Metadata vs Content Separation:
- Metadata: 数据库存储(文件名、路径、大小)
- Content: 对象存储(S3、HDFS)
- 上传优化
Upload Optimization:
- 分块上传(Chunked upload)
- 断点续传(Resumable upload)
- 并行上传
- 下载优化
Download Optimization:
- CDN 缓存
- 预取(Prefetching)
- 压缩(Compression)
📋 核心设计决策点
Core Design Decision Points
1. 数据库类型选择
1. Database Type Selection
SQL Database (PostgreSQL, MySQL)
优点:
Advantages:
- ACID 事务支持
ACID transaction support
- 强一致性
Strong consistency
- 复杂查询支持
Complex query support
缺点:
Disadvantages:
- 扩展性有限
Limited scalability
- 写入性能受限
Limited write performance
适用场景:
Use Cases:
- 需要事务的场景
Scenarios requiring transactions
- 复杂查询需求
Complex query requirements
- 强一致性要求
Strong consistency requirements
NoSQL Database (Cassandra, DynamoDB)
优点:
Advantages:
- 高写入性能
High write performance
- 水平扩展
Horizontal scaling
- 最终一致性
Eventual consistency
缺点:
Disadvantages:
- 不支持复杂查询
No complex queries
- 一致性较弱
Weaker consistency
适用场景:
Use Cases:
- 高写入负载
High write load
- 简单查询模式
Simple query patterns
- 可以接受最终一致性
Can accept eventual consistency
Key-Value Store (Redis, Memcached)
优点:
Advantages:
- 极快速度(内存)
Extremely fast (in-memory)
- 简单 API
Simple API
缺点:
Disadvantages:
- 内存限制
Memory limitations
- 数据可能丢失(除非持久化)
Data may be lost (unless persisted)
适用场景:
Use Cases:
- 缓存
Caching
- 会话存储
Session storage
- 临时数据
Temporary data
2. 分片策略选择
2. Sharding Strategy Selection
Range-based Sharding
优点:
Advantages:
- 实现简单
Simple implementation
- 范围查询高效
Efficient range queries
缺点:
Disadvantages:
- 热点问题(某些分片负载高)
Hotspot issues (some shards have high load)
- 数据分布不均
Uneven data distribution
适用场景:
Use Cases:
- 有序数据
Ordered data
- 范围查询多
Many range queries
Hash-based Sharding
优点:
Advantages:
- 数据分布均匀
Even data distribution
- 避免热点
Avoids hotspots
缺点:
Disadvantages:
- 范围查询困难
Difficult range queries
- 重新分片复杂
Complex resharding
适用场景:
Use Cases:
- 随机访问模式
Random access patterns
- 不需要范围查询
No range queries needed
Directory-based Sharding
优点:
Advantages:
- 灵活的分片策略
Flexible sharding strategy
- 易于重新分片
Easy resharding
缺点:
Disadvantages:
- 需要额外的查找服务
Requires additional lookup service
- 单点故障风险
Single point of failure risk
适用场景:
Use Cases:
- 复杂分片需求
Complex sharding requirements
- 需要动态调整
Need dynamic adjustment
3. 一致性策略选择
3. Consistency Strategy Selection
强一致性(Strong Consistency)
实现方式:
Implementation:
- 同步复制
Synchronous replication
- 两阶段提交(2PC)
Two-phase commit
优点:
Advantages:
- 数据始终一致
Data always consistent
- 读取最新数据
Read latest data
缺点:
Disadvantages:
- 延迟高
High latency
- 可用性可能受影响
Availability may be affected
适用场景:
Use Cases:
- 金融交易
Financial transactions
- 关键业务数据
Critical business data
最终一致性(Eventual Consistency)
实现方式:
Implementation:
- 异步复制
Asynchronous replication
- 向量时钟(Vector Clocks)
Vector clocks
优点:
Advantages:
- 低延迟
Low latency
- 高可用性
High availability
缺点:
Disadvantages:
- 可能读到旧数据
May read stale data
- 冲突解决复杂
Complex conflict resolution
适用场景:
Use Cases:
- 社交网络
Social networks
- 内容分发
Content distribution
4. 缓存策略选择
4. Caching Strategy Selection
Cache-Aside (Lazy Loading)
流程:
Flow:
1. 查询缓存
Check cache
2. 缓存未命中 → 查询数据库
Cache miss → Query database
3. 写入缓存
Write to cache
优点:
Advantages:
- 实现简单
Simple implementation
- 缓存故障不影响系统
Cache failure doesn't affect system
缺点:
Disadvantages:
- 缓存未命中延迟高
High latency on cache miss
- 可能缓存不一致
Possible cache inconsistency
Write-Through
流程:
Flow:
1. 写入数据库
Write to database
2. 同时写入缓存
Write to cache simultaneously
优点:
Advantages:
- 缓存始终最新
Cache always up-to-date
- 读取性能好
Good read performance
缺点:
Disadvantages:
- 写入延迟高
High write latency
- 写入不常读的数据浪费
Wastes cache on rarely-read writes
Write-Behind (Write-Back)
流程:
Flow:
1. 写入缓存
Write to cache
2. 异步写入数据库
Asynchronously write to database
优点:
Advantages:
- 写入延迟低
Low write latency
- 批量写入优化
Batch write optimization
缺点:
Disadvantages:
- 数据可能丢失(缓存故障)
Data may be lost (cache failure)
- 实现复杂
Complex implementation
🎯 标准解题流程
Standard Problem-Solving Process
Step 1: 需求澄清(Requirements Clarification)
Step 1: Requirements Clarification
必须明确的问题: Questions to Clarify:
- 数据特征
Data Characteristics:
- 数据大小(小 Value vs 大 File)
- 数据访问模式(随机 vs 顺序)
- 数据生命周期(临时 vs 永久)
- 读写特征
Read/Write Characteristics:
- 读写比例(读多写少?写多读少?)
- QPS 要求
- 延迟要求
- 一致性要求
Consistency Requirements:
- 强一致性 vs 最终一致性
- 是否可以接受旧数据
- 事务需求
- 扩展性要求
Scalability Requirements:
- 数据规模(当前、未来)
- 是否需要水平扩展
- 增长预期
Step 2: 估算规模(Scale Estimation)
Step 2: Scale Estimation
关键指标: Key Metrics:
写入负载:
Write Load:
- Writes/second
- 平均数据大小
Average data size
- 峰值 vs 平均值
Peak vs average
读取负载:
Read Load:
- Reads/second
- 读取模式(随机/顺序)
Read patterns (random/sequential)
存储需求:
Storage Requirements:
- 总数据量
Total data volume
- 数据增长率
Data growth rate
- 保留时间
Retention period
示例计算(URL Shortener): Example Calculation (URL Shortener):
Writes = 100M URLs/day = 1.2k/second
Reads = 100:1 ratio = 120k/second
Storage (5 years):
100M/day * 365 * 5 = 182.5B URLs
182.5B * 500 bytes = 91.25 TB
Step 3: 基础设计(Basic Design)
Step 3: Basic Design
最小可行方案: Minimum Viable Solution:
1. Key Generation
- 生成唯一 key
Generate unique key
2. Storage
- 存储 key-value 映射
Store key-value mapping
3. Retrieval
- 根据 key 查询 value
Query value by key
承认问题: Acknowledge Issues:
- “这个方案在规模上会有问题:单机数据库无法支撑”
- “This solution will have scale issues: single DB can’t handle it”
Step 4: 优化设计(Optimized Design)
Step 4: Optimized Design
核心优化方向: Core Optimization Directions:
- 存储优化
Storage Optimization:
- 分片(Sharding)
- 读写分离(Read replicas)
- 缓存层(Cache layer)
- 性能优化
Performance Optimization:
- CDN(内容分发)
- 预取(Prefetching)
- 批量操作(Batching)
- 可靠性优化
Reliability Optimization:
- 复制(Replication)
- 备份(Backup)
- 故障恢复(Failover)
📚 典型题目分类
Problem Categories
Key-Value Store 问题
- Design URL Shortener
- 核心:短 URL 生成和重定向
- 关键:Key 生成、高读取负载
- 挑战:扩展性、缓存策略
- Design Distributed Cache
- 核心:分布式键值存储
- 关键:一致性哈希、缓存策略
- 挑战:缓存失效、一致性
File Storage 问题
- Design File Storage System
- 核心:文件上传、存储、下载
- 关键:元数据分离、CDN
- 挑战:大文件处理、扩展性
🎯 面试策略总结
Interview Strategy Summary
开场策略
Opening Strategy
1. 识别题目类型
Identify problem type
"这是一个存储系统设计问题"
"This is a storage system design problem"
2. 明确核心需求
Clarify core requirements
"需要存储和检索 [key-value/files]"
"Need to store and retrieve [key-value/files]"
3. 询问关键参数
Ask key parameters
- 读写比例
Read/write ratio
- 数据大小
Data size
- 一致性要求
Consistency requirements
📝 快速检查清单
Quick Checklist
需求澄清 Checklist
- 数据特征(大小、类型、生命周期)
- 读写特征(比例、QPS、延迟)
- 一致性要求(强一致 vs 最终一致)
- 扩展性要求(规模、增长)
设计 Checklist
- 数据库选择(SQL/NoSQL/Key-Value)
- 分片策略(Range/Hash/Directory)
- 缓存策略(Cache-Aside/Write-Through)
- 复制策略(同步/异步)
- 扩展性(水平扩展、负载均衡)
记住:这类题目的核心是选择合适的存储方案、分片策略、一致性模型。重点是读写性能、扩展性、一致性权衡! Remember: The core of these problems is choosing the right storage solution, sharding strategy, and consistency model. Focus on read/write performance, scalability, and consistency trade-offs!