Add data/schemas.md with dataset schema descriptions

blackboxprogramming
2025-08-08 01:18:53 -07:00
committed by GitHub
parent 69eb0ae00c
commit fa4f69097f


@@ -0,0 +1,12 @@
# Dataset Schemas

## Pretraining Dataset
- Input text: Raw text for language modeling.

## SFT Dataset
- Instruction: User instruction text.
- Response: Assistant response text.

## RLHF Pairs
- Chosen: Preferred assistant response.
- Rejected: Less preferred assistant response.
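
The three schemas above could be expressed as typed records, which makes the expected fields checkable in code. This is only a minimal sketch: the key names (`text`, `instruction`, `response`, `chosen`, `rejected`), the class names, and the `validate` helper are assumptions for illustration and are not defined anywhere in this commit.

```python
from typing import TypedDict, get_type_hints


class PretrainingRecord(TypedDict):
    text: str  # hypothetical key for "Input text"


class SFTRecord(TypedDict):
    instruction: str  # hypothetical key for "Instruction"
    response: str  # hypothetical key for "Response"


class RLHFPair(TypedDict):
    chosen: str  # hypothetical key for "Chosen"
    rejected: str  # hypothetical key for "Rejected"


def validate(record: dict, schema: type) -> bool:
    """Return True if record has exactly the schema's keys and every value is a string."""
    expected = get_type_hints(schema)
    return set(record) == set(expected) and all(
        isinstance(record[key], str) for key in expected
    )
```

For example, `validate({"chosen": "a", "rejected": "b"}, RLHFPair)` accepts a well-formed pair, while a record with a missing or non-string field is rejected.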