# Dataset Schemas

## Pretraining Dataset

- Input text: Raw text for language modeling.

## SFT Dataset

- Instruction: User instruction text.
- Response: Assistant response text.

## RLHF Pairs

- Chosen: Preferred assistant response.
- Rejected: Less preferred assistant response.
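The three schemas above can be sketched as typed records with a small validator, so that malformed rows fail early rather than partway through a training run. This is a minimal sketch. The snake_case keys (`text`, `instruction`, `response`, `chosen`, `rejected`) and the helpers `parse_pretraining`, `parse_sft`, and `parse_preference_pair` are hypothetical names chosen for illustration, not a documented on-disk format.

```python
from dataclasses import dataclass
from typing import Any, Mapping


# Hypothetical record types mirroring the three schemas.
# Field/key names are illustrative assumptions, not a fixed format.
@dataclass(frozen=True)
class PretrainingRecord:
    text: str  # raw text for language modeling


@dataclass(frozen=True)
class SFTRecord:
    instruction: str  # user instruction text
    response: str  # assistant response text


@dataclass(frozen=True)
class PreferencePair:
    chosen: str  # preferred assistant response
    rejected: str  # less preferred assistant response


def _require_str(row: Mapping[str, Any], key: str) -> str:
    """Return row[key] if it is a string; raise ValueError if missing or non-string."""
    value = row.get(key)
    if not isinstance(value, str):
        raise ValueError(f"field {key!r} must be a string, got {type(value).__name__}")
    return value


def parse_pretraining(row: Mapping[str, Any]) -> PretrainingRecord:
    """Validate a raw row against the pretraining schema."""
    return PretrainingRecord(text=_require_str(row, "text"))


def parse_sft(row: Mapping[str, Any]) -> SFTRecord:
    """Validate a raw row against the SFT schema."""
    return SFTRecord(
        instruction=_require_str(row, "instruction"),
        response=_require_str(row, "response"),
    )


def parse_preference_pair(row: Mapping[str, Any]) -> PreferencePair:
    """Validate a raw row against the RLHF preference-pair schema."""
    return PreferencePair(
        chosen=_require_str(row, "chosen"),
        rejected=_require_str(row, "rejected"),
    )
```

For example, `parse_sft({"instruction": "Summarize this.", "response": "Here is a summary."})` returns an `SFTRecord`, while a row missing `response` raises `ValueError`. Frozen dataclasses keep validated records immutable; a schema library could replace the hand-written checks if richer validation is needed.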