Arogya • 3.19K Points
Extraordinary

Q. Which technique aligns LLaMA responses with human preferences?

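Reinforcement Learning from Human Feedback (RLHF). Human annotators compare candidate responses, a reward model is trained on those preferences, and the language model is then fine-tuned to maximize that reward, which improves the helpfulness and safety of its responses.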
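
As a rough illustration of the preference-learning step in RLHF, the sketch below computes the pairwise loss commonly used to train a reward model: it is small when the human-preferred response scores higher than the rejected one. The function name and the scalar inputs are hypothetical, chosen for illustration only. This is not LLaMA's actual training code, and a real setup would compute these scores with a neural network over batches.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper: the two arguments stand in for the scalar scores
    a reward model would assign to the preferred and the rejected response.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Preferred response scored higher: small loss.
    print(pairwise_preference_loss(2.0, 0.0))
    # Preferred response scored lower: larger loss.
    print(pairwise_preference_loss(0.0, 2.0))
```

When both responses receive the same score the loss is log 2 (about 0.693), and it shrinks toward zero as the preferred response's score pulls ahead. The trained reward model is then used as the optimization target when fine-tuning the language model.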
