Arogya • 3.19K Points

Q. Which technique aligns LLaMA responses with human preferences?

  • (A) RLHF
  • (B) Sorting
  • (C) Indexing
  • (D) Compilation
  • Correct Answer: Option (A)
  • Filed under category Llama

Explanation by: Arogya
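Reinforcement Learning from Human Feedback (RLHF) fine-tunes the model against a reward signal learned from human preference rankings of its outputs, which improves the helpfulness and safety of its responses. Sorting, indexing, and compilation are general computing operations, not alignment techniques.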
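
One piece of RLHF that is easy to illustrate is the pairwise preference loss commonly used to train the reward model. The sketch below is a minimal, assumed example and not LLaMA's actual training code: the function name `preference_loss` and the scalar reward inputs are hypothetical, and a real reward model would produce these scores from a neural network over full responses.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the human-preferred response receives the higher
    reward and large when the ranking is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward scores for two responses to the same prompt.
agrees_with_humans = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_humans = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_humans < disagrees_with_humans)
```

Minimizing this loss over many human-labeled comparisons pushes the reward model to score preferred responses higher. The language model is then fine-tuned with reinforcement learning to maximize that learned reward.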
