Mistral AI provides a content moderation service that detects harmful text across multiple policy dimensions, helping you secure your LLM applications and keep AI interactions safe.
To get started with Mistral, visit their documentation.
3. Add Guardrail ID to a Config and Make Your Request
When you save a Guardrail, you'll get an associated Guardrail ID. Add this ID to the `input_guardrails` or `output_guardrails` params in your Portkey Config.
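For example, a Config that applies the guardrail to both the request input and the model output might look like this (the guardrail IDs below are placeholders for the IDs you get from the UI):

```json
{
  "input_guardrails": ["guardrails-id-xxx"],
  "output_guardrails": ["guardrails-id-yyy"]
}
```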
Create these Configs in the Portkey UI, save them, and get the associated Config ID to attach to your requests. More here.
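Here is a minimal sketch of attaching that Config to a request with the Portkey Python SDK; the Config ID shown is a placeholder for the one you get from the UI:

```python
from portkey_ai import Portkey

# The Config ID ties this request to the Config that contains your guardrail IDs
portkey = Portkey(
    api_key="PORTKEY_API_KEY",       # your Portkey API key
    config="pc-mistral-guardrails",  # placeholder Config ID from the Portkey UI
)

# Guardrails in the Config run automatically on this request's input/output
response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```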
Mistral's moderation service detects content across nine key policy categories (a sketch of a direct moderation call follows the list):
- **Sexual**: Content of a sexual nature or adult content
- **Hate and Discrimination**: Content expressing hatred or promoting discrimination
- **Violence and Threats**: Content depicting violence or threatening language
- **Dangerous and Criminal Content**: Instructions for illegal activities or harmful actions
- **Self-harm**: Content related to self-injury, suicide, or eating disorders
- **Health**: Unqualified medical advice or health misinformation
- **Financial**: Unqualified financial advice or dubious investment schemes
- **Law**: Unqualified legal advice or recommendations
- **PII**: Personally identifiable information, including email addresses, phone numbers, etc.
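As a rough illustration of how these categories surface, here is a sketch of calling Mistral's moderation endpoint directly with the `mistralai` Python SDK (the category key names shown in the comments follow Mistral's published schema):

```python
from mistralai import Mistral

client = Mistral(api_key="MISTRAL_API_KEY")

# Score a piece of text against the moderation policy categories
response = client.classifiers.moderate(
    model="mistral-moderation-latest",
    inputs=["Example text to check for policy violations."],
)

# Each result carries per-category flags and confidence scores,
# keyed by names like "sexual", "hate_and_discrimination",
# "violence_and_threats", "selfharm", and "pii"
result = response.results[0]
print(result.categories)
print(result.category_scores)
```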
Mistral’s moderation service is natively multilingual, with support for Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.