Technical Glossary

A reference of terms used throughout the Mira platform documentation and in the AI/ML field. Terms are listed in alphabetical order.

Use your browser's search (Ctrl+F / Cmd+F) to quickly find a specific term.

API Key

A secret token used to authenticate requests to the Mira API. API keys should be stored securely in environment variables and never shared publicly.

Base URL

The root URL for all API requests. For Mira, the base URL is https://api.vmira.ai/v1. Because the API is OpenAI-compatible, OpenAI SDKs can be pointed at this URL and used without code changes.

Chat Completion

The primary API endpoint for generating model responses. Accepts a list of messages and returns the model's reply. Endpoint: POST /v1/chat/completions.
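As a sketch, the endpoint can be called with nothing but the standard library. The payload shape follows the OpenAI-compatible format described in this glossary; the model name and placeholder key are illustrative assumptions.

```python
import json
import urllib.request

BASE_URL = "https://api.vmira.ai/v1"

# Minimal chat completion payload: a model name and a messages array.
payload = {
    "model": "mira",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    BASE_URL + "/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here.
```

In practice most users send this same payload through an OpenAI-compatible SDK rather than raw HTTP.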

Context Window

The maximum number of tokens (input + output) a model can process in a single request. Mira models range from 32K to 128K tokens.

Embeddings

Numerical vector representations of text that capture semantic meaning. Used for semantic search, clustering, and similarity comparisons. Endpoint: POST /v1/embeddings.
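A minimal sketch of the similarity comparison step, using tiny made-up vectors; real embeddings returned by POST /v1/embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative 3-dimensional "embeddings".
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.1]
v_car = [0.0, 0.1, 0.9]

# Semantically close texts yield higher cosine similarity.
```

This is the comparison that semantic search and clustering are built on.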

Extended Thinking

A capability where the model performs step-by-step reasoning before providing a final answer. Available in the mira-pro and mira-max models for complex analytical tasks.

Few-Shot Learning

A prompting technique where you provide a few examples of the desired input-output format in the prompt to guide the model's behavior.
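For example, a few-shot prompt can be expressed directly as a messages array; the task and example pairs below are illustrative.

```python
few_shot_messages = [
    {"role": "system", "content": "Classify each review as positive or negative."},
    # Example 1: user input paired with the desired assistant output.
    {"role": "user", "content": "Great battery life, would buy again."},
    {"role": "assistant", "content": "positive"},
    # Example 2.
    {"role": "user", "content": "Stopped working after two days."},
    {"role": "assistant", "content": "negative"},
    # The real query follows the examples and inherits their format.
    {"role": "user", "content": "Exceeded all my expectations."},
]
```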

Fine-Tuning

The process of further training a base model on a specific dataset to improve its performance for a particular task or domain. Coming soon to the Mira platform.

Function Calling

A feature that allows the model to generate structured JSON arguments for predefined functions, enabling integration with external tools and APIs. Also called Tool Use.

Hallucination

When a model generates information that is factually incorrect or fabricated but presented confidently. Mitigated through extended thinking and grounding techniques.

JSON Mode

A response format option that constrains the model to output only valid JSON. Enabled by setting response_format to { type: "json_object" } in the API request.
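A sketch of a JSON-mode request and of parsing the reply; the sample reply string is fabricated for illustration.

```python
import json

request_body = {
    "model": "mira",
    "messages": [{"role": "user", "content": "List two primary colors as JSON."}],
    # Constrains the model to emit valid JSON only.
    "response_format": {"type": "json_object"},
}

# A JSON-mode reply is guaranteed to be valid JSON, so it can be
# parsed directly with no cleanup step.
sample_reply = '{"colors": ["red", "blue"]}'
data = json.loads(sample_reply)
```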

Max Tokens

The maximum number of tokens the model will generate in a response. Setting this parameter helps control response length and API costs.

Messages API

The chat-based API format where conversations are represented as an array of message objects, each with a role (system, user, assistant) and content.

Model

A trained AI system that generates responses. Mira offers multiple models (mira, mira-pro, mira-max) optimized for different use cases.

Prompt

The input text or instruction sent to the model. A well-crafted prompt is essential for getting accurate and relevant responses.

Prompt Engineering

The practice of designing and optimizing prompts to elicit better responses from AI models. Techniques include few-shot examples, chain-of-thought, and role-playing.

RAG (Retrieval-Augmented Generation)

A technique that combines information retrieval with text generation. The model is given relevant context retrieved from a knowledge base before generating a response.
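A minimal sketch of the RAG loop. Real systems retrieve by embedding similarity from a vector database; here naive keyword overlap stands in for retrieval, and the documents are made up.

```python
import re

documents = [
    "Mira rate limits vary by subscription plan.",
    "Embeddings are vector representations of text.",
    "Streaming sends tokens incrementally over SSE.",
]

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, docs):
    """Return the document sharing the most words with the query (toy retriever)."""
    q = tokenize(query)
    return max(docs, key=lambda d: len(q & tokenize(d)))

query = "What are the rate limits?"
context = retrieve(query, documents)

# The retrieved context is prepended to the prompt before generation.
prompt = f"Context: {context}\n\nQuestion: {query}"
```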

Rate Limit

The maximum number of API requests allowed within a time period. Exceeding the limit returns a 429 status code. Limits vary by subscription plan.

Response Format

A parameter that controls the output structure of the model's response. Options include plain text and JSON mode for structured outputs.

Sampling Temperature

A parameter (0.0 to 2.0) controlling randomness in the model's output. Lower values (e.g., 0.1) produce more focused, near-deterministic responses; higher values (e.g., 1.0) produce more varied, creative ones.
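Conceptually, temperature rescales the model's logits before the softmax; lower values sharpen the distribution toward the top token. A sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
warm = softmax_with_temperature(logits, 1.0)  # more spread out
```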

Semantic Search

Search that understands the meaning of queries rather than matching exact keywords. Powered by embeddings and vector similarity comparisons.

Server-Sent Events (SSE)

A web standard for streaming data from server to client over HTTP. Used by the Mira API for streaming responses token by token in real time.

Stop Sequence

A string or set of strings that, when generated, causes the model to stop producing further tokens. Useful for controlling where a response ends.
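The API applies stop sequences server-side; the effect can be sketched on a pre-generated string, with arbitrary example stop strings.

```python
def apply_stop_sequences(text, stop):
    """Truncate text at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "Answer: 42\nQuestion: what next?"
# Stop before the model starts inventing a follow-up question.
final = apply_stop_sequences(raw, ["\nQuestion:", "END"])
```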

Streaming

A mode where the API sends response tokens incrementally as they are generated, rather than waiting for the complete response. Enabled by setting stream: true.
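A sketch of assembling a streamed reply from SSE lines. The chunk format mirrors the OpenAI-style delta events this glossary describes; the sample stream is fabricated.

```python
import json

sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

parts = []
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":  # sentinel marking the end of the stream
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        parts.append(delta["content"])

reply = "".join(parts)
```

In a real client the lines arrive incrementally over the HTTP connection rather than from a list.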

System Prompt

A special message with role "system" that sets the behavior, personality, and constraints for the AI model throughout the conversation.

Token

The basic unit of text processing for language models. A token is roughly 3-4 characters in English or 1-2 characters in Russian. API pricing is based on token count.

Tool Use

The ability for a model to call external tools (functions) during a conversation. The model generates tool call arguments, your code executes the tool, and the result is fed back to the model.
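A sketch of that round trip. The model's tool call is represented as a plain dict; the get_weather function, its schema, and its return value are hypothetical.

```python
import json

def get_weather(city):
    """Hypothetical local tool the model can call."""
    return {"city": city, "temp_c": 21}  # stubbed result for illustration

TOOLS = {"get_weather": get_weather}

# 1. The model responds with a tool call instead of text.
tool_call = {"name": "get_weather", "arguments": '{"city": "Oslo"}'}

# 2. Your code executes the named tool with the generated arguments.
fn = TOOLS[tool_call["name"]]
result = fn(**json.loads(tool_call["arguments"]))

# 3. The result goes back to the model as a "tool" message,
#    and the model uses it to compose its final answer.
tool_message = {"role": "tool", "content": json.dumps(result)}
```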

Top-P (Nucleus Sampling)

A sampling parameter that restricts token selection to the smallest set of tokens whose cumulative probability reaches the threshold p. An alternative to temperature for controlling randomness.
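A sketch of selecting that nucleus: candidates are sorted by probability and kept until their cumulative sum reaches p. The distribution below is made up.

```python
def nucleus(probs, p):
    """Return the smallest set of tokens whose cumulative probability >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        total += prob
        if total >= p:
            break
    return kept

# Illustrative next-token distribution.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

# The model then samples only from the kept tokens (after renormalizing).
```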

Vector Database

A database optimized for storing and querying high-dimensional vectors (embeddings). Used in RAG systems for efficient semantic search over large document collections.

Zero-Shot Learning

The ability of a model to perform a task without any examples in the prompt. The model relies solely on its pre-trained knowledge and the task description.

Term not listed?

If you encounter an unfamiliar term in our documentation, email us at support@vmira.ai and we will add a definition.