Databricks Databricks-Generative-AI-Engineer-Associate Databricks Certified Generative AI Engineer Associate Exam Practice Test

Total 73 questions

Databricks Certified Generative AI Engineer Associate Questions and Answers

Question 1

A Generative AI Engineer is creating an LLM system that will retrieve news articles from the year 1918 that are related to a user's query and summarize them. The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable.

Which change could the Generative AI Engineer perform to mitigate this issue?

Options:

Split the LLM output by newline characters to truncate away the summarization explanation.

Tune the chunk size of news articles or experiment with different embedding models.

Revisit their document ingestion logic, ensuring that the news articles are being ingested properly.

Provide few shot examples of desired output format to the system and/or user prompt.

Question 2

A small and cost-conscious startup in the cancer research field wants to build a RAG application using Foundation Model APIs.

Which strategy would allow the startup to build a good-quality RAG application while being cost-conscious and able to cater to customer needs?

Options:

Limit the number of relevant documents available for the RAG application to retrieve from

Pick a smaller LLM that is domain-specific

Limit the number of queries a customer can send per day

Use the largest LLM possible because that gives the best performance for any general queries

Question 3

A Generative AI Engineer wants to build an LLM-based solution to help a restaurant improve its online customer experience with bookings by automatically handling common customer inquiries. The goal of the solution is to minimize escalations to human intervention and phone calls while maintaining a personalized interaction. To design the solution, the Generative AI Engineer needs to define the input data to the LLM and the task it should perform.

Which input/output pair will support their goal?

Options:

Input: Online chat logs; Output: Group the chat logs by users, followed by summarizing each user’s interactions

Input: Online chat logs; Output: Buttons that represent choices for booking details

Input: Customer reviews; Output: Classify review sentiment

Input: Online chat logs; Output: Cancellation options

Question 4

What is an effective method to preprocess prompts using custom code before sending them to an LLM?

Options:

Directly modify the LLM’s internal architecture to include preprocessing steps

It is better not to introduce custom code to preprocess prompts as the LLM has not been trained with examples of the preprocessed prompts

Rather than preprocessing prompts, it’s more effective to postprocess the LLM outputs to align the outputs to desired outcomes

Write an MLflow PyFunc model that has a separate function to process the prompts

Question 5

A company has a typical RAG-enabled, customer-facing chatbot on its website.

[Diagram: Question 5]

Select the correct sequence of components a user's questions will go through before the final output is returned. Use the diagram above for reference.

Options:

1.embedding model, 2.vector search, 3.context-augmented prompt, 4.response-generating LLM

1.context-augmented prompt, 2.vector search, 3.embedding model, 4.response-generating LLM

1.response-generating LLM, 2.vector search, 3.context-augmented prompt, 4.embedding model

1.response-generating LLM, 2.context-augmented prompt, 3.vector search, 4.embedding model

Question 6

A Generative AI Engineer is creating an agent-based LLM system for their favorite monster truck team. The system can answer text-based questions about the monster truck team, look up event dates via an API call, or query tables on the team’s latest standings.

How could the Generative AI Engineer best design these capabilities into their system?

Options:

Ingest PDF documents about the monster truck team into a vector store and query it in a RAG architecture.

Write a system prompt for the agent listing available tools and bundle it into an agent system that runs a number of calls to solve a query.

Instruct the LLM to respond with “RAG”, “API”, or “TABLE” depending on the query, then use text parsing and conditional statements to resolve the query.

Build a system prompt with all possible event dates and table information. Use a RAG architecture to look up generic text questions and otherwise leverage the information in the system prompt.

Answer:

B

Explanation:

In this scenario, the Generative AI Engineer needs to design a system that can handle different types of queries about the monster truck team. The queries may involve text-based information, API lookups for event dates, or table queries for standings. The best solution is to implement a tool-based agent system.

Here’s how option B works, and why it’s the most appropriate answer:

System Design Using Agent-Based Model: In modern agent-based LLM systems, you can design a system where the LLM (Large Language Model) acts as a central orchestrator. The model can "decide" which tools to use based on the query. These tools can include API calls, table lookups, or natural language searches. The system should contain a system prompt that informs the LLM about the available tools.

System Prompt Listing Tools: With a well-crafted system prompt, the LLM knows which tools are at its disposal. For instance, one tool may query an external API for event dates, another might look up standings in a database, and a third may involve searching a vector database for general text-based information. The agent will be responsible for calling the appropriate tool depending on the query.

Agent Orchestration of Calls: The agent system is designed to execute a series of steps based on the incoming query. If a user asks for the next event date, the system will recognize this as a task that requires an API call. If the user asks about standings, the agent might query the appropriate table in the database. For text-based questions, it may call a search function over ingested data. The agent orchestrates this entire process, ensuring the LLM makes calls to the right resources dynamically.

Generative AI Tools and Context: This is a standard architecture for integrating multiple functionalities into a system where each query requires different actions. The core design in option B is efficient because it keeps the system modular and dynamic by leveraging tools rather than overloading the LLM with static information in a system prompt (like option D).

Why Other Options Are Less Suitable:

A (RAG Architecture): While relevant, simply ingesting PDFs into a vector store only helps with text-based retrieval. It wouldn’t help with API lookups or table queries.

C (Conditional Logic with RAG/API/TABLE): Although this approach works, it relies heavily on manual text parsing and might introduce complexity when scaling the system.

D (System Prompt with Event Dates and Standings): Hardcoding dates and table information into a system prompt isn’t scalable. As the standings or events change, the system would need constant updating, making it inefficient.

By bundling multiple tools into a single agent-based system (as in option B), the Generative AI Engineer can best handle the diverse requirements of this system.
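The orchestration loop described above can be sketched in a few lines of Python. This is a minimal illustration only: the three tool functions, the tool names, and `stub_policy` (a keyword-based stand-in for the LLM's tool choice) are all hypothetical and are not part of any Databricks or agent-framework API.

```python
from typing import Callable, Dict

# Hypothetical tools. A real system would call an events API,
# a SQL table, and a vector store respectively.
def lookup_event_date(query: str) -> str:
    return "event-date lookup for: " + query

def query_standings(query: str) -> str:
    return "standings table query for: " + query

def search_documents(query: str) -> str:
    return "document search for: " + query

TOOLS: Dict[str, Callable[[str], str]] = {
    "event_api": lookup_event_date,
    "standings_table": query_standings,
    "doc_search": search_documents,
}

def build_system_prompt(tools: Dict[str, Callable[[str], str]]) -> str:
    # The system prompt lists the tools the orchestrator may call.
    names = ", ".join(sorted(tools))
    return f"You can call these tools: {names}. Reply FINISH when done."

def stub_policy(system_prompt: str, query: str, used: Dict[str, str]) -> str:
    # Stand-in for the LLM's decision (assumption): keyword rules
    # so that the sketch runs offline without a model endpoint.
    q = query.lower()
    if ("date" in q or "when" in q) and "event_api" not in used:
        return "event_api"
    if "standings" in q and "standings_table" not in used:
        return "standings_table"
    if not used:
        return "doc_search"
    return "FINISH"

def run_agent(query: str, max_steps: int = 5) -> Dict[str, str]:
    # Loop: ask the policy for the next tool, run it, record the
    # observation, and stop when the policy says FINISH.
    system_prompt = build_system_prompt(TOOLS)
    observations: Dict[str, str] = {}
    for _ in range(max_steps):
        action = stub_policy(system_prompt, query, observations)
        if action == "FINISH":
            break
        observations[action] = TOOLS[action](query)
    return observations

result = run_agent("When is the next event and what are the standings?")
```

Unlike the single-label routing in option C, the loop can issue several tool calls for one query, which is the behavior option B describes.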

Question 7

A Generative AI Engineer is deciding between using LSH (Locality Sensitive Hashing) and HNSW (Hierarchical Navigable Small World) for indexing their vector database. Their top priority is semantic accuracy.

Which approach should the Generative AI Engineer use to evaluate these two techniques?

Options:

Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs

Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs

Compare the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores of returned results for a representative sample of test inputs

Compare the Levenshtein distances of returned results against a representative sample of test inputs

Answer:

A

Explanation:

The task is to choose between LSH and HNSW for a vector database index, prioritizing semantic accuracy. The evaluation must assess how well each method retrieves semantically relevant results. Let’s evaluate the options.

Option A: Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs

Cosine similarity measures semantic closeness between vectors, directly assessing retrieval accuracy in a vector database. Comparing returned results’ embeddings to test inputs’ embeddings evaluates how well LSH or HNSW preserves semantic relationships, aligning with the priority.

Databricks Reference: "Cosine similarity is a standard metric for evaluating vector search accuracy" ("Databricks Vector Search Documentation," 2023).

Option B: Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs

BLEU evaluates text generation (e.g., translations), not vector retrieval accuracy. It’s irrelevant for indexing performance.

Databricks Reference: "BLEU applies to generative tasks, not retrieval" ("Generative AI Cookbook").

Option C: Compare the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores of returned results for a representative sample of test inputs

ROUGE is for summarization evaluation, not vector search. It doesn’t measure semantic accuracy in retrieval.

Databricks Reference: "ROUGE is unsuited for vector database evaluation" ("Building LLM Applications with Databricks").

Option D: Compare the Levenshtein distances of returned results against a representative sample of test inputs

Levenshtein distance measures string edit distance, not semantic similarity in embeddings. It’s inappropriate for vector-based retrieval.

Databricks Reference: No specific support for Levenshtein in vector search contexts.

Conclusion: Option A (cosine similarity) is the correct approach, directly evaluating semantic accuracy in vector retrieval, as recommended by Databricks for Vector Search assessments.
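As a rough illustration of the comparison in option A, the sketch below scores two sets of returned results by their mean cosine similarity to a test query embedding. The two-dimensional vectors and the names `index_a_results` and `index_b_results` are toy assumptions standing in for results returned by two differently configured indexes.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_similarity(query_vec, result_vecs):
    # Average similarity between the query and each returned result.
    sims = [cosine_similarity(query_vec, r) for r in result_vecs]
    return sum(sims) / len(sims)

# Toy embeddings (assumption): one test query and the results
# returned by two hypothetical index configurations.
query = [1.0, 0.0]
index_a_results = [[1.0, 0.0], [0.8, 0.6]]
index_b_results = [[0.0, 1.0], [0.6, 0.8]]

score_a = mean_similarity(query, index_a_results)
score_b = mean_similarity(query, index_b_results)
```

In practice the same score would be averaged over a representative sample of test inputs, and the index with the higher average would be preferred.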

Question 8

A Generative AI Engineer is responsible for developing a chatbot to enable their company’s internal HelpDesk Call Center team to more quickly find related tickets and provide resolution. While creating the GenAI application work breakdown tasks for this project, they realize they need to start planning which data sources (either Unity Catalog volume or Delta table) they could choose for this application. They have collected several candidate data sources for consideration:

call_rep_history: a Delta table with primary keys representative_id, call_id. This table is maintained to calculate representatives’ call resolution from fields call_duration and call_start_time.

transcript Volume: a Unity Catalog Volume of all recordings as *.wav files, along with text transcripts as *.txt files.

call_cust_history: a Delta table with primary keys customer_id, call_id. This table is maintained to calculate how much internal customers use the HelpDesk to make sure that the charge back model is consistent with actual service use.

call_detail: a Delta table that includes a snapshot of all call details updated hourly. It includes root_cause and resolution fields, but those fields may be empty for calls that are still active.

maintenance_schedule: a Delta table that includes a listing of both HelpDesk application outages as well as planned upcoming maintenance downtimes.

They need sources that could add context to best identify ticket root cause and resolution.

Which TWO sources do that? (Choose two.)

Options:

call_cust_history

maintenance_schedule

call_rep_history

call_detail

transcript Volume

Answer:

D, E

Explanation:

In the context of developing a chatbot for a company's internal HelpDesk Call Center, the key is to select data sources that provide the most contextual and detailed information about the issues being addressed. This includes identifying the root cause and suggesting resolutions. The two most appropriate sources from the list are:

Call Detail (Option D):

Contents: This Delta table includes a snapshot of all call details updated hourly, featuring essential fields like root_cause and resolution.

Relevance: The inclusion of root_cause and resolution fields makes this source particularly valuable, as it directly contains the information necessary to understand and resolve the issues discussed in the calls. Even if some records are incomplete, the data provided is crucial for a chatbot aimed at speeding up resolution identification.

Transcript Volume (Option E):

Contents: This Unity Catalog Volume contains recordings in .wav format and text transcripts in .txt files.

Relevance: The text transcripts of call recordings can provide in-depth context that the chatbot can analyze to understand the nuances of each issue. The chatbot can use natural language processing techniques to extract themes, identify problems, and suggest resolutions based on previous similar interactions documented in the transcripts.

Why Other Options Are Less Suitable:

A (Call Cust History): While it provides insights into customer interactions with the HelpDesk, it focuses more on the usage metrics rather than the content of the calls or the issues discussed.

B (Maintenance Schedule): This data is useful for understanding when services may not be available but does not contribute directly to resolving user issues or identifying root causes.

C (Call Rep History): Though it offers data on call durations and start times, which could help in assessing performance, it lacks direct information on the issues being resolved.

Therefore, Call Detail and Transcript Volume are the most relevant data sources for a chatbot designed to assist with identifying and resolving issues in a HelpDesk Call Center setting, as they provide direct and contextual information related to customer issues.

Question 9

A Generative AI Engineer is using LangGraph to define multiple tools in a single agentic application. They want to enable the main orchestrator LLM to decide on its own which tools are most appropriate to call for a given prompt. To do this, they must determine the general flow of the code. Which sequence will do this?

Options:

1. Define or import the tools 2. Add tools and LLM to the agent 3. Create the ReAct agent

1. Define or import the tools 2. Define the agent 3. Initialize the agent with ReAct, the LLM, and the tools

1. Define the tools 2. Load each tool into a separate agent 3. Instruct the LLM to use ReAct to call the appropriate agent

1. Define the tools inside the agents 2. Load the agents into the LLM 3. Instruct the LLM to use COT reasoning to determine the appropriate agent

Question 10

What is the most suitable library for building a multi-step LLM-based workflow?

Options:

Pandas

TensorFlow

PySpark

LangChain

Question 11

A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author’s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.

Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)

Options:

Change embedding models and compare performance.

Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.

Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.

Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.

Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.

Answer:

C, E

Explanation:

To optimize a chunking strategy for a Retrieval-Augmented Generation (RAG) application, the Generative AI Engineer needs a structured approach to evaluating the chunking strategy, ensuring that the chosen configuration retrieves the most relevant information and leads to accurate and coherent LLM responses. Here's why C and E are the correct strategies:

Strategy C: Evaluation Metrics (Recall, NDCG)

Define an evaluation metric: Common evaluation metrics such as recall, precision, or NDCG (Normalized Discounted Cumulative Gain) measure how well the retrieved chunks match the user's query and the expected response.

Recall measures the proportion of relevant information retrieved.

NDCG is often used when you want to account for both the relevance of retrieved chunks and the ranking or order in which they are retrieved.

Experiment with chunking strategies: Adjusting chunking strategies based on text structure (e.g., splitting by paragraph, chapter, or a fixed number of tokens) allows the engineer to experiment with various ways of slicing the text. Some chunks may better align with the user's query than others.

Evaluate performance: By using recall or NDCG, the engineer can methodically test various chunking strategies to identify which one yields the highest performance. This ensures that the chunking method provides the most relevant information when embedding and retrieving data from the vector store.

Strategy E: LLM-as-a-Judge Metric

Use the LLM as an evaluator: After retrieving chunks, the LLM can be used to evaluate the quality of answers based on the chunks provided. This could be framed as a "judge" function, where the LLM compares how well a given chunk answers previous user queries.

Optimize based on the LLM's judgment: By having the LLM assess previous answers and rate their relevance and accuracy, the engineer can collect feedback on how well different chunking configurations perform in real-world scenarios.

This metric could be a qualitative judgment on how closely the retrieved information matches the user's intent.

Tune chunking parameters: Based on the LLM's judgment, the engineer can adjust the chunk size or structure to better align with the LLM's responses, optimizing retrieval for future queries.

By combining these two approaches, the engineer ensures that the chunking strategy is systematically evaluated using both quantitative (recall/NDCG) and qualitative (LLM judgment) methods. This balanced optimization process results in improved retrieval relevance and, consequently, better response generation by the LLM.
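The quantitative half of this process can be sketched as follows. The chunk IDs, the two retrieval runs, and the choice of binary relevance are illustrative assumptions; the formulas are the standard recall@k and NDCG@k definitions.

```python
import math

def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant chunks that appear in the top k.
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    # Binary-relevance NDCG: a hit at 0-based rank i contributes
    # 1 / log2(i + 2), normalized by the ideal ordering.
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, chunk_id in enumerate(retrieved[:k])
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical results for one question under two chunking strategies.
relevant_chunks = {"c1", "c2"}
paragraph_run = ["c1", "c9", "c2"]  # split by paragraph
chapter_run = ["c9", "c8", "c1"]    # split by chapter

paragraph_scores = (
    recall_at_k(paragraph_run, relevant_chunks, 3),
    ndcg_at_k(paragraph_run, relevant_chunks, 3),
)
chapter_scores = (
    recall_at_k(chapter_run, relevant_chunks, 3),
    ndcg_at_k(chapter_run, relevant_chunks, 3),
)
```

Averaging these scores over a set of known questions gives one number per chunking configuration, which makes the comparison methodical rather than intuitive.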

Question 12

A Generative AI Engineer is building an LLM-based application that has an important transcription (speech-to-text) task. Speed is essential for the success of the application.

Which open generative AI model should be used?

Options:

Llama-2-70b-chat-hf

MPT-30B-Instruct

DBRX

whisper-large-v3 (1.6B)

Answer:

D

Explanation:

The task requires an open generative AI model for a transcription (speech-to-text) task where speed is essential. Let’s assess the options based on their suitability for transcription and performance characteristics, referencing Databricks’ approach to model selection.

Option A: Llama-2-70b-chat-hf

Llama-2 is a text-based LLM optimized for chat and text generation, not speech-to-text. It lacks transcription capabilities.

Databricks Reference: "Llama models are designed for natural language generation, not audio processing" ("Databricks Model Catalog").

Option B: MPT-30B-Instruct

MPT-30B is another text-based LLM focused on instruction-following and text generation, not transcription. It’s irrelevant for speech-to-text tasks.

Databricks Reference: No specific mention, but MPT is categorized under text LLMs in Databricks’ ecosystem, not audio models.

Option C: DBRX

DBRX, developed by Databricks, is a powerful text-based LLM for general-purpose generation. It doesn’t natively support speech-to-text and isn’t optimized for transcription.

Databricks Reference: "DBRX excels at text generation and reasoning tasks" ("Introducing DBRX," 2023); there is no mention of audio capabilities.

Option D: whisper-large-v3 (1.6B)

Whisper, developed by OpenAI, is an open-source model specifically designed for speech-to-text transcription. The “large-v3” variant (1.6 billion parameters) balances accuracy and efficiency, with optimizations for speed via quantization or deployment on GPUs—key for the application’s requirements.

Databricks Reference: "For audio transcription, models like Whisper are recommended for their speed and accuracy" ("Generative AI Cookbook," 2023). Databricks supports Whisper integration in its MLflow or Lakehouse workflows.

Conclusion: Only option D, whisper-large-v3, is a speech-to-text model, making it the sole suitable choice. Its design prioritizes transcription, and its efficiency (e.g., via optimized inference) meets the speed requirement, aligning with Databricks’ model deployment best practices.

Question 13

A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least cost and least development effort and have it operate at the lowest cost possible.

Which combination of chaining components and configuration meets these requirements?

Options:

For the application a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt which is given to the LLM to generate answers.

The LLM needs to be frequently retrained with the new documents in order to provide the most up-to-date answers.

For the question-answering application, prompt engineering and an LLM are required to generate answers.

For the application a prompt, an agent and a fine-tuned LLM are required. The agent is used by the LLM to retrieve relevant content that is inserted into the prompt which is given to the LLM to generate answers.

Question 14

A Generative AI Engineer is setting up a Databricks Vector Search index that will look up news articles by topic within 10 days of the date specified. An example query might be "Tell me about monster truck news around January 5th 1992". They want to do this with the least amount of effort.

How can they set up their Vector Search index to support this use case?

Options:

Split articles by 10 day blocks and return the block closest to the query.

Include metadata columns for article date and topic to support metadata filtering.

Pass the query directly to the vector search index and return the best articles.

Create separate indexes by topic and add a classifier model to appropriately pick the best index.

Answer:

B

Explanation:

The task is to set up a Databricks Vector Search index for news articles, supporting queries like “monster truck news around January 5th, 1992,” with minimal effort. The index must filter by topic and a 10-day date range. Let’s evaluate the options.

Option A: Split articles by 10-day blocks and return the block closest to the query

Pre-splitting articles into 10-day blocks requires significant preprocessing and index management (e.g., one index per block). It’s effort-intensive and inflexible for dynamic date ranges.

Databricks Reference: "Static partitioning increases setup complexity; metadata filtering is preferred" ("Databricks Vector Search Documentation").

Option B: Include metadata columns for article date and topic to support metadata filtering

Adding date and topic as metadata in the Vector Search index allows dynamic filtering (e.g., date ± 10 days, topic = “monster truck”) at query time. This leverages Databricks’ built-in metadata filtering, minimizing setup effort.

Databricks Reference: "Vector Search supports metadata filtering on columns like date or category for precise retrieval with minimal preprocessing" ("Vector Search Guide," 2023).

Option C: Pass the query directly to the vector search index and return the best articles

Passing the full query (e.g., “Tell me about monster truck news around January 5th, 1992”) to Vector Search relies solely on embeddings, ignoring structured filtering for date and topic. This risks inaccurate results without explicit range logic.

Databricks Reference: "Pure vector similarity may not handle temporal or categorical constraints effectively" ("Building LLM Applications with Databricks").

Option D: Create separate indexes by topic and add a classifier model to appropriately pick the best index

Separate indexes per topic plus a classifier model adds significant complexity (index creation, model training, maintenance), far exceeding “least effort.” It’s overkill for this use case.

Databricks Reference: "Multiple indexes increase overhead; single-index with metadata is simpler" ("Databricks Vector Search Documentation").

Conclusion: Option B is the simplest and most effective solution, using metadata filtering in a single Vector Search index to handle date ranges and topics, aligning with Databricks’ emphasis on efficient, low-effort setups.
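The filter logic behind option B can be illustrated on in-memory rows. This is only a sketch of the predicate: the column names `topic` and `article_date` and the sample articles are assumptions, and the actual Databricks Vector Search filter syntax is not shown here.

```python
from datetime import date, timedelta

def date_window(center, days=10):
    # Inclusive window of +/- `days` around the date in the query.
    delta = timedelta(days=days)
    return center - delta, center + delta

def filter_articles(articles, topic, center, days=10):
    # Keep rows whose metadata matches the topic and falls in the window.
    start, end = date_window(center, days)
    return [
        a for a in articles
        if a["topic"] == topic and start <= a["article_date"] <= end
    ]

# Hypothetical index rows; only the metadata columns are shown.
articles = [
    {"id": 1, "topic": "monster truck", "article_date": date(1992, 1, 3)},
    {"id": 2, "topic": "monster truck", "article_date": date(1992, 3, 1)},
    {"id": 3, "topic": "baseball", "article_date": date(1992, 1, 6)},
]

matches = filter_articles(articles, "monster truck", date(1992, 1, 5))
```

In a real index the same two conditions would be applied as a metadata filter at query time, so that similarity ranking only runs over rows that already satisfy the date and topic constraints.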

Question 15

A Generative AI Engineer is building an LLM to generate article summaries in the form of a type of poem, such as a haiku, given the article content. However, the initial output from the LLM does not match the desired tone or style.

Which approach will NOT improve the LLM’s response to achieve the desired response?

Options:

Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style

Use a neutralizer to normalize the tone and style of the underlying documents

Include few-shot examples in the prompt to the LLM

Fine-tune the LLM on a dataset of desired tone and style

Question 16

A Generative AI Engineer received the following business requirements for an external chatbot.

The chatbot needs to know what types of questions the user asks and routes to appropriate models to answer the questions. For example, the user might ask about upcoming event details. Another user might ask about purchasing tickets for a particular event.

What is an ideal workflow for such a chatbot?

Options:

The chatbot should only look at previous event information

There should be two different chatbots handling different types of user queries.

The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it’s an upcoming event question, send the query to a text-to-SQL model. If it’s about ticket purchasing, the customer should be redirected to a payment platform.

The chatbot should only process payments

Question 17

After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:

[Error message image: Question 17]

What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)

Options:

Use a smaller embedding model to generate

Reduce the maximum output tokens of the new model

Decrease the chunk size of embedded documents

Reduce the number of records retrieved from the vector database

Retrain the response generating model using ALiBi

Question 18

A Generative AI Engineer has created a RAG application which can help employees retrieve answers from an internal knowledge base, such as Confluence pages or Google Drive. The prototype application is now working with some positive feedback from internal company testers. Now the Generative AI Engineer wants to formally evaluate the system’s performance and understand where to focus their efforts to further improve the system.

How should the Generative AI Engineer evaluate the system?

Options:

Use cosine similarity score to comprehensively evaluate the quality of the final generated answers.

Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow’s built in evaluation metrics to perform the evaluation on the retrieval and generation components.

Benchmark multiple LLMs with the same data and pick the best LLM for the job.

Use an LLM-as-a-judge to evaluate the quality of the final answers generated.

Question 19

A Generative AI Engineer is building a system that will answer questions on currently unfolding news topics. As such, it pulls information from a variety of sources including articles and social media posts. They are concerned about toxic posts on social media causing toxic outputs from their system.

Which guardrail will limit toxic outputs?

Options:

Use only approved social media and news accounts to prevent unexpected toxic data from getting to the LLM.

Implement rate limiting

Reduce the number of context items the system will include in consideration for its response.

Log all LLM system responses and perform a batch toxicity analysis monthly.

Answer:

A

Explanation:

The system answers questions on unfolding news topics using articles and social media, with a concern about toxic outputs from toxic inputs. A guardrail must limit toxicity in the LLM’s responses. Let’s evaluate the options.

Option A: Use only approved social media and news accounts to prevent unexpected toxic data from getting to the LLM

Curating input sources (e.g., verified accounts) reduces exposure to toxic content at the data ingestion stage, directly limiting toxic outputs. This is a proactive guardrail aligned with data quality control.

Databricks Reference: "Control input data quality to mitigate unwanted LLM behavior, such as toxicity" ("Building LLM Applications with Databricks," 2023).

Option B: Implement rate limiting

Rate limiting controls request frequency, not content quality. It prevents overload but doesn’t address toxicity in social media inputs or outputs.

Databricks Reference: Rate limiting is for performance, not safety: "Use rate limits to manage compute load" ("Generative AI Cookbook").

Option C: Reduce the amount of context items the system will include in consideration for its response

Reducing context might limit exposure to some toxic items but risks losing relevant information, and it doesn’t specifically target toxicity. It’s an indirect, imprecise fix.

Databricks Reference: Context reduction is for efficiency, not safety: "Adjust context size based on performance needs" ("Databricks Generative AI Engineer Guide").

Option D: Log all LLM system responses and perform a batch toxicity analysis monthly

Logging and analyzing responses is reactive, identifying toxicity after it occurs rather than preventing it. Monthly analysis doesn’t limit real-time toxic outputs.

Databricks Reference: Monitoring is for auditing, not prevention: "Log outputs for post-hoc analysis, but use input filters for safety" ("Building LLM-Powered Applications").

Conclusion: Option A is the most effective guardrail, proactively filtering toxic inputs from unverified sources, which aligns with Databricks’ emphasis on data quality as a primary safety mechanism for LLM systems.
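A source allow-list of the kind described in option A might look like the sketch below. The account names and document fields are hypothetical; the point is only that filtering happens before any text reaches the LLM's context.

```python
# Hypothetical allow-list of approved accounts (assumption).
APPROVED_SOURCES = {"city_news_desk", "verified_reporter"}

def filter_approved(documents, approved=APPROVED_SOURCES):
    # Drop any document whose source is not on the allow-list.
    return [d for d in documents if d["source"] in approved]

# Hypothetical ingested items from articles and social media.
documents = [
    {"source": "city_news_desk", "text": "Council approves new budget."},
    {"source": "anonymous_account", "text": "Unvetted post."},
    {"source": "verified_reporter", "text": "Storm update for the coast."},
]

context_documents = filter_approved(documents)
```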

Question 20

A Generative AI Engineer at an electronics company just deployed a RAG application for customers to ask questions about products that the company carries. However, they received feedback that the RAG response often returns information about an irrelevant product.

What can the engineer do to improve the relevance of the RAG’s response?

Options:

Assess the quality of the retrieved context

Implement caching for frequently asked questions

Use a different LLM to improve the generated response

Use a different semantic similarity search algorithm

Question 21

Which TWO chain components are required for building a basic LLM-enabled chat application that includes conversational capabilities, knowledge retrieval, and contextual memory?

Options:

(Q)

Vector Stores

Conversation Buffer Memory

External tools

Chat loaders

React Components

Answer:

B, C

Explanation:

Building a basic LLM-enabled chat application with conversational capabilities, knowledge retrieval, and contextual memory requires specific components that work together to process queries, maintain context, and retrieve relevant information. Databricks’ Generative AI Engineer documentation outlines key components for such systems, particularly in the context of frameworks like LangChain or Databricks’ MosaicML integrations. Let’s evaluate the required components:

Understanding the Requirements:

Conversational capabilities: The app must generate natural, coherent responses.

Knowledge retrieval: It must access external or domain-specific knowledge.

Contextual memory: It must remember prior interactions in the conversation.

Databricks Reference: "A typical LLM chat application includes a memory component to track conversation history and a retrieval mechanism to incorporate external knowledge" ("Databricks Generative AI Cookbook," 2023).

Evaluating the Options:

A. (Q): This appears incomplete or unclear (possibly a typo). Without further context, it’s not a valid component.

B. Vector Stores: These store embeddings of documents or knowledge bases, enabling semantic search and retrieval of relevant information for the LLM. This is critical for knowledge retrieval in a chat application.

Databricks Reference: "Vector stores, such as those integrated with Databricks’ Lakehouse, enable efficient retrieval of contextual data for LLMs" ("Building LLM Applications with Databricks").

C. Conversation Buffer Memory: This component stores the conversation history, allowing the LLM to maintain context across multiple turns. It’s essential for contextual memory.

Databricks Reference: "Conversation Buffer Memory tracks prior user inputs and LLM outputs, ensuring context-aware responses" ("Generative AI Engineer Guide").

D. External tools: These (e.g., APIs or calculators) enhance functionality but aren’t required for a basic chat app with the specified capabilities.

E. Chat loaders: These might refer to data loaders for chat logs, but they’re not a core chain component for conversational functionality or memory.

F. React Components: These relate to front-end UI development, not the LLM chain’s backend functionality.

Selecting the Two Required Components:

For knowledge retrieval, Vector Stores (B) are necessary to fetch relevant external data, a cornerstone of Databricks’ RAG-based chat systems.

For contextual memory, Conversation Buffer Memory (C) is required to maintain conversation history, ensuring coherent and context-aware responses.

While an LLM itself is implied as the core generator, the question asks for chain components beyond the model, making B and C the minimal yet sufficient pair for a basic application.
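To make the division of labor concrete, here is a minimal, framework-free sketch of how the two components could feed a prompt. It does not use the LangChain API: `ToyVectorStore`, `ToyBufferMemory`, and `build_prompt` are hypothetical names, and the bag-of-words `embed` function is a stand-in assumption for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Knowledge retrieval: stores documents with vectors and returns the closest ones."""

    def __init__(self):
        self._docs = []

    def add(self, text):
        self._docs.append((text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


class ToyBufferMemory:
    """Contextual memory: keeps every prior turn and replays it verbatim."""

    def __init__(self):
        self._turns = []

    def save(self, user, assistant):
        self._turns.append((user, assistant))

    def render(self):
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self._turns)


def build_prompt(store, memory, question):
    """Combine retrieved context and conversation history into one LLM prompt."""
    context = "\n".join(store.search(question, k=1))
    return f"Context:\n{context}\n\nHistory:\n{memory.render()}\n\nUser: {question}"


store = ToyVectorStore()
store.add("The return policy allows refunds within 30 days of purchase.")
store.add("Standard shipping takes five business days.")

memory = ToyBufferMemory()
memory.save("Hi, I bought a laptop last week.", "Thanks! How can I help?")

prompt = build_prompt(store, memory, "What is the return policy for refunds?")
```

The resulting prompt carries both the retrieved policy text and the earlier laptop exchange, so the model (not shown here) can answer with knowledge and context it would otherwise lack.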

Conclusion: The two required chain components are B. Vector Stores and C. Conversation Buffer Memory, as they directly address knowledge retrieval and contextual memory, respectively, aligning with Databricks’ documented best practices for LLM-enabled chat applications.


Copyright © 2014-2026 Activedumpsnet. All Rights Reserved