Monitoring AI Quality

How to monitor the quality of AI outputs

What You’ll Learn

How to observe the AI to understand why it generated the answers it did

What was the actual prompt and response?

A good place to start when trying to understand the AI’s responses is to look at the actual prompt and response from the LLM that produced the cell.

You can fetch the request and response as follows

Get the log for the given cell.
From the cell's log, get the traceId (genTraceId) of the AI generation request that produced it.


CELLID=01J7KQPBYCT9VM2KFBY48JC7J0
export TRACEID=$(curl -s -X POST http://localhost:8877/api/foyle.logs.LogsService/GetBlockLog -H "Content-Type: application/json" -d "{\"id\": \"${CELLID}\"}" | jq -r .blockLog.genTraceId)
echo TRACEID=$TRACEID
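If the CELLID is wrong or the block log has no genTraceId field, `jq -r` prints the literal string `null` instead of failing. If the request fails entirely, it prints nothing. In either case the next request would be sent with a bogus traceId. Below is a minimal sketch of an early check. `check_trace_id` is a hypothetical helper name, not part of Foyle.

```shell
# Hypothetical helper (not part of Foyle): returns non-zero when the traceId is
# empty or is the literal "null" that `jq -r` prints for a missing field.
check_trace_id() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "No genTraceId found for cell ${CELLID:-<unset>}" >&2
    return 1
  fi
  return 0
}
```

For example, run `check_trace_id "$TRACEID" || exit 1` right after setting TRACEID.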

Given the traceId, you can fetch the request and response from the LLM logs.


# --fail makes curl exit non-zero on HTTP errors; without it, only connection failures are caught
curl -sS --fail -o /tmp/response.json -X POST http://localhost:8877/api/foyle.logs.LogsService/GetLLMLogs -H "Content-Type: application/json" -d "{\"traceId\": \"${TRACEID}\"}"
CODE="$?"
if [ "$CODE" -ne 0 ]; then
  echo "Error occurred while fetching LLM logs" >&2
  exit "$CODE"
fi
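A successful exit code does not by itself guarantee that /tmp/response.json holds the JSON fields the later steps read. Below is a minimal sketch of a sanity check using `jq -e`, which sets its exit status from the last output and returns non-zero for `false`, `null`, or a parse error. The helper name `has_llm_log_fields` is hypothetical.

```shell
# Hypothetical helper (not part of Foyle): succeeds only if the file is valid JSON
# with both the requestJson and responseJson fields used in the steps below.
has_llm_log_fields() {
  jq -e 'has("requestJson") and has("responseJson")' "$1" > /dev/null 2>&1
}
```

For example: `has_llm_log_fields /tmp/response.json || echo "Unexpected LLM log response" >&2`.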

You can view an HTML rendering of the prompt and response.
If you disable interactive mode for the cell, VS Code will render the HTML response inline.
Note: There currently appears to be a bug in the HTML rendering that introduces extra newlines relative to the markdown in the actual JSON request.


jq -r '.requestHtml' /tmp/response.json > /tmp/request.html
cat /tmp/request.html

To view the response


jq -r '.responseHtml' /tmp/response.json > /tmp/response.html
cat /tmp/response.html

To view the JSON versions of the actual request and response


jq -r '.requestJson' /tmp/response.json | jq .


jq -r '.responseJson' /tmp/response.json | jq '.messages[0].content[0].text'

You can print the raw markdown of the prompt as follows


jq -r '.requestJson' /tmp/response.json | jq -r '.messages[0].content[0].text'
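Because requestJson is itself a JSON-encoded string (hence the two jq invocations), the same extraction can be done in a single call with jq's `fromjson` builtin. This sketch assumes the request uses the same `messages[].content[].text` layout as above, and `prompt_markdown` is a hypothetical name. Unlike the command above, it prints the text of every message and content part, not just the first.

```shell
# Hypothetical helper: print the text of every content part in the logged request.
# fromjson parses the JSON-encoded requestJson string; `[]?` skips content values
# that are not iterable, and `// empty` drops parts that have no text field.
prompt_markdown() {
  jq -r '.requestJson | fromjson | .messages[].content[]? | .text? // empty' "$1"
}
```

For example: `prompt_markdown /tmp/response.json > /tmp/prompt.md`.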


To pretty-print the full JSON response


jq -r '.responseJson' /tmp/response.json | jq .


Last modified September 26, 2024: Learning is working. Fix learning in evaluator. (d87bb18)