TN007 Shared Learning
Objective
- Design a solution for sharing learning between multiple users
TL;DR
The examples used for in context learning are currently stored locally. This makes it awkward to share a trained AI between team members. Users would have to manually swap the examples in order to benefit from each others learnings. There are a couple key design decisions
- Do we centralize traces and block log events or only the learned examples?
- Do we support loading examples from a single location or multiple locations?
To simplify management I think we should do the following
- Only move the learned examples to a central location
- Trace/block logs should still be stored locally for each user
- Add support for loading/saving shared examples to multiple locations
Traces and block log events are currently stored in Pebble. Pebble doesn’t have a good story for using shared storage cockroachdb/pebble#3177. We also don’t have an immediate need to move the traces and block logs to a central location.
Treating shared storage location as a backup location means Foyle can still operate fine if the shared storage location is inaccessible.
Proposal
Users should be able to specify
- Multiple additional locations to load examples from
- A backup location to save learned examples to
LearnerConfig Changes
We can update LearnerConfig to support these changes
type LearnerConfig struct {
// SharedExamples is a list of locations to load examples from
SharedExamples []string `json:"sharedExamples,omitempty"`
// BackupExamples is the location to save learned examples to
BackupExamples string `json:"backupExamples,omitempty"`
}
Loading SharedExamples
To support different storage systems (e.g. S3, GCS, local file system) we can define an interface for working with shared examples. We currently have the FileHelper interface
type FileHelper interface {
Exists(path string) (bool, error)
NewReader(path string) (io.Reader, error)
NewWriter(path string) (io.Writer, error)
}
Our current implementation of inMemoryDB requires a Glob function to find all the examples that should be loaded. We should a new interface to include the Glob.
type Globber interface {
Glob(pattern string) ([]string, error)
}
For object storage we can implement Glob by listing all the objects matching a prefix and then applying the glob; similar to this code for matching a regex
Triggering Loading of SharedExamples
For an initial implementation we can load shared examples when Foyle starts and perhaps periodically poll for new examples. I don’t think there’s any need to implement push based notifications for new examples.
Alternatives
Centralize Traces and Block Logs
Since Pebble doesn’t have a good story for using shared storage cockroachdb/pebble#3177 there’s no simple solution for moving the traces and block logs to a central location.
The main thing we lose by not centralizing the traces and block is the ability to do bulk analysis of traces and block events across all users. Since we don’t have an immediate use case for that there’s no reason to support it.
References
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.