Scrubber Component
Overview
The Scrubber component in Gitpod is a Go library that provides functionality for removing or masking sensitive information from data. It's designed to protect personally identifiable information (PII) and other sensitive data from being exposed in logs, error messages, and other outputs. The component offers various methods for scrubbing different types of data structures, including strings, key-value pairs, JSON, and Go structs.
Purpose
The primary purposes of the Scrubber component are:
Remove or mask personally identifiable information (PII) from data
Protect sensitive information such as passwords, tokens, and secrets
Provide consistent data sanitization across the Gitpod platform
Support various data formats and structures
Enable customizable scrubbing rules
Reduce the risk of sensitive data exposure
Comply with privacy regulations and best practices
Facilitate safe logging and error reporting
Architecture
The Scrubber component is structured as a Go library with several key parts:
Core Scrubber Interface: Defines the methods for scrubbing different types of data
Scrubber Implementation: Provides the actual scrubbing functionality
Sanitization Functions: Implements different sanitization strategies (redaction, hashing)
Configuration: Defines what fields and patterns should be scrubbed
Struct Walking: Uses reflection to traverse and scrub complex data structures
The component is designed to be used by other Gitpod components that need to sanitize data before logging, storing, or transmitting it.
Key Features
Scrubbing Methods
The Scrubber interface provides several methods for scrubbing different types of data:
Value: Scrubs a single string value using heuristics to detect sensitive data
KeyValue: Scrubs a key-value pair, using the key as a hint for how to sanitize the value
JSON: Scrubs a JSON structure, handling nested objects and arrays
Struct: Scrubs a Go struct in-place, respecting struct tags for customization
DeepCopyStruct: Creates a scrubbed deep copy of a Go struct
Sanitization Strategies
The component implements different sanitization strategies:
Redaction: Replaces sensitive values with [redacted] or [redacted:keyname]
Hashing: Replaces sensitive values with an MD5 hash ([redacted:md5:hash:keyname])
URL Path Hashing: Specially handles URLs by preserving the structure but hashing path segments
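The three strategies can be sketched in a few lines of Go. This is an illustrative sketch, not the component's implementation: the function names redact, hashValue, and hashURLPath are hypothetical, and the choice to drop the query string and fragment when hashing a URL path is an assumption. Only the output shapes ([redacted], [redacted:keyname], [redacted:md5:hash:keyname]) come from the description above.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"net/url"
	"strings"
)

// md5Hex returns the hex-encoded MD5 digest of s.
func md5Hex(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// redact (hypothetical name) discards the value entirely. When a key name
// is known it is kept in the placeholder as a hint.
func redact(key string) string {
	if key == "" {
		return "[redacted]"
	}
	return "[redacted:" + key + "]"
}

// hashValue (hypothetical name) replaces the value with its MD5 digest,
// so equal inputs stay correlatable without exposing the input itself.
func hashValue(key, value string) string {
	return "[redacted:md5:" + md5Hex(value) + ":" + key + "]"
}

// hashURLPath (hypothetical name) keeps scheme and host and replaces each
// non-empty path segment with its MD5 digest. Assumption: the query string
// and fragment are dropped rather than hashed.
func hashURLPath(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("hashURLPath: %q is not an absolute URL", raw)
	}
	segments := strings.Split(u.Path, "/")
	for i, seg := range segments {
		if seg != "" {
			segments[i] = md5Hex(seg)
		}
	}
	return u.Scheme + "://" + u.Host + strings.Join(segments, "/"), nil
}

func main() {
	fmt.Println(redact(""))         // [redacted]
	fmt.Println(redact("password")) // [redacted:password]
	fmt.Println(hashValue("email", "alice@example.com"))

	hashed, err := hashURLPath("https://example.com/alice/repo?token=abc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hashed)
}
```

Hashing keeps values comparable across log lines (the same input always yields the same digest), while redaction removes the value altogether.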
Configuration
The scrubber is configured with several lists and patterns:
RedactedFieldNames: Field names whose values should be completely redacted
HashedFieldNames: Field names whose values should be hashed
HashedURLPathsFieldNames: Field names containing URLs whose paths should be hashed
HashedValues: Regular expressions that, when matched, cause values to be hashed
RedactedValues: Regular expressions that, when matched, cause values to be redacted
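As a rough illustration of how such a configuration might drive a key-value decision, the sketch below checks the field name first and falls back to value patterns. Everything in it is an assumption made for illustration: the config type, the example entries, the case-insensitive exact match on field names, and the order in which the rules are applied. HashedURLPathsFieldNames is omitted for brevity.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// config mirrors the lists described above. The type itself is hypothetical.
type config struct {
	RedactedFieldNames []string
	HashedFieldNames   []string
	HashedValues       []*regexp.Regexp
	RedactedValues     []*regexp.Regexp
}

// exampleConfig holds illustrative entries only, not the component's defaults.
var exampleConfig = config{
	RedactedFieldNames: []string{"password", "token"},
	HashedFieldNames:   []string{"email", "username"},
	HashedValues:       []*regexp.Regexp{regexp.MustCompile(`[^@\s]+@[^@\s]+\.[^@\s]+`)},
	RedactedValues:     []*regexp.Regexp{regexp.MustCompile(`(?i)bearer\s+\S+`)},
}

func md5Hex(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// contains reports whether key is in list, ignoring case (an assumption).
func contains(list []string, key string) bool {
	for _, item := range list {
		if strings.EqualFold(item, key) {
			return true
		}
	}
	return false
}

// scrubKeyValue uses the key as a hint: a listed field name decides the
// strategy outright, otherwise the value patterns are applied.
func (c config) scrubKeyValue(key, value string) string {
	switch {
	case contains(c.RedactedFieldNames, key):
		return "[redacted:" + key + "]"
	case contains(c.HashedFieldNames, key):
		return "[redacted:md5:" + md5Hex(value) + ":" + key + "]"
	}
	return c.scrubValue(value)
}

// scrubValue replaces every substring that matches a value pattern.
func (c config) scrubValue(value string) string {
	for _, re := range c.RedactedValues {
		value = re.ReplaceAllString(value, "[redacted]")
	}
	for _, re := range c.HashedValues {
		value = re.ReplaceAllStringFunc(value, func(match string) string {
			return "[redacted:md5:" + md5Hex(match) + "]"
		})
	}
	return value
}

func main() {
	fmt.Println(exampleConfig.scrubKeyValue("password", "hunter2")) // [redacted:password]
	fmt.Println(exampleConfig.scrubKeyValue("note", "Bearer abc123")) // [redacted]
	fmt.Println(exampleConfig.scrubValue("contact alice@example.com now"))
}
```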
Struct Tag Support
When scrubbing structs, the component respects the scrub struct tag:
scrub:"ignore": Skip scrubbing this field
scrub:"hash": Hash this field's value
scrub:"redact": Redact this field's value
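A minimal sketch of tag-driven, in-place struct scrubbing using the standard reflect package follows. The component itself is described as relying on reflectwalk; plain reflection is swapped in here to keep the example self-contained. The user type and the scrubStruct function are hypothetical, only top-level string fields are handled, and untagged fields are left untouched, which is a simplification.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"reflect"
)

// user is a hypothetical type showing the three documented tag values.
type user struct {
	ID       string `scrub:"ignore"`
	Email    string `scrub:"hash"`
	Password string `scrub:"redact"`
	Note     string // untagged: left untouched by this sketch
}

// scrubStruct (hypothetical name) rewrites the settable string fields of the
// struct that ptr points to, according to each field's scrub tag.
func scrubStruct(ptr any) error {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("scrubStruct: want a non-nil pointer to a struct, got %T", ptr)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := v.Field(i)
		if field.Kind() != reflect.String || !field.CanSet() {
			continue // this sketch only handles exported string fields
		}
		switch t.Field(i).Tag.Get("scrub") {
		case "ignore":
			// explicitly exempt: keep the value as it is
		case "hash":
			sum := md5.Sum([]byte(field.String()))
			field.SetString("[redacted:md5:" + hex.EncodeToString(sum[:]) + "]")
		case "redact":
			field.SetString("[redacted]")
		}
	}
	return nil
}

func main() {
	u := user{ID: "u-1", Email: "alice@example.com", Password: "hunter2", Note: "hello"}
	if err := scrubStruct(&u); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", u)
}
```

Because the struct is modified in place, the caller must pass a pointer; passing a value returns an error in this sketch.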
Trusted Values
The component supports a TrustedValue interface that allows marking specific values to be exempted from scrubbing.
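One way such a marker interface could work is sketched below. The method name IsTrustedValue, the wrapper type trustedString, and the scrubAny function are assumptions made for illustration and may not match the component's actual definitions. Redacting every plain string is deliberately crude, to keep the focus on the exemption.

```go
package main

import "fmt"

// TrustedValue marks a value as exempt from scrubbing.
// Assumption: a single marker method with no arguments or results.
type TrustedValue interface {
	IsTrustedValue()
}

// trustedString is a hypothetical wrapper for strings known to be safe.
type trustedString string

// IsTrustedValue implements TrustedValue; it carries no behaviour.
func (trustedString) IsTrustedValue() {}

// scrubAny (hypothetical name) redacts plain strings and passes trusted
// values, and any non-string value, through unchanged.
func scrubAny(v any) any {
	switch val := v.(type) {
	case TrustedValue:
		return val
	case string:
		return "[redacted]"
	default:
		return val
	}
}

func main() {
	fmt.Println(scrubAny("some user input"))             // [redacted]
	fmt.Println(scrubAny(trustedString("workspace-id"))) // workspace-id
}
```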
Usage Patterns
Basic Value Scrubbing
Key-Value Scrubbing
JSON Scrubbing
Struct Scrubbing
Deep Copy Struct Scrubbing
Integration Points
The Scrubber component integrates with:
Logging Systems: To sanitize log messages
Error Handling: To sanitize error messages
API Responses: To sanitize sensitive data in responses
Monitoring Systems: To sanitize metrics and traces
Other Gitpod Components: To provide consistent data sanitization
Dependencies
Internal Dependencies
None specified in the component's build configuration.
External Dependencies
github.com/hashicorp/golang-lru: For caching sanitization decisions
github.com/mitchellh/reflectwalk: For traversing complex data structures
Security Considerations
The component implements several security measures:
Default Deny: Fields are scrubbed by default if they match sensitive patterns
Multiple Strategies: Different sanitization strategies for different types of data
Caching: Caches sanitization decisions for performance
Customization: Allows customization of scrubbing rules
Trusted Values: Supports marking values as trusted to exempt them from scrubbing
Related Components
Common-Go: Uses the Scrubber for logging
Server: Uses the Scrubber for API request/response sanitization
Workspace Services: Use the Scrubber to protect workspace data
Monitoring Components: Use the Scrubber to sanitize metrics and traces