Sabotage Notes: Deliberate Issues Added to AIBugBench¶
Overview¶
Catalog of deliberate, solvable edge cases reflecting real production pitfalls. These tests probe how models handle inconsistent, legacy, or problematic inputs without changing core requirements.
Design Principles:
- Every hazard has a realistic origin story from production systems
- All issues remain solvable through defensive programming practices
- Success paths are preserved to ensure benchmark remains achievable
- Edge cases increase difficulty without changing core requirements
File-by-File Hazard Catalog¶
test_data/process_records.py¶
Hazards Introduced (compact):
- Builtin shadowing (
list = []): confusing and unsafe → remove/rename. - Mixed imports (
dt+datetime): inconsistent references → standardize. - Exception masking (
except Exception: pass): silent failures/NameError risk → catch specific errors. - Unsafe YAML loading (
yaml.load): security risk → useyaml.safe_load. - Naive datetime (mixed formats): TZ/parse issues → use aware datetimes and validate formats.
- Misleading params (hardcoded values): false flexibility → rename or make configurable.
test_data/user_data.json¶
Hazards Introduced (compact):
- JSON syntax: trailing commas, JS comments, duplicate keys → preprocess then parse.
- Types: "NaN" strings, leading zeros, mixed numeric types → normalize/coerce.
- Unicode: zero‑width, combining diacritics, BOM → normalize Unicode and strip BOM.
- Structure: contact can be string/object/array; missing fields; variable depth → defensive access and schema checks.
Success Path Preservation: Users 101, 105, and one additional clean record maintain perfect structure for successful processing:
- Consistent data types
- All required fields present
- Standard JSON formatting
- No Unicode complications
test_data/config.yaml¶
Hazards Introduced:
- Multi-Document YAML Structure
- What: Two YAML documents with conflicting values
- Real-world Context: Configuration file evolution, different environment configs
- Impact: Parser confusion, value precedence issues
-
Solution: Multi-document aware parsing with merge strategies
-
Mixed Boolean Representations
- Document 1: Native YAML booleans (
yes,true,false) - Document 2: String booleans (
"true","false","1","off") - Real-world Context: Different systems writing to same config file
- Impact: Boolean evaluation inconsistencies
-
Solution: Normalize boolean values during parsing
-
Type System Chaos
- Numbers as Strings:
"021","05","60.00" - Leading Zeros: String numbers with leading zeros
- Decimal Strings: Integers represented as decimal strings
- Real-world Context: Form inputs, environment variable substitution
- Impact: Numeric comparison failures, type conversion issues
-
Solution: Type coercion with validation
-
Path Format Mixing
- Environment Variables:
$HOME/data,~/logs - Platform Mixing: Windows (
C:\) and Unix (/tmp) paths - Relative vs Absolute: Mixed path resolution strategies
- Real-world Context: Cross-platform deployment, different developers
- Impact: Path resolution failures, file not found errors
-
Solution: Platform-aware path handling, environment variable expansion
-
Indentation Chaos
- Mixed Tabs/Spaces: Within single document
- Inconsistent Levels: Varying indentation amounts
- Real-world Context: Different editors, copy-paste operations
- Impact: YAML parsing errors, structure misinterpretation
-
Solution: Indentation normalization, YAML validation
-
YAML Advanced Features
- Anchors and Aliases:
&default_pathsand*default_paths - Multi-line Strings: Various YAML string representations
- Real-world Context: DRY principle application, complex configurations
- Impact: Reference resolution issues, parser compatibility
- Solution: YAML feature-aware parsing
Success Path Preservation: Document 2 contains working path: "./user_data.json" pointing to actual file Essential configuration values remain accessible through proper multi-document parsing
Solution Pattern Library (Comments added to explain 'expected' behavior)¶
Pattern 1: Safe File Reading with Encoding Handling¶
def safe_json_load(file_path):
with open(file_path, encoding='utf-8-sig') as f:
content = f.read()
# Strip BOM if present
content = content.strip('\ufeff')
# Remove JavaScript-style comments (simple approach)
content = re.sub(r'//.*$', '', content, flags=re.MULTILINE)
content = re.sub(r'/\*.*?\*/', '', content, flags=re.DOTALL)
# Remove trailing commas (simplified)
content = re.sub(r',(\s*[}\]])', r'\1', content)
return json.loads(content)
Pattern 2: Type Normalization¶
def normalize_age(age_value):
if age_value is None:
return 0
if isinstance(age_value, (int, float)):
return int(age_value)
if isinstance(age_value, str):
if age_value.lower() in ('nan', 'null', ''):
return 0
# Handle leading zeros
try:
return int(float(age_value)) # Handle "21.0" -> 21
except ValueError:
return 0
return 0
Pattern 3: Multi-Document YAML Handling¶
def load_config_safe(config_path):
with open(config_path) as f:
documents = list(yaml.safe_load_all(f))
# Merge documents with precedence (last wins)
config = {}
for doc in documents:
if doc: # Skip None documents
config.update(doc)
return config
Pattern 4: Unicode Normalization¶
import unicodedata
def normalize_text(text):
if not isinstance(text, str):
return text
# Remove zero-width characters
text = ''.join(char for char in text if unicodedata.category(char) != 'Cf')
# Normalize Unicode (NFC form)
text = unicodedata.normalize('NFC', text)
return text.strip()
Pattern 5: Defensive Field Access¶
def safe_get_nested(data, path, default=None):
"""Safely get nested dictionary values with dot notation"""
keys = path.split('.')
current = data
for key in keys:
if not isinstance(current, dict) or key not in current:
return default
current = current[key]
return current
# Usage: email = safe_get_nested(user, 'contact.email', '')
Testing and Validation¶
Manual Verification Steps¶
- File Parsing Tests:
# Test JSON with comments and trailing commas
import json
with open('user_data.json', encoding='utf-8-sig') as f:
raw = f.read()
# Should fail with standard json.loads()
# Should succeed with preprocessing
# Test multi-document YAML
import yaml
with open('config.yaml') as f:
docs = list(yaml.safe_load_all(f))
assert len(docs) == 2
- Unicode Handling Tests:
# Test zero-width space detection
name = "JaneDoe" # Contains U+200B
assert len(name) > len("JaneDoe") # Should detect extra character
# Test combining diacritics
muller1 = "Müller" # Precomposed
muller2 = "Mu\u0308ller" # Combining diacritic
assert muller1 != muller2 # Different representations
- Type Coercion Tests:
# Test string number handling
assert int("0004") == 4 # Leading zeros
assert float("21.00") == 21.0 # Decimal strings
# Test NaN handling
try:
int("NaN") # Should raise ValueError
except ValueError:
pass # Expected
Automated Testing Framework¶
def test_enhanced_benchmark():
"""Test that enhanced files maintain solvability"""
# Test 1: JSON parsing with preprocessing succeeds
users = load_users_safe('test_data/user_data.json')
assert len(users) >= 3 # At least 3 clean users
# Test 2: YAML multi-document parsing succeeds
config = load_config_safe('test_data/config.yaml')
assert 'validation_rules' in config
# Test 3: Core business logic executable
processor = SafeProcessor(config)
results = processor.process_all_records()
assert len(results) > 0 # Some records processed successfully
# Test 4: Score achievability
score = validate_solution_quality(results)
assert score >= 18 # Minimum achievable with good solution
Troubleshooting Guide¶
Common Issues and Solutions¶
Issue: JSON parsing fails with syntax error Cause: JavaScript comments or trailing commas Solution: Preprocess JSON to remove comments and trailing commas
Issue: YAML parsing returns unexpected values
Cause: Multi-document structure, only first document loaded Solution: Use yaml.safe_load_all() and merge documents appropriately
Issue: Unicode characters display incorrectly Cause: Zero-width spaces, combining diacritics, encoding issues Solution: Unicode normalization and BOM stripping
Issue: Type conversion errors on numeric strings Cause: Leading zeros, decimal representations, "NaN" values Solution: Robust type coercion with fallback values
Issue: File path resolution failures Cause: Environment variables, platform-specific paths Solution: Path expansion and platform-aware handling
Performance Considerations¶
Impact: Enhanced files may require additional processing Mitigation:
- Cache preprocessing results
- Use efficient regex patterns
- Minimize file I/O operations
- Consider lazy loading for large datasets
Benchmarking: Enhanced benchmark ~10-15% slower due to:
- JSON preprocessing overhead
- Multi-document YAML parsing
- Unicode normalization
- Type coercion operations
Real-World Context and Educational Value¶
Why These Issues Exist¶
Legacy System Integration: Different systems export data in incompatible formats Multi-Developer Codebases: Inconsistent coding standards accumulate over time
Copy-Paste Programming: Code fragments copied without understanding context Evolving Requirements: Systems modified without updating related components Platform Differences: Windows/Unix path handling, encoding variations Internationalization: Unicode complexity, character encoding issues
Educational Outcomes¶
Defensive Programming: Students learn to validate and sanitize inputs Error Handling: Proper exception handling becomes critical for success Type Safety: Dynamic typing pitfalls become apparent Configuration Management: Complex config handling strategies required
Unicode Awareness: International character set considerations Security Consciousness: Unsafe YAML loading demonstrates security implications
Professional Relevance¶
These enhancements mirror real-world scenarios QA engineers encounter:
- Data migration projects with inconsistent source formats
- Legacy code maintenance with accumulated technical debt
- Integration testing across different systems and platforms
- International deployment with Unicode complexity
- Security auditing of configuration and data loading practices
Conclusion¶
The AIBugBench difficulty enhancement introduces realistic complexity that mirrors production QA scenarios while maintaining clear solution paths for thorough implementations. Each hazard serves educational purposes and tests practical skills essential for professional software quality assurance.
Success requires:
- Careful input validation and sanitization
- Robust error handling and recovery
- Type safety awareness and normalization
- Unicode and encoding best practices
- Configuration complexity management
- Security-conscious programming practices
These skills directly transfer to real-world QA engineering challenges, making the enhanced benchmark a more valuable assessment tool for AI model capabilities in practical software quality scenarios.