|
| 1 | +# Implementation Summary: PR Review Suggestions for DataFrame Formatter |
| 2 | + |
| 3 | +## Overview |
| 4 | +Implemented all three suggestions from the PR_REVIEW.md file for improving the DataFrame formatter's handling of the `repr_rows` deprecation and validation logic. |
| 5 | + |
| 6 | +## Changes Made |
| 7 | + |
| 8 | +### 1. ✅ Added Deprecation Warning for repr_rows Parameter |
| 9 | +**File**: `python/datafusion/dataframe_formatter.py` |
| 10 | +**Lines**: 1, 115-121 |
| 11 | + |
| 12 | +**Changes**: |
| 13 | +- Added `import warnings` at the top of the file |
| 14 | +- Modified validation function to emit `DeprecationWarning` when `repr_rows` parameter is used |
| 15 | +- Warning clearly states: "repr_rows parameter is deprecated, use max_rows instead" |
| 16 | +- Warning uses `stacklevel=4` to point to user's code, not internal validation |
| 17 | + |
| 18 | +**Benefits**: |
| 19 | +- Users are now informed when using deprecated parameter |
| 20 | +- Graceful migration path from `repr_rows` to `max_rows` |
| 21 | +- Clear guidance in warning message |
| 22 | + |
| 23 | +### 2. ✅ Extracted Validation Logic to Helper Function |
| 24 | +**File**: `python/datafusion/dataframe_formatter.py` |
| 25 | +**Lines**: 79-145 |
| 26 | + |
| 27 | +**Changes**: |
| 28 | +- Created `_validate_formatter_parameters()` helper function |
| 29 | +- Moved all validation logic from `__init__` into the helper |
| 30 | +- Function signature clearly documents all parameters |
| 31 | +- Returns the resolved `max_rows` value |
| 32 | + |
| 33 | +**Benefits**: |
| 34 | +- Improves testability (validation can be tested independently) |
| 35 | +- Reduces `__init__` method complexity |
| 36 | +- Makes validation logic reusable and composable |
| 37 | +- Easier to maintain and modify validation rules |
| 38 | + |
| 39 | +### 3. ✅ Converted max_rows/repr_rows to Properties with Deprecation |
| 40 | +**File**: `python/datafusion/dataframe_formatter.py` |
| 41 | +**Lines**: 318-377 |
| 42 | + |
| 43 | +**Changes**: |
| 44 | +- Changed `max_rows` and `repr_rows` from simple attributes to properties |
| 45 | +- `max_rows` property: getter and setter for maximum rows value |
| 46 | +- `repr_rows` property: deprecated property that wraps `max_rows` |
| 47 | + - Getter returns `_max_rows` |
| 48 | + - Setter emits deprecation warning and updates `_max_rows` |
| 49 | +- Added docstrings with deprecation notices |
| 50 | +- Used sphinx `.. deprecated::` directive for proper documentation |
| 51 | + |
| 52 | +**Code Example**: |
| 53 | +```python |
| 54 | +@property |
| 55 | +def max_rows(self) -> int: |
| 56 | + """Get the maximum number of rows to display.""" |
| 57 | + return self._max_rows |
| 58 | + |
| 59 | +@repr_rows.setter |
| 60 | +def repr_rows(self, value: int) -> None: |
| 61 | + """Set the maximum number of rows using deprecated name.""" |
| 62 | + warnings.warn( |
| 63 | + "repr_rows is deprecated, use max_rows instead", |
| 64 | + DeprecationWarning, |
| 65 | + stacklevel=2, |
| 66 | + ) |
| 67 | + self._max_rows = value |
| 68 | +``` |
| 69 | + |
| 70 | +**Benefits**: |
| 71 | +- Direct attribute access to `formatter.repr_rows` now triggers deprecation warning |
| 72 | +- Backward compatible (existing code still works) |
| 73 | +- Clear migration path for users |
| 74 | +- Properties ensure consistent behavior |
| 75 | + |
| 76 | +### 4. ✅ Added Test for Backward Compatibility |
| 77 | +**File**: `python/tests/test_dataframe.py` |
| 78 | +**Lines**: 1513-1531 |
| 79 | + |
| 80 | +**Changes**: |
| 81 | +- Created `test_repr_rows_backward_compatibility()` test function |
| 82 | +- Tests three scenarios: |
| 83 | + 1. Using `repr_rows` parameter works (with deprecation warning) |
| 84 | + 2. Specifying both `repr_rows` and `max_rows` raises ValueError |
| 85 | + 3. Setting `repr_rows` attribute via property triggers warning |
| 86 | + |
| 87 | +**Test Coverage**: |
| 88 | +```python |
| 89 | +def test_repr_rows_backward_compatibility(clean_formatter_state): |
| 90 | + # Scenario 1: Parameter usage with warning |
| 91 | + with pytest.warns(DeprecationWarning, match="repr_rows parameter is deprecated"): |
| 92 | + formatter = DataFrameHtmlFormatter(repr_rows=15, min_rows_display=10) |
| 93 | + |
| 94 | + # Scenario 2: Conflicting parameters rejected |
| 95 | + with pytest.raises(ValueError, match="Cannot specify both repr_rows and max_rows"): |
| 96 | + DataFrameHtmlFormatter(repr_rows=5, max_rows=10) |
| 97 | + |
| 98 | + # Scenario 3: Property setter warns |
| 99 | + with pytest.warns(DeprecationWarning, match="repr_rows is deprecated"): |
| 100 | + formatter2.repr_rows = 7 |
| 101 | +``` |
| 102 | + |
| 103 | +**Benefits**: |
| 104 | +- Ensures backward compatibility is tested |
| 105 | +- Validates deprecation warnings are emitted |
| 106 | +- Catches conflicts between old and new APIs |
| 107 | + |
| 108 | +## Technical Details |
| 109 | + |
| 110 | +### Parameter Resolution Logic |
| 111 | +The validation function now handles the following scenarios: |
| 112 | + |
| 113 | +1. **Only `max_rows` provided**: Uses provided value |
| 114 | +2. **Only `repr_rows` provided**: Uses value, emits deprecation warning |
| 115 | +3. **Both provided with same value**: Uses value, emits deprecation warning (allowed) |
| 116 | +4. **Both provided with different values**: Raises ValueError with clear message |
| 117 | +5. **Neither provided**: Uses default value (10) |
| 118 | + |
| 119 | +### Backward Compatibility |
| 120 | +- ✅ Old code using `repr_rows` parameter still works |
| 121 | +- ✅ Old code accessing `formatter.repr_rows` attribute still works |
| 122 | +- ✅ Deprecation warnings guide migration |
| 123 | +- ✅ No breaking changes to public API |
| 124 | + |
| 125 | +## Testing Results |
| 126 | + |
| 127 | +All formatter-related tests pass: |
| 128 | +``` |
| 129 | +✅ test_html_formatter_cell_dimension |
| 130 | +✅ test_html_formatter_custom_style_provider |
| 131 | +✅ test_html_formatter_type_formatters |
| 132 | +✅ test_html_formatter_custom_cell_builder |
| 133 | +✅ test_html_formatter_custom_header_builder |
| 134 | +✅ test_html_formatter_complex_customization |
| 135 | +✅ test_html_formatter_memory |
| 136 | +✅ test_html_formatter_max_rows |
| 137 | +✅ test_html_formatter_validation |
| 138 | +✅ test_configure_formatter |
| 139 | +✅ test_configure_formatter_invalid_params |
| 140 | +✅ test_html_formatter_shared_styles |
| 141 | +✅ test_html_formatter_no_shared_styles |
| 142 | +✅ test_html_formatter_manual_format_html |
| 143 | +✅ test_repr_rows_backward_compatibility (NEW) |
| 144 | +``` |
| 145 | + |
| 146 | +## Future Recommendations |
| 147 | + |
| 148 | +While not blocking, consider these follow-up improvements: |
| 149 | + |
| 150 | +1. **Documentation**: Update CHANGELOG to document deprecation timeline |
| 151 | +2. **Python Version**: Consider removing `repr_rows` in a future major version |
| 152 | +3. **Type Hints**: Consider using `@typing.deprecated()` (Python 3.13+) if available |
| 153 | +4. **Rust FFI**: Apply similar improvements to Rust-side parameter handling |
| 154 | + |
| 155 | +## Files Modified |
| 156 | + |
| 157 | +| File | Lines | Changes | |
| 158 | +|------|-------|---------| |
| 159 | +| `python/datafusion/dataframe_formatter.py` | Multiple | Import warnings, extract validation, add properties | |
| 160 | +| `python/tests/test_dataframe.py` | 1513-1531 | New test for backward compatibility | |
| 161 | + |
| 162 | +## Verification |
| 163 | + |
| 164 | +✅ All existing tests pass |
| 165 | +✅ New test for backward compatibility passes |
| 166 | +✅ Code compiles without syntax errors |
| 167 | +✅ Deprecation warnings are properly emitted |
| 168 | +✅ No breaking changes to public API |
| 169 | +✅ Backward compatibility fully preserved |
| 170 | + |
0 commit comments