Docs: Explain ALTREP deferred string materialization for collect() (#724) #894

Open
maxthecat2024 wants to merge 1 commit into tidyverse:main from maxthecat2024:fix-issue-724

@maxthecat2024

Fixes #724

This PR adds a @details block to the collect() documentation and a note to the prudence vignette explaining why the first call to str() on a collected duckplyr data frame incurs a one-time performance cost.

As discussed in the issue, duckplyr relies heavily on ALTREP: string columns in the result of collect() defer memory allocation until their values are first accessed. This keeps collect() itself fast but shifts the materialization cost to the first read of the data, for example the first str() call.
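
For anyone who wants to observe this locally, here is a rough timing sketch. It is not part of the diff. The data, the column names (`id`, `label`), and the row count are made up for illustration, and it assumes duckplyr 1.0 or later for `duckdb_tibble()`. Actual timings depend on the machine and the duckplyr version, so the difference may be small or not visible at all.

```r
library(dplyr)
library(duckplyr)

n <- 1e6  # arbitrary size, chosen only so that timings are measurable

# Hypothetical data: an integer id and a string column.
df <- duckdb_tibble(id = seq_len(n), label = as.character(seq_len(n)))

# Route the data through DuckDB with a trivial filter, then collect.
t_collect <- system.time(
  collected <- df |>
    filter(id > 0L) |>
    collect()
)

# First read of the columns; per this PR, the deferred string
# materialization is expected to happen here.
t_first <- system.time(invisible(capture.output(str(collected))))

# Second read; the string values should already be materialized.
t_second <- system.time(invisible(capture.output(str(collected))))

# Compare the three timings side by side.
print(rbind(collect = t_collect, first_str = t_first, second_str = t_second))
```

If the description above holds, `first_str` should carry the cost that `collect` appears to skip, and `second_str` should be cheaper.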

Development

Successfully merging this pull request may close these issues.

Should collect 'fully' materialize a duckplyr_df?