This repository was archived by the owner on Feb 21, 2024. It is now read-only.

Commit 54dbeec

support null/non-null tests for non-BSI fields
There's a lot going on here. First, we were treating "the test is a Condition" as implying BSI, which it no longer does. Second, the behavior of conditions was weird and BSI-specific. Third, we had to propagate these changes and features through a lot of code: the core featurebase code, the DAX replacements/copies of it, and the SQL3 layer.

We refactor this so that tests for equality and inequality work for non-BSI fields; now, if you accidentally use `==` in a Row call on a non-BSI field, it still works, because that syntax is no longer specific to BSI fields.

We add a TrackExistence flag to fields and propagate it through things like our protobuf code so that we can successfully create fields. Newly created fields get this flag by default, because we add it unconditionally to them, but the paths that are called with existing fields don't add it. So when we "create" (really, just load the definition of) a field from something stored in the schema, we don't add TrackExistence to it, but any path that creates a genuinely new field should. A time quantum field with NoStandardView will *effectively* lack TrackExistence.

For sets, mutexes, and time quantums with a standard view, anything that sets bits also sets a corresponding bit for the record in a new "existence" view. This lets us distinguish between an empty set and a null, and makes null checks constant-time. When clearing bits, we don't clear existence bits, EXCEPT that if you clear a bit in a mutex *and the bit actually existed*, we clear the existence bit. For sets and time quantums, clearing bits never clears the existence bit. Deleting records clears the existence bit.

We also add code to the `batch` subpackage to generate suitable existence-field bitmaps and import them. This logic correctly handles empty sets and nils.
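The set-versus-mutex clear semantics described above can be sketched with a toy in-memory model. The type and helper names here are invented for illustration; the real code operates on roaring bitmap views per fragment, not Go maps.

```go
package main

import "fmt"

// toyField is a hypothetical stand-in for a set or mutex field with an
// "existence" view, modeling only the set/clear rules from the commit message.
type toyField struct {
	mutex  bool
	bits   map[uint64]map[uint64]bool // record -> set of row IDs
	exists map[uint64]bool            // the "existence" view
}

func newToyField(mutex bool) *toyField {
	return &toyField{
		mutex:  mutex,
		bits:   map[uint64]map[uint64]bool{},
		exists: map[uint64]bool{},
	}
}

// Set sets a bit and unconditionally marks the record as existing.
func (f *toyField) Set(record, row uint64) {
	if f.bits[record] == nil {
		f.bits[record] = map[uint64]bool{}
	}
	f.bits[record][row] = true
	f.exists[record] = true
}

// Clear clears a bit. For sets, the existence bit is never cleared; for a
// mutex, it is cleared only if the bit was actually present.
func (f *toyField) Clear(record, row uint64) {
	had := f.bits[record][row]
	delete(f.bits[record], row)
	if f.mutex && had {
		delete(f.exists, record)
	}
}

func main() {
	set := newToyField(false)
	set.Set(1, 7)
	set.Clear(1, 7)
	fmt.Println("set field still non-null:", set.exists[1]) // empty set, not null

	mux := newToyField(true)
	mux.Set(2, 3)
	mux.Clear(2, 3) // the bit existed, so the record becomes null again
	fmt.Println("mutex now null:", !mux.exists[2])
}
```

This is why the mutex case needs to know which *specific* bits a clear actually removed, not just how many.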
The `batch` package does not allow specifying anything equivalent to clearing a single bit from an existing record, so we don't have to deal with the mutex complexity in that case, which is good because it would be impossible.

This requires a number of other subtle changes, such as allowing new fields to have more than one FieldOption specified for them. We also drop the handful of implementation bits relating to the "fullySorted" internal-use-only import flag, which existed only to support the JSON ingest API, which we've removed.

The most dangerous part of this is that the mutex semantics are impossible to implement on top of our existing API, because they require us to know not how *many* bits we cleared, but which *specific* bits we cleared. I've implemented this as a new Tx method, which is almost certainly going to be tech debt one day; if we some day drop the Import API, we should remove it.

The testing for this currently covers only the Set/Clear behavior of PQL and the Import API. The batch tests haven't been written yet.

Fields that don't have existence tracking enabled refuse to perform null/not-null tests. They should also report themselves as having no null values -- if a record exists, sets in it are considered empty rather than null.

The SQL3 support requires a number of subtle modifications to both featurebase and some addon tooling. The essential change is dropping the unconditional translation of nil slices to non-nil empty slices in translateResult, in both the executor and the orchestrator. We also modify the logic that generates results from Extract calls to ensure that non-null sets get an empty slice created for them even if they never have any values assigned. The expected results for some tests are different now: we expect to get nil slices, rather than zero-length non-nil slices, for fields that were never written for a given record. Most tests were not changed.
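The nil-versus-empty distinction the batch package now relies on can be summarized in a small standalone sketch. The `classify` helper is hypothetical; the batch code makes the same three-way decision inline when building fragments.

```go
package main

import "fmt"

// classify mirrors the distinction described above: a nil []uint64 means "no
// value written" (null, so no existence bit), while a non-nil empty slice
// means "an empty set", which still sets the existence bit for the record.
func classify(rowIDs []uint64) string {
	switch {
	case rowIDs == nil:
		return "null (skip; no existence bit)"
	case len(rowIDs) == 0:
		return "empty set (existence bit only)"
	default:
		return "values (bits plus existence bit)"
	}
}

func main() {
	fmt.Println(classify(nil))
	fmt.Println(classify([]uint64{}))
	fmt.Println(classify([]uint64{3, 9}))
}
```

Note that in Go, `len(val) == 0` is true for both nil and empty slices, which is exactly why the earlier `len(val) == 0` checks had to become `val == nil` checks.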
(In every case, if a test was failing, I actually checked the logic before changing expected results. This required a lot of tracking down of edge cases.)

The batch package now rejects, as an error, attempts to clear single bits from mutex fields, because as far as I can tell it's simply impossible to have a roaring import that specifies the correct semantics there; you can't tell whether to clear an existence bit without access to the currently-set bits, which the batch API doesn't have. We already supported the special case of specifying a clear value of nil for clearing a mutex field; now that is the only value a mutex field may have in row.Clears.

We change the logic for fixing up incoming view names (in two places) to stop assuming that any view in a time field other than "" that doesn't have viewStandard as a prefix is a partial time quantum name that should have "standard_" prepended. This lets us submit bitmaps for "existence" to time quantum fields without having them silently transformed into "standard_existence", which is what we'd do with "202203".

We drop the field ClearBits method, which was totally unused. We drop the sliceDifference function, which was used in a previous mutex implementation and hasn't been used in ages, along with its test case and the helper function used only by that test.
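The corrected view-name fixup, which appears in slightly different forms in importWorker and cleanupView, boils down to the following standalone sketch (the function name is invented; the constants mirror the ones in the source):

```go
package main

import (
	"fmt"
	"strings"
)

const (
	viewStandard  = "standard"
	viewExistence = "existence"
)

// normalizeViewName distills the fixed-up logic: "" maps to the standard
// view, "standard" and "existence" pass through untouched, and only other
// names on time fields (partial time quantum keys like "202203") get the
// "standard_" prefix.
func normalizeViewName(view string, isTimeField bool) string {
	switch view {
	case "":
		return viewStandard
	case viewStandard, viewExistence:
		return view // do nothing, these are fine
	default:
		if isTimeField && !strings.HasPrefix(view, viewStandard) {
			return viewStandard + "_" + view
		}
		return view
	}
}

func main() {
	fmt.Println(normalizeViewName("202203", true))    // standard_202203
	fmt.Println(normalizeViewName("existence", true)) // existence, no longer mangled
	fmt.Println(normalizeViewName("", false))         // standard
}
```

The key change is the dedicated `case viewStandard, viewExistence` arm: previously "existence" fell into the prefixing path for time fields.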
1 parent 0dfaddf commit 54dbeec

38 files changed

Lines changed: 1763 additions & 518 deletions

api.go

Lines changed: 41 additions & 24 deletions
@@ -367,8 +367,8 @@ func (api *API) DeleteIndex(ctx context.Context, indexName string) error {
 }
 
 // CreateField makes the named field in the named index with the given options.
-// This method currently only takes a single functional option, but that may be
-// changed in the future to support multiple options.
+//
+// The resulting field will always have TrackExistence set.
 func (api *API) CreateField(ctx context.Context, indexName string, fieldName string, opts ...FieldOption) (*Field, error) {
 	span, _ := tracing.StartSpanFromContext(ctx, "API.CreateField")
 	defer span.Finish()
@@ -381,6 +381,11 @@ func (api *API) CreateField(ctx context.Context, indexName string, fieldName str
 	// authN/Z info
 	requestUserID, _ := fbcontext.UserID(ctx) // requestUserID is "" if not in ctx
 
+	// newFieldOptions is also used in the path through the index creating
+	// a field from an update from DAX, so it can't assume it can always
+	// override this. But we're the call path for creating new fields, and
+	// new fields should always have TrackExistence on.
+	opts = append(opts, OptFieldTrackExistence())
 	// Apply and validate functional options.
 	fo, err := newFieldOptions(opts...)
 	if err != nil {
@@ -500,10 +505,16 @@ func importWorker(importWork chan importJob) {
 	// incorrectly). One way to address this would be to change the logic
 	// overall so there weren't conflicts. For now, we just
 	// rely on the field type to inform the intended view name.
-	if viewName == "" {
+	// contrast with cleanupView, which is similar but unfortunately not quite identical
+	switch viewName {
+	case "":
 		viewName = viewStandard
-	} else if j.field.Type() == FieldTypeTime {
-		viewName = fmt.Sprintf("%s_%s", viewStandard, viewName)
+	case viewStandard, viewExistence:
+		// do nothing, these are fine
+	default: // possibly a time view
+		if j.field.Type() == FieldTypeTime && !strings.HasPrefix(viewName, viewStandard) {
+			viewName = viewStandard + "_" + viewName
+		}
 	}
 	if len(viewData) == 0 {
 		return fmt.Errorf("no data to import for view: %s", viewName)
@@ -1316,7 +1327,6 @@ type ImportOptions struct {
 	Clear          bool
 	IgnoreKeyCheck bool
 	Presorted      bool
-	fullySorted    bool // format-aware sorting, internal use only please.
 	suppressLog    bool
 
 	// test Tx atomicity if > 0
@@ -1523,7 +1533,6 @@ func (api *API) ImportWithTx(ctx context.Context, qcx *Qcx, req *ImportRequest,
 		return errors.Wrap(err, "validating api method")
 	}
 
-	api.server.logger.Debugf("ImportWithTx: %v %v %v", req.Index, req.Field, req.Shard)
 	idx, field, err := api.indexField(req.Index, req.Field, req.Shard)
 	if err != nil {
 		return errors.Wrap(err, "getting index and field")
@@ -1642,6 +1651,12 @@ func (api *API) ImportWithTx(ctx context.Context, qcx *Qcx, req *ImportRequest,
 // across many fields in a single shard. It can both set and clear
 // bits and updates caches/bitDepth as appropriate, although only the
 // bitmap parts happen truly transactionally.
+//
+// This function does not attempt to do existence tracking, because
+// it can't; there's no way to distinguish empty sets from not setting
+// bits. As a result, users of this endpoint are responsible for
+// providing corrected existence views for fields with existence
+// tracking. Our batch API does that.
 func (api *API) ImportRoaringShard(ctx context.Context, indexName string, shard uint64, req *ImportRoaringShardRequest) error {
 	index, err := api.Index(ctx, indexName)
 	if err != nil {
@@ -1768,12 +1783,15 @@ func cleanupView(fieldType string, viewUpdate *RoaringUpdate) error {
 	// TODO wouldn't hurt to have consolidated logic somewhere for validating view names.
 	switch fieldType {
 	case FieldTypeSet, FieldTypeTime:
-		if viewUpdate.View == "" {
-			viewUpdate.View = "standard"
-		}
-		// add 'standard_' if we just have a time... this is how IDK works by default
-		if fieldType == FieldTypeTime && !strings.HasPrefix(viewUpdate.View, viewStandard) {
-			viewUpdate.View = fmt.Sprintf("%s_%s", viewStandard, viewUpdate.View)
+		switch viewUpdate.View {
+		case "":
+			viewUpdate.View = viewStandard
+		case viewStandard, viewExistence:
+			// do nothing, these are fine
+		default:
+			if fieldType == FieldTypeTime && !strings.HasPrefix(viewUpdate.View, viewStandard) {
+				viewUpdate.View = viewStandard + "_" + viewUpdate.View
+			}
 		}
 	case FieldTypeInt, FieldTypeDecimal, FieldTypeTimestamp:
 		if viewUpdate.View == "" {
@@ -2038,21 +2056,20 @@ func (api *API) ImportValueWithTx(ctx context.Context, qcx *Qcx, req *ImportValu
 	return nil
 }
 
-func importExistenceColumns(qcx *Qcx, index *Index, columnIDs []uint64, shard uint64) error {
+func importExistenceColumns(qcx *Qcx, index *Index, columnIDs []uint64, shard uint64) (err0 error) {
 	ef := index.existenceField()
 	if ef == nil {
 		return nil
 	}
-
-	existenceRowIDs := make([]uint64, len(columnIDs))
-	// If we don't gratuitously hand-duplicate things in field.Import,
-	// the fact that fragment.bulkImport rewrites its row and column
-	// lists can burn us if we don't make a copy before doing the
-	// existence field write.
-	columnCopy := make([]uint64, len(columnIDs))
-	copy(columnCopy, columnIDs)
-	options := ImportOptions{}
-	return ef.Import(qcx, existenceRowIDs, columnCopy, nil, shard, &options)
+	tx, finisher, err := qcx.GetTx(Txo{Write: true, Index: index, Shard: shard})
+	if err != nil {
+		return err
+	}
+	defer finisher(&err0)
+	// markExistingInView is simpler/faster than Import, but unusually, we use the
+	// standard view of the existence field, instead of the existence view of
+	// a specific field, when doing the index-wide update.
+	return ef.markExistingInView(tx, columnIDs, viewStandard, shard)
 }
 
 // ShardDistribution returns an object representing the distribution of shards

api_directive.go

Lines changed: 1 addition & 1 deletion
@@ -964,7 +964,7 @@ func createField(idx *Index, fld *dax.Field) error {
 		return errors.Wrapf(err, "creating field options from field: %s", fld.Name)
 	}
 
-	if _, err := idx.CreateField(string(fld.Name), "", opts...); err != nil {
+	if _, err := idx.createNullableField(string(fld.Name), "", opts...); err != nil {
 		return errors.Wrapf(err, "creating field on index: %s", fld.Name)
 	}
 	return nil

batch/batch.go

Lines changed: 57 additions & 6 deletions
@@ -573,7 +573,7 @@ func (b *Batch) Add(rec Row) error {
 		case int64:
 			b.values[field.Name] = append(b.values[field.Name], val)
 		case []string:
-			if len(val) == 0 {
+			if val == nil {
 				continue
 			}
 			rowIDSets, ok := b.rowIDSets[field.Name]
@@ -608,7 +608,8 @@ func (b *Batch) Add(rec Row) error {
 			}
 			b.rowIDSets[field.Name] = append(rowIDSets, rowIDs)
 		case []uint64:
-			if len(val) == 0 {
+			// if length is 0, that's still a valid, empty, set
+			if val == nil {
 				continue
 			}
 			rowIDSets, ok := b.rowIDSets[field.Name]
@@ -663,6 +664,9 @@ func (b *Batch) Add(rec Row) error {
 
 	for i, uval := range rec.Clears {
 		field := b.header[i]
+		if field.Options.Type == featurebase.FieldTypeMutex && uval != nil {
+			return errors.Errorf("individual-bit clears not allowed on mutex fields; use nil to clear a mutex")
+		}
 		if _, ok := b.clearRowIDs[i]; !ok {
 			b.clearRowIDs[i] = make(map[int]uint64)
 		}
@@ -1245,7 +1249,7 @@ func (b *Batch) doImport(frags, clearFrags fragments) error {
 			}
 
 			ferr := b.importer.ImportRoaringBitmap(ctx, b.tbl.ID, fld, shard, viewMap, false)
-			b.log.Debugf("imp-roar field: %s, shard:%d, views:%d %v", field, shard, len(clearViewMap), time.Since(starty))
+			b.log.Debugf("imp-roar field: %s, shard:%d, views:%d %v", field, shard, len(viewMap), time.Since(starty))
			return errors.Wrapf(ferr, "importing data for %s", field)
 		})
 	}
@@ -1343,6 +1347,7 @@ func (b *Batch) makeFragments(frags, clearFrags fragments) (fragments, fragments
 		curShard := ^uint64(0) // impossible sentinel value for shard.
 		var curBM *roaring.Bitmap
 		var clearBM *roaring.Bitmap
+		var existCurBM *roaring.Bitmap
 		for j := range b.ids {
 			col := b.ids[j]
 			row := nilSentinel
@@ -1355,8 +1360,12 @@ func (b *Batch) makeFragments(frags, clearFrags fragments) (fragments, fragments
 
 			if col/shardWidth != curShard {
 				curShard = col / shardWidth
+				// the API treats "" as standard
 				curBM = frags.GetOrCreate(curShard, field.Name, "")
 				clearBM = clearFrags.GetOrCreate(curShard, field.Name, "")
+				if opts.TrackExistence {
+					existCurBM = frags.GetOrCreate(curShard, field.Name, "existence")
+				}
 			}
 			if row != nilSentinel {
 				// TODO this is super ugly, but we want to avoid setting
@@ -1366,6 +1375,9 @@ func (b *Batch) makeFragments(frags, clearFrags fragments) (fragments, fragments
 				// the NoStandardView case would be great.
 				if !(opts.Type == featurebase.FieldTypeTime && opts.NoStandardView) {
 					curBM.DirectAdd(row*shardWidth + (col % shardWidth))
+					if opts.TrackExistence {
+						existCurBM.DirectAdd(col % shardWidth)
+					}
 				}
 				if opts.Type == featurebase.FieldTypeTime {
 					views, err := b.times[j].views(opts.TimeQuantum)
@@ -1386,6 +1398,11 @@ func (b *Batch) makeFragments(frags, clearFrags fragments) (fragments, fragments
 				// we want to make sure that at this point, the "set"
 				// fragments don't contain the bit that we're clearing
 				curBM.DirectRemoveN(clearRow*shardWidth + (col % shardWidth))
+				// don't set the existence bit, probably? i don't actually quite
+				// understand the higher level semantics here.
+				if opts.TrackExistence {
+					existCurBM.DirectRemoveN(col % shardWidth)
+				}
 			}
 		}
 	}
@@ -1404,14 +1421,22 @@ func (b *Batch) makeFragments(frags, clearFrags fragments) (fragments, fragments
 		opts := field.Options
 		curShard := ^uint64(0) // impossible sentinel value for shard.
 		var curBM *roaring.Bitmap
+		var existCurBM *roaring.Bitmap
 		for j := range b.ids {
 			col, rowIDs := b.ids[j], rowIDSets[j]
-			if len(rowIDs) == 0 {
-				continue
-			}
 			if col/shardWidth != curShard {
 				curShard = col / shardWidth
 				curBM = frags.GetOrCreate(curShard, fname, "")
+				if opts.TrackExistence {
+					existCurBM = frags.GetOrCreate(curShard, fname, "existence")
+				}
+			}
+			if len(rowIDs) == 0 {
+				// you can validly specify an empty set, which is not the same as a null
+				if opts.TrackExistence && !(opts.Type == featurebase.FieldTypeTime && opts.NoStandardView) && rowIDs != nil {
+					existCurBM.DirectAdd(col % shardWidth)
+				}
+				continue
 			}
 			// TODO this is super ugly, but we want to avoid setting
 			// bits on the standard view in the specific case when
@@ -1422,6 +1447,9 @@ func (b *Batch) makeFragments(frags, clearFrags fragments) (fragments, fragments
 				for _, row := range rowIDs {
 					curBM.DirectAdd(row*shardWidth + (col % shardWidth))
 				}
+				if opts.TrackExistence {
+					existCurBM.DirectAdd(col % shardWidth)
+				}
 			}
 			if opts.Type == featurebase.FieldTypeTime {
 				views, err := b.times[j].views(opts.TimeQuantum)
@@ -1549,6 +1577,11 @@ func (b *Batch) makeSingleValFragments(frags, clearFrags fragments) (fragments,
 		shard := ids[0] / shardWidth
 		bitmap := frags.GetOrCreate(shard, field.Name, "standard")
 		clearBM := clearFrags.GetOrCreate(shard, field.Name, "standard")
+		var existBM, existClearBM *roaring.Bitmap
+		if field.Options.TrackExistence {
+			existBM = frags.GetOrCreate(shard, field.Name, "existence")
+			existClearBM = clearFrags.GetOrCreate(shard, field.Name, "existence")
+		}
 		for i, id := range ids {
 			if i+1 < len(ids) {
 				// we only want the last value set for each id
@@ -1561,13 +1594,22 @@ func (b *Batch) makeSingleValFragments(frags, clearFrags fragments) (fragments,
 				shard = id / shardWidth
 				bitmap = frags.GetOrCreate(shard, field.Name, "standard")
 				clearBM = clearFrags.GetOrCreate(shard, field.Name, "standard")
+				if field.Options.TrackExistence {
+					existBM = frags.GetOrCreate(shard, field.Name, "existence")
+					existClearBM = clearFrags.GetOrCreate(shard, field.Name, "existence")
+				}
 			}
 			fragmentColumn := id % shardWidth
 			clearBM.Add(fragmentColumn) // Will use this to clear columns.
 			if row != clearSentinel {
 				// clearSentinel is used for deletion
 				// so this value should only be added if its not clearSentinel
 				bitmap.Add(row*shardWidth + fragmentColumn)
+				if field.Options.TrackExistence {
+					existBM.Add(fragmentColumn)
+				}
+			} else if field.Options.TrackExistence {
+				existClearBM.Add(fragmentColumn)
 			}
 		}
 	}
@@ -1596,6 +1638,11 @@ func (b *Batch) makeSingleValFragments(frags, clearFrags fragments) (fragments,
 			fragmentColumn := recID % shardWidth
 
 			clearBM.Add(fragmentColumn)
+			if field.Options.TrackExistence {
+				existClearBM := clearFrags.GetOrCreate(shard, field.Name, "existence")
+
+				existClearBM.Add(fragmentColumn)
+			}
 		}
 	}
 
@@ -1618,6 +1665,10 @@ func (b *Batch) makeSingleValFragments(frags, clearFrags fragments) (fragments,
 
 			fragmentColumn := recID % shardWidth
 			clearBM.Add(fragmentColumn)
+			if field.Options.TrackExistence {
+				exist := frags.GetOrCreate(shard, field.Name, "existence")
+				exist.Add(fragmentColumn)
+			}
 
 			if boolVal {
 				bitmap.Add(trueRowOffset + fragmentColumn)
batch/batch_test.go

Lines changed: 19 additions & 4 deletions
@@ -103,6 +103,12 @@ func testStringSliceCombos(t *testing.T, importer featurebase.Importer, sapi fea
 		Index: idx.Name,
 		Query: "TopN(a1, n=10)",
 	})
+	if resp.Err != nil {
+		t.Fatalf("unexpected error from TopN query: %v", resp.Err)
+	}
+	if len(resp.Results) < 1 {
+		t.Fatalf("expected non-empty result set, got empty results")
+	}
 	pairsField, ok := resp.Results[0].(*featurebase.PairsField)
 	assert.True(t, ok, "wrong return type: %T", resp.Results[0])
 
@@ -508,10 +514,11 @@ func testStringSliceEmptyAndNil(t *testing.T, importer featurebase.Importer, sap
 		{
 			Name: "strslice",
 			Options: featurebase.FieldOptions{
-				Type:      featurebase.FieldTypeSet,
-				Keys:      true,
-				CacheType: featurebase.CacheTypeRanked,
-				CacheSize: 100,
+				Type:           featurebase.FieldTypeSet,
+				Keys:           true,
+				CacheType:      featurebase.CacheTypeRanked,
+				CacheSize:      100,
+				TrackExistence: true,
 			},
 		},
 	},
@@ -611,6 +618,14 @@ func testStringSliceEmptyAndNil(t *testing.T, importer featurebase.Importer, sap
 			pql: "Row(strslice='z')",
 			exp: []uint64{2},
 		},
+		{
+			pql: "Row(strslice==null)",
+			exp: []uint64{1},
+		},
+		{
+			pql: "Row(strslice!=null)",
+			exp: []uint64{0, 2, 3, 4},
+		},
 	}
 	for i, test := range tests {
 		t.Run(fmt.Sprintf("test-%d", i), func(t *testing.T) {

catcher.go

Lines changed: 11 additions & 0 deletions
@@ -124,6 +124,17 @@ func (c *catcherTx) Remove(index, field, view string, shard uint64, a ...uint64)
 	return c.b.Remove(index, field, view, shard, a...)
 }
 
+func (c *catcherTx) Removed(index, field, view string, shard uint64, a ...uint64) (changed []uint64, err error) {
+
+	defer func() {
+		if r := recover(); r != nil {
+			vprint.AlwaysPrintf("see Removed() PanicOn '%v' at '%v'", r, vprint.Stack())
+			vprint.PanicOn(r)
+		}
+	}()
+	return c.b.Removed(index, field, view, shard, a...)
+}
+
 func (c *catcherTx) Contains(index, field, view string, shard uint64, key uint64) (exists bool, err error) {
 
 	defer func() {

dax/queryer/orchestrator.go

Lines changed: 3 additions & 3 deletions
@@ -3302,6 +3302,9 @@ func (o *orchestrator) translateResult(ctx context.Context, qtbl *dax.QualifiedT
 				return nil, errors.Wrapf(err, "orch: translating IDs of field %q", v)
 			}
 			mapper = func(ids []uint64) (interface{}, error) {
+				if ids == nil {
+					return []string(nil), nil
+				}
 				keys := make([]string, len(ids))
 				for i, id := range ids {
 					keys[i] = translations[id]
@@ -3311,9 +3314,6 @@ func (o *orchestrator) translateResult(ctx context.Context, qtbl *dax.QualifiedT
 		} else {
 			datatype = "[]uint64"
 			mapper = func(ids []uint64) (interface{}, error) {
-				if ids == nil {
-					ids = []uint64{}
-				}
 				return ids, nil
 			}
 		}

dax/table.go

Lines changed: 1 addition & 0 deletions
@@ -823,4 +823,5 @@ type FieldOptions struct {
 	TimeQuantum    TimeQuantum   `json:"time-quantum,omitempty"`
 	TTL            time.Duration `json:"ttl,omitempty"`
 	ForeignIndex   string        `json:"foreign-index,omitempty"`
+	TrackExistence bool          `json:"track-existence"`
 }
