Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jaeger-v2] add gRPC integration e2e test mode #5322

Merged
merged 31 commits into from
Apr 9, 2024

Conversation

james-ryans
Copy link
Contributor

@james-ryans james-ryans commented Apr 3, 2024

Which problem is this PR solving?

Description of the changes

  • Utilizing existing StorageIntegration to test the jaeger-v2 OTel Collector and gRPC storage backend with the provided config file at cmd/jaeger/grpc_config.yaml.
  • Creates an abstraction for e2e test mode that initializes the collector, span writer to the receiver, and span reader from jaeger_query.

How was this change tested?

  • Run STORAGE=grpc SPAN_STORAGE_TYPE=memory make jaeger-v2-storage-integration-test

Checklist

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@james-ryans james-ryans requested a review from a team as a code owner April 3, 2024 07:15
@james-ryans
Copy link
Contributor Author

@yurishkuro There's an issue that translating Jaeger proto to OTLP format changes the tags attributes BINARY type into string at open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger/jaegerproto_to_traces.go#L231-L248 which causes the integration test failed. The remaining tests failed below are all caused by this problem. How should we solve this?

...
=== RUN   TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Tags
    integration.go:139: Waiting for storage backend to update documents, iteration 1 out of 100
    integration.go:330: FindTraces: expected: 1, actual: 0
    integration.go:139: Waiting for storage backend to update documents, iteration 2 out of 100
    trace_compare.go:38: Expected and actual differ: [0].Spans[0].Tags[0].VType: 4 != 0
    trace_compare.go:38: Expected and actual differ: [0].Spans[0].Tags[0].VStr: "" != "AAAwOQ=="
    trace_compare.go:38: Expected and actual differ: [0].Spans[0].Tags[0].VBinary: []uint8[4] != []uint8[0]
    trace_compare.go:44: Actual traces: [{"spans":[{"trace_id":"AAAAAAAAAAAAAAAAAAAAAQ==","span_id":"AAAAAAAAAAI=","operation_name":"some-operation","references":null,"flags":0,"start_time":"2024-04-02T16:46:31.639875Z","duration":7000,"tags":[{"key":"blob","v_str":"AAAwOQ=="},{"key":"sameplacetag1","v_str":"sameplacevalue"},{"key":"sameplacetag2","v_type":2,"v_int64":123},{"key":"sameplacetag3","v_type":3,"v_float64":72.5},{"key":"sameplacetag4","v_type":1,"v_bool":true}],"logs":null,"process":{"service_name":"query01-service","tags":null}}],"process_map":null}]
    trace_compare.go:45: Expected traces: [{"spans":[{"trace_id":"AAAAAAAAAAAAAAAAAAAAAQ==","span_id":"AAAAAAAAAAI=","operation_name":"some-operation","references":[],"flags":0,"start_time":"2024-04-02T16:46:31.639875Z","duration":7000,"tags":[{"key":"blob","v_type":4,"v_binary":"AAAwOQ=="},{"key":"sameplacetag1","v_str":"sameplacevalue"},{"key":"sameplacetag2","v_type":2,"v_int64":123},{"key":"sameplacetag3","v_type":3,"v_float64":72.5},{"key":"sameplacetag4","v_type":1,"v_bool":true}],"logs":[],"process":{"service_name":"query01-service","tags":[]}}],"process_map":null}]
...
--- ❌ FAIL: TestGRPCStorage (35.19s)
    --- ✅ PASS: TestGRPCStorage/GetServices (1.04s)
    --- ✅ PASS: TestGRPCStorage/GetOperations (1.04s)
    --- ✅ PASS: TestGRPCStorage/GetTrace (1.04s)
        --- ✅ PASS: TestGRPCStorage/GetTrace/NotFound_error (0.00s)
    --- ✅ PASS: TestGRPCStorage/GetLargeSpans (29.91s)
    --- ❌ FAIL: TestGRPCStorage/FindTraces (1.15s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Tags (1.01s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Logs (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Process (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Tags_in_different_spots (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Trace_spans_over_multiple_indices (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Operation_name (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Operation_name_+_max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Operation_name_+_Duration_range (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Duration_range (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/default (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Operation_name (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Operation_name_+_max_Duration (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Operation_name_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_max_Duration (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Multiple_Traces (0.00s)
❌ FAIL
coverage: 17.6% of statements in ./...
❌ FAIL	github.com/jaegertracing/jaeger/cmd/jaeger/internal/integration/e2e	36.236s
❌ FAIL

Copy link

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 87.43169% with 23 lines in your changes are missing coverage. Please review.

Project coverage is 95.17%. Comparing base (6a26005) to head (d88a6a5).

Files Patch % Lines
cmd/jaeger/internal/integration/span_reader.go 80.68% 10 Missing and 7 partials ⚠️
cmd/jaeger/internal/integration/span_writer.go 79.31% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5322      +/-   ##
==========================================
- Coverage   95.25%   95.17%   -0.09%     
==========================================
  Files         340      343       +3     
  Lines       16677    16779     +102     
==========================================
+ Hits        15886    15969      +83     
- Misses        601      610       +9     
- Partials      190      200      +10     
Flag Coverage Δ
badger 10.59% <20.83%> (-0.59%) ⬇️
cassandra-3.x 18.52% <20.83%> (-1.06%) ⬇️
cassandra-4.x 18.52% <20.83%> (-1.06%) ⬇️
elasticsearch-5.x 21.00% <20.83%> (-1.20%) ⬇️
elasticsearch-6.x 20.98% <20.83%> (-1.21%) ⬇️
elasticsearch-7.x 21.04% <20.83%> (-1.22%) ⬇️
elasticsearch-8.x 21.24% <20.83%> (-1.20%) ⬇️
grpc 14.58% <87.43%> (+3.00%) ⬆️
kafka 10.21% <8.33%> (-0.59%) ⬇️
opensearch-1.x 21.09% <20.83%> (-1.22%) ⬇️
opensearch-2.x 21.10% <20.83%> (-1.21%) ⬇️
unittests 91.73% <5.46%> (-0.57%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

cmd/jaeger/internal/integration/e2e/integration.go Outdated Show resolved Hide resolved
cmd/jaeger/internal/integration/e2e/integration.go Outdated Show resolved Hide resolved
cmd/jaeger/internal/integration/e2e/integration.go Outdated Show resolved Hide resolved
plugin/storage/integration/integration.go Outdated Show resolved Hide resolved
cmd/jaeger/internal/integration/e2e/grpc_test.go Outdated Show resolved Hide resolved
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
)

type GRPCServer struct {
errChan chan error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed? If it is, I would prefer to use a callback function, not a channel, because channels like this tend to cause deadlocks when nothing is reading them, while the callback achieves the same result of communicating the status without the risk of blocking (and the implementation of the callback can still use channel if it wants to, but that channel will be more local and less prone to deadlocks)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I have updated this to pass the error through the struct field

require.NoError(t, err)
require.NoError(t, s.factory.Close())
if s.server != nil {
require.NoError(t, s.server.Close())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to release the server here (s.server = nil)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. actually the server (GRPCServer) here is a declaration if the current test suite uses remote storage or a plugin. If it uses remote storage then it closes and starts the server again.

But the server inside GPRCServer struct is the gRPC server that handles requests, and I'll add to release the server there.

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
return err
}

func (r *spanReader) GetTrace(ctx context.Context, traceID model.TraceID) (*model.Trace, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit sad that you had to implement all of that. This is a takeway for V2 storage - perhaps it's possible to have remote storage API to be the same as query service API. Then we could've used plugin/storage/grpc/shared/client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll create a new issue for this task.

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
"github.com/jaegertracing/jaeger/ports"
)

type GRPCServer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the business function of this class? Calling it grpc server doesn't tell me anything. Is it not RemoteMemoryStorage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what is the difference between this server and cmd/remote-storage/app/server.go? Could we reuse that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually yes, we could reuse cmd/remote-storage/app/server.go. This server is just an extraction of the gRPCServer from existing grpc_test.go so I can reuse this code for the new e2e grpc_test, but I'll change this to reuse the remote-storage server.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

cc, err := grpc.DialContext(ctx, ports.PortToHostPort(ports.QueryGRPC), opts...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call NewClient, dial was deprecated

}
}

func (r *spanReader) Start() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest merging this into create function. What's the benefit of separating Start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I've seen a lot of structs that separate the creation and start functions so I thought it was cleaner that way. I'll refactor this.

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@yurishkuro
Copy link
Member

@james-ryans another possible way to unblock from waiting on OTEL contrib is to tweak the tests to not use binary tags (controlled by a flag, with a new issue to fix it once OTEL fix is released)

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@@ -75,6 +75,10 @@ type StorageIntegration struct {
// Skip Archive Test if not supported by the storage backend
SkipArchiveTest bool

// TODO: remove this after upstream issue in OTEL jaeger translator is fixed
// Skip testing trace binary tags, logs, and process
SkipBinaryAttrs bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please book a ticket referring to this commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Booked it at #5341


func (s *RemoteMemoryStorage) Close() error {
if err := s.server.Close(); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this potentially leaves storageFactory not closed. The pattern we use for multi-close is

	return errors.Join(
		b.reporter.Close(),
		b.tlsCloser.Close(),
		b.GetConn().Close(),
	)

tm := tenancy.NewManager(&opts.Tenancy)
storageFactory, err := storage.NewFactory(storage.FactoryConfigFromEnvAndCLI(os.Args, os.Stderr))
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about taking t.Testing arg and just calling require.NoError? This will eliminate the uncovered lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the easiest approach. Will do

@@ -60,7 +60,7 @@ func TestIndexCleaner_doNotFailOnEmptyStorage(t *testing.T) {
}

func TestIndexCleaner_doNotFailOnFullStorage(t *testing.T) {
skipUnlessEnv(t, "elasticsearch", "opensearch")
SkipUnlessEnv(t, "elasticsearch", "opensearch")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this little change is a good candidate for a separate PR. It would significantly reduce the number of files in this PR, making it more focused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I'll do that

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@yurishkuro yurishkuro merged commit deea97b into jaegertracing:main Apr 9, 2024
36 of 37 checks passed
yurishkuro pushed a commit that referenced this pull request May 11, 2024
## Which problem is this PR solving?
- PR #5322 temporarily added a SkipBinaryAttrs flag to avoid checking
span's tags with a binary type since the OTEL Jaeger translator has a
bug that converts binary tags into string tags. Since it has been fixed,
we will delete this flag.

## Description of the changes
- Delete the SkipBinaryAttrs flag from StorageIntegration.

## How was this change tested?
- Tested locally by running all the e2e storage tests. e.g.
`STORAGE=grpc SPAN_STORAGE_TYPE=memory make
jaeger-v2-storage-integration-test`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:test Change that's adding missing tests or correcting existing tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants