Send CF Logs to New Relic and S3 #2961

Merged: 91 commits, Dec 20, 2023

Conversation

@asteel-gsa (Contributor) commented Dec 11, 2023

Issue: #1019

Sketch:

```mermaid
flowchart TD
    A(S3 Bucket) -->|Bind Service| B{Logshipper App}
    D(New Relic Credentials) -->|Bind Service| B
    C(Logshipper Credentials) -->|Bind Service| B
    B -->|Send Logs| F(S3 Bucket)
    B -->|Send Logs| G(New Relic)
    E -->|Send Logs| B
    B --> H(Logdrain)
    H --> |Bind Service| E(gsa-fac Application)
```

Key Components:

  • Logshipper credentials, defined as cg-logshipper-creds with a tag of logshipper-creds, hold a random user/pass pair generated via Terraform; this pair supplies the ${HTTP_USER} and ${HTTP_PASS} credential set for the syslog_drain.
  • New Relic credentials, defined as newrelic-creds, are not a new credential service; the existing service has been given a tag of newrelic-creds and now carries the logs URI endpoint in addition to its existing credential contents.
  • The S3 bucket, defined as log-storage with a tag of logshipper-s3, is a dedicated bucket whose sole purpose is ingesting and storing all logs sent by fluentbit. The credentials for this bucket will be given to GSA SoC so that they can see the application metrics and logs.
  • The logdrain service, defined as fac-logdrain, is bound to the gsa-fac application so that it can stream logs to the cg-logshipper application. It consists of a syslog_drain URI. (A quick tag check is sketched right after this list.)
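
To spot-check that each service instance carries the expected tag, one option is a quick pass with cf curl against the v3 API. This is only a sketch, assuming a logged-in cf CLI targeting the right org/space and jq on the workstation; the service names come from the list above:

```sh
# Hedged sketch: print the tags on each service instance named above.
for svc in cg-logshipper-creds newrelic-creds log-storage fac-logdrain; do
  echo "$svc:"
  cf curl "/v3/service_instances?names=$svc" | jq '.resources[].tags'
done
```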

NOTE:
For now, these service names remain consistent with the .profile in the logshipper repo. Once PR #11 is merged, the .profile will drop its constraints on the service names and instead rely on the tags.
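
As an illustration of what that tag-based lookup could look like, here is a minimal sketch (not the actual .profile). It assumes jq is available in the container, the standard VCAP_SERVICES layout for user-provided services, and that the credential keys are literally HTTP_USER and HTTP_PASS:

```sh
# Hypothetical tag-based credential lookup inside the logshipper container.
# Selects the user-provided service tagged "logshipper-creds" instead of matching on its name.
creds=$(echo "$VCAP_SERVICES" | jq -c '
  .["user-provided"][]
  | select((.tags // []) | index("logshipper-creds"))
  | .credentials')
export HTTP_USER=$(echo "$creds" | jq -r '.HTTP_USER')
export HTTP_PASS=$(echo "$creds" | jq -r '.HTTP_PASS')
```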

Validation:

  • Navigating to New Relic and going to the "Global Logs Page" (which differs from the "Application Logs Page"), we can see the following using the filter newrelic.source:"api.logs":
    [screenshot: global logs view filtered to newrelic.source:"api.logs"]

The Fluent Bit - Heartbeat is a 60-second "heartbeat" that lets us easily tell whether fluentbit is still operating. The rows that appear empty are in fact application metrics and logs: the message field is "empty" but the "raw_message" is not, so clicking a log entry shows the specifics, along with the tags.* attributes of the "raw_message" that make things reasonably indexable.
[screenshots: expanded log entries showing raw_message and tags.* attributes]

  • Generating a temporary service key for the log-storage s3 bucket lets us confirm that logs from fluent bit are in fact present (a sketch of creating and using such a key follows the listing below):
$ aws s3 ls s3://$BUCKET/fluent-bit-logs/2023/12/
                           PRE 08/
                           PRE 09/
                           PRE 10/
                           PRE 11/
$ aws s3 ls s3://$BUCKET/fluent-bit-logs/2023/12
                           PRE 12/

$ aws s3 ls s3://$BUCKET/fluent-bit-logs/2023/12/11/
                           PRE 00/
                           PRE 01/
                           PRE 02/
                           PRE 03/
                           PRE 04/
                           PRE 05/
                           PRE 06/
                           PRE 07/
                           PRE 08/
                           PRE 09/
                           PRE 10/
                           PRE 11/
                           PRE 12/
                           PRE 13/
                           PRE 14/

$ aws s3 ls s3://$BUCKET/fluent-bit-logs/2023/12/11/14/08/
2023-12-11 09:18:05     730315 04-objectEm0dSys4

A full path looks like fluent-bit-logs/YYYY/MM/DD/HH/mm/object-chunk-name
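
For reference, here is a minimal sketch of how the temporary service key used above can be created and consumed. The key name is made up, and the credential field names returned by the broker (access_key_id, secret_access_key, region, bucket) are assumptions about the cloud.gov S3 service:

```sh
# Create a temporary service key against the log-storage bucket, read its credentials,
# use them for the listing, then clean the key up.
cf create-service-key log-storage soc-temp-key
cf service-key log-storage soc-temp-key     # prints the bucket credentials as JSON

export AWS_ACCESS_KEY_ID=...                # copy these values from the service-key output
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=...               # the region field from the key
export BUCKET=...                           # the bucket field from the key

aws s3 ls "s3://$BUCKET/fluent-bit-logs/"

cf delete-service-key log-storage soc-temp-key -f
```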

Testing:

  • This has been exclusively tested and validated in the preview environment.

github-actions bot commented Dec 11, 2023

File Coverage Missing
All files 86%
api/serializers.py 88% 177-178 183 188
api/test_views.py 95% 103
api/uei.py 88% 87 118-119 163 167-168
api/views.py 98% 198-199 337-338
audit/forms.py 47% 22-29 142-149
audit/intake_to_dissemination.py 88% 71-72 205-211 261 300-306
audit/test_commands.py 87%
audit/test_manage_submission_access_view.py 98% 15 19
audit/test_mixins.py 90% 112-113 117-119 184-185 189-191
audit/test_validators.py 95% 436 440 608-609 848 855 862 869
audit/test_views.py 95% 410-442 451-482 491-519
audit/test_workbooks_should_fail.py 88% 56 83-84 88
audit/test_workbooks_should_pass.py 90% 56 71
audit/utils.py 70% 13 21 33-35 38
audit/validators.py 92% 137 189 283-292 299-308 486-490 495-499 515-524
audit/cross_validation/additional_ueis.py 93% 33
audit/cross_validation/check_award_ref_declaration.py 90%
audit/cross_validation/check_award_reference_uniqueness.py 93%
audit/cross_validation/check_certifying_contacts.py 87%
audit/cross_validation/check_findings_count_consistency.py 91%
audit/cross_validation/check_ref_number_in_cap.py 90%
audit/cross_validation/check_ref_number_in_findings_text.py 90%
audit/cross_validation/errors.py 78% 30 69
audit/cross_validation/naming.py 93% 197
audit/cross_validation/submission_progress_check.py 95% 80
audit/cross_validation/tribal_data_sharing_consent.py 81% 33 36 40
audit/cross_validation/validate_general_information.py 93% 28-29
audit/fixtures/single_audit_checklist.py 55% 146-183 229-238
audit/intakelib/exceptions.py 71% 7-9 12
audit/intakelib/intermediate_representation.py 91% 27-28 73 91 129 200-203 212-213
audit/intakelib/mapping_audit_findings.py 97% 55
audit/intakelib/mapping_audit_findings_text.py 97% 52
audit/intakelib/mapping_federal_awards.py 93% 90
audit/intakelib/mapping_util.py 81% 21 25 29 99 104-105 114-120 130 145 150
audit/intakelib/checks/check_all_unique_award_numbers.py 79% 24
audit/intakelib/checks/check_aln_three_digit_extension_pattern.py 77% 37
audit/intakelib/checks/check_cluster_names.py 80% 20-25
audit/intakelib/checks/check_cluster_total.py 80% 56 72 75
audit/intakelib/checks/check_federal_program_total.py 88% 28
audit/intakelib/checks/check_finding_prior_references_pattern.py 73% 33 43-44
audit/intakelib/checks/check_findings_grid_validation.py 84% 58
audit/intakelib/checks/check_has_all_the_named_ranges.py 84% 52
audit/intakelib/checks/check_is_a_workbook.py 69% 20
audit/intakelib/checks/check_loan_balance_entries.py 79% 26 43-44
audit/intakelib/checks/check_loan_balance_present.py 76% 27 36
audit/intakelib/checks/check_look_for_empty_rows.py 91% 18
audit/intakelib/checks/check_no_major_program_no_type.py 76% 18 27
audit/intakelib/checks/check_no_repeat_findings.py 76% 21 30
audit/intakelib/checks/check_other_cluster_names.py 81% 24 34
audit/intakelib/checks/check_passthrough_name_when_no_direct.py 70% 9 26 38 47
audit/intakelib/checks/check_sequential_award_numbers.py 76% 14 22
audit/intakelib/checks/check_show_ir.py 70% 8-14
audit/intakelib/checks/check_start_and_end_rows_of_all_columns_are_same.py 89% 14
audit/intakelib/checks/check_state_cluster_names.py 65% 23-24 34
audit/intakelib/checks/check_total_amount_expended.py 70% 23
audit/intakelib/checks/check_version_number.py 73% 21 31-32
audit/intakelib/checks/runners.py 96% 140
audit/intakelib/common/util.py 90% 22 39
audit/intakelib/transforms/xform_reformat_prior_references.py 55% 12-17
audit/intakelib/transforms/xform_rename_additional_notes_sheet.py 81% 14
audit/management/commands/load_fixtures.py 46% 39-45
audit/models/models.py 82% 57 59 64 66 216 228-231 249 426 444-445 453 475 564-565 569 577 586 592
audit/views/audit_info_form_view.py 27% 25-74 77-117 120-137
audit/views/pre_dissemination_download_view.py 76% 15-18 23-26 33-43
audit/views/submission_progress_view.py 89% 111 171-172
audit/views/tribal_data_consent.py 34% 23-41 44-79
audit/views/unlock_after_certification.py 57% 28-47 69-83
audit/views/upload_report_view.py 26% 32-35 44 91-117 120-170 178-209
audit/views/views.py 39% 69 76-95 118-119 193-194 239-240 251-252 254-258 305-318 321-335 340-353 370-376 381-401 404-432 437-466 469-513 518-538 541-569 574-603 606-650 655-667 670-680 685-697 724-725
census_historical_migration/test_federal_awards_xforms.py 99% 191-192
census_historical_migration/sac_general_lib/audit_information.py 70% 19-23 72-78 157-169
census_historical_migration/sac_general_lib/general_information.py 83% 172 241-262
census_historical_migration/sac_general_lib/report_id_generator.py 68% 11-16
census_historical_migration/sac_general_lib/utils.py 87% 32-35
census_historical_migration/transforms/xform_retrieve_uei.py 42% 7-12
census_historical_migration/workbooklib/additional_eins.py 83% 58 65-79
census_historical_migration/workbooklib/excel_creation_utils.py 53% 102 145-150 155-162 166-184 197-201 209-234 239
census_historical_migration/workbooklib/federal_awards.py 56% 132 145-169 270-424
census_historical_migration/workbooklib/notes_to_sefa.py 53% 33-37 87-93 101-107 115 124-168
config/test_settings.py 90% 21-22 39-40 55-56
config/urls.py 71% 83
dissemination/file_downloads.py 73% 35-53 83-85
dissemination/models.py 99% 461
dissemination/search.py 84% 54 84 90 119 121 124-132
dissemination/summary_reports.py 69% 268-270 274-278 347-391 416 452-454 469-476
dissemination/views.py 80% 111 113-115 118 145 197 199 201 230-234
dissemination/migrations/0002_general_fac_accepted_date.py 47% 10-12
djangooidc/backends.py 78% 32 57-63
djangooidc/exceptions.py 66% 19 21 23 28
djangooidc/oidc.py 16% 32-35 45-51 64-70 92-149 153-199 203-226 230-275 280-281 286
djangooidc/views.py 80% 22 43 114
djangooidc/tests/common.py 96%
report_submission/forms.py 92% 35
report_submission/views.py 76% 83 215-216 218 240-241 260-261 287-396 399-409
report_submission/templatetags/get_attr.py 76% 8 11-14 18
support/admin.py 88% 76 79 84 91-97 100-102
support/cog_over.py 91% 30-33 93 145
support/test_admin_api.py 80% 22 146-147 236-237 316-317
support/test_cog_over.py 98% 134-135 224
support/management/commands/seed_cog_baseline.py 98% 20-21
support/models/cog_over.py 89% 103-104
tools/update_program_data.py 89% 96
users/admin.py 99% 27
users/auth.py 96% 58-59
users/models.py 96% 18 74-75
users/fixtures/user_fixtures.py 91%

Minimum allowed coverage is 85%

Generated by 🐒 cobertura-action against 25c6a1e

@asteel-gsa (Contributor, Author)

Per review with jadud, dan, and tadhg: is there some way, via API call or some other means, to see the fluentbit heartbeat? The use case is that we want to know the logdrain was successfully deployed after everything. (This really falls under the umbrella of enhancing our testing suite and having API-based smoke tests in our environments.)
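
One possible approach, sketched here rather than implemented in this PR, is to query the heartbeat log lines back out of New Relic through NerdGraph's NRQL endpoint. The account id (1234567), the NEW_RELIC_API_KEY environment variable, and the assumption that the heartbeat text lands in the message attribute are all placeholders/assumptions:

```sh
# Hypothetical smoke test: count Fluent Bit heartbeat log lines from the last 10 minutes.
# A count of zero after a deploy would suggest the logdrain/fluentbit path is not shipping.
curl -s https://api.newrelic.com/graphql \
  -H "Content-Type: application/json" \
  -H "API-Key: $NEW_RELIC_API_KEY" \
  --data @- <<'EOF'
{ "query": "{ actor { account(id: 1234567) { nrql(query: \"SELECT count(*) FROM Log WHERE message LIKE '%Fluent Bit%' SINCE 10 minutes ago\") { results } } } }" }
EOF
```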

@asteel-gsa (Contributor, Author)

@mogul any chance you are available for a review? I would like to move this into production so we can get the credentials over to SoC and satisfy that POA&M.

@mogul (Contributor) previously requested changes Dec 20, 2023

Only minor fixes requested, nearly all docs-only. This looks good to go otherwise!


```
# This has to happen after an application deployment because the manifest (currently) is responsible
# for binding the "logdrain service" to the "gsa-fac application". This also needs to be done
# based on the suspicion that fluentbit cannot register the incoming logs when it is initially
```
@mogul (Contributor):
I'd still like to know what's going on here, but I'm OK with leaving this in if it's reliably addressing the problem.

@asteel-gsa (Contributor, Author):

As far as I can tell, yes, this reliably addresses the issue. Once everything is configured and the app is deployed, I only ever saw it return a 201 (instead of the 502) after a restart, or after some minor modification to the module. Probably something to investigate, but I suppose that is the beauty of being the first consumers of this 🤣
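
For context, the binding the script comment above refers to is roughly equivalent to the following manual steps; this is only a sketch of the general cf workflow, since the PR itself drives the binding through the application manifest:

```sh
# Roughly what the manifest-driven step amounts to, done by hand (sketch only).
cf bind-service gsa-fac fac-logdrain   # attach the syslog drain service to the application
cf restart gsa-fac                     # restart so the new binding takes effect
```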

Resolved (outdated) review threads on:
  • terraform/shared/modules/cg-logshipper/readme.md
  • terraform/shared/modules/cg-logshipper/variables.tf
  • terraform/shared/modules/env/logshipper.tf (3 threads)

github-actions bot commented Dec 20, 2023

Terraform plan for dev

Plan: 8 to add, 1 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # module.dev.cloudfoundry_network_policy.logshipper-network-policy will be created
  + resource "cloudfoundry_network_policy" "logshipper-network-policy" {
      + id = (known after apply)

      + policy {
          + destination_app = "a6eb3870-2769-4aa9-a8f2-f1a31939f563"
          + port            = "61443"
          + protocol        = "tcp"
          + source_app      = (known after apply)
        }
    }

  # module.dev.cloudfoundry_user_provided_service.credentials will be updated in-place
  ~ resource "cloudfoundry_user_provided_service" "credentials" {
      ~ credentials = (sensitive value)
        id          = "03df74b7-065a-46df-9a85-0bac201bf36c"
        name        = "newrelic-creds"
      ~ tags        = [
          + "newrelic-creds",
        ]
        # (1 unchanged attribute hidden)
    }

  # module.dev.module.cg-logshipper.cloudfoundry_app.cg_logshipper_app will be created
  + resource "cloudfoundry_app" "cg_logshipper_app" {
      + buildpack                       = (known after apply)
      + buildpacks                      = [
          + "https://github.com/cloudfoundry/apt-buildpack",
          + "nginx_buildpack",
        ]
      + disk_quota                      = 256
      + enable_ssh                      = (known after apply)
      + environment                     = (sensitive value)
      + health_check_http_endpoint      = (known after apply)
      + health_check_invocation_timeout = (known after apply)
      + health_check_timeout            = (known after apply)
      + health_check_type               = "process"
      + id                              = (known after apply)
      + id_bg                           = (known after apply)
      + instances                       = 1
      + memory                          = 256
      + name                            = "logshipper"
      + path                            = "../shared/modules/cg-logshipper/logshipper.zip"
      + ports                           = (known after apply)
      + source_code_hash                = "d0a1938f9e105317e885d5e92d327e9845916cd7dd4432c08afa2e223c530c14"
      + space                           = "06525ba3-19c2-451b-96e9-ea4a9134e8b9"
      + stack                           = (known after apply)
      + stopped                         = false
      + strategy                        = "rolling"
      + timeout                         = 180

      + routes {
          + port  = (known after apply)
          + route = (known after apply)
        }

      + service_binding {
          + service_instance = "03df74b7-065a-46df-9a85-0bac201bf36c"
        }
      + service_binding {
          + service_instance = (known after apply)
        }
      + service_binding {
          + service_instance = (known after apply)
        }
    }

  # module.dev.module.cg-logshipper.cloudfoundry_route.logshipper will be created
  + resource "cloudfoundry_route" "logshipper" {
      + domain   = "50ba3f69-cd54-4963-9172-14f3334b479e"
      + endpoint = (known after apply)
      + hostname = "fac-dev-logshipper"
      + id       = (known after apply)
      + port     = (known after apply)
      + space    = "06525ba3-19c2-451b-96e9-ea4a9134e8b9"
    }

  # module.dev.module.cg-logshipper.cloudfoundry_user_provided_service.logdrain_service will be created
  + resource "cloudfoundry_user_provided_service" "logdrain_service" {
      + id               = (known after apply)
      + name             = "fac-logdrain"
      + space            = "06525ba3-19c2-451b-96e9-ea4a9134e8b9"
      + syslog_drain_url = (sensitive value)
    }

  # module.dev.module.cg-logshipper.cloudfoundry_user_provided_service.logshipper_creds will be created
  + resource "cloudfoundry_user_provided_service" "logshipper_creds" {
      + credentials = (sensitive value)
      + id          = (known after apply)
      + name        = "cg-logshipper-creds"
      + space       = "06525ba3-19c2-451b-96e9-ea4a9134e8b9"
      + tags        = [
          + "logshipper-creds",
        ]
    }

  # module.dev.module.cg-logshipper.random_password.password will be created
  + resource "random_password" "password" {
      + bcrypt_hash = (sensitive value)
      + id          = (known after apply)
      + length      = 16
      + lower       = true
      + min_lower   = 0
      + min_numeric = 0
      + min_special = 0
      + min_upper   = 0
      + number      = true
      + numeric     = true
      + result      = (sensitive value)
      + special     = false
      + upper       = true
    }

  # module.dev.module.cg-logshipper.random_uuid.username will be created
  + resource "random_uuid" "username" {
      + id     = (known after apply)
      + result = (known after apply)
    }

  # module.dev.module.cg-logshipper.module.s3-logshipper-storage.cloudfoundry_service_instance.bucket will be created
  + resource "cloudfoundry_service_instance" "bucket" {
      + id                             = (known after apply)
      + name                           = "log-storage"
      + replace_on_params_change       = false
      + replace_on_service_plan_change = false
      + service_plan                   = "021bb2a3-7e11-4fc2-b06b-d9f5938cd806"
      + space                          = "06525ba3-19c2-451b-96e9-ea4a9134e8b9"
      + tags                           = [
          + "logshipper-s3",
        ]
    }

Plan: 8 to add, 1 to change, 0 to destroy.

Warning: Argument is deprecated

  with module.dev.module.cg-logshipper.module.s3-logshipper-storage.cloudfoundry_service_instance.bucket,
  on /tmp/terraform-data-dir/modules/dev.cg-logshipper.s3-logshipper-storage/s3/main.tf line 14, in resource "cloudfoundry_service_instance" "bucket":
  14:   recursive_delete = var.recursive_delete

Since CF API v3, recursive delete is always done on the cloudcontroller side.
This will be removed in future releases

(and 5 more similar warnings elsewhere)

❌ Plan not applied in Deploy to Development and Management Environment #436 (Plan has changed)


github-actions bot commented Dec 20, 2023

Terraform plan for meta

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

Warning: Argument is deprecated

  with module.s3-backups.cloudfoundry_service_instance.bucket,
  on /tmp/terraform-data-dir/modules/s3-backups/s3/main.tf line 14, in resource "cloudfoundry_service_instance" "bucket":
  14:   recursive_delete = var.recursive_delete

Since CF API v3, recursive delete is always done on the cloudcontroller side.
This will be removed in future releases

✅ Plan applied in Deploy to Development and Management Environment #436

Labels: infrastructure (For the invisible bits that make it all go.)
Projects: Status: Done
Successfully merging this pull request may close: FAC-D-2023-015 - 2023-11-14 Set up a logging service
2 participants