
Set up a logging service #1019

Closed
6 of 9 tasks
Tracked by #725
JeanMarie-PM opened this issue Apr 17, 2023 · 16 comments · Fixed by #2961
Assignees
Labels
compliance Stuff which may relate to a specific requirement or timelines for resolution SHOULD Things we should do for the tracking epic to be "satisfactory"

Comments


JeanMarie-PM commented Apr 17, 2023

At a glance

In order to ensure logs appear in places where people can mine and alert on them
as a FAC devops-oriented person
I want cloud.gov app logs and metrics to be shipped to New Relic (for our own alerting purposes) and an S3 bucket (for the GSA SOC to ingest for GSA IT's alerting purposes).

Acceptance Criteria

We use DRY behavior-driven development wherever possible.

Scenario: Logs are flowing to New Relic

Given I am authenticated with New Relic
When I review logs for the gsa-fac app
...

Then ...

Scenario: Logs are flowing to the S3 bucket

Given I have a service-key for the FAC logs S3 instance
When I look at the content of the S3 bucket
...

Then ...

Shepherd

Background

cloud.gov doesn't offer alerting capabilities out of the box, which is why we're going to ship logs off to New Relic, where we can set up alerts.

In addition, OMB directive M-21-31 says that agencies should funnel logs into a central agency-wide SOC, which is why we're also going to ship logs to an S3 bucket (that the GSA SOCaaS can pull from).

Security Considerations

Required per CM-4.

We are ensuring that the cg-logshipper app uses the egress proxy to communicate with New Relic, and the egress proxy requires client credentials. We're also ensuring that the cg-logshipper app itself requires client credentials. Connections to brokered S3 buckets are already routed over a cloud.gov internal endpoint. In all hops (app to logshipper, logshipper to egress proxy, logshipper to New Relic, logshipper to S3) the traffic is secured with TLS.

For our initial implementation the cg-logshipper app and S3 bucket will be in the same space as the apps whose logs it is draining. A team member acting as an insider threat could possibly tamper with the logshipper app or the bucket content using their SpaceDeveloper access. However, that's a remote concern. For our initial implementation we're considering that concern out of scope and we're noting mitigation of that concern as a "potential future enhancement" below. (Also note that the logs that go to logs.fr.cloud.gov and New Relic are tamper-resistant and serve as a comparison point for the S3 content in case an insider threat is identified.)

Sketch

We're thinking we'll write a Terraform module that deploys the cg-logshipper app, similar to the existing https-proxy module.

Since we're not all that familiar with the raw output from Cloud Foundry, it may be helpful to look at the cloud.gov ELK configuration to see how they process raw output from CF on its way into logs.fr.cloud.gov (where a number of fields are parsed out). Here are the ELK (old) and OpenSearch (new) versions of the logs.fr.cloud stack.
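To make the sketch concrete, here's a rough idea of what the module interface could look like, modeled on how the https-proxy module is consumed. This is a hypothetical sketch only: the module source path, variable names, and outputs are all assumptions, not the actual implementation.

```terraform
# Hypothetical sketch of consuming a cg-logshipper Terraform module,
# modeled on the existing https-proxy module. All names are assumptions.
module "logshipper" {
  source = "github.com/GSA-TTS/terraform-cloudgov//logshipper" # hypothetical path

  cf_org_name   = var.cf_org_name
  cf_space_name = var.cf_space_name
  name          = "logshipper"

  # Route New Relic traffic through the egress proxy, per the
  # Security Considerations above.
  https_proxy_url       = module.https-proxy.https_proxy
  new_relic_license_key = var.new_relic_license_key
}
```

The module would be expected to deploy the cg-logshipper app, broker the S3 bucket, and expose whatever drain URL the application spaces need to bind to.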

Tasks

Potential future enhancements (other stories)

For machine identification: We want a concrete test that will sieve out lines specifically delivered by the logshipper to verify that everything is working, rather than relying on a human checking the UI. In logs.fr.cloud.gov there's a cf_origin:firehose field; we are hoping we can implement something like that for the logshipper in New Relic.

For moving the logshipper app and bucket to another space: This addresses the insider threat consideration above, so that SpaceDevelopers in the app space can't create service bindings and tamper with the content of the S3 bucket; only admins (who have direct access to that other space) can do that.
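The "sieve" test for machine identification could be as simple as filtering structured log records for a logshipper marker field. This Python sketch assumes a hypothetical cf_origin attribute analogous to the cf_origin:firehose field in logs.fr.cloud.gov; the field name and values are assumptions, not confirmed output.

```python
import json

# Hypothetical structured log lines, as they might arrive in New Relic.
# The 'cf_origin' attribute mirrors logs.fr.cloud.gov's field and is an assumption.
raw_lines = [
    '{"message": "app start", "cf_origin": "logshipper"}',
    '{"message": "router event", "cf_origin": "gorouter"}',
    '{"message": "request handled", "cf_origin": "logshipper"}',
]

def shipped_by_logshipper(lines):
    """Sieve out only the records delivered by the logshipper."""
    records = (json.loads(line) for line in lines)
    return [r for r in records if r.get("cf_origin") == "logshipper"]

matches = shipped_by_logshipper(raw_lines)
print(len(matches))  # prints 2: the logshipper-delivered records
```

In practice the same filter would be expressed as an NRQL query against New Relic rather than run locally, but the assertion is the same: the count of logshipper-tagged lines must be nonzero.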


Process checklist

Sketch

  • Design designs all the things
  • Engineering engineers all the things

Definition of Done

Triage

If not likely to be important in the next quarter...

  • Archived from the board

Otherwise...

  • Has a clear story statement
  • Design or Engineering accepts that it belongs in their respective backlog

Design Backlog

  • [-] Has clearly stated/testable acceptance criteria
  • [-] Meets the design Definition of Ready [citation needed]
  • [-] A design shepherd has been identified

Design In Progress

  • [-] Meets the design Definition of Done [citation needed]

Design Review Needed

  • [-] Necessary outside review/sign-off was provided

Design Done

  • [-] Presented in a sprint review
  • [-] Includes screenshots or references to artifacts

If no engineering is necessary

  • [-] Tagged with the sprint where it was finished
  • [-] Archived

Engineering Backlog

  • Has clearly stated/testable acceptance criteria
  • Has a sketch or list of tasks
  • Can reasonably be done in a few days (otherwise, split this up!)

Engineering Available

  • There's capacity in the In Progress column
  • An engineering shepherd has been identified

Engineering In Progress

If there's UI...

  • Screen reader - Listen to the experience with a screen reader extension; ensure the information is presented in order
  • Keyboard navigation - Run through the acceptance criteria using keyboard navigation; ensure it works
  • Text scaling - Adjust viewport to 1280 pixels wide and zoom to 200%, ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.

Engineering Blocked

  • Blocker removed/resolved

Engineering Review Needed

  • Outside review/sign-off was provided

Engineering Done

  • Presented in a sprint review
  • Includes screenshots or references to artifacts
  • Tagged with the sprint where it was finished
  • Archived

JeanMarie-PM commented Apr 17, 2023

Comment in slack by @mogul

If GSA SOC allows us to ship logs to their SIEM-as-a-Service (I haven't reviewed the SSPP to see if we say we are doing this, but it's an OMB mandate that eventually all apps ship logs to an agency-centric endpoint) then we should also be deploying the logstack stuff I wrote which drops logs into an S3 endpoint that they ingest.

@JeanMarie-PM JeanMarie-PM added the compliance Stuff which may relate to a specific requirement or timelines for resolution label Apr 17, 2023
@JeanMarie-PM

It looks like basic logging info is already available in cloud.gov. What requirements do we need to address regarding logging for the MVP?
@jadudm , @mogul

@asteel-gsa

> It looks like basic logging info is already available in cloud.gov. What requirements do we need to address regarding logging for the MVP? @jadudm , @mogul

https://logs.fr.cloud.gov/ > Left Blade > Kibana > Discover


mogul commented May 2, 2023

Note:

  • Because cloud.gov's logging service doesn't give us any ability to make alerts, we should also ship logs to New Relic, because we can implement alerts there based on both logs and metric data. (cloud.gov is working on a replacement for logs.fr.cloud.gov that would resolve the need to use NR for this, but it won't be ready in time to help us with our ATO.)
  • OMB M-21-31 says (among other things) that all agency systems should be shipping logs to the authorizing agency's central SIEM/SOC service. It's not in place across the board yet for many existing GSA systems! There is a GSA SOCaaS, but the rules-of-engagement/MOU for TTS systems to be able to ship logs to it is still pending (though close). I don't know if our LATO would be held up until we are shipping logs into that SOCaaS, but it's relatively easy for us to implement a log-shipper that will put our logs into S3. Then the GSA SOCaaS can ingest them from there when the MOU is finalized.

@mogul mogul self-assigned this May 2, 2023
@asteel-gsa

@mogul mind if I get your assistance on this item?


mogul commented May 25, 2023

Let's review whether the New Relic agent is in fact picking up logs on its own or not. If not, then we should configure a log drain (using Terraform) that points to the New Relic collection point.
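A minimal sketch of what a Terraform-managed drain might look like, assuming the community Cloud Foundry provider; the endpoint URL pattern and variable names are assumptions that should be verified against New Relic's log-forwarding documentation before use.

```terraform
# Hypothetical sketch: a user-provided service acting as a log drain to
# New Relic. The endpoint URL and variable names are assumptions, not
# confirmed configuration.
resource "cloudfoundry_user_provided_service" "nr_log_drain" {
  name             = "nr-log-drain"
  space            = data.cloudfoundry_space.app_space.id
  syslog_drain_url = "https://log-api.newrelic.com/log/v1?Api-Key=${var.new_relic_license_key}"
}
```

Binding this service to the gsa-fac app would then cause Cloud Foundry to forward its log stream to the drain URL.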


mogul commented May 25, 2023

For the latter point about shipping logs to an S3 bucket for consumption by the GSA SOCaaS: there's a bullet above about that which hasn't been broken out yet. Let's consider it out of scope for this particular issue. (When we have time to take it on, it would likely work from this example, though I'd like to use a Terraform module to implement it.)


mogul commented Jul 20, 2023

@mogul mogul added the SHOULD Things we should do for the tracking epic to be "satisfactory" label Jul 29, 2023

mogul commented Oct 13, 2023

@asteel-gsa, @akf has just gotten cg-logshipper into shape; you set it up to drain logs from Cloud Foundry, and it ships them to both New Relic and S3. Do you want to work on this issue sometime soon? If so we should probably meet and talk about what it would take to implement a Terraform module that deploys cg-logshipper.

@asteel-gsa

@mogul totally. So far our New Relic implementation isn't there, pending resolution with NR support, but we can set it up whenever. Barring any backup/restore testing with JMM, or New Relic suddenly working and my digging into that task, I should be free whenever to work on this with you.


jadudm commented Nov 7, 2023

@asteel-gsa , let me know when you want to consider this issue closed. The ticket has almost nothing up top.

Is acceptance "set up NR," or is it "set up NR and ship through cg-logshipper?"

@asteel-gsa

> @asteel-gsa , let me know when you want to consider this issue closed. The ticket has almost nothing up top.
>
> Is acceptance "set up NR," or is it "set up NR and ship through cg-logshipper?"

AC would be shipping logs to NR via cg-logshipper.

Let's leave this in the backlog for now; we do want to do this, but only after we have confirmed all environments are reporting to NR.

@asteel-gsa

@mogul now that we have New Relic configured properly, do you want to find some time to get this implemented?


mogul commented Nov 16, 2023

Sure, how about late this week or Monday the week after next?

@asteel-gsa

Works for me; we can aim for Friday. I'll put some time on the calendar.


mogul commented Nov 17, 2023

Alex and I spent a good chunk of time today to sketch out the details, and we've groomed the initial post accordingly. If anyone has questions or concerns about this approach, now's a good time to bring them up, before we break ground!

Projects
Status: Done
4 participants