Pass PhEDEx metadata to FTS tasks for hadoop monitoring. #1085

Open · nataliaratnikova opened this issue May 8, 2017 · 37 comments

@nataliaratnikova
Contributor

Valentin requests that the PhEDEx metadata be propagated to FTS.
The FTS logs will appear on HDFS, so he can then run a Spark job over the FTS logs to extract context for the global transfer monitoring.

Here are the metadata Valentin is interested in:

  • attach PhEDEx request metadata to FTS request
    • metadata should include
      • issuer of the PhEDEx request
      • timestamp of the request
      • DN of the user of the request
      • details of the request, e.g. transfer dataset /a/b/c from site A to site B

How such metadata should be structured is a question of the FTS metadata
format/schema. I can't tell you more; you'll need to coordinate with the FTS
developers. It would be nice to use the same schema across different CMS FTS
users, e.g. PhEDEx and ASO. This is why I CC'ed Diego: according to him, the
FTS logs already carry ASO metadata, e.g. job_metadata.issuer=ASO.

Since we're talking about metadata, its structure may change or be adjusted
over time.

Here are typical examples:

  • if I'm placing a request to PhEDEx, then the metadata would be something like:

    {
      "issuer": "PHEDEX/user",
      "time": 123456,
      "client": "PHEDEX-client-version-1",
      "dn": "/my/DN/here",
      "request": {
        "dataset": "/a/b/c",
        "source": "T1_XXX",
        "destination": "T2_XXX"
      }
    }

  • if DDM is placing a request, you'll create:

    {
      "issuer": "PHEDEX/DDM",
      ...
    }

@nataliaratnikova
Contributor Author

See a similar open issue:
#1041

@nataliaratnikova
Contributor Author

ASO implementation of the job metadata passed to FTS:

https://github.com/dmwm/AsyncStageout/blob/master/src/python/AsyncStageOut/TransferWorker.py#L417

"job_metadata": {"issuer": "ASO","user": self.user } 

@vkuznet

vkuznet commented May 9, 2017 via email

@vkuznet

vkuznet commented May 9, 2017 via email

@nataliaratnikova
Contributor Author

Yes, the DB has the information on who submitted the transfer request and who approved it; have a look at the https://cmsweb.cern.ch/phedex/datasvc/doc/transferrequests API.
However, the FTS submission is done by the site's FileDownload agent with whatever proxy the site specifies in its configuration. The FileDownload agent does not care who requested the files.
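
For reference, a hedged sketch of how the requestor information could be looked up through the data-service JSON API. The endpoint path follows the usual datasvc pattern, but the exact field names ("requested_by", "decided_by", ...) are assumptions here and should be checked against the API documentation linked above:

# Hedged sketch: look up who created and who approved a PhEDEx transfer
# request via the data-service JSON API.  Field names are assumptions;
# consult the transferrequests API documentation for the real schema.
import json
import urllib.request

DATASVC = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/transferrequests"

def request_issuers(request_id):
    """Return the requestor DN and approver DNs for one transfer request."""
    url = "{}?request={}".format(DATASVC, request_id)
    with urllib.request.urlopen(url) as resp:   # a grid proxy/cert may be required
        payload = json.load(resp)
    issuers = []
    for req in payload["phedex"]["request"]:
        creator_dn = req["requested_by"]["dn"]
        approver_dns = [node["decided_by"]["dn"]
                        for node in req["destinations"]["node"]
                        if node.get("decided_by")]
        issuers.append({"request": req["id"],
                        "requested_by": creator_dn,
                        "approved_by": approver_dns})
    return issuers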

alberto-sanchez added a commit to alberto-sanchez/PHEDEX that referenced this issue May 18, 2017
@alberto-sanchez
Member

I have addressed the extra info in the JSON file. This opens the possibility of more options. Meanwhile, I have tried to address Valentin's basic request so that the transfers from PHEDEX can be distinguished in his monitoring. If you want to test it before a new release happens, just patch your perl_lib/PHEDEX/Transfer/Backend/Job.pm file.

@dciangot

Hi all, sorry for the delay, but I forgot to subscribe to this issue :/

Btw, summarizing on the ASO side: in order to be compliant with what is proposed above, we may need to add:

"dn": "/abc/abc/...",
"time": 12315,
"request": {
  "source": "T....",
  "destination": "T..."
}

IMO the "dataset" field probably doesn't make much sense for the CRAB case; instead we may go for example with "taskname": "1231231_123123:user_taskname". What do you think? (A rough sketch follows after this comment.)

Also, @vkuznet, by application name/version did you mean the ASO client or the FTS client?
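
For illustration, a rough sketch of how the extended job_metadata could be built on the ASO side following the additions above. The helper and its parameters are hypothetical; the real values would come from what the TransferWorker already has in scope:

# Illustrative only: extended ASO job_metadata following the proposal above.
# The function and its parameters are hypothetical placeholders.
import time

def aso_job_metadata(user, user_dn, taskname, source, destination):
    """Build the extended job_metadata dict to attach to an FTS submission."""
    return {
        "issuer": "ASO",
        "user": user,                    # as in the current TransferWorker snippet
        "dn": user_dn,                   # DN of the user owning the task
        "time": int(time.time()),        # epoch seconds (common time zone, e.g. UTC)
        "request": {
            "taskname": taskname,        # e.g. "170406_201711:username_taskname"
            "source": source,            # e.g. "T1_XXX"
            "destination": destination,  # e.g. "T2_XXX"
        },
    }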

@belforte
Member

I guess we also need to be consistent with what WMA/CRAB will report to ES via WMArchive and the HTCondor ClassAd feeding, so that we can e.g. find both the jobs and the transfers for a given user activity.
See:
https://github.com/bbockelm/cms-htcondor-es/blob/master/README.md

@belforte
Member

belforte commented May 19, 2017

Given that issuer will be used as a high-level identifier to distinguish different activities, maybe we could just say DDM rather than PHEDEX/DDM, and I would call CMS-user what was proposed as PHEDEX/user.
While today users are expected to use PhEDEx to submit a transfer request and have a data manager approve it, we may imagine that in the future another client becomes available to users for requesting FTS transfers (they can already do so, of course, but if we make it a bit more convenient they may do it more).
Maybe even foresee "issuer": "CMS-group" and use the username field to indicate a group?

Not that I like to make things complex and vague, but IIUC this naming schema is the foundation of the monitoring work for the next N years, so we may not want to hurry too much to a conclusion.

@vkuznet

vkuznet commented May 19, 2017 via email

@belforte
Member

About task names, let's refer to the documentation for ES which Brian wrote and which I pointed to earlier. I am all for collecting all such descriptions in a single place, but not in this issue.

@dciangot

The taskname is in the format YYMMDD_hhmmss:username_taskname.

In any case, please find below a proposed schema, slightly different from the one at the beginning of the thread:

{
  "issuer": "ASO | PHEDEX/user | DDM | ...",
  "time": 123456,  (need to specify a common time zone)
  "client": {
    "service": "AsyncStageOut_v1.0.8 | PHEDEX-client-v4",
    "fts_client": "blabla"
  },
  "user": "username as from SiteDB",
  "dn": "/my/DN/here",
  "request": {
    "workflow": "belforte_crab_Pbp_HM185_250_q3_v3",
    "CRAB_Workflow": "170406_201711:belforte_crab_Pbp_HM185_250_q3_v3",
    "dataset": "/a/b/c",  (may be empty for CRAB)
    "source": "T1_XXX",
    "destination": "T2_XXX"
  }
}
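
To make the proposal concrete, here is a small sketch (values illustrative) of assembling such a record and serializing it to the JSON string that would be attached as job metadata at submission time; the actual submission call is left out, since it depends on the FTS client being used:

# Sketch only: build a metadata record following the schema proposed above
# and serialize it for attachment to an FTS job.  All values are illustrative.
import json
import time

def build_job_metadata(issuer, client, user, dn, request):
    record = {
        "issuer": issuer,              # "ASO" | "PHEDEX/user" | "DDM" | ...
        "time": int(time.time()),      # epoch seconds; a common time zone must be agreed
        "client": client,              # {"service": ..., "fts_client": ...}
        "user": user,                  # username as from SiteDB
        "dn": dn,
        "request": request,            # workflow/dataset/source/destination details
    }
    return json.dumps(record)

# Example usage with placeholder values:
metadata_json = build_job_metadata(
    issuer="PHEDEX/user",
    client={"service": "PHEDEX-client-v4", "fts_client": "fts-client-3.6.8"},
    user="someuser",
    dn="/my/DN/here",
    request={"dataset": "/a/b/c", "source": "T1_XXX", "destination": "T2_XXX"},
)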

@vkuznet

vkuznet commented May 19, 2017 via email

@vkuznet

vkuznet commented May 24, 2017 via email

@belforte
Member

belforte commented May 24, 2017 via email

@vkuznet

vkuznet commented May 24, 2017 via email

@alberto-sanchez
Member

Hi @vkuznet, I wonder if you can see in your monitoring the FTS job:

dc6a077e-4640-11e7-b2fa-a0369f23cf8e

or

52f63152-464a-11e7-873e-a0369f23cf8e

I wonder if the metadata I put in there is visible. Or could you please share the way you look at this, so that we can have a look as well?

@vkuznet

vkuznet commented Jun 2, 2017 via email

@alberto-sanchez
Member

Hi Valentin, yes, they are job IDs. They are from 2017-05-31.

best regards

@vkuznet

vkuznet commented Jun 2, 2017 via email

@nataliaratnikova
Contributor Author

Hi,
I found both of Alberto's job IDs on the FNAL FTS server. I guess for the purpose of this test we need to submit to the CERN FTS?

@vkuznet

vkuznet commented Jun 2, 2017 via email

@alberto-sanchez
Member

I have submitted to CERN; the job_id (from a few minutes ago) is:

f2857a08-47c0-11e7-9660-02163e018fe3

@vkuznet

vkuznet commented Jun 4, 2017 via email

@alberto-sanchez
Member

Hi Valentin,
Thanks a lot for looking at this. Yes, the metadata may be incomplete, but from what I understand it is what we can do now, before adjusting the schema of the DB. Natalia can comment further on this. The objective of the test was just to make sure that we are able to see PhEDEx transfers.

@nataliaratnikova
Contributor Author

Hi all,
regarding the following parameters found by Valentin:
"job_metadata": { "client": "fts-client-3.6.8", "issuer": "PHEDEX", "time": "1495113054", "user": "phedex" },

"issuer": "PHEDEX"

  • looks fine to me and is consistent both with ASO (see the code snippet above) and with ATLAS jobs: {"multi_sources": false, "issuer": "rucio"}.

"client": "fts-client-3.6.8"

  • to avoid ambiguity and excessive parsing, I'd suggest using the client command name together with the result of calling it with the --version option, i.e.: "fts-transfer-submit": "client_version : 3.6.8"

"time": "1495113054"

  • we need to clarify to which event this timestamp belongs. We could use a name format similar to the FTS fields: "tr_timestamp_complete": 1496428892196, "tr_timestamp_start": 1496428886331.

"user": "phedex"

  • the name of the local user running the daemon does not have much value for transfer monitoring. What people actually want to know is the "activity", or group, of the requested transfer; see the related issue "FTS3 backend: pass activity attribute in copyjob" #1041. However, this will require substantial changes to the current TMDB schema and the central agents in order to propagate this info down to the download agent, which actually submits the transfer jobs. (A sketch of the adjusted metadata follows below.)
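
Putting the suggestions above together, the adjusted job metadata might look like the following. This is illustrative only: the timestamp field name is hypothetical (chosen in the style of the FTS fields), and the "activity" value depends on resolving #1041:

# Illustrative only: job metadata adjusted along the lines suggested above.
job_metadata = {
    "issuer": "PHEDEX",
    "fts-transfer-submit": "client_version : 3.6.8",  # client command name + --version output
    "tr_timestamp_submit": 1496428886331,             # hypothetical name in the FTS field style (ms)
    "activity": "DataOps",                            # group/activity of the request (needs #1041)
}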

@nataliaratnikova
Contributor Author

Valentin, do you know where this "activity" field value is coming from?

{ "activity": "PHEDEX",

thanks,
Natalia.

@vkuznet

vkuznet commented Jun 6, 2017 via email

@vkuznet

vkuznet commented Jun 6, 2017 via email

nataliaratnikova added a commit that referenced this issue Aug 14, 2017
@vkuznet

vkuznet commented Oct 29, 2017

@nataliaratnikova could you please update me on where you stand on this issue? Did you implement the required features? Did you propagate them to the agents? Did you verify that these features now appear in the FTS logs?

@nataliaratnikova
Contributor Author

nataliaratnikova commented Oct 30, 2017

@vkuznet The code is released; however, we are not yet asking the sites to upgrade, because CERN reported high load on the servers when they upgraded to the new version of the agents, and this is not yet fully understood.
OTOH I see the T2_GR_Ioannina site has upgraded and is successfully using PHEDEX_4_2_2: both new features are propagated as expected, see e.g. b622c26c-bd83-11e7-bdb4-02163e01811c on the CERN FTS:

INFO    Mon Oct 30 16:05:26 2017; Job metadata: {\"client\": \"fts-client-3.6.10\", \"issuer\": \"PHEDEX\"}
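
For what it's worth, a small sketch of how the job metadata could be pulled out of such a log line when processing the FTS logs on HDFS, assuming the backslash-escaped quoting shown above is representative:

# Sketch: extract the Job metadata JSON object from an FTS log line,
# tolerating the backslash-escaped quotes seen in the example above.
import json
import re

LINE = r'INFO    Mon Oct 30 16:05:26 2017; Job metadata: {\"client\": \"fts-client-3.6.10\", \"issuer\": \"PHEDEX\"}'

def job_metadata(line):
    match = re.search(r'Job metadata:\s*(\{.*\})', line)
    if not match:
        return None
    return json.loads(match.group(1).replace('\\"', '"'))

print(job_metadata(LINE))  # -> {'client': 'fts-client-3.6.10', 'issuer': 'PHEDEX'}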

@vkuznet

vkuznet commented Oct 30, 2017 via email

@davidlange6

Hi all, I'm curious as to the status of this effort. It naively looks like few if any of the non-ASO CMS transfers have a job_metadata field (well, 0 of the first 1000 I looked at).

@nataliaratnikova
Contributor Author

@davidlange6

Hi @nataliaratnikova, a belated reply:

I'm looking at the FTS records in Hadoop. Ones from ATLAS or from CRAB have useful metadata; ones from PhEDEx do not. But I was looking at T2s and could easily have missed the two(!!) of them that were using the new version. That is certainly far below a useful threshold for me. Is there a planned time scale for completing this?

@nataliaratnikova
Contributor Author

Hi @davidlange6, the development part is complete. For deployment I do not have any particular goal within the PhEDEx project, as this is not for our internal use. If you have a dependent milestone, we can bring this up with the site support team and ask for their help with the upgrade.

@davidlange6

I'm not sure what a dependent milestone is, sorry. It would be great if this were all deployed before routine data taking starts this year...
