
Sliced Job Template creates number of jobs as per the slicing count for the limited host #2893

Open
Ompragash opened this issue Dec 8, 2018 · 7 comments


@Ompragash

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • API
  • UI
SUMMARY

An SJT creates as many jobs as the slice count, even when the launch is limited to a subset of hosts.

ENVIRONMENT
  • AWX version: X.Y.Z
  • AWX install method: docker on linux
  • Ansible version: 2.7.1
  • Operating System: CentOS
  • Web Browser: Google Chrome (latest)
STEPS TO REPRODUCE

  1. Create an inventory with multiple hosts
  2. Create an SJT with multiple slices and select the inventory created above
  3. Limit the SJT to one of the hosts from that inventory
  4. Launch the SJT

EXPECTED RESULTS

Only one job should be created for the limited host, even if the Job Slicing value is >1.

ACTUAL RESULTS

Multiple jobs are created for the limited host, matching the Job Slicing count.

ADDITIONAL INFORMATION

Even though multiple jobs are created, only one succeeds; the rest fail with ERROR! Specified hosts and/or --limit does not match any hosts.
(Screenshots attached: sjt-1, sjt-2, sjt-3)

@donateur

donateur commented Aug 5, 2019

This is very similar to a problem I've raised with Red Hat support on behalf of my client, although I'm not sure what SJT is.
We find that when re-running failed hosts there may be fewer hosts than the number of job slices.

I suggested that AWX be modified to either:

  1. Only open as many slices as there are hosts (up to the total number of slices/instances)
    OR
  2. Ensure that instances with no hosts are marked as successful, since there is nothing to do. It is confusing/wrong that they are marked as failed. (A rough sketch of both options follows below.)

@domq

domq commented Apr 22, 2020

SJT might mean Sliced Job Template.

@ryanpetrello ryanpetrello changed the title SJT creates number of jobs as per the slicing count for the limited host Sliced Job Template creates number of jobs as per the slicing count for the limited host Apr 22, 2020
domq pushed a commit to epfl-si/wp-ops that referenced this issue Apr 23, 2020
domq pushed a commit to epfl-si/wp-ops that referenced this issue Apr 24, 2020
@domq

domq commented Aug 13, 2020

The reason this happens is that ansible-playbook doesn't like being told to run for zero hosts; ansible-runner doesn't detect that situation (and is unwilling to change that behavior) and passes the failure upwards.

Possible approaches for a fix include:

  • ask the ansible-runner team to reconsider (a sketch of tolerating the zero-host failure follows below)
  • make it so the circumstance doesn't happen, as suggested in this bug's description, i.e. cap the parallelism to the actual number of hosts in the host limit set by AWX

domq pushed a commit to epfl-si/wp-ops that referenced this issue Aug 13, 2020
Patch awx-runner to not fail when running for zero hosts. This sort of
matches the first bullet point in
ansible/awx#2893 (comment)
@gforster

Another possible idea is to allow the number to be selected with "prompt on launch." Of course that would only help with known quantities.

@donateur

donateur commented Aug 20, 2020

> Another possible idea is to allow the number to be selected with "prompt on launch." Of course that would only help with known quantities.

It would also require unnecessary manual action on the part of the user.

EDIT: That is to say, if a user just clicks to re-run on failed hosts, they may not even be aware of how many there are.

@gforster

Sure, I'm thinking of the case where you might normally want it split across 3 nodes but then need to override it to 1. It seems silly to have separate workflows to control that single variable, or to change and save it each time. The same goes for the middle of a workflow where you know you only want it on fewer hosts than normal. Not the full solution for sure, but it would be handy.

@kdelee
Member

kdelee commented Oct 14, 2022

Given that ansible/ansible#76438 was rejected, and that Controller as it stands doesn't really have a way of knowing how many hosts may match a filter -- AFAIK the filter is passed to ansible and doesn't apply until the job actually runs -- Controller makes its decision about how many slices to create BEFORE the filter is applied. Since we've not been able to land fixes in ansible and runner, the only thing I think we could possibly do is some kind of preliminary "apply the filter to the inventory and see how many hosts match" step.

This would almost be like an inventory update before the sliced job with a limit spawns its slices.

I'm thinking:

  1. A sliced job is launched with a limit applied, so we create it with dependencies_processed=false. I'm not sure on the details here about WHEN it becomes a workflow job, but if it is a workflow job from the get-go, it will be new for workflow jobs to have dependencies_processed=false.
  2. We launch some kind of inventory-update-like process that finds out how many hosts the limit will cut the inventory down to, save that count as the number of slices to spawn on the workflow (i.e. decide at this point what the workflow nodes will be), and set dependencies_processed=true.
  3. Proceed as we do today; the workflow is now ready to run.

I'm sure we could do something more elegant, but a hacky way to do the inventory-like update might be approximated by what I can do on the CLI.

Given an inventory file named hosts:

  [mygroup]
  testhost[:100]

  [foogroup]
  matchinghost

This inventory has 103 hosts. But if I run:

  ansible -i hosts all --list-hosts --limit matchinghost

I get the output:

  hosts (1):
    matchinghost

Which tells me that, of my inventory with 103 hosts, only 1 matches the limit.
