Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WP_CLI create-guest-authors creates profiles for ALL users #553

Open
TheCrowned opened this issue Jul 4, 2018 · 11 comments
Open

WP_CLI create-guest-authors creates profiles for ALL users #553

TheCrowned opened this issue Jul 4, 2018 · 11 comments

Comments

@TheCrowned
Copy link
Contributor

wp co-authors-plus create-guest-authors will create guest authors profiles for all WP users, including subscribers, which we wouldn't want to ever add as coauthors, and uselessly pollutes the database.

@sboisvert
Copy link
Contributor

@smistephen Do you want to take a look?

@sboisvert sboisvert self-assigned this Jul 24, 2018
@smistephen
Copy link
Contributor

The offending line of code seems to be https://github.com/Automattic/Co-Authors-Plus/blob/master/php/class-wp-cli.php#L28, which just plucks out get_users() without any qualifiers.

My proposed solution is to simply whitelist the roles that get profiles created for them.

We could do this via changing the args to get_users() to get_users( array( 'role__in" => array( 'administrator', 'editor', 'author' ) ) ), but that might get kind of slow with a bunch of LIKE queries on meta values:

SELECT wp_users.* FROM wp_users INNER JOIN wp_usermeta ON ( wp_users.ID = wp_usermeta.user_id ) WHERE 1=1 AND ( 
  ( 
    ( wp_usermeta.meta_key = 'wp_capabilities' AND wp_usermeta.meta_value LIKE '%\"administrator\"%' ) 
    OR 
    ( wp_usermeta.meta_key = 'wp_capabilities' AND wp_usermeta.meta_value LIKE '%\"editor\"%' ) 
    OR 
    ( wp_usermeta.meta_key = 'wp_capabilities' AND wp_usermeta.meta_value LIKE '%\"author\"%' )
  )
) ORDER BY user_login ASC

We could also do the processing on the back-end, like this:

$users_unfiltered = get_users();
$users            = array();

// Roles to generate guest authors for.
$role_whitelist = array( 'administrator', 'editor', 'author' );

foreach ( $users_unfiltered as $user ) {
	foreach ( $user->roles as $role ) {
		if ( in_array( $role, $role_whitelist, true ) ) {
			// Using the user ID ensures that users that would otherwise
			// be in the list twice overwrite each other.
			$users[ $user->ID ] = $user;
		}
	}
}

// Discard user ID indices, rebase to 0.
$users = array_values( $users );

Tomorrow, I'll generate a bunch of users and pit the two against each other.

I'll probably just write a bash script to call wp user create 1000 times, if anyone has a better idea I'm open to it.

@smistephen
Copy link
Contributor

Alright, faceoff time.

First step, generate 1000 authors.

#!/bin/bash
for i in {1..1000}
do
RANDOMSTRING=$(openssl rand -hex 4)
wp user create $RANDOMSTRING $RANDOMSTRING@example.com --role=author
done

Backup the database so we can restore it and run the test again with different code.

mysqldump wordpress > dbdump.sql

First up is changing the get_users() call. Let's see how long it takes:

time wp co-authors-plus create-guest-authors

All done! Here are your results:
- 1001 guest author profiles were created
- 0 users already had guest author profiles

real   2m2.675s
user   1m2.432s
sys    0m9.024s

Not bad.

Now let's restore the database:

mysql wordpress < dbdump.sql

Revert our first change, institute our second, and run it again:

All done! Here are your results:
- 1001 guest author profiles were created
- 0 users already had guest author profiles

real   2m3.556s
user   1m3.560s
sys    0m8.004s

Welp.

Hey, there's a saying in science: "Any result's a result."

Clearly, either one would be fine. And in those cases, I choose the one-liner that uses the WordPress API over the complicated custom algo every time.

Next up: implementing the victorious approach, and testing the heck out of it.

smistephen added a commit to smistephen/Co-Authors-Plus that referenced this issue Jul 27, 2018
…ommand.

Previously, as noted in Automattic#553, `wp co-authors-plus create-guest-authors` created a guest author for every user in the database.

This was a massive performance suck, as well as being largely unnecessary. Adding a subscriber as a guest author is not a common use case.

Now, only users having the role of administrator, editor, and author have guest authors generated for them.

The query to filter users by role is a touch heavy (it performs LIKEs on meta values), but testing it on a site with 1000 users showed no major issues.

It was also compared to simply querying all the users, and doing the filtering in PHP. Head to head, there was no significant difference in performance between the two approaches, and using the WordPress API is always a better bet than writing your own one-off custom filter algorithm.
@sboisvert
Copy link
Contributor

How are users which are subscribers handled? Some sites can have thousands (or hundreds of thousands) of subscribers. Does it impact the two different methods?

@smistephen
Copy link
Contributor

By default, subscribers are excluded, but can be included if the end user wants.

I re-ran the faceoff with 5000 subscribers included, and still saw no significant difference in the run times. It appears that any difference between the two methods of filtering is by far drowned out by the time it takes to process the result.

@sboisvert
Copy link
Contributor

How would this work on a multisite?
I have a feeling that this kind of queries on WordPress.com (with many millions of users) wouldn't end well. Similarly on any multisite with 100s of thousands or millions of users.

@smistephen
Copy link
Contributor

Perhaps we could ban them from generating guest authors for subscribers?

@smistephen
Copy link
Contributor

They could still do it manually, in the rare case they wanted to make a subscriber a guest author, but they wouldn't be able to do it en masse.

@smistephen
Copy link
Contributor

I've added some code to that effect: #575

@sboisvert
Copy link
Contributor

While creating the guest authors for subscribers might be a problem, I'm concerned about how the SQL query against the DB would be with a join and a LIKE against an unindexed field on tables with 100K-100M rows

@smistephen
Copy link
Contributor

I share your concern, and I think I've come up with a way to bridge the gap between the two approaches.

The problem with the database query is that it has the potential to put tremendous strain on the database server.

The problem with back-end processing is that it has the potential to put tremendous strain on that server (fetching an array with ten million elements would blow out most servers).

The solution? Combine a simple database query with batched back-end user processing.

Rather than 1 enormous query, or 1 monolithic array, we make 10,000 tiny queries for 1000 users each, and filter them as we go.

The code I just pushed to #575 should scale much better, though nothing will make processing ten million rows "fast", haha.

@sboisvert sboisvert removed their assignment Sep 22, 2021
@GaryJones GaryJones added this to the 3.6 milestone Jul 27, 2023
@alecgeatches alecgeatches modified the milestones: 3.6, 3.7 Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants