Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert compatible pypi package names to the ones used in pypi #893

Merged
merged 11 commits into from Dec 2, 2021

Conversation

qtomlinson
Copy link
Collaborator

Pypi package management system treats ".", "-", and "_" as the same. For example, the following three urls yield the same component, backports.ssl_match_hostname.
https://pypi.org/project/backports.ssl.match.hostname/
https://pypi.org/project/backports_ssl_match_hostname/
https://pypi.org/project/backports-ssl-match-hostname/

During crawling, package coordinates internally stored in the harvest store are consistent with the package management repository. Harvested information can only be retrieved based on the same standardized coordinates.

To allow retrieval of pypi package information via compatible names, added converting input package names to the ones used in the python package index. This conversion is done at all retrieval api entry points, and by querying pypi index (same as in crawler). The query result is cached for later reuse and set to expire after configured life span.

Tasks: #860

To convert requested pypi coordinates to coordinates containing the standardized name in pypi

Tasks: clearlydefined#860
No need to fix harvest queueing post call because the crawler uses the retrieved (normalized) coordinates to store/locate harvested files.

Tasks: clearlydefined#860
@qtomlinson
Copy link
Collaborator Author

@nellshamrell This pull request is dependent on #891

@nellshamrell
Copy link
Contributor

hmm...just checked out this branch locally and the tests passed fine. Seeing if there is something up in the build system.

@nellshamrell
Copy link
Contributor

Actually, I had the wrong branch checked out. Trying again locally.

@nellshamrell
Copy link
Contributor

yep, looks like it's failing locally as well

@qtomlinson
Copy link
Collaborator Author

Thanks for the feedback. I will try to set up another clean branch and have a look.

@nellshamrell
Copy link
Contributor

Looks like the failure started in this commit

@nellshamrell
Copy link
Contributor

I think I know what it is :)

"test/lib/entityCoordintates.js" should be "test/lib/entityCoordinates.js"

(No worries! I do stuff like that all the time!)

@qtomlinson
Copy link
Collaborator Author

Thanks for catching the naming mistake. I will fix that.

@nellshamrell
Copy link
Contributor

Still have a failure - but I think I know what's up. Confirming.

}

async map(coordinates) {
const mapper = this.mappers[coordinates?.provider]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something going on with this line that's causing the SyntaxError: Unexpected token '.' (which I only found by running the mocha tests in debug mode in VS Code)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see this error when parserOptions.ecmaVersion is changed to 9 in .eslintrc.json. This should be addressed by my eslint upgrade commit. Does eslint in build pipeline use config in formats other than json? See https://stackoverflow.com/questions/61628947/eslint-optional-chaining-error-with-vscode

@nellshamrell
Copy link
Contributor

You can isolate the error by running just the coordinatesMapper test with npx nyc mocha test/lib/coordinatesMapper.js

@nellshamrell
Copy link
Contributor

I think it's still not able to process the optional chaining operator when running tests, even though eslint is updated

@nellshamrell
Copy link
Contributor

Ah, that operator is a feature of Node 14. Checking what version of node we are using...

@nellshamrell
Copy link
Contributor

nellshamrell commented Nov 29, 2021

Yep, we're using Node 12 in our Docker builds. We can look into updating to Node 14 (might be risky, but likely worth doing sometime soon) or rewrite the code to be compatible with Node 12.

(I'm not sure yet which we should do - open to opinions! In the meantime, I'm going to play with upgrading to Node 14 and seeing what breaks)

@qtomlinson
Copy link
Collaborator Author

Just noticed Node.js installed in our build pipeline is 12.x as well. See azure-pipelines.yml

@qtomlinson qtomlinson force-pushed the pypi-fix branch 2 times, most recently from 384a8ed to c8b777f Compare November 30, 2021 20:05
@nellshamrell
Copy link
Contributor

Update to Node 14 is complete! Reviewing this again.

@nellshamrell
Copy link
Contributor

This overall looks great! I'm going to do some final testing tomorrow then should be able to deploy it :) Thank you so much for this outstanding work @qtomlinson!

@qtomlinson
Copy link
Collaborator Author

qtomlinson commented Dec 1, 2021

@nellshamrell Thanks for the feedback! A couple of points to be discussed:

  1. Naming of Class: CoordinatesMapper, it maps/converts the user specified coordinates to the standard one used in the package repository. Do you have suggestion for naming?
  2. Package location for the added classes: CoordinatesMapper and PypiCoordinatesMapper. CoordinatesMapper is responsible for caching and delegate to the provider specific classes for coordinates retrieval. The classes are currently in /lib. Is there a better code location for them?
  3. The default for life span of mapped coordinates in cache is currently set to 1 day. Should this default be updated to a longer period?

@nellshamrell
Copy link
Contributor

Hi @qtomlinson!

  1. I think "CoordinatesMapper" is fine as a name
  2. Being in /lib is fine
  3. The cache being one day is also fine

Great work!

Copy link
Contributor

@nellshamrell nellshamrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I will deploy this tomorrow :)

@nellshamrell nellshamrell merged commit edcefee into clearlydefined:master Dec 2, 2021
@qtomlinson qtomlinson deleted the pypi-fix branch December 6, 2021 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants