Convert compatible pypi package names to the ones used in pypi #893
To convert requested pypi coordinates to coordinates containing the standardized name in pypi Tasks: clearlydefined#860
No need to fix harvest queueing post call because the crawler uses the retrieved (normalized) coordinates to store/locate harvested files. Tasks: clearlydefined#860
@nellshamrell This pull request is dependent on #891
hmm...just checked out this branch locally and the tests passed fine. Seeing if there is something up in the build system.
Actually, I had the wrong branch checked out. Trying again locally.
yep, looks like it's failing locally as well
Thanks for the feedback. I will try to set up another clean branch and have a look.
Looks like the failure started in this commit
I think I know what it is :) "test/lib/entityCoordintates.js" should be "test/lib/entityCoordinates.js" (No worries! I do stuff like that all the time!)
Thanks for catching the naming mistake. I will fix that.
Still have a failure - but I think I know what's up. Confirming.
  }

  async map(coordinates) {
    const mapper = this.mappers[coordinates?.provider]
There's something going on with this line that's causing the SyntaxError: Unexpected token '.'
(which I only found by running the mocha tests in debug mode in VS Code)
I only see this error when parserOptions.ecmaVersion is changed to 9 in .eslintrc.json. This should be addressed by my eslint upgrade commit. Does eslint in build pipeline use config in formats other than json? See https://stackoverflow.com/questions/61628947/eslint-optional-chaining-error-with-vscode
You can isolate the error by running just the coordinatesMapper test with
I think it's still not able to process the optional chaining operator when running tests, even though eslint is updated
Ah, that operator is a feature of Node 14. Checking what version of node we are using...
Yep, we're using Node 12 in our Docker builds. We can look into updating to Node 14 (might be risky, but likely worth doing sometime soon) or rewrite the code to be compatible with Node 12. (I'm not sure yet which we should do - open to opinions! In the meantime, I'm going to play with upgrading to Node 14 and seeing what breaks)
Just noticed Node.js installed in our build pipeline is 12.x as well. See azure-pipelines.yml
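For reference, the Node 12 compatible option mentioned above could look roughly like the sketch below. `lookupMapper` is a hypothetical standalone helper, not a function from this PR; it only mirrors the lookup line from the diff with the optional chaining operator replaced by an explicit null check.

```javascript
// Hypothetical helper, not code from this PR: mirrors
// `this.mappers[coordinates?.provider]` without optional chaining.
function lookupMapper(mappers, coordinates) {
  // `== null` is true for both null and undefined, which is the same
  // condition under which `?.` short-circuits to undefined.
  const provider = coordinates == null ? undefined : coordinates.provider
  return mappers[provider]
}

// Example with placeholder mapper values
const mappers = { pypi: 'pypi-mapper' }
console.log(lookupMapper(mappers, { provider: 'pypi' }))
console.log(lookupMapper(mappers, undefined))
```

This form parses on Node 12, so it would avoid the SyntaxError without changing the runtime version.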
Update to Node 14 is complete! Reviewing this again.
This overall looks great! I'm going to do some final testing tomorrow then should be able to deploy it :) Thank you so much for this outstanding work @qtomlinson!
@nellshamrell Thanks for the feedback! A couple of points to be discussed:
Hi @qtomlinson!
Great work!
This is great! I will deploy this tomorrow :)
PyPI treats ".", "-", and "_" as equivalent in package names. For example, the following three URLs resolve to the same component, backports.ssl_match_hostname:
https://pypi.org/project/backports.ssl.match.hostname/
https://pypi.org/project/backports_ssl_match_hostname/
https://pypi.org/project/backports-ssl-match-hostname/
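As a rough illustration of why these names are equivalent, the sketch below applies the name normalization rule from PEP 503 (collapse runs of ".", "-", and "_" into a single "-" and lowercase the result). `normalizePypiName` is a hypothetical helper for comparison only; this PR resolves the registered name by querying the PyPI index rather than normalizing locally.

```javascript
// Hypothetical helper: PEP 503 style normalization, used here only to
// show that the three spellings above compare equal.
function normalizePypiName(name) {
  return name.replace(/[-_.]+/g, '-').toLowerCase()
}

const spellings = [
  'backports.ssl.match.hostname',
  'backports_ssl_match_hostname',
  'backports-ssl-match-hostname'
]
const normalized = new Set(spellings.map(normalizePypiName))
console.log(normalized.size === 1) // all three collapse to one normalized name
```

Note that the normalized form is not the same string as the registered project name (backports.ssl_match_hostname), which is why a lookup against the index is still needed to obtain the coordinates used in the harvest store.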
During crawling, the package coordinates stored in the harvest store follow the name used by the package repository, so harvested information can only be retrieved with those same standardized coordinates.
To allow retrieval of PyPI package information via compatible names, this change converts input package names to the ones used in the Python Package Index. The conversion is done at all retrieval API entry points by querying the PyPI index (the same approach as in the crawler). The query result is cached for later reuse and expires after a configured lifespan.
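The caching behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: `ExpiringNameCache`, `resolveName`, `ttlMs`, and `now` are all hypothetical names, and `resolveName` stands in for whatever performs the PyPI index query.

```javascript
// Hypothetical sketch: caches name lookups and expires them after ttlMs.
class ExpiringNameCache {
  constructor(resolveName, ttlMs, now = Date.now) {
    this.resolveName = resolveName // async (name) => registered name; assumed to query the index
    this.ttlMs = ttlMs
    this.now = now // injectable clock, mainly useful for testing
    this.entries = new Map()
  }

  get(name) {
    const hit = this.entries.get(name)
    if (hit && hit.expiresAt > this.now()) return hit.promise

    // Cache the promise so concurrent requests share one lookup
    const promise = Promise.resolve(this.resolveName(name))
    const entry = { promise, expiresAt: this.now() + this.ttlMs }
    this.entries.set(name, entry)

    // Drop failed lookups so a later request can retry
    promise.catch(() => {
      if (this.entries.get(name) === entry) this.entries.delete(name)
    })
    return promise
  }
}

// Usage with a stubbed resolver in place of a real index query
const cache = new ExpiringNameCache(async name => name.replace(/[-_.]+/g, '-'), 60 * 1000)
cache.get('backports.ssl.match.hostname').then(resolved => console.log(resolved))
```

Caching the promise rather than the resolved value means repeated requests for the same name within the lifespan trigger only one query.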
Tasks: #860