GOSH Community Forum

Specifying hardware components

I am making a new thread to continue the slightly off-topic discussion I was having with @ryanfobel in the Product Developer job thread.

Both there and on a call we were discussing the issues of specifying parts such as nuts and bolts. Even if you give an exact product number, someone in a different country still often has problems finding parts.

Kitspace handles this nicely for electronics components using Octopart, but electronics are the easier case: components tend to have a fixed part number that is available in most countries. Using Octopart to look for hardware like bolts was a bit of a nightmare. In GitBuilding (a program that does things like bill of materials generation for hardware documentation) we use YAML files to specify parts, which allows multiple suppliers for each part.

Problem: How to generate libraries of parts which work in lots of countries?

My thoughts so far were to crawl a couple of websites and find common hardware components and build the library. My script is here, the libraries are auto-generated and end up here (download button on right hand side).

I was able to crawl Westfield Fasteners website as they have a really detailed product site-map, and pull out lots of information. The next step was to look at other websites. So far I have tried:

  • RS Components - Painful as they are quite inconsistent in what they specify and how
  • Grainger - Also quite inconsistent, though not as bad as RS - Website is a bit JavaScript-heavy, which made it hard to get info from in a script
  • McMaster Carr - See below

McMaster Carr seemed the obvious choice as the website is so clean, especially on product pages once filtered. Unfortunately I cannot link to one of these filtered pages as the website is almost 100% javascript and the URL often does not change as you move between pages.

I contacted McMaster Carr about using their API to get part numbers and add them to our library so they would be listed in bills of materials. Their response was:

Thanks for your interest in including McMaster-Carr in your software. We wish you the best of luck with your project, but we respectfully ask that you not scrape our website as we consider our product information to be confidential.

Which seems really weird to me.

My best solution so far

My best solution so far is to stick with McMaster Carr as their stock is so complete and their website is easy to use. If you add something at the end of the URL it searches for it. As screws tend to have an ISO or a DIN standard which specifies them quite precisely, I used this and it works OK:
e.g. https://www.mcmaster.com/ISO-4762
Add too much information and it goes wrong:
e.g. https://www.mcmaster.com/ISO-4762-Stainless-Steel

My solution so far is to specify the properties such as material, thread, and length, and then to link to McMaster using the standard code so the correct screw type is shown. Unfortunately McMaster only specifies ISO or DIN, not both (and which one changes from product to product), so I am generating links for both.
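A minimal sketch of that link generation, assuming the DIN code can be appended to the URL in the same way as the ISO code shown above (the function name is made up for illustration, and this only builds links; it does not fetch anything from the site):

```python
from urllib.parse import quote

MCMASTER_BASE = "https://www.mcmaster.com/"


def standard_search_links(standards):
    """Return one search URL per standard code, e.g. "ISO 4762"."""
    links = {}
    for code in standards:
        # "ISO 4762" -> "ISO-4762": the site searches for whatever follows the slash.
        slug = "-".join(code.split())
        links[code] = MCMASTER_BASE + quote(slug)
    return links


# Generate a link for both standards, since only one of them may be listed.
links = standard_search_links(["ISO 4762", "DIN 912"])
for code, url in links.items():
    print(code, url)
```

Material, thread, and length are deliberately left out of the URL, since adding too much information breaks the search.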

Example

Using this auto-generated library I have specified the nuts, bolts, and washers for our Block Stage documentation. This gives outputs like:


The rendering of the information needs a bit of tidying up.
The (0) and (1) are because I cannot currently specify 2 things for the same supplier name (Issue).

Help/ideas welcome
Any ideas on where to go next? (@gbathree @kaspar @ryanfobel @amchagas)

Sorry for the really long post!

This seems to be a very common attitude among component distributors. Some electronics distributors employ some pretty advanced anti-scraping techniques; they are downright hostile towards scraping. I don’t quite understand it either, but my best guess is that they feel market transparency is not in their best interest or they fear competitors using this data.

I would encourage you to not pay too much heed to the request in their email, because:

  1. Legally: IANAL but at worst, this is a grey area. The laws are evolving and vary greatly by jurisdiction. In the US there was a big win recently by people who were scraping LinkedIn. Copyright is the main concern and while you can’t copyright facts you can copyright an arrangement of data, so just make sure you scrape the facts and not the arrangement? ¯\_(ツ)_/¯
  2. Being a good netizen: if you look at their robots.txt they actually seem to allow crawling of the paths you care about e.g. /screws/.
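Checking this can be automated with Python's standard library. A rough sketch, using a made-up robots.txt and a made-up user agent name rather than any real site's file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would read the site's own file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /screws/
"""


def allowed_paths(robots_txt, user_agent, urls):
    """Map each URL to True/False depending on whether robots.txt permits it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


result = allowed_paths(
    SAMPLE_ROBOTS_TXT,
    "hardware-library-bot",  # made-up crawler name
    ["https://example.com/screws/", "https://example.com/checkout/basket"],
)
for url, ok in result.items():
    print("allowed" if ok else "blocked", url)
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead of parsing a string.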

The reason their robots.txt allows that is because they actually want to be scraped by big search engines (i.e. Google). I see it as unfair towards upstarts and small players if they tell you Google can do it but not you. Octopart has managed to open things up a bit in the electronics world by getting a critical mass of customers through their search and providing an (albeit now quite expensive) API.

As for technically doing the scraping on a JS-heavy site, I would try it with Chrome Headless and Puppeteer.

What about scraping anglianfasteners.co.uk? They don’t even seem to have a robots.txt.

A smaller comment on the UX of the table in the screenshot: the use of colour in the page is drawing attention away from things people are actually looking for.

Thanks @kaspar.

I spent today playing with Selenium and headless Firefox. Lots of fun. So yeah, JavaScript is not an insurmountable barrier.

As for http://anglianfasteners.co.uk, I am not sure what I could scrape from them. Their website seems devoid of product numbers or a way to order. I think you just phone them to ask for stuff?

I am so new to this I never thought to look at the robots.txt. Luckily the only thing Westfield Fasteners have in theirs is a helpful link to their sitemap so your crawler can navigate efficiently.

Improving the UX for GitBuilding should be a priority in v0.4 or 0.5. I have pushed into master (for release in v0.4) the ability to customise the CSS, but the default needs to be better. The navigation needs work too, such as links to the next/previous page at the bottom. This will get fun with the new step links. I have an idea, but I should probably draw a picture to explain it.

Ah yeah, I meant to look at accu.co.uk, not Anglian. They do have some sort of generic robots.txt.

What would be interesting to me is if we could integrate scrapers into kitspace/partinfo, extending it to provide info for non-electronic parts (maybe focusing on machine screws at first). I have, so far, successfully integrated RS and LCSC.com scrapers, which are used in addition to the Octopart API (as Octopart data was lacking for these vendors).

Partinfo is being used by Kitspace but also by other tools such as KiCost and HorizonEDA. Integrating scrapers there will make it easier to use this data across different tools. It would also pave the way for having a 1-click BOM like functionality for these parts.

We already have a data schema there which should work for screws as well. The schema for specs is quite loose:

type Spec {
  key   : String
  name  : String
  value : String
}

which could be something like

{
  "key": "current_rating",
  "name": "Current Rating",
  "value": "10.0 A"
}

The reason it’s quite loose is because the relevant specs vary greatly from component to component. The format here is just cribbed from the Octopart API but we could extend this to fit machine screws and document it.
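To make that concrete, here is a sketch of how a machine screw might be expressed in the same key/name/value shape. The keys (`standard`, `thread`, `length`, `material`) are invented for illustration and are not part of partinfo or the Octopart API:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Spec:
    """Mirrors the loose Spec type: every field is a plain string."""
    key: str
    name: str
    value: str


# Hypothetical keys for an M3 x 10 socket head cap screw.
screw_specs = [
    Spec("standard", "Standard", "ISO 4762"),
    Spec("thread", "Thread", "M3"),
    Spec("length", "Length", "10 mm"),
    Spec("material", "Material", "A2 Stainless Steel"),
]

print(json.dumps([asdict(spec) for spec in screw_specs], indent=2))
```

Keeping values as strings matches the existing schema, at the cost of every consumer having to parse units like "10 mm" itself.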

Accu are a great company; I have not tried their site yet. [Edit - Their XML sitemap is just a link to more sitemaps with full names for every product including DIN/ISO specs!]
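A sitemap like that can be walked with the standard library alone. This is only a sketch: the sample XML and URLs are made up, and the regular expression assumes the standard appears in the URL as something like `iso-4762` or `din-912` (it ignores part suffixes such as the `-1` in ISO 7380-1):

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text):
    """Return (kind, urls); kind is "index" for a sitemap of sitemaps, else "urlset"."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    urls = [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS)]
    return kind, urls


def standards_in(url):
    """Pull ISO/DIN codes out of a product URL, e.g. ".../din-912" -> ["DIN 912"]."""
    matches = re.findall(r"(?<![a-z])(iso|din)[-_ ]?(\d+)", url, re.IGNORECASE)
    return [f"{body.upper()} {number}" for body, number in matches]


# Made-up documents in the two shapes a crawler has to handle.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-screws.xml</loc></sitemap>
</sitemapindex>"""

SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/m3-x-10mm-cap-screw-iso-4762-din-912</loc></url>
</urlset>"""

print(sitemap_locations(SAMPLE_INDEX))
for product_url in sitemap_locations(SAMPLE_URLSET)[1]:
    print(product_url, standards_in(product_url))
```

A crawler would fetch each URL listed in an "index" result and parse it again until it reaches plain URL sets.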

I am all for getting more scrapers into partinfo. I suppose for screws the big problem is matching like for like, because few companies specify the ISO/DIN specs. Without these you are relying on words, and people cannot agree on language. For example:

  • Set screw - in the UK it specifies a purpose: to push against an object to stop it moving. In the US it is a shape of screw with no head (called a grub screw in the UK as it looks like a little grub). Now grub screws normally are set screws, but hex head screws are often used as set screws too, leading some UK suppliers to call a hex head bolt a “hex head set screw”.
  • Hex bolts - Tend to come in partially or fully threaded forms. Many sites just call them this on both sides of the pond (Westfield, McMaster); others such as BoltDepot call partially threaded ones “Hex bolts” and fully threaded ones “Tap Bolts”.
  • A2 Stainless Steel - Generally called 18-8 in the US, sometimes Type 304, has other names in some other countries.
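One way to cope with this is a hand-curated alias table that maps each supplier's wording onto one canonical name and refuses to guess when a term is ambiguous. A small sketch using only the examples above (the table layout and function name are my own invention):

```python
# Aliases taken from the examples above; a real table would be curated per supplier.
MATERIAL_ALIASES = {
    "a2 stainless steel": "A2 Stainless Steel",
    "18-8": "A2 Stainless Steel",
    "type 304": "A2 Stainless Steel",
}

SCREW_ALIASES = {
    "tap bolt": "Hex Bolt - Fully threaded",
    # "set screw" and "hex bolt" are deliberately absent: their meaning
    # depends on the country or supplier, so a person has to decide.
}


def canonical_name(term, aliases):
    """Return the canonical name for a supplier's term, or None if unknown."""
    key = " ".join(term.lower().split())  # ignore case and stray whitespace
    return aliases.get(key)


print(canonical_name("Type 304", MATERIAL_ALIASES))
print(canonical_name("Tap  Bolt", SCREW_ALIASES))
print(canonical_name("Set screw", SCREW_ALIASES))
```

Returning `None` rather than a best guess keeps the ambiguous terms visible for manual review.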

You could argue bolts are open hardware. There is a well-specified design to make something that should work anywhere. The standards specify the pitch, thread diameter, head shape, etc. The problem is that the only precise name is an ISO or DIN number, so everyone uses a weird combination of words that were never defined consistently.

Once we begin to mix languages it gets even more iffy. Buying screws tends to rely on the user of the website reading the text and looking at the diagrams, which is why I think it will always take some manual intervention to look at the naming convention on each site. This is the problem I found with Octopart and RS: they are internally inconsistent as they take information from many suppliers/manufacturers.

On another note, boltdepot.com seems to be a very simple website. The issue there is that their stock (especially for metric stuff) is quite limited. No brass nuts for the OpenFlexure actuators…

Hi Julian,

Sorry I don’t have any suggestions on where to go next… I think the only addition I could make here is the website we normally used back in Germany to buy such parts (which is also in English), which had a neat organization of the parts/components: https://www.screwsandmore.de/en/product-range/

I also remember they were quite responsive whenever we needed things; maybe they have a more positive approach towards scraping?

Thanks @amchagas. This is great, they have a really complete sitemap which makes getting all the part numbers super easy. I will try to make everything have Westfield, Accu, and Screwsandmore.de part numbers soon.

I did play a bit with headless browsing of McMaster Carr. While it was possible, it was an uphill battle. After reading the Terms and Conditions, which prohibit scraping, and looking at legal advice, I came to the conclusion that even if it is probably completely legal there is no point in taking the slight risk just to drive traffic to their website. Hopefully we can find somewhere else in the US that is more friendly.

On the topic of finding places that are friendly, “scrape” seems to be a dirty word. Perhaps part of the answer is presenting it better. I am coming to realise that I should just look up the DIN/ISO specs and their preferred sizes, then go hunting for these on each website. As such we are not scraping lots of information, we are just using the sitemap to “crawl” for necessary parts and curating the links for easy access. Companies seem much happier with “crawling” than “scraping”.

I now realise that this statement was incredibly optimistic. The ISO specs reference other ISO specs, which reference more specs, and it is specs all the way down. Once you have specs for lengths and threads and pitches, you then realise some screws follow different conventions.

Long story short, I have made a Python library called hardspec which calculates and looks up a number of things like dimensions for screws, nuts, and washers. It also has SVG diagrams for the components. The documentation is not good yet, but there is an example which looks like this:

Currently the library has:

  • Some basic info about some material naming including
    • Brass
    • A2 Stainless Steel (18-8 in the US)
  • Metric threads as defined in ISO 261
  • Preferred metric threads as defined in ISO 262
  • Preferred screw and thread lengths as defined by ISO 888 (Frustratingly this is not followed by Socket Head screws that are most commonly used!!!)
  • Hex Bolt - Fully threaded (ISO 4017/DIN 933)
  • Hex Bolt - Partially threaded (ISO 4014/DIN 931)
  • Socket Head Cap Screw (ISO 4762/DIN 912)
  • Socket Head Button Screw (ISO 7380-1)
  • Hex Nut (ISO 4032/DIN 934)
  • Washer (ISO 7089/DIN 125A)
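To give a feel for the kind of lookup involved, here is a toy table built from the list above. This is not hardspec's actual API; the names are invented, and the coarse pitches cover only a handful of common sizes:

```python
from typing import NamedTuple, Optional


class Standard(NamedTuple):
    iso: str
    din: Optional[str]  # None where the list above gives no DIN equivalent


COMPONENT_STANDARDS = {
    "Hex Bolt - Fully threaded": Standard("ISO 4017", "DIN 933"),
    "Hex Bolt - Partially threaded": Standard("ISO 4014", "DIN 931"),
    "Socket Head Cap Screw": Standard("ISO 4762", "DIN 912"),
    "Socket Head Button Screw": Standard("ISO 7380-1", None),
    "Hex Nut": Standard("ISO 4032", "DIN 934"),
    "Washer": Standard("ISO 7089", "DIN 125A"),
}

# ISO metric coarse pitches in mm, keyed by nominal diameter in mm.
COARSE_PITCH_MM = {3: 0.5, 4: 0.7, 5: 0.8, 6: 1.0, 8: 1.25}


def describe(component, diameter, length=None):
    """Build a human-readable designation such as "M3x10 ... (ISO 4762/DIN 912)"."""
    codes = "/".join(code for code in COMPONENT_STANDARDS[component] if code)
    size = f"M{diameter}" + (f"x{length}" if length is not None else "")
    return f"{size} {component} ({codes}), coarse pitch {COARSE_PITCH_MM[diameter]} mm"


print(describe("Socket Head Cap Screw", 3, 10))
print(describe("Hex Nut", 4))
```

Even this toy version has to handle a missing DIN equivalent, which hints at why the real thing grew so large.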

Anyway, “just” looking up the ISO standards turned into about 1,200 lines of code, and shows that I really need a better hobby for the Christmas break next year.

The next stage is making the web crawler from above use hardspec to find matching hardware on a few sites.