How to crawl a quarter billion webpages in 40 hours
There is an example showing that crawling a quarter billion webpages is possible in under two days:
"More precisely, I crawled 250,113,669 pages for just under 580 dollars in 39 hours and 25 minutes, using 20 Amazon EC2 machine instances."
| Instance | vCPU | Memory (GiB) | Storage | Network performance (Gbit/s) |
| --- | --- | --- | --- | --- |
| a1.xlarge | 4 | 8 | EBS only | Up to 10 |
Across the 20 instances, that comes to 80 vCPUs and 160 GiB of RAM. The crawl used roughly 500 gigabytes of outgoing bandwidth for the HTTP requests, downloaded 1.69 terabytes of content, and ran 2,800 crawler agents in total (an average of 140 per machine).
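In other words, the design is many lightweight fetchers per machine rather than a few heavy ones. Below is a minimal sketch of one machine's worth of such agents pulling from a shared queue; this is an illustrative toy, not the original crawler, and the agent count, timeout, and seed list are assumptions:

```python
# Minimal sketch of a many-agent fetcher (one machine's worth).
# Agent count, timeout, and seed URLs are illustrative assumptions.
import queue
import threading
import urllib.request

NUM_AGENTS = 140   # 2,800 agents / 20 machines (assumed even split)
TIMEOUT_S = 10

url_queue: "queue.Queue[str]" = queue.Queue()
stats_lock = threading.Lock()
bytes_downloaded = 0

def agent() -> None:
    """Pull URLs off the shared queue and fetch them until it drains."""
    global bytes_downloaded
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return  # queue drained, agent exits
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
                body = resp.read()
            with stats_lock:
                bytes_downloaded += len(body)
        except Exception:
            pass  # a real crawler would log and retry here
        finally:
            url_queue.task_done()

if __name__ == "__main__":
    for u in ["https://example.com/"]:  # stand-in for the URL frontier
        url_queue.put(u)
    threads = [threading.Thread(target=agent) for _ in range(NUM_AGENTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"downloaded {bytes_downloaded} bytes")
```

A real crawl would add per-domain rate limiting, robots.txt handling, and persistent storage of the fetched pages, all of which this sketch omits.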
"According to this presentation by Googler Jeff Dean, as of November 2010 Google was indexing “tens of billions of pagesâ€. "
https://github.com/DuskoPre/AutoCoder/wiki
I'm not just testing LLMs but also creating a semi-liquid neural network (an advanced LLM setup) with chromadb:
https://github.com/DuskoPre/liquid-neural-network-with-chromadb-cache
It seems a major upgrade for LLMs is possible through classification (a managing system) of inputs to the LLM. The model itself is still static, but through the implementation of a classifier it can be made semi-liquid.
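A minimal sketch of that idea, assuming the standard chromadb Python client and a placeholder call_llm function (the names and the distance threshold are illustrative, not taken from the linked repo): a similarity lookup classifies each input, near-duplicate inputs are answered from the cache, and novel inputs go to the static model, with the new pair written back so the system's behavior adapts even though the weights never change.

```python
# Sketch: a similarity "classifier" in front of a static LLM,
# backed by a chromadb cache. call_llm and the threshold are
# placeholders, not the linked repo's actual API.
import chromadb

client = chromadb.Client()  # in-memory instance, default embedder
cache = client.get_or_create_collection("llm_cache")

DISTANCE_THRESHOLD = 0.5  # assumed cutoff; tune for your embedder

def call_llm(prompt: str) -> str:
    """Stand-in for the actual static model call."""
    return f"(static model answer for: {prompt})"

def semi_liquid_answer(prompt: str) -> str:
    # Classify the input: is it near something answered before?
    if cache.count() > 0:
        hit = cache.query(query_texts=[prompt], n_results=1)
        if hit["distances"][0][0] < DISTANCE_THRESHOLD:
            return hit["metadatas"][0][0]["answer"]  # reuse cached answer
    # Novel input: fall through to the static model, then cache the pair.
    answer = call_llm(prompt)
    cache.add(ids=[str(cache.count())], documents=[prompt],
              metadatas=[{"answer": answer}])
    return answer

print(semi_liquid_answer("What is a liquid neural network?"))
print(semi_liquid_answer("What is a liquid neural net?"))  # likely a cache hit
```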
#dichipcoin