How to crawl a quarter billion webpages in 40 hours

There is an example showing that crawling a quarter billion webpages in under two days is feasible:

"More precisely, I crawled 250,113,669 pages for just under 580 dollars in 39 hours and 25 minutes, using 20 Amazon EC2 machine instances."

Instance    vCPU    Memory (GiB)    Storage     Network performance (Gbit/s)
a1.xlarge   4       8               EBS only    Up to 10

Across the 20 instances that works out to 80 vCPUs and 160 GiB of RAM, roughly 500 gigabytes of outgoing bandwidth for the HTTP requests, 1.69 terabytes of downloaded content, and 2,800 crawler agents.
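A quick back-of-the-envelope check of these figures; all numbers come from the quoted write-up, and the per-instance split simply assumes 20 identical machines of the class shown in the table:

```python
# Headline numbers from the crawl write-up quoted above.
pages = 250_113_669                # pages crawled
cost_usd = 580                     # approximate total EC2 cost, USD
duration_s = 39 * 3600 + 25 * 60   # 39 h 25 min in seconds
machines = 20                      # EC2 instances
downloaded_tb = 1.69               # terabytes of content downloaded

pages_per_sec = pages / duration_s            # overall crawl rate
pages_per_machine = pages_per_sec / machines  # assumed even split
usd_per_million = cost_usd / (pages / 1e6)    # cost per million pages
kb_per_page = downloaded_tb * 1e9 / pages     # average page size in KB

print(f"{pages_per_sec:.0f} pages/s overall, "
      f"{pages_per_machine:.1f} pages/s per machine")
print(f"${usd_per_million:.2f} per million pages, "
      f"{kb_per_page:.1f} KB average per page")
```

That comes to roughly 1,760 pages per second overall, about $2.32 per million pages, and an average of under 7 KB of content per page.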

"According to this presentation by Googler Jeff Dean, as of November 2010 Google was indexing “tens of billions of pages”. "
