- The download contains a single zipped file in the following format: each line holds two 26-character string identifiers separated by a TAB character. Every line defines one directed edge of our webgraph. The identifiers correspond to nodes: the first is the tail of the edge, the second is the head.
For example, a row of the form "01324moja6i5ghdbhfe94iou9e 5ltb8q97ou2ui154lc4ohc9pqt" means that a URL in the domain with ID "01324moja6i5ghdbhfe94iou9e" links to a URL in the domain with ID "5ltb8q97ou2ui154lc4ohc9pqt". If you want to know the actual domain names behind these IDs, you can look them up in our Domain dictionary.
The two IDs in the example above translate to www.autocluster.hu and www.matech2000.hu, respectively.
- Note that the 26-character identifiers are kept consistent from version to version: the same identifier maps to the same domain name in every version of the graph.
The file does not say which identifier stands for which domain, because commercial use of this database is discouraged; it is intended for scientific research only. A researcher may nevertheless have a scientific reason to find out which real-life domain a given identifier stands for. This lookup is available under the Domain dictionary menu item.
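The edge-list format described above can be read with a few lines of code. The following is only a minimal sketch in Python, not an official loader: the function names (`parse_edges`, `build_adjacency`, `read_zipped_edges`) are hypothetical, and it assumes that the archive holds exactly one plain-text file in an ASCII-compatible encoding.

```python
import io
import zipfile
from collections import defaultdict

ID_LENGTH = 26  # identifier length stated in the format description


def parse_edges(lines):
    """Yield (tail, head) pairs from TAB-separated lines, skipping blank lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) != 2 or any(len(p) != ID_LENGTH for p in parts):
            raise ValueError(
                f"line {line_no}: expected two {ID_LENGTH}-character IDs separated by TAB"
            )
        yield parts[0], parts[1]


def build_adjacency(edges):
    """Map each tail ID to the set of head IDs it links to."""
    adjacency = defaultdict(set)
    for tail, head in edges:
        adjacency[tail].add(head)
    return adjacency


def read_zipped_edges(zip_source):
    """Read edges from the single file inside the archive.

    Assumption: the archive contains exactly one text file; its name is not
    given in the description, so it is discovered at run time.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError(f"expected one file in the archive, found {len(names)}")
        with archive.open(names[0]) as raw:
            yield from parse_edges(io.TextIOWrapper(raw, encoding="utf-8"))


# In-memory demonstration using the example row from the text.
sample = ["01324moja6i5ghdbhfe94iou9e\t5ltb8q97ou2ui154lc4ohc9pqt\n"]
adjacency = build_adjacency(parse_edges(sample))
```

For the real download, `build_adjacency(read_zipped_edges(path))` would produce the same structure, where `path` is wherever you saved the archive. A dictionary of sets is used here only for simplicity; for a graph of this size a more compact representation may be preferable.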
Presently our webgraph covers only domains under the TLDs .hu and .eu (TLD stands for Top Level Domain).
The next step is to collect data about other European domains. Within a few months we plan to crawl the entire web.
To download our database navigate here