How to Check Broken Links with 404 Error in Python

Links is one of the critical SEO factors for a Website. When creating or re-designing some pages, we cannot ignore Website audit especially in terms of finding and tracking broken links regularly. Although there are many online tools, it is still worth learning and implementing such a tool ourselves for fun. To simplify the programming complexity, I decided to use Python. With powerful Python libraries, it is fairly easy to crawl Web pages, parse HTML elements and check server response code.

Read more

How to Check Alexa Rank in Java

Since we have known how to check PageRank in Java, let’s strengthen our tool by adding the functionality of checking Alexa rank today.

To get the Alexa rank, you can use following code or read the excellent tutorial.

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AlexaRank {

	public static int getAlexaRank(String domain) {

		int result = 0;

		String url = "http://data.alexa.com/data?cli=10&url=" + domain;

		try {

			URLConnection conn = new URL(url).openConnection();
			InputStream is = conn.getInputStream();

			DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
					.newDocumentBuilder();
			Document doc = dBuilder.parse(is);

			Element element = doc.getDocumentElement();

			NodeList nodeList = element.getElementsByTagName("POPULARITY");
			if (nodeList.getLength() > 0) {
				Element elementAttribute = (Element) nodeList.item(0);
				String ranking = elementAttribute.getAttribute("TEXT");
				if(!"".equals(ranking)){
					result = Integer.valueOf(ranking);
				}
			}

		} catch (Exception e) {
			System.out.println(e.getMessage());
		}

		return result;
	}
}

Now, let’s update our Java code to write Alexa rank to Excel files.

public void getPageRankAndAlexaRank() {
		// TODO Auto-generated method stub		
		try {
			InputStream excelFile = new FileInputStream(mFileName);
			XSSFWorkbook wb = new XSSFWorkbook(excelFile);
			XSSFSheet sheet = wb.getSheetAt(0);
			XSSFRow row;
			XSSFCell cellPR, cellAR;

			Iterator<Row> rows = sheet.rowIterator();

			int col = 0, colPR = 1, colAR = 2;
			int pageRank = 0, alexaRank = 0;
			String url = null;
			String log = null;

			while (rows.hasNext()) {
				row = (XSSFRow) rows.next();
				url = row.getCell(col).getStringCellValue();
				if (url.matches(Utils.REGEX)) { // check whether URL is valid
					System.out.println(url);
					pageRank = PageRank.get(url); // check page rank
					alexaRank = AlexaRank.getAlexaRank(url); // get alexa rank

					// write PageRank to excel
					cellPR = row.createCell(colPR);
					cellPR.setCellValue(pageRank);

					cellAR = row.createCell(colAR);
					cellAR.setCellValue(alexaRank);

					log = "PR = " + pageRank + ", AR = " + alexaRank;
					if (mEventListener != null) {
						mEventListener.log(log);
					}
					System.out.println(log);
				}
				else {
					System.out.println("URL not valid");
				}

				System.out.println("--------------------------");
			}

			FileOutputStream out = new FileOutputStream(mFileName);
	        wb.write(out);
	        out.flush();
	        out.close();
		}
		catch (Exception e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

Download the updated source package here. Please feel free to contact me at {desmond at dynamsoft dot com} if you have any question.

How to Read and Write Excel Files in Java

Previously, I talked about how to check PageRank in Java. In this article, I will combine PageRank with Excel files. To operate Excel files in Java, I used Apache POI – the Java API for Microsoft Documents. Apparently, you can download the API package, and follow the relevant tutorials to learn how to use POI. When I first downloaded the Java library, I was astonished by how many jar files there were. I had no idea which ones would be useful for my project. Fortunately, I managed to filter out what I want.

Read more