This post has been migrated between different systems over the years. I have not had a chance to reformat it manually, so it may be hard to read, but I have left it here as a reference.
This is a quick guide on how you can create your own search engine similar to Google, Yahoo or Bing. We will be using PHP and MySQL to create a search page, a results page and a simple parser to index pages. We will begin by creating the basic structure for the site as seen in the image below.
Here I have created a folder called “searchengine” with two PHP files called “index.php” and “results.php”. The index file simply contains the search form, which makes a POST request to the results page, which in turn displays any relevant search results. I have also added an assets folder with a css folder that holds a stylesheet to format our pages. Let’s get started. Below I have created a basic layout for the main page.
We simply have a heading which says “My Search Engine” and then our search box which is a form with a text field and a submit button. Clicking the button will post the search value to our results page and allow us to use it. Below I have created the basic structure for the results page.
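As a rough sketch, the main page could look something like the following. The field name `search`, the stylesheet path `assets/css/style.css` and the helper name `renderSearchForm()` are assumptions made for illustration, not necessarily what the original files used.

```php
<?php
// index.php: a minimal sketch of the search page.
// Assumptions: the field is named "search" and the stylesheet
// lives at assets/css/style.css.

function renderSearchForm(string $action = 'results.php'): string
{
    // Escape the action so the attribute stays valid HTML.
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>My Search Engine</title>
    <link rel="stylesheet" href="assets/css/style.css">
</head>
<body>
    <h1>My Search Engine</h1>
    <form action="{$action}" method="post">
        <input type="text" name="search" placeholder="Search...">
        <input type="submit" value="Search">
    </form>
</body>
</html>
HTML;
}

// Output the page.
echo renderSearchForm();
```

Wrapping the markup in a function is only a convenience here. A plain HTML file with the same form would work just as well.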
Here we have an array of errors and an array of results. The results array is untouched and empty for now, but the errors array will contain any errors that we may need to show in case no search term is entered or if the user accesses the page in the wrong way.
In the picture above you will see more of the code for the results page. Here we check for any errors. If there are errors we display them; otherwise we display any search results. If there are no search results, this page will have no output for the moment.
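The logic described above might be sketched like this. The function names, the error messages and the `search` field name are assumptions, and each result is assumed to be an array with `title`, `description` and `url` keys.

```php
<?php
// results.php: a sketch of the error handling and output.
// Function names and messages are illustrative assumptions.

function validateSearch(string $method, array $post): array
{
    $errors = [];
    $term = $post['search'] ?? '';

    if ($method !== 'POST') {
        // The page was opened directly instead of through the form.
        $errors[] = 'Please use the search form to reach this page.';
    } elseif (!is_string($term) || trim($term) === '') {
        $errors[] = 'Please enter a search term.';
    }

    return $errors;
}

function renderResults(array $errors, array $results): string
{
    // Escape everything we print so page content cannot inject HTML.
    $escape = fn($value) => htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    $html = '';

    if ($errors) {
        foreach ($errors as $error) {
            $html .= '<p class="error">' . $escape($error) . "</p>\n";
        }
        return $html;
    }

    foreach ($results as $result) {
        $html .= '<div class="result">'
            . '<a href="' . $escape($result['url']) . '">' . $escape($result['title']) . '</a>'
            . '<p>' . $escape($result['description']) . '</p>'
            . "</div>\n";
    }

    return $html;
}

$errors = validateSearch($_SERVER['REQUEST_METHOD'] ?? 'GET', $_POST);
$results = []; // Untouched and empty for now.
echo renderResults($errors, $results);
```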
We can add an example search result to our array using the code shown above. If we refresh the page you will see the result displayed.
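A hard-coded test entry could look roughly like this. The array keys and the sample values are assumptions chosen to match the three fields we store later on.

```php
<?php
// Hypothetical sample data for testing the results page.
$results = [];
$results[] = [
    'title'       => 'Example title',
    'description' => 'An example description for testing the results page.',
    'url'         => 'http://example.com/',
];
```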
At this time I have gone ahead and added some simple formatting to the site in order to make results more readable. For this example it will be enough. If you are thinking about launching a search engine you would probably want to consider hiring a designer to make something really nice.
I have created a classes folder with a Database class. This will allow us to both store web pages and then retrieve them when a user searches for the relevant terms.
Above you will see the basic structure for the database. In this example I will only store the title, description and URL of each page. A large search engine like Google will most likely store many more parameters in order to provide users with better results.
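A table along these lines would hold those three fields. The table name `pages`, the column types and the helper names are assumptions. The original structure may well differ.

```php
<?php
// A sketch of the table definition for MySQL.
// Table name, column sizes and function names are assumptions.

function pagesTableSql(): string
{
    return <<<SQL
CREATE TABLE IF NOT EXISTS pages (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title       VARCHAR(255) NOT NULL,
    description TEXT NOT NULL,
    url         VARCHAR(2048) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
SQL;
}

function createPagesTable(PDO $pdo): void
{
    // $pdo is expected to be a connection to a MySQL database, e.g.
    // new PDO('mysql:host=localhost;dbname=searchengine;charset=utf8mb4', $user, $pass)
    // where the database name and credentials are placeholders.
    $pdo->exec(pagesTableSql());
}
```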
Above I have used a simple MySQL query to create a sample result so that we can test the current version of our script.
Now if we do a simple search for the word “title” you will see that we get the search result we just added to our database. If you perform a search for another term you will find that the results page will not display any results.
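The lookup behind such a search might be sketched as follows, using a plain `LIKE` match on the title and description. The class layout, the method name `search()` and the table name `pages` are assumptions.

```php
<?php
// classes/Database.php: a sketch of the search side of the class.
// Method and table names are illustrative assumptions.

class Database
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
        // Throw exceptions on SQL errors instead of failing silently.
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    public function search(string $term): array
    {
        // Note: % and _ inside $term still act as LIKE wildcards here.
        $like = '%' . $term . '%';

        // Two placeholders are used because reusing one named placeholder
        // is not supported when PDO uses native prepared statements.
        $stmt = $this->pdo->prepare(
            'SELECT title, description, url FROM pages
             WHERE title LIKE :title OR description LIKE :description'
        );
        $stmt->execute([':title' => $like, ':description' => $like]);

        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}
```

A substring match like this is the simplest option. For anything larger than a demo, a MySQL `FULLTEXT` index would likely give better and faster results.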
In the same way as we did the search page, we will now create a parser page which will allow us to index URLs. I will once again create a form with one text field where a URL can be added. Below is the code that will handle the added URL, parse the title and description and insert them into the database.
Here we use the URL along with cURL to grab the content of the page, which is its HTML. We then use DOMDocument along with XPath to parse the content for the information we want, in this case the title and description. After we have found it we insert the information and the URL into our database so that it can be searched at a later time. If you would like to expand on this example in the future, I would recommend using a cron job which picks a URL from the database, finds all the links on that page and then indexes those URLs. In this way you keep indexing new pages and adding them to your results, and eventually you could, in principle, index every page that is reachable through links. This is the basic concept of a search engine spider.
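A minimal sketch of the fetch and parse steps, assuming the description comes from the page's `<meta name="description">` tag. The function names and the cURL options chosen here are assumptions.

```php
<?php
// A sketch of the parser. Function names are illustrative assumptions.

function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // Return the body instead of printing it.
        CURLOPT_FOLLOWLOCATION => true, // Follow redirects.
        CURLOPT_TIMEOUT        => 10,   // Give up after ten seconds.
    ]);
    $html = curl_exec($ch);

    // curl_exec() returns false on failure.
    return is_string($html) ? $html : null;
}

function extractPageInfo(string $html): array
{
    if (trim($html) === '') {
        return ['title' => '', 'description' => ''];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // string(...) yields an empty string when nothing matches.
    $title = trim($xpath->evaluate('string(//title)'));
    $description = trim($xpath->evaluate('string(//meta[@name="description"]/@content)'));

    return ['title' => $title, 'description' => $description];
}
```

Keeping the download and the parsing in separate functions makes the parsing easy to try out on a fixed HTML string without touching the network.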
Above is an example of the code used to insert our parsed page information into our database. This query is done with PDO.
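Such an insert might be written along these lines with a PDO prepared statement. The function name `insertPage()` and the table name `pages` are assumptions.

```php
<?php
// A sketch of the insert step. Function and table names are assumptions.

function insertPage(PDO $pdo, string $title, string $description, string $url): bool
{
    // Placeholders keep parsed page content from being treated as SQL.
    $stmt = $pdo->prepare(
        'INSERT INTO pages (title, description, url)
         VALUES (:title, :description, :url)'
    );

    return $stmt->execute([
        ':title'       => $title,
        ':description' => $description,
        ':url'         => $url,
    ]);
}
```

Because the title and description come from arbitrary web pages, binding them as parameters rather than concatenating them into the query string matters here.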
So what we can do now is start entering URLs that will be indexed. I will use some URLs from my blog to add some search results.
As you can see, the database is now populated with three of my links.
And we end up with this, our results page after a search for “Markus Tenghamn”. You can try the live search engine (only populated with these three results) at the following url: http://markustenghamn.com/searchengine/
Get the complete source code along with the database structure for only $5