Overview of implementation
- Create ASP.NET web site
- Connect to Content Database for SharePoint
- Execute query
- Display data on page
Requirements
Microsoft Visual Studio 2005 (in principle, Visual Studio 2003 or even Notepad would also work)
Implementation
Open Microsoft Visual Studio 2005 and create a new web site. Create a connection to your content database. If you don't know which one that is, just look through your SharePoint databases until you find one that has a "Docs" table.
The table we are interested in is the Docs table. It holds the metadata and binary file content for all files in all Document Libraries in both SPS and WSS.
The following query can be used to get the data. Add any file extensions that you want to index to the query. The document libraries contain files that you never see in the user interface, so I recommend explicitly specifying the file extensions you want to include in the indexing.
SELECT Docs.DirName + '/' + Docs.LeafName AS URL
FROM Docs
WHERE ((Docs.Type = 0) AND Docs.LeafName NOT LIKE 'template.%')
  AND ((Docs.LeafName LIKE '%.doc%')
    OR (Docs.LeafName LIKE '%.ppt%')
    OR (Docs.LeafName LIKE '%.xls%')
    OR (Docs.LeafName LIKE '%.pdf%')
    OR (Docs.LeafName LIKE '%.vsd%')
    OR (Docs.LeafName LIKE '%.txt%'))
You will probably want to run this query in code so that the host name can be prepended to each URL, for example:
http://hostnameHere/
Bind the results to a GridView or some other control that has paging built in.
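As a rough sketch of running the query in code and prepending the host name, something like the following could work. The class name DocumentLinks, the method names, and the choice to encode only spaces are assumptions made for illustration; the connection string and query text are passed in by the caller, and http://hostnameHere/ is the same placeholder used above.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// Hypothetical helper; the names here are illustrative, not part of SharePoint or ASP.NET.
public static class DocumentLinks
{
    // Placeholder host name; replace with your SharePoint server.
    public const string HostPrefix = "http://hostnameHere/";

    // Joins the host prefix and the relative URL returned by the query.
    // Only spaces are encoded here, as a simplifying assumption.
    public static string ToAbsoluteUrl(string hostPrefix, string relativeUrl)
    {
        string path = relativeUrl.TrimStart('/').Replace(" ", "%20");
        return hostPrefix.TrimEnd('/') + "/" + path;
    }

    // Runs the query against the content database and returns absolute URLs.
    // Assumes the first column of the result set is the URL column.
    public static List<string> LoadUrls(string connectionString, string query)
    {
        List<string> urls = new List<string>();
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    urls.Add(ToAbsoluteUrl(HostPrefix, reader.GetString(0)));
                }
            }
        }
        return urls;
    }
}
```

The list returned by LoadUrls can then be assigned to the GridView's DataSource before calling DataBind.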
At this point you should be able to run your web application and click on links in the GridView to save or open the documents that are in SharePoint. Since there is no JavaScript involved in those links, GSA should now be able to index the documents on the first page of your GridView. That is right, only your first page. Why, you ask? Because if you look at the paging that is output by the GridView, it uses JavaScript to post back, and GSA can't follow JavaScript.
This poses another issue. How do we page our results without JavaScript? We can do a number of things to solve the problem.
- Don't use the pagers and provide our own links to all the pages in the GridView
- Write our own pager that doesn't use JavaScript (really, just a simple next link works also)
If you choose the second option and write your own pager, and you want every page to start out with equal weight in the GSA results, I strongly recommend that the pager also have direct links to all the pages. The reason is that if you only have a next button, for example, GSA will see that page 20 is about 20 hops away from the page you originally asked it to index. GSA will still include it in the index, but it gives an extremely low page rank (basically zero) to pages more than about 10 hops away. Each successive page from 0 to 10 gets a smaller page rank, so pages 10 through 20, for example, have a near-zero page rank.
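A pager with direct links to every page might look roughly like the sketch below. It emits plain anchor tags that carry the page number in the query string, so a crawler can follow them without JavaScript. The class name PlainPager, the "page" query string parameter, and the 1-based page numbering are all assumptions made for this example.

```csharp
using System;
using System.Text;

// Hypothetical pager helper; names and the "page" parameter are illustrative.
public static class PlainPager
{
    // Number of pages needed to show totalItems at pageSize items per page.
    public static int PageCount(int totalItems, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        return (totalItems + pageSize - 1) / pageSize;
    }

    // Renders one plain link per page; the current page is shown as text only.
    // Page numbers are 1-based.
    public static string Render(string baseUrl, int pageCount, int currentPage)
    {
        StringBuilder html = new StringBuilder();
        for (int page = 1; page <= pageCount; page++)
        {
            if (page == currentPage)
            {
                html.Append("<span>" + page + "</span> ");
            }
            else
            {
                html.Append("<a href=\"" + baseUrl + "?page=" + page + "\">" + page + "</a> ");
            }
        }
        return html.ToString().TrimEnd();
    }
}
```

On the page itself, one way to wire this up is to read Request.QueryString["page"], set the GridView's PageIndex to that value minus one before binding, keep AllowPaging enabled, and hide the built-in pager through PagerSettings.Visible. Every page is then a single hop from every other page.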