Path: blob/master/drivers/lguest/x86/switcher_32.S
/*P:900
 * This is the Switcher: code which sits at 0xFFC00000 (or 0xFFE00000) astride
 * both the Host and Guest to do the low-level Guest<->Host switch.  It is as
 * simple as it can be made, but it's naturally very specific to x86.
 *
 * You have now completed Preparation.  If this has whet your appetite; if you
 * are feeling invigorated and refreshed then the next, more challenging stage
 * can be found in "make Guest".
 :*/

/*M:012
 * Lguest is meant to be simple: my rule of thumb is that 1% more LOC must
 * gain at least 1% more performance.  Since neither LOC nor performance can be
 * measured beforehand, it generally means implementing a feature then deciding
 * if it's worth it.  And once it's implemented, who can say no?
 *
 * This is why I haven't implemented this idea myself.  I want to, but I
 * haven't.  You could, though.
 *
 * The main place where lguest performance sucks is Guest page faulting.  When
 * a Guest userspace process hits an unmapped page we switch back to the Host,
 * walk the page tables, find it's not mapped, switch back to the Guest page
 * fault handler, which calls a hypercall to set the page table entry, then
 * finally returns to userspace.  That's two round-trips.
 *
 * If we had a small walker in the Switcher, we could quickly check the Guest
 * page table and if the page isn't mapped, immediately reflect the fault back
 * into the Guest.  This means the Switcher would have to know the top of the
 * Guest page table and the page fault handler address.
 *
 * For simplicity, the Guest should only handle the case where the privilege
 * level of the fault is 3 and probably only not present or write faults.  It
 * should also detect recursive faults, and hand the original fault to the
 * Host (which is actually really easy).
 *
 * Two questions remain.  Would the performance gain outweigh the complexity?
 * And who would write the verse documenting it?
 :*/

/*M:011
 * Lguest64 handles NMI.
 * This gave me NMI envy (until I looked at their
 * code).  It's worth doing though, since it would let us use oprofile in the
 * Host when a Guest is running.
 :*/

/*S:100
 * Welcome to the Switcher itself!
 *
 * This file contains the low-level code which changes the CPU to run the Guest
 * code, and returns to the Host when something happens.  Understand this, and
 * you understand the heart of our journey.
 *
 * Because this is in assembler rather than C, our tale switches from prose to
 * verse.  First I tried limericks:
 *
 *	There once was an eax reg,
 *	To which our pointer was fed,
 *	It needed an add,
 *	Which asm-offsets.h had
 *	But this limerick is hurting my head.
 *
 * Next I tried haikus, but fitting the required reference to the seasons in
 * every stanza was quickly becoming tiresome:
 *
 *	The %eax reg
 *	Holds "struct lguest_pages" now:
 *	Cherry blossoms fall.
 *
 * Then I started with Heroic Verse, but the rhyming requirement leeched away
 * the content density and led to some uniquely awful oblique rhymes:
 *
 *	These constants are coming from struct offsets
 *	For use within the asm switcher text.
 *
 * Finally, I settled for something between heroic hexameter, and normal prose
 * with inappropriate linebreaks.
 * Anyway, it aint no Shakespeare.
 */

// Not all kernel headers work from assembler
// But these ones are needed: the ENTRY() define
// And constants extracted from struct offsets
// To avoid magic numbers and breakage:
// Should they change the compiler can't save us
// Down here in the depths of assembler code.
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/page.h>
#include <asm/segment.h>
#include <asm/lguest.h>

// We mark the start of the code to copy
// It's placed in .text tho it's never run here
// You'll see the trick macro at the end
// Which interleaves data and text to effect.
.text
ENTRY(start_switcher_text)

// When we reach switch_to_guest we have just left
// The safe and comforting shores of C code
// %eax has the "struct lguest_pages" to use
// Where we save state and still see it from the Guest
// And %ebx holds the Guest shadow pagetable:
// Once set we have truly left Host behind.
ENTRY(switch_to_guest)
	// We told gcc all its regs could fade,
	// Clobbered by our journey into the Guest
	// We could have saved them, if we tried
	// But time is our master and cycles count.

	// Segment registers must be saved for the Host
	// We push them on the Host stack for later
	pushl	%es
	pushl	%ds
	pushl	%gs
	pushl	%fs
	// But the compiler is fickle, and heeds
	// No warning of %ebp clobbers
	// When frame pointers are used.
	// That register
	// Must be saved and restored or chaos strikes.
	pushl	%ebp
	// The Host's stack is done, now save it away
	// In our "struct lguest_pages" at offset
	// Distilled into asm-offsets.h
	movl	%esp, LGUEST_PAGES_host_sp(%eax)

	// All saved and there's now five steps before us:
	// Stack, GDT, IDT, TSS
	// Then last of all the page tables are flipped.

	// Yet beware that our stack pointer must be
	// Always valid lest an NMI hits
	// %edx does the duty here as we juggle
	// %eax is lguest_pages: our stack lies within.
	movl	%eax, %edx
	addl	$LGUEST_PAGES_regs, %edx
	movl	%edx, %esp

	// The Guest's GDT we so carefully
	// Placed in the "struct lguest_pages" before
	lgdt	LGUEST_PAGES_guest_gdt_desc(%eax)

	// The Guest's IDT we did partially
	// Copy to "struct lguest_pages" as well.
	lidt	LGUEST_PAGES_guest_idt_desc(%eax)

	// The TSS entry which controls traps
	// Must be loaded up with "ltr" now:
	// The GDT entry that TSS uses
	// Changes type when we load it: damn Intel!
	// For after we switch over our page tables
	// That entry will be read-only: we'd crash.
	movl	$(GDT_ENTRY_TSS*8), %edx
	ltr	%dx

	// Look back now, before we take this last step!
	// The Host's TSS entry was also marked used;
	// Let's clear it again for our return.
	// The GDT descriptor of the Host
	// Points to the table after two "size" bytes
	movl	(LGUEST_PAGES_host_gdt_desc+2)(%eax), %edx
	// Clear "used" from type field (byte 5, bit 2)
	andb	$0xFD, (GDT_ENTRY_TSS*8 + 5)(%edx)

	// Once our page table's switched, the Guest is live!
	// The Host fades as we run this final step.
	// Our "struct lguest_pages" is now read-only.
	movl	%ebx, %cr3

	// The page table change did one tricky thing:
	// The Guest's register page has been mapped
	// Writable under our %esp (stack) --
	// We can simply pop off all Guest regs.
	popl	%eax
	popl	%ebx
	popl	%ecx
	popl	%edx
	popl	%esi
	popl	%edi
	popl	%ebp
	popl	%gs
	popl	%fs
	popl	%ds
	popl	%es

	// Near the base of the stack lurk two strange fields
	// Which we fill as we exit the Guest
	// These are the trap number and its error
	// We can simply step past them on our way.
	addl	$8, %esp

	// The last five stack slots hold return address
	// And everything needed to switch privilege
	// From Switcher's level 0 to Guest's 1,
	// And the stack where the Guest had last left it.
	// Interrupts are turned back on: we are Guest.
	iret

// We tread two paths to switch back to the Host
// Yet both must save Guest state and restore Host
// So we put the routine in a macro.
#define SWITCH_TO_HOST \
	/* We save the Guest state: all registers first \
	 * Laid out just as "struct lguest_regs" defines */ \
	pushl	%es; \
	pushl	%ds; \
	pushl	%fs; \
	pushl	%gs; \
	pushl	%ebp; \
	pushl	%edi; \
	pushl	%esi; \
	pushl	%edx; \
	pushl	%ecx; \
	pushl	%ebx; \
	pushl	%eax; \
	/* Our stack and our code are using segments \
	 * Set in the TSS and IDT \
	 * Yet if we were to touch data we'd use \
	 * Whatever data segment the Guest had. \
	 * Load the lguest ds segment for now. */ \
	movl	$(LGUEST_DS), %eax; \
	movl	%eax, %ds; \
	/* So where are we?  Which CPU, which struct? \
	 * The stack is our clue: our TSS starts \
	 * It at the end of "struct lguest_pages". \
	 * Or we may have stumbled while restoring \
	 * Our Guest segment regs while in switch_to_guest, \
	 * The fault pushed atop that part-unwound stack. \
	 * If we round the stack down to the page start \
	 * We're at the start of "struct lguest_pages". \
	 * (GAS binds << tighter than -: the parentheses \
	 * Only spell out what the assembler already did.) */ \
	movl	%esp, %eax; \
	andl	$(~((1 << PAGE_SHIFT) - 1)), %eax; \
	/* Save our trap number: the switch will obscure it \
	 * (In the Host the Guest regs are not mapped here) \
	 * %ebx holds it safe for deliver_to_host */ \
	movl	LGUEST_PAGES_regs_trapnum(%eax), %ebx; \
	/* The Host GDT, IDT and stack! \
	 * All these lie safely hidden from the Guest: \
	 * We must return to the Host page tables \
	 * (Hence that was saved in struct lguest_pages) */ \
	movl	LGUEST_PAGES_host_cr3(%eax), %edx; \
	movl	%edx, %cr3; \
	/* As before, when we looked back at the Host \
	 * As we left and marked TSS unused \
	 * So must we now for the Guest left behind. */ \
	andb	$0xFD, (LGUEST_PAGES_guest_gdt+GDT_ENTRY_TSS*8+5)(%eax); \
	/* Switch to Host's GDT, IDT. */ \
	lgdt	LGUEST_PAGES_host_gdt_desc(%eax); \
	lidt	LGUEST_PAGES_host_idt_desc(%eax); \
	/* Restore the Host's stack where its saved regs lie */ \
	movl	LGUEST_PAGES_host_sp(%eax), %esp; \
	/* Last the TSS: our Host is returned */ \
	movl	$(GDT_ENTRY_TSS*8), %edx; \
	ltr	%dx; \
	/* Restore now the regs saved right at the first. */ \
	popl	%ebp; \
	popl	%fs; \
	popl	%gs; \
	popl	%ds; \
	popl	%es

// The first path is trod when the Guest has trapped:
// (Which trap it was has been pushed on the stack).
// We need only switch back, and the Host will decode
// Why we came home, and what needs to be done.
return_to_host:
	SWITCH_TO_HOST
	iret

// We are led to the second path like so:
// An interrupt, with some cause external
// Has jerked us rudely from the Guest's code
// Again we must return home to the Host
deliver_to_host:
	SWITCH_TO_HOST
	// But now we must go home via that place
	// Where that interrupt was supposed to go
	// Had we not been ensconced, running the Guest.
	// Here we see the trickiness of run_guest_once():
	// The Host stack is formed like an interrupt
	// With EIP, CS and EFLAGS layered.
	// Interrupt handlers end with "iret"
	// And that will take us home at long long last.

	// But first we must find the handler to call!
	// The IDT descriptor for the Host
	// Has two bytes for size, and four for address:
	// %edx will hold it for us for now.
	movl	(LGUEST_PAGES_host_idt_desc+2)(%eax), %edx
	// We now know the table address we need,
	// And saved the trap's number inside %ebx.
	// Yet the pointer to the handler is smeared
	// Across the bits of the table entry.
	// What oracle can tell us how to extract
	// From such a convoluted encoding?
	// I consulted gcc, and it gave
	// These instructions, which I gladly credit:
	leal	(%edx,%ebx,8), %eax
	movzwl	(%eax),%edx
	movl	4(%eax), %eax
	xorw	%ax, %ax
	orl	%eax, %edx
	// Now the address of the handler's in %edx
	// We call it now: its "iret" drops us home.
	jmp	*%edx

// Every interrupt can come to us here
// But we must truly tell each apart.
// They number two hundred and fifty six
// And each must land in a different spot,
// Push its number on stack, and join the stream.

// And worse, a mere seven of the traps stand apart
// And push on their stack an addition:
// An error number, thirty two bits long
// So we punish the other two forty-nine
// And make them push a zero so they match.

// Yet two fifty six entries is long
// And all will look most the same as the last
// So we create a macro which can make
// As many entries as we need to fill.

// Note the change to .data then .text:
// We plant the address of each entry
// Into a (data) table for the Host
// To know where each Guest interrupt should go.
.macro IRQ_STUB N TARGET
	.data; .long 1f; .text; 1:
 // Trap eight, ten through fourteen and seventeen
 // Supply an error number.
 // Else zero.
 .if (\N <> 8) && (\N < 10 || \N > 14) && (\N <> 17)
	pushl	$0
 .endif
	pushl	$\N
	jmp	\TARGET
	ALIGN
.endm

// This macro creates numerous entries
// Using GAS macros which out-power C's.
.macro IRQ_STUBS FIRST LAST TARGET
 irq=\FIRST
 .rept \LAST-\FIRST+1
	IRQ_STUB irq \TARGET
  irq=irq+1
 .endr
.endm

// Here's the marker for our pointer table
// Laid in the data section just before
// Each macro places the address of code
// Forming an array: each one points to text
// Which handles interrupt in its turn.
.data
.global default_idt_entries
default_idt_entries:
.text
	// The first two traps go straight back to the Host
	IRQ_STUBS 0 1 return_to_host
	// We'll say nothing, yet, about NMI
	IRQ_STUB 2 handle_nmi
	// Other traps also return to the Host
	IRQ_STUBS 3 31 return_to_host
	// All interrupts go via their handlers
	IRQ_STUBS 32 127 deliver_to_host
	// 'Cept system calls coming from userspace
	// Are to go to the Guest, never the Host.
	IRQ_STUB 128 return_to_host
	IRQ_STUBS 129 255 deliver_to_host

// The NMI, what a fabulous beast
// Which swoops in and stops us no matter that
// We're suspended between heaven and hell,
// (Or more likely between the Host and Guest)
// When in it comes!  We are dazed and confused
// So we do the simplest thing which one can.
// Though we've pushed the trap number and zero
// We discard them, return, and hope we live.
handle_nmi:
	addl	$8, %esp
	iret

// We are done; all that's left is Mastery
// And "make Mastery" is a journey long
// Designed to make your fingers itch to code.

// Here ends the text, the file and poem.
ENTRY(end_switcher_text)